Bigtable to Cloud Storage SequenceFile template
The Bigtable to Cloud Storage SequenceFile template is a pipeline that reads data from a Bigtable table and writes the data to a Cloud Storage bucket in SequenceFile format. You can use the template to copy data from Bigtable to Cloud Storage.
Pipeline requirements
The Bigtable table must exist.
The output Cloud Storage bucket must exist before you run the pipeline.
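Template parameters
Required parameters
bigtableProject: the ID of the Google Cloud project that contains the Bigtable instance that you want to read data from.
bigtableInstanceId: the ID of the Bigtable instance that contains the table.
bigtableTableId: the ID of the Bigtable table to export.
destinationPath: the Cloud Storage path where data is written. For example, gs://your-bucket/your-path/.
filenamePrefix: the prefix of the SequenceFile filename. For example, output-.
Optional parameters
bigtableAppProfileId: the ID of the Bigtable application profile to use for the export. If you don't specify an app profile, Bigtable uses the instance's default app profile: https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile.
bigtableStartRow: the row at which the export starts. Defaults to the first row.
bigtableStopRow: the row at which the export stops. Defaults to the last row.
bigtableMaxVersions: the maximum number of cell versions. Defaults to 2147483647.
bigtableFilter: a filter string. See http://hbase.apache.org/book.html#thrift. Defaults to empty.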
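Run the template
Console
1. Go to the Dataflow Create job from template page: https://console.cloud.google.com/dataflow/createjob
2. In the Job name field, enter a unique job name.
3. Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations (/dataflow/docs/resources/locations).
4. From the Dataflow template drop-down menu, select the Cloud Bigtable to SequenceFile Files on Cloud Storage template.
5. In the provided parameter fields, enter your parameter values.
6. Click Run job.
gcloud
Note: To use the Google Cloud CLI to run classic templates, you must have Google Cloud CLI version 138.0.0 or later.
In your shell or terminal, run the template with gcloud dataflow jobs run JOB_NAME. Pass the template location gs://dataflow-templates-REGION_NAME/VERSION/Cloud_Bigtable_to_GCS_SequenceFile in --gcs-location, the region in --region, and the template parameters as a comma-separated list in --parameters. Replace the following:
JOB_NAME: a unique job name of your choice.
VERSION: the version of the template that you want to use. You can use the following values:
  latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
  Caution: The latest version of templates might update with breaking changes. Production environments should use templates kept in the most recent dated parent folder to prevent these breaking changes from affecting production workflows.
REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1.
BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from.
INSTANCE_ID: the ID of the Bigtable instance that contains the table.
TABLE_ID: the ID of the Bigtable table to export.
APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export.
DESTINATION_PATH: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder.
FILENAME_PREFIX: the prefix of the SequenceFile filename, for example, output-.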
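The placeholders above slot into a single gcloud dataflow jobs run command. The following is a minimal sketch rather than a definitive script: every value assigned to a variable (my-project, my-instance, and so on) is hypothetical, and the DRY_RUN guard is an illustrative addition, not a gcloud feature, so that the assembled command can be inspected before a job is submitted.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace them with your own.
JOB_NAME="bigtable-export-job"
VERSION="latest"
REGION_NAME="us-central1"
BIGTABLE_PROJECT_ID="my-project"
INSTANCE_ID="my-instance"
TABLE_ID="my-table"
APPLICATION_PROFILE_ID="default"
DESTINATION_PATH="gs://mybucket/somefolder"
FILENAME_PREFIX="output-"

# Template parameters as one comma-separated string, the form --parameters takes.
PARAMETERS="bigtableProject=${BIGTABLE_PROJECT_ID},\
bigtableInstanceId=${INSTANCE_ID},\
bigtableTableId=${TABLE_ID},\
bigtableAppProfileId=${APPLICATION_PROFILE_ID},\
destinationPath=${DESTINATION_PATH},\
filenamePrefix=${FILENAME_PREFIX}"

# Template location in the regional dataflow-templates bucket.
GCS_LOCATION="gs://dataflow-templates-${REGION_NAME}/${VERSION}/Cloud_Bigtable_to_GCS_SequenceFile"

# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to submit the job.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "gcloud dataflow jobs run ${JOB_NAME} --gcs-location ${GCS_LOCATION} --region ${REGION_NAME} --parameters ${PARAMETERS}"
else
  gcloud dataflow jobs run "${JOB_NAME}" \
    --gcs-location "${GCS_LOCATION}" \
    --region "${REGION_NAME}" \
    --parameters "${PARAMETERS}"
fi
```

For production use, pinning VERSION to a dated folder name instead of latest follows the caution about breaking changes.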
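API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch (/dataflow/docs/reference/rest/v1b3/projects.templates/launch).
Send the request to https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch with the query parameter gcsPath=gs://dataflow-templates-LOCATION/VERSION/Cloud_Bigtable_to_GCS_SequenceFile. The JSON request body carries the jobName and a parameters object that holds the template parameters. Replace the following:
PROJECT_ID: the ID of the Google Cloud project where you want to run the Dataflow job.
JOB_NAME: a unique job name of your choice.
VERSION: the version of the template that you want to use. You can use the following values:
  latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1.
BIGTABLE_PROJECT_ID: the ID of the Google Cloud project of the Bigtable instance that you want to read data from.
INSTANCE_ID: the ID of the Bigtable instance that contains the table.
TABLE_ID: the ID of the Bigtable table to export.
APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the export.
DESTINATION_PATH: the Cloud Storage path where data is written, for example, gs://mybucket/somefolder.
FILENAME_PREFIX: the prefix of the SequenceFile filename, for example, output-.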
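The REST placeholders above can be assembled into a launch request from the shell. This sketch makes several assumptions that are not part of the template documentation: it sends the request with curl, obtains a bearer token from gcloud auth print-access-token, uses hypothetical values for every variable, and adds an illustrative DRY_RUN guard that prints the request instead of sending it.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
LOCATION="us-central1"
VERSION="latest"
JOB_NAME="bigtable-export-job"
BIGTABLE_PROJECT_ID="my-project"
INSTANCE_ID="my-instance"
TABLE_ID="my-table"
APPLICATION_PROFILE_ID="default"
DESTINATION_PATH="gs://mybucket/somefolder"
FILENAME_PREFIX="output-"

# Launch endpoint with the template location passed as the gcsPath query parameter.
URL="https://dataflow.googleapis.com/v1b3/projects/${PROJECT_ID}/locations/${LOCATION}/templates:launch?gcsPath=gs://dataflow-templates-${LOCATION}/${VERSION}/Cloud_Bigtable_to_GCS_SequenceFile"

# JSON request body. The last entry in "parameters" has no trailing comma,
# because strict JSON parsers reject one.
BODY=$(cat <<EOF
{
  "jobName": "${JOB_NAME}",
  "parameters": {
    "bigtableProject": "${BIGTABLE_PROJECT_ID}",
    "bigtableInstanceId": "${INSTANCE_ID}",
    "bigtableTableId": "${TABLE_ID}",
    "bigtableAppProfileId": "${APPLICATION_PROFILE_ID}",
    "destinationPath": "${DESTINATION_PATH}",
    "filenamePrefix": "${FILENAME_PREFIX}"
  }
}
EOF
)

# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to send the request.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "POST ${URL}"
  echo "${BODY}"
else
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "${BODY}" \
    "${URL}"
fi
```

The sketch leaves out the optional environment object; a request can add one to set worker options such as the zone.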
Template source code
Java
This template's source code is in the GoogleCloudPlatform/cloud-bigtable-client repository on GitHub: https://github.com/GoogleCloudPlatform/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles
What's next
Learn about Dataflow templates (/dataflow/docs/concepts/dataflow-templates).
See the list of Google-provided templates (/dataflow/docs/guides/templates/provided-templates).
Last updated 2024-12-22 UTC.