# Cloud Storage SequenceFile to Bigtable template

The Cloud Storage SequenceFile to Bigtable template is a pipeline that reads data from SequenceFiles in a Cloud Storage bucket and writes the data to a Bigtable table. You can use the template to copy data from Cloud Storage to Bigtable.
## Pipeline requirements

- The Bigtable table must exist.
- The input SequenceFiles must exist in a Cloud Storage bucket before you run the pipeline.
- The input SequenceFiles must have been exported from Bigtable or HBase.
## Template parameters

### Required parameters

- **bigtableProject**: The ID of the Google Cloud project that contains the Bigtable instance that you want to write data to.
- **bigtableInstanceId**: The ID of the Bigtable instance that contains the table.
- **bigtableTableId**: The ID of the Bigtable table to import.
- **sourcePattern**: The Cloud Storage path pattern to the location of the data. For example, `gs://your-bucket/your-path/prefix*` (see the listing example after this section).

### Optional parameters

- **bigtableAppProfileId**: The ID of the Bigtable application profile to use for the import. If you don't specify an application profile, Bigtable uses the instance's [default application profile](https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile).
- **mutationThrottleLatencyMs**: Set mutation latency throttling (enables the feature). The value is in milliseconds. Defaults to 0.
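Before you launch the job, it can help to confirm that the pattern you plan to pass as `sourcePattern` actually matches the exported SequenceFiles. A minimal check, assuming a hypothetical bucket and prefix:

```bash
# List the objects that the hypothetical pattern would match.
# Replace the bucket and prefix with your own values.
gsutil ls "gs://your-bucket/your-path/prefix*"
```

If the command lists no objects, the import pipeline has no input files to read.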
## Run the template

### Console

1. Go to the Dataflow **Create job from template** page: [Go to Create job from template](https://console.cloud.google.com/dataflow/createjob)
2. In the **Job name** field, enter a unique job name.
3. Optional: For **Regional endpoint**, select a value from the drop-down menu. The default region is `us-central1`. For a list of regions where you can run a Dataflow job, see [Dataflow locations](/dataflow/docs/resources/locations).
4. From the **Dataflow template** drop-down menu, select the **SequenceFile Files on Cloud Storage to Cloud Bigtable** template.
5. In the provided parameter fields, enter your parameter values.
6. Click **Run job**.
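### gcloud

**Note:** To use the Google Cloud CLI to run classic templates, you must have [Google Cloud CLI](/sdk/docs/install) version 138.0.0 or later.

In your shell or terminal, run the template:

```bash
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/GCS_SequenceFile_to_Cloud_Bigtable \
    --region REGION_NAME \
    --parameters \
bigtableProject=BIGTABLE_PROJECT_ID,\
bigtableInstanceId=INSTANCE_ID,\
bigtableTableId=TABLE_ID,\
bigtableAppProfileId=APPLICATION_PROFILE_ID,\
sourcePattern=SOURCE_PATTERN
```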
Replace the following:

- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use

  You can use the following values:

  - `latest` to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like `2023-09-12-00_RC00`, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/

  **Caution:** The latest version of templates might update with breaking changes. Use templates from the most recent dated parent folder in production environments to prevent breaking changes from affecting your production workflows.
- REGION_NAME: the [region](/dataflow/docs/resources/locations) where you want to deploy your Dataflow job, for example, `us-central1`
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance that you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import
- APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the import
- SOURCE_PATTERN: the Cloud Storage path pattern where the data is located, for example, `gs://mybucket/somefolder/prefix*`
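For illustration only, here is the same command with hypothetical values substituted (project `my-project`, instance `my-instance`, table `my-table`, and a hypothetical export bucket); adjust every value to your own environment:

```bash
# Hypothetical values; replace the job name, project, instance, table, and bucket with your own.
gcloud dataflow jobs run sequencefile-import-job \
    --gcs-location gs://dataflow-templates-us-central1/latest/GCS_SequenceFile_to_Cloud_Bigtable \
    --region us-central1 \
    --parameters \
bigtableProject=my-project,\
bigtableInstanceId=my-instance,\
bigtableTableId=my-table,\
bigtableAppProfileId=default,\
sourcePattern='gs://my-bucket/exports/part-*'
```

Quoting the `sourcePattern` value keeps the shell from expanding the `*` wildcard locally.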
### API

To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see [`projects.templates.launch`](/dataflow/docs/reference/rest/v1b3/projects.templates/launch).
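```json
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/GCS_SequenceFile_to_Cloud_Bigtable
{
   "jobName": "JOB_NAME",
   "parameters": {
       "bigtableProject": "BIGTABLE_PROJECT_ID",
       "bigtableInstanceId": "INSTANCE_ID",
       "bigtableTableId": "TABLE_ID",
       "bigtableAppProfileId": "APPLICATION_PROFILE_ID",
       "sourcePattern": "SOURCE_PATTERN"
   },
   "environment": { "zone": "us-central1-f" }
}
```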
Replace the following:

- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use

  You can use the following values:

  - `latest` to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, like `2023-09-12-00_RC00`, to use a specific version of the template, which is nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/

  **Caution:** The latest version of templates might update with breaking changes. Use templates from the most recent dated parent folder in production environments to prevent breaking changes from affecting your production workflows.
- LOCATION: the [region](/dataflow/docs/resources/locations) where you want to deploy your Dataflow job, for example, `us-central1`
- BIGTABLE_PROJECT_ID: the ID of the Google Cloud project that contains the Bigtable instance that you want to write data to
- INSTANCE_ID: the ID of the Bigtable instance that contains the table
- TABLE_ID: the ID of the Bigtable table to import
- APPLICATION_PROFILE_ID: the ID of the Bigtable application profile to use for the import
- SOURCE_PATTERN: the Cloud Storage path pattern where the data is located, for example, `gs://mybucket/somefolder/prefix*`
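One way to send this request, sketched here with hypothetical project and region values and assuming the request body above is saved as `request.json` with the placeholders filled in, is `curl` with an access token from the gcloud CLI:

```bash
# Hypothetical project and region; adjust to your own values.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://dataflow.googleapis.com/v1b3/projects/my-project/locations/us-central1/templates:launch?gcsPath=gs://dataflow-templates-us-central1/latest/GCS_SequenceFile_to_Cloud_Bigtable"
```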
## Template source code

### Java

This template's source code is in the [GoogleCloudPlatform/cloud-bigtable-client repository](https://github.com/GoogleCloudPlatform/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles) on GitHub.

## What's next

- Learn about [Dataflow templates](/dataflow/docs/concepts/dataflow-templates).
- See the list of [Google-provided templates](/dataflow/docs/guides/templates/provided-templates).

Last updated 2025-08-18 (UTC).