Schedule a Snowflake transfer

The Snowflake connector provided by the BigQuery Data Transfer Service lets you schedule and manage automated transfer jobs to migrate data from Snowflake into BigQuery using public IP allow lists.

Overview

The Snowflake connector uses migration agents running on Google Kubernetes Engine and triggers a load operation from Snowflake to a staging area within the same cloud provider where Snowflake is hosted. For AWS-hosted Snowflake accounts, the data is first staged in your Amazon S3 bucket and then transferred to BigQuery by the BigQuery Data Transfer Service.

The following diagram shows how data is transferred from an AWS-hosted Snowflake data warehouse into BigQuery.

The flow of data during a Snowflake to BigQuery migration.

Limitations

Data transfers made using the Snowflake connector are subject to the following limitations:

  • Data transfers are only supported from Snowflake accounts hosted on AWS. Data transfers from Snowflake accounts hosted on Google Cloud or Microsoft Azure aren't supported.
  • The Snowflake connector doesn't support incremental data transfers.
  • The Snowflake connector only supports transfers from tables within a single Snowflake database and schema. To transfer tables from multiple Snowflake databases or schemas, set up a separate transfer job for each database and schema combination.
  • The speed of loading data from Snowflake to your Amazon S3 bucket is limited by the Snowflake warehouse you have chosen for this transfer.
  • Data is extracted from Snowflake in the Parquet data format before it is loaded into BigQuery:

    • The following Parquet data types are unsupported:

      • TIMESTAMP_TZ
      • TIMESTAMP_LTZ

    • The following Parquet data type is unsupported, but can be converted:

      • TIMESTAMP_NTZ

      Use the global type conversion configuration YAML to override the default conversion to DATETIME and convert the type to TIMESTAMP instead when you generate metadata and run the translation engine.

      The configuration YAML might look similar to the following example:

      type: experimental_object_rewriter
      global:
        typeConvert:
          datetime: TIMESTAMP
      

Before you begin

Before you set up a Snowflake transfer, complete all of the steps listed in this section:

  1. Prepare your Google Cloud project
  2. Required BigQuery roles
  3. Prepare your Amazon S3 bucket
  4. Create a Snowflake user with the required permissions
  5. Add network policies
  6. Generate metadata and run translation engine
  7. Assess your Snowflake data for unsupported data types
  8. Gather transfer information

Prepare your Google Cloud project

Create and configure your Google Cloud project for a Snowflake transfer with the following steps:

  1. Create a Google Cloud project or select an existing project.

  2. Verify that you have completed all actions required to enable the BigQuery Data Transfer Service.

  3. Create a BigQuery dataset to store your data. You don't need to create any tables.
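
     For example, you could create the dataset with the bq command-line tool. This is only a sketch; the dataset name is a placeholder, and the location should match where you want to store your BigQuery data:

       bq mk \
           --dataset \
           --location=US \
           PROJECT_ID:snowflake_migration_dataset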

Required BigQuery roles

To get the permissions that you need to create a transfer, ask your administrator to grant you the BigQuery Admin (roles/bigquery.admin) IAM role. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create a transfer. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a transfer:

  • bigquery.transfers.update on the user or service account creating the transfer configuration
  • bigquery.datasets.get on the user or service account creating the transfer configuration
  • bigquery.datasets.update on the user or service account creating the transfer configuration

You might also be able to get these permissions with custom roles or other predefined roles.
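
For example, an administrator could grant this role with the Google Cloud CLI; PROJECT_ID and USER_EMAIL are placeholders:

  gcloud projects add-iam-policy-binding PROJECT_ID \
      --member="user:USER_EMAIL" \
      --role="roles/bigquery.admin"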

Prepare your Amazon S3 bucket

To complete a Snowflake data transfer, you must create an Amazon S3 bucket and then configure it to allow write access from Snowflake.

  1. Create an Amazon S3 bucket. The Amazon S3 bucket is used to stage the data before it is loaded into BigQuery.

  2. Create and configure a Snowflake storage integration object to allow Snowflake to write data into the Amazon S3 bucket as an external stage.
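
For reference, a storage integration object is typically created with a Snowflake statement similar to the following sketch, shown here through the snowsql CLI. The integration name, AWS role ARN, and bucket path are placeholders; follow the Snowflake documentation for the full configuration, including the trust relationship on the AWS role:

  # All names, the role ARN, and the bucket path are placeholders
  snowsql -q "CREATE STORAGE INTEGRATION bq_dts_integration
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/snowflake-unload-role'
    STORAGE_ALLOWED_LOCATIONS = ('s3://your-staging-bucket/');"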

To allow read access on your Amazon S3 bucket, you must also do the following:

  1. Create a dedicated Amazon IAM user and grant it the AmazonS3ReadOnlyAccess policy.

  2. Create an Amazon access key pair for the IAM user.
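
A minimal AWS CLI sketch of these two steps might look like the following; the IAM user name is a placeholder, and you should store the returned key pair securely because you enter it later in the transfer configuration:

  # Hypothetical IAM user name
  aws iam create-user --user-name bq-dts-snowflake-reader

  # Grant read-only access to Amazon S3
  aws iam attach-user-policy \
      --user-name bq-dts-snowflake-reader \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  # Create the access key pair used by the BigQuery Data Transfer Service
  aws iam create-access-key --user-name bq-dts-snowflake-reader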

Create a Snowflake user with the required permissions

During a Snowflake transfer, the Snowflake connector connects to your Snowflake account using a JDBC connection. You must create a new Snowflake user with a custom role that only has the necessary privileges to perform the data transfer:

  // Create and configure new role, MIGRATION_ROLE
  CREATE ROLE IF NOT EXISTS MIGRATION_ROLE;

  GRANT USAGE
    ON WAREHOUSE WAREHOUSE_NAME
    TO ROLE MIGRATION_ROLE;

  GRANT USAGE
    ON DATABASE DATABASE_NAME
    TO ROLE MIGRATION_ROLE;

  GRANT USAGE
    ON SCHEMA DATABASE_NAME.SCHEMA_NAME
    TO ROLE MIGRATION_ROLE;

  // You can modify this to give select permissions for all tables in a schema
  GRANT SELECT
    ON TABLE DATABASE_NAME.SCHEMA_NAME.TABLE_NAME
    TO ROLE MIGRATION_ROLE;

  GRANT USAGE
    ON INTEGRATION STORAGE_INTEGRATION_OBJECT_NAME
    TO ROLE MIGRATION_ROLE;

  // Create the transfer user and grant it the custom role
  CREATE USER IF NOT EXISTS MIGRATION_USER
    PASSWORD = 'PASSWORD'
    DEFAULT_ROLE = MIGRATION_ROLE
    DEFAULT_WAREHOUSE = WAREHOUSE_NAME;

  GRANT ROLE MIGRATION_ROLE TO USER MIGRATION_USER;

Replace the following:

  • MIGRATION_ROLE: the name of the custom role you are creating
  • MIGRATION_USER: the name of the new Snowflake user you are creating for this transfer
  • PASSWORD: the password for the new Snowflake user
  • WAREHOUSE_NAME: the name of your data warehouse
  • DATABASE_NAME: the name of your Snowflake database
  • SCHEMA_NAME: the name of your Snowflake schema
  • TABLE_NAME: the name of the Snowflake table included in this data transfer
  • STORAGE_INTEGRATION_OBJECT_NAME: the name of your Snowflake storage integration object

Add network policies

For public connectivity, Snowflake accounts allow public connections with database credentials by default. However, you might have configured network rules or policies that prevent the Snowflake connector from connecting to your account. In this case, you must add the necessary IP addresses to your allowlist.

The following tables list the IP addresses for the regional and multi-regional locations used for public transfers. You can add only the IP addresses that correspond to your dataset's location, or you can add all of the listed IP addresses. These IP addresses are reserved by Google for BigQuery Data Transfer Service data transfers.

To add an IP address to an allowlist, do the following:

  1. Create a network rule with type = IPV4. The BigQuery Data Transfer Service uses a JDBC connection to connect to the Snowflake account.
  2. Create a network policy with the network rule that you created earlier and the IP addresses from the following tables.
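
For example, you might create the rule and policy with the snowsql CLI. The rule and policy names are placeholders, and the two IP addresses shown are only the first two us-east5 addresses from the table that follows; include every address for your location:

  # Hypothetical rule and policy names; run with a role that can manage network policies
  snowsql -q "CREATE NETWORK RULE bq_dts_rule
    TYPE = IPV4
    MODE = INGRESS
    VALUE_LIST = ('34.162.72.184', '34.162.173.185');"

  snowsql -q "CREATE NETWORK POLICY bq_dts_policy
    ALLOWED_NETWORK_RULE_LIST = ('bq_dts_rule');"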

Regional locations

Region description Region name IP addresses
Americas
Columbus, Ohio us-east5 34.162.72.184
34.162.173.185
34.162.205.205
34.162.81.45
34.162.182.149
34.162.59.92
34.162.157.190
34.162.191.145
Dallas us-south1 34.174.172.89
34.174.40.67
34.174.5.11
34.174.96.109
34.174.148.99
34.174.176.19
34.174.253.135
34.174.129.163
Iowa us-central1 34.121.70.114
34.71.81.17
34.122.223.84
34.121.145.212
35.232.1.105
35.202.145.227
35.226.82.216
35.225.241.102
Las Vegas us-west4 34.125.53.201
34.125.69.174
34.125.159.85
34.125.152.1
34.125.195.166
34.125.50.249
34.125.68.55
34.125.91.116
Los Angeles us-west2 35.236.59.167
34.94.132.139
34.94.207.21
34.94.81.187
34.94.88.122
35.235.101.187
34.94.238.66
34.94.195.77
Mexico northamerica-south1 34.51.6.35
34.51.7.113
34.51.12.83
34.51.10.94
34.51.11.219
34.51.11.52
34.51.2.114
34.51.15.251
Montréal northamerica-northeast1 34.95.20.253
35.203.31.219
34.95.22.233
34.95.27.99
35.203.12.23
35.203.39.46
35.203.116.49
35.203.104.223
Northern Virginia us-east4 35.245.95.250
35.245.126.228
35.236.225.172
35.245.86.140
35.199.31.35
35.199.19.115
35.230.167.48
35.245.128.132
35.245.111.126
35.236.209.21
Oregon us-west1 35.197.117.207
35.199.178.12
35.197.86.233
34.82.155.140
35.247.28.48
35.247.31.246
35.247.106.13
34.105.85.54
Salt Lake City us-west3 34.106.37.58
34.106.85.113
34.106.28.153
34.106.64.121
34.106.246.131
34.106.56.150
34.106.41.31
34.106.182.92
São Paulo southamerica-east1 35.199.88.228
34.95.169.140
35.198.53.30
34.95.144.215
35.247.250.120
35.247.255.158
34.95.231.121
35.198.8.157
Santiago southamerica-west1 34.176.188.48
34.176.38.192
34.176.205.134
34.176.102.161
34.176.197.198
34.176.223.236
34.176.47.188
34.176.14.80
South Carolina us-east1 35.196.207.183
35.237.231.98
104.196.102.222
35.231.13.201
34.75.129.215
34.75.127.9
35.229.36.137
35.237.91.139
Toronto northamerica-northeast2 34.124.116.108
34.124.116.107
34.124.116.102
34.124.116.80
34.124.116.72
34.124.116.85
34.124.116.20
34.124.116.68
Europe
Belgium europe-west1 35.240.36.149
35.205.171.56
34.76.234.4
35.205.38.234
34.77.237.73
35.195.107.238
35.195.52.87
34.76.102.189
Berlin europe-west10 34.32.28.80
34.32.31.206
34.32.19.49
34.32.33.71
34.32.15.174
34.32.23.7
34.32.1.208
34.32.8.3
Finland europe-north1 35.228.35.94
35.228.183.156
35.228.211.18
35.228.146.84
35.228.103.114
35.228.53.184
35.228.203.85
35.228.183.138
Frankfurt europe-west3 35.246.153.144
35.198.80.78
35.246.181.106
35.246.211.135
34.89.165.108
35.198.68.187
35.242.223.6
34.89.137.180
London europe-west2 35.189.119.113
35.189.101.107
35.189.69.131
35.197.205.93
35.189.121.178
35.189.121.41
35.189.85.30
35.197.195.192
Madrid europe-southwest1 34.175.99.115
34.175.186.237
34.175.39.130
34.175.135.49
34.175.1.49
34.175.95.94
34.175.102.118
34.175.166.114
Milan europe-west8 34.154.183.149
34.154.40.104
34.154.59.51
34.154.86.2
34.154.182.20
34.154.127.144
34.154.201.251
34.154.0.104
Netherlands europe-west4 35.204.237.173
35.204.18.163
34.91.86.224
34.90.184.136
34.91.115.67
34.90.218.6
34.91.147.143
34.91.253.1
Paris europe-west9 34.163.76.229
34.163.153.68
34.155.181.30
34.155.85.234
34.155.230.192
34.155.175.220
34.163.68.177
34.163.157.151
Stockholm europe-north2 34.51.133.48
34.51.136.177
34.51.128.140
34.51.141.252
34.51.139.127
34.51.142.55
34.51.134.218
34.51.138.9
Turin europe-west12 34.17.15.186
34.17.44.123
34.17.41.160
34.17.47.82
34.17.43.109
34.17.38.236
34.17.34.223
34.17.16.47
Warsaw europe-central2 34.118.72.8
34.118.45.245
34.118.69.169
34.116.244.189
34.116.170.150
34.118.97.148
34.116.148.164
34.116.168.127
Zürich europe-west6 34.65.205.160
34.65.121.140
34.65.196.143
34.65.9.133
34.65.156.193
34.65.216.124
34.65.233.83
34.65.168.250
Asia Pacific
Delhi asia-south2 34.126.212.96
34.126.212.85
34.126.208.224
34.126.212.94
34.126.208.226
34.126.212.232
34.126.212.93
34.126.212.206
Hong Kong asia-east2 34.92.245.180
35.241.116.105
35.220.240.216
35.220.188.244
34.92.196.78
34.92.165.209
35.220.193.228
34.96.153.178
Jakarta asia-southeast2 34.101.79.105
34.101.129.32
34.101.244.197
34.101.100.180
34.101.109.205
34.101.185.189
34.101.179.27
34.101.197.251
Melbourne australia-southeast2 34.126.196.95
34.126.196.106
34.126.196.126
34.126.196.96
34.126.196.112
34.126.196.99
34.126.196.76
34.126.196.68
Mumbai asia-south1 34.93.67.112
35.244.0.1
35.200.245.13
35.200.203.161
34.93.209.130
34.93.120.224
35.244.10.12
35.200.186.100
Osaka asia-northeast2 34.97.94.51
34.97.118.176
34.97.63.76
34.97.159.156
34.97.113.218
34.97.4.108
34.97.119.140
34.97.30.191
Seoul asia-northeast3 34.64.152.215
34.64.140.241
34.64.133.199
34.64.174.192
34.64.145.219
34.64.136.56
34.64.247.158
34.64.135.220
Singapore asia-southeast1 34.87.12.235
34.87.63.5
34.87.91.51
35.198.197.191
35.240.253.175
35.247.165.193
35.247.181.82
35.247.189.103
Sydney australia-southeast1 35.189.33.150
35.189.38.5
35.189.29.88
35.189.22.179
35.189.20.163
35.189.29.83
35.189.31.141
35.189.14.219
Taiwan asia-east1 35.221.201.20
35.194.177.253
34.80.17.79
34.80.178.20
34.80.174.198
35.201.132.11
35.201.223.177
35.229.251.28
35.185.155.147
35.194.232.172
Tokyo asia-northeast1 34.85.11.246
34.85.30.58
34.85.8.125
34.85.38.59
34.85.31.67
34.85.36.143
34.85.32.222
34.85.18.128
34.85.23.202
34.85.35.192
Middle East
Dammam me-central2 34.166.20.177
34.166.10.104
34.166.21.128
34.166.19.184
34.166.20.83
34.166.18.138
34.166.18.48
34.166.23.171
Doha me-central1 34.18.48.121
34.18.25.208
34.18.38.183
34.18.33.25
34.18.21.203
34.18.21.80
34.18.36.126
34.18.23.252
Tel Aviv me-west1 34.165.184.115
34.165.110.74
34.165.174.16
34.165.28.235
34.165.170.172
34.165.187.98
34.165.85.64
34.165.245.97
Africa
Johannesburg africa-south1 34.35.11.24
34.35.10.66
34.35.8.32
34.35.3.248
34.35.2.113
34.35.5.61
34.35.7.53
34.35.3.17

Multi-regional locations

Multi-region description Multi-region name IP addresses
Data centers within member states of the European Union1 EU 34.76.156.158
34.76.156.172
34.76.136.146
34.76.1.29
34.76.156.232
34.76.156.81
34.76.156.246
34.76.102.206
34.76.129.246
34.76.121.168
Data centers in the United States US 35.185.196.212
35.197.102.120
35.185.224.10
35.185.228.170
35.197.5.235
35.185.206.139
35.197.67.234
35.197.38.65
35.185.202.229
35.185.200.120

1 Data located in the EU multi-region is not stored in the europe-west2 (London) or europe-west6 (Zürich) data centers.

Generate metadata and run translation engine

The Snowflake connector for the BigQuery Data Transfer Service uses the BigQuery Migration Service translation engine for schema mapping when migrating Snowflake tables into BigQuery. To complete a Snowflake data transfer, you must first generate metadata for translation, and then run the translation engine:

  1. Run the dwh-migration-tool for Snowflake. For more information, see Generate metadata for translation and assessment.
  2. Upload the generated metadata.zip file to a Cloud Storage bucket. The metadata.zip file is used as input for the translation engine.
  3. Run the batch translation service, specifying the target_types field as dts-mapping. For more information, see Translate SQL queries with the translation API.
    • The following is an example of a command to run a batch translation for Snowflake:
      curl -d "{
      "name": "sf_2_bq_translation",
      "displayName": "Snowflake to BigQuery Translation",
      "tasks": {
        string: {
          "type": "Snowflake2BigQuery_Translation",
          "translation_details": {
              "target_base_uri": "gs://sf_test_translation/output",
              "source_target_mapping": {
                "source_spec": {
                    "base_uri": "gs://sf_test_translation/input"
                }
              },
              "target_types": "dts-mapping",
          }
        }
      },
      }" 
      -H "Content-Type:application/json"
      -H "Authorization: Bearer TOKEN" -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/project_id/locations/location/workflows
    • You can check the status of this command in the SQL Translation page in BigQuery. The output of the batch translation job is stored in gs://translation_target_base_uri/dts/config/.
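
For example, assuming the bucket paths used in the preceding curl command, you could stage the metadata file and later inspect the translation output with gsutil:

  # Upload the dwh-migration-tool output as input for the translation engine
  gsutil cp metadata.zip gs://sf_test_translation/input/

  # After the translation job completes, list the generated DTS mapping files
  gsutil ls gs://sf_test_translation/output/dts/config/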

Required service account permissions

In a Snowflake transfer, a service account is used to read data from the translation engine output in the specified Cloud Storage path. You must grant the service account the storage.objects.get and the storage.objects.list permissions.

If the service account is from a Google Cloud project that is different from the project that created the BigQuery data transfer, then you must also enable cross-project service account authorization.

For more information, see BigQuery IAM roles and permissions.
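
The roles/storage.objectViewer role contains both permissions. For example, you could grant it on the bucket that holds the translation output with the Google Cloud CLI; the bucket name and service account address are placeholders:

  gcloud storage buckets add-iam-policy-binding gs://TRANSLATION_OUTPUT_BUCKET \
      --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
      --role="roles/storage.objectViewer"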

Assess Snowflake data

BigQuery writes data from Snowflake to Cloud Storage as Parquet files. Parquet files don't support the TIMESTAMP_TZ and TIMESTAMP_LTZ data types. If your data contains these types, you can export it to Amazon S3 as CSV files and then import the CSV files into BigQuery. For more information, see Overview of Amazon S3 transfers.
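
For example, you could list affected columns with a query similar to the following, shown here through the snowsql CLI; replace the database and schema names:

  snowsql -q "SELECT table_name, column_name, data_type
    FROM DATABASE_NAME.INFORMATION_SCHEMA.COLUMNS
    WHERE table_schema = 'SCHEMA_NAME'
      AND data_type IN ('TIMESTAMP_TZ', 'TIMESTAMP_LTZ');"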

Gather transfer information

Gather the information that you need to set up the migration with the BigQuery Data Transfer Service, including your Snowflake account identifier, warehouse, database, schema, table name patterns, the translation output Cloud Storage path, the storage integration object name, the Amazon S3 staging URI, and the access key pair. These values are described in the following section.

Set up a Snowflake transfer

Select one of the following options:

Console

  1. Go to the Data transfers page in the Google Cloud console.

    Go to Data transfers

  2. Click Create transfer.

  3. In the Source type section, select Snowflake Migration from the Source list.

  4. In the Transfer config name section, enter a name for the transfer, such as My migration, in the Display name field. The display name can be any value that lets you identify the transfer if you need to modify it later.

  5. In the Destination settings section, choose the dataset you created from the Dataset list.

  6. In the Data source details section, do the following:

    1. For Account identifier, enter a unique identifier for your Snowflake account, which is a combination of your organization name and account name. The identifier is the prefix of your Snowflake account URL, not the complete URL. For example, ACCOUNT_IDENTIFIER.snowflakecomputing.com.
    2. For Username, enter the username of the Snowflake user whose credentials and authorization are used to access your database to transfer the Snowflake tables. We recommend using the user that you created for this transfer.
    3. For Password, enter the password of the Snowflake user.
    4. For Warehouse, enter a warehouse that is used for the execution of this data transfer.
    5. For Service account, enter a service account to use with this data transfer. The service account must belong to the same Google Cloud project where the transfer configuration and destination dataset are created, and it must have the required storage.objects.list and storage.objects.get permissions.
    6. For Database, enter the name of the Snowflake database that contains the tables included in this data transfer.
    7. For Schema, enter the name of the Snowflake schema that contains the tables included in this data transfer.
    8. For Table name patterns, specify a table to transfer by entering a name or a pattern that matches the table name in the schema. You can use regular expressions to specify the pattern, for example table1_regex;table2_regex. The pattern should follow Java regular expression syntax. For example,

      • lineitem;ordertb matches tables that are named lineitem and ordertb.
      • .* matches all tables.
    9. For Translation output GCS path, specify a path to the Cloud Storage folder that contains the schema mapping files from the translation engine.

      • The path should follow the format gs://translation_target_base_uri/dts/db/schema/ and must end with /.
    10. For Storage integration object name, enter the name of the Snowflake storage integration object that you created for your Amazon S3 bucket.

    11. For Cloud provider, select AWS.

    12. Leave the GCS URI, Azure storage account name, Azure container name, and Azure SAS fields blank.

    13. For Amazon S3 URI, enter the URI of the S3 bucket to use as a staging area.

    14. For Access key ID and Secret access key, enter the access key pair.

  7. Optional: In the Notification options section, do the following:

    1. Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
    2. For Select a Pub/Sub topic, choose your topic name or click Create a topic. This option configures Pub/Sub run notifications for your transfer.
  8. Click Save.

  9. The Google Cloud console displays all the transfer setup details, including a Resource name for this transfer.

bq

Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:

  • --project_id
  • --data_source
  • --target_dataset
  • --display_name
  • --params
bq mk \
    --transfer_config \
    --project_id=project_id \
    --data_source=data_source \
    --target_dataset=dataset \
    --display_name=name \
    --service_account_name=service_account \
    --params='parameters'

Replace the following:

  • project_id: your Google Cloud project ID. If --project_id isn't specified, the default project is used.
  • data_source: the data source, snowflake_migration.
  • dataset: the BigQuery target dataset for the transfer configuration.
  • name: the display name for the transfer configuration. The transfer name can be any value that lets you identify the transfer if you need to modify it later.
  • service_account: (Optional) the service account name used to authenticate your transfer. The service account should be owned by the same project_id used to create the transfer and it should have all of the required roles.
  • parameters: the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.

Parameters required for a Snowflake transfer configuration are:

  • account_identifier: a unique identifier for your Snowflake account, which is a combination of your organization name and account name. The identifier is the prefix of your Snowflake account URL, not the complete URL. For example, account_identifier.snowflakecomputing.com.
  • username: the username of the Snowflake user whose credentials and authorization are used to access your database to transfer the Snowflake tables.
  • password: enter the password of the Snowflake user.
  • warehouse: enter a warehouse that is used for the execution of this data transfer.
  • service_account: enter a service account to use with this data transfer. The service account must belong to the same Google Cloud project where the transfer configuration and destination dataset are created, and it must have the required storage.objects.list and storage.objects.get permissions.
  • database: enter the name of the Snowflake database that contains the tables included in this data transfer.
  • schema: enter the name of the Snowflake schema that contains the tables included in this data transfer.
  • table_name_patterns: specify a table to transfer by entering a name or a pattern that matches the table name in the schema. You can use regular expressions to specify the pattern, for example table1_regex;table2_regex. The pattern should follow Java regular expression syntax. For example,

    • lineitem;ordertb matches tables that are named lineitem and ordertb.
    • .* matches all tables.

      You can also leave this field blank to migrate all tables from the specified schema.

  • translation_output_gcs_path: specify a path to the Cloud Storage folder that contains the schema mapping files from the translation engine.

    • The path should follow the format gs://translation_target_base_uri/dts/db/schema/ and must end with /.
  • storage_integration_object_name: enter the name of the Snowflake storage integration object that you created for your Amazon S3 bucket.

  • cloud_provider: enter AWS.

  • amazon_s3_uri: enter the URI of the S3 bucket to use as a staging area.

  • aws_access_key_id: enter the access key ID of the access key pair.

  • aws_secret_access_key: enter the secret access key of the access key pair.

For example, the following command creates a Snowflake transfer named snowflake transfer config with a target dataset named your_bq_dataset and a project with the ID of your_project_id.

  PARAMS='{
  "account_identifier": "your_account_identifier",
  "aws_access_key_id": "your_access_key_id",
  "aws_secret_access_key": "your_aws_secret_access_key",
  "cloud_provider": "AWS",
  "database": "your_sf_database",
  "password": "your_sf_password",
  "schema": "your_snowflake_schema",
  "service_account": "your_service_account",
  "storage_integration_object_name": "your_storage_integration_object",
  "staging_s3_uri": "s3://your/s3/bucket/uri",
  "table_name_patterns": ".*",
  "translation_output_gcs_path": "gs://sf_test_translation/output/dts/database_name/schema_name/",
  "username": "your_sf_username",
  "warehouse": "your_warehouse"
}'

bq mk --transfer_config \
    --project_id=your_project_id \
    --target_dataset=your_bq_dataset \
    --display_name='snowflake transfer config' \
    --params="$PARAMS" \
    --data_source=snowflake_migration
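
To confirm that the transfer configuration was created, you can list your transfer configurations. This example assumes a dataset in the US; adjust the transfer location to match yours:

  bq ls --transfer_config --transfer_location=us --project_id=your_project_id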

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
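
For example, a minimal request might look like the following sketch. The project, location, and token are placeholders, and the params object is truncated here; supply the full parameter set listed in the bq section:

  curl -d '{
    "dataSourceId": "snowflake_migration",
    "destinationDatasetId": "your_bq_dataset",
    "displayName": "snowflake transfer config",
    "params": {
      "account_identifier": "your_account_identifier",
      "database": "your_sf_database",
      "schema": "your_snowflake_schema"
    }
  }' \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer TOKEN" \
  -X POST https://bigquerydatatransfer.googleapis.com/v1/projects/project_id/locations/location/transferConfigs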

Quotas and limits

BigQuery has a load quota of 15 TB for each load job for each table. Internally, Snowflake compresses the table data, so the exported table size is larger than the table size reported by Snowflake. If you plan to migrate a table larger than 15 TB, contact dts-migration-preview-support@google.com.

Because of Amazon S3's consistency model, it's possible that some files won't be included in the transfer to BigQuery.

Pricing

For information on BigQuery Data Transfer Service pricing, see the Pricing page.

  • If the Snowflake warehouse and the Amazon S3 bucket are in different regions, then Snowflake applies egress charges when you run a Snowflake data transfer. There are no egress charges for Snowflake data transfers if both the Snowflake warehouse and the Amazon S3 bucket are in the same region.
  • When data is transferred from AWS to Google Cloud, inter-cloud egress charges apply.

What's next