Snowflake to BigQuery migration - introduction

This document provides an introduction to migrating data from Snowflake to BigQuery.

To support migrations to BigQuery, Google Cloud offers the BigQuery migration service as a comprehensive solution for migrating your data warehouse to BigQuery. The migration service offers features to support each phase of your migration, including assessment and planning, SQL translation, data transfer, and data validation. For more information, see Introduction to BigQuery migration service.

Feature comparison

The following sections highlight key differences and similarities between BigQuery and Snowflake, and introduce several features unique to BigQuery.

Terminology

The following table maps Snowflake terms to equivalent BigQuery terms:

Snowflake BigQuery
Database Dataset
Schema Schema
View View
Secure views Authorized Views
Virtual warehouse Reservation
Materialized view Materialized view
Micro-partitions Partitioning
Clustering Clustering
Security-enhanced user-defined functions (UDFs) Authorized UDFs

BigQuery architecture

BigQuery is an analytic data warehouse with a highly scalable architecture. BigQuery decouples its storage and compute, allowing these components to scale independently on demand. For more information about the architecture in BigQuery, see BigQuery overview. The following are several key features in BigQuery that are relevant to the data migration process:

  • BigQuery uses managed tables and offers flexible options for transfer of third-party data. This includes direct ingestion from Cloud Storage in portable formats like Parquet, Avro, ORC, and CSV.

    • BigQuery also supports Apache Iceberg tables. Iceberg is an open table format that brings data lake house capabilities to BigQuery, providing schema evolution, time travel, and ACID transactions. This allows for seamless integration with other systems that use Iceberg and provides greater flexibility and control over your data.
  • BigQuery lets you define explicit partitioning and clustering when you create tables. This gives you granular control over data organization, letting you optimize query performance, reduce costs by strategically partitioning and clustering, and tailor the data layout to your specific analytical needs.

  • BigQuery handles compute resources automatically. You don't need to provision or manage virtual machines.

    • For predictable workloads, BigQuery offers reservations, letting you reserve compute capacity in advance at a lower cost. This can be beneficial for consistent workloads with predictable resource needs.
    • During migration, analyze your Snowflake usage patterns to understand your compute requirements and help you determine the most cost-effective approach in BigQuery, whether it's on-demand pricing or using reservations.

BigQuery user-interface

The BigQuery interface is built into the Google Cloud console.

BigQuery also has a Python-based command-line tool called the bq command-line tool.

BigQuery security features

When you migrate from Snowflake to BigQuery, consider how Google Cloud handles security differently from Snowflake.

Security in BigQuery is intrinsically linked to Identity and Access Management (IAM) in Google Cloud. IAM privileges define the operations that are permitted on a resource and are enforced at the Google Cloud level, providing a centralized and consistent approach to security management. The following are some key security features of Google Cloud:

  • Integrated Security: BigQuery leverages Google Cloud's security features. This includes IAM for granular access control for robust and seamless security integration.
  • Resource-level security: IAM focuses on resource-level access control, granting permissions to users and groups for various BigQuery resources and services. This approach allows for effective management of access rights so that users only have the necessary permissions to perform their tasks.
  • Network security: BigQuery benefits from Google Cloud's robust network security features, such as Virtual Private Cloud and private connections.

When you migrate from Snowflake to BigQuery, consider the following security-related migration requirements:

  • IAM Configuration: You must configure IAM roles and permissions in BigQuery to match your existing Snowflake access control policies. This involves mapping Snowflake roles to appropriate BigQuery IAM roles and permissions.
  • Fine-grained access control: If you're using row-level or column-level security in Snowflake, you'll need to implement equivalent controls in BigQuery using authorized views or policy tags.
  • Views and UDF migration: When migrating views and UDFs, verify that the associated security controls are properly translated to authorized views and authorized UDFs in BigQuery.

Encryption

BigQuery encrypts your data at rest and in transit by default. If you require more control over encryption keys, BigQuery supports customer-managed encryption keys in the Cloud Key Management Service. You can also use column-level encryption.

To maintain data security during and after migration to BigQuery, consider the following:

  • Key Management: If you require customer-managed keys, establish a key management strategy in Cloud Key Management Service and configure BigQuery to use those keys.
  • Data Masking/Tokenization: If sensitive data is involved, assess whether data masking or tokenization is required to protect it.
  • Row-Level Security: Implement row-level security using authorized views, row-level security filters, or other appropriate methods.
  • Vulnerability Scanning and Penetration Testing: Conduct regular vulnerability scanning and penetration testing to check the security posture of your BigQuery environment.

Roles

Roles are the entities to which privileges on securable objects can be granted and revoked.

In IAM, permissions are grouped into roles. IAM provides three types of roles:

  • Basic roles: These roles include the Owner, Editor, and Viewer roles. You can apply these roles at the project or service resource levels by using the Google Cloud console, the Identity and Access Management API, or the gcloud CLI. In general, for the strongest security, we recommend that you use predefined roles to follow the principle of least privilege.
  • Predefined roles: These roles provide more granular access to features in a product (such as BigQuery) and are meant to support common use cases and access control patterns.
  • Custom roles: These roles are composed of user-specified permissions.

Access control

Snowflake lets you grant roles to other roles, creating a hierarchy of roles. IAM doesn't support a role hierarchy but implements a resource hierarchy. The IAM hierarchy includes the organization level, folder level, project level, and resource level. You can set IAM roles at any level of the hierarchy, and resources inherit all the policies of their parent resources.

BigQuery supports table-level access control. Table-level permissions determine the users, groups, and service accounts that can access a table or view. You can give a user access to specific tables or views without giving the user access to the complete dataset.

For more granular access, you can also use column-level access control or row-level security. This type of control provides fine-grained access to sensitive columns by using policy tags or type-based data classifications.

You can also create authorized views to limit data access for more fine-grained access control so that specified users can query a view without having read access to the underlying tables.

Migrate other Snowflake features

Consider the following Snowflake features as you plan your migration to BigQuery. In some cases, you can use other services in Google Cloud to complete your migration.

Get started

The following sections summarize the Snowflake to BigQuery migration process:

Run a migration assessment

In your Snowflake to BigQuery migration, we recommend that you start by running the BigQuery migration assessment tool to assess the feasibility and potential benefits of moving your data warehouse from Snowflake to BigQuery. This tool provides a structured approach to understanding your current Snowflake environment and estimating the effort involved in a successful migration.

Running the BigQuery migration assessment tool produces an assessment report that contains the following sections:

  • Existing system report: a snapshot of the existing Snowflake system and usage, including the number of databases, schemas, tables, and total size in TB. It also lists the schemas by size and points to potential sub-optimal resource utilization, like tables with no writes or few reads.
  • BigQuery steady state transformation suggestions: shows what the system will look like in BigQuery after the migration. It includes suggestions for optimizing workloads in BigQuery and avoiding wastage.
  • Migration plan: provides information about the migration effort itself. For example, getting from the existing system to the BigQuery steady state. This section includes the count of queries that were automatically translated and the expected time to move each table into BigQuery.

For more information about the results of a migration assessment, see Review the Looker Studio report.

Set up a Snowflake to BigQuery migration pipeline

Once you've reviewed the results of your migration assessment, you can start your Snowflake migration by setting up a migration pipeline. For more information, see Snowflake to BigQuery migration - overview.

Validate your migration

Once you've migrated your Snowflake data to BigQuery, run the Data Validation Tool (DVT) to perform a data validation on your newly migrated BigQuery data. The DVT validates various functions, from the table level to the row level, to verify that your migrated data works as intended.