Data definition language (DDL) statements in GoogleSQL

Data definition language (DDL) statements let you create and modify BigQuery resources using GoogleSQL query syntax. You can use DDL commands to create, alter, and delete resources, such as the following:

Required permissions

To create a job that runs a DDL statement, you must have the bigquery.jobs.create permission for the project where you are running the job. Each DDL statement also requires specific permissions on the affected resources, which are documented under each statement.

IAM roles

The predefined IAM roles bigquery.user, bigquery.jobUser, and bigquery.admin include the required bigquery.jobs.create permission.

For more information about IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.

Run DDL statements

You can run DDL statements by using the Google Cloud console, by using the bq command-line tool, by calling the jobs.query REST API, or programmatically using the BigQuery API client libraries.

Console

  1. Go to the BigQuery page in the Google Cloud console.

    Go to BigQuery

  2. Click Compose new query.

  3. Enter the DDL statement into the Query editor text area. For example:

     CREATE TABLE mydataset.newtable ( x INT64 )
     

  4. Click Run.

bq

Enter the bq query command and supply the DDL statement as the query parameter. Set the use_legacy_sql flag to false.

bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.newtable ( x INT64 )'

API

Call the jobs.query method and supply the DDL statement in the request body's query property.

DDL functionality extends the information returned by a Jobs resource. statistics.query.statementType includes the following additional values for DDL support:

  • CREATE_TABLE
  • CREATE_TABLE_AS_SELECT
  • DROP_TABLE
  • CREATE_VIEW
  • DROP_VIEW

statistics.query has 2 additional fields:

  • ddlOperationPerformed: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:
    • CREATE: The query created the DDL target.
    • SKIP: No-op. Examples — CREATE TABLE IF NOT EXISTS was submitted, and the table exists. Or DROP TABLE IF EXISTS was submitted, and the table does not exist.
    • REPLACE: The query replaced the DDL target. Example — CREATE OR REPLACE TABLE was submitted, and the table already exists.
    • DROP: The query deleted the DDL target.
  • ddlTargetTable: When you submit a CREATE TABLE/VIEW statement or a DROP TABLE/VIEW statement, the target table is returned as an object with 3 fields:
    • "projectId": string
    • "datasetId": string
    • "tableId": string

Java

Call the BigQuery.create() method to start a query job. Call the Job.waitFor() method to wait for the DDL query to finish.

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to create a view using DDL
public class DDLCreateView {

  public static void runDDLCreateView() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetId = "MY_DATASET_ID";
    String tableId = "MY_VIEW_ID";
    String ddl =
        "CREATE VIEW "
            + "`"
            + projectId
            + "."
            + datasetId
            + "."
            + tableId
            + "`"
            + " OPTIONS("
            + " expiration_timestamp=TIMESTAMP_ADD("
            + " CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),"
            + " friendly_name=\"new_view\","
            + " description=\"a view that expires in 2 days\","
            + " labels=[(\"org_unit\", \"development\")]"
            + " )"
            + " AS SELECT name, state, year, number"
            + " FROM `bigquery-public-data.usa_names.usa_1910_current`"
            + " WHERE state LIKE 'W%'`";
    ddlCreateView(ddl);
  }

  public static void ddlCreateView(String ddl) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration config = QueryJobConfiguration.newBuilder(ddl).build();

      // create a view using query and it will wait to complete job.
      Job job = bigquery.create(JobInfo.of(config));
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("View created successfully");
      } else {
        System.out.println("View was not created");
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("View was not created. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function ddlCreateView() {
  // Creates a view via a DDL query

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = "my_project"
  // const datasetId = "my_dataset"
  // const tableId = "my_new_view"

  const query = `
  CREATE VIEW \`${projectId}.${datasetId}.${tableId}\`
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(
          CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
      friendly_name="new_view",
      description="a view that expires in 2 days",
      labels=[("org_unit", "development")]
  )
  AS SELECT name, state, year, number
      FROM \`bigquery-public-data.usa_names.usa_1910_current\`
      WHERE state LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  job.on('complete', metadata => {
    console.log(`Created new view ${tableId} via job ${metadata.id}`);
  });
}

Python

Call the Client.query() method to start a query job. Call the QueryJob.result() method to wait for the DDL query to finish.

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# from google.cloud import bigquery
# project = 'my-project'
# dataset_id = 'my_dataset'
# table_id = 'new_view'
# client = bigquery.Client(project=project)

sql = """
CREATE VIEW `{}.{}.{}`
OPTIONS(
    expiration_timestamp=TIMESTAMP_ADD(
        CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
    friendly_name="new_view",
    description="a view that expires in 2 days",
    labels=[("org_unit", "development")]
)
AS SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state LIKE 'W%'
""".format(
    project, dataset_id, table_id
)

job = client.query(sql)  # API request.
job.result()  # Waits for the query to finish.

print(
    'Created new view "{}.{}.{}".'.format(
        job.destination.project,
        job.destination.dataset_id,
        job.destination.table_id,
    )
)

On-demand query size calculation

If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.

DDL statement Bytes processed
CREATE TABLE None.
CREATE TABLE ... AS SELECT ... The sum of bytes processed for all the columns referenced from the tables scanned by the query.
CREATE VIEW None.
DROP TABLE None.
DROP VIEW None.

For more information about cost estimation, see Estimate and control costs.

CREATE SCHEMA statement

Creates a new dataset.

Syntax

CREATE SCHEMA [ IF NOT EXISTS ]
[project_name.]dataset_name
[DEFAULT COLLATE collate_specification]
[OPTIONS(schema_option_list)]

Arguments

  • IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.

  • DEFAULT COLLATE collate_specification: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a table or a column.

    If you remove or change this collation specification later with the ALTER SCHEMA statement, this will not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.

  • project_name: The name of the project where you are creating the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset to create.

  • schema_option_list: A list of options for creating the dataset.

Details

The dataset is created in the location that you specify in the query settings. For more information, see Specifying your location.

For more information about creating a dataset, see Creating datasets. For information about quotas, see Dataset limits.

schema_option_list

The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME VALUE Details
default_kms_key_name STRING Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table.
default_partition_expiration_days FLOAT64 Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table.
default_rounding_mode

STRING

Example: default_rounding_mode = "ROUND_HALF_EVEN"

This specifies the defaultRoundingMode that is used for new tables created in this dataset. It does not impact existing tables. The following values are supported:

  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2 and -2.25 is rounded to -2.2.
default_table_expiration_days FLOAT64 Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table.
description STRING The description of the dataset.
failover_reservation STRING Associates the dataset to a reservation in the case of a failover scenario.
friendly_name STRING A descriptive name for the dataset.
is_case_insensitive BOOL TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive.
  • Datasets: mydataset and MyDataset can coexist in the same project, unless one of them has case-sensitivity turned off.
  • Tables: mytable and MyTable can coexist in the same dataset if case-sensitivity for the dataset is turned on.
is_primary BOOLEAN Declares if the dataset is the primary replica.
labels <ARRAY<STRUCT<STRING, STRING>>> An array of labels for the dataset, expressed as key-value pairs.
location STRING The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails.
max_time_travel_hours SMALLINT Specifies the duration in hours of the time travel window for the dataset. The max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified.
primary_replica STRING The replica name to set as the primary replica.
storage_billing_model STRING

Alters the storage billing model for the dataset. Set the storage_billing_model value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default.

The storage_billing_model option is only available for datasets that have been updated after December 1, 2022. For datasets that were last updated before that date, the storage billing model is LOGICAL.

When you change a dataset's billing model, it takes 24 hours for the change to take effect.

Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.datasets.create The project where you create the dataset.

Examples

Creating a new dataset

The following example creates a dataset with a default table expiration and a set of labels.

CREATE SCHEMA mydataset
OPTIONS(
  location="us",
  default_table_expiration_days=3.75,
  labels=[("label1","value1"),("label2","value2")]
  )

Creating a case-insensitive dataset

The following example creates a case-insensitive dataset. Both the dataset name and table names inside the dataset are case-insensitive.

CREATE SCHEMA mydataset
OPTIONS(
  is_case_insensitive=TRUE
)

Creating a dataset with collation support

The following example creates a dataset with a collation specification.

CREATE SCHEMA mydataset
DEFAULT COLLATE 'und:ci'

CREATE TABLE statement

Creates a new table.

Syntax

CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ]
table_name
[(
  column | constraint_definition[, ...]
)]
[DEFAULT COLLATE collate_specification]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[WITH CONNECTION connection_name]
[OPTIONS(table_option_list)]
[AS query_statement]

column:=
column_definition

constraint_definition:=
[primary_key]
| [[CONSTRAINT constraint_name] foreign_key, ...]

primary_key :=
PRIMARY KEY (column_name[, ...]) NOT ENFORCED

foreign_key :=
FOREIGN KEY (column_name[, ...]) foreign_reference

foreign_reference :=
REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED

Arguments

  • OR REPLACE: Replaces any table with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • TEMP | TEMPORARY: Creates a temporary table.

  • IF NOT EXISTS: If any table exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • table_name: The name of the table to create. See Table path syntax. For temporary tables, do not include the project name or dataset name.

  • column: The table's schema information.

  • constraint_definition: An expression that defines a table constraint.

  • collation_specification: When a new column is added to the table without an explicit collation specification, the column inherits this collation specification for STRING types.

    If you remove or change this collation specification later with the ALTER TABLE statement, this will not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.

    If the table is part of a dataset, the default collation specification for this table overrides the default collation specification for the dataset.

  • partition_expression: An expression that determines how to partition the table.

  • clustering_column_list: A comma-separated list of column references that determine how to cluster the table. You cannot have collation on columns in this list.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`).

  • table_option_list: A list of options for creating the table.

  • query_statement: The query from which the table should be created. For the query syntax, see SQL syntax reference. If a collation specification is used on this table, collation passes through this query statement.

  • primary_key: An expression that defines a primary key table constraint.

  • foreign_key: An expression that defines a foreign key table constraint.

Details

CREATE TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • Either the column list, the AS query_statement clause, or both must be present.
  • When both the column list and the AS query_statement clause are present, BigQuery ignores the names in the AS query_statement clause and matches the columns with the column list by position.
  • When the AS query_statement clause is present and the column list is absent, BigQuery determines the column names and types from the AS query_statement clause.
  • Column names must be specified either through the column list, the AS query_statement clause or schema of the table in the LIKE clause.
  • Duplicate column names are not allowed.
  • When both the LIKE and the AS query_statement clause are present, the column list in the query statement must match the columns of the table referenced by the LIKE clause.
  • Table names are case-sensitive unless the dataset they belong to is not. To create a case-insensitive dataset, see Creating a case-insensitive dataset. To alter a dataset to make it case-insensitive dataset, see Turning on case insensitivity for a dataset.

Limitations:

  • It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
  • It is not possible to use the OR REPLACE modifier to replace a table with a different kind of partitioning. Instead, DROP the table, and then use a CREATE TABLE ... AS SELECT ... statement to recreate it.

This statement supports the following variants, which have the same limitations:

column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

column :=
  column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type :=
  { data_type | STRING COLLATE collate_specification }

field_list :=
  field_name column_schema [, ...]

array_element_schema :=
  { simple_type | STRUCT<field_list> }
  [NOT NULL]
  • column_name is the name of the column. A column name:

    • Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
    • Must start with a letter or underscore
    • Can be up to 300 characters
  • column_schema: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields.

    column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.

  • simple_type: Any supported data type aside from STRUCT and ARRAY.

    If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted. The syntax looks like this:

    STRING COLLATE collate_specification
    

    If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.

  • default_expression: The default value assigned to the column.

  • field_list: Represents the fields in a struct.

  • field_name: The name of the struct field. Struct field names have the same restrictions as column names.

  • NOT NULL: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

    Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

    The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.

partition_expression

PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:

  • _PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
  • DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
  • <date_column>. Partition by a DATE column with daily partitions.
  • DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
  • DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
  • TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
  • TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
  • DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
  • RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range, where:

    • start is the start of range partitioning, inclusive.
    • end is the end of range partitioning, exclusive.
    • interval is the width of each range within the partition. Defaults to 1.

clustering_column_list

CLUSTER BY is an optional clause that controls table clustering. clustering_column_list is a comma-separated list that determines how to cluster the table. The clustering column list can contain a list of up to four clustering columns.

table_option_list

The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

partition_expiration_days

FLOAT64

Example: partition_expiration_days=7

Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire.

This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

This property can only be set if the table is partitioned.

require_partition_filter

BOOL

Example: require_partition_filter=true

Specifies whether queries on this table must include a a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.

This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

This property can only be set if the table is partitioned.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_table"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a table that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

default_rounding_mode

STRING

Example: default_rounding_mode = "ROUND_HALF_EVEN"

This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. The following values are supported:

  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0 and -2.5 is rounded to -2.0.

This property is equivalent to the defaultRoundingMode table resource property.

enable_change_history

BOOL

In preview.

Example: enable_change_history=TRUE

Set this property to TRUE in order to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information see Pricing and costs. The default is FALSE.

max_staleness

INTERVAL

Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.

max_staleness is disabled by default.

enable_fine_grained_mutations

BOOL

In preview.

Example: enable_fine_grained_mutations=TRUE

Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE.

storage_uri

STRING

In preview.

Example: storage_uri=gs://BUCKET_DIRECTORY/TABLE_DIRECTORY/

A fully qualified location prefix for the external folder where data is stored. Supports gs: buckets.

Required for managed tables.

table_format

STRING

In preview.

Example: table_format=PARQUET

The open-source file format in which the table data is stored. Only PARQUET is supported.

Required for managed tables.

The default is PARQUET.

file_format

STRING

In preview.

Example: file_format=ICEBERG

The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported.

Required for managed tables.

The default is ICEBERG.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE TABLE statement is ignored.

column_option_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
description

STRING

Example: description="a unique id"

This property is equivalent to the schema.fields[].description table resource property.

rounding_mode

STRING

Example: rounding_mode = "ROUND_HALF_EVEN"

This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:

  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2 and -2.25 is rounded to -2.2.

This property is equivalent to the roundingMode table resource property.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Creating a new table

The following example creates a partitioned table named newtable in mydataset:

CREATE TABLE mydataset.newtable
(
  x INT64 OPTIONS(description="An optional INTEGER field"),
  y STRUCT <
    a ARRAY <STRING> OPTIONS(description="A repeated STRING field"),
    b BOOL
  >
)
PARTITION BY _PARTITIONDATE
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2025, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table uses the following partition_expression to partition the table: PARTITION BY _PARTITIONDATE. This expression partitions the table using the date in the _PARTITIONDATE pseudocolumn.

The table schema contains two columns:

  • x: An integer, with description "An optional INTEGER field"
  • y: A STRUCT containing two columns:

    • a: An array of strings, with description "A repeated STRING field"
    • b: A boolean

The table option list specifies the:

  • Table expiration time: January 1, 2025 at 00:00:00 UTC
  • Partition expiration time: 1 day
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a new table from an existing table

The following example creates a table named top_words in mydataset from a query:

CREATE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data`.samples.shakespeare
GROUP BY corpus;

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.top_words, your table qualifier might be `myproject.mydataset.top_words`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 2 columns:

  • corpus: Name of a Shakespeare corpus
  • top_words: An ARRAY of STRUCTs containing 2 fields: word (a STRING) and word_count (an INT64 with the word count)

The table option list specifies the:

  • Description: Top ten words per Shakespeare corpus

Creating a table only if the table doesn't exist

The following example creates a table named newtable in mydataset only if no table named newtable exists in mydataset. If the table name exists in the dataset, no error is returned, and no action is taken.

CREATE TABLE IF NOT EXISTS mydataset.newtable (x INT64, y STRUCT <a ARRAY <STRING>, b BOOL>)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating or replacing a table

The following example creates a table named newtable in mydataset, and if newtable exists in mydataset, it is overwritten with an empty table.

CREATE OR REPLACE TABLE mydataset.newtable (x INT64, y STRUCT <a ARRAY <STRING>, b BOOL>)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a table with REQUIRED columns

The following example creates a table named newtable in mydataset. The NOT NULL modifier in the column definition list of a CREATE TABLE statement specifies that a column or field is created in REQUIRED mode.

CREATE TABLE mydataset.newtable (
  x INT64 NOT NULL,
  y STRUCT <
    a ARRAY <STRING>,
    b BOOL NOT NULL,
    c FLOAT64
  > NOT NULL,
  z STRING
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 3 columns:

  • x: A REQUIRED integer
  • y: A REQUIRED STRUCT containing a (an array of strings), b (a REQUIRED boolean), and c (a NULLABLE float)
  • z: A NULLABLE string

Creating a table with collation support

The following examples create a table named newtable in mydataset with columns a, b, c, and a struct with fields x and y.

All STRING column schemas in this table are collated with 'und:ci':

CREATE TABLE mydataset.newtable (
  a STRING,
  b STRING,
  c STRUCT <
    x FLOAT64
    y ARRAY <STRING>
  >
)
DEFAULT COLLATE 'und:ci';

Only b and y are collated with 'und:ci':

CREATE TABLE mydataset.newtable (
  a STRING,
  b STRING COLLATE 'und:ci',
  c STRUCT <
    x FLOAT64
    y ARRAY <STRING COLLATE 'und:ci'>
  >
);

Creating a table with parameterized data types

The following example creates a table named newtable in mydataset. The parameters in parentheses specify that the column contains a parameterized data type. See Parameterized Data Types for more information about parameterized types.

CREATE TABLE mydataset.newtable (
  x STRING(10),
  y STRUCT <
    a ARRAY <BYTES(5)>,
    b NUMERIC(15, 2) OPTIONS(rounding_mode = 'ROUND_HALF_EVEN'),
    c FLOAT64
  >,
  z BIGNUMERIC(35)
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. Instead of mydataset.newtable, your table qualifier should be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 3 columns:

  • x: A parameterized string with a maximum length of 10
  • y: A STRUCT containing a (an array of parameterized bytes with a maximum length of 5), b (a parameterized NUMERIC with a maximum precision of 15, maximum scale of 2, and rounding mode set to 'ROUND_HALF_EVEN'), and c (a float)
  • z: A parameterized BIGNUMERIC with a maximum precision of 35 and maximum scale of 0

Creating a partitioned table

The following example creates a partitioned table named newtable in mydataset using a DATE column:

CREATE TABLE mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY transaction_date
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by transaction_date"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • transaction_id: An integer
  • transaction_date: A date

The table option list specifies the:

  • Partition expiration: Three days
  • Description: A table partitioned by transaction_date

Creating a partitioned table from the result of a query

The following example creates a partitioned table named days_with_rain in mydataset using a DATE column:

CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS (
  partition_expiration_days=365,
  description="weather stations with precipitation, partitioned by day"
) AS
SELECT
  DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
  (SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
   WHERE stations.usaf = stn) AS station_name,  -- Stations can have multiple names
  prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9  -- Filter unknown values
  AND prcp > 0      -- Filter stations/days with no precipitation

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.days_with_rain, your table qualifier might be `myproject.mydataset.days_with_rain`.

The table schema contains 2 columns:

  • date: The DATE of data collection
  • station_name: The name of the weather station as a STRING
  • prcp: The amount of precipitation in inches as a FLOAT64

The table option list specifies the:

  • Partition expiration: One year
  • Description: Weather stations with precipitation, partitioned by day

Creating a clustered table

Example 1

The following example creates a clustered table named myclusteredtable in mydataset. The table is a partitioned table, partitioned by a truncated TIMESTAMP column and clustered by a STRING column named customer_id.

CREATE TABLE mydataset.myclusteredtable
(
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY TIMESTAMP_TRUNC(input_timestamp, HOUR)
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • input_timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 2

The following example creates a clustered table named myclusteredtable in mydataset. The table is an ingestion-time partitioned table.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY
  customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 3

The following example creates a clustered table named myclusteredtable in mydataset. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY
  customer_id
OPTIONS (
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Description: A table clustered by customer_id

Creating a clustered table from the result of a query

Example 1

The following example creates a partitioned and clustered table named myclusteredtable in mydataset using the result of a query.

CREATE TABLE mydataset.myclusteredtable
(
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(input_timestamp)
CLUSTER BY
  customer_id
OPTIONS (
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)
AS SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • input_timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 2

The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY
  customer_id
OPTIONS (
  description="a table clustered by customer_id"
)
AS SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the:

  • Description: A table clustered by customer_id

Creating a temporary table

The following example creates a temporary table named Example and inserts values into it.

CREATE TEMP TABLE Example
(
  x INT64,
  y STRING
);

INSERT INTO Example
VALUES (5, 'foo');

INSERT INTO Example
VALUES (6, 'bar');

SELECT *
FROM Example;

This script returns the following output:

+-----+---+-----+
| Row | x | y   |
+-----+---|-----+
| 1   | 5 | foo |
| 2   | 6 | bar |
+-----+---|-----+

Load data across clouds

Example 1

Suppose you have a BigLake table named myawsdataset.orders that references data from Amazon S3. You want to transfer data from that table to a BigQuery table myotherdataset.shipments in the US multi-region.

First, display information about the myawsdataset.orders table:

    bq show myawsdataset.orders;

The output is similar to the following:

  Last modified             Schema              Type     Total URIs   Expiration
----------------- -------------------------- ---------- ------------ -----------
  31 Oct 17:40:28   |- l_orderkey: integer     EXTERNAL   1
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_linenumber: integer
                    |- l_returnflag: string
                    |- l_linestatus: string
                    |- l_commitdate: date

Next, display information about the myotherdataset.shipments table:

  bq show myotherdataset.shipments

The output is similar to the following. Some columns are omitted to simplify the output.

  Last modified             Schema             Total Rows   Total Bytes   Expiration   Time Partitioning   Clustered Fields   Total Logical
 ----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ ---------------
  31 Oct 17:34:31   |- l_orderkey: integer      3086653      210767042                                                         210767042
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_commitdate: date
                    |- l_shipdate: date
                    |- l_receiptdate: date
                    |- l_shipinstruct: string
                    |- l_shipmode: string

Now, using the CREATE TABLE AS SELECT statement you can selectively load data to the myotherdataset.orders table in the US multi-region:

CREATE OR REPLACE TABLE
  myotherdataset.orders
  PARTITION BY DATE_TRUNC(l_commitdate, YEAR) AS
SELECT
  *
FROM
  myawsdataset.orders
WHERE
  EXTRACT(YEAR FROM l_commitdate) = 1992;

You can then perform a join operation with the newly created table:

SELECT
  orders.l_orderkey,
  orders.l_orderkey,
  orders.l_suppkey,
  orders.l_commitdate,
  orders.l_returnflag,
  shipments.l_shipmode,
  shipments.l_shipinstruct
FROM
  myotherdataset.shipments
JOIN
  `myotherdataset.orders` as orders
ON
  orders.l_orderkey = shipments.l_orderkey
AND orders.l_partkey = shipments.l_partkey
AND orders.l_suppkey = shipments.l_suppkey
WHERE orders.l_returnflag = 'R'; -- 'R' means refunded.

When new data is available, append the data of the 1993 year to the destination table using the INSERT INTO SELECT statement:

INSERT INTO
   myotherdataset.orders
 SELECT
   *
 FROM
   myawsdataset.orders
 WHERE
   EXTRACT(YEAR FROM l_commitdate) = 1993;

Example 2

The following example inserts data into an ingestion-time partitioned table:

CREATE TABLE
 mydataset.orders(id String, numeric_id INT64)
PARTITION BY _PARTITIONDATE;

After creating a partitioned table, you can insert data into the ingestion-time partitioned table:

INSERT INTO
 mydataset.orders(
   _PARTITIONTIME,
   id,
   numeric_id)
SELECT
 TIMESTAMP("2023-01-01"),
 id,
 numeric_id,
FROM
 mydataset.ordersof23
WHERE
 numeric_id > 4000000;

CREATE TABLE LIKE statement

Creates a new table with all of the same metadata of another table.

Syntax

CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ]
table_name
LIKE [[project_name.]dataset_name.]source_table_name
...
[OPTIONS(table_option_list)]

Details

This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the LIKE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

The CREATE TABLE LIKE statement copies only the metadata of the source table. You can use the AS query_statement clause to include data into the new table.

The new table has no relationship to the source table after creation; thus modifications to the source table will not propagate to the new table.

By default, the new table inherits partitioning, clustering, and options metadata from the source table. You can customize metadata in the new table by using the optional clauses in the SQL statement. For example, if you want to specify a different set of options for the new table, then include the OPTIONS clause with a list of options and values. This behavior matches that of ALTER TABLE SET OPTIONS.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the table.
bigquery.tables.get The source table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Example 1

The following example creates a new table named newtable in mydataset with the same metadata as sourcetable:

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable

Example 2

The following example creates a new table named newtable in mydataset with the same metadata as sourcetable and the data from the SELECT statement:

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
AS SELECT * FROM mydataset.myothertable

CREATE TABLE COPY statement

Creates a table that has the same metadata and data as another table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]

Details

This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the COPY clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

The CREATE TABLE COPY statement copies both the metadata and data from the source table.

The new table inherits partitioning and clustering from the source table. By default, the table options metadata from the source table are also inherited, but you can override table options by using the OPTIONS clause. The behavior is equivalent to running ALTER TABLE SET OPTIONS after the table is copied.

The new table has no relationship to the source table after creation; modifications to the source table are not propagated to the new table.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the table copy.
bigquery.tables.get The source table.
bigquery.tables.getData The source table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

CREATE SNAPSHOT TABLE statement

Creates a table snapshot based on a source table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE SNAPSHOT TABLE [ IF NOT EXISTS ] table_snapshot_name
CLONE source_table_name
[FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]

Arguments

  • IF NOT EXISTS: If a table snapshot or other table resource exists with the same name, the CREATE statement has no effect.

  • table_snapshot_name: The name of the table snapshot that you want to create. The table snapshot name must be unique per dataset. See Table path syntax.

  • source_table_name: The name of the table that you want to snapshot or the table snapshot that you want to copy. See Table path syntax.

    If the source table is a standard table, then BigQuery creates a table snapshot of the source table. If the source table is a table snapshot, then BigQuery creates a copy of the table snapshot.

  • FOR SYSTEM_TIME AS OF: Lets you select the version of the table that was current at the time specified by timestamp_expression. It can only be used when creating a snapshot of a table; it can't be used when making a copy of a table snapshot.

  • snapshot_option_list: Additional table snapshot creation options such as a label and an expiration time.

Details

CREATE SNAPSHOT TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • The source table must be one of the following:
    • A table
    • A table clone
    • A table snapshot
  • The FOR SYSTEM_TIME AS OF clause can only be used when creating a snapshot of a table or table clone; it can't be used when making a copy of a table snapshot.

snapshot_option_list

The option list lets you set table snapshot options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table snapshot option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_table_snapshot"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="A table snapshot that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

VALUE is a constant expression that contains only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE SNAPSHOT TABLE statement is ignored.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the table snapshot.
bigquery.tables.createSnapshot The source table.
bigquery.tables.get The source table.
bigquery.tables.getData The source table.

Examples

Create a table snapshot: fail if it already exists

The following example creates a table snapshot of the table myproject.mydataset.mytable. The table snapshot is created in the dataset mydataset and is named mytablesnapshot:

CREATE SNAPSHOT TABLE `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="my_table_snapshot",
  description="A table snapshot that expires in 2 days",
  labels=[("org_unit", "development")]
)

If the table snapshot name already exists in the dataset, then the following error is returned:

Already Exists: myproject.mydataset.mytablesnapshot

The table snapshot option list specifies the following:

  • Expiration time: 48 hours after the time the table snapshot is created
  • Friendly name: my_table_snapshot
  • Description: A table snapshot that expires in 2 days
  • Label: org_unit = development

Create a table snapshot: ignore if it already exists

The following example creates a table snapshot of the table myproject.mydataset.mytable. The table snapshot is created in the dataset mydataset and is named mytablesnapshot:

CREATE SNAPSHOT TABLE IF NOT EXISTS `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="my_table_snapshot",
  description="A table snapshot that expires in 2 days"
  labels=[("org_unit", "development")]
)

The table snapshot option list specifies the following:

  • Expiration time: 48 hours after the time the table snapshot is created
  • Friendly name: my_table_snapshot
  • Description: A table snapshot that expires in 2 days
  • Label: org_unit = development

If the table snapshot name already exists in the dataset, then no action is taken, and no error is returned.

For information about restoring table snapshots, see CREATE TABLE CLONE.

For information about removing table snapshots, see DROP SNAPSHOT TABLE.

CREATE TABLE CLONE statement

Creates a table clone based on a source table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE TABLE [ IF NOT EXISTS ]
destination_table_name
CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression]
...
[OPTIONS(table_option_list)]

Details

Other than the use of the CLONE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

Arguments

  • IF NOT EXISTS: If the specified destination table name already exists, the CREATE statement has no effect.

  • destination_table_name: The name of the table that you want to create. The table name must be unique per dataset. The table name can contain the following:

    • Up to 1,024 characters
    • Letters (upper or lower case), numbers, and underscores
  • OPTIONS(table_option_list): Lets you specify additional table creation options such as a label and an expiration time.

  • source_table_name: The name of the source table.

CREATE TABLE CLONE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • The table that is being cloned must be a table, a table clone, or a table snapshot.

OPTIONS

CREATE TABLE CLONE options are the same as CREATE TABLE options.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the table clone.
bigquery.tables.get The source table.
bigquery.tables.getData The source table.
bigquery.tables.restoreSnapshot The source table (required only if the source table is a table snapshot).

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Restore a table snapshot: fail if destination table already exists

The following example creates the table myproject.mydataset.mytable from the table snapshot myproject.mydataset.mytablesnapshot:

CREATE TABLE `myproject.mydataset.mytable`
CLONE `myproject.mydataset.mytablesnapshot`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name="my_table",
  description="A table that expires in 1 year",
  labels=[("org_unit", "development")]
)

If the table name exists in the dataset, then the following error is returned:

Already Exists: myproject.mydataset.mytable.

The table option list specifies the following:

  • Expiration time: 365 days after the time that the table is created
  • Friendly name: my_table
  • Description: A table that expires in 1 year
  • Label: org_unit = development

Create a clone of a table: ignore if the destination table already exists

The following example creates the table clone myproject.mydataset.mytableclone based on the table myproject.mydataset.mytable:

CREATE TABLE IF NOT EXISTS `myproject.mydataset.mytableclone`
CLONE `myproject.mydataset.mytable`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name="my_table",
  description="A table that expires in 1 year",
  labels=[("org_unit", "development")]
)

The table option list specifies the following:

  • Expiration time: 365 days after the time the table is created
  • Friendly name: my_table
  • Description: A table that expires in 1 year
  • Label: org_unit = development

If the table name exists in the dataset, then no action is taken, and no error is returned.

For information about creating a copy of a table, see CREATE TABLE COPY.

For information about creating a snapshot of a table, see CREATE SNAPSHOT TABLE.

CREATE VIEW statement

Creates a new view.

Syntax

CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] view_name
[(view_column_name_list)]
[OPTIONS(view_option_list)]
AS query_expression

view_column_name_list :=
  view_column[, ...]

view_column :=
  column_name [OPTIONS(view_column_option_list)]

Arguments

  • OR REPLACE: Replaces any view with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If a view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • view_name: The name of the view you're creating. See Table path syntax.

  • view_column_name_list: Lets you explicitly specify the column names of the view, which may be aliases to the column names in the underlying SQL query.

  • view_option_list: Additional view creation options such as a label and an expiration time.

  • query_expression: The GoogleSQL query expression used to define the view.

Details

CREATE VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

view_column_name_list

The view's column name list is optional. The names must be unique but do not have to be the same as the column names of the underlying SQL query. For example, if your view is created with the following statement:

CREATE VIEW mydataset.age_groups(age, count) AS SELECT age, COUNT(*)
FROM mydataset.people
group by age;

Then you can query it with:

SELECT age, count from mydataset.age_groups;

The number of columns in the column name list must match the number of columns in the underlying SQL query. If the columns in the table of the underlying SQL query is added or dropped, the view becomes invalid and must be recreated. For example, if the age column is dropped from the mydataset.people table, then the view created in the previous example becomes invalid.

view_column_option_list

The view_column_option_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:

NAME VALUE Details
description

STRING

Example: description="a unique id"

view_option_list

The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_view"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

privacy_policy

JSON-formatted STRING

The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy view option.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE VIEW statement is ignored.

privacy_policy

The following policies are available in the privacy_policy view option to create analysis rules. A policy represents a condition that needs to be met before a query can be run.

Policy Details

aggregation_threshold_policy

The aggregation threshold policy to enforce when a view is queried.

Syntax:

'{
  "aggregation_threshold_policy": {
    "threshold": value,
    "privacy_unit_columns": value
  }
}'

Parameters:

  • aggregation_threshold_policy: An aggregation threshold policy for the view. When this parameter is included, a minimum number of distinct entities must be present in a set of data in the view.
  • threshold: The minimum number of distinct privacy units (privacy unit column values) that need to contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results. value is a positive JSON integer.
  • privacy_unit_columns: The columns that represents the privacy unit columns in a view. At this time, a view can have only one privacy unit column. value is a JSON string.

Example:

privacy_policy='{"aggregation_threshold_policy": {"threshold" : 50, "privacy_unit_columns": "ID"}}'

differential_privacy_policy

A differential privacy policy for the view. When this parameter is included, only differentially private queries can be run on the view.

Syntax:

'{
  "differential_privacy_policy": {
    "privacy_unit_column": value,
    "max_epsilon_per_query": value,
    "epsilon_budget": value,
    "delta_per_query": value,
    "delta_budget": value,
    "max_groups_contributed": value
  }
}'

Parameters:

  • differential_privacy_policy: The differential privacy policy for the view.
  • privacy_unit_column: The column that represents the privacy unit column for differentially private queries on the view. value is a JSON string.
  • max_epsilon_per_query: The maximum amount of epsilon that can be specified for a differentially private query on the view. value is a JSON number from 0.001 to 1e+15.
  • epsilon_budget: The amount of epsilon that can be used in totality for all differentially private queries on the view. value is JSON number from 0.001 to 1e+15.
  • delta_per_query: The maximum amount of delta that can be specified for a differentially private query on the view. value is a JSON number from 1e-15 to 1.
  • delta_budget: The amount of delta that can be used in totality for all differentially private queries on the view. The budget must be larger than the delta for any differentially private query on the view. value is a JSON number from 1e-15 to `1000`.
  • max_groups_contributed: The maximum number of groups to which each protected entity can contribute in a differentially private query. value is a non-negative JSON integer.

Example:

privacy_policy='{"differential_privacy_policy": { "privacy_unit_column": "contributor_id", "max_epsilon_per_query": 0.01, "epsilon_budget": 25.6, "delta_per_query": 0.005, "delta_budget": 9.6, "max_groups_contributed": 2}}'

join_restriction_policy

A join restriction policy for the view. When this parameter is included, only the specified joins can be run on the specified columns in the view.

This policy can be used alone or with other policies, such as the aggregation threshold or differential privacy policy.

Syntax:

'{
  "join_restriction_policy": {
    "join_condition": value,
    "join_allowed_columns": value
  }
}'

Parameters:

  • join_restriction_policy: The join restriction policy for the view.
  • join_condition: The type of join condition to enforce on the view. value can be one of the following JSON strings:
    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.
    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.
    • JOIN_BLOCKED: This view can't be joined along any column. Don't set join_allowed_columns in this case. This can be used with all analysis rules except for the list overlap analysis rule.
    • JOIN_NOT_REQUIRED: A join is not required to query this view. If a join is used, only the columns in join_allowed_columns can be used. This can be used with all analysis rules except for the list overlap analysis rule.
  • join_allowed_columns: A list of columns that can be part of a join operation. value is a JSON array.

Example:

privacy_policy='{"join_restriction_policy": { "join_condition": 'JOIN_ANY', "join_allowed_columns": ['col1', 'col2']}}'

Default project in view body

If the view is created in the same project used to run the CREATE VIEW statement, the view body query_expression can reference entities without specifying the project; the default project is the project which owns the view. Consider the sample query below.

CREATE VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the view is not created in the same project used to run the CREATE VIEW statement, then all references in the view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE VIEW query is invalid if it runs in a project different from myProject.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the view.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes an expiration time, then the bigquery.tables.delete permission is also required.

Examples

Creating a new view

The following example creates a view named newview in mydataset:

CREATE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

If the view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating a view only if the view doesn't exist

The following example creates a view named newview in mydataset only if no view named newview exists in mydataset. If the view name exists in the dataset, no error is returned, and no action is taken.

CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating or replacing a view

The following example creates a view named newview in mydataset, and if newview exists in mydataset, it is overwritten using the specified query expression.

CREATE OR REPLACE VIEW `myproject.mydataset.newview`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="newview",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM myproject.mydataset.mytable

The view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating a view with column descriptions

The following example creates a view named newview in mydataset. This view definition provides the column description for each column in mytable. You can rename columns from the original query.

CREATE VIEW `myproject.mydataset.newview` (
  column_1_new_name OPTIONS (DESCRIPTION='Description of the column 1 contents'),
  column_2_new_name OPTIONS (DESCRIPTION='Description of the column 2 contents'),
  column_3_new_name OPTIONS (DESCRIPTION='Description of the column 3 contents')
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

CREATE MATERIALIZED VIEW statement

Creates a new materialized view.

Syntax

CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] materialized_view_name
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(materialized_view_option_list)]
AS query_expression

Arguments

  • OR REPLACE: Replaces a materialized view with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If a materialized view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • materialized_view_name: The name of the materialized view you're creating. See Table path syntax.

    If the project_name is omitted from the materialized view name, or it is the same as the project that runs this DDL query, then the latter is also used as the default project for references to tables, functions, and other resources in query_expression. The default project of the references is fixed and does not depend on the future queries that invoke the new materialized view. Otherwise, all references in query_expression must be qualified with project names.

    The materialized view name must be unique per dataset.

  • partition_expression: An expression that determines how to partition the table. A materialized view can only be partitioned in the same way as the table in query expression (the base table) is partitioned.

  • clustering_column_list: A comma-separated list of column references that determine how to cluster the materialized view.

  • materialized_view_option_list: Allows you to specify additional materialized view options such as a whether refresh is enabled, the refresh interval, a label, and an expiration time.

  • query_expression: The GoogleSQL query expression used to define the materialized view.

Details

CREATE MATERIALIZED VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

Default project in materialized view body

If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW statement, the materialized view body query_expression can reference entities without specifying the project; the default project is the project which owns the materialized view. Consider the sample query below.

CREATE MATERIALIZED VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE MATERIALIZED VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the materialized view is not created in the same project used to run the CREATE VIEW statement, then all references in the materialized view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW query is invalid if it runs in a project different from myProject.

materialized_view_option_list

The option list allows you to set materialized view options such as a whether refresh is enabled. the refresh interval, a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a materialized view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME VALUE Details
enable_refresh BOOLEAN

Example: enable_refresh=false
Default: true

refresh_interval_minutes FLOAT64

Example: refresh_interval_minutes=20
Default: refresh_interval_minutes=30

expiration_timestamp TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property. expiration_timestamp is optional and not used by default.

max_staleness INTERVAL

Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

The max_staleness property provides consistently high performance with controlled costs when processing large, frequently changing datasets. max_staleness is disabled by default.

allow_non_incremental_definition BOOLEAN

Example: allow_non_incremental_definition=true

The allow_non_incremental_definition property supports an expanded range of SQL queries to create materialized views. allow_non_incremental_definition=true is disabled by default. CREATE MATERIALIZED VIEW statement support only. The allow_non_incremental_definition property can't be changed after the materialized view is created.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_mv"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a materialized view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the materialized view.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Creating a new materialized view

The following example creates a materialized view named new_mv in mydataset:

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="new_mv",
  description="a materialized view that expires in 2 days",
  labels=[("org_unit", "development")],
  enable_refresh=true,
  refresh_interval_minutes=20
)
AS SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable`
GROUP BY column_1

If the materialized view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.materialized_view

When you use a DDL statement to create a materialized view, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.new_mv`.

The materialized view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view option list specifies the:

  • Expiration time: 48 hours from the time the materialized view is created
  • Friendly name: new_mv
  • Description: A materialized view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: true
  • Refresh interval: 20 minutes

Creating a materialized view only if the materialized view doesn't exist

The following example creates a materialized view named new_mv in mydataset only if no materialized view named new_mv exists in mydataset. If the materialized view name exists in the dataset, no error is returned, and no action is taken.

CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name="new_mv",
  description="a view that expires in 2 days",
  labels=[("org_unit", "development")],
  enable_refresh=false
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view option list specifies the:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: new_mv
  • Description: A view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: false

Creating a materialized view with partitioning and clustering

The following example creates a materialized view named new_mv in mydataset, partitioned by the col_datetime column and clustered by the col_int column:

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
PARTITION BY DATE(col_datetime)
CLUSTER BY col_int
AS SELECT col_int, col_datetime, COUNT(1) as cnt
   FROM `myproject.mydataset.mv_base_table`
   GROUP BY col_int, col_datetime

The base table, mv_base_table, must also be partitioned by the col_datetime column. For more information, see Working with partitioned and clustered tables.

CREATE MATERIALIZED VIEW AS REPLICA OF statement

Creates a replica of a materialized view. The source materialized view must be over an Amazon Simple Storage Service (Amazon S3) BigLake table. You can use the materialized view replica to make Amazon S3 data available locally for joins.

For more information, see Create materialized view replicas.

Syntax

CREATE MATERIALIZED VIEW replica_name
[OPTIONS(materialized_view_replica_option_list)]
AS REPLICA OF source_materialized_view_name

Arguments

  • replica_name: The name of the materialized view replica you're creating, in table path syntax. If the project name is omitted from the materialized view replica name, the current project is used as the default.

    The materialized view replica name must be unique for each dataset.

  • materialized_view_replica_option_list: Allows you to specify options such as the replication interval.

  • source_materialized_view_name: The name of the materialized view you are replicating, in table path syntax. The source materialized view must be over an Amazon S3 BigLake table, and must be authorized on the dataset that contains that table.

materialized_view_replica_option_list

The option list lets you set materialized view replica options.

Specify a materialized view replica option list in the following format:

NAME=VALUE, ...

NAME VALUE Details
replication_interval_seconds INT64

Specifies how often to replicate the data from the source materialized view to the replica. Must be a value between 60 and 3,600, inclusive. Defaults to 300 (5 minutes).

Example: replication_interval_seconds=900

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.replicateData
  • bigquery.jobs.create

Example

The following example creates a materialized view replica named mv_replica in bq_dataset:

CREATE MATERIALIZED VIEW `myproject.bq_dataset.mv_replica`
OPTIONS(
  replication_interval_seconds=600
)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`

CREATE EXTERNAL SCHEMA statement

Creates a new federated dataset.

A federated dataset is a connection between BigQuery and an external data source at the dataset level. For an example, see Create AWS Glue federated datasets.

Syntax

CREATE EXTERNAL SCHEMA [ IF NOT EXISTS ] dataset_name
[WITH CONNECTION connection_name]
[OPTIONS(external_schema_option_list)]

Arguments

  • IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.

  • dataset_name: The name of the dataset to create.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`).

  • external_schema_option_list: A list of options for creating the federated dataset.

Details

The dataset is created in the location that you specify in the query settings. For more information, see Specify locations. The location must support the kind of federated dataset that you are creating, for example, you can only create AWS Glue federated datasets in AWS locations.

For more information about creating a dataset, see Create datasets. For information about quotas, see dataset limits.

external_schema_option_list

The option list specifies options for the federated dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME VALUE Details
description STRING The description of the dataset.
friendly_name STRING A descriptive name for the dataset.
labels <ARRAY<STRUCT<STRING, STRING>>> An array of labels for the dataset, expressed as key-value pairs.
location STRING The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. The location must support the kind of federated dataset that you are creating, for example, you can only create AWS Glue federated datasets in AWS locations.
external_source STRING The source of the external dataset, such as the Amazon Resource Name (ARN), with a prefix identifying the source, such as aws-glue://.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.datasets.create The project where you create the federated dataset.
bigquery.connections.use The project where you create the federated dataset.
bigquery.connections.delegate The project where you create the federated dataset.

Examples

The following example creates an AWS Glue federated dataset:

CREATE EXTERNAL SCHEMA mydataset
WITH CONNECTION myproject.`aws-us-east-1`.myconnection
  OPTIONS (
    external_source = 'aws-glue://arn:aws:glue:us-east-1:123456789:database/test_database',
    location = 'aws-us-east-1');

CREATE EXTERNAL TABLE statement

Creates a new external table.

External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see Introduction to external data sources.

Syntax

CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ] table_name
[(
  column_name column_schema,
  ...
)]
[WITH CONNECTION connection_name]
[WITH PARTITION COLUMNS
  [(
      partition_column_name partition_column_type,
      ...
  )]
]
OPTIONS (
  external_table_option_list,
  ...
);

Arguments

  • OR REPLACE: Replaces any external table with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If an external table or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • table_name: The name of the external table. See Table path syntax.

  • column_name: The name of a column in the table.

  • column_schema: Specifies the schema of the column. It uses the same syntax as the column_schema definition in the CREATE TABLE statement. If you don't include this clause, BigQuery detects the schema automatically.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`).

  • partition_column_name: The name of a partition column. Include this field if your external data uses a hive-partitioned layout. For more information, see: Supported data layouts.

  • partition_column_type: The partition column type.

  • external_table_option_list: A list of options for creating the external table.

Details

The CREATE EXTERNAL TABLE statement does not support creating temporary external tables.

To create an externally partitioned table, use the WITH PARTITION COLUMNS clause to specify the partition schema details. BigQuery validates the column definitions against the external data location. The schema declaration must strictly follow the ordering of the fields in the external path. For more information about external partitioning, see Querying externally partitioned data.

external_table_option_list

The option list specifies options for creating the external table. The format and uris options are required. Specify the option list in the following format: NAME=VALUE, ...

Options
allow_jagged_rows

BOOL

If true, allow rows that are missing trailing optional columns.

Applies to CSV data.

allow_quoted_newlines

BOOL

If true, allow quoted data sections that contain newline characters in the file.

Applies to CSV data.

bigtable_options

STRING

Only required when creating a Bigtable external table.

Specifies the schema of the Bigtable external table in JSON format.

For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.

compression

STRING

The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed.

Applies to CSV and JSON data.

decimal_target_types

ARRAY<STRING>

Determines how to convert a Decimal type. Equivalent to ExternalDataConfiguration.decimal_target_types

Example: ["NUMERIC", "BIGNUMERIC"].

description

STRING

A description of this table.

enable_list_inference

BOOL

If true, use schema inference specifically for Parquet LIST logical type.

Applies to Parquet data.

enable_logical_types

BOOL

If true, convert Avro logical types into their corresponding SQL types. For more information, see Logical types.

Applies to Avro data.

encoding

STRING

The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1).

Applies to CSV data.

enum_as_string

BOOL

If true, infer Parquet ENUM logical type as STRING instead of BYTES by default.

Applies to Parquet data.

expiration_timestamp

TIMESTAMP

The time when this table expires. If not specified, the table does not expire.

Example: "2025-01-01 00:00:00 UTC".

field_delimiter

STRING

The separator for fields in a CSV file.

Applies to CSV data.

format

STRING

The format of the external data. Supported values for CREATE EXTERNAL TABLE include: AVRO, CLOUD_BIGTABLE, CSV, DATASTORE_BACKUP, DELTA_LAKE (preview), GOOGLE_SHEETS, NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.

Supported values for LOAD DATA include: AVRO, CSV, DELTA_LAKE (preview) NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.

The value JSON is equivalent to NEWLINE_DELIMITED_JSON.

hive_partition_uri_prefix

STRING

A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables.

Applies to Avro, CSV, JSON, Parquet, and ORC data.

Example: "gs://bucket/path".

file_set_spec_type

STRING

Specifies how to interpret source URIs for load jobs and external tables.

Supported values include:

  • FILE_SYSTEM_MATCH. Expands source URIs by listing files from the object store. This is the default behavior if FileSetSpecType is not set.
  • NEW_LINE_DELIMITED_MANIFEST. Indicates that the provided URIs are newline-delimited manifest files, with one URI per line. Wildcard URIs are not supported in the manifest files, and all referenced data files must be in the same bucket as the manifest file.

For example, if you have a source URI of "gs://bucket/path/file" and the file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then each line in the file is interpreted as a URI that points to a data file.

ignore_unknown_values

BOOL

If true, ignore extra values that are not represented in the table schema, without returning an error.

Applies to CSV and JSON data.

json_extension

STRING

For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records.

Supported values include:
GEOJSON. Newline-delimited GeoJSON data. For more information, see Creating an external table from a newline-delimited GeoJSON file.

max_bad_records

INT64

The maximum number of bad records to ignore when reading the data.

Applies to: CSV, JSON, and Google Sheets data.

max_staleness

INTERVAL

Applicable for BigLake tables and object tables.

Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it.

To disable metadata caching, specify 0. This is the default.

To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation falls back to retrieving metadata from Cloud Storage instead.

null_marker

STRING

The string that represents NULL values in a CSV file.

Applies to CSV data.

object_metadata

STRING

Only required when creating an object table.

Set the value of this option to SIMPLE when creating an object table.

preserve_ascii_control_characters

BOOL

If true, then the embedded ASCII control characters which are the first 32 characters in the ASCII table, ranging from '\x00' to '\x1F', are preserved.

Applies to CSV data.

projection_fields

STRING

A list of entity properties to load.

Applies to Datastore data.

quote

STRING

The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true.

Applies to CSV data.

reference_file_schema_uri

STRING

User provided reference file with the table schema.

Applies to Parquet/ORC/AVRO data.

Example: "gs://bucket/path/reference_schema_file.parquet".

require_hive_partition_filter

BOOL

If true, all queries over this table require a partition filter that can be used to eliminate partitions when reading data. Applies only to hive-partitioned external tables.

Applies to Avro, CSV, JSON, Parquet, and ORC data.

sheet_range

STRING

Range of a Google Sheets spreadsheet to query from.

Applies to Google Sheets data.

Example: "sheet1!A1:B20",

skip_leading_rows

INT64

The number of rows at the top of a file to skip when reading the data.

Applies to CSV and Google Sheets data.

uris

For external tables, including object tables, that aren't Bigtable tables:

ARRAY<STRING>

An array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character, which must come after the bucket name. When you specify uris values that target multiple files, all of those files must share a compatible schema.

The following examples show valid uris values:

  • ['gs://bucket/path1/myfile.csv']
  • ['gs://bucket/path1/*.csv']
  • ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']

For Bigtable tables:

STRING

The URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI.

Example: https://googleapis.com/bigtable/projects/project_id/instances/instance_id[/appProfiles/app_profile]/tables/table_name

For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI.

Required permissions

This statement requires the following IAM permissions:

Permission Resource
bigquery.tables.create The dataset where you create the external table.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes an expiration time, then the bigquery.tables.delete permission is also required.

Examples

The following example creates a BigLake table and explicitly specifies the schema. It also specifies refreshing metadata cache automatically at a system-defined interval.

CREATE OR REPLACE EXTERNAL TABLE mydataset.newtable (x INT64, y STRING, z BOOL)
  WITH CONNECTION myconnection
  OPTIONS(
    format ="PARQUET",
    max_staleness = STALENESS_INTERVAL,
    metadata_cache_mode =