Data definition language (DDL) statements in GoogleSQL
Data definition language (DDL) statements let you create and modify BigQuery resources using GoogleSQL query syntax. You can use DDL commands to create, alter, and delete resources, such as the following:
- Datasets
- Tables
- Table schemas
- Table clones
- Table snapshots
- Views
- User-defined functions (UDFs)
- Indexes
- Capacity commitments and reservations
- Row-level access policies
Required permissions
To create a job that runs a DDL statement, you must have the
bigquery.jobs.create
permission for the project where you are running the job.
Each DDL statement also requires specific permissions on the affected resources,
which are documented under each statement.
IAM roles
The predefined IAM roles bigquery.user
,
bigquery.jobUser
, and bigquery.admin
include the required
bigquery.jobs.create
permission.
For more information about IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.
Run DDL statements
You can run DDL statements by using the Google Cloud console, by using the
bq command-line tool, by calling the
jobs.query
REST API, or
programmatically using the
BigQuery API client libraries.
Console
Go to the BigQuery page in the Google Cloud console.
Click Compose new query.
Enter the DDL statement into the Query editor text area. For example:
CREATE TABLE mydataset.newtable ( x INT64 )
Click Run.
bq
Enter the
bq query
command
and supply the DDL statement as the query parameter. Set the
use_legacy_sql
flag to false
.
bq query --use_legacy_sql=false \ 'CREATE TABLE mydataset.newtable ( x INT64 )'
API
Call the jobs.query
method
and supply the DDL statement in the request body's query
property.
DDL functionality extends the information returned by a
Jobs resource.
statistics.query.statementType
includes the following additional values for DDL
support:
CREATE_TABLE
CREATE_TABLE_AS_SELECT
DROP_TABLE
CREATE_VIEW
DROP_VIEW
statistics.query
has 2 additional fields:
ddlOperationPerformed
: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:CREATE
: The query created the DDL target.SKIP
: No-op. Examples —CREATE TABLE IF NOT EXISTS
was submitted, and the table exists. OrDROP TABLE IF EXISTS
was submitted, and the table does not exist.REPLACE
: The query replaced the DDL target. Example —CREATE OR REPLACE TABLE
was submitted, and the table already exists.DROP
: The query deleted the DDL target.
ddlTargetTable
: When you submit aCREATE TABLE/VIEW
statement or aDROP TABLE/VIEW
statement, the target table is returned as an object with 3 fields:- "projectId": string
- "datasetId": string
- "tableId": string
Java
Call the
BigQuery.create()
method to start a query job. Call the
Job.waitFor()
method to wait for the DDL query to finish.
Before trying this sample, follow the Java setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Java API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Call the
Client.query()
method to start a query job. Call the
QueryJob.result()
method to wait for the DDL query to finish.
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
On-demand query size calculation
If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.
DDL statement | Bytes processed |
---|---|
CREATE TABLE |
None. |
CREATE TABLE ... AS SELECT ... |
The sum of bytes processed for all the columns referenced from the tables scanned by the query. |
CREATE VIEW |
None. |
DROP TABLE |
None. |
DROP VIEW |
None. |
For more information about cost estimation, see Estimate and control costs.
CREATE SCHEMA
statement
Creates a new dataset.
Syntax
CREATE SCHEMA [ IF NOT EXISTS ] [project_name.]dataset_name [DEFAULT COLLATE collate_specification] [OPTIONS(schema_option_list)]
Arguments
IF NOT EXISTS
: If any dataset exists with the same name, theCREATE
statement has no effect.DEFAULT COLLATE collate_specification
: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a table or a column.If you remove or change this collation specification later with the
ALTER SCHEMA
statement, this will not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.project_name
: The name of the project where you are creating the dataset. Defaults to the project that runs this DDL statement.dataset_name
: The name of the dataset to create.schema_option_list
: A list of options for creating the dataset.
Details
The dataset is created in the location that you specify in the query settings. For more information, see Specifying your location.
For more information about creating a dataset, see Creating datasets. For information about quotas, see Dataset limits.
schema_option_list
The option list specifies options for the dataset. Specify the options in the
following format: NAME=VALUE, ...
The following options are supported:
NAME |
VALUE |
Details |
---|---|---|
default_kms_key_name |
STRING |
Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table. |
default_partition_expiration_days |
FLOAT64 |
Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table. |
default_rounding_mode |
|
Example: This specifies the
|
default_table_expiration_days |
FLOAT64 |
Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table. |
description |
STRING |
The description of the dataset. |
failover_reservation |
STRING |
Associates the dataset to a reservation in the case of a failover scenario. |
friendly_name |
STRING |
A descriptive name for the dataset. |
is_case_insensitive |
BOOL |
TRUE if the dataset and its table names are
case-insensitive, otherwise FALSE . By default, this
is FALSE , which means the dataset and its table names are
case-sensitive.
|
is_primary |
BOOLEAN |
Declares if the dataset is the primary replica. |
labels |
<ARRAY<STRUCT<STRING, STRING>>> |
An array of labels for the dataset, expressed as key-value pairs. |
location |
STRING |
The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. |
max_time_travel_hours |
SMALLINT |
Specifies the duration in hours of the
time travel window
for the dataset. The max_time_travel_hours value must
be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168)
between 48 (2 days) and 168 (7 days). 168 hours is the default
if this option isn't specified.
|
primary_replica |
STRING |
The replica name to set as the primary replica. |
storage_billing_model |
STRING |
Alters the
storage billing model
for the dataset. Set the The When you change a dataset's billing model, it takes 24 hours for the change to take effect. Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again. |
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.create |
The project where you create the dataset. |
Examples
Creating a new dataset
The following example creates a dataset with a default table expiration and a set of labels.
CREATE SCHEMA mydataset OPTIONS( location="us", default_table_expiration_days=3.75, labels=[("label1","value1"),("label2","value2")] )
Creating a case-insensitive dataset
The following example creates a case-insensitive dataset. Both the dataset name and table names inside the dataset are case-insensitive.
CREATE SCHEMA mydataset OPTIONS( is_case_insensitive=TRUE )
Creating a dataset with collation support
The following example creates a dataset with a collation specification.
CREATE SCHEMA mydataset DEFAULT COLLATE 'und:ci'
CREATE TABLE
statement
Creates a new table.
Syntax
CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ] table_name [( column | constraint_definition[, ...] )] [DEFAULT COLLATE collate_specification] [PARTITION BY partition_expression] [CLUSTER BY clustering_column_list] [WITH CONNECTION connection_name] [OPTIONS(table_option_list)] [AS query_statement] column:= column_definition constraint_definition:= [primary_key] | [[CONSTRAINT constraint_name] foreign_key, ...] primary_key := PRIMARY KEY (column_name[, ...]) NOT ENFORCED foreign_key := FOREIGN KEY (column_name[, ...]) foreign_reference foreign_reference := REFERENCES primary_key_table(column_name[, ...]) NOT ENFORCED
Arguments
OR REPLACE
: Replaces any table with the same name if it exists. Cannot appear withIF NOT EXISTS
.TEMP | TEMPORARY
: Creates a temporary table.IF NOT EXISTS
: If any table exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.table_name
: The name of the table to create. See Table path syntax. For temporary tables, do not include the project name or dataset name.column
: The table's schema information.constraint_definition
: An expression that defines a table constraint.collation_specification
: When a new column is added to the table without an explicit collation specification, the column inherits this collation specification forSTRING
types.If you remove or change this collation specification later with the
ALTER TABLE
statement, this will not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.If the table is part of a dataset, the default collation specification for this table overrides the default collation specification for the dataset.
partition_expression
: An expression that determines how to partition the table.clustering_column_list
: A comma-separated list of column references that determine how to cluster the table. You cannot have collation on columns in this list.connection_name
: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`
).table_option_list
: A list of options for creating the table.query_statement
: The query from which the table should be created. For the query syntax, see SQL syntax reference. If a collation specification is used on this table, collation passes through this query statement.primary_key
: An expression that defines a primary key table constraint.foreign_key
: An expression that defines a foreign key table constraint.
Details
CREATE TABLE
statements must comply with the following rules:
- Only one
CREATE
statement is allowed. - Either the column list, the
AS query_statement
clause, or both must be present. - When both the column list and the
AS query_statement
clause are present, BigQuery ignores the names in theAS query_statement
clause and matches the columns with the column list by position. - When the
AS query_statement
clause is present and the column list is absent, BigQuery determines the column names and types from theAS query_statement
clause. - Column names must be specified either through the column list,
the
AS query_statement
clause or schema of the table in theLIKE
clause. - Duplicate column names are not allowed.
- When both the
LIKE
and theAS query_statement
clause are present, the column list in the query statement must match the columns of the table referenced by theLIKE
clause. - Table names are case-sensitive unless the dataset they belong to is not. To create a case-insensitive dataset, see Creating a case-insensitive dataset. To alter a dataset to make it case-insensitive dataset, see Turning on case insensitivity for a dataset.
Limitations:
- It is not possible to create an
ingestion-time partitioned table
from the result of a query. Instead, use a
CREATE TABLE
DDL statement to create the table, and then use anINSERT
DML statement to insert data into it. - It is not possible to use the
OR REPLACE
modifier to replace a table with a different kind of partitioning. Instead,DROP
the table, and then use aCREATE TABLE ... AS SELECT ...
statement to recreate it.
This statement supports the following variants, which have the same limitations:
CREATE TABLE LIKE
: Create a table with the same schema as an existing table.CREATE TABLE COPY
: Create a table by copying schema and data from an existing table.
column
(column_name column_schema[, ...])
contains the table's
schema information in a comma-separated list.
column := column_name column_schema column_schema := { simple_type | STRUCT<field_list> | ARRAY<array_element_schema> } [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED] [DEFAULT default_expression] [NOT NULL] [OPTIONS(column_option_list)] simple_type := { data_type | STRING COLLATE collate_specification } field_list := field_name column_schema [, ...] array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
column_name
is the name of the column. A column name:- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
- Must start with a letter or underscore
- Can be up to 300 characters
column_schema
: Similar to a data type, but supports an optionalNOT NULL
constraint for types other thanARRAY
.column_schema
also supports options on top-level columns andSTRUCT
fields.column_schema
can be used only in the column definition list ofCREATE TABLE
statements. It cannot be used as a type in expressions.simple_type
: Any supported data type aside fromSTRUCT
andARRAY
.If
simple_type
is aSTRING
, it supports an additional clause for collation, which defines how a resultingSTRING
can be compared and sorted. The syntax looks like this:STRING COLLATE collate_specification
If you have
DEFAULT COLLATE collate_specification
assigned to the table, the collation specification for a column overrides the specification for the table.default_expression
: The default value assigned to the column.field_list
: Represents the fields in a struct.field_name
: The name of the struct field. Struct field names have the same restrictions as column names.NOT NULL
: When theNOT NULL
constraint is present for a column or field, the column or field is created withREQUIRED
mode. Conversely, when theNOT NULL
constraint is absent, the column or field is created withNULLABLE
mode.Columns and fields of
ARRAY
type do not support theNOT NULL
modifier. For example, acolumn_schema
ofARRAY<INT64> NOT NULL
is invalid, sinceARRAY
columns haveREPEATED
mode and can be empty but cannot beNULL
. An array element in a table can never beNULL
, regardless of whether theNOT NULL
constraint is specified. For example,ARRAY<INT64>
is equivalent toARRAY<INT64 NOT NULL>
.The
NOT NULL
attribute of a table'scolumn_schema
does not propagate through queries over the table. If tableT
contains a column declared asx INT64 NOT NULL
, for example,CREATE TABLE dataset.newtable AS SELECT x FROM T
creates a table nameddataset.newtable
in whichx
isNULLABLE
.
partition_expression
PARTITION BY
is an optional clause that controls
table partitioning. partition_expression
is an expression that determines how to partition the table. The partition
expression can contain the following values:
_PARTITIONDATE
. Partition by ingestion time with daily partitions. This syntax cannot be used with theAS query_statement
clause.DATE(_PARTITIONTIME)
. Equivalent to_PARTITIONDATE
. This syntax cannot be used with theAS query_statement
clause.<date_column>
. Partition by aDATE
column with daily partitions.DATE({ <timestamp_column> | <datetime_column> })
. Partition by aTIMESTAMP
orDATETIME
column with daily partitions.DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR })
. Partition by aDATETIME
column with the specified partitioning type.TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR })
. Partition by aTIMESTAMP
column with the specified partitioning type.TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR })
. Partition by ingestion time with the specified partitioning type. This syntax cannot be used with theAS query_statement
clause.DATE_TRUNC(<date_column>, { MONTH | YEAR })
. Partition by aDATE
column with the specified partitioning type.RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>]))
. Partition by an integer column with the specified range, where:start
is the start of range partitioning, inclusive.end
is the end of range partitioning, exclusive.interval
is the width of each range within the partition. Defaults to 1.
clustering_column_list
CLUSTER BY
is an optional clause that controls table clustering.
clustering_column_list
is a comma-separated list that determines how to
cluster the table. The clustering column list can contain a list of up to four
clustering columns.
table_option_list
The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
expiration_timestamp |
TIMESTAMP |
Example: This property is equivalent to the expirationTime table resource property. |
partition_expiration_days |
|
Example: Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned. |
require_partition_filter |
|
Example: Specifies whether queries on this table must include a a predicate
filter that filters on the partitioning column. For more information,
see
Set partition filter requirements. The default value is
This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned. |
kms_key_name |
|
Example: This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys. |
friendly_name |
|
Example: This property is equivalent to the friendlyName table resource property. |
description |
|
Example: This property is equivalent to the description table resource property. |
labels |
|
Example: This property is equivalent to the labels table resource property. |
default_rounding_mode |
|
Example: This specifies the default rounding mode
that's used for values written to any new
This property is equivalent to the
|
enable_change_history |
|
In preview. Example: Set this property to |
max_staleness |
|
Example: The maximum interval behind the current time where it's
acceptable to read stale data. For example, with
change data capture,
when this option is set, the table copy operation is denied if data is
more stale than the
|
enable_fine_grained_mutations |
|
In preview. Example: Set this property to |
storage_uri |
|
In preview. Example: A fully qualified location prefix for the external folder where data is
stored. Supports Required for managed tables. |
table_format |
|
In preview. Example: The open-source file format in which the table data is stored.
Only Required for managed tables. The default is |
file_format |
|
In preview. Example: The open table format in which metadata-only snapshots are stored.
Only Required for managed tables. The default is |
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, orUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
If VALUE
evaluates to NULL
, the corresponding option NAME
in the
CREATE TABLE
statement is ignored.
column_option_list
Specify a column option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
description |
|
Example: This property is equivalent to the schema.fields[].description table resource property. |
rounding_mode |
|
Example: This specifies the rounding mode
that's used for values written to a
This property is equivalent to the
|
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, orUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
Setting the VALUE
replaces the existing value of that option for the column, if
there was one. Setting the VALUE
to NULL
clears the column's value for that
option.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the table. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
and
bigquery.tables.updateData
permissions.
If the OPTIONS
clause includes any expiration options, then the
bigquery.tables.delete
permission is also required.
Examples
Creating a new table
The following example creates a partitioned table named newtable
in
mydataset
:
CREATE TABLE mydataset.newtable ( x INT64 OPTIONS(description="An optional INTEGER field"), y STRUCT < a ARRAY <STRING> OPTIONS(description="A repeated STRING field"), b BOOL > ) PARTITION BY _PARTITIONDATE OPTIONS( expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC", partition_expiration_days=1, description="a table that expires in 2025, with each partition living for 24 hours", labels=[("org_unit", "development")] )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.newtable
, your table qualifier might be
`myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table uses the following partition_expression
to partition the table:
PARTITION BY _PARTITIONDATE
. This expression partitions the table using
the date in the _PARTITIONDATE
pseudocolumn.
The table schema contains two columns:
- x: An integer, with description "An optional INTEGER field"
y: A STRUCT containing two columns:
- a: An array of strings, with description "A repeated STRING field"
- b: A boolean
The table option list specifies the:
- Table expiration time: January 1, 2025 at 00:00:00 UTC
- Partition expiration time: 1 day
- Description:
A table that expires in 2025
- Label:
org_unit = development
Creating a new table from an existing table
The following example creates a table named top_words
in mydataset
from a
query:
CREATE TABLE mydataset.top_words OPTIONS( description="Top ten words per Shakespeare corpus" ) AS SELECT corpus, ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words FROM `bigquery-public-data`.samples.shakespeare GROUP BY corpus;
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.top_words
, your table qualifier might be
`myproject.mydataset.top_words`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 2 columns:
- corpus: Name of a Shakespeare corpus
top_words: An
ARRAY
ofSTRUCT
s containing 2 fields:word
(aSTRING
) andword_count
(anINT64
with the word count)
The table option list specifies the:
- Description:
Top ten words per Shakespeare corpus
Creating a table only if the table doesn't exist
The following example creates a table named newtable
in mydataset
only if no
table named newtable
exists in mydataset
. If the table name exists in the
dataset, no error is returned, and no action is taken.
CREATE TABLE IF NOT EXISTS mydataset.newtable (x INT64, y STRUCT <a ARRAY <STRING>, b BOOL>) OPTIONS( expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC", description="a table that expires in 2025", labels=[("org_unit", "development")] )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.newtable
, your table qualifier might be
`myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- x: An integer
y: A STRUCT containing a (an array of strings) and b (a boolean)
The table option list specifies the:
- Expiration time: January 1, 2025 at 00:00:00 UTC
- Description:
A table that expires in 2025
- Label:
org_unit = development
Creating or replacing a table
The following example creates a table named newtable
in mydataset
, and if
newtable
exists in mydataset
, it is overwritten with an empty table.
CREATE OR REPLACE TABLE mydataset.newtable (x INT64, y STRUCT <a ARRAY <STRING>, b BOOL>) OPTIONS( expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC", description="a table that expires in 2025", labels=[("org_unit", "development")] )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.newtable
, your table qualifier might be
`myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- x: An integer
y: A STRUCT containing a (an array of strings) and b (a boolean)
The table option list specifies the:
- Expiration time: January 1, 2025 at 00:00:00 UTC
- Description:
A table that expires in 2025
- Label:
org_unit = development
Creating a table with REQUIRED
columns
The following example creates a table named newtable
in mydataset
. The NOT
NULL
modifier in the column definition list of a CREATE TABLE
statement
specifies that a column or field is created in REQUIRED
mode.
CREATE TABLE mydataset.newtable ( x INT64 NOT NULL, y STRUCT < a ARRAY <STRING>, b BOOL NOT NULL, c FLOAT64 > NOT NULL, z STRING )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.newtable
, your table qualifier might be
`myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
- x: A
REQUIRED
integer - y: A
REQUIRED
STRUCT containing a (an array of strings), b (aREQUIRED
boolean), and c (aNULLABLE
float) z: A
NULLABLE
string
Creating a table with collation support
The following examples create a table named newtable
in mydataset
with
columns a
, b
, c
, and a struct with fields x
and y
.
All STRING
column schemas in this table are collated with 'und:ci'
:
CREATE TABLE mydataset.newtable ( a STRING, b STRING, c STRUCT < x FLOAT64 y ARRAY <STRING> > ) DEFAULT COLLATE 'und:ci';
Only b
and y
are collated with 'und:ci'
:
CREATE TABLE mydataset.newtable ( a STRING, b STRING COLLATE 'und:ci', c STRUCT < x FLOAT64 y ARRAY <STRING COLLATE 'und:ci'> > );
Creating a table with parameterized data types
The following example creates a table named newtable
in mydataset
. The
parameters in parentheses specify that the column contains a parameterized data
type. See Parameterized Data Types
for more information about parameterized types.
CREATE TABLE mydataset.newtable ( x STRING(10), y STRUCT < a ARRAY <BYTES(5)>, b NUMERIC(15, 2) OPTIONS(rounding_mode = 'ROUND_HALF_EVEN'), c FLOAT64 >, z BIGNUMERIC(35) )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. Instead of
mydataset.newtable
, your table qualifier should be
`myproject.mydataset.newtable`
.
If the table name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The table schema contains 3 columns:
- x: A parameterized string with a maximum length of 10
- y: A STRUCT containing a (an array of parameterized bytes with a maximum length of 5), b (a parameterized NUMERIC with a maximum precision of 15, maximum scale of 2, and rounding mode set to 'ROUND_HALF_EVEN'), and c (a float)
- z: A parameterized BIGNUMERIC with a maximum precision of 35 and maximum scale of 0
Creating a partitioned table
The following example creates a
partitioned table
named newtable
in mydataset
using a DATE
column:
CREATE TABLE mydataset.newtable (transaction_id INT64, transaction_date DATE) PARTITION BY transaction_date OPTIONS( partition_expiration_days=3, description="a table partitioned by transaction_date" )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.newtable
, your table qualifier might be
`myproject.mydataset.newtable`
.
The table schema contains 2 columns:
- transaction_id: An integer
- transaction_date: A date
The table option list specifies the:
- Partition expiration: Three days
- Description:
A table partitioned by transaction_date
Creating a partitioned table from the result of a query
The following example creates a
partitioned table
named days_with_rain
in mydataset
using a DATE
column:
CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS (
partition_expiration_days=365,
description="weather stations with precipitation, partitioned by day"
) AS
SELECT
DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
(SELECT ANY_VALUE(name) FROM `bigquery-public-data.noaa_gsod.stations` AS stations
WHERE stations.usaf = stn) AS station_name, -- Stations can have multiple names
prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9 -- Filter unknown values
AND prcp > 0 -- Filter stations/days with no precipitation
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.days_with_rain
, your table qualifier might be
`myproject.mydataset.days_with_rain`
.
The table schema contains 2 columns:
- date: The
DATE
of data collection - station_name: The name of the weather station as a
STRING
- prcp: The amount of precipitation in inches as a
FLOAT64
The table option list specifies the:
- Partition expiration: One year
- Description:
Weather stations with precipitation, partitioned by day
Creating a clustered table
Example 1
The following example creates a
clustered table
named myclusteredtable
in mydataset
. The table is a partitioned table,
partitioned by a truncated TIMESTAMP
column and clustered by a STRING
column
named customer_id
.
CREATE TABLE mydataset.myclusteredtable ( input_timestamp TIMESTAMP, customer_id STRING, transaction_amount NUMERIC ) PARTITION BY TIMESTAMP_TRUNC(input_timestamp, HOUR) CLUSTER BY customer_id OPTIONS ( partition_expiration_days=3, description="a table clustered by customer_id" )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.myclusteredtable
, your table qualifier might be
`myproject.mydataset.myclusteredtable`
.
The table schema contains 3 columns:
- input_timestamp: The time of data collection as a
TIMESTAMP
- customer_id: The customer ID as a
STRING
- transaction_amount: The transaction amount as
NUMERIC
The table option list specifies the:
- Partition expiration: 3 days
- Description:
A table clustered by customer_id
Example 2
The following example creates a
clustered table
named myclusteredtable
in mydataset
. The table is an
ingestion-time partitioned table.
CREATE TABLE mydataset.myclusteredtable ( customer_id STRING, transaction_amount NUMERIC ) PARTITION BY DATE(_PARTITIONTIME) CLUSTER BY customer_id OPTIONS ( partition_expiration_days=3, description="a table clustered by customer_id" )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.myclusteredtable
, your table qualifier might be
`myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: The customer ID as a
STRING
- transaction_amount: The transaction amount as
NUMERIC
The table option list specifies the:
- Partition expiration: 3 days
- Description:
A table clustered by customer_id
Example 3
The following example creates a
clustered table
named myclusteredtable
in mydataset
. The table is not partitioned.
CREATE TABLE mydataset.myclusteredtable ( customer_id STRING, transaction_amount NUMERIC ) CLUSTER BY customer_id OPTIONS ( description="a table clustered by customer_id" )
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.myclusteredtable
, your table qualifier might be
`myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: The customer ID as a
STRING
- transaction_amount: The transaction amount as
NUMERIC
The table option list specifies the:
- Description:
A table clustered by customer_id
Creating a clustered table from the result of a query
Example 1
The following example creates a partitioned and
clustered table
named myclusteredtable
in mydataset
using the result of a query.
CREATE TABLE mydataset.myclusteredtable ( input_timestamp TIMESTAMP, customer_id STRING, transaction_amount NUMERIC ) PARTITION BY DATE(input_timestamp) CLUSTER BY customer_id OPTIONS ( partition_expiration_days=3, description="a table clustered by customer_id" ) AS SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.myclusteredtable
, your table qualifier might be
`myproject.mydataset.myclusteredtable`
.
The table schema contains 3 columns:
- input_timestamp: The time of data collection as a
TIMESTAMP
- customer_id: The customer ID as a
STRING
- transaction_amount: The transaction amount as
NUMERIC
The table option list specifies the:
- Partition expiration: 3 days
- Description:
A table clustered by customer_id
Example 2
The following example creates a
clustered table
named myclusteredtable
in mydataset
using the result of a query. The table
is not partitioned.
CREATE TABLE mydataset.myclusteredtable ( customer_id STRING, transaction_amount NUMERIC ) CLUSTER BY customer_id OPTIONS ( description="a table clustered by customer_id" ) AS SELECT * FROM mydataset.myothertable
If you haven't configured a default project, prepend a project ID to the dataset
name in the example SQL, and enclose the name in backticks if project_id
contains special characters:
`project_id.dataset.table`
. So, instead of
mydataset.myclusteredtable
, your table qualifier might be
`myproject.mydataset.myclusteredtable`
.
The table schema contains 2 columns:
- customer_id: The customer ID as a
STRING
- transaction_amount: The transaction amount as
NUMERIC
The table option list specifies the:
- Description:
A table clustered by customer_id
Creating a temporary table
The following example creates a temporary table named Example
and inserts
values into it.
CREATE TEMP TABLE Example ( x INT64, y STRING ); INSERT INTO Example VALUES (5, 'foo'); INSERT INTO Example VALUES (6, 'bar'); SELECT * FROM Example;
This script returns the following output:
+-----+---+-----+
| Row | x | y |
+-----+---|-----+
| 1 | 5 | foo |
| 2 | 6 | bar |
+-----+---|-----+
Load data across clouds
Example 1
Suppose you have a BigLake table named myawsdataset.orders
that
references data from Amazon S3.
You want to transfer data from that table to a
BigQuery table myotherdataset.shipments
in the US multi-region.
First, display information about the myawsdataset.orders
table:
bq show myawsdataset.orders;
The output is similar to the following:
Last modified Schema Type Total URIs Expiration ----------------- -------------------------- ---------- ------------ ----------- 31 Oct 17:40:28 |- l_orderkey: integer EXTERNAL 1 |- l_partkey: integer |- l_suppkey: integer |- l_linenumber: integer |- l_returnflag: string |- l_linestatus: string |- l_commitdate: date
Next, display information about the myotherdataset.shipments
table:
bq show myotherdataset.shipments
The output is similar to the following. Some columns are omitted to simplify the output.
Last modified Schema Total Rows Total Bytes Expiration Time Partitioning Clustered Fields Total Logical ----------------- --------------------------- ------------ ------------- ------------ ------------------- ------------------ --------------- 31 Oct 17:34:31 |- l_orderkey: integer 3086653 210767042 210767042 |- l_partkey: integer |- l_suppkey: integer |- l_commitdate: date |- l_shipdate: date |- l_receiptdate: date |- l_shipinstruct: string |- l_shipmode: string
Now, using the CREATE TABLE AS SELECT
statement you can selectively load data
to the myotherdataset.orders
table in the US multi-region:
CREATE OR REPLACE TABLE myotherdataset.orders PARTITION BY DATE_TRUNC(l_commitdate, YEAR) AS SELECT * FROM myawsdataset.orders WHERE EXTRACT(YEAR FROM l_commitdate) = 1992;
You can then perform a join operation with the newly created table:
SELECT orders.l_orderkey, orders.l_orderkey, orders.l_suppkey, orders.l_commitdate, orders.l_returnflag, shipments.l_shipmode, shipments.l_shipinstruct FROM myotherdataset.shipments JOIN `myotherdataset.orders` as orders ON orders.l_orderkey = shipments.l_orderkey AND orders.l_partkey = shipments.l_partkey AND orders.l_suppkey = shipments.l_suppkey WHERE orders.l_returnflag = 'R'; -- 'R' means refunded.
When new data is available, append the data of the 1993 year to the destination
table using the INSERT INTO SELECT
statement:
INSERT INTO myotherdataset.orders SELECT * FROM myawsdataset.orders WHERE EXTRACT(YEAR FROM l_commitdate) = 1993;
Example 2
The following example inserts data into an ingestion-time partitioned table:
CREATE TABLE mydataset.orders(id String, numeric_id INT64) PARTITION BY _PARTITIONDATE;
After creating a partitioned table, you can insert data into the ingestion-time partitioned table:
INSERT INTO mydataset.orders( _PARTITIONTIME, id, numeric_id) SELECT TIMESTAMP("2023-01-01"), id, numeric_id, FROM mydataset.ordersof23 WHERE numeric_id > 4000000;
CREATE TABLE LIKE
statement
Creates a new table with all of the same metadata of another table.
Syntax
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name LIKE [[project_name.]dataset_name.]source_table_name ... [OPTIONS(table_option_list)]
Details
This statement is a variant of the CREATE TABLE
statement and has the same
limitations.
Other than the use of the LIKE
clause in place of a column list,
the syntax is identical to the CREATE TABLE
syntax.
The CREATE TABLE LIKE
statement copies only the metadata of the source table.
You can use the AS query_statement
clause to include data into the new table.
The new table has no relationship to the source table after creation; thus modifications to the source table will not propagate to the new table.
By default, the new table inherits partitioning, clustering, and options
metadata from the source table. You can customize metadata in the new table by
using the optional clauses in the SQL statement. For example, if you want to
specify a different set of options for the new table, then include the OPTIONS
clause with a list of options and values. This behavior matches that of
ALTER TABLE SET OPTIONS
.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the table. |
bigquery.tables.get |
The source table. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
and
bigquery.tables.updateData
permissions.
If the OPTIONS
clause includes any expiration options, then the
bigquery.tables.delete
permission is also required.
Examples
Example 1
The following example creates a new table named newtable
in
mydataset
with the same metadata as sourcetable
:
CREATE TABLE mydataset.newtable LIKE mydataset.sourcetable
Example 2
The following example creates a new table named newtable
in
mydataset
with the same metadata as sourcetable
and the data from the
SELECT
statement:
CREATE TABLE mydataset.newtable LIKE mydataset.sourcetable AS SELECT * FROM mydataset.myothertable
CREATE TABLE COPY
statement
Creates a table that has the same metadata and data as another table. The source table can be a table, a table clone, or a table snapshot.
Syntax
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name COPY source_table_name ... [OPTIONS(table_option_list)]
Details
This statement is a variant of the CREATE TABLE
statement and has the same
limitations.
Other than the use of the COPY
clause in place of a column list,
the syntax is identical to the CREATE TABLE
syntax.
The CREATE TABLE COPY
statement copies both the metadata and data from the
source table.
The new table inherits partitioning and clustering from the source table. By
default, the table options metadata from the source table are also inherited,
but you can override table options by using the OPTIONS
clause. The behavior
is equivalent to running ALTER TABLE SET OPTIONS
after the table is copied.
The new table has no relationship to the source table after creation; modifications to the source table are not propagated to the new table.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the table copy. |
bigquery.tables.get |
The source table. |
bigquery.tables.getData |
The source table. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
and
bigquery.tables.updateData
permissions.
If the OPTIONS
clause includes any expiration options, then the
bigquery.tables.delete
permission is also required.
CREATE SNAPSHOT TABLE
statement
Creates a table snapshot based on a source table. The source table can be a table, a table clone, or a table snapshot.
Syntax
CREATE SNAPSHOT TABLE [ IF NOT EXISTS ] table_snapshot_name CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression] [OPTIONS(snapshot_option_list)]
Arguments
IF NOT EXISTS
: If a table snapshot or other table resource exists with the same name, theCREATE
statement has no effect.table_snapshot_name
: The name of the table snapshot that you want to create. The table snapshot name must be unique per dataset. See Table path syntax.source_table_name
: The name of the table that you want to snapshot or the table snapshot that you want to copy. See Table path syntax.If the source table is a standard table, then BigQuery creates a table snapshot of the source table. If the source table is a table snapshot, then BigQuery creates a copy of the table snapshot.
FOR SYSTEM_TIME AS OF
: Lets you select the version of the table that was current at the time specified bytimestamp_expression
. It can only be used when creating a snapshot of a table; it can't be used when making a copy of a table snapshot.snapshot_option_list
: Additional table snapshot creation options such as a label and an expiration time.
Details
CREATE SNAPSHOT TABLE
statements must comply with the following rules:
- Only one
CREATE
statement is allowed. - The source table must be one of the following:
- A table
- A table clone
- A table snapshot
- The
FOR SYSTEM_TIME AS OF
clause can only be used when creating a snapshot of a table or table clone; it can't be used when making a copy of a table snapshot.
snapshot_option_list
The option list lets you set table snapshot options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table snapshot option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
expiration_timestamp |
TIMESTAMP |
Example: This property is equivalent to the
|
friendly_name |
|
Example: This property is equivalent to the
|
description |
|
Example: This property is equivalent to the
|
labels |
|
Example: This property is equivalent to the
|
VALUE
is a constant expression that contains only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, andUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
If VALUE
evaluates to NULL
, the corresponding option NAME
in the
CREATE SNAPSHOT TABLE
statement is ignored.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create
|
The dataset where you create the table snapshot. |
bigquery.tables.createSnapshot |
The source table. |
bigquery.tables.get |
The source table. |
bigquery.tables.getData |
The source table. |
Examples
Create a table snapshot: fail if it already exists
The following example creates a table snapshot of the table
myproject.mydataset.mytable
. The table snapshot is created in the dataset
mydataset
and is named mytablesnapshot
:
CREATE SNAPSHOT TABLE `myproject.mydataset.mytablesnapshot` CLONE `myproject.mydataset.mytable` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR), friendly_name="my_table_snapshot", description="A table snapshot that expires in 2 days", labels=[("org_unit", "development")] )
If the table snapshot name already exists in the dataset, then the following error is returned:
Already Exists: myproject.mydataset.mytablesnapshot
The table snapshot option list specifies the following:
- Expiration time: 48 hours after the time the table snapshot is created
- Friendly name:
my_table_snapshot
- Description:
A table snapshot that expires in 2 days
- Label:
org_unit = development
Create a table snapshot: ignore if it already exists
The following example creates a table snapshot of the table
myproject.mydataset.mytable
. The table snapshot is created in the dataset
mydataset
and is named mytablesnapshot
:
CREATE SNAPSHOT TABLE IF NOT EXISTS `myproject.mydataset.mytablesnapshot` CLONE `myproject.mydataset.mytable` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR), friendly_name="my_table_snapshot", description="A table snapshot that expires in 2 days" labels=[("org_unit", "development")] )
The table snapshot option list specifies the following:
- Expiration time: 48 hours after the time the table snapshot is created
- Friendly name:
my_table_snapshot
- Description:
A table snapshot that expires in 2 days
- Label:
org_unit = development
If the table snapshot name already exists in the dataset, then no action is taken, and no error is returned.
For information about restoring table snapshots, see
CREATE TABLE CLONE
.
For information about removing table snapshots, see
DROP SNAPSHOT TABLE
.
CREATE TABLE CLONE
statement
Creates a table clone based on a source table. The source table can be a table, a table clone, or a table snapshot.
Syntax
CREATE TABLE [ IF NOT EXISTS ] destination_table_name CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression] ... [OPTIONS(table_option_list)]
Details
Other than the use of the CLONE
clause in place of a column list, the syntax
is identical to the CREATE TABLE
syntax.
Arguments
IF NOT EXISTS
: If the specified destination table name already exists, theCREATE
statement has no effect.destination_table_name
: The name of the table that you want to create. The table name must be unique per dataset. The table name can contain the following:- Up to 1,024 characters
- Letters (upper or lower case), numbers, and underscores
OPTIONS(table_option_list)
: Lets you specify additional table creation options such as a label and an expiration time.source_table_name
: The name of the source table.
CREATE TABLE CLONE
statements must comply with the following rules:
- Only one
CREATE
statement is allowed. - The table that is being cloned must be a table, a table clone, or a table snapshot.
OPTIONS
CREATE TABLE CLONE
options are the same as
CREATE TABLE
options.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the table clone. |
bigquery.tables.get |
The source table. |
bigquery.tables.getData |
The source table. |
bigquery.tables.restoreSnapshot |
The source table (required only if the source table is a table snapshot). |
If the OPTIONS
clause includes any expiration options, then the
bigquery.tables.delete
permission is also required.
Examples
Restore a table snapshot: fail if destination table already exists
The following example creates the table
myproject.mydataset.mytable
from the table snapshot
myproject.mydataset.mytablesnapshot
:
CREATE TABLE `myproject.mydataset.mytable` CLONE `myproject.mydataset.mytablesnapshot` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY), friendly_name="my_table", description="A table that expires in 1 year", labels=[("org_unit", "development")] )
If the table name exists in the dataset, then the following error is returned:
Already Exists: myproject.mydataset.mytable.
The table option list specifies the following:
- Expiration time: 365 days after the time that the table is created
- Friendly name:
my_table
- Description:
A table that expires in 1 year
- Label:
org_unit = development
Create a clone of a table: ignore if the destination table already exists
The following example creates the table clone
myproject.mydataset.mytableclone
based on the table
myproject.mydataset.mytable
:
CREATE TABLE IF NOT EXISTS `myproject.mydataset.mytableclone` CLONE `myproject.mydataset.mytable` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY), friendly_name="my_table", description="A table that expires in 1 year", labels=[("org_unit", "development")] )
The table option list specifies the following:
- Expiration time: 365 days after the time the table is created
- Friendly name:
my_table
- Description:
A table that expires in 1 year
- Label:
org_unit = development
If the table name exists in the dataset, then no action is taken, and no error is returned.
For information about creating a copy of a table, see
CREATE TABLE COPY
.
For information about creating a snapshot of a table, see
CREATE SNAPSHOT TABLE
.
CREATE VIEW
statement
Creates a new view.
Syntax
CREATE [ OR REPLACE ] VIEW [ IF NOT EXISTS ] view_name [(view_column_name_list)] [OPTIONS(view_option_list)] AS query_expression view_column_name_list := view_column[, ...] view_column := column_name [OPTIONS(view_column_option_list)]
Arguments
OR REPLACE
: Replaces any view with the same name if it exists. Cannot appear withIF NOT EXISTS
.IF NOT EXISTS
: If a view or other table resource exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.view_name
: The name of the view you're creating. See Table path syntax.view_column_name_list
: Lets you explicitly specify the column names of the view, which may be aliases to the column names in the underlying SQL query.view_option_list
: Additional view creation options such as a label and an expiration time.query_expression
: The GoogleSQL query expression used to define the view.
Details
CREATE VIEW
statements must comply with the following rules:
- Only one
CREATE
statement is allowed.
view_column_name_list
The view's column name list is optional. The names must be unique but do not have to be the same as the column names of the underlying SQL query. For example, if your view is created with the following statement:
CREATE VIEW mydataset.age_groups(age, count) AS SELECT age, COUNT(*)
FROM mydataset.people
group by age;
Then you can query it with:
SELECT age, count from mydataset.age_groups;
The number of columns in the column name list must match the number of columns in the underlying SQL query. If the columns in the table of the underlying SQL query is added or dropped, the view becomes invalid and must be recreated. For example, if the age
column is dropped from the mydataset.people
table, then the view created in the previous example becomes invalid.
view_column_option_list
The view_column_option_list
lets you specify optional top-level column
options. Column options for a view have the same syntax and requirements as
for a table, but with a different list of NAME
and VALUE
fields:
NAME |
VALUE |
Details |
---|---|---|
description |
|
Example: |
view_option_list
The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
expiration_timestamp |
TIMESTAMP |
Example: This property is equivalent to the expirationTime table resource property. |
friendly_name |
|
Example: This property is equivalent to the friendlyName table resource property. |
description |
|
Example: This property is equivalent to the description table resource property. |
labels |
|
Example: This property is equivalent to the labels table resource property. |
privacy_policy |
|
The policies to enforce when anyone queries the view.
To learn more about the policies available for a view, see
the |
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as
SELECT
,CREATE
, orUPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
ARRAY_TO_STRING
REPLACE
REGEXP_REPLACE
RAND
FORMAT
LPAD
RPAD
REPEAT
SESSION_USER
GENERATE_ARRAY
GENERATE_DATE_ARRAY
If VALUE
evaluates to NULL
, the corresponding option NAME
in the
CREATE VIEW
statement is ignored.
privacy_policy
The following policies are available in the
privacy_policy
view option
to create analysis rules. A policy represents
a condition that needs to be met before a query can be run.
Policy | Details |
---|---|
|
The aggregation threshold policy to enforce when a view is queried. Syntax: '{ "aggregation_threshold_policy": { "threshold": value, "privacy_unit_columns": value } }' Parameters:
Example:
privacy_policy='{"aggregation_threshold_policy":
{"threshold" : 50,
"privacy_unit_columns": "ID"}}'
|
|
A differential privacy policy for the view. When this parameter is included, only differentially private queries can be run on the view. Syntax: '{ "differential_privacy_policy": { "privacy_unit_column": value, "max_epsilon_per_query": value, "epsilon_budget": value, "delta_per_query": value, "delta_budget": value, "max_groups_contributed": value } }' Parameters:
Example:
privacy_policy='{"differential_privacy_policy": {
"privacy_unit_column": "contributor_id",
"max_epsilon_per_query": 0.01,
"epsilon_budget": 25.6,
"delta_per_query": 0.005,
"delta_budget": 9.6,
"max_groups_contributed": 2}}'
|
|
A join restriction policy for the view. When this parameter is included, only the specified joins can be run on the specified columns in the view. This policy can be used alone or with other policies, such as the aggregation threshold or differential privacy policy. Syntax: '{ "join_restriction_policy": { "join_condition": value, "join_allowed_columns": value } }' Parameters:
Example:
privacy_policy='{"join_restriction_policy": {
"join_condition": 'JOIN_ANY',
"join_allowed_columns": ['col1', 'col2']}}'
|
Default project in view body
If the view is created in the same project used to run the CREATE VIEW
statement, the view body query_expression
can reference entities without
specifying the project; the default project is the project
which owns the view. Consider the sample query below.
CREATE VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;
After running the above CREATE VIEW
query in the project myProject
, you can
run the query SELECT * FROM myProject.myDataset.myView
. Regardless of the project you
choose to run this SELECT
query, the referenced table anotherDataset.myTable
is always resolved against project myProject
.
If the view is not created in the same project used to run the CREATE VIEW
statement, then all references in the view body query_expression
must be
qualified with project IDs. For instance, the preceding sample CREATE VIEW
query
is invalid if it runs in a project different from myProject
.
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the view. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes an expiration time, then the
bigquery.tables.delete
permission is also required.
Examples
Creating a new view
The following example creates a view named newview
in mydataset
:
CREATE VIEW `myproject.mydataset.newview`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="newview",
description="a view that expires in 2 days",
labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
If the view name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.table
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
- Expiration time: 48 hours from the time the view is created
- Friendly name:
newview
- Description:
A view that expires in 2 days
- Label:
org_unit = development
Creating a view only if the view doesn't exist
The following example creates a view named newview
in mydataset
only if no
view named newview
exists in mydataset
. If the view name exists in the
dataset, no error is returned, and no action is taken.
CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS(
expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
friendly_name="newview",
description="a view that expires in 2 days",
labels=[("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view option list specifies the:
- Expiration time: 48 hours from the time the view is created
- Friendly name:
newview
- Description:
A view that expires in 2 days
- Label:
org_unit = development
Creating or replacing a view
The following example creates a view named newview
in mydataset
, and if
newview
exists in mydataset
, it is overwritten using the specified query
expression.
CREATE OR REPLACE VIEW `myproject.mydataset.newview` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR), friendly_name="newview", description="a view that expires in 2 days", labels=[("org_unit", "development")] ) AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM
myproject.mydataset.mytable
The view option list specifies the:
- Expiration time: 48 hours from the time the view is created
- Friendly name:
newview
- Description:
A view that expires in 2 days
- Label:
org_unit = development
Creating a view with column descriptions
The following example creates a view named newview
in mydataset
. This view
definition provides the column description for each column in mytable
.
You can rename columns from the original query.
CREATE VIEW `myproject.mydataset.newview` ( column_1_new_name OPTIONS (DESCRIPTION='Description of the column 1 contents'), column_2_new_name OPTIONS (DESCRIPTION='Description of the column 2 contents'), column_3_new_name OPTIONS (DESCRIPTION='Description of the column 3 contents') ) AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
CREATE MATERIALIZED VIEW
statement
Creates a new materialized view.
Syntax
CREATE [ OR REPLACE ] MATERIALIZED VIEW [ IF NOT EXISTS ] materialized_view_name [PARTITION BY partition_expression] [CLUSTER BY clustering_column_list] [OPTIONS(materialized_view_option_list)] AS query_expression
Arguments
OR REPLACE
: Replaces a materialized view with the same name if it exists. Cannot appear withIF NOT EXISTS
.IF NOT EXISTS
: If a materialized view or other table resource exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.materialized_view_name
: The name of the materialized view you're creating. See Table path syntax.If the
project_name
is omitted from the materialized view name, or it is the same as the project that runs this DDL query, then the latter is also used as the default project for references to tables, functions, and other resources inquery_expression
. The default project of the references is fixed and does not depend on the future queries that invoke the new materialized view. Otherwise, all references inquery_expression
must be qualified with project names.The materialized view name must be unique per dataset.
partition_expression
: An expression that determines how to partition the table. A materialized view can only be partitioned in the same way as the table inquery expression
(the base table) is partitioned.clustering_column_list
: A comma-separated list of column references that determine how to cluster the materialized view.materialized_view_option_list
: Allows you to specify additional materialized view options such as a whether refresh is enabled, the refresh interval, a label, and an expiration time.query_expression
: The GoogleSQL query expression used to define the materialized view.
Details
CREATE MATERIALIZED VIEW
statements must comply with the following rules:
- Only one
CREATE
statement is allowed.
Default project in materialized view body
If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW
statement, the materialized view body query_expression
can reference entities without
specifying the project; the default project is the project
which owns the materialized view. Consider the sample query below.
CREATE MATERIALIZED VIEW myProject.myDataset.myView AS SELECT * FROM anotherDataset.myTable;
After running the above CREATE MATERIALIZED VIEW
query in the project myProject
, you can
run the query SELECT * FROM myProject.myDataset.myView
. Regardless of the project you
choose to run this SELECT
query, the referenced table anotherDataset.myTable
is always resolved against project myProject
.
If the materialized view is not created in the same project used to run the CREATE VIEW
statement, then all references in the materialized view body query_expression
must be
qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW
query
is invalid if it runs in a project different from myProject
.
materialized_view_option_list
The option list allows you to set materialized view options such as a whether refresh is enabled. the refresh interval, a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a materialized view option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME |
VALUE |
Details |
---|---|---|
enable_refresh |
BOOLEAN |
Example: |
refresh_interval_minutes |
FLOAT64 |
Example: |
expiration_timestamp |
TIMESTAMP |
Example: This property is equivalent to the
expirationTime
table resource property. |
max_staleness |
INTERVAL |
Example: The
|
allow_non_incremental_definition |
BOOLEAN |
Example: The
|
kms_key_name |
|
Example: This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys. |
friendly_name |
|
Example: This property is equivalent to the friendlyName table resource property. |
description |
|
Example: This property is equivalent to the description table resource property. |
labels |
|
Example: This property is equivalent to the labels table resource property. |
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create
|
The dataset where you create the materialized view. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes any expiration options, then the
bigquery.tables.delete
permission is also required.
Examples
Creating a new materialized view
The following example creates a materialized view named new_mv
in mydataset
:
CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR), friendly_name="new_mv", description="a materialized view that expires in 2 days", labels=[("org_unit", "development")], enable_refresh=true, refresh_interval_minutes=20 ) AS SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3 FROM `myproject.mydataset.mytable` GROUP BY column_1
If the materialized view name exists in the dataset, the following error is returned:
Already Exists: project_id:dataset.materialized_view
When you use a DDL statement to create a materialized view, you must specify the
project, dataset, and materialized view in the following format:
`project_id.dataset.materialized_view`
(including the backticks if project_id
contains special characters); for example,
`myproject.mydataset.new_mv`
.
The materialized view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view option list specifies the:
- Expiration time: 48 hours from the time the materialized view is created
- Friendly name:
new_mv
- Description:
A materialized view that expires in 2 days
- Label:
org_unit = development
- Refresh enabled: true
- Refresh interval: 20 minutes
Creating a materialized view only if the materialized view doesn't exist
The following example creates a materialized view named new_mv
in mydataset
only if no materialized view named new_mv
exists in mydataset
. If the
materialized view name exists in the dataset, no error is returned, and no
action is taken.
CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv` OPTIONS( expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR), friendly_name="new_mv", description="a view that expires in 2 days", labels=[("org_unit", "development")], enable_refresh=false ) AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view is defined using the following GoogleSQL query:
SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`
The materialized view option list specifies the:
- Expiration time: 48 hours from the time the view is created
- Friendly name:
new_mv
- Description:
A view that expires in 2 days
- Label:
org_unit = development
- Refresh enabled: false
Creating a materialized view with partitioning and clustering
The following example creates a materialized view named new_mv
in mydataset
,
partitioned by the col_datetime
column and clustered
by the col_int
column:
CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv` PARTITION BY DATE(col_datetime) CLUSTER BY col_int AS SELECT col_int, col_datetime, COUNT(1) as cnt FROM `myproject.mydataset.mv_base_table` GROUP BY col_int, col_datetime
The base table, mv_base_table
, must also be partitioned by the
col_datetime
column. For more information, see
Working with partitioned and clustered tables.
CREATE MATERIALIZED VIEW AS REPLICA OF
statement
Creates a replica of a materialized view. The source materialized view must be over an Amazon Simple Storage Service (Amazon S3) BigLake table. You can use the materialized view replica to make Amazon S3 data available locally for joins.
For more information, see Create materialized view replicas.
Syntax
CREATE MATERIALIZED VIEW replica_name [OPTIONS(materialized_view_replica_option_list)] AS REPLICA OF source_materialized_view_name
Arguments
replica_name
: The name of the materialized view replica you're creating, in table path syntax. If the project name is omitted from the materialized view replica name, the current project is used as the default.The materialized view replica name must be unique for each dataset.
materialized_view_replica_option_list
: Allows you to specify options such as the replication interval.source_materialized_view_name
: The name of the materialized view you are replicating, in table path syntax. The source materialized view must be over an Amazon S3 BigLake table, and must be authorized on the dataset that contains that table.
materialized_view_replica_option_list
The option list lets you set materialized view replica options.
Specify a materialized view replica option list in the following format:
NAME=VALUE, ...
NAME |
VALUE |
Details |
---|---|---|
replication_interval_seconds |
INT64 |
Specifies how often to replicate the data from the source materialized
view to the replica. Must be a value between Example: |
Required permissions
This statement requires the following IAM permissions:
bigquery.tables.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.replicateData
bigquery.jobs.create
Example
The following example creates a materialized view replica named mv_replica
in bq_dataset
:
CREATE MATERIALIZED VIEW `myproject.bq_dataset.mv_replica` OPTIONS( replication_interval_seconds=600 ) AS REPLICA OF `myproject.s3_dataset.my_s3_mv`
CREATE EXTERNAL SCHEMA
statement
Creates a new federated dataset.
A federated dataset is a connection between BigQuery and an external data source at the dataset level. For an example, see Create AWS Glue federated datasets.
Syntax
CREATE EXTERNAL SCHEMA [ IF NOT EXISTS ] dataset_name [WITH CONNECTION connection_name] [OPTIONS(external_schema_option_list)]
Arguments
IF NOT EXISTS
: If any dataset exists with the same name, theCREATE
statement has no effect.dataset_name
: The name of the dataset to create.connection_name
: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`
).external_schema_option_list
: A list of options for creating the federated dataset.
Details
The dataset is created in the location that you specify in the query settings. For more information, see Specify locations. The location must support the kind of federated dataset that you are creating, for example, you can only create AWS Glue federated datasets in AWS locations.
For more information about creating a dataset, see Create datasets. For information about quotas, see dataset limits.
external_schema_option_list
The option list specifies options for the federated dataset. Specify the options
in the following format: NAME=VALUE, ...
The following options are supported:
NAME |
VALUE |
Details |
---|---|---|
description |
STRING |
The description of the dataset. |
friendly_name |
STRING |
A descriptive name for the dataset. |
labels |
<ARRAY<STRUCT<STRING, STRING>>> |
An array of labels for the dataset, expressed as key-value pairs. |
location |
STRING |
The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. The location must support the kind of federated dataset that you are creating, for example, you can only create AWS Glue federated datasets in AWS locations. |
external_source |
STRING |
The source of the external dataset, such as the
Amazon Resource Name (ARN),
with a prefix identifying the source, such as aws-glue:// .
|
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.datasets.create |
The project where you create the federated dataset. |
bigquery.connections.use |
The project where you create the federated dataset. |
bigquery.connections.delegate |
The project where you create the federated dataset. |
Examples
The following example creates an AWS Glue federated dataset:
CREATE EXTERNAL SCHEMA mydataset
WITH CONNECTION myproject.`aws-us-east-1`.myconnection
OPTIONS (
external_source = 'aws-glue://arn:aws:glue:us-east-1:123456789:database/test_database',
location = 'aws-us-east-1');
CREATE EXTERNAL TABLE
statement
Creates a new external table.
External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see Introduction to external data sources.
Syntax
CREATE [ OR REPLACE ] EXTERNAL TABLE [ IF NOT EXISTS ] table_name [( column_name column_schema, ... )] [WITH CONNECTION connection_name] [WITH PARTITION COLUMNS [( partition_column_name partition_column_type, ... )] ] OPTIONS ( external_table_option_list, ... );
Arguments
OR REPLACE
: Replaces any external table with the same name if it exists. Cannot appear withIF NOT EXISTS
.IF NOT EXISTS
: If an external table or other table resource exists with the same name, theCREATE
statement has no effect. Cannot appear withOR REPLACE
.table_name
: The name of the external table. See Table path syntax.column_name
: The name of a column in the table.column_schema
: Specifies the schema of the column. It uses the same syntax as thecolumn_schema
definition in theCREATE TABLE
statement. If you don't include this clause, BigQuery detects the schema automatically.connection_name
: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`
).partition_column_name
: The name of a partition column. Include this field if your external data uses a hive-partitioned layout. For more information, see: Supported data layouts.partition_column_type
: The partition column type.external_table_option_list
: A list of options for creating the external table.
Details
The CREATE EXTERNAL TABLE
statement does not support creating temporary
external tables.
To create an externally partitioned table, use the WITH PARTITION COLUMNS
clause to specify the partition schema details. BigQuery
validates the column definitions against the external data location. The schema
declaration must strictly follow the ordering of the fields in the external
path. For more information about external partitioning, see
Querying externally partitioned data.
external_table_option_list
The option list specifies options for creating the external table. The format
and uris
options are required. Specify the option list in the following
format: NAME=VALUE, ...
Options | |
---|---|
allow_jagged_rows |
If Applies to CSV data. |
allow_quoted_newlines |
If Applies to CSV data. |
bigtable_options |
Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see
|
compression |
The compression type of the data source. Supported values include:
Applies to CSV and JSON data. |
decimal_target_types |
Determines how to convert a Example: |
description |
A description of this table. |
enable_list_inference |
If Applies to Parquet data. |
enable_logical_types |
If Applies to Avro data. |
encoding |
The character encoding of the data. Supported values include:
Applies to CSV data. |
enum_as_string |
If Applies to Parquet data. |
expiration_timestamp |
The time when this table expires. If not specified, the table does not expire. Example: |
field_delimiter |
The separator for fields in a CSV file. Applies to CSV data. |
format |
The format of the external data.
Supported values for
Supported values for
The value |
hive_partition_uri_prefix |
A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: |
file_set_spec_type |
Specifies how to interpret source URIs for load jobs and external tables. Supported values include:
For example, if you have a source URI of |
ignore_unknown_values |
If Applies to CSV and JSON data. |
json_extension |
For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include: |
max_bad_records |
The maximum number of bad records to ignore when reading the data. Applies to: CSV, JSON, and Google Sheets data. |
max_staleness |
Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an
interval literal
value between 30 minutes and 7 days. For example, specify
|
null_marker |
The string that represents Applies to CSV data. |
object_metadata |
Only required when creating an object table. Set the value of this option to |
preserve_ascii_control_characters |
If Applies to CSV data. |
projection_fields |
A list of entity properties to load. Applies to Datastore data. |
quote |
The string used to quote data sections in a CSV file. If your data
contains quoted newline characters, also set the
Applies to CSV data. |
reference_file_schema_uri |
User provided reference file with the table schema. Applies to Parquet/ORC/AVRO data. Example: |
require_hive_partition_filter |
If Applies to Avro, CSV, JSON, Parquet, and ORC data. |
sheet_range |
Range of a Google Sheets spreadsheet to query from. Applies to Google Sheets data. Example: |
skip_leading_rows |
The number of rows at the top of a file to skip when reading the data. Applies to CSV and Google Sheets data. |
uris |
For external tables, including object tables, that aren't Bigtable tables:
An array of fully qualified URIs for the external data locations.
Each URI can contain one
asterisk ( The following examples show valid
For Bigtable tables:
The URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. Example:
For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI. |
Required permissions
This statement requires the following IAM permissions:
Permission | Resource |
---|---|
bigquery.tables.create |
The dataset where you create the external table. |
In addition, the OR REPLACE
clause requires bigquery.tables.update
permission.
If the OPTIONS
clause includes an expiration time, then the
bigquery.tables.delete
permission is also required.
Examples
The following example creates a BigLake table and explicitly specifies the schema. It also specifies refreshing metadata cache automatically at a system-defined interval.
CREATE OR REPLACE EXTERNAL TABLE mydataset.newtable (x INT64, y STRING, z BOOL)
WITH CONNECTION myconnection
OPTIONS(
format ="PARQUET",
max_staleness = STALENESS_INTERVAL,
metadata_cache_mode =