Manage Iceberg resources in BigQuery metastore
BigQuery metastore is a single, shared metastore that enables data sharing across data processing engines, eliminating the need to maintain separate metastores for your open source workloads.
This document explains how to create, view, modify, and delete Iceberg resources in BigQuery metastore.
Before you begin
- Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
- Enable the BigQuery, BigQuery Storage, and Dataproc APIs.
- Optional: Understand how BigQuery metastore works and why you should use it.
Required roles
To get the permissions that you need to manage Iceberg resources in BigQuery metastore, ask your administrator to grant you the following IAM roles on your project:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- Storage Object Admin (roles/storage.objectAdmin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Create metastore resources
The following sections describe how to create resources in BigQuery metastore.
Create namespaces
Select one of the following options:
API
Use the datasets.insert method, and specify the ExternalCatalogDatasetOptions field in the dataset resource that you pass in.
{
  "datasetReference": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID"
  },
  "externalCatalogDatasetOptions": {
    "defaultStorageLocationUri": "URI",
    "parameters": { ... }
  },
  "location": "LOCATION"
}
Replace the following:
- PROJECT_ID: the ID of the project that contains your target dataset
- DATASET_ID: the ID of your target dataset
- URI: the Cloud Storage URI for all tables in the dataset
- LOCATION: the BigQuery location that you want to create the dataset in
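If you're calling the REST API from a script, the following Python sketch shows one way to send this request. It assumes the google-auth library is installed and uses the same placeholder values as the request body above.
# Minimal sketch: create a namespace by calling the datasets.insert REST
# method. Uppercase values are placeholders.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

body = {
    "datasetReference": {"projectId": "PROJECT_ID", "datasetId": "DATASET_ID"},
    "externalCatalogDatasetOptions": {
        "defaultStorageLocationUri": "URI",
        "parameters": {},  # catalog-level parameters, if any
    },
    "location": "LOCATION",
}

response = session.post(
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets",
    json=body,
)
response.raise_for_status()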
Spark SQL
CREATE NAMESPACE SPARK_CATALOG.NAMESPACE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your new namespace
Terraform
terraform {
  required_providers {
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">=6.15.0"
    }
  }
}

provider "google-beta" {
  project = "PROJECT_ID"
}

resource "google_bigquery_dataset" "default" {
  provider   = google-beta
  dataset_id = "DATASET_ID"
  location   = "LOCATION"

  external_catalog_dataset_options {
    default_storage_location_uri = "URI"
    parameters                   = { ... }
  }
}
Replace the following:
- PROJECT_ID: the ID of the project that contains your target dataset
- DATASET_ID: the ID of your target dataset
- LOCATION: the BigQuery location that you want to create the dataset in
- URI: the Cloud Storage URI for all tables in the dataset
Create Iceberg tables
Select one of the following options:
API
Use the tables.insert method, and specify the ExternalCatalogTableOptions field in the table resource that you pass in.
{
  "tableReference": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID",
    "tableId": "TABLE_ID"
  },
  "externalCatalogTableOptions": {
    "parameters": {
      "table_type": "iceberg",
      "metadata_location": "METADATA_URI"
    },
    "connection_id": "CONNECTION_ID"
  }
}
Replace the following:
- PROJECT_ID: the ID of the project that contains your target table.
- DATASET_ID: the ID of the dataset that contains your target table.
- TABLE_ID: the ID of your target table.
- METADATA_URI: the Cloud Storage URI for the latest Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.
- CONNECTION_ID: the ID of your connection to Cloud Storage.
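The following Python sketch shows one way to send this request over REST, again assuming the google-auth library and the placeholder values above.
# Minimal sketch: register an existing Iceberg table by calling the
# tables.insert REST method with the request body shown above.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

body = {
    "tableReference": {
        "projectId": "PROJECT_ID",
        "datasetId": "DATASET_ID",
        "tableId": "TABLE_ID",
    },
    "externalCatalogTableOptions": {
        "parameters": {
            "table_type": "iceberg",
            "metadata_location": "METADATA_URI",
        },
        "connection_id": "CONNECTION_ID",
    },
}

response = session.post(
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID"
    "/datasets/DATASET_ID/tables",
    json=body,
)
response.raise_for_status()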
Spark SQL
CREATE TABLE SPARK_CATALOG.NAMESPACE.TABLE (id bigint, data string) USING iceberg;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
- TABLE: the name of your new table
Terraform
resource "google_bigquery_table" "default" { provider = google-beta deletion_protection = false dataset_id = google_bigquery_dataset.default.dataset_id table_id = "TABLE" external_catalog_table_options { storage_descriptor { location_uri = "STORAGE_URI" input_format = "org.apache.hadoop.mapred.FileInputFormat" output_format = "org.apache.hadoop.mapred.FileOutputFormat" } parameters = { "table_type" = "iceberg" "metadata_location" = "METADATA_URI" "write.parquet.compression-codec" : "zstd" "EXTERNAL" : "TRUE" } } }
Replace the following:
- TABLE: the name of the target table.
- STORAGE_URI: the Cloud Storage URI where the table data is stored, starting with gs://.
- METADATA_URI: the Cloud Storage URI for the latest Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.
View metastore resources
The following sections describe how to view resources in BigQuery metastore.
View namespaces
Select one of the following options:
API
Use the datasets.list method to view all namespaces, or use the datasets.get method to view information about a specific namespace.
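As a sketch, the following Python code calls both methods over REST. It assumes the google-auth library; PROJECT_ID and DATASET_ID are placeholders.
# Minimal sketch: list namespaces (datasets.list) and fetch one (datasets.get).
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)
base = "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets"

# datasets.list paginates; this reads only the first page.
for dataset in session.get(base).json().get("datasets", []):
    print(dataset["datasetReference"]["datasetId"])

detail = session.get(f"{base}/DATASET_ID").json()
print(detail.get("externalCatalogDatasetOptions"))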
Spark SQL
To view all namespaces in a catalog, use the following statement:
SHOW { DATABASES | NAMESPACES } IN SPARK_CATALOG;
Replace SPARK_CATALOG with the name of your Spark catalog.
To view information about a defined namespace, use the following statement:
DESCRIBE { DATABASE | NAMESPACE } [EXTENDED] SPARK_CATALOG.NAMESPACE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
View tables
Select one of the following options:
API
Use the tables.list method to view all tables in a namespace, or use the tables.get method to view information about a specific table.
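The following Python sketch calls both methods over REST (google-auth assumed; uppercase values are placeholders).
# Minimal sketch: list the tables in a namespace (tables.list) and fetch
# one table (tables.get).
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)
base = (
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID"
    "/datasets/DATASET_ID/tables"
)

# tables.list paginates; this reads only the first page.
for table in session.get(base).json().get("tables", []):
    print(table["tableReference"]["tableId"])

detail = session.get(f"{base}/TABLE_ID").json()
print(detail.get("externalCatalogTableOptions"))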
Spark SQL
To view all tables in a namespace, use the following statement:
SHOW TABLES IN SPARK_CATALOG.NAMESPACE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
To view information about a defined table, use the following statement:
DESCRIBE TABLE [EXTENDED] SPARK_CATALOG.NAMESPACE.TABLE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
- TABLE: the name of your table
Modify metastore resources
The following sections describe how to modify resources in BigQuery metastore.
Update namespaces
Select one of the following options:
API
Use the datasets.patch method, and update the ExternalCatalogDatasetOptions field in the dataset resource. The datasets.update method is not recommended because it replaces the entire dataset resource.
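The following Python sketch shows one way to call datasets.patch over REST (google-auth assumed; uppercase values are placeholders). Note that a field you include in the patch body replaces its previous value, so send the complete externalCatalogDatasetOptions value that you want.
# Minimal sketch: update a namespace's external catalog options with the
# datasets.patch REST method. Fields omitted from the body are left unchanged;
# fields included replace their previous values.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

body = {
    "externalCatalogDatasetOptions": {
        "defaultStorageLocationUri": "URI",
        "parameters": {},  # send the complete set of parameters you want
    }
}
response = session.patch(
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID",
    json=body,
)
response.raise_for_status()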
Spark SQL
Use the ALTER DATABASE statement.
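For example, the following statement sets a namespace property (the property name here is only illustrative):
ALTER DATABASE SPARK_CATALOG.NAMESPACE SET DBPROPERTIES ('owner' = 'data-eng');
Replace SPARK_CATALOG and NAMESPACE as in the previous examples.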
Update Iceberg tables
Select one of the following options:
API
Use the tables.patch method, and update the ExternalCatalogTableOptions field in the table resource. The tables.update method is not recommended because it replaces the entire table resource.
To update the schema or metadata file, use the tables.patch method and set the autodetect_schema query parameter to true:
PATCH https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID?autodetect_schema=true
Replace the following:
- PROJECT_ID: the ID of the project that contains the table that you want to update
- DATASET_ID: the ID of the dataset that contains the table that you want to update
- TABLE_ID: the ID of the table that you want to update
In the body of the request, specify the updated value for each field. For example, to update the Iceberg table's metadata location, specify the updated value for the metadata_location field:
{
  "externalCatalogTableOptions": {
    "parameters": {
      "metadata_location": "METADATA_URI"
    }
  },
  "schema": null
}
Replace METADATA_URI with the Cloud Storage URI for the latest Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.
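Putting the request together, the following Python sketch sends the PATCH call shown above (google-auth assumed; uppercase values are placeholders).
# Minimal sketch: point the table at a new Iceberg metadata file and let
# BigQuery re-detect the schema, using tables.patch with
# autodetect_schema=true.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

url = (
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID"
    "/datasets/DATASET_ID/tables/TABLE_ID"
)
body = {
    "externalCatalogTableOptions": {
        "parameters": {"metadata_location": "METADATA_URI"}
    },
    "schema": None,
}
response = session.patch(url, params={"autodetect_schema": "true"}, json=body)
response.raise_for_status()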
Spark SQL
Use the ALTER TABLE statement.
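For example, the following statement adds a column, one of the schema changes that Iceberg supports through ALTER TABLE:
ALTER TABLE SPARK_CATALOG.NAMESPACE.TABLE ADD COLUMNS (new_col string);
Replace SPARK_CATALOG, NAMESPACE, and TABLE as in the previous examples.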
Delete metastore resources
The following sections describe how to delete resources in BigQuery metastore.
Delete namespaces
Select one of the following options:
API
Use the datasets.delete method. Set the deleteContents parameter to true to delete the tables in your namespace.
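The following Python sketch shows this call over REST (google-auth assumed; uppercase values are placeholders).
# Minimal sketch: delete a namespace and its tables with the datasets.delete
# REST method and deleteContents=true.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

response = session.delete(
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID",
    params={"deleteContents": "true"},
)
response.raise_for_status()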
Spark SQL
DROP NAMESPACE SPARK_CATALOG.NAMESPACE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
Delete tables
Select one of the following options:
API
Use the tables.delete method and specify the name of the table. This method doesn't delete the associated files in Cloud Storage.
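As a sketch, the Python code below makes this call over REST (google-auth assumed; uppercase values are placeholders).
# Minimal sketch: drop the table's metastore entry with the tables.delete
# REST method; the underlying Cloud Storage files are left in place.
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/bigquery"])
session = AuthorizedSession(credentials)

response = session.delete(
    "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID"
    "/datasets/DATASET_ID/tables/TABLE_ID"
)
response.raise_for_status()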
Spark SQL
To only drop the table, use the following statement:
DROP TABLE SPARK_CATALOG.NAMESPACE.TABLE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
- TABLE: the name of the table to drop
To drop the table and delete the associated files in Cloud Storage, use the following statement:
DROP TABLE SPARK_CATALOG.NAMESPACE.TABLE PURGE;
Replace the following:
- SPARK_CATALOG: the name of your Spark catalog
- NAMESPACE: the name of your namespace
- TABLE: the name of the table to delete
What's next
- Learn about additional BigQuery metastore features.