The Apache Iceberg REST catalog in BigLake metastore creates interoperability between all your query engines by offering a single source of truth for all of your Iceberg data. It lets query engines, such as Apache Spark, discover, read metadata from, and manage Iceberg tables in a consistent way.
You should be familiar with BigLake metastore before continuing.
Before you begin
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigLake API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required roles
To get the permissions that you need to use the Iceberg REST catalog in BigLake metastore, ask your administrator to grant you the following IAM roles:
- Perform administrative tasks, such as managing catalog user access, storage access, and the catalog's credential vending mode:
  - BigLake Admin (roles/biglake.admin) on the project
  - Storage Admin (roles/storage.admin) on the Cloud Storage bucket
- Read table data in credential vending mode: BigLake Viewer (roles/biglake.viewer) on the project
- Write table data in credential vending mode: BigLake Editor (roles/biglake.editor) on the project
- Read catalog resources and table data in non-credential vending mode:
  - BigLake Viewer (roles/biglake.viewer) on the project
  - Storage Object Viewer (roles/storage.objectViewer) on the Cloud Storage bucket
- Manage catalog resources and write table data in non-credential vending mode:
  - BigLake Editor (roles/biglake.editor) on the project
  - Storage Object User (roles/storage.objectUser) on the Cloud Storage bucket
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Limitations
The Iceberg REST catalog is subject to the following limitation:
- When using credential vending mode, you must set the io-impl property to org.apache.iceberg.gcp.gcs.GCSFileIO. The default, org.apache.iceberg.hadoop.HadoopFileIO, isn't supported.
Create a catalog
The first step when using the Iceberg REST catalog is to create a catalog resource object that uses either end-user credentials or credential vending mode.
Credential vending is a storage access delegation mechanism that allows BigLake metastore administrators to control permissions directly on BigLake metastore resources, eliminating the need for catalog users to have direct access to Cloud Storage buckets. It lets BigLake administrators give users permissions on specific data files.
End-user credentials
Console
In the Google Cloud console, open the BigLake metastore page.
Click Create catalog. The Create catalog page opens.
For Select a Cloud Storage bucket, enter the name of the Cloud Storage bucket to use with your catalog. Alternatively, click Browse to choose from a list of existing buckets or to create a new one. You can only have one catalog per Cloud Storage bucket.
For Authentication method, select End-user credentials.
Click Create.
gcloud
Use the gcloud alpha biglake iceberg catalogs create command:
gcloud alpha biglake iceberg catalogs create CATALOG_NAME \
    --project PROJECT_ID \
    --catalog-type CATALOG_TYPE_GCS_BUCKET \
    --credential-mode CREDENTIAL_MODE_END_USER
Replace the following:
- CATALOG_NAME: a name for your catalog.
- PROJECT_ID: your Google Cloud project ID.
Credential vending mode
A catalog administrator enables credential vending when creating or updating a catalog. As a catalog user, you can then instruct the Iceberg REST catalog to return downscoped storage credentials by specifying the access delegation when you configure the Iceberg REST catalog.
Console
In the Google Cloud console, open the BigLake metastore page.
Click Create catalog. The Create catalog page opens.
For Select a Cloud Storage bucket, enter the name of the Cloud Storage bucket to use with your catalog. Alternatively, click Browse to choose from a list of existing buckets or to create a new one. You can only have one catalog per Cloud Storage bucket.
For Authentication method, select Credential vending mode.
Click Create.
gcloud
Use the gcloud alpha biglake iceberg catalogs create command:
gcloud alpha biglake iceberg catalogs create CATALOG_NAME \
    --project PROJECT_ID \
    --catalog-type CATALOG_TYPE_GCS_BUCKET \
    --credential-mode CREDENTIAL_MODE_VENDED_CREDENTIALS
Replace the following:
- CATALOG_NAME: a name for your catalog.
- PROJECT_ID: your Google Cloud project ID.
BigQuery catalog federation
Catalog federation lets you use the Iceberg REST catalog to create and query Iceberg resources that are managed by BigQuery. Specifically, catalog federation lets you use BigLake tables for Apache Iceberg in BigQuery with the Iceberg REST catalog. Because these resources are managed by BigQuery, you must also have the required BigQuery permissions.
You enable catalog federation when you
configure the Iceberg REST catalog.
In that configuration, use the bq://projects/PROJECT_ID
warehouse format instead of the standard
gs://CLOUD_STORAGE_BUCKET_NAME format. You can also
choose to include a BigQuery location to restrict future requests
to a single location using the
bq://projects/PROJECT_ID/locations/LOCATION
format.
Credential vending and Trino workflows aren't supported for federated catalogs.
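The following short sketch illustrates how the warehouse value differs between the standard and federated formats; the project ID, location, and bucket name are hypothetical placeholders:
# All values here are hypothetical; substitute your own project, location, and bucket.
def warehouse_path(project_id=None, location=None, bucket=None):
    """Return the warehouse value for the Iceberg REST catalog configuration."""
    if bucket:
        return f"gs://{bucket}"  # standard Cloud Storage warehouse
    if location:
        # Federated warehouse, with future requests restricted to one BigQuery location.
        return f"bq://projects/{project_id}/locations/{location}"
    return f"bq://projects/{project_id}"  # federated warehouse, no location restriction

print(warehouse_path(bucket="my-iceberg-bucket"))
print(warehouse_path(project_id="my-project"))
print(warehouse_path(project_id="my-project", location="us-central1"))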
Configure the Iceberg REST catalog
After you create a catalog resource object that points to a warehouse location, configure your client application to use the catalog:
Cluster
To use Spark with the Iceberg REST catalog on Dataproc, first create a cluster with the Iceberg component:
gcloud dataproc clusters create CLUSTER_NAME \
    --enable-component-gateway \
    --project=PROJECT_ID \
    --region=REGION \
    --optional-components=ICEBERG \
    --image-version=DATAPROC_VERSION
Replace the following:
- CLUSTER_NAME: a name for your cluster.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the region for the Dataproc cluster.
- DATAPROC_VERSION: the Dataproc image version, for example 2.2.
After you create the cluster, configure your Spark session to use the Iceberg REST catalog:
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'WAREHOUSE_PATH') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
Replace the following:
- CATALOG_NAME: the name of your Iceberg REST catalog.
- APP_NAME: a name for your Spark session.
- WAREHOUSE_PATH: the path to your warehouse. Use gs://CLOUD_STORAGE_BUCKET_NAME, or, to use BigQuery catalog federation, use bq://projects/PROJECT_ID instead.
- PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
This example doesn't use credential vending. To use credential vending, you
must use a catalog in credential vending mode and add the
X-Iceberg-Access-Delegation header to Iceberg REST
catalog requests with a value of vended-credentials by adding the following
line to the SparkSession builder:
.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')
Example with credential vending
The following example configures the query engine with credential vending:
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
For more information, see the
Headers in the RESTCatalog
section of the Iceberg documentation.
Dataproc clusters support Google authorization flows for Iceberg in the following releases:
- Dataproc on Compute Engine 2.2 image versions 2.2.65 and later.
- Dataproc on Compute Engine 2.3 image versions 2.3.11 and later.
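After the session is created, a quick way to confirm that the configuration works is to list namespaces and create a throwaway table. A minimal sketch, assuming the spark session configured above; the namespace and table names are hypothetical:
# Assumes `spark` is the session configured with the Iceberg REST catalog above.
spark.sql("SHOW NAMESPACES").show()
spark.sql("CREATE NAMESPACE IF NOT EXISTS smoke_test")
spark.sql("CREATE TABLE IF NOT EXISTS smoke_test.ping (id int) USING ICEBERG")
spark.sql("SHOW TABLES IN smoke_test").show()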
Serverless
Submit a PySpark batch workload to Google Cloud Serverless for Apache Spark with the following configuration:
gcloud dataproc batches submit pyspark PYSPARK_FILE \
    --project=PROJECT_ID \
    --region=REGION \
    --version=RUNTIME_VERSION \
    --properties="\
spark.sql.defaultCatalog=CATALOG_NAME,\
spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog,\
spark.sql.catalog.CATALOG_NAME.type=rest,\
spark.sql.catalog.CATALOG_NAME.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
spark.sql.catalog.CATALOG_NAME.warehouse=WAREHOUSE_PATH,\
spark.sql.catalog.CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO,\
spark.sql.catalog.CATALOG_NAME.header.x-goog-user-project=PROJECT_ID,\
spark.sql.catalog.CATALOG_NAME.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager,\
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,\
spark.sql.catalog.CATALOG_NAME.rest-metrics-reporting-enabled=false"
Replace the following:
- PYSPARK_FILE: the gs:// Cloud Storage path to your PySpark application file.
- PROJECT_ID: your Google Cloud project ID.
- REGION: the region for the Dataproc batch workload.
- RUNTIME_VERSION: the Serverless for Apache Spark runtime version, for example 2.2.
- CATALOG_NAME: the name of your Iceberg REST catalog.
- WAREHOUSE_PATH: the path to your warehouse. Use gs://CLOUD_STORAGE_BUCKET_NAME, or, to use BigQuery catalog federation, use bq://projects/PROJECT_ID instead.
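The batch workload itself only needs a plain SparkSession, because the catalog is configured through the --properties flag. A minimal sketch of what PYSPARK_FILE might contain; the namespace and table names are hypothetical:
# iceberg_rest_batch.py -- hypothetical PYSPARK_FILE contents.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-rest-batch").getOrCreate()

# The default catalog is set by spark.sql.defaultCatalog in the batch properties.
spark.sql("CREATE NAMESPACE IF NOT EXISTS demo_ns")
spark.sql("CREATE TABLE IF NOT EXISTS demo_ns.demo_table (id int, data string) USING ICEBERG")
spark.sql("INSERT INTO demo_ns.demo_table VALUES (1, 'first row')")
spark.sql("SELECT * FROM demo_ns.demo_table").show()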
To use credential vending, you must use a
catalog in credential vending mode and add the
X-Iceberg-Access-Delegation header to Iceberg REST
catalog requests with a value of vended-credentials by adding the following
property to the Serverless for Apache Spark --properties flag:
spark.sql.catalog.CATALOG_NAME.header.X-Iceberg-Access-Delegation=vended-credentials
Example with credential vending
The following example configures the query engine with credential vending:
gcloud dataproc batches submit pyspark PYSPARK_FILE \
    --project=PROJECT_ID \
    --region=REGION \
    --version=RUNTIME_VERSION \
    --properties="\
spark.sql.defaultCatalog=CATALOG_NAME,\
spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog,\
spark.sql.catalog.CATALOG_NAME.type=rest,\
spark.sql.catalog.CATALOG_NAME.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
spark.sql.catalog.CATALOG_NAME.warehouse=gs://CLOUD_STORAGE_BUCKET_NAME,\
spark.sql.catalog.CATALOG_NAME.io-impl=org.apache.iceberg.gcp.gcs.GCSFileIO,\
spark.sql.catalog.CATALOG_NAME.header.x-goog-user-project=PROJECT_ID,\
spark.sql.catalog.CATALOG_NAME.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager,\
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,\
spark.sql.catalog.CATALOG_NAME.rest-metrics-reporting-enabled=false,\
spark.sql.catalog.CATALOG_NAME.header.X-Iceberg-Access-Delegation=vended-credentials"
For more information, see the
Headers in the RESTCatalog
section of the Iceberg documentation.
Serverless for Apache Spark supports Google authorization flows for Iceberg in the following runtime versions:
- Serverless for Apache Spark 2.2 runtimes 2.2.60 and later
- Serverless for Apache Spark 2.3 runtimes 2.3.10 and later
Trino
To use Trino with the Iceberg REST catalog, create a
Dataproc cluster with the Trino component and configure catalog
properties using the gcloud dataproc clusters create --properties flag. The following example creates a
Trino catalog named CATALOG_NAME:
gcloud dataproc clusters create CLUSTER_NAME \
    --enable-component-gateway \
    --region=REGION \
    --image-version=DATAPROC_VERSION \
    --network=NETWORK_ID \
    --optional-components=TRINO \
    --properties="\
trino-catalog:CATALOG_NAME.connector.name=iceberg,\
trino-catalog:CATALOG_NAME.iceberg.catalog.type=rest,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.warehouse=WAREHOUSE_PATH,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.biglake.project-id=PROJECT_ID,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager"
Replace the following:
- CLUSTER_NAME: a name for your cluster.
- REGION: the Dataproc cluster region.
- DATAPROC_VERSION: the Dataproc image version, for example 2.2.
- NETWORK_ID: the cluster network ID. For more information, see Dataproc cluster network configuration.
- CATALOG_NAME: the name of your Trino catalog using the Iceberg REST catalog.
- WAREHOUSE_PATH: the path to your warehouse. Use gs://CLOUD_STORAGE_BUCKET_NAME.
- PROJECT_ID: your Google Cloud project ID to use for BigLake metastore.
After cluster creation, connect to the main VM instance, and use the Trino CLI:
trino --catalog=CATALOG_NAME
Dataproc Trino supports Google authorization flows for Iceberg in the following releases:
- Dataproc on Compute Engine 2.2 runtime versions 2.2.65 and later
- Dataproc on Compute Engine 2.3 runtime versions 2.3.11 and later
- Dataproc on Compute Engine 3.0 is not supported.
Iceberg 1.10 or later
Open source Iceberg 1.10 and later releases have built-in
support for Google authorization flows in GoogleAuthManager. The following is an example of how to configure Apache Spark to use the BigLake metastore Iceberg REST Catalog.
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'WAREHOUSE_PATH') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
Replace the following:
- CATALOG_NAME: the name of your Iceberg REST catalog.
- APP_NAME: a name for your Spark session.
- WAREHOUSE_PATH: the path to your warehouse. Use gs://CLOUD_STORAGE_BUCKET_NAME, or, to use BigQuery catalog federation, use bq://projects/PROJECT_ID instead.
- PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
The preceding example doesn't use credential vending. To use credential
vending, you must use a
catalog in credential vending mode and add the
X-Iceberg-Access-Delegation header to Iceberg REST
catalog requests with a value of vended-credentials by adding the following
line to the SparkSession builder:
.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')
Example with credential vending
The following example configures the query engine with credential vending:
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
For more information, see the
Headers in the RESTCatalog
section of the Iceberg documentation.
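If you launch these sessions outside a managed runtime such as Dataproc, the Iceberg Spark runtime and the Iceberg GCP bundle also need to be on the classpath. A minimal sketch, assuming Spark 3.5 with Scala 2.12 and Iceberg 1.10.0; adjust the versions to your environment and add the line to the SparkSession builder:
# Hypothetical version pins; align them with your Spark, Scala, and Iceberg versions.
.config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.10.0,org.apache.iceberg:iceberg-gcp-bundle:1.10.0') \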
Prior Iceberg releases
For open source Iceberg releases prior to 1.10, you can configure standard OAuth authentication by configuring a session with the following:
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,org.apache.iceberg:iceberg-gcp-bundle:1.9.1') \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'WAREHOUSE_PATH') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
    .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
Replace the following:
- CATALOG_NAME: the name of your Iceberg REST catalog.
- APP_NAME: a name for your Spark session.
- WAREHOUSE_PATH: the path to your warehouse. Use gs://CLOUD_STORAGE_BUCKET_NAME, or, to use BigQuery catalog federation, use bq://projects/PROJECT_ID instead.
- PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
- TOKEN: your authentication token, which is valid for one hour. For example, a token generated using gcloud auth application-default print-access-token.
The preceding example doesn't use credential vending. To use credential
vending, you must use a
catalog in credential vending mode and add the
X-Iceberg-Access-Delegation header to Iceberg REST
catalog requests with a value of vended-credentials by adding the following
line to the SparkSession builder:
.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')
Example with credential vending
The following example configures the query engine with credential vending:
import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,org.apache.iceberg:iceberg-gcp-bundle:1.9.1') \
    .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
    .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
    .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
    .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
    .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
    .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
    .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
    .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
    .getOrCreate()
For more information, see the
Headers in the RESTCatalog
section of the Iceberg documentation.
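Because TOKEN in the preceding examples expires after one hour, you might prefer to fetch it programmatically rather than pasting a static value. A minimal sketch using Application Default Credentials; it assumes the google-auth package is installed:
import google.auth
import google.auth.transport.requests

# Obtain a short-lived access token from Application Default Credentials.
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
access_token = credentials.token  # use this value as TOKEN in the session configuration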
Create a namespace or schema
Spark
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;") spark.sql("USE NAMESPACE_NAME;")
Replace NAMESPACE_NAME with a name for your namespace.
If you're using BigQuery catalog federation, use the following instead:
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME LOCATION 'gs://BUCKET_NAME/NAMESPACE_NAME' WITH DBPROPERTIES ('gcp-region' = 'LOCATION');") spark.sql("USE NAMESPACE_NAME;")
Replace the following:
- NAMESPACE_NAME: a name for your namespace.
- BUCKET_NAME: the Cloud Storage bucket that you're using with your catalog.
- LOCATION: a BigQuery location. The default value is the US multi-region.
Trino
CREATE SCHEMA IF NOT EXISTS CATALOG_NAME.SCHEMA_NAME;
USE CATALOG_NAME.SCHEMA_NAME;
Replace the following:
- CATALOG_NAME: the name of your Trino catalog using the Iceberg REST catalog.
- SCHEMA_NAME: a name for your schema.
Create a table
Spark
spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;") spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()
Replace the following:
- NAMESPACE_NAME: the name of your namespace.
- TABLE_NAME: a name for your table.
Trino
CREATE TABLE TABLE_NAME (id int, data varchar);
DESCRIBE TABLE_NAME;
Replace TABLE_NAME with a name for
your table.
List tables
Spark
spark.sql("SHOW TABLES").show()
Trino
SHOW TABLES;
Insert data into the table
The following example inserts sample data into the table:
Spark
spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\"), (2, \"second row\"), (3, \"third row\");")
Trino
INSERT INTO TABLE_NAME VALUES (1, 'first row'), (2, 'second row'), (3, 'third row');
Query a table
The following example selects all data from the table:
Spark
spark.sql("SELECT * FROM TABLE_NAME;").show()
Trino
SELECT * FROM TABLE_NAME;
BigQuery
SELECT * FROM `WAREHOUSE_NAME>NAMESPACE_OR_SCHEMA_NAME.TABLE_NAME`;
Replace the following:
- WAREHOUSE_NAME: the name of the Cloud Storage bucket for your Iceberg REST catalog. For example, if your URI is gs://iceberg_bucket, use iceberg_bucket.
- NAMESPACE_OR_SCHEMA_NAME: the table namespace if using Spark, or the table schema name if using Trino.
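If you query from code rather than the BigQuery console, the same statement works through the BigQuery client libraries. A minimal sketch with the Python client; the project, bucket, namespace, and table names are hypothetical:
# Assumes the google-cloud-bigquery package is installed and ADC is configured.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical billing project
sql = "SELECT * FROM `my-iceberg-bucket>demo_ns.demo_table`"
for row in client.query(sql).result():
    print(dict(row))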
Alter a table
The following example adds a column to the table:
Spark
spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS ( desc string);") spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()
Trino
ALTER TABLE TABLE_NAME ADD COLUMN desc varchar;
DESCRIBE SCHEMA_NAME.TABLE_NAME;
Delete a table
Spark
spark.sql("DROP TABLE TABLE_NAME;")
Trino
DROP TABLE TABLE_NAME;
Pricing
For pricing details, see BigLake pricing.
What's next
- Learn how to manage catalogs in the Google Cloud console.