Use BigLake metastore with the Iceberg REST catalog

The managed Apache Iceberg REST catalog in BigLake metastore creates interoperability between all your query engines by offering a single source of truth for all your Iceberg data. It lets query engines, such as Apache Spark, discover, read metadata from, and manage Iceberg tables in a consistent way.

The Iceberg tables that you use with the Iceberg REST catalog are called BigLake tables for Apache Iceberg (preview). These are Iceberg tables that you create from open-source engines and store in Cloud Storage. They can be read by open-source engines or BigQuery. Writes are only supported from open-source engines. In this document, we refer to these tables as BigLake Iceberg tables.

Before you begin

  1. Make sure that billing is enabled for your Google Cloud project.

    Learn how to check if billing is enabled on a project.
  2. Enable the BigLake API.

  3. Optional: Understand how BigLake metastore works and why you should use it.

Required roles

To get the permissions that you need to use the Iceberg REST catalog in BigLake metastore, ask your administrator to grant you the following IAM roles on your project or service account:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Configure a query engine with the Iceberg REST catalog

The following example configures a Spark session to use the Iceberg REST catalog in BigLake metastore as its default catalog:

from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"

# Configure a Spark session that uses the Iceberg REST catalog in BigLake
# metastore. The token authenticates requests, and the x-goog-user-project
# header sets the project that is billed for catalog usage.
spark = SparkSession.builder.appName("APP_NAME") \
    .config("spark.sql.defaultCatalog", catalog_name) \
    .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.type", "rest") \
    .config(f"spark.sql.catalog.{catalog_name}.uri", "https://biglake.googleapis.com/iceberg/v1beta/restcatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", "gs://STORAGE_URI") \
    .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
    .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
    .config(f"spark.sql.catalog.{catalog_name}.header.x-goog-user-project", "PROJECT_ID") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.hadoop.HadoopFileIO") \
    .config(f"spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled", "false") \
    .getOrCreate()

Replace the following:

  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • APP_NAME: a name for your Spark session.
  • STORAGE_URI: the URI of the Cloud Storage bucket that contains your BigLake Iceberg tables.
  • TOKEN: your authentication token, which is valid for one hour, for example, the output of gcloud auth application-default print-access-token. You can also generate a token programmatically, as shown in the sketch after this list.
  • PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
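
If you prefer to generate the token programmatically instead of pasting the output of gcloud, the following is a minimal sketch that assumes Application Default Credentials are configured in your environment:

import google.auth
from google.auth.transport.requests import Request

# Obtain a short-lived OAuth 2.0 access token from Application Default
# Credentials; pass its value as TOKEN in the Spark configuration above.
# Because the token expires (typically after one hour), long-running jobs
# may need to refresh it and rebuild the session.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())
token = credentials.token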

Create a namespace

spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")

spark.sql("USE NAMESPACE_NAME;")

Replace NAMESPACE_NAME with a name for your namespace.
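
Optionally, to confirm that the namespace exists, you can list the namespaces in the catalog (a minimal check, not part of the original steps):

spark.sql("SHOW NAMESPACES").show()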

Create a table

spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;")

spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Replace the following:

  • NAMESPACE_NAME: a name for your namespace
  • TABLE_NAME: a name for your table

List tables

spark.sql("SHOW TABLES").show()

Insert data into the table

The following example inserts sample data into the table:

spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\"), (2, \"second row\"), (3, \"third row\");")

Query a table

The following example selects all data from the table:

spark.sql("SELECT * FROM TABLE_NAME;").show()

Alter a table schema

The following example adds a column to the table:

spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS ( desc string);")
spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Delete a table

The following example deletes the table from the given namespace:

spark.sql("DROP TABLE TABLE_NAME;")

Pricing

For pricing details, see BigLake pricing.

What's next