Use BigLake metastore with the Iceberg REST catalog
The managed Apache Iceberg REST catalog in BigLake metastore provides interoperability across all of your query engines by offering a single source of truth for all of your Iceberg data. It lets query engines, such as Apache Spark, discover, read metadata from, and manage Iceberg tables in a consistent way.
The Iceberg tables that you use with the Iceberg REST catalog are called BigLake tables for Apache Iceberg (preview). These are Iceberg tables that you create from open-source engines and store in Cloud Storage. They can be read by open-source engines or BigQuery. Writes are only supported from open-source engines. In this document, we refer to these tables as BigLake Iceberg tables.
Before you begin
- Make sure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
- Enable the BigLake API.
- Optional: Understand how BigLake metastore works and why you should use it.
Required roles
To get the permissions that you need to use the Iceberg REST catalog in BigLake metastore, ask your administrator to grant you the following IAM roles on your project or service account:
- BigLake Admin (roles/biglake.admin)
- Storage Admin (roles/storage.admin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Configure a query engine with the Iceberg REST catalog
The following example configures a PySpark session to use the Iceberg REST catalog:

from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"

spark = SparkSession.builder.appName("APP_NAME") \
    .config("spark.sql.defaultCatalog", catalog_name) \
    .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.type", "rest") \
    .config(f"spark.sql.catalog.{catalog_name}.uri", "https://biglake.googleapis.com/iceberg/v1beta/restcatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", "gs://STORAGE_URI") \
    .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
    .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
    .config(f"spark.sql.catalog.{catalog_name}.header.x-goog-user-project", "PROJECT_ID") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.hadoop.HadoopFileIO") \
    .config(f"spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled", "false") \
    .getOrCreate()
Replace the following:
- CATALOG_NAME: a name for your Iceberg REST catalog.
- APP_NAME: a name for your Spark session.
- STORAGE_URI: the URI of the Cloud Storage bucket that contains your BigLake Iceberg tables.
- TOKEN: your authentication token, which is valid for one hour. For example, generate one with gcloud auth application-default print-access-token.
- PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
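The token catalog property expects a bearer access token. As a minimal sketch, assuming Application Default Credentials are configured in your environment, you can fetch the token in Python rather than pasting one in:

import google.auth
from google.auth.transport.requests import Request

# Fetch a short-lived OAuth 2.0 access token from Application Default
# Credentials. The cloud-platform scope is an assumption; narrow it if your
# environment requires a more restrictive scope.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())
token = credentials.token  # pass this value as the catalog's "token" property

Because the token expires after an hour, a long-running session eventually needs a fresh one; regenerating the token and recreating the Spark session is the simplest approach.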
Create a namespace
spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;") spark.sql("USE NAMESPACE_NAME;")
Replace NAMESPACE_NAME with a name for your namespace.
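As a quick check, you can list the namespaces that the catalog exposes with standard Spark SQL:

spark.sql("SHOW NAMESPACES").show()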
Create a table
spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;") spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()
Replace the following:
- NAMESPACE_NAME: the name of your namespace.
- TABLE_NAME: a name for your table.
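Iceberg tables can also be partitioned at creation time. The following sketch is illustrative: the events table name, its columns, and the days() hidden-partitioning transform are hypothetical, not part of the steps above:

# Sketch: create a partitioned Iceberg table. The days() transform
# partitions rows by the day of the event_ts timestamp.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id bigint,
        event_ts timestamp,
        data string)
    USING ICEBERG
    PARTITIONED BY (days(event_ts));
""")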
List tables
spark.sql("SHOW TABLES").show()
Insert data into the table
The following example inserts sample data into the table:
spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\"), (2, \"second row\"), (3, \"third row\");")
Query a table
The following example selects all data from the table:
spark.sql("SELECT * FROM TABLE_NAME;").show()
Alter a table schema
The following example adds a column to the table:
spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS ( desc string);") spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()
Delete a table
The following example deletes the table from the given namespace:
spark.sql("DROP TABLE TABLE_NAME;")
Pricing
For pricing details, see BigLake pricing.
What's next
- Learn more about managing Iceberg resources with BigLake metastore.
- Learn about additional BigLake metastore features.