Use BigLake metastore with the Iceberg REST catalog

The managed Apache Iceberg REST catalog in BigLake metastore creates interoperability between all your query engines by offering a single source of truth for all your Iceberg data. It lets query engines, such as Apache Spark, discover, read metadata from, and manage Iceberg tables in a consistent way.

The Iceberg tables that you use with the Iceberg REST catalog are called BigLake tables for Apache Iceberg (preview). These are Iceberg tables that you create from open-source engines and store in Cloud Storage. They can be read by open-source engines or BigQuery. Writes are only supported from open-source engines. In this document, we refer to these tables as BigLake Iceberg tables.

Before you begin

  1. Make sure that billing is enabled for your Google Cloud project.

    Learn how to check if billing is enabled on a project.
  2. Enable the BigLake API.

  3. Optional: Understand how BigLake metastore works and why you should use it.

Required roles

To get the permissions that you need to use the Iceberg REST catalog in BigLake metastore, ask your administrator to grant you the following IAM roles on your project or service account:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Configure a query engine with the Iceberg REST catalog

The following example configures a Spark session to use the Iceberg REST catalog in BigLake metastore as its default catalog:

from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"

# Configure a Spark session that uses the Iceberg REST catalog in BigLake
# metastore. The token authenticates requests, and the x-goog-user-project
# header sets the project that is billed for catalog usage.
spark = SparkSession.builder.appName("APP_NAME") \
    .config("spark.sql.defaultCatalog", catalog_name) \
    .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.type", "rest") \
    .config(f"spark.sql.catalog.{catalog_name}.uri", "https://biglake.googleapis.com/iceberg/v1beta/restcatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", "gs://STORAGE_URI") \
    .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
    .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
    .config(f"spark.sql.catalog.{catalog_name}.header.x-goog-user-project", "PROJECT_ID") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.hadoop.HadoopFileIO") \
    .config(f"spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled", "false") \
    .getOrCreate()

Replace the following:

  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • APP_NAME: a name for your Spark session.
  • STORAGE_URI: the URI of the Cloud Storage bucket that contains your BigLake Iceberg tables.
  • TOKEN: your authentication token, which is valid for one hour, for example, the output of gcloud auth application-default print-access-token. You can also generate a token programmatically, as shown in the sketch after this list.
  • PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
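
If you prefer to generate the token programmatically instead of pasting the output of gcloud, the following is a minimal sketch that assumes Application Default Credentials are configured in your environment:

import google.auth
from google.auth.transport.requests import Request

# Obtain a short-lived OAuth 2.0 access token from Application Default
# Credentials; pass its value as TOKEN in the Spark configuration above.
# Because the token expires (typically after one hour), long-running jobs
# may need to refresh it and rebuild the session.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(Request())
token = credentials.token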

Create a namespace

spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")

spark.sql("USE NAMESPACE_NAME;")

Replace NAMESPACE_NAME with a name for your namespace.
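
Optionally, to confirm that the namespace exists, you can list the namespaces in the catalog (a minimal check, not part of the original steps):

spark.sql("SHOW NAMESPACES").show()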

Create a table

spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;")

spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Replace the following:

  • NAMESPACE_NAME: a name for your namespace
  • TABLE_NAME: a name for your table

List tables

spark.sql("SHOW TABLES").show()

Insert data into the table

The following example inserts sample data into the table:

spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\"), (2, \"second row\"), (3, \"third row\");")

Query a table

The following example selects all data from the table:

spark.sql("SELECT * FROM TABLE_NAME;").show()

Alter a table schema

The following example adds a column to the table:

spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS ( desc string);")
spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Delete a table

The following example deletes the table from the given namespace:

spark.sql("DROP TABLE TABLE_NAME;")

Pricing

For pricing details, see BigLake pricing.

What's next