Last updated 2025-08-25 UTC.

Use BigLake metastore with Spark stored procedures
==================================================

This document explains how to use [Apache Spark stored procedures](/bigquery/docs/spark-procedures) with
BigLake metastore.

Before you begin
----------------

1. Enable billing for your Google Cloud project. Learn how to [check if billing is enabled on a project](/billing/docs/how-to/verify-billing-enabled).
2. Enable the BigQuery and Dataproc APIs.

   [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,dataproc.googleapis.com)
3. Optional: Learn more about the following:

   - Understand how BigLake metastore works and why you should use it.
   - Learn how [BigQuery Spark stored procedures work](/bigquery/docs/spark-procedures) and complete the before-you-begin tasks.

### Required roles

To use Spark stored procedures, review the required roles for [stored procedures](/bigquery/docs/spark-procedures#required_roles)
and grant the necessary roles.

To get the permissions that you need to use Spark and stored procedures with
BigLake metastore as a metadata store, ask your administrator to grant you the
following IAM roles:

- Create BigLake metastore tables in Spark:
  - [BigQuery Data Editor](/iam/docs/roles-permissions/bigquery#bigquery.dataEditor) (`roles/bigquery.dataEditor`) on the Spark connection service account in the project
  - [Storage Object Admin](/iam/docs/roles-permissions/storage#storage.objectAdmin) (`roles/storage.objectAdmin`) on the Spark connection service account in the project
- Query BigLake metastore tables in BigQuery:
  - [BigQuery Data Viewer](/iam/docs/roles-permissions/bigquery#bigquery.dataViewer) (`roles/bigquery.dataViewer`) on the project
  - [BigQuery User](/iam/docs/roles-permissions/bigquery#bigquery.user) (`roles/bigquery.user`) on the project
  - [Storage Object Viewer](/iam/docs/roles-permissions/storage#storage.objectViewer) (`roles/storage.objectViewer`) on the project

For more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).

You might also be able to get the required permissions through [custom
roles](/iam/docs/creating-custom-roles) or other [predefined
roles](/iam/docs/roles-overview#predefined).
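As a sketch, the first pair of grants can be applied with the gcloud CLI. The service account email below is a hypothetical placeholder; use the actual service account shown on your Spark connection's details page in BigQuery, and your own project ID:

```shell
# Hypothetical placeholder values: substitute your project ID and the
# service account email of your Spark connection.
PROJECT_ID="my-project"
CONNECTION_SA="connection-sa@example.iam.gserviceaccount.com"

# Let the connection's service account manage table metadata in BigQuery.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${CONNECTION_SA}" \
  --role="roles/bigquery.dataEditor"

# Let it read and write the table's data files in Cloud Storage.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${CONNECTION_SA}" \
  --role="roles/storage.objectAdmin"
```

Granting at the project level is the broadest option; narrower grants on a specific dataset or bucket also work if your organization prefers least privilege.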
Create and run a stored procedure
---------------------------------

The following example shows you how to create and run a stored procedure with
BigLake metastore.

1. Go to the **BigQuery** page.

   [Go to BigQuery](https://console.cloud.google.com/bigquery)
2. In the query editor, add the following sample code for the [`CREATE PROCEDURE`
   statement](/bigquery/docs/reference/standard-sql/data-definition-language#create_procedure).

   ```googlesql
   CREATE OR REPLACE PROCEDURE
   `PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`()
   WITH CONNECTION `PROJECT_ID.REGION.SPARK_CONNECTION_ID` OPTIONS (engine='SPARK',
   runtime_version='1.1',
   properties=[("spark.sql.catalog.CATALOG_NAME.warehouse",
   "WAREHOUSE_DIRECTORY"),
   ("spark.sql.catalog.CATALOG_NAME.gcp_location",
   "LOCATION"),
   ("spark.sql.catalog.CATALOG_NAME.gcp_project",
   "PROJECT_ID"),
   ("spark.sql.catalog.CATALOG_NAME",
   "org.apache.iceberg.spark.SparkCatalog"),
   ("spark.sql.catalog.CATALOG_NAME.catalog-impl",
   "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"),
   ("spark.jars.packages",
   "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")],
   jar_uris=["gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar"])
   LANGUAGE python AS R"""
   from pyspark.sql import SparkSession
   spark = SparkSession \
     .builder \
     .appName("BigLake Metastore Iceberg") \
     .getOrCreate()
   spark.sql("USE CATALOG_NAME;")
   spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")
   spark.sql("USE NAMESPACE_NAME;")
   spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG LOCATION 'WAREHOUSE_DIRECTORY'")
   spark.sql("DESCRIBE TABLE_NAME;")
   spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\");")
   spark.sql("SELECT * from TABLE_NAME;")
   spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS (newDoubleCol double);")
   spark.sql("DESCRIBE TABLE_NAME;")
   """;
   CALL `PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`();
   ```

   Replace the following:
   - PROJECT_ID: the ID of your Google Cloud project.
   - BQ_DATASET_ID: the ID of the dataset in BigQuery that contains the procedure.
   - PROCEDURE_NAME: the name of the procedure that you're creating or replacing.
   - REGION: the location of your Spark connection.
   - LOCATION: the location of your BigQuery resources.
   - SPARK_CONNECTION_ID: the ID of your Spark connection.
   - CATALOG_NAME: the name of the catalog that you're using.
   - WAREHOUSE_DIRECTORY: the URI of the Cloud Storage folder that contains your data warehouse.
   - NAMESPACE_NAME: the namespace that you're using.
   - TABLE_NAME: a name for the table that the procedure creates.

What's next
-----------

- Set up [optional BigLake metastore features](/bigquery/docs/blms-features).
- [View and query tables from Spark in the BigQuery console](/bigquery/docs/blms-query-tables).
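As a quick check after running the example above, the table the procedure creates can be queried directly from the BigQuery query editor. This sketch assumes the Spark namespace surfaces as a BigQuery dataset of the same name, and reuses the placeholder names from the sample:

```googlesql
-- Placeholder names from the example above. With BigQueryMetastoreCatalog,
-- the Spark namespace appears as a BigQuery dataset.
SELECT * FROM `PROJECT_ID.NAMESPACE_NAME.TABLE_NAME`;
```

Seeing the inserted row here confirms that the metadata written by Spark is visible to BigQuery through BigLake metastore.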