[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-27。"],[],[],null,["# Use BigLake metastore with Spark stored procedures\n==================================================\n\nThis document explains how to use [Apache Spark stored procedures](/bigquery/docs/spark-procedures) with\nBigLake metastore.\n\nBefore you begin\n----------------\n\n1. Enable billing for your Google Cloud project. Learn how to [check if billing is enabled on a project](/billing/docs/how-to/verify-billing-enabled).\n2. Enable the BigQuery and Dataflow APIs.\n\n [Enable the APIs](/billing/docs/how-to/verify-billing-enabled)\n3. Optional: Learn more about the following:\n\n - Understand how [BigLake metastore works](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,dataproc.googleapis.com) and why you should use it.\n - Learn how [BigQuery Spark stored procedures work](/bigquery/docs/spark-procedures) and complete the before you begin tasks.\n\n### Required roles\n\nTo use Spark stored procedures, review the required roles for [stored procedures](/bigquery/docs/spark-procedures#required_roles)\nand grant the necessary roles.\n\n\nTo get the permissions that\nyou need to use Spark and stored procedures with BigLake metastore as a metadata store,\n\nask your administrator to grant you the\nfollowing IAM roles:\n\n- Create BigLake metastore tables in Spark:\n - [BigQuery Data Editor](/iam/docs/roles-permissions/bigquery#bigquery.dataEditor) (`roles/bigquery.dataEditor`) on the Spark Connection service account in the project\n - [Storage Object Admin](/iam/docs/roles-permissions/storage#storage.objectAdmin) (`roles/storage.objectAdmin`) on the Spark Connection service account in the project\n- Query BigLake metastore tables in BigQuery:\n - [BigQuery Data Viewer](/iam/docs/roles-permissions/bigquery#bigquery.dataViewer) (`roles/bigquery.dataViewer`) on the project\n - [BigQuery User](/iam/docs/roles-permissions/bigquery#bigquery.user) (`roles/bigquery.user`) on the project\n - [Storage Object Viewer](/iam/docs/roles-permissions/storage#storage.objectViewer) (`roles/storage.objectViewer`) on the project\n\n\nFor more information about granting roles, see [Manage access to projects, folders, and organizations](/iam/docs/granting-changing-revoking-access).\n\n\nYou might also be able to get\nthe required permissions through [custom\nroles](/iam/docs/creating-custom-roles) or other [predefined\nroles](/iam/docs/roles-overview#predefined).\n\nCreate and run a stored procedure\n---------------------------------\n\nThe following example shows you how to create and run a stored procedure with\nBigLake metastore.\n\n1. Go to the **BigQuery** page.\n\n [Go to BigQuery](https://console.cloud.google.com/bigquery)\n2. 
2. In the query editor, add the following sample code for the [`CREATE PROCEDURE`
   statement](/bigquery/docs/reference/standard-sql/data-definition-language#create_procedure).

   ```googlesql
   CREATE OR REPLACE PROCEDURE
   `PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`()
   WITH CONNECTION `PROJECT_ID.REGION.SPARK_CONNECTION_ID` OPTIONS (engine='SPARK',
   runtime_version='1.1',
   properties=[("spark.sql.catalog.CATALOG_NAME.warehouse",
   "WAREHOUSE_DIRECTORY"),
   ("spark.sql.catalog.CATALOG_NAME.gcp_location",
   "LOCATION"),
   ("spark.sql.catalog.CATALOG_NAME.gcp_project",
   "PROJECT_ID"),
   ("spark.sql.catalog.CATALOG_NAME",
   "org.apache.iceberg.spark.SparkCatalog"),
   ("spark.sql.catalog.CATALOG_NAME.catalog-impl",
   "org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog"),
   ("spark.jars.packages",
   "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")],
   jar_uris=["gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar"])
   LANGUAGE PYTHON AS R"""
   from pyspark.sql import SparkSession

   spark = SparkSession \
     .builder \
     .appName("BigLake Metastore Iceberg") \
     .getOrCreate()

   spark.sql("USE CATALOG_NAME;")
   spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")
   spark.sql("USE NAMESPACE_NAME;")
   spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG LOCATION 'WAREHOUSE_DIRECTORY';")
   spark.sql("DESCRIBE TABLE_NAME;")
   spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\");")
   spark.sql("SELECT * FROM TABLE_NAME;")
   spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS (newDoubleCol double);")
   spark.sql("DESCRIBE TABLE_NAME;")
   """;
   CALL `PROJECT_ID.BQ_DATASET_ID.PROCEDURE_NAME`();
   ```

   Replace the following:

   - PROJECT_ID: the ID of your Google Cloud project.
   - BQ_DATASET_ID: the ID of the dataset in BigQuery that contains the procedure.
   - PROCEDURE_NAME: the name of the procedure that you're creating or replacing.
   - REGION: the location of your Spark connection.
   - LOCATION: the location of your BigQuery resources.
   - SPARK_CONNECTION_ID: the ID of your Spark connection.
   - CATALOG_NAME: the name of the catalog that you're using.
   - WAREHOUSE_DIRECTORY: the URI of the Cloud Storage folder that contains your data warehouse.
   - NAMESPACE_NAME: the namespace that you're using.
   - TABLE_NAME: the name of the table that you're creating.
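After the procedure finishes, the Iceberg table that Spark created is registered in BigLake metastore, so you can verify the result directly from BigQuery. The following query is a minimal check, assuming that the Spark namespace surfaces as a BigQuery dataset with the same name in your project; running it requires the roles listed under *Query BigLake metastore tables in BigQuery*.

```googlesql
-- Minimal verification query, assuming NAMESPACE_NAME appears as a BigQuery
-- dataset in PROJECT_ID after the stored procedure runs.
SELECT *
FROM `PROJECT_ID.NAMESPACE_NAME.TABLE_NAME`
LIMIT 10;
```

If the procedure succeeded, the query should return the row that the procedure inserted (`1, "first row"`), along with the `newDoubleCol` column added by the `ALTER TABLE` statement.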
What's next
-----------

- Set up [optional BigLake metastore features](/bigquery/docs/blms-features).
- [View and query tables from Spark in the BigQuery console](/bigquery/docs/blms-query-tables).