[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-17。"],[],[],null,["# Additional BigLake metastore features\n=====================================\n\nTo customize your BigLake metastore configuration, you can use the following\nadditional features:\n\n- Apache Spark Iceberg procedures\n- The filter option for unsupported tables\n- BigQuery connection overrides\n- Access control policies for BigLake metastore Iceberg tables\n\nUse Iceberg Spark procedures\n----------------------------\n\nTo use [Iceberg Spark procedures](https://iceberg.apache.org/docs/1.5.1/spark-procedures/),\nyou must include [Iceberg SQL extensions](https://iceberg.apache.org/docs/1.5.1/spark-configuration/#sql-extensions)\nin your Spark configuration. For example, you can\ncreate a procedure to roll back to a previous state.\n\n#### Use interactive Spark-SQL to roll back to a previous state\n\nYou can use an Iceberg Spark\nprocedure to create, modify, and roll back a table to its previous state. For\nexample:\n\n1. Create a Spark table:\n\n ```bash\n spark-sql \\\n --jars https://storage-download.googleapis.com/maven-central/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar \\\n --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \\\n --conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \\\n --conf spark.sql.catalog.CATALOG_NAME.catalog-impl=org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog \\\n --conf spark.sql.catalog.CATALOG_NAME.gcp_project=PROJECT_ID \\\n --conf spark.sql.catalog.CATALOG_NAME.warehouse=WAREHOUSE_DIRECTORY\n ```\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e: the catalog name that references your Spark table.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: the ID of the Google Cloud project.\n\n - \u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e: the URI of the\n Cloud Storage folder where your data warehouse is stored.\n\n ```googlesql\n USE `\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e`;\n CREATE NAMESPACE NAMESPACE_NAME;\n USE NAMESPACE NAMESPACE_NAME;\n CREATE TABLE NAMESPACE_NAME.TABLE_NAME (id int, data string) USING ICEBERG LOCATION '\u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e';\n INSERT INTO NAMESPACE_NAME.TABLE_NAME VALUES (1, \"first row\");\n DESCRIBE EXTENDED TABLE_NAME;\n ```\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eNAMESPACE_NAME\u003c/var\u003e: the namespace name that references your Spark table.\n - \u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e: a table name that references your Spark table.\n\n The output contains details about the table configuration: \n\n ```bash\n ...\n Table Properties [current-snapshot-id=1659239298328512231,format=iceberg/parquet,format-version=2,write.parquet.compression-codec=zstd]\n ...\n ```\n2. Alter the table again, and then roll it back to the previously created\n snapshot `1659239298328512231`:\n\n ```bash\n ALTER TABLE TABLE_NAME ADD COLUMNS (newDoubleCol double);\n INSERT INTO TABLE_NAME VALUES (2, \"second row\", 2.5);\n SELECT * FROM TABLE_NAME;\n CALL CATALOG_NAME.system.set_current_snapshot('\u003cvar translate=\"no\"\u003eNAMESPACE_NAME\u003c/var\u003e.\u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e', SNAPSHOT_ID);\n SELECT * FROM TABLE_NAME;\n ```\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eSNAPSHOT_ID\u003c/var\u003e: the ID of the snapshot you are rolling back to.\n\n The output is similar to the following: \n\n ```bash\n 1 first row\n Time taken: 0.997 seconds, Fetched 1 row(s)\n ```\n\nFilter unsupported tables from table listing functions\n------------------------------------------------------\n\nWhen you use Spark SQL with the BigLake metastore\ncatalog, the `SHOW TABLES` command shows all the tables in the specified\nnamespace, even those that aren't compatible with\nSpark.\n\nTo only display supported tables, turn on the `filter_unsupported_tables`\noption: \n\n```bash\nspark-sql\n --jars https://storage-download.googleapis.com/maven-central/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1/iceberg-spark-runtime-3.5_2.12-1.6.1.jar,gs://spark-lib/bigquery/iceberg-bigquery-catalog-1.6.1-1.0.1-beta.jar \\\n --conf spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog \\\n --conf spark.sql.catalog.CATALOG_NAME.catalog-impl=org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog \\\n --conf spark.sql.catalog.CATALOG_NAME.gcp_project=PROJECT_ID \\\n --conf spark.sql.catalog.CATALOG_NAME.gcp_location=LOCATION \\\n --conf spark.sql.catalog.CATALOG_NAME.warehouse=WAREHOUSE_DIRECTORY \\\n --conf spark.sql.catalog.CATALOG_NAME.filter_unsupported_tables=\"true\"\n```\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e: the name of the Spark catalog to use.\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: the ID of the Google Cloud project to use.\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: the location of the BigQuery resources.\n- \u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e: the Cloud Storage folder to use as the data warehouse.\n\nSet a BigQuery connection override\n----------------------------------\n\nYou can use BigQuery connections to access data stored\noutside of BigQuery, such as in Cloud Storage.\n\nTo set a [BigQuery connection\noverride](/bigquery/docs/create-cloud-resource-connection#create-cloud-resource-connection) that provides access to a Cloud Storage bucket, complete\nthe following steps:\n\n1. In your BigQuery project, create a new connection to your\n Cloud Storage resource. This connection defines how\n BigQuery accesses your data.\n\n2. Grant the user or service account accessing the data the\n `roles/bigquery.connectionUser` role on the connection.\n\n Make sure that the connection resource shares the same location as the\n target resources in BigQuery. For more information, see\n [Manage connections](/bigquery/docs/working-with-connections).\n3. Specify the connection in your Iceberg table with the\n `bq_connection` property:\n\n ```googlesql\n CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG LOCATION '\u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e' TBLPROPERTIES ('bq_connection'='projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/connections/\u003cvar translate=\"no\"\u003eCONNECTION_ID\u003c/var\u003e');\n ```\n\n Replace the following:\n - \u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e: a table name for your Spark table.\n - \u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e: the URI of the Cloud Storage bucket that stores your data.\n - \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: the ID of the Google Cloud project to use.\n - \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: the [location](/bigquery/docs/locations) of the connection.\n - \u003cvar translate=\"no\"\u003eCONNECTION_ID\u003c/var\u003e: the ID of the connection.\n\nSet access control policies\n---------------------------\n\nYou can enable fine-grained access control (FGAC) on BigLake metastore\nIceberg tables by configuring access control policies. You\ncan only set access control policies on tables that use a\n[BigQuery connection\noverride](/bigquery/docs/blms-features#connection-override). You can set these\npolicies in the following ways:\n\n- [Column-level security](/bigquery/docs/column-level-security)\n- [Row-level security](/bigquery/docs/managing-row-level-security)\n- [Data masking](/bigquery/docs/column-data-masking)\n\nAfter you configure your FGAC policies, you can query the table from\nSpark using the following example: \n\n```python\nfrom pyspark.sql import SparkSession\n\n# Create a Spark session\nspark = SparkSession.builder \\\n.appName(\"BigLake Metastore Iceberg\") \\\n.config(\"spark.sql.catalog.\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e\", \"org.apache.iceberg.spark.SparkCatalog\") \\\n.config(\"spark.sql.catalog.\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e.catalog-impl\", \"org.apache.iceberg.gcp.bigquery.BigQueryMetastoreCatalog\") \\\n.config(\"spark.sql.catalog.\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e.gcp_project\", \"\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\") \\\n.config(\"spark.sql.catalog.\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e.gcp_location\", \"\u003cvar translate=\"no\"\u003eLOCATION\u003cvar translate=\"no\"\u003e\"\u003c/var\u003e\u003c/var\u003e) \\\n.config(\"spark.sql.catalog.\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e.warehouse\", \"\u003cvar translate=\"no\"\u003eWAREHOUSE_DIRECTORY\u003c/var\u003e\") \\\n.getOrCreate()\n\nspark.sql(\"USE `\u003cvar translate=\"no\"\u003eCATALOG_NAME\u003c/var\u003e`;\")\n\n# Configure spark for storing temp results\nspark.conf.set(\"viewsEnabled\",\"true\")\nspark.sql(\"CREATE namespace if not exists \u003cvar translate=\"no\"\u003eMATERIALIZATION_NAMESPACE\u003c/var\u003e\");\nspark.conf.set(\"materializationDataset\",\"\u003cvar translate=\"no\"\u003eMATERIALIZATION_NAMESPACE\u003c/var\u003e\")\n\nspark.sql(\"USE NAMESPACE \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e;\")\n\nsql = \"\"\"SELECT * FROM \u003cvar translate=\"no\"\u003eDATASET_NAME\u003c/var\u003e.\u003cvar translate=\"no\"\u003eICEBERG_TABLE_NAME\u003c/var\u003e\"\"\"\ndf = spark.read.format(\"bigquery\").load(sql)\ndf.show()\n```\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eCATALOG_NAME\u003c/code\u003e\u003c/var\u003e: the name of your catalog.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003ePROJECT_ID\u003c/code\u003e\u003c/var\u003e: the ID of the project that contains your BigQuery resources.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eLOCATION\u003c/code\u003e\u003c/var\u003e: the [location](/bigquery/docs/locations) of the BigQuery resources.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eWAREHOUSE_DIRECTORY\u003c/code\u003e\u003c/var\u003e: the URI of the Cloud Storage folder that contains your data warehouse.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eMATERIALIZATION_NAMESPACE\u003c/code\u003e\u003c/var\u003e: the namespace where you want to store temporary results.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eDATASET_NAME\u003c/code\u003e\u003c/var\u003e: the name of your dataset that contains the table that you are querying.\n- \u003cvar translate=\"no\"\u003e\u003ccode translate=\"no\" dir=\"ltr\"\u003eICEBERG_TABLE_NAME\u003c/code\u003e\u003c/var\u003e: the name the table that you are querying.\n\nWhat's next\n-----------\n\n- [Use BigLake metastore with Dataproc](/bigquery/docs/blms-use-dataproc)\n- [Use BigLake metastore with Dataproc Serverless](/bigquery/docs/blms-use-dataproc-serverless)"]]