You can install additional components, such as Delta Lake, when you create a Dataproc cluster by using the Optional components feature. This page describes how to install the Delta Lake component on a Dataproc cluster.
When installed on a Dataproc cluster, the Delta Lake component installs
Delta Lake libraries and configures Spark and Hive on the cluster to work with Delta Lake.
Compatible Dataproc image versions
You can install the Delta Lake component on Dataproc clusters created with Dataproc image version 2.2.46 and later image versions.
Delta Lake related properties
When you create a Dataproc cluster with the Delta Lake component enabled, the following Spark properties are configured to work with Delta Lake.
Configuration file | Property | Default value
/etc/spark/conf/spark-defaults.conf | spark.sql.extensions | io.delta.sql.DeltaSparkSessionExtension
/etc/spark/conf/spark-defaults.conf | spark.sql.catalog.spark_catalog | org.apache.spark.sql.delta.catalog.DeltaCatalog
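On a cluster with the component enabled, you don't need to set these properties yourself. For illustration only, the following minimal PySpark sketch shows the equivalent manual configuration when building a SparkSession outside of that setup; it assumes the Delta Lake libraries are already available on the classpath.

# Minimal sketch: the same properties the component writes to
# /etc/spark/conf/spark-defaults.conf, applied when building a
# SparkSession by hand. Assumes the Delta Lake jars are on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-lake-example")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)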
Install the component
Install the component when you create a Dataproc cluster by using the Google Cloud console, the Google Cloud CLI, or the Dataproc API.
Console
In the Google Cloud console, go to the Dataproc Create a cluster page. The Set up cluster panel is selected.
In the Components section, under Optional components, select Delta Lake and any other optional components to install on your cluster.
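gcloud CLI
To create the cluster with the Google Cloud CLI instead, pass the --optional-components flag to the gcloud dataproc clusters create command. In this sketch, CLUSTER_NAME and REGION are placeholders for your cluster name and Compute Engine region:

gcloud dataproc clusters create CLUSTER_NAME \
    --optional-components=DELTA \
    --region=REGION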
Usage examples
This section provides examples of reading and writing data using Delta Lake tables.
Delta Lake table
Write to a Delta Lake table
You can use a Spark DataFrame
to write data to a Delta Lake table. The following examples create a DataFrame
with sample data, create a my_delta_table Delta Lake table in
Cloud Storage, and then write the data to the Delta Lake table.
PySpark
# Create a DataFrame with sample data.
data = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Create a Delta Lake table in Cloud Storage.
spark.sql("""CREATE TABLE IF NOT EXISTS my_delta_table (
  id integer,
  name string)
USING delta
LOCATION 'gs://delta-gcs-demo/example-prefix/default/my_delta_table'""")

# Write the DataFrame to the Delta Lake table in Cloud Storage.
data.writeTo("my_delta_table").append()
Scala
// Create a DataFrame with sample data.
val data = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

// Create a Delta Lake table in Cloud Storage.
spark.sql("""CREATE TABLE IF NOT EXISTS my_delta_table (
  id integer,
  name string)
USING delta
LOCATION 'gs://delta-gcs-demo/example-prefix/default/my_delta_table'""")

// Write the DataFrame to the Delta Lake table in Cloud Storage.
data.write.format("delta").mode("append").saveAsTable("my_delta_table")
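Read from a Delta Lake table
The following PySpark sketch reads the my_delta_table table created by the write examples above and displays its contents:

# Read the Delta Lake table into a DataFrame.
df = spark.table("my_delta_table")

# Display the data.
df.show()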
Hive with Delta Lake
The component also configures Hive on the cluster to work with Delta Lake through the Hive connector: a Hive external table references a Delta table by using the io.delta.hive.DeltaStorageHandler storage handler, as shown below. This class implements the Hive data source APIs; it can load a Delta table and extract its metadata. If the table schema in the CREATE TABLE statement is not consistent with the underlying Delta Lake metadata, an error is thrown.
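The following sketch, run from a Hive client such as beeline, creates a Hive external table named deltaTable over the my_delta_table location used in the write examples above. The Delta Lake table itself must already have been created with Spark, as in those examples.

SET hive.input.format=io.delta.hive.HiveInputFormat;
SET hive.tez.input.format=io.delta.hive.HiveInputFormat;

CREATE EXTERNAL TABLE deltaTable(id INT, name STRING)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION 'gs://delta-gcs-demo/example-prefix/default/my_delta_table';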
Read from a Delta Lake table in Hive
To read data from a Delta table, use a SELECT statement:
SELECT * FROM deltaTable;
Drop a Delta Lake table
To drop a Delta table, use the DROP TABLE statement:
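DROP TABLE deltaTable;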