# Use Spark SQL with Dataproc Metastore

Last updated 2025-09-02 UTC.

This page shows you an example of using Spark SQL with a Dataproc Metastore
service.
In this example, you launch a Spark SQL session on a Dataproc cluster
and run some sample commands to create a database and table.

Before you begin
----------------

- Create a [Dataproc Metastore service](/dataproc-metastore/docs/create-service).
- Attach the [Dataproc Metastore service to a Dataproc cluster](/dataproc-metastore/docs/attach-dataproc).

Connect to Spark SQL
--------------------

To start using Spark SQL, use SSH to connect to the Dataproc cluster that's
associated with your Dataproc Metastore service. After you connect to
the cluster with SSH, you can run Spark commands to manage your metadata.

**To connect to Spark SQL**

1. In the Google Cloud console, go to the [VM
   Instances](https://console.cloud.google.com/compute/instances) page.
2. In the list of virtual machine instances, click **SSH** in the row of the
   Dataproc VM instance that you want to connect to.

A browser window opens in your home directory on the node with output similar
to the following:

    Connected, host fingerprint: ssh-rsa ...
    Linux cluster-1-m 3.16.0-0.bpo.4-amd64 ...
    ...
    example-cluster@cluster-1-m:~$

To start Spark and create a database and table, run the following commands in
the SSH session:

1. Start the Spark shell.

        spark-shell

2. Create a database called `myDB`.

        spark.sql("create database myDB");

3. Use the database you created.

        spark.sql("use myDB");

4. Create a table called `myTable`.

        spark.sql("create table myTable(id int,name string)");

5. List the tables under `myDB`.

        spark.sql("show tables").show();

6. Describe the table you created.
        spark.sql("desc myTable").show();

Running these commands shows output similar to the following:

    $ spark-shell

    scala> spark.sql("create database myDB");

    scala> spark.sql("use myDB");

    scala> spark.sql("create table myTable(id int,name string)");

    scala> spark.sql("show tables").show();

    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    |    myDB|  myTable|      false|
    +--------+---------+-----------+

    scala> spark.sql("desc myTable").show();

    +--------+---------+-------+
    |col_name|data_type|comment|
    +--------+---------+-------+
    |      id|      int|   null|
    |    name|   string|   null|
    +--------+---------+-------+

What's next
-----------

- [Import metadata](/dataproc-metastore/docs/import-metadata)
- [Export metadata](/dataproc-metastore/docs/export-metadata)
- [Use Apache Hive](/dataproc-metastore/docs/use-hive)
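As an alternative to the **SSH** button in the console, you can open the same
SSH session from a local terminal with the gcloud CLI. This is a minimal
sketch: `CLUSTER_NAME`, `ZONE`, and `PROJECT_ID` are placeholders for your own
values, and the `-m` suffix assumes the cluster's default master node name.

```shell
# Connect to the Dataproc cluster's master node over SSH.
# CLUSTER_NAME, ZONE, and PROJECT_ID are placeholders for your own values;
# a Dataproc cluster's master node is named CLUSTER_NAME-m by default.
gcloud compute ssh CLUSTER_NAME-m \
    --zone=ZONE \
    --project=PROJECT_ID

# Once connected, start the Spark shell as shown above.
spark-shell
```
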