# Dataproc principals

When you use the Dataproc service to create clusters and run jobs on your clusters, the service sets up the necessary [Dataproc roles and permissions](/dataproc/docs/concepts/iam/iam) in your project to access and use the Google Cloud resources it needs to accomplish these tasks. However, if you do cross-project work, for example to access data in another project, you must set up the roles and permissions needed to access cross-project resources.
To help you do cross-project work successfully, this document lists the different principals that use the Dataproc service and the roles that contain the permissions those principals need to access and use Google Cloud resources.
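Cross-project access is typically set up by adding the relevant principal to an IAM role on the other project. As a minimal sketch (not the service's own tooling), the snippet below edits an IAM policy in the REST API's JSON shape; the project number, emails, and bucket-viewer role choice are hypothetical placeholders:

```python
# Sketch: grant a Dataproc principal from project A access to resources in
# project B by adding it to a role binding in project B's IAM policy.
# Emails and the project number below are hypothetical placeholders.

def add_iam_binding(policy: dict, role: str, member: str) -> dict:
    """Add `member` to `role` in an IAM policy dict (REST API shape)."""
    for binding in policy.setdefault("bindings", []):
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return policy
    policy["bindings"].append({"role": role, "members": [member]})
    return policy

# Example: a cluster's VM service account needs to read Cloud Storage
# objects in another project, so it joins that project's viewer binding.
policy = {"bindings": [{"role": "roles/storage.objectViewer",
                        "members": ["user:someone@example.com"]}]}
add_iam_binding(policy, "roles/storage.objectViewer",
                "serviceAccount:1234567890-compute@developer.gserviceaccount.com")
```

In practice the same change is usually made with `gcloud projects add-iam-policy-binding` or the Google Cloud console rather than by editing the policy document by hand.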
There are three principals (identities) that access and use Dataproc:

1. User Identity
2. Control Plane Identity
3. Data Plane Identity
Dataproc API User (User identity)
---------------------------------

Example: *username@example.com*

This is the user that calls the Dataproc service to create clusters, submit jobs, and make other requests to the service. The user is usually an individual, but it can also be a [service account](/iam/docs/understanding-service-accounts) if Dataproc is invoked through an API client or from another Google Cloud service such as Compute Engine, Cloud Run functions, or Cloud Composer.

**Related roles**

- [Dataproc roles](/dataproc/docs/concepts/iam/iam#roles), [Project roles](/dataproc/docs/concepts/iam/iam#project_roles)
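Whichever identity makes the request, the shape of the call is the same; the caller is simply whoever authenticates it. As an illustrative sketch, the snippet below builds a `jobs.submit` request body in the Dataproc REST API's JSON shape (the cluster name and Cloud Storage path are hypothetical placeholders):

```python
# Sketch of a Dataproc jobs.submit request body (REST API JSON shape).
# The user identity is whoever authenticates the request that carries
# this body -- a person or a service account. Names are placeholders.

def build_pyspark_job(cluster_name: str, main_uri: str) -> dict:
    return {
        "job": {
            "placement": {"clusterName": cluster_name},
            "pysparkJob": {"mainPythonFileUri": main_uri},
        }
    }

request_body = build_pyspark_job("example-cluster",
                                 "gs://example-bucket/jobs/wordcount.py")
```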
**Notes**

- Jobs submitted through the Dataproc API run as `root` on Linux.
- Dataproc clusters inherit project-wide Compute Engine SSH metadata unless you explicitly block it by setting `--metadata=block-project-ssh-keys=true` when you create your cluster (see [Cluster metadata](/dataproc/docs/concepts/configuring-clusters/metadata)).
- HDFS user directories are created for each project-level SSH user. These HDFS directories are created at cluster deployment time; an SSH user added after deployment is not given an HDFS directory on existing clusters.
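To make the SSH-metadata note concrete, here is a sketch of a `clusters.create` request body (REST API JSON shape) that sets the documented metadata key; the cluster name is a hypothetical placeholder:

```python
# Sketch of a Dataproc clusters.create request body (REST JSON shape)
# that blocks project-wide SSH keys via cluster metadata. The metadata
# key is the one documented above; the cluster name is a placeholder.

def build_cluster_body(cluster_name: str, block_project_ssh_keys: bool) -> dict:
    body = {"clusterName": cluster_name, "config": {"gceClusterConfig": {}}}
    if block_project_ssh_keys:
        body["config"]["gceClusterConfig"]["metadata"] = {
            "block-project-ssh-keys": "true"
        }
    return body

body = build_cluster_body("example-cluster", block_project_ssh_keys=True)
```

With the gcloud CLI, the same setting is the `--metadata=block-project-ssh-keys=true` flag shown in the note above.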
Dataproc Service Agent (Control Plane identity)
-----------------------------------------------

Example: *service-<var translate="no">project-number</var>@dataproc-accounts.iam.gserviceaccount.com*

The [Dataproc Service Agent service account](/dataproc/docs/concepts/configuring-clusters/service-accounts#service_agent_account) is used to perform a broad set of system operations on resources located in the project where a Dataproc cluster is created, including:
- Creation of Compute Engine resources, including VM instances, instance groups, and instance templates
- `get` and `list` operations to confirm the configuration of resources such as images, firewalls, Dataproc initialization actions, and Cloud Storage buckets
- Auto-creation of the Dataproc [staging and temp buckets](/dataproc/docs/concepts/configuring-clusters/staging-bucket) if the staging or temp bucket is not specified by the user
- Writing cluster configuration metadata to the staging bucket
- Accessing [VPC networks in a host project](/dataproc/docs/concepts/configuring-clusters/network#create_a_cluster_that_uses_a_network_in_another_project)

**Related roles**

- [Dataproc Service Agent](/iam/docs/understanding-roles#dataproc.serviceAgent)

Dataproc VM service account (Data Plane identity)
-------------------------------------------------

Example: *<var translate="no">project-number</var>-compute@developer.gserviceaccount.com*

Your application code runs as the [VM service account](/dataproc/docs/concepts/configuring-clusters/service-accounts#VM_service_account) on Dataproc VMs. User jobs are granted the roles (with their associated permissions) of this service account.

The VM service account does the following:

- Communicates with the [Dataproc control plane](#service-agent).
- Reads and writes data from and to the [Dataproc staging and temp buckets](/dataproc/docs/concepts/configuring-clusters/staging-bucket).
- As needed by your Dataproc jobs, reads and writes data from and to Cloud Storage, BigQuery, Cloud Logging, and other Google Cloud resources.

**Related roles**

- [Dataproc Worker](/dataproc/docs/concepts/iam/iam#roles)
- [Cloud Storage roles](/storage/docs/access-control/iam-roles#standard-roles)
- [BigQuery roles](/bigquery/docs/access-control#bigquery)

**Note:** For interactive workloads, users can opt to use their user identity to access Cloud Storage objects in buckets owned by the same project that contains the cluster (see [Dataproc Personal Cluster Authentication](/dataproc/docs/concepts/iam/personal-auth)).

What's next
-----------

- Learn more about [Dataproc roles and permissions](/dataproc/docs/concepts/iam/iam).
- Learn more about [Dataproc service accounts](/dataproc/docs/concepts/configuring-clusters/service-accounts).
- See [BigQuery Access Control](/bigquery/docs/access-control).
- See [Cloud Storage Access Control options](/storage/docs/access-control).

*Last updated 2025-08-22 UTC.*