# Dataproc principals

Last updated 2025-04-29 (UTC).

When you use the Dataproc service to create clusters and run
jobs on your clusters, the service sets up the necessary
[Dataproc roles and permissions](/dataproc/docs/concepts/iam/iam)
in your project to access and use the Google Cloud resources it needs to accomplish
these tasks. However, if you do cross-project work, for example to access data
in another project, you need to set up the roles and permissions required
to access the resources in that other project yourself.

To help you do cross-project work successfully, this document lists the different
principals that use the Dataproc service
and the roles that contain the necessary permissions for those principals to access
and use Google Cloud resources.

Three principals (identities) access and use the Dataproc service:

1. User identity
2. Control Plane identity
3. Data Plane identity

Dataproc API User (User identity)
---------------------------------

Example: *username@example.com*

This is the user that calls the Dataproc service to create
clusters, submit jobs, and make other requests to the service. The user
is usually an individual, but it can also be a
[service account](/iam/docs/understanding-service-accounts)
if Dataproc is invoked through an API client or from another
Google Cloud service such as Compute Engine, Cloud Run functions, or Cloud Composer.

**Related roles**

- [Dataproc roles](/dataproc/docs/concepts/iam/iam#roles), [Project roles](/dataproc/docs/concepts/iam/iam#project_roles)

**Notes**

- Dataproc API-submitted jobs run as `root` on Linux.
- Dataproc clusters inherit project-wide Compute Engine
  SSH metadata unless you explicitly block it by setting
  `--metadata=block-project-ssh-keys=true` when you create your cluster
  (see
  [Cluster metadata](/dataproc/docs/concepts/configuring-clusters/metadata)).
- An HDFS user directory is created for each project-level SSH user. These
  HDFS directories are created at cluster deployment time, so an SSH user added
  after deployment is not given an HDFS directory on existing clusters.

Dataproc Service Agent (Control Plane identity)
-----------------------------------------------

Example: *service-<var translate="no">project-number</var>@dataproc-accounts.iam.gserviceaccount.com*

The
[Dataproc Service Agent service account](/dataproc/docs/concepts/configuring-clusters/service-accounts#service_agent_account)
is used to perform a broad set of system operations on resources located
in the project where a Dataproc cluster is created, including:

- Creation of Compute Engine resources, including VM instances, instance groups, and instance templates
- `get` and `list` operations to confirm the configuration of resources such as images, firewalls, Dataproc initialization actions, and Cloud Storage buckets
- Auto-creation of the Dataproc [staging and temp buckets](/dataproc/docs/concepts/configuring-clusters/staging-bucket) if the user does not specify a staging or temp bucket
- Writing cluster configuration metadata to the staging bucket
- Accessing [VPC networks in a host project](/dataproc/docs/concepts/configuring-clusters/network#create_a_cluster_that_uses_a_network_in_another_project)

**Related roles**

- [Dataproc Service Agent](/iam/docs/understanding-roles#dataproc.serviceAgent)

Dataproc VM service account (Data Plane identity)
-------------------------------------------------

Example: *<var translate="no">project-number</var>-compute@developer.gserviceaccount.com*

Your application code runs as the
[VM service account](/dataproc/docs/concepts/configuring-clusters/service-accounts#VM_service_account)
on Dataproc VMs. User jobs are granted the roles (with their
associated permissions) of this service account.

The VM service account does the following:

- Communicates with the [Dataproc control plane](#service-agent).
- Reads and writes data from and to the [Dataproc staging and temp buckets](/dataproc/docs/concepts/configuring-clusters/staging-bucket).
- As needed by your Dataproc jobs, reads and writes data from and to Cloud Storage, BigQuery, Cloud Logging, and other Google Cloud resources.

**Related roles**

- [Dataproc Worker](/dataproc/docs/concepts/iam/iam#roles)
- [Cloud Storage roles](/storage/docs/access-control/iam-roles#standard-roles)
- [BigQuery roles](/bigquery/docs/access-control#bigquery)

| **Note:** For interactive workloads, users can opt to use their user identity to access Cloud Storage objects in buckets owned by the same project that contains the cluster (see [Dataproc Personal Cluster Authentication](/dataproc/docs/concepts/iam/personal-auth)).

What's next
-----------

- Learn more about [Dataproc roles and permissions](/dataproc/docs/concepts/iam/iam).
- Learn more about [Dataproc service accounts](/dataproc/docs/concepts/configuring-clusters/service-accounts).
- See [BigQuery Access Control](/bigquery/docs/access-control).
- See [Cloud Storage Access Control options](/storage/docs/access-control).
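As a minimal sketch of the cross-project setup this document describes, the following Python code does two things. It derives the example Control Plane and Data Plane emails for a given project number. It then adds the Data Plane identity to an IAM policy on a resource in another project. The sketch assumes that the cluster uses the default Compute Engine service account as its VM service account. Clusters created with a custom VM service account use a different email. The function names, the project number, and the sample bucket policy are hypothetical. The code only builds the policy dictionary. Reading and writing the real policy, for example through the Cloud Storage or BigQuery IAM APIs, is not shown.

```python
"""Sketch: derive example Dataproc principal emails and build a
cross-project IAM policy binding. All names here are hypothetical."""


def service_agent_email(project_number: str) -> str:
    # Control Plane identity (Dataproc Service Agent), per the example format.
    return f"service-{project_number}@dataproc-accounts.iam.gserviceaccount.com"


def default_vm_service_account_email(project_number: str) -> str:
    # Data Plane identity, ASSUMING the cluster uses the Compute Engine
    # default service account rather than a custom VM service account.
    return f"{project_number}-compute@developer.gserviceaccount.com"


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with `member` granted `role`."""
    # Copy each binding and its member list so the input is not mutated.
    bindings = [
        dict(b, members=list(b.get("members", [])))
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role: create one.
        bindings.append({"role": role, "members": [member]})
    new_policy = dict(policy)
    new_policy["bindings"] = bindings
    return new_policy


if __name__ == "__main__":
    # Hypothetical number of the project that contains the cluster.
    cluster_project_number = "123456789012"
    vm_sa = default_vm_service_account_email(cluster_project_number)
    # Hypothetical existing policy on a bucket owned by another project.
    bucket_policy = {
        "bindings": [
            {"role": "roles/storage.objectViewer",
             "members": ["user:username@example.com"]},
        ]
    }
    updated = add_binding(
        bucket_policy, "roles/storage.objectViewer", f"serviceAccount:{vm_sa}"
    )
    print(service_agent_email(cluster_project_number))
    print(updated)
```

The sketch grants the VM service account a read-only Cloud Storage role because the jobs in this example only read the other project's data. In practice, grant the narrowest role that your jobs need. `add_binding` returns a new dictionary instead of editing its input. This fits the read-modify-write pattern that IAM policies follow (`getIamPolicy`, then `setIamPolicy`).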