This page describes how to grant the Service Account User role on the Dataproc service account to the Cloud Data Fusion service agent, so that the agent can provision and run pipelines on Dataproc clusters.
For service accounts that Dataproc uses, you must also grant the datafusion.instances.runtime permission so that they can access Cloud Data Fusion runtime resources.
Note: Starting in Cloud Data Fusion version 6.2.3, you can grant these permissions in the UI when you create an instance. To get the permissions that you need to create a Cloud Data Fusion instance, ask your administrator to grant you the Service Account Admin (roles/iam.serviceAccountAdmin) IAM role on the Dataproc service account.
Whether the virtual machines in a cluster use a user-managed service account or the default Compute Engine service account, you must grant the Service Account User role to Cloud Data Fusion. Otherwise, Cloud Data Fusion cannot provision a Dataproc cluster, and the following error appears when you run a data pipeline:
PROVISION task failed in REQUESTING_CREATE state for program run [pipeline-name] due to Dataproc operation failure: INVALID_ARGUMENT: User not authorized to act as service account '[service-account-name]'
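When this error shows up in pipeline logs, the quoted service account at the end of the message is the one that needs the Service Account User grant. The following is a minimal sketch for pulling that name out of a log line. The function name and the regular expression are illustrative assumptions based only on the message text quoted above, not part of any Cloud Data Fusion API.

```python
import re
from typing import Optional

# Matches the tail of the provisioning error quoted above. The service
# account name is the text between the single quotes.
ACT_AS_ERROR = re.compile(
    r"User not authorized to act as service account '(?P<account>[^']+)'"
)


def account_from_error(message: str) -> Optional[str]:
    """Return the service account named in the error, or None if absent."""
    match = ACT_AS_ERROR.search(message)
    return match.group("account") if match else None
```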
Get the service account name
In the Google Cloud console, go to the Identity and Access Management page (https://console.cloud.google.com/iam-admin/iam).
From the project selector at the top of the page, choose the project, folder, or organization to which the Cloud Data Fusion instance belongs.
Find and copy the name of the Cloud Data Fusion service account. The name has the following format:
service-[project-number]@gcp-sa-datafusion.iam.gserviceaccount.com.
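If you already know the project number, the name can also be assembled from the format above instead of being copied from the console. The sketch below does that, and also builds the default Compute Engine service account name, which has the format [project-number]-compute@developer.gserviceaccount.com. The function names and the sample project number are hypothetical.

```python
def data_fusion_service_agent(project_number: int) -> str:
    """Cloud Data Fusion service account name for a project number."""
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def default_compute_service_account(project_number: int) -> str:
    """Default Compute Engine service account name for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"
```

For example, `data_fusion_service_agent(123456789012)` yields `service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com`. The project number is the numeric identifier of the project, not its project ID.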
Give service account user permission
In the Google Cloud console, go to the Service Accounts page (https://console.cloud.google.com/iam-admin/serviceaccounts).
Click Select a project, choose the project that contains the service account you want to use for the Dataproc cluster, and then click Open.
Click the email address of the Dataproc service account. When Cloud Data Fusion provisions a Dataproc cluster, you can specify which user-managed service account to use for the Dataproc virtual machines in that cluster. If you don't specify a service account, the default Compute Engine service account is used, which has the format [project-number]-compute@developer.gserviceaccount.com.
Click the Permissions tab. The page lists the principals that have been granted roles on the service account.
Click Grant access.
In the New principals field, paste the Cloud Data Fusion service account name that you copied earlier.
Select the Service Account User role.
Click Save.
Grant roles to Dataproc service accounts
Grant runner role permission
Grant the Cloud Data Fusion runner role (roles/datafusion.runner) to service accounts that are used by Dataproc. This authorizes the Dataproc service account to run Cloud Data Fusion pipelines in your project. For more information, see Requiring permission to attach service accounts to resources.
Grant Cloud Storage admin permission
In Cloud Data Fusion versions 6.2.0 and later, grant the Cloud Storage admin role (roles/storage.admin) to service accounts that are used by Dataproc in your project.
What's next
Learn more about access control in Cloud Data Fusion.
Learn more about Cloud Data Fusion service accounts.
Last updated: 2025-08-12 (UTC).