Create a Dataplex Universal Catalog lake
This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any of the regions that support Dataplex Universal Catalog.
Before you begin
Note: For Dataplex Universal Catalog use cases, don't create or use a lake with the ID goog-dataproc-default-lake. That ID serves as the default lake and is intended exclusively for Dataproc workspaces.
Sign in to your Google Cloud account. If you're new to
Google Cloud,
create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
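Verify that billing is enabled for your Google Cloud project.
Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.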
Access control
To create and manage your lake, make sure that you have been granted the predefined roles roles/dataplex.admin or roles/dataplex.editor.
For more information, see Grant a single role.
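For example, one way to grant one of these roles at the project level is with gcloud. This is a minimal sketch; PROJECT_ID and USER_EMAIL are placeholders for your project and the user to grant the role to:

    # Example only: grant the Dataplex administrator role at the project level
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="user:USER_EMAIL" \
        --role="roles/dataplex.admin"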
To attach a Cloud Storage bucket from another project to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:
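    gcloud alpha dataplex lakes authorize \
        --project PROJECT_ID_OF_LAKE \
        --storage-bucket-resource BUCKET_NAME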
Create a metastore
You can access Dataplex Universal Catalog metadata using Hive Metastore in Spark queries by associating a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. You need a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher) associated with the Dataplex Universal Catalog lake.
Create a Dataproc Metastore service, and then configure the service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint):
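    curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
        -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'
This step might take a few minutes to complete. To view the gRPC endpoint, run:
    gcloud metastore services describe SERVICE_ID \
        --project PROJECT_ID \
        --location LOCATION \
        --format "value(endpointUri)"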
Create a lake
Console
In the Google Cloud console, go to Dataplex Universal Catalog and navigate to the Manage view.
Click Create.
Enter a Display name.
The lake ID is automatically generated for you. If you prefer, you can provide your own ID. For more information, see Resource naming convention.
Optional: Enter a description.
Specify the region in which to create the lake.
For lakes created in a given region (for example, us-central1), you can attach both single-region data (us-central1) and multi-region data (us multi-region), depending on the zone settings.
Optional: Add labels to your lake.
Optional: In the Metastore section, click the Metastore service menu, and select the service you created in the Before you begin section.
Click Create.
gcloud
To create a lake, use the gcloud alpha dataplex lakes create command:
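    gcloud alpha dataplex lakes create LAKE \
        --location=LOCATION \
        --labels=k1=v1,k2=v2,k3=v3 \
        --metastore-service=METASTORE_SERVICE
Replace the following:
LAKE: name of the new lake
LOCATION: a Google Cloud region
k1=v1,k2=v2,k3=v3: labels to apply, if any
METASTORE_SERVICE: the Dataproc Metastore service, if you created one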
[[["Leicht verständlich","easyToUnderstand","thumb-up"],["Mein Problem wurde gelöst","solvedMyProblem","thumb-up"],["Sonstiges","otherUp","thumb-up"]],[["Schwer verständlich","hardToUnderstand","thumb-down"],["Informationen oder Beispielcode falsch","incorrectInformationOrSampleCode","thumb-down"],["Benötigte Informationen/Beispiele nicht gefunden","missingTheInformationSamplesINeed","thumb-down"],["Problem mit der Übersetzung","translationIssue","thumb-down"],["Sonstiges","otherDown","thumb-down"]],["Zuletzt aktualisiert: 2025-08-19 (UTC)."],[[["\u003cp\u003eDataplex lakes can be created in any region that supports Dataplex, but it is critical to not use the ID \u003ccode\u003egoog-dataproc-default-lake\u003c/code\u003e as it is reserved for Dataproc workspaces.\u003c/p\u003e\n"],["\u003cp\u003eTo manage a Dataplex lake, users need to have the \u003ccode\u003eroles/dataplex.admin\u003c/code\u003e or \u003ccode\u003eroles/dataplex.editor\u003c/code\u003e predefined roles.\u003c/p\u003e\n"],["\u003cp\u003eAccessing Dataplex metadata in Spark queries requires associating a gRPC-enabled Dataproc Metastore service (version 3.1.2 or higher) with your Dataplex lake.\u003c/p\u003e\n"],["\u003cp\u003eWhen creating a Dataplex lake, you can optionally associate a Dataproc Metastore service to the lake within the "Metastore" section of the lake creation process in the Google Cloud console, or with a command line argument.\u003c/p\u003e\n"],["\u003cp\u003eYou can create a lake either in the Google Cloud console or using the gcloud command line, using the \u003ccode\u003egcloud alpha dataplex lakes create\u003c/code\u003e command, or by using the REST API's \u003ccode\u003elakes.create\u003c/code\u003e method.\u003c/p\u003e\n"]]],[],null,["# Create a Dataplex Universal Catalog lake\n\nThis document describes how to create a Dataplex Universal Catalog lake. You can\ncreate a lake in any of the [regions](/dataplex/docs/locations) that support\nDataplex Universal Catalog.\n\nBefore you begin\n----------------\n\n| **Note:** For Dataplex Universal Catalog use cases, don't create or use a lake with the ID `goog-dataproc-default-lake`. The ID `goog-dataproc-default-lake` serves as the default lake exclusively intended for Dataproc workspaces.\n\n- Sign in to your Google Cloud account. If you're new to Google Cloud, [create an account](https://console.cloud.google.com/freetrial) to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.\n- In the Google Cloud console, on the project selector page,\n select or create a Google Cloud project.\n\n | **Note**: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. 
After you finish these steps, you can delete the project, removing all resources associated with the project.\n\n [Go to project selector](https://console.cloud.google.com/projectselector2/home/dashboard)\n-\n [Verify that billing is enabled for your Google Cloud project](/billing/docs/how-to/verify-billing-enabled#confirm_billing_is_enabled_on_a_project).\n\n-\n\n\n Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage\n APIs.\n\n\n [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataplex.googleapis.com,dataproc.googleapis.com,metastore.googleapis.com,bigquery.googleapis.com,storage.googleapis.com)\n\n- In the Google Cloud console, on the project selector page,\n select or create a Google Cloud project.\n\n | **Note**: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.\n\n [Go to project selector](https://console.cloud.google.com/projectselector2/home/dashboard)\n-\n [Verify that billing is enabled for your Google Cloud project](/billing/docs/how-to/verify-billing-enabled#confirm_billing_is_enabled_on_a_project).\n\n-\n\n\n Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage\n APIs.\n\n\n [Enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataplex.googleapis.com,dataproc.googleapis.com,metastore.googleapis.com,bigquery.googleapis.com,storage.googleapis.com)\n\n\u003cbr /\u003e\n\n### Access control\n\n1. To create and manage your lake, make sure you have the\n [predefined roles](/dataplex/docs/iam-roles#predefined-roles)\n `roles/dataplex.admin` or `roles/dataplex.editor` granted.\n For more information, see\n [grant a single role](/iam/docs/granting-changing-revoking-access#grant-single-role).\n\n2. To attach a Cloud Storage bucket from another project to your lake,\n grant the following Dataplex Universal Catalog service account an\n administrator role on the bucket by running the following command:\n\n gcloud alpha dataplex lakes authorize \\\n --project \u003cvar translate=\"no\"\u003ePROJECT_ID_OF_LAKE\u003c/var\u003e \\\n --storage-bucket-resource \u003cvar translate=\"no\"\u003eBUCKET_NAME\u003c/var\u003e\n\n### Create a metastore\n\nYou can access Dataplex Universal Catalog metadata using Hive Metastore in\n[Spark queries](/dataplex/docs/access-metadata) by associating a\nDataproc Metastore service instance with your\nDataplex Universal Catalog lake. You need to have a gRPC-enabled\nDataproc Metastore (version 3.1.2 or higher) associated with the\nDataplex Universal Catalog lake.\n\n1. Create a [Dataproc Metastore service](/dataproc-metastore/docs/create-service#create-a-service).\n\n2. Configure the Dataproc Metastore service instance to expose\n a gRPC endpoint (instead of the default Thrift Metastore endpoint):\n\n curl -X PATCH \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json\" \\\n \"https://metastore.googleapis.com/v1beta/projects/\u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e/locations/\u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e/services/\u003cvar translate=\"no\"\u003eSERVICE_ID\u003c/var\u003e?updateMask=hiveMetastoreConfig.endpointProtocol\" \\\n -d '{\"hiveMetastoreConfig\": {\"endpointProtocol\": \"GRPC\"}}'\n\n | **Note:** This step might take a few minutes to complete.\n3. 
View the gRPC endpoint:\n\n gcloud metastore services describe \u003cvar translate=\"no\"\u003eSERVICE_ID\u003c/var\u003e \\\n --project \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e \\\n --location \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e \\\n --format \"value(endpointUri)\"\n\nCreate a lake\n-------------\n\n### Console\n\n1. In the Google Cloud console, go to Dataplex Universal Catalog.\n\n [Go to Dataplex](https://console.cloud.google.com/dataplex/lakes)\n2. Navigate to the **Manage** view.\n\n3. Click add**Create**.\n\n4. Enter a **Display name**.\n\n5. The lake ID is automatically generated for you. If you prefer, you\n can provide your own ID. See [Resource naming convention](/compute/docs/naming-resources#resource-name-format).\n\n6. Optional: Enter a **Description**.\n\n7. Specify the **Region** in which to create the lake.\n\n For lakes created in a given region (for example, `us-central1`), you\n can attach both single-region (`us-central1`) data and multi-region\n (`us multi-region`) data depending on the zone settings.\n8. Optional: Add labels to your lake.\n\n9. Optional: In the **Metastore** section, click the **Metastore service**\n menu, and select the service you created in the [Before you begin](#metastore)\n section.\n\n10. Click **Create**.\n\n### gcloud\n\nTo create a lake, use the `gcloud alpha dataplex lakes create` command: \n\n```\ngcloud alpha dataplex lakes create LAKE \\\n --location=LOCATION \\\n --labels=k1=v1,k2=v2,k3=v3 \\\n --metastore-service=METASTORE_SERVICE\n```\n\nReplace the following:\n\n- \u003cvar translate=\"no\"\u003eLAKE\u003c/var\u003e: name of the new lake\n- \u003cvar translate=\"no\"\u003eLOCATION\u003c/var\u003e: refers to a Google Cloud region\n- \u003cvar translate=\"no\"\u003ek1=v1,k2=v2,k3=v3\u003c/var\u003e: labels used (if any)\n- \u003cvar translate=\"no\"\u003eMETASTORE_SERVICE\u003c/var\u003e: the Dataproc Metastore service, if created\n\n### REST\n\nTo create a lake, use the [lakes.create](/dataplex/docs/reference/rest/v1/projects.locations.lakes/create)\nmethod.\n| **Note:** It might take a few minutes for the lake to be created.\n\nWhat's next?\n------------\n\n- Learn how to [Add zones to a lake](/dataplex/docs/add-zone).\n- Learn how to [Attach assets to a zone](/dataplex/docs/manage-buckets#add-asset).\n- Learn how to [secure your lake](/dataplex/docs/lake-security#secure_your_lake).\n- Learn how to [manage your lake](/dataplex/docs/manage-lake)."]]