This document describes how to create a Dataplex Universal Catalog lake. You can create a lake in any of the regions that support Dataplex Universal Catalog.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project:
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Dataplex, Dataproc, Dataproc Metastore, BigQuery, and Cloud Storage APIs.
  Roles required to enable APIs: To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
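If you prefer the command line to the console, the APIs listed above can also be enabled with `gcloud services enable`. The following is a minimal sketch, not the documented procedure: the function name `enable_lake_apis` is hypothetical, and the service names are the ones these products are assumed to use, so confirm them with `gcloud services list --available` before relying on them.

```shell
# Sketch: enable the APIs that a Dataplex Universal Catalog lake depends on.
# enable_lake_apis is a hypothetical helper name; the service names below are
# assumptions to verify against `gcloud services list --available`.
enable_lake_apis() {
  # $1 = ID of the project in which to enable the APIs
  gcloud services enable \
    dataplex.googleapis.com \
    dataproc.googleapis.com \
    metastore.googleapis.com \
    bigquery.googleapis.com \
    storage.googleapis.com \
    --project "${1:?project ID required}"
}
```

Call it as `enable_lake_apis PROJECT_ID`. The caller needs the Service Usage Admin role described above.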
Access control
- To create and manage your lake, make sure that you have been granted one of the predefined roles roles/dataplex.admin or roles/dataplex.editor. For more information, see grant a single role.
- To attach a Cloud Storage bucket from another project to your lake, grant the Dataplex Universal Catalog service account an administrator role on the bucket by running the following command:

      gcloud alpha dataplex lakes authorize \
        --project PROJECT_ID_OF_LAKE \
        --storage-bucket-resource BUCKET_NAME
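One way to grant the predefined roles mentioned above is a project-level IAM binding. The sketch below assumes the role is granted on the project that holds the lake; the helper name `grant_lake_role` and the example member are hypothetical, and your organization may grant roles at a different level or through groups.

```shell
# Sketch: grant a predefined Dataplex role to one principal on a project.
# grant_lake_role is a hypothetical helper name, not part of gcloud.
grant_lake_role() {
  # $1 = project ID
  # $2 = member, for example user:alice@example.com
  # $3 = role (defaults to roles/dataplex.editor when omitted)
  gcloud projects add-iam-policy-binding "${1:?project ID required}" \
    --member "${2:?member required}" \
    --role "${3:-roles/dataplex.editor}"
}
```

For example, `grant_lake_role PROJECT_ID user:alice@example.com roles/dataplex.admin` grants the administrator role instead of the default editor role.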
Create a metastore
You can access Dataplex Universal Catalog metadata using Hive Metastore in Spark queries by associating a Dataproc Metastore service instance with your Dataplex Universal Catalog lake. You need to have a gRPC-enabled Dataproc Metastore (version 3.1.2 or higher) associated with the Dataplex Universal Catalog lake.
- Create a Dataproc Metastore service. 
- Configure the Dataproc Metastore service instance to expose a gRPC endpoint (instead of the default Thrift Metastore endpoint):

      curl -X PATCH \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services/SERVICE_ID?updateMask=hiveMetastoreConfig.endpointProtocol" \
        -d '{"hiveMetastoreConfig": {"endpointProtocol": "GRPC"}}'

- View the gRPC endpoint:

      gcloud metastore services describe SERVICE_ID \
        --project PROJECT_ID \
        --location LOCATION \
        --format "value(endpointUri)"
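Before attaching the service to a lake, it can help to confirm that the protocol change took effect. The sketch below reuses the `describe` command from the steps above, but reads the `hiveMetastoreConfig.endpointProtocol` field instead of the endpoint URI. It assumes that `describe` exposes the field under the same path used in the PATCH request body; the helper name `metastore_uses_grpc` is hypothetical.

```shell
# Sketch: succeed only if the Dataproc Metastore service reports a gRPC endpoint.
# metastore_uses_grpc is a hypothetical helper name. The field path is assumed
# to mirror the one used in the PATCH request body.
metastore_uses_grpc() {
  # $1 = service ID, $2 = project ID, $3 = location
  protocol=$(gcloud metastore services describe "$1" \
    --project "$2" \
    --location "$3" \
    --format "value(hiveMetastoreConfig.endpointProtocol)") || return 1
  # Exit status 0 when the protocol is GRPC, non-zero otherwise.
  [ "${protocol}" = "GRPC" ]
}
```

A typical use is `metastore_uses_grpc SERVICE_ID PROJECT_ID LOCATION && echo "gRPC endpoint enabled"`.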
Create a lake
Console
- In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page. 
- Click Create. 
- Enter a Display name. 
- The lake ID is automatically generated for you. If you prefer, you can provide your own ID. See Resource naming convention. 
- Optional: Enter a Description. 
- Specify the Region in which to create the lake. For lakes created in a given region (for example, us-central1), you can attach both single-region (us-central1) data and multi-region (us multi-region) data, depending on the zone settings.
- Optional: Add labels to your lake. 
- Optional: In the Metastore section, click the Metastore service menu, and select the Dataproc Metastore service that you created earlier (see Create a metastore).
- Click Create. 
gcloud
To create a lake, use the gcloud alpha dataplex lakes create command:
    gcloud alpha dataplex lakes create LAKE \
      --location=LOCATION \
      --labels=k1=v1,k2=v2,k3=v3 \
      --metastore-service=METASTORE_SERVICE
Replace the following:
- LAKE: name of the new lake
- LOCATION: the Google Cloud region in which to create the lake (for example, us-central1)
- k1=v1,k2=v2,k3=v3: labels used (if any)
- METASTORE_SERVICE: the Dataproc Metastore service, if created
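After the create command returns, you may want to confirm that the lake exists. The following sketch uses the matching `describe` command and prints only the lake's `state` field. The helper name `lake_state` is hypothetical, and the sketch assumes that the `alpha` track used above also provides `describe`.

```shell
# Sketch: print the state of an existing lake.
# lake_state is a hypothetical helper name; it assumes the alpha track
# exposes `lakes describe` alongside `lakes create`.
lake_state() {
  # $1 = lake ID, $2 = location (region) of the lake
  gcloud alpha dataplex lakes describe "${1:?lake ID required}" \
    --location="${2:?location required}" \
    --format "value(state)"
}
```

For example, `lake_state LAKE LOCATION` prints the state that the service reports for that lake.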
REST
To create a lake, use the lakes.create method.
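As an illustration of the REST call, the sketch below sends a POST request in the same style as the curl command used for the metastore. It assumes the v1 endpoint `https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/lakes` with the lake ID passed as the `lakeId` query parameter, and it sets only the `displayName` field. Check the lakes.create reference for the full request body. The helper name `create_lake_rest` is hypothetical.

```shell
# Sketch: create a lake through the REST API with curl.
# create_lake_rest is a hypothetical helper name. The endpoint path and the
# lakeId query parameter are assumptions to check against the lakes.create
# reference.
create_lake_rest() {
  # $1 = project ID, $2 = location, $3 = lake ID, $4 = display name
  # The display name is inserted into the JSON body as-is, so it must not
  # contain double quotes or backslashes.
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataplex.googleapis.com/v1/projects/$1/locations/$2/lakes?lakeId=$3" \
    -d "{\"displayName\": \"$4\"}"
}
```

For example, `create_lake_rest PROJECT_ID LOCATION LAKE "Sales lake"` submits the request using the credentials of the active gcloud account.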
What's next?
- Learn how to add zones to a lake.
- Learn how to attach assets to a zone.
- Learn how to secure your lake.
- Learn how to manage your lake.