Dataproc Metastore administrator interface

This page explains how to use the Dataproc Metastore administrator interface.

The administrator interface provides you with a centralized tool to inspect and manage the metadata stored in your Dataproc Metastore service, all without having to connect to a Dataproc cluster or a Hive instance. Instead, you can manage your metadata with the gcloud CLI or Dataproc Metastore APIs.

For example, by using the administrator interface, you can run a SQL query directly on your backend metadata to fetch a specific table name. This process involves following fewer steps than the typical workflow, such as creating a Dataproc cluster, connecting to the cluster using SSH, starting a Hive instance, and then finally running a query (for example, SELECT * FROM table_name).

As a result, the administrator interface can help you save time and help decrease the amount of Google Cloud resources required to retrieve your data.

Before you begin

Required roles

To get the permissions that you need to use the Dataproc Metastore administrator interface, ask your administrator to grant you the following IAM roles on your project, based on the principle of least privilege:

  • To query Dataproc Metastore metadata: Metadata query administrator (roles/metastore.metadataQueryAdmin) on the user account or service account
  • To alter the resource location of your metadata, including databases, tables and partitions, or move a table to another database:
    • Metadata mutate administrator (roles/metastore.metadataMutateAdmin) on the user account or service account
    • Dataproc Metastore Editor (roles/metastore.editor) on the user account or service account

For more information about granting roles, see Manage access to projects, folders, and organizations.

These predefined roles contain the permissions required to use the Dataproc Metastore administrator interface. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to use the Dataproc Metastore administrator interface:

  • To query Dataproc Metastore metadata: metastore.services.queryMetadata
  • To alter or move Dataproc Metastore tables: metastore.services.mutateMetadata

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Dataproc Metastore IAM overview.

Supported administrator operations

You can only can run administrator interface operations by using the gcloud CLI or the Dataproc Metastore APIs. Administrator interface operations aren't supported in the Google Cloud console.

The administrator interface supports the following operations.

  • Read-only operations.

    • Query metadata.
  • Read and write operations.

    • Alter the resource location of your metadata, including databases, tables and partitions.
    • Alter table properties, such as a custom key-value pairs.
    • Move a table to another database.

Query metadata

This operation lets you look up metadata information in your database using SQL queries. After you run a query, the results are dumped into your artifacts Google Cloud bucket.

Before running this operation, note the following considerations:

  • Supported operations only include read-only MySQL or Spanner queries. If the query tries to modify the data, the operation fails.
  • The output file contains a maximum of 1000 rows. This configuration can't be changed.
  • The output files aren't automatically deleted. Instead, you must manually delete them from your Google Cloud bucket. If you don't delete them, you might incur extra storage costs.

gcloud CLI

  1. To query metadata, run the following gcloud metastore services query-metadata command:

    gcloud metastore services query-metadata SERVICE \
      --location=LOCATION \
      --query=QUERY

    Replace the following:

    • SERVICE: the name of your Dataproc Metastore service.
    • LOCATION: the Google Cloud region that your Dataproc Metastore service resides in.
    • QUERY: the SQL query to target your metadata.
      • If you're using a MySQL database, use a regular MySQL query.
      • If you're using a Spanner database, use a GoogleSQL query.
  2. View the output file in your artifacts Google Cloud bucket.

REST

curl -X POST -s -i \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -X POST -d '{"query": "QUERY"}' \
  -H "Content-Type:application/json" \
  https://metastore.googleapis.com/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:queryMetadata

Replace the following:

  • QUERY: the SQL query you're using to target your metadata.
    • If you're using a MySQL database, use a regular MySQL query.
    • If you're using a Spanner database, use a GoogleSQL query.
  • PROJECT_ID: the Google Cloud project ID that you're Dataproc Metastore service resides in.
  • SERVICE: the name of your Dataproc Metastore service.
  • LOCATION: the region that your Dataproc Metastore resides in.

The following example shows a sample command that runs a select * query from a database named DBS.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" -X POST -d  '{"query": "select * from DBS;"}' \
  https://metastore.googleapis.com/projects/dpms/locations/us-central1/services/dpms1:queryMetadata

Interpret the output of a query metadata operation

When you first run a query metadata command, Dataproc Metastore automatically creates a Google Cloud folder in your artifacts Google Cloud bucket. This folder is named query-results. After every successful query execution (API call), a new folder is created inside the query-results folder (which is named with a randomly generated UUID).

Each new folder contains a result manifest file with your query results. The location of this folder is returned in the response of your API call.

For example, in the response, the resultManifestUri field contains the location of the file.

"response": {
    "@type": "type.googleapis.com/google.cloud.metastore.QueryMetadataResponse",
    "resultManifestUri": "gs://gcs-bucket-6a3638b8-e319-46363-ad33-e632a5e/query-results/800156f5-2d13-4b80-bec3-2345a9e880f6/result-manifest"
  }

The output of the result manifest file is similar to the following:

{
  "status": {
    "code": 0,
    "message": "Query results are successfully uploaded to cloud storage",
    "details": []
  },
  "filenames": ["result-001"]
}

Result manifest file details:

  • The status field shows whether the query was a success or failure.
  • If the query execution is successful, the filenames field lists all the files created. These files are in the same folder as the result manifest file.
  • If the query resulted in a failure, the details field shows the error message.

Alter the resource location of your metadata

This operation lets you alter the resource location of a database, table, or partition.

When you run this command, it only updates the parent directory or the respective metadata resource. This command doesn't transfer any existing data to the new location.

gcloud CLI

  1. To alter the resource location of metadata, run the following gcloud metastore services alter-metadata-resource-location command:

    gcloud metastore services alter-metadata-resource-location SERVICE \
      --location=LOCATION \
      --resource_name=RESOURCE_NAME \
      --location_uri=LOCATION_URI

    Replace the following:

    • SERVICE: the name of your Dataproc Metastore service.
    • LOCATION: the Google Cloud region that your Dataproc Metastore service resides in.
    • RESOURCE_NAME: the name of the database, table, or partition you're altering.
    • LOCATION_URI: the new Cloud Storage Path for the contents of RESOURCE_NAME. This value is the path that you're moving your metadata resource location to. This path must begin with gs://. For example, gs://bucket/object.
  2. Verify that the resource location change was successful.

REST

curl -X POST -s -i \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  POST -d '{"resource_name": "RESOURCE_NAME", "location_uri":"LOCATION_URI"}' \
  -H "Content-Type:application/json" \
  https://metastore.googleapis.com/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:alterLocation

Replace the following:

  • PROJECT_ID: the Google Cloud project ID that you're Dataproc Metastore service resides in.
  • SERVICE: the name of your Dataproc Metastore service.
  • LOCATION: the region that your Dataproc Metastore resides in.
  • RESOURCE_NAME: the name of the database, table, or partition you're altering.
  • LOCATION_URI: the new Cloud Storage Path for the contents of RESOURCE_NAME. This value is the path that you're moving your metadata resource location to. This path must begin with gs://. For example, gs://bucket/object.

The following example shows a sample command that moves a table called test-table2 to a new Cloud Storage bucket.

 curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
   -H "Content-Type: application/json" \
   -X POST -d  '{"resource_name": "databases/testdb1/tables/test-table2",
   "location_uri":"gs://gcs-bucket-dpms1-9425bd83-b794-4f1c-9e79-2d833f758cc1/empty"}'
   https://metastore.googleapis.com/projects/dpms/locations/us-central1/services/dpms1:alterLocation

Alter table properties

This operation lets you alter the properties of a table, such as a custom key-value pair that you're using to store data. For example, you can change a key-value pair of properties.customerID_1 to properties.customerID_2.

gcloud CLI

  1. To alter a table properties, run the following gcloud metastore services alter-table-properties command:

    gcloud metastore services alter-table-properties SERVICE \
      --location=LOCATION \
      --table-name=TABLE_NAME \
      --update-mask=UPDATE_MASK \
      --properties=PROPERTIES

    Replace the following:

    • SERVICE: the name of your Dataproc Metastore service.
    • LOCATION: the Google Cloud region that your Dataproc Metastore service resides in.
    • TABLE_NAME: the name of the table containing the properties you're altering in the following format, databases/{database_id}/tables/{table_id}.
    • UPDATE_MASK: the existing property values you're updating. Use a comma separated list to describe the key-value pairs for example, properties.1,properties.2,properties.3,....
    • PROPERTIES: the new properties for your table. Use a comma separated list to describe the key-value pairs. For example, a=1,b=2,c=3,.... The values you list here overwrite the values in UPDATE_MASK.

REST

curl -X POST -s -i \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  POST -d '{"table_name": "TABLE_NAME", "update_mask":"UPDATE_MASK", "properties":PROPERTIES}'\
  -H "Content-Type:application/json" \
  https://metastore.googleapis.com/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:alterTableProperties

Replace the following:

  • SERVICE: the name of your Dataproc Metastore service.
  • LOCATION: the Google Cloud region that your Dataproc Metastore service resides in.
  • TABLE_NAME: the name of the table containing the properties you're altering in the following format, databases/{database_id}/tables/{table_id}.
  • UPDATE_MASK: the exsting property values you're updating. Use a comma separated list to describe the key-value pairs for example, properties.1,properties.2,properties.3,....
  • PROPERTIES: the new properties for your table. Use a comma separated list to describe the key-value pairs for example, a=1,b=2,c=3,.... The values that you list here overwrite the values in UPDATE_MASK.

The following example shows a sample command that alters the table properties of a table called test-table. In this example the existing key-value pair, properties.customerID_1 is updated to the new value properties.customerID_2

  curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
   -H "Content-Type: application/json"
   -X POST -d  '{"table_name": "databases/default/tables/test-table", "update_mask":{"paths":"properties.customerID_1"}, "properties":{"customerID_1":"customerID_2"}}' https://metastore.googleapis.com/projects/dpms-p

Move a table to another database

This operation lets you move an internal table (managed table) to another database. In this case, both the database parent directory and it's data are moved.

You can't move data stored in external tables.

gcloud CLI

  1. To move a table to another database, run the following gcloud metastore services move-table-to-database command:

    gcloud metastore services move-table-to-database SERVICE \
      --location=LOCATION \
      --db_name=DB_NAME \
      --table_name=TABLE_NAME \
      --destination_db_name=DESTINATION_DB_NAME

    Replace the following:

    • SERVICE: the name of your Dataproc Metastore service.
    • LOCATION: the Google Cloud region that your Dataproc Metastore service resides in.
    • DB_NAME: the name of the source database that contains the table you want to move.
    • TABLE_NAME: the name of the table that you want to move.
    • DESTINATION_DB_NAME: the name of the new database that you want to move the table to.
  2. Verify that the table change was successful.

REST

curl -X POST -s -i \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  POST -d '{"table_name": "TABLE_NAME", "db_name": "DB_NAME", "destination_db_name": "DESTINATION_DB_NAME"}'\
  -H "Content-Type:application/json" \
  https://metastore.googleapis.com/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:moveTableToDatabase

Replace the following:

  • PROJECT_ID: the Google Cloud project ID that your Dataproc Metastore service resides in.
  • SERVICE: the name of your Dataproc Metastore service.
  • LOCATION: the region that your Dataproc Metastore resides in.
  • DB_NAME: the name of the source database that contains the table you want to move.
  • TABLE_NAME: the name of the table that you want to move.
  • DESTINATION_DB_NAME: the name of the new database that you want to move the table to.

The following example shows a sample command that moves a databased called called testdb1 to a different database called testdb2.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
 -H "Content-Type: application/json"
 -X POST -d  '{"table_name": "testtb1", "db_name": "testdb1",
 "destination_db_name": "testdb2"}' https://metastore.googleapis.com/projects/dpms/locations/asia-northeast2/services/dpms1:moveTableToDatabase

What's next