[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-27。"],[[["\u003cp\u003eDataproc Metastore is a fully managed, serverless Apache Hive metastore service on Google Cloud that maps data to Apache Hive tables for metadata management.\u003c/p\u003e\n"],["\u003cp\u003eDataproc Metastore supports both internal tables (managing both metadata and data) and external tables (managing only metadata), impacting how data is handled upon deletion.\u003c/p\u003e\n"],["\u003cp\u003eThe Hive warehouse directory, which is where your internal data tables are stored, defaults to a Cloud Storage bucket automatically created for each Dataproc Metastore service, and this can be changed by setting a configuration override.\u003c/p\u003e\n"],["\u003cp\u003eDataproc Metastore also automatically creates an artifacts bucket in your project to store exported metadata and managed internal table data, and both the warehouse directory and the artifacts bucket default to the same bucket.\u003c/p\u003e\n"],["\u003cp\u003eDataproc service accounts and service agents require appropriate roles to access the Hive warehouse directory, and changing the default warehouse directory is supported but requires specific considerations, including maintaining the right permissions.\u003c/p\u003e\n"]]],[],null,["# Hive metastore\n\nDataproc Metastore is a fully managed, highly available,\nautohealing, serverless, Apache Hive metastore (HMS) that runs on Google Cloud.\n\nTo fully manage your metadata, Dataproc Metastore maps your\ndata to [Apache Hive](https://hive.apache.org/)\ntables.\n\nSupported Apache Hive versions\n------------------------------\n\nDataproc Metastore only supports specific versions of Apache Hive.\nFor more information, see the [Hive version policy](/dataproc-metastore/docs/version-policy).\n\nHow Hive handles metadata\n-------------------------\n\nSince Dataproc Metastore is a Hive metastore,\nit's important to understand how it manages your metadata.\n\nBy default, all Hive applications can have *managed internal tables* or\n*unmanaged external tables*. Meaning, the metadata that you store in a\nDataproc Metastore service can exist in both internal and external tables.\n\nWhen modifying data, a Dataproc Metastore service (Hive) treats\ninternal and external tables differently.\n\n- **Internal tables**. Manages both metadata and table data.\n- **External tables**. Only manages metadata.\n\nFor example, if you delete a table definition using the `DROP TABLE` Hive SQL\nstatement: \n\n```\ndrop table foo\n```\n\n- **Internal tables**. Dataproc Metastore deletes all metadata.\n It also deletes the files associated with the table.\n\n- **External tables**. Dataproc Metastore only deletes the metadata.\n It keeps the data associated with the table.\n\nHive warehouse directory\n------------------------\n\nDataproc Metastore uses the Hive warehouse directory to manage\nyour internal tables. The Hive warehouse directory is where your actual data is\nstored.\n\nWhen you use a Dataproc Metastore service, the default Hive warehouse\ndirectory is a Cloud Storage bucket. 
Hive warehouse directory
------------------------

Dataproc Metastore uses the Hive warehouse directory to manage your internal tables. The Hive warehouse directory is where your actual data is stored.

When you use a Dataproc Metastore service, the default Hive warehouse directory is a Cloud Storage bucket. Dataproc Metastore supports only Cloud Storage buckets for the warehouse directory. This differs from an on-premises HMS, where the Hive warehouse directory usually points to a local directory.

This bucket is automatically created for you every time you create a Dataproc Metastore service. You can change this value by setting a Hive Metastore configuration override on the [`hive.metastore.warehouse.dir`](#change-hive-directory) property.

### Artifacts Cloud Storage buckets

The artifacts bucket stores your Dataproc Metastore artifacts, such as exported metadata and managed internal table data.

When you create a Dataproc Metastore service, a Cloud Storage bucket is automatically created for you in your project. By default, both the artifacts bucket and the warehouse directory point to the same bucket. You can't change the location of the artifacts bucket; however, you can change the location of the Hive warehouse directory.

The artifacts bucket uses the following location:

- `gs://your-artifacts-bucket/hive-warehouse`
- For example, `gs://gcs-your-project-name-0825d7b3-0627-4637-8fd0-cc6271d00eb4`.

**Note:** This bucket is created with [uniform bucket-level access](/storage/docs/uniform-bucket-level-access) and can't be changed to use fine-grained ACLs.

#### Access the Hive warehouse directory

After your bucket is automatically created for you, ensure that your [Dataproc service accounts](/dataproc/docs/concepts/configuring-clusters/service-accounts#dataproc_service_accounts_2) have permission to access the Hive warehouse directory.

- To access the warehouse directory at the object level (for example, `gs://mybucket/object`), grant the Dataproc service accounts read and write access to objects in the bucket by using the `roles/storage.objectAdmin` role. This role must be set at the bucket level or higher.

- To access the warehouse directory when you use a top-level folder (for example, `gs://mybucket`), grant the Dataproc service accounts read and write access to the bucket by using the `roles/storage.admin` role.

If the Hive warehouse directory is not in the same project as the Dataproc Metastore service, ensure that the Dataproc Metastore service agent has permission to access the Hive warehouse directory. The service agent for a Dataproc Metastore project is `service-PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com`. Grant the service agent read and write access to the bucket by using the [`roles/storage.objectAdmin`](/storage/docs/access-control/iam-roles) role.
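For example, the following sketch grants these roles with the Google Cloud CLI. The bucket name, service account email, and project number are placeholders that you would replace with your own values.

```
# Placeholder values throughout. Grant the Dataproc VM service account
# read and write access to objects in the warehouse bucket.
gcloud storage buckets add-iam-policy-binding gs://my-warehouse-bucket \
    --member="serviceAccount:my-dataproc-vm-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"

# If the bucket is in a different project than the Dataproc Metastore
# service, also grant the Dataproc Metastore service agent access.
gcloud storage buckets add-iam-policy-binding gs://my-warehouse-bucket \
    --member="serviceAccount:service-123456789012@gcp-sa-metastore.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```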
### Find the Hive warehouse directory

1. Open the [**Dataproc Metastore**](https://console.cloud.google.com/dataproc/metastore/services) page.

2. Click the name of your service.

   The Service detail page opens.

3. In the configuration table, find **Metastore config overrides** > **hive.metastore.warehouse.dir**.

4. Find the value that starts with `gs://`.

   This value is the location of your Hive warehouse directory.

### Change the Hive warehouse directory

To use your own Cloud Storage bucket with Dataproc Metastore, set a Hive Metastore configuration override that points to the new bucket location.

If you change your default warehouse directory, follow these recommendations:

- Don't use the Cloud Storage bucket root (`gs://mybucket`) to store Hive tables.

- Make sure your Dataproc Metastore VM service account has permission to [access the Hive warehouse directory](#access-dir).

- For best results, use Cloud Storage buckets that are located in the same region as your Dataproc Metastore service. Although Dataproc Metastore allows cross-region buckets, colocated resources perform better. For example, an EU multi-region bucket doesn't work well with a `us-central1` service. Cross-region access results in higher latency, a lack of regional failure isolation, and charges for cross-region network bandwidth.

#### To change the Hive warehouse directory

1. Open the [**Dataproc Metastore**](https://console.cloud.google.com/dataproc/metastore/services) page.

2. Click the name of your service.

   The Service detail page opens.

3. In the configuration table, find the **Metastore config overrides** > **hive.metastore.warehouse.dir** section.

4. Change the `hive.metastore.warehouse.dir` value to the location of your new bucket. For example, `gs://my-bucket/path/to/location`.

### Delete your bucket

Deleting your Dataproc Metastore service doesn't automatically delete your Cloud Storage artifacts bucket. The bucket isn't deleted automatically because it might contain data that is still useful after the service is gone. To delete the bucket, run a Cloud Storage [delete operation](/storage/docs/deleting-buckets).

**Caution:** Don't delete your artifacts bucket until after you delete your Dataproc Metastore service. If you delete the artifacts bucket first, the service might stop working.

What's next
-----------

- [Create a service](/dataproc-metastore/docs/create-service)
- [Update and delete a service](/dataproc-metastore/docs/manage-service)
- [Import metadata into a service](/dataproc-metastore/docs/import-metadata)