About Metadata federation
Metadata federation is a service that lets you access multiple sources of
metadata from a single endpoint.
To set up federation, you create a federation service and then configure your
metadata sources. Afterward, the service exposes a single gRPC endpoint that you can
use to access all of your metadata.
For example, using federation, you can create a Dataproc cluster
that exposes multiple Dataproc Metastore services through a
single endpoint. Afterward, you can run big data jobs through open-source
software (OSS) engines, such as Spark or Hive, to access your metadata across
multiple metastores.
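The idea can be illustrated with a small sketch in plain Python (no Google Cloud APIs; the classes and data are hypothetical): a single object fronts several metadata sources and answers lookups from whichever source holds the requested database.

```python
# Minimal model of a federation endpoint: one object fronting several
# metadata sources. All names and data here are illustrative only.

class MetadataSource:
    def __init__(self, name, databases):
        self.name = name
        self.databases = databases  # {database_name: [table, ...]}

class Federation:
    def __init__(self, sources):
        # Sources are kept in rank order (lowest rank first).
        self.sources = sources

    def get_tables(self, db_name):
        # Serve the request from the first source that has the database.
        for source in self.sources:
            if db_name in source.databases:
                return source.databases[db_name]
        raise KeyError(f"database not found: {db_name}")

hive = MetadataSource("metastore-a", {"sales": ["orders", "customers"]})
bq = MetadataSource("bq-project", {"analytics": ["events"]})
fed = Federation([hive, bq])

tables = fed.get_tables("sales")      # served from metastore-a
more_tables = fed.get_tables("analytics")  # served from bq-project
```

A caller only ever talks to the one `Federation` object, even though the metadata lives in two different backends, which is the property the service provides at its gRPC endpoint.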
How federation works
OSS big data workloads that run on Spark or Hive send requests to the Hive
Metastore API to fetch metadata at runtime.
The Hive Metastore interface
supports both read and write methods. The federation service exposes a gRPC
version of the Hive Metastore interface.
At runtime, when the federation service receives a request, it checks the
source ordering to retrieve the appropriate metadata.
Metadata sources
When you create a federation service, you must add a metadata source.
You can use the following sources as backend metastores:
A Dataproc Metastore instance.
A project containing one or more BigQuery datasets.
A Dataplex Universal Catalog Lake (Preview).
Source restrictions
The following sections list the restrictions that apply when you use the
various metadata sources.
All sources
The following restrictions apply to all metadata sources:
A federation service doesn't contain its own data. Instead, the federation
service only serves metadata from one of its metadata sources.
A federation service can't be a source of metadata in another federation
service.
Dataproc Metastore
If you're using a Dataproc Metastore as a source, the following
restrictions apply:
Federation services are only available through gRPC endpoints. To use a
Dataproc Metastore with federation, create your metastore
with a gRPC endpoint.
Federation services can be attached to both single-region and multi-region
Dataproc Metastore services.
If the multi-regional metastore is in a different project than the
federation service, grant the Dataproc Metastore service account
of the federation project the metastore.services.get permission on the
Dataproc Metastore instances configured in the multi-region.
BigQuery
If you're using a project that contains BigQuery datasets as a
source, you must satisfy the following conditions:
Grant the correct Identity and Access Management (IAM) roles to access the
project that contains the BigQuery datasets.
Add at least one Dataproc Metastore service as a source, along
with your BigQuery datasets.
Dataplex Universal Catalog Lakes
If you're using a Dataplex Universal Catalog Lake (Preview) as a source, you
must satisfy the following conditions:
Grant an IAM role that contains the dataplex.lakes.get permission.
Add at least one Dataproc Metastore service as a source,
along with your Dataplex Universal Catalog Lake.
Source ordering
Your federation service processes metadata requests in a priority order. This
concept is known as source ordering. The metastore with the lowest rank is
known as the primary metastore. At runtime, when the federation service
receives a request, it checks the source ordering and completes one of the
following actions:
If the request contains a database name. The request is routed to the
backend metastore that contains the database name. If more than one
metastore contains the same database name, the request is routed to the
metastore with the lowest rank.
If the request creates or drops a database. The request is routed to the
metastore with the lowest rank.
If the request doesn't contain a database name and it doesn't create or
drop a database. The request is routed to the
Dataproc Metastore instance with the lowest rank. Some
examples of Hive Metastore requests that don't specify a database are
set_ugi and create_database.
If none of the metastores contain the requested database. The OSS engine
responds with the equivalent of a not-found error.
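The ordering rules above can be sketched in plain Python (illustrative only; the function, request shape, and source names are hypothetical, not the service's API):

```python
# Sketch of the source-ordering rules. Sources are listed lowest rank
# first; the lowest-ranked source is the primary metastore.

def route(request, sources):
    """Return the name of the source that should handle the request.

    `sources` is a list of (name, database_names) tuples in rank order.
    `request` is a dict with optional keys "method" and "database".
    """
    # Requests that create or drop a database go to the lowest rank.
    if request.get("method") in ("create_database", "drop_database"):
        return sources[0][0]
    db = request.get("database")
    if db is None:
        # No database in the request (for example, set_ugi): lowest rank.
        return sources[0][0]
    # Otherwise, route to the lowest-ranked source containing the database.
    for name, databases in sources:
        if db in databases:
            return name
    # No source has the database: the OSS engine sees a not-found error.
    raise LookupError(f"database not found: {db}")

sources = [("primary", {"sales"}), ("secondary", {"sales", "logs"})]
print(route({"method": "get_table", "database": "logs"}, sources))   # prints "secondary"
print(route({"method": "get_table", "database": "sales"}, sources))  # prints "primary"
print(route({"method": "create_database"}, sources))                 # prints "primary"
```

Note how `sales`, which exists in both sources, is served by the lowest-ranked (primary) metastore, while `logs` falls through to the only source that has it.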
What's next
Create a federation service
Last updated 2025-08-28 UTC.