This document provides instructions to migrate from the preview version of business glossary, which supported Data Catalog metadata, to the generally available version of business glossary, which supports Dataplex Universal Catalog metadata. The transition process requires you to export glossaries, categories, terms, and links from Data Catalog, and then import them into Dataplex Universal Catalog.
To transition to business glossary on Dataplex Universal Catalog, follow these steps:
- Export glossaries, categories, and terms from Data Catalog.
- Import glossaries, categories, and terms to Dataplex Universal Catalog.
- Export links between terms from Data Catalog.
- Import links between terms to Dataplex Universal Catalog.
- Export links between terms and columns from Data Catalog.
- Import links between terms and columns to Dataplex Universal Catalog.
Required roles
To export a glossary from Data Catalog, you need the `roles/datacatalog.glossaryOwner` role on the projects in which the glossary is present. See the permissions required for this role.
To get the permissions that you need to import a business glossary to Dataplex Universal Catalog, ask your administrator to grant you the Dataplex Administrator (`roles/dataplex.admin`) IAM role on the projects. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to import a business glossary to Dataplex Universal Catalog. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to import a business glossary to Dataplex Universal Catalog:
- `dataplex.glossaries.import` on the glossary resource
- `dataplex.entryGroups.import` on the Dataplex Universal Catalog entry group provided in the `entry_groups` field, and on the entry groups that contain the Data Catalog entries that are linked to the glossary terms
- `dataplex.entryGroups.useSynonymEntryLink` on the Dataplex Universal Catalog entry group provided in the `entry_groups` field, and on the entry groups that contain the Data Catalog entries that are linked to the glossary terms
- `dataplex.entryGroups.useRelatedEntryLink` on the Dataplex Universal Catalog entry group provided in the `entry_groups` field, and on the entry groups that contain the Data Catalog entries that are linked to the glossary terms
- `dataplex.entryLinks.reference` on all the projects provided in the `referenced_entry_scopes` field
You might also be able to get these permissions with custom roles or other predefined roles.
Export glossaries, categories, and terms from Data Catalog
You can export one glossary at a time.
Clone the dataplex-labs repository, and then change directories to the `business-glossary-import` subdirectory:

```shell
git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
cd dataplex-labs
cd dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import
```
Get your access token:
```shell
export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)
```
Run the export script:
```shell
python3 bg_import/business_glossary_export_v2.py \
  --user-project="PROJECT_ID" \
  --url="DATA_CATALOG_GLOSSARY_URL" \
  --export-mode=glossary_only
```
Replace the following:
- `PROJECT_ID`: the ID of the project that contains the glossary
- `DATA_CATALOG_GLOSSARY_URL`: the URL of the Data Catalog business glossary in the console
The script creates a JSON file that follows the same format as the metadata import file that's used for metadata import jobs. The names of the glossary, categories, and terms use the following formats:
- Glossary: `projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID`
- Term: `projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/terms/TERM_ID`
- Category: `projects/PROJECT_NUMBER/locations/LOCATION_ID/entryGroups/@dataplex/entries/projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID/categories/CATEGORY_ID`
Where `GLOSSARY_ID`, `CATEGORY_ID`, `TERM_ID`, `PROJECT_NUMBER`, and `LOCATION_ID` are the same as the values from the Data Catalog glossary.
Import glossaries, categories, and terms
Import the glossaries, categories, and terms that you exported in the previous step into Dataplex Universal Catalog. This section describes how to import them by using the metadata job API.
Create a Cloud Storage bucket, and then upload the exported glossary file to the bucket.
Grant the Dataplex Universal Catalog service account read access to the Cloud Storage bucket.
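The bucket setup and access grant can be scripted with the gcloud CLI. The following is a sketch with hypothetical project and bucket names; the Dataplex service agent address format (`service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com`) is an assumption to confirm against your project's IAM page:

```shell
# Hypothetical project and bucket names; replace with your own.
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789012"
BUCKET="gs://${PROJECT_ID}-glossary-import"

# Assumed Dataplex service agent address; confirm it in your project's IAM page.
DATAPLEX_SA="service-${PROJECT_NUMBER}@gcp-sa-dataplex.iam.gserviceaccount.com"
echo "Grant roles/storage.objectViewer to ${DATAPLEX_SA} on ${BUCKET}"

# With the gcloud CLI installed and authenticated, the steps would be:
#   gcloud storage buckets create "${BUCKET}" --project="${PROJECT_ID}"
#   gcloud storage cp exported_glossary.json "${BUCKET}/"
#   gcloud storage buckets add-iam-policy-binding "${BUCKET}" \
#       --member="serviceAccount:${DATAPLEX_SA}" \
#       --role="roles/storage.objectViewer"
```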
Run a metadata import job to import the glossary.
```shell
# Set the gcurl alias
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'

# Run the import job
gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION_ID/metadataJobs?metadata_job_id=JOB_ID" -d "$(cat <<EOF
{
  "type": "IMPORT",
  "import_spec": {
    "log_level": "DEBUG",
    "source_storage_uri": "gs://STORAGE_BUCKET/",
    "entry_sync_mode": "FULL",
    "aspect_sync_mode": "INCREMENTAL",
    "scope": {
      "glossaries": ["projects/PROJECT_NUMBER/locations/LOCATION_ID/glossaries/GLOSSARY_ID"]
    }
  }
}
EOF
)"
```
Replace the following:
- `JOB_ID`: (optional) a metadata import job ID, which you can use to track the job's status. If you don't provide an ID, a unique ID is generated for you.
- `STORAGE_BUCKET`: the URI of the Cloud Storage bucket or folder that contains the exported glossary file
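A quick way to catch quoting or placeholder mistakes before calling the API is to validate the request body locally. The following is a sketch with hypothetical placeholder values already substituted:

```shell
# Build the request body with example values (hypothetical project, location, and glossary).
BODY=$(cat <<EOF
{
  "type": "IMPORT",
  "import_spec": {
    "log_level": "DEBUG",
    "source_storage_uri": "gs://my-glossary-bucket/",
    "entry_sync_mode": "FULL",
    "aspect_sync_mode": "INCREMENTAL",
    "scope": {
      "glossaries": ["projects/123456789012/locations/us-central1/glossaries/my-glossary"]
    }
  }
}
EOF
)

# Confirm the body is well-formed JSON before sending it with gcurl.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "request body is valid JSON"
```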
Optional: To track the status of the metadata import job, use the `metadataJobs.get` method:

```shell
gcurl -X GET "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/metadataJobs/JOB_ID"
```
If you get any errors in the metadata import job, they'll appear in the logs.
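The `metadataJobs.get` response reports the job state in its `status.state` field. A small sketch for pulling that field out of a saved response (the response body below is illustrative, not captured from a real job):

```shell
# In practice, save the response from metadataJobs.get to a file, for example:
#   gcurl -X GET "https://dataplex.googleapis.com/v1/.../metadataJobs/JOB_ID" > job.json
# Here an illustrative response body stands in for the real one:
cat > job.json <<EOF
{
  "name": "projects/123456789012/locations/us-central1/metadataJobs/my-job-id",
  "type": "IMPORT",
  "status": {"state": "SUCCEEDED"}
}
EOF

# Extract the job state with Python's stdlib JSON parser.
python3 -c "import json; print(json.load(open('job.json'))['status']['state'])"
```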
Export links between terms from Data Catalog
Clone the dataplex-labs repository (if you haven't already), and then change directories to the `business-glossary-import` subdirectory:

```shell
git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
cd dataplex-labs
cd dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import
```
Get your access token:
```shell
export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)
```
Run the export code:
```shell
python3 bg_import/business_glossary_export_v2.py \
  --user-project=PROJECT_ID \
  --url="DATA_CATALOG_GLOSSARY_URL" \
  --export-mode=entry_links_only \
  --entrylinktype="related,synonym"
```
The script creates a JSON file that contains the synonym and related links between terms. The exported files are in the `Exported_Files` folder in `dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import`. The name of the file is `entrylinks_relatedsynonymGLOSSARY_ID.json`.
Import links between terms to Dataplex Universal Catalog
Import the links between terms that you exported in the previous step into Dataplex Universal Catalog. This section describes how to import them by using the metadata job API.
Create a new Cloud Storage bucket, and then upload the exported entry links file from the previous step into the bucket.
Grant the Dataplex Universal Catalog service account read access to the Cloud Storage bucket.
Run a metadata import job to import the entry links:
```shell
# Run the import job
gcurl "https://dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/global/metadataJobs?metadata_job_id=JOB_ID" -d "$(cat <<EOF
{
  "type": "IMPORT",
  "import_spec": {
    "log_level": "DEBUG",
    "source_storage_uri": "gs://STORAGE_BUCKET",
    "entry_sync_mode": "FULL",
    "aspect_sync_mode": "INCREMENTAL",
    "scope": {
      "entry_groups": ["projects/GLOSSARY_PROJECT_ID/locations/global/entryGroups/@dataplex"],
      "entry_link_types": [
        "projects/dataplex-types/locations/global/entryLinkTypes/synonym",
        "projects/dataplex-types/locations/global/entryLinkTypes/related"
      ],
      "referenced_entry_scopes": [PROJECT_IDS]
    }
  }
}
EOF
)"
```
Replace the following:
- `GLOSSARY_PROJECT_ID`: the ID of the project that contains the glossary
- `PROJECT_IDS`: if terms are linked across glossaries in different projects, the IDs of those projects
Note the following:
- The `entry_groups` object contains the entry group where the entry links are created. This is the `@dataplex` system entry group in the same project and location as the glossary.
- The `entry_link_types` object lets you import synonyms, related terms, or both:
  - Synonyms: `projects/dataplex-types/locations/global/entryLinkTypes/synonym`
  - Related terms: `projects/dataplex-types/locations/global/entryLinkTypes/related`
- The `referenced_entry_scopes` object includes the project IDs of entry links that link terms from different glossaries.
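For example, to import only synonym links, you would list a single entry link type in the scope. This is a fragment, assuming the same request shape as the command above:

```json
"scope": {
  "entry_groups": ["projects/GLOSSARY_PROJECT_ID/locations/global/entryGroups/@dataplex"],
  "entry_link_types": ["projects/dataplex-types/locations/global/entryLinkTypes/synonym"]
}
```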
Export links between terms and columns
After you export and import the glossaries and the links between terms, import the links between terms and columns. In the following command, the link type is set to `definition` to export links between terms and columns.
```shell
# Clone the repository and change to the directory
git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
cd dataplex-labs
cd dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import

export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)

# Run the export script
python3 bg_import/business_glossary_export_v2.py \
  --user-project="PROJECT_ID" \
  --url="DATA_CATALOG_GLOSSARY_URL" \
  --export-mode=entry_links_only \
  --entrylinktype="definition"
```
Import links between terms and columns
Import the links between terms and columns that you exported in the previous step into Dataplex Universal Catalog. This section describes how to import them by using the metadata job API.
Upload each file that you exported in the preceding step to a Cloud Storage bucket, as described in step 2.
Run a separate import command for each file that you uploaded to the Cloud Storage bucket. Each file corresponds to a unique entry group that contains links between terms and columns of that entry group.
```shell
# Set the gcurl alias
alias gcurl='curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json"'

# Run the import job
gcurl "https://DATAPLEX_API/metadataJobs?metadata_job_id=JOB_ID" -d "$(cat <<EOF
{
  "type": "IMPORT",
  "import_spec": {
    "log_level": "DEBUG",
    "source_storage_uri": "gs://STORAGE_BUCKET",
    "entry_sync_mode": "FULL",
    "aspect_sync_mode": "INCREMENTAL",
    "scope": {
      "entry_groups": ["projects/ENTRY_GROUP_PROJECT_ID/locations/ENTRY_GROUP_LOCATION_ID/entryGroups/ENTRY_GROUP_ID"],
      "entry_link_types": ["projects/dataplex-types/locations/global/entryLinkTypes/definition"],
      "referenced_entry_scopes": [PROJECT_IDS]
    }
  }
}
EOF
)"
```
Replace `DATAPLEX_API` with `dataplex.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID`.
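The `DATAPLEX_API` value expands to the full endpoint for the metadata job request. A quick sketch with hypothetical project and location values:

```shell
# Hypothetical values; substitute your own project ID and location.
PROJECT_ID="my-project"
LOCATION_ID="us-central1"
DATAPLEX_API="dataplex.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}"

# The resulting metadata job URL:
echo "https://${DATAPLEX_API}/metadataJobs?metadata_job_id=my-job-id"
```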