Managed connectivity overview

This document provides an overview of the managed connectivity pipelines that you can use to import metadata from third-party sources into Dataplex Universal Catalog.

Managed connectivity lets you import metadata into Dataplex Universal Catalog at scale. A managed connectivity pipeline extracts metadata from your data sources and then imports the metadata into Dataplex Universal Catalog. If necessary, the pipeline also creates Dataplex Universal Catalog entry groups in your Google Cloud project. You can orchestrate the workflows and schedule the import jobs based on your requirements.

You build your own custom connectors to extract metadata from third-party sources. For example, you can build a connector to extract metadata from sources such as MySQL, SQL Server, Oracle, Snowflake, and Databricks. For steps to build a sample custom connector, see Develop a custom connector for metadata import. You can also use the community-contributed custom connectors that are available for a variety of third-party sources.

For steps to run a managed connectivity pipeline, see Import metadata from a custom source using Workflows.

How managed connectivity works

The following diagram shows a managed connectivity pipeline.

Figure: A managed connectivity pipeline.

At a high level, here's how managed connectivity works:

1. You build a connector for your data source. The connector must be an Artifact Registry image that can run on Serverless for Apache Spark.
2. You run the managed connectivity pipeline in Workflows, an orchestration platform.
3. The managed connectivity pipeline does the following things:
   1. Creates a target entry group based on your configuration, if the entry group doesn't exist yet.
   2. Runs the connector. The connector extracts the metadata from your data source and generates a metadata import file that can be imported into Dataplex Universal Catalog.
   3. Monitors the progress of the metadata extraction.
   4. Runs a metadata import job to import the metadata into Dataplex Universal Catalog.
   5. Monitors the progress of the metadata import job.

The managed connectivity pipeline uses Google Cloud Serverless for Apache Spark to run the connector, and Dataplex Universal Catalog metadata import API methods to run the metadata import job.
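
The order of the pipeline steps described above can be illustrated with a small, self-contained simulation. This is only a sketch of the control flow, not the actual Workflows definition or any Google Cloud client API: the `FakeCloud` class, its methods, the job state strings, and the `run_pipeline` function are all hypothetical names invented for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """In-memory stand-in for the real services (hypothetical, illustration only)."""
    entry_groups: set = field(default_factory=set)
    polls_until_done: int = 2  # number of polls before a job reports success
    _jobs: dict = field(default_factory=dict)

    def entry_group_exists(self, name):
        return name in self.entry_groups

    def create_entry_group(self, name):
        self.entry_groups.add(name)

    def start_job(self, kind):
        # Register a job and return an identifier for later polling.
        job_id = f"{kind}-{len(self._jobs) + 1}"
        self._jobs[job_id] = self.polls_until_done
        return job_id

    def job_state(self, job_id):
        # Each poll moves the fake job one step closer to completion.
        self._jobs[job_id] -= 1
        return "SUCCEEDED" if self._jobs[job_id] <= 0 else "RUNNING"


def wait_for(cloud, job_id, poll_seconds=0.0, max_polls=100):
    """Poll a job until it leaves the RUNNING state, then return the final state."""
    for _ in range(max_polls):
        state = cloud.job_state(job_id)
        if state != "RUNNING":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def run_pipeline(cloud, entry_group):
    """Run the five pipeline steps in order and return the steps that were taken."""
    steps = []
    # Step 1: create the target entry group only if it doesn't exist yet.
    if not cloud.entry_group_exists(entry_group):
        cloud.create_entry_group(entry_group)
        steps.append("create_entry_group")
    # Steps 2 and 3: run the connector and monitor the metadata extraction.
    connector_job = cloud.start_job("connector")
    steps.append("run_connector")
    if wait_for(cloud, connector_job) != "SUCCEEDED":
        raise RuntimeError("metadata extraction failed")
    steps.append("extraction_done")
    # Steps 4 and 5: run the metadata import job and monitor it.
    import_job = cloud.start_job("import")
    steps.append("run_import")
    if wait_for(cloud, import_job) != "SUCCEEDED":
        raise RuntimeError("metadata import failed")
    steps.append("import_done")
    return steps


if __name__ == "__main__":
    print(run_pipeline(FakeCloud(), "example-entry-group"))
```

Running the pipeline a second time against the same `FakeCloud` instance skips the entry group creation, which mirrors the "if the entry group doesn't exist yet" condition in the first pipeline step.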

The metadata that you import consists of Dataplex Universal Catalog entries and their aspects. For more information about Dataplex Universal Catalog metadata, see About metadata management in Dataplex Universal Catalog.
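
To make the relationship between entries and aspects more concrete, the following sketch models an entry as a dictionary that carries a map of aspects and serializes a list of entries with one JSON object per line. Both the one-object-per-line layout and every field name here (`name`, `entry_type`, `aspects`, and the `example-` values) are placeholders assumed for illustration. They are not the documented metadata import file schema, so consult the Dataplex Universal Catalog reference for the actual format.

```python
import json


def make_entry(name, entry_type, aspects):
    """Build a hypothetical entry: an identified resource plus its aspects."""
    return {"name": name, "entry_type": entry_type, "aspects": dict(aspects)}


def to_json_lines(entries):
    """Serialize entries as one JSON object per line (assumed layout)."""
    return "\n".join(json.dumps({"entry": e}, sort_keys=True) for e in entries)


# A made-up table entry with a single made-up schema aspect.
table = make_entry(
    name="example-db/sales/orders",
    entry_type="example-table",
    aspects={"example-schema": {"columns": [{"name": "id", "type": "INT"}]}},
)

if __name__ == "__main__":
    print(to_json_lines([table]))
```

In this sketch, the aspects travel with the entry they describe, so a connector can emit each extracted resource as one independent line.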

Community-contributed custom connectors

To import metadata from third-party sources, you can use custom connectors that are contributed by the community. See each connector's README file for setup instructions and more information about the connector.

Data source	Repository
MySQL	mysql-connector
Oracle	oracle-connector
PostgreSQL	postgresql-connector
Snowflake	snowflake-connector
SQL Server	sql-server-connector
