Managed connectivity overview

This document provides an overview of the managed connectivity pipelines that you can use to import metadata from third-party sources into Dataplex.

Managed connectivity lets you import metadata into Dataplex at scale. A managed connectivity pipeline extracts metadata from your data sources and then imports the metadata into Dataplex. If necessary, the pipeline also creates Dataplex Catalog entry groups in your Google Cloud project. You can orchestrate the workflows and schedule the import jobs based on your requirements.

You build your own custom connectors to extract metadata from third-party sources. For example, you can build a connector to extract metadata from sources such as MySQL, SQL Server, Oracle, Snowflake, and Databricks. For steps to build a sample custom connector, see Develop a custom connector for metadata import.

For steps to run a managed connectivity pipeline, see Import metadata from a custom source using Workflows.

How managed connectivity works

The following diagram shows a managed connectivity pipeline.

Figure: A managed connectivity pipeline.

At a high level, here's how managed connectivity works:

  1. You build a connector for your data source.

    The connector must be a container image that is stored in Artifact Registry and that can run on Dataproc Serverless.

  2. You run the managed connectivity pipeline in Workflows, an orchestration platform.

  3. The managed connectivity pipeline does the following:

    1. Creates a target entry group based on your configuration, if the entry group doesn't exist yet.
    2. Runs the connector. The connector extracts the metadata from your data source and generates a metadata import file that can be imported into Dataplex Catalog.
    3. Monitors the progress of the metadata extraction.
    4. Runs a metadata import job to import the metadata into Dataplex Catalog.
    5. Monitors the progress of the metadata import job.

The managed connectivity pipeline uses Dataproc Serverless to run the connector, and Dataplex metadata import API methods to run the metadata import job.
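
The order of the pipeline steps described above can be illustrated with a minimal Python sketch. It is not the Workflows definition that Dataplex uses and it calls no Google Cloud API. FakeBackend, run_pipeline, wait_for_job, and the job state names are hypothetical stand-ins that are assumed here only to show the create, extract, monitor, import, monitor sequence.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for the services the pipeline calls."""

    entry_groups: Set[str] = field(default_factory=set)
    events: List[str] = field(default_factory=list)
    polls_left: Dict[str, int] = field(default_factory=dict)

    def entry_group_exists(self, name: str) -> bool:
        return name in self.entry_groups

    def create_entry_group(self, name: str) -> None:
        self.entry_groups.add(name)
        self.events.append(f"created entry group {name}")

    def start_job(self, kind: str) -> str:
        job_id = f"{kind}-job"
        # Assumption: each fake job reports RUNNING twice, then SUCCEEDED.
        self.polls_left[job_id] = 2
        self.events.append(f"started {job_id}")
        return job_id

    def job_state(self, job_id: str) -> str:
        if self.polls_left[job_id] > 0:
            self.polls_left[job_id] -= 1
            return "RUNNING"
        return "SUCCEEDED"


def wait_for_job(backend: FakeBackend, job_id: str,
                 poll_seconds: float = 0.0, max_polls: int = 100) -> None:
    """Poll a job until it succeeds, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        state = backend.job_state(job_id)
        if state == "SUCCEEDED":
            backend.events.append(f"{job_id} succeeded")
            return
        if state == "FAILED":
            raise RuntimeError(f"{job_id} failed")
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")


def run_pipeline(backend: FakeBackend, entry_group: str) -> List[str]:
    # 1. Create the target entry group only if it doesn't exist yet.
    if not backend.entry_group_exists(entry_group):
        backend.create_entry_group(entry_group)
    # 2-3. Run the connector (metadata extraction) and monitor it.
    wait_for_job(backend, backend.start_job("extract"))
    # 4-5. Run the metadata import job and monitor it.
    wait_for_job(backend, backend.start_job("import"))
    return backend.events


if __name__ == "__main__":
    for event in run_pipeline(FakeBackend(), "example-entry-group"):
        print(event)
```

In this sketch the import job starts only after the extraction job reports success, which mirrors the dependency between the connector output and the import step.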

The metadata that you import consists of Dataplex Catalog entries and their aspects. For more information about Dataplex Catalog metadata, see Dataplex Catalog overview.
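
As a rough illustration of "entries and their aspects", the following Python sketch serializes one entry per line, with its aspects nested inside it. The one-JSON-object-per-line layout and the field names (entry, name, aspects) are illustrative assumptions, not the documented metadata import file schema. The to_import_line helper and the sample values are hypothetical.

```python
import json
from typing import Any, Dict


def to_import_line(entry_name: str, aspects: Dict[str, Any]) -> str:
    """Serialize one entry and its aspects as a single JSON line.

    Assumption: field names are illustrative only, not the real schema.
    """
    item = {"entry": {"name": entry_name, "aspects": aspects}}
    return json.dumps(item, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical entry for a table, with one hypothetical "schema" aspect.
    print(to_import_line("orders", {"schema": {"columns": ["id", "total"]}}))
```

A connector would write one such line for each entry that it extracts. See the metadata import file reference for the exact fields that Dataplex expects.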

What's next