Connect to a third-party Git repository

This document shows you how to connect a remote repository to a Dataform repository. After you connect the repositories, you can push the changes that you make in a Dataform development workspace to the remote Git repository, and pull changes from the remote Git repository into the workspace.

You can connect a remote repository through HTTPS or SSH.

The following table lists supported Git providers and connection methods that are available for their repositories:

Git provider             Connection method
Azure DevOps Services    SSH
Bitbucket                SSH
GitHub                   SSH or HTTPS
GitLab                   SSH or HTTPS
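
As a quick illustration, the table above can be written as a lookup. This is a minimal sketch: the dictionary copies the table, while the names `SUPPORTED_METHODS` and `supports` are hypothetical helpers and not part of any Dataform API.

```python
# Supported connection methods per Git provider, copied from the table above.
SUPPORTED_METHODS = {
    "Azure DevOps Services": {"SSH"},
    "Bitbucket": {"SSH"},
    "GitHub": {"SSH", "HTTPS"},
    "GitLab": {"SSH", "HTTPS"},
}


def supports(provider: str, method: str) -> bool:
    """Return True if the table lists `method` for `provider`."""
    return method.upper() in SUPPORTED_METHODS.get(provider, set())
```

For example, `supports("Bitbucket", "HTTPS")` is false because the table lists only SSH for Bitbucket.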

Before you begin

  1. If your organization or project restricts remote Git repositories with the dataform.restrictGitRemotes organization policy, make sure that the remote Git repository is on the policy's allowlist before you create the Dataform repository that you want to connect to it. For more information, see Restrict remote repositories.
  2. Select or create a Dataform repository. You need it later to share a secret with your default Dataform service account.

Required roles

To get the permissions that you need to connect a Dataform repository to a remote Git repository, ask your administrator to grant you the Dataform Admin (roles/dataform.admin) IAM role on repositories. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Connect a remote repository through SSH

To connect a remote repository through SSH, you need to generate an SSH key and a Secret Manager secret. The SSH key consists of a public SSH key and a private SSH key. You need to share the public SSH key with your Git provider, and create a Secret Manager secret with the private SSH key. Then, share the secret with your default Dataform service account.

Dataform uses the secret with the private SSH key to sign in to your Git provider to commit changes on behalf of the developers. Dataform makes these commits using the developer's Google Cloud email address so you can tell who made each commit.
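
The Secret Manager part of this setup can also be scripted. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated: it only builds the two `gcloud secrets` commands and does not run them. The function names, the secret ID `dataform-ssh-key`, the key file name `id_ed25519`, and the project number are hypothetical placeholders; the service account address follows the format given in the steps below.

```python
import shlex


def dataform_service_account(project_number: str) -> str:
    """Build the default Dataform service account address for a project."""
    return f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"


def secret_setup_commands(project_number: str, secret_id: str, key_file: str):
    """Return the gcloud commands that create the secret and share it.

    Assumption: the private SSH key is stored in the local file `key_file`.
    """
    member = "serviceAccount:" + dataform_service_account(project_number)
    # Create the secret with the private key file as its value.
    create = ["gcloud", "secrets", "create", secret_id, f"--data-file={key_file}"]
    # Let the default Dataform service account read the secret.
    grant = [
        "gcloud", "secrets", "add-iam-policy-binding", secret_id,
        f"--member={member}",
        "--role=roles/secretmanager.secretAccessor",
    ]
    return [create, grant]


if __name__ == "__main__":
    # Placeholder values; replace them with your own before running anything.
    for cmd in secret_setup_commands("123456789012", "dataform-ssh-key", "id_ed25519"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run; review the output before you paste it into a terminal.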

To connect a remote repository to a Dataform repository through SSH, follow these steps:

  1. In your Git provider, do the following:

    Azure DevOps Services

    1. Create an SSH key pair for Azure DevOps Services.
    2. Upload the public SSH key to your Azure DevOps Services repository.

    Bitbucket

    1. Create an SSH key pair for Bitbucket.
    2. Upload the public SSH key to your Bitbucket repository.

    GitHub

    1. Create an SSH key pair for GitHub.
    2. Upload the public SSH key to your GitHub repository.

    GitLab

    1. Create an SSH key pair for GitLab.
    2. Upload the public SSH key to your GitLab repository.
  2. In Secret Manager, create a secret and set your private SSH key as the secret value.

    1. Grant your default Dataform service account access to the secret by granting it the roles/secretmanager.secretAccessor role.

      Your default Dataform service account is in the following format:

      service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com

  3. In the Google Cloud console, go to the Dataform page.

  4. Select the Dataform repository that you want to connect to the remote repository.

  5. On the repository page, click Settings > Connect with Git.

  6. In the Link to remote repository pane, in the Remote Git repository URL field, enter the URL of the remote Git repository, ending with .git.

    The URL of the remote Git repository must be in one of the following formats:

    • Absolute URL: ssh://git@{host_name}[:{port}]/{repository_path}, where the port is optional.
    • SCP-like URL: git@{host_name}:{repository_path}.
  7. In the Default remote branch name field, enter the name of the main development branch of the remote Git repository.

  8. In the Secret drop-down, select your secret for the remote Git repository.

  9. In the SSH public host key value field, enter the public host key of your Git provider.

    Azure DevOps Services

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the Azure DevOps Services public host key, run the ssh-keyscan -t rsa ssh.dev.azure.com command in the terminal.

    Bitbucket

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the Bitbucket public host key, see Configure SSH.

    GitHub

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the GitHub public host key, see GitHub's SSH key fingerprints.

    GitLab

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the GitLab public host key, see SSH known_hosts entries.

  10. Click Link.
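
Two of the values entered in the preceding steps have strict formats: the remote URL in step 6 and the host key value in step 9. The following is a minimal sketch of how you might check them before you paste them into the console. The function names and regular expressions are hypothetical helpers written from the format descriptions above, not Dataform's own validation, and the key in the usage note is a truncated placeholder.

```python
import re

# The two SSH URL formats listed in step 6.
_ABSOLUTE = re.compile(r"^ssh://git@(?P<host>[^/:\s]+)(?::(?P<port>\d+))?/(?P<path>\S+)$")
_SCP_LIKE = re.compile(r"^git@(?P<host>[^/:\s]+):(?P<path>\S+)$")


def is_valid_ssh_remote(url: str) -> bool:
    """Check that a URL ends with .git and matches one of the two formats."""
    if not url.endswith(".git"):
        return False
    return bool(_ABSOLUTE.match(url) or _SCP_LIKE.match(url))


def host_key_value(known_hosts_line: str) -> str:
    """Drop the leading host field from a plain known_hosts line.

    Input:  'HOST ALGORITHM BASE64_KEY_VALUE'
    Output: 'ALGORITHM BASE64_KEY_VALUE'
    Assumption: the line has no marker such as @cert-authority.
    """
    fields = known_hosts_line.split()
    if len(fields) < 3:
        raise ValueError("expected 'HOST ALGORITHM BASE64_KEY_VALUE'")
    return f"{fields[1]} {fields[2]}"
```

For example, passing a line printed by ssh-keyscan, such as `ssh.dev.azure.com ssh-rsa AAAAB3NzaC1yc2E`, to `host_key_value` returns `ssh-rsa AAAAB3NzaC1yc2E`, which is the form the SSH public host key value field expects.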

Connect a remote repository through HTTPS

To connect a remote repository through HTTPS, you need to create a Secret Manager secret with a personal access token, and share the secret with your default Dataform service account.

Dataform then uses the access token to sign in to your Git provider to commit changes on behalf of the developers. Dataform makes these commits using the developer's Google Cloud email address so you can tell who made each commit.

To connect a remote repository to a Dataform repository through HTTPS, follow these steps:

  1. In your Git provider, do the following:

    GitHub

    1. In GitHub, create a fine-grained personal access token or a classic personal access token.

      • For a fine-grained GitHub personal access token, do the following:
      1. Select repository access to only selected repositories, then select the repository that you want to connect to.

      2. Grant read and write access on contents of the repository.

      3. Set a token expiration time appropriate to your needs.

      • For a classic GitHub personal access token, do the following:
      1. Grant the token the repo scope.

      2. Set a token expiration time appropriate to your needs.

    2. If your organization uses SAML single sign-on (SSO), authorize the token.

    GitLab

    1. In GitLab, create a GitLab personal access token.

    2. Name the token dataform. This exact name is required.

    3. Grant the token the api, read_repository, and write_repository scopes.

    4. Set a token expiration time appropriate to your needs.

  2. In Secret Manager, create a secret containing the personal access token of your remote repository.

  3. Grant your default Dataform service account access to the secret by granting it the roles/secretmanager.secretAccessor role.

    Your default Dataform service account is in the following format:

    service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com

  4. In the Google Cloud console, go to the Dataform page.

  5. Select the Dataform repository that you want to connect to the remote repository.

  6. On the repository page, click Settings > Connect with Git.

  7. In the Link to remote repository pane, in the Remote Git repository URL field, enter the URL of the remote Git repository, ending with .git.

    The URL of the remote Git repository cannot contain usernames or passwords.

  8. In the Default remote branch name field, enter the name of the main development branch of the remote Git repository.

  9. In the Secret drop-down, select your secret for the remote Git repository.

  10. Click Link.
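
The URL rules from step 7 can be checked before you enter the value in the console. The following is a minimal sketch that uses only the Python standard library. The function name `check_https_remote` and its messages are hypothetical, and the checks reflect only the rules stated above: an HTTPS URL, ending with .git, with no username or password.

```python
from urllib.parse import urlsplit


def check_https_remote(url: str) -> list[str]:
    """Return a list of problems with the URL; an empty list means none found."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("URL must use the https scheme")
    # Credentials belong in the Secret Manager secret, not in the URL.
    if parts.username is not None or parts.password is not None:
        problems.append("URL must not contain a username or password")
    if not parts.path.endswith(".git"):
        problems.append("URL must end with .git")
    return problems
```

A URL with embedded credentials, for example `https://user:token@github.com/example-org/example-repo.git`, is reported as a problem because the token belongs in the secret that you select in step 9.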

Edit the remote repository connection

To edit a connection between a Dataform repository and a remote Git repository, follow these steps:

  1. In the Google Cloud console, go to the Dataform page.

  2. Click the repository that you want to edit.

  3. On the repository page, click Settings > Edit Git connection.

  4. In the Link to remote repository pane, edit the connection settings.

  5. Click Update.

What's next