Create repositories

This document shows you how to work with repositories in BigQuery, including the following tasks:

  • Creating repositories
  • Deleting repositories
  • Sharing repositories
  • Optionally connecting a BigQuery repository to a third-party repository

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery and Dataform APIs.

    Enable the APIs


Required roles

To get the permissions that you need to work with repositories and workspaces, ask your administrator to grant you the following IAM roles on repositories and workspaces:

  • Create and manage shared repositories: Code Owner (roles/dataform.codeOwner)
  • Create and delete workspaces in shared repositories: Code Editor (roles/dataform.codeEditor)
  • Create, modify, and version control files in workspaces in shared repositories: Code Editor (roles/dataform.codeEditor)
  • View workspaces and their files in shared repositories: Code Viewer (roles/dataform.codeViewer)
  • Create and manage private repositories, including all actions with workspaces and files in the private repository: Code Creator (roles/dataform.codeCreator)
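If you script access requests, you can keep the task-to-role list above as a lookup table. The following Python sketch is hypothetical: the `REQUIRED_ROLES` table and the `roles_for` function aren't part of any Google Cloud library. It returns the distinct predefined roles to request for a set of tasks. It doesn't model custom roles or any overlap between roles.

```python
# Hypothetical lookup table built from the "Required roles" list.
# Keys are lowercase task descriptions; values are predefined role IDs.
REQUIRED_ROLES = {
    "create and manage shared repositories": "roles/dataform.codeOwner",
    "create and delete workspaces in shared repositories": "roles/dataform.codeEditor",
    "create, modify, and version control files in workspaces in shared repositories": "roles/dataform.codeEditor",
    "view workspaces and their files in shared repositories": "roles/dataform.codeViewer",
    "create and manage private repositories": "roles/dataform.codeCreator",
}


def roles_for(tasks: list[str]) -> list[str]:
    """Return the sorted, de-duplicated role IDs needed for the given tasks.

    Raises KeyError for a task that isn't in REQUIRED_ROLES.
    """
    return sorted({REQUIRED_ROLES[task.lower()] for task in tasks})


print(roles_for([
    "Create and delete workspaces in shared repositories",
    "View workspaces and their files in shared repositories",
]))  # ['roles/dataform.codeEditor', 'roles/dataform.codeViewer']
```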

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Principals that have the Code Editor role on a repository are able to edit all workspaces in the repository.

Private repositories that you create are still visible to principals who are granted the BigQuery Admin or BigQuery Studio Admin roles at the project level. These principals can share your private repository with other users.

Create a repository

To create a BigQuery repository, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, click the Repositories folder.

  3. In the editor, click Add Repository.

  4. In the Create repository pane, in the Repository ID field, type a unique ID.

    IDs can only include numbers, letters, hyphens, and underscores.

  5. In the Region drop-down list, select the BigQuery region where the repository and its contents are stored. We recommend selecting the region nearest to your location.

    For a list of available BigQuery regions, see BigQuery Studio locations. The repository region does not have to match the location of your BigQuery datasets.

  6. Click Create.
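Before you type an ID in step 4, you can check it against the documented character rules. The following Python sketch assumes that "letters" means ASCII letters. It checks only the character set: the steps don't state a length limit, and a local check can't confirm uniqueness. The function name is hypothetical.

```python
import re

# Characters allowed in a repository ID, per step 4: letters, numbers,
# hyphens, and underscores. ASCII letters are assumed; uniqueness and any
# length limit can't be checked locally, so they aren't enforced here.
_REPOSITORY_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_repository_id(repository_id: str) -> bool:
    """Return True if repository_id is non-empty and uses only allowed characters."""
    return _REPOSITORY_ID_PATTERN.fullmatch(repository_id) is not None


for candidate in ["sales_reporting-v2", "sales reporting", "repo.git"]:
    print(candidate, is_valid_repository_id(candidate))
```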

Connect to a third-party repository

This section shows you how to connect a BigQuery repository to a remote repository. After you connect the repositories, you can perform Git actions on the files in the repository's workspaces, for example, pulling updates from the remote repository and pushing changes to it.

We recommend creating a dedicated BigQuery repository for each remote repository that you connect to. To make the mapping clear, give the BigQuery repository a name similar to that of the remote repository.

You can connect a remote repository through HTTPS or SSH. The following table lists supported Git providers and the connection methods that are available for their repositories:

Git provider            Connection method
Azure DevOps Services   SSH
Bitbucket               SSH
GitHub                  SSH or HTTPS
GitLab                  SSH or HTTPS
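If a setup script needs to choose a connection method, the provider table above can be encoded as data. The following Python sketch transcribes that table. The `SUPPORTED_METHODS` name and the `supports` function are hypothetical.

```python
# Connection methods per Git provider, transcribed from the table above.
SUPPORTED_METHODS = {
    "Azure DevOps Services": {"SSH"},
    "Bitbucket": {"SSH"},
    "GitHub": {"SSH", "HTTPS"},
    "GitLab": {"SSH", "HTTPS"},
}


def supports(provider: str, method: str) -> bool:
    """Return True if provider supports method ("SSH" or "HTTPS", any case).

    Unknown providers return False rather than raising.
    """
    return method.upper() in SUPPORTED_METHODS.get(provider, set())


print(supports("GitHub", "https"))     # True
print(supports("Bitbucket", "HTTPS"))  # False
```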

Connect a remote repository through SSH

To connect a remote repository through SSH, you must generate an SSH key pair, which consists of a public key and a private key. Share the public key with your Git provider, create a Secret Manager secret that contains the private key, and then share the secret with your default Dataform service account.

BigQuery uses the private SSH key stored in the secret to sign in to your Git provider and commit changes on behalf of users. BigQuery makes these commits using the user's Google Cloud email address so you can tell who made each commit.
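The account that needs access to the secret is the default Dataform service account, whose email format is given in the steps that follow. As a minimal sketch, the following Python builds that email from a project number and wraps it in the role-plus-members binding shape that IAM policies use. The function names and the sample project number are hypothetical. You still have to apply the binding to the secret, as described in steps 3 and 4.

```python
def dataform_service_account(project_number: int) -> str:
    """Return the email of the default Dataform service account for a project."""
    return f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"


def secret_accessor_binding(project_number: int) -> dict:
    """Return an IAM policy binding that grants roles/secretmanager.secretAccessor
    to the default Dataform service account.
    """
    return {
        "role": "roles/secretmanager.secretAccessor",
        # IAM identifies service accounts with the "serviceAccount:" prefix.
        "members": [f"serviceAccount:{dataform_service_account(project_number)}"],
    }


# 123456789012 is a placeholder project number.
print(secret_accessor_binding(123456789012))
```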

To connect a remote repository to a BigQuery repository through SSH, follow these steps:

  1. In your Git provider, do the following:

    Azure DevOps Services

    1. In Azure DevOps Services, create an SSH key pair.
    2. Upload the public SSH key to your Azure DevOps Services repository.

    Bitbucket

    1. In Bitbucket, create an SSH key pair.
    2. Upload the public SSH key to your Bitbucket repository.

    GitHub

    1. In GitHub, check for existing SSH keys.
    2. If you don't have any existing SSH keys, or you want to use a new key, create an SSH key pair.
    3. Upload the public SSH key to your GitHub repository.

    GitLab

    1. In GitLab, create an SSH key pair.
    2. Upload the public SSH key to your GitLab repository.
  2. In Secret Manager, create a secret and paste your private SSH key as the secret value. Your private SSH key is typically stored in a file such as ~/.ssh/id_ed25519. Give the secret a name that makes it easy to find later.

  3. Grant your default Dataform service account access to the secret.

    Your default Dataform service account is in the following format:

    service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
    
  4. Grant the roles/secretmanager.secretAccessor role to the service account.

  5. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  6. In the Explorer pane, expand the Repositories folder.

  7. Select the BigQuery repository that you want to connect to the remote repository.

  8. In the editor, select the Configuration tab.

  9. Click Connect with Git.

  10. In the Connect to remote repository pane, select the SSH radio button.

  11. In the Remote Git repository URL field, type the URL of the remote Git repository, ending with .git.

    The URL of the remote Git repository must be in one of the following formats:

    • Absolute URL: ssh://git@{host_name}[:{port}]/{repository_path}, where the port is optional.
    • SCP-like URL: git@{host_name}:{repository_path}.
  12. In the Default remote branch name field, type the name of the main branch of the remote Git repository.

  13. In the Secret drop-down, select the secret that you created that contains the SSH private key.

  14. In the SSH public host key value field, type the public host key of your Git provider.

    Azure DevOps Services

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

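    To retrieve the Azure DevOps Services public host key, run the ssh-keyscan -t rsa ssh.dev.azure.com command in a terminal, and then remove the hostname from the beginning of the output so that only the algorithm and the key remain.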

    Bitbucket

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the Bitbucket public host key, see Configure SSH.

    GitHub

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the GitHub public host key, see GitHub's SSH key fingerprints.

    GitLab

    The SSH public host key value must be in the format of a known_hosts file. The value must contain an algorithm and a public key encoded in the base64 format, but without the hostname or IP, in the following format:

    ALGORITHM BASE64_KEY_VALUE
    

    To retrieve the GitLab public host key, see SSH known_hosts entries.

  15. Click Connect.
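Steps 11 and 14 both expect strings in specific formats. The following Python sketch checks a remote URL against the two SSH URL formats listed in step 11. It also converts ssh-keyscan-style output (HOST ALGORITHM KEY) into the ALGORITHM BASE64_KEY_VALUE form that step 14 asks for. The regular expressions only approximate those formats and aren't the validation that BigQuery performs. The function names, the sample host, and the sample key (a base64 stand-in, not a real host key) are hypothetical. The parser doesn't handle known_hosts markers such as @cert-authority.

```python
import base64
import re

# Approximations of the two SSH URL formats listed in step 11.
_ABSOLUTE_SSH_URL = re.compile(r"ssh://git@[^/:@]+(?::\d+)?/.+")
_SCP_LIKE_SSH_URL = re.compile(r"git@[^/:@]+:.+")


def is_valid_ssh_remote_url(url: str) -> bool:
    """Return True if url matches a documented format and ends with .git."""
    if not url.endswith(".git"):
        return False
    return bool(_ABSOLUTE_SSH_URL.fullmatch(url) or _SCP_LIKE_SSH_URL.fullmatch(url))


def host_key_value(known_hosts_line: str) -> str:
    """Convert 'HOST ALGORITHM BASE64_KEY' to 'ALGORITHM BASE64_KEY'.

    Raises ValueError if the line has too few fields or the key isn't base64.
    """
    fields = known_hosts_line.split()
    if len(fields) < 3:
        raise ValueError("expected 'HOST ALGORITHM BASE64_KEY_VALUE'")
    algorithm, key = fields[1], fields[2]
    # binascii.Error, raised for invalid base64, is a subclass of ValueError.
    base64.b64decode(key, validate=True)
    return f"{algorithm} {key}"


def host_key_values(keyscan_output: str) -> list[str]:
    """Apply host_key_value to every non-blank line that isn't a # comment."""
    return [
        host_key_value(line)
        for line in keyscan_output.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


sample = "# ssh.example.com:22 SSH-2.0-OpenSSH\nssh.example.com ssh-ed25519 ZmFrZS1rZXk=\n"
print(host_key_values(sample))  # ['ssh-ed25519 ZmFrZS1rZXk=']
print(is_valid_ssh_remote_url("git@gitlab.com:example-group/example-repo.git"))  # True
```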

Connect a remote repository through HTTPS

To connect a remote repository through HTTPS, you must create a Secret Manager secret that contains a personal access token, and then share the secret with your default Dataform service account.

BigQuery then uses the access token to sign in to your Git provider to commit changes on behalf of users. BigQuery makes these commits using the user's Google Cloud email address so you can tell who made each commit.

To connect a remote repository to a BigQuery repository through HTTPS, follow these steps:

  1. In your Git provider, do the following:

    GitHub

    1. In GitHub, create a fine-grained personal access token or a classic personal access token.

      • For a fine-grained GitHub personal access token, do the following:
      1. Under Repository access, select Only select repositories, and then select the repository that you want to connect to.

      2. Under Repository permissions, grant Read and write access to Contents.

      3. Set a token expiration time appropriate to your needs.

      • For a classic GitHub personal access token, do the following:
      1. Select the repo scope.

      2. Set a token expiration time appropriate to your needs.

    2. If your organization uses SAML single sign-on (SSO), authorize the token.

    GitLab

    1. In GitLab, create a GitLab personal access token.

    2. Name the token dataform; this is required.

    3. Select the api, read_repository, and write_repository scopes.

    4. Set a token expiration time appropriate to your needs.

  2. In Secret Manager, create a secret containing the personal access token of your remote repository.

  3. Grant your default Dataform service account access to the secret.

    Your default Dataform service account is in the following format:

    service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
    
  4. Grant the roles/secretmanager.secretAccessor role to the service account.

  5. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  6. In the Explorer pane, expand the Repositories folder.

  7. Select the BigQuery repository that you want to connect to the remote repository.

  8. In the editor, select the Configuration tab.

  9. Click Connect with Git.

  10. In the Connect to remote repository pane, select the HTTPS radio button.

  11. In the Remote Git repository URL field, type the URL of the remote Git repository, ending with .git.

    The URL of the remote Git repository can't contain usernames or passwords.

  12. In the Default remote branch name field, type the name of the main branch of the remote Git repository.

  13. In the Secret drop-down, select the secret that you created that contains the personal access token.

  14. Click Connect.
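The HTTPS URL rules in step 11 (no embedded usernames or passwords, ending with .git) can be checked before you connect. The following Python sketch uses urllib.parse.urlsplit from the standard library. It additionally assumes that the scheme must be https and that the URL has no query string or fragment; neither requirement is stated above. The function name and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_https_remote_url(url: str) -> bool:
    """Check an HTTPS remote URL against the rules in step 11.

    Assumes the scheme must be https; rejects URLs with embedded credentials
    and URLs whose path doesn't end with .git.
    """
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    # Credentials belong in the Secret Manager secret, not in the URL.
    if parts.username is not None or parts.password is not None:
        return False
    return parts.path.endswith(".git") and not parts.query and not parts.fragment


print(is_valid_https_remote_url("https://github.com/example-org/example-repo.git"))             # True
print(is_valid_https_remote_url("https://user:token@github.com/example-org/example-repo.git"))  # False
```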

Edit the remote repository connection

To edit a connection between a BigQuery repository and a remote Git repository, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, expand the Repositories folder.

  3. Select the BigQuery repository whose connection you want to edit.

  4. In the editor, select the Configuration tab.

  5. On the repository page, click Edit Git connection.

  6. Edit the connection settings.

  7. Click Update.

Share a repository

To share a repository, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, click the Repositories folder.

  3. In the Git Repositories pane, select the repository that you want to share.

  4. Click the Actions option and then click Share.

  5. In the Share permissions pane, click Add User/Group.

  6. In the Add User/Group pane, in the New Principals field, type one or more user or group names, separated by commas.

  7. In the Role field, choose the role to assign to the new principals.

  8. Click Save.

Delete a repository

To delete a repository and all its contents, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, click the Repositories folder.

  3. In the Git Repositories pane, select the repository that you want to delete.

  4. Click the Actions option and then click Delete.

  5. To confirm, click Delete.

What's next