Step 4: Set up components

This page describes the fourth step to deploy Cortex Framework Data Foundation, the core of Cortex Framework. In this step, you set up the Google Cloud services required for deployment.

In this section, you enable the following Google Cloud services in your Google Cloud project:

  • BigQuery instance and datasets
  • Cloud Build API
  • Cloud Storage Buckets
  • Service account
  • Cloud Resource Manager API
  • Workflow executions (if deploying Cortex for Meridian)
  • AI platform (if deploying Cortex for Meridian)

Enable these Google Cloud services using Cloud Shell:

  1. Copy and paste the following command:

    gcloud config set project SOURCE_PROJECT
    
    gcloud services enable bigquery.googleapis.com \
                          cloudbuild.googleapis.com \
                          composer.googleapis.com \
                          storage-component.googleapis.com \
                          cloudresourcemanager.googleapis.com \
                          dataflow.googleapis.com \
                          datacatalog.googleapis.com \
                          workflowexecutions.googleapis.com \
                          workflows.googleapis.com \
                          aiplatform.googleapis.com
    

    Replace SOURCE_PROJECT with your source project ID.

  2. If you get a success message, the required Google Cloud services are enabled.
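To double-check, you can compare the required APIs against the enabled list. The following is a minimal sketch: in practice, you would populate `ENABLED_APIS` from `gcloud services list --enabled --format="value(config.name)"`; the value shown here is a placeholder for illustration.

```shell
# Sketch: verify required APIs against the enabled list. In practice, populate
# ENABLED_APIS from: gcloud services list --enabled --format="value(config.name)"
REQUIRED_APIS="bigquery.googleapis.com cloudbuild.googleapis.com storage-component.googleapis.com cloudresourcemanager.googleapis.com"
ENABLED_APIS="bigquery.googleapis.com cloudbuild.googleapis.com"  # placeholder output

MISSING=""
for api in $REQUIRED_APIS; do
  case " $ENABLED_APIS " in
    *" $api "*) echo "OK: $api" ;;
    *)          echo "MISSING: $api"; MISSING="$MISSING $api" ;;
  esac
done
```

Any API reported as `MISSING` can be enabled by rerunning the `gcloud services enable` command from the preceding step.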

Optionally, you can enable the following Google Cloud services in your Google Cloud project:

  • Cloud Composer for Change Data Capture (CDC) processing, hierarchy flattening (SAP only), and data replication (Non-SAP only) through Directed Acyclic Graphs (DAGs). To set up an instance, see Cloud Composer documentation.
  • Looker for connecting to reporting templates.
  • BigQuery sharing (formerly Analytics Hub) linked datasets are used for some external sources, such as the Weather DAG. You might choose to fill this structure with any other available source of your choice for advanced scenarios.
  • Dataflow: integration tool for many of the Marketing datasets, such as Google Ads.
  • Dataplex: used for building a Data Mesh. For more information, see the Data Mesh User Guide.
  • For Cortex Framework with Meridian:
    • Colab Enterprise: Running the Colab Enterprise notebook during execution, with access to the following:
      • The notebook and configuration file on Cloud Storage.
      • Querying relevant BigQuery views and tables.
      • Writing the results back to Cloud Storage.
    • Workflows: Running the Workflow and Cloud Build to trigger Colab Enterprise notebook executions.

Grant permissions to the executing user

To execute the deployment in the project where Cloud Build is triggered, grant the following permissions to the executing user:

  • Service Usage Consumer
  • Storage Object Viewer for the Cloud Build default bucket or bucket for logs
  • Object Writer to the output buckets
  • Cloud Build Editor
  • Project Viewer or Storage Object Viewer

For more information about granting these permissions, see the IAM documentation on granting roles.

Create and configure the Cloud Build service account

Cloud Build uses a service account to execute builds on your behalf. This section guides you through creating a dedicated service account for Cortex Framework deployments and granting the necessary permissions to the Cloud Build service account.

Create a new service account

Create a dedicated service account for deploying Cortex Framework for enhanced security and control. After creating the new service account, you can specify it during the deployment process using the _BUILD_ACCOUNT substitution parameter.

Before you begin, you need to enable the IAM API, set up authentication, and understand IAM service accounts and roles. For more information, see Create service accounts.

Create this new service account using either the Google Cloud console or the Google Cloud CLI.

Console

  1. Go to the Service accounts page.

    The remaining steps appear in the Google Cloud console.

  2. Select a Google Cloud project.

  3. Enter a service account name, display name, and description in the Google Cloud console. The Google Cloud console generates a service account ID based on the name. Edit the ID if necessary; you can't change the ID later. Use the following default values for the service account:

    • Name: "cortex-deployer".
    • Description: "Cortex Deployer Service Account".
    • Display name: "Cortex Deployer".
  4. Grant users access to this service account.

    1. Add the ID of all users (including your own) who can run the deployment through the service account.
    2. Assign the Service Account Token Creator role. For more information about this role, see Service account roles.
  5. Click Done to finish creating the service account.

If you already have a service account, follow these steps:

  1. Go to Service accounts.
  2. Select the Service account.
  3. Click the Permissions tab.
  4. Click Grant Access.
    1. Add the ID of all users (including your own) who can run the deployment through the service account.
    2. Assign the Service Account Token Creator role.

gcloud

  1. Create a service account with the following command:

    gcloud iam service-accounts create cortex-deployer \
    --description="Cortex Deployer Service Account" \
    --display-name="Cortex Deployer"
    
  2. Add the IAM policy to your Google Cloud project with the following command:

    gcloud projects add-iam-policy-binding SOURCE_PROJECT \
    --member="serviceAccount:cortex-deployer@SOURCE_PROJECT.iam.gserviceaccount.com" \
    --role="roles/cloudbuild.builds.editor"
    
  3. Add the ID of all users (including your own) who can run the deployment through the service account, and assign them the Service Account Token Creator role with the following command:

    gcloud iam service-accounts add-iam-policy-binding cortex-deployer \
    --member="user:USER_EMAIL" \
    --role="roles/iam.serviceAccountTokenCreator"
    

    Replace the following:

    • SOURCE_PROJECT with the source project for Cortex Framework Data Foundation deployment.
    • USER_EMAIL with the executing user email.

For more information about creating a new service account, see Create a service account.

_BUILD_ACCOUNT substitution parameter

Use the _BUILD_ACCOUNT substitution parameter to execute the deployment through the new service account using the following command, replacing the placeholder values:

  gcloud builds submit \
    --config=./cloudbuild.yaml \
    --substitutions=_GCS_BUCKET=BUCKET_FOR_LOGS,_BUILD_ACCOUNT='projects/SOURCE_PROJECT/serviceAccounts/SERVICE_ACCOUNT@SOURCE_PROJECT.iam.gserviceaccount.com'

Replace the following:

  • BUCKET_FOR_LOGS with the bucket name for the logs.
  • SERVICE_ACCOUNT with the name for your new Cortex Framework deployment service account (for example, cortex-deployer).
  • SOURCE_PROJECT with the source project ID for Cortex Framework Data Foundation deployment.

This service account triggers a Cloud Build job that in turn runs specific steps through the Cloud Build service account. This lets you trigger a deployment process without direct access to the resources.
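Note that `--substitutions` takes a single comma-separated value with no spaces between entries. A minimal sketch of assembling that value in the shell, with placeholder bucket and project names:

```shell
# Sketch: assemble the --substitutions value. All names below are placeholders.
GCS_BUCKET="my-cortex-logs-bucket"
BUILD_ACCOUNT="projects/my-project/serviceAccounts/cortex-deployer@my-project.iam.gserviceaccount.com"

# Entries are joined with commas and must contain no spaces.
SUBSTITUTIONS="_GCS_BUCKET=${GCS_BUCKET},_BUILD_ACCOUNT=${BUILD_ACCOUNT}"
echo "$SUBSTITUTIONS"
```

The resulting string can then be passed as `--substitutions="$SUBSTITUTIONS"` to `gcloud builds submit`.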

Grant permissions

The Cloud Build service account needs specific permissions in your source project (and the target project if you are deploying to a separate project). Grant the following roles using either the Google Cloud console or the Google Cloud CLI:

  • Cloud Build service account (roles/cloudbuild.builds.builder)
  • Service account User (roles/iam.serviceAccountUser)
  • BigQuery Data Editor (roles/bigquery.dataEditor)
  • BigQuery Job User (roles/bigquery.jobUser)
  • Logs Writer (roles/logging.logWriter)
  • Colab Enterprise Admin (roles/aiplatform.colabEnterpriseAdmin - if deploying Cortex for Meridian)
  • Storage Object User (roles/storage.objectUser - if deploying Cortex for Meridian)
  • Workflows Editor (roles/workflows.editor - if deploying Cortex for Meridian)

Console

  1. In the Google Cloud console, go to the IAM page.

  2. Select your source project.

  3. Click Grant access.

  4. Add the default Cloud Build service account from the preceding step as a new principal.

  5. From the Select a role drop-down menu, search for Cloud Build service account, then click Cloud Build service account.

  6. Repeat the previous step to add the rest of the roles:

    • Service Account User
    • BigQuery Data Editor
    • BigQuery Job User
    • Logs Writer
    • Colab Enterprise Admin (if deploying Cortex for Meridian)
    • Storage Object User (if deploying Cortex for Meridian)
    • Workflows Editor (if deploying Cortex for Meridian)
  7. Click Save.

  8. Verify that the service account and the corresponding roles are listed in the IAM page. You have successfully granted an IAM role.

gcloud

Use the following command to grant the roles to the Cloud Build service account:

  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/cloudbuild.builds.builder"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/iam.serviceAccountUser"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/bigquery.dataEditor"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/bigquery.jobUser"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/logging.logWriter"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/aiplatform.colabEnterpriseAdmin"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/storage.objectUser"
  gcloud projects add-iam-policy-binding SOURCE_PROJECT \
      --member="serviceAccount:CLOUD_BUILD_SA" \
      --role="roles/workflows.editor"

Replace the following:

  • SOURCE_PROJECT with the Source project ID.
  • CLOUD_BUILD_SA with the Cloud Build service account email. The default Cloud Build service account has the format PROJECT_NUMBER@cloudbuild.gserviceaccount.com.

For more information, see Granting a role to the Cloud Build service account using the IAM page and Set and manage IAM policies on buckets.
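If you are unsure of the default Cloud Build service account email, it can be derived from the project number. The following sketch uses a placeholder project number; in practice you would obtain it with `gcloud projects describe`:

```shell
# Sketch: derive the default Cloud Build service account email.
# In practice, PROJECT_NUMBER comes from:
#   gcloud projects describe SOURCE_PROJECT --format="value(projectNumber)"
PROJECT_NUMBER="123456789012"  # placeholder value
CLOUD_BUILD_SA="${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com"
echo "$CLOUD_BUILD_SA"
```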

Create an additional service account for Cortex for Meridian

When deploying Cortex Framework with Meridian, a separate service account is necessary. If you are not deploying for Meridian, you can skip this step. The service account is required for the following:

  • Running the Workflow and Cloud Build to trigger Colab Enterprise notebook executions.
  • Running the Colab Enterprise notebook with access to:
    • Notebook and configuration file on Cloud Storage.
    • Query relevant BigQuery views and tables.
    • Write the results back to Cloud Storage.

Use the following command to create a service account for Cortex for Meridian:


 gcloud iam service-accounts create cortex-meridian-colab-runner --project=PROJECT_ID \
        --description="Cortex for Meridian Colab Runner Service Account" \
        --display-name="Cortex Meridian Runner"

Replace the following:

  • PROJECT_ID with the project ID where Cortex for Meridian is deployed.
  • cortex-meridian-colab-runner: optionally, replace with a different service account ID. cortex-meridian-colab-runner is the default, and changing it might require adjustments to other commands.

The minimum required roles for a dedicated service account for Cortex for Meridian are the following:

  • BigQuery Data Viewer (roles/bigquery.dataViewer)
  • BigQuery Job User (roles/bigquery.jobUser)
  • BigQuery Read Session User (roles/bigquery.readSessionUser)
  • Cloud Build Editor (roles/cloudbuild.builds.editor)
  • Colab Enterprise Admin (roles/aiplatform.colabEnterpriseAdmin)
  • Logs Writer (roles/logging.logWriter)
  • Notebook Runtime Admin (roles/aiplatform.notebookRuntimeAdmin)
  • Storage Admin (roles/storage.admin)
  • Storage Object User (roles/storage.objectUser)
  • Vertex AI Colab Service Agent (roles/aiplatform.colabServiceAgent)

Use the following command to grant the roles to the dedicated service account for Cortex for Meridian:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/bigquery.dataViewer'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/bigquery.jobUser'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/bigquery.readSessionUser'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/cloudbuild.builds.editor'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/aiplatform.colabEnterpriseAdmin'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/logging.logWriter'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/aiplatform.notebookRuntimeAdmin'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/storage.admin'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/storage.objectUser'

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
    --role='roles/aiplatform.colabServiceAgent'

Replace the following:

  • PROJECT_ID with the project ID where Cortex for Meridian is deployed.
  • SERVICE_ACCOUNT_EMAIL with your service account for Cortex for Meridian.
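The ten bindings above can also be generated in a loop. The sketch below only prints each command for review (the project ID is a placeholder); remove the leading `echo` to execute them in an authenticated gcloud session.

```shell
# Sketch: print the grant commands for review; drop "echo" to run them.
PROJECT_ID="my-project"  # placeholder
SERVICE_ACCOUNT_EMAIL="cortex-meridian-colab-runner@${PROJECT_ID}.iam.gserviceaccount.com"
ROLES="roles/bigquery.dataViewer roles/bigquery.jobUser roles/bigquery.readSessionUser
roles/cloudbuild.builds.editor roles/aiplatform.colabEnterpriseAdmin roles/logging.logWriter
roles/aiplatform.notebookRuntimeAdmin roles/storage.admin roles/storage.objectUser
roles/aiplatform.colabServiceAgent"

COUNT=0
for role in $ROLES; do
  echo gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" --role="$role"
  COUNT=$((COUNT + 1))
done
```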

Optional steps

To further customize your deployment, consider following these optional steps:

Create a Storage bucket for DAG scripts

A storage bucket is required to store processing DAG scripts and other temporary files generated during deployment. These scripts need to be manually moved into a Cloud Composer or Apache Airflow instance after deployment.

You can create the storage bucket from Google Cloud CLI or Google Cloud console with the following steps.

Console

  1. Go to Cloud Storage.

  2. Create a bucket in the same region as your BigQuery datasets.

  3. Select the created bucket.

  4. Go to the Permissions tab.

  5. Grant the permission Storage Object Creator to the user ID executing the Build command or to the Service account you created. For more information, see Set a new condition on a bucket: Console.

gcloud

  1. Create a bucket from the Cloud Shell with the following command:

    gcloud storage buckets create gs://DAG_BUCKET_NAME --location=REGION/MULTI_REGION
    

    Replace the following:

    • DAG_BUCKET_NAME with the name for the new bucket.
    • REGION/MULTI_REGION with the same region as your BigQuery datasets.
  2. Use the following command to assign the permission Storage Object Creator to the service account:

    gsutil iam ch serviceAccount:CLOUD_BUILD_SA:roles/storage.objectCreator gs://DAG_BUCKET_NAME
    

    Replace the following:

    • CLOUD_BUILD_SA with the Cloud Build default service account.
    • DAG_BUCKET_NAME with the name for the new bucket.
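Bucket creation fails if the name violates Cloud Storage naming rules. The sketch below checks only the simplest of those rules (lowercase letters, numbers, dashes, underscores, and dots; 3-63 characters) before you run the create command; the bucket names shown are hypothetical examples.

```shell
# Sketch: basic sanity check of a bucket name before creating it.
# Covers only the simplest rules: allowed characters and 3-63 length.
is_valid_bucket_name() {
  name="$1"
  case "$name" in
    *[!a-z0-9._-]*) return 1 ;;  # disallowed character found
  esac
  len=${#name}
  [ "$len" -ge 3 ] && [ "$len" -le 63 ]
}

is_valid_bucket_name "cortex-dag-bucket" && echo "valid" || echo "invalid"
is_valid_bucket_name "Bad_Bucket" && echo "valid" || echo "invalid"  # uppercase fails
```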

Create a Storage bucket for logs

You can create a specific bucket for the Cloud Build process to store the logs. This is useful if you want to restrict data that may be stored in logs to a specific region. You can create the storage bucket for logs from Google Cloud CLI or Google Cloud console.

Console

To create a specific bucket for the logs, follow these steps:

  1. Go to Cloud Storage.

  2. Create a bucket in the same region where the deployment runs.

  3. Select the created bucket.

  4. Go to the Permissions tab.

  5. Grant the permission Storage Object Admin to the user ID executing the Build command or to the Service account you created. For more information, see Set a new condition on a bucket: Console.

gcloud

To create a specific bucket for the logs, use the following commands.

  1. Create a bucket from the Cloud Shell with the following command:

    gcloud storage buckets create gs://LOGS_BUCKET_NAME --location=REGION/MULTI_REGION
    

    Replace the following:

    • REGION/MULTI_REGION with the chosen region to create the bucket.
    • LOGS_BUCKET_NAME with the name for the new bucket.
  2. Use the following command to assign the permission Storage Object Admin to the service account:

    gsutil iam ch serviceAccount:CLOUD_BUILD_SA:roles/storage.objectAdmin gs://LOGS_BUCKET_NAME
    

    Replace the following:

    • CLOUD_BUILD_SA with the Cloud Build default service account.
    • LOGS_BUCKET_NAME with the name for the new bucket.

Next steps

After you complete this step, move on to the following deployment steps:

  1. Establish workloads.
  2. Clone repository.
  3. Determine integration mechanism.
  4. Set up components (this page).
  5. Configure deployment.
  6. Execute deployment.