Training with a custom service account

You can configure AI Platform Training to use a service account of your choice when it runs your training application. Using a custom service account lets you control which Google Cloud resources your training code can access without granting overly broad permissions to the service account that AI Platform Training uses by default. You can also use a custom service account to give your code access to additional Google Cloud services, such as Secret Manager.

This guide focuses on the permissions that AI Platform Training resources need to access other Google Cloud resources. To learn about the permissions that you need to access AI Platform Training resources themselves, read Access control.

Understanding the service agent

By default, AI Platform Training uses a service agent to run training jobs. This service agent is identified by an email address with the following format:

service-PROJECT_NUMBER@cloud-ml.google.com.iam.gserviceaccount.com

PROJECT_NUMBER is replaced by the project number for your Google Cloud project.

Find the corresponding service account for your project in the Google Cloud console or by using the Google Cloud CLI:

Google Cloud console

Go to the IAM page in the Google Cloud console, select Include Google-provided role grants, and find the principal that matches the email address format described previously in this section. The service account also has the name Google Cloud ML Engine Service Agent.


gcloud

Run the following command in a shell environment where you have initialized the gcloud CLI:

gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.members)" \
  --filter="bindings.role:roles/ml.serviceAgent" \
  | grep serviceAccount:

Replace PROJECT_ID with the ID of your Google Cloud project.

This command outputs the following:

serviceAccount:GOOGLE_MANAGED_SERVICE_ACCOUNT

GOOGLE_MANAGED_SERVICE_ACCOUNT is the email address of your project's AI Platform service agent.
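If you export the IAM policy as JSON instead of a table, the same lookup can be done in Python. The following is a minimal sketch, not an official tool: it assumes the policy has the standard `bindings` list of `role` and `members` entries, and the helper name `find_service_agents` and the sample policy are hypothetical. Unlike the gcloud filter, it matches the role name exactly.

```python
from typing import Any, Dict, List

SERVICE_AGENT_ROLE = "roles/ml.serviceAgent"


def find_service_agents(policy: Dict[str, Any], role: str = SERVICE_AGENT_ROLE) -> List[str]:
    """Return the service account emails bound to `role` in an IAM policy dict."""
    emails = []
    for binding in policy.get("bindings", []):
        if binding.get("role") != role:
            continue
        for member in binding.get("members", []):
            # Members look like "serviceAccount:EMAIL", "user:EMAIL", and so on.
            kind, _, identity = member.partition(":")
            if kind == "serviceAccount":
                emails.append(identity)
    return emails


# Hypothetical policy for illustration only. A real one would come from
# `gcloud projects get-iam-policy PROJECT_ID --format=json`.
SAMPLE_POLICY = {
    "bindings": [
        {
            "role": "roles/ml.serviceAgent",
            "members": ["serviceAccount:agent@example.iam.gserviceaccount.com"],
        },
        {"role": "roles/viewer", "members": ["user:alice@example.com"]},
    ]
}

if __name__ == "__main__":
    print(find_service_agents(SAMPLE_POLICY))
```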

This service agent has permissions that are appropriate for most training jobs. For example, it can read from and write to Cloud Storage buckets in the same Google Cloud project.

If you need your training applications to run with additional permissions, you can assign additional Identity and Access Management (IAM) roles to this service account. For example, you can give it access to Cloud Storage buckets in other Google Cloud projects.

Using a custom service account

If you want to grant or limit Google Cloud permissions for a specific training job, use a custom service account in place of the service agent.

To do this, first set up a custom service account. Then specify the custom service account when you create a training job.

Set up a custom service account

To set up a custom service account, do the following:

  1. Create a user-managed service account.

  2. Grant your new service account IAM roles to provide your training application with any permissions that it needs when it runs.

  3. If the user-managed service account is in a different project than your training jobs, configure the user-managed service account so you can attach it to your training jobs.
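Steps 1 and 2 above can be sketched as a small Python helper that assembles the corresponding gcloud commands without running them. This is only an illustration under stated assumptions: the account name `training-sa`, the project ID `my-project`, and the role `roles/bigquery.dataViewer` are placeholders, and the function names are hypothetical. Step 3 (cross-project use) is not covered here.

```python
import shlex
from typing import List


def service_account_email(name: str, project_id: str) -> str:
    """Email address of a user-managed service account created in `project_id`."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"


def setup_commands(name: str, project_id: str, roles: List[str]) -> List[List[str]]:
    """Build (but do not run) gcloud commands that create the account and grant roles."""
    email = service_account_email(name, project_id)
    commands = [
        # Step 1: create the user-managed service account.
        ["gcloud", "iam", "service-accounts", "create", name, f"--project={project_id}"],
    ]
    for role in roles:
        # Step 2: grant each role that the training code needs at the project level.
        commands.append([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member=serviceAccount:{email}",
            f"--role={role}",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder values; review the printed commands before running them yourself.
    for cmd in setup_commands("training-sa", "my-project", ["roles/bigquery.dataViewer"]):
        print(shlex.join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects, so you can inspect what would run before you grant any permissions.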

Specify the custom service account for your training job

To configure AI Platform Training to use your custom service account when it runs your training application, specify the trainingInput.serviceAccount field when you create a training job.

If you use the gcloud CLI to create a training job, you must use a config.yaml file to specify this field. For example:

trainingInput:
  serviceAccount: CUSTOM_SERVICE_ACCOUNT

Replace CUSTOM_SERVICE_ACCOUNT with the email address of the user-managed service account that you set up in a previous section of this guide.
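If you create jobs programmatically rather than with the gcloud CLI, the same field goes into the job resource that you send when you create the job. The following sketch only builds that resource as a Python dictionary; the helper name `build_training_job` is hypothetical, and the job ID, `scaleTier`, and `region` values are illustrative assumptions, not recommendations.

```python
import json
from typing import Any, Dict


def build_training_job(job_id: str, training_input: Dict[str, Any],
                       service_account: str) -> Dict[str, Any]:
    """Return a job resource whose trainingInput.serviceAccount is set."""
    if "@" not in service_account:
        raise ValueError("service_account must be a service account email address")
    # Copy so that the caller's dictionary is not modified.
    spec = dict(training_input)
    spec["serviceAccount"] = service_account
    return {"jobId": job_id, "trainingInput": spec}


if __name__ == "__main__":
    # Illustrative values only; replace them with your own job settings.
    job = build_training_job(
        "my_training_job",
        {"scaleTier": "BASIC", "region": "us-central1"},
        "training-sa@my-project.iam.gserviceaccount.com",
    )
    print(json.dumps(job, indent=2))
```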

Accessing Google Cloud services from training code

To access other Google Cloud services from your training code, use Application Default Credentials (ADC). Many Google Cloud client libraries authenticate with ADC by default. You don't need to configure any environment variables; AI Platform Training automatically configures ADC to authenticate as the custom service account that you specified in the previous section.

However, when you use a Google Cloud client library in your training code, it might not connect to the correct Google Cloud project by default. If your training logs report permission errors, this might be the problem. When you create a training job, AI Platform Training does not run your training code directly in your Google Cloud project; instead AI Platform Training runs your code in a separate project managed by Google. AI Platform Training uses this project exclusively for operations related to your project. Therefore, don't try to infer a project ID from the environment in your training code; specify project IDs explicitly.

If you don't want to hardcode a project ID in your training code, you can reference the CLOUD_ML_PROJECT_ID environment variable. AI Platform Training sets this environment variable in every training container to contain the project number of the project where you initiated custom training. Many Google Cloud tools can accept a project number wherever they take a project ID.

For example, consider running a training job in a Google Cloud project with ID PROJECT_ID. If you want to use the Python Client for Google BigQuery to access a BigQuery table in the same project, then do not try to infer the project in your training code:

Implicit project selection

from google.cloud import bigquery

client = bigquery.Client()

Instead use code that explicitly selects a project:

Explicit project selection

import os

from google.cloud import bigquery

project_number = os.environ["CLOUD_ML_PROJECT_ID"]

client = bigquery.Client(project=project_number)
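Reading CLOUD_ML_PROJECT_ID directly raises a bare KeyError when the same code runs outside AI Platform Training, for example on a workstation. One possible way to handle that is a small helper that prefers an explicitly passed project and otherwise falls back to the environment variable. This is a sketch, not part of any Google Cloud library; the name `resolve_project` and its `env` parameter are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_project(explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Return an explicit project, or the value of CLOUD_ML_PROJECT_ID.

    Never infers a project from credentials or instance metadata.
    """
    if explicit:
        return explicit
    project = env.get("CLOUD_ML_PROJECT_ID")
    if not project:
        raise RuntimeError(
            "No project given and CLOUD_ML_PROJECT_ID is not set; "
            "pass a project ID or number explicitly."
        )
    return project


if __name__ == "__main__":
    # An explicit argument wins; the environment variable is only a fallback.
    print(resolve_project("my-project"))
    print(resolve_project(env={"CLOUD_ML_PROJECT_ID": "123456789012"}))
```

In training code, the result could then be passed to a client constructor, for example `bigquery.Client(project=resolve_project())`, assuming the google-cloud-bigquery package is installed.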

What's next