This legacy version of AI Platform Training is deprecated and will no longer be available on Google Cloud after January 31, 2025. Migrate your resources to Vertex AI custom training to get new machine learning features that are unavailable in AI Platform.

Training using the built-in BERT algorithm

Training with built-in algorithms on AI Platform Training allows you to submit your dataset and train a model without writing any training code. This page explains how the built-in BERT algorithm works, and how to use it.

Overview

This built-in algorithm can do both training and model exporting:

Training: Using the dataset and the model parameters you supplied, AI Platform Training runs training using TensorFlow's BERT implementation.
Exporting: Using the initial checkpoint you supplied, AI Platform Training produces a serialized model in the job directory you specified. You can then deploy this model to AI Platform.

Limitations

The following features are not supported for training with the built-in BERT algorithm:

Automated data preprocessing: This version of BERT requires both training and evaluation data to be in the form of TFRecords. To handle unformatted input automatically, you must build your own training application.

Supported machine types

The following AI Platform Training scale tiers and machine types are supported:

BASIC scale tier
BASIC_TPU scale tier
CUSTOM scale tier with any of the Compute Engine machine types supported by AI Platform Training.
CUSTOM scale tier with any of the following legacy machine types:
- standard
- large_model
- complex_model_s
- complex_model_m
- complex_model_l
- standard_gpu
- standard_p100
- standard_v100
- large_model_v100
- complex_model_m_gpu
- complex_model_l_gpu
- complex_model_m_p100
- complex_model_m_v100
- complex_model_l_v100
- TPU_V2 (8 cores)

We recommend using a machine type with access to TPUs.

Format input data

Ensure that input and evaluation data are in the form of TFRecords before training the model.

Check Cloud Storage bucket permissions

To store your data, use a Cloud Storage bucket in the same Google Cloud project you're using to run AI Platform Training jobs. Otherwise, grant AI Platform Training access to the Cloud Storage bucket where your data is stored.
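If the bucket lives in a different project, the grant could look roughly like the sketch below. It rests on assumptions that this page does not state: that the AI Platform Training service account follows the pattern service-PROJECT_NUMBER@cloud-ml.google.com.iam.gserviceaccount.com, and that roles/storage.objectAdmin is an appropriate role for your data. The helper names, the project number, and the bucket name are hypothetical. The script only prints the gsutil command so that you can review it before running it.

```shell
# Hypothetical helper: build the service account email that AI Platform
# Training is assumed to use for a given project number.
training_service_account() {
  printf 'service-%s@cloud-ml.google.com.iam.gserviceaccount.com' "$1"
}

# Hypothetical helper: print (do not run) a gsutil command that grants the
# service account object-level access to the bucket.
# $1 = project number, $2 = bucket name without the gs:// prefix.
print_grant_command() {
  printf 'gsutil iam ch serviceAccount:%s:roles/storage.objectAdmin gs://%s\n' \
    "$(training_service_account "$1")" "$2"
}

# Example values are placeholders, not real resources.
print_grant_command 123456789012 example-data-bucket
```

You can look up a project number with gcloud projects describe PROJECT_ID --format='value(projectNumber)'. Choose a narrower role if the job only needs to read the data.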

Submit a BERT training job

This section explains how to submit a training job using the built-in BERT algorithm.

You can find brief explanations of each hyperparameter within the Google Cloud console, and a more comprehensive explanation in the reference for the built-in BERT algorithm.

Console

Go to the AI Platform Training Jobs page in the Google Cloud console.
Click the New training job button. From the options that display below, click Built-in algorithm training.
On the Create a new training job page, select BERT and click Next.
To learn more about the available parameters, follow the links in the Google Cloud console or see the built-in BERT reference.

gcloud

Set environment variables for your job, replacing the placeholder values BUCKET_NAME and MODEL_NAME with your own values:

   # Specify the name of the Cloud Storage bucket where you want your
   # training outputs to be stored, and the Docker container for
   # your built-in algorithm selection.
   BUCKET_NAME='BUCKET_NAME'
   IMAGE_URI='gcr.io/cloud-ml-algos/bert:latest'

   DATE="$(date '+%Y%m%d_%H%M%S')"
   MODEL_NAME='MODEL_NAME'
   JOB_ID="${MODEL_NAME}_${DATE}"

   JOB_DIR="gs://${BUCKET_NAME}/algorithm_training/${MODEL_NAME}/${DATE}"
   BERT_BASE_DIR='gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16'
   # Double quotes are required so that the variable expands.
   MODEL_DIR="gs://${BUCKET_NAME}/bert-output"
   GLUE_DIR='gs://cloud-tpu-checkpoints/bert/classification'
   TASK='mnli'
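
Because the job ID is assembled from a model name and a timestamp, it can be worth checking it before you submit. The sketch below assumes the naming rule that AI Platform Training is understood to apply to job IDs: they start with a letter and contain only letters, digits, and underscores. The helper name is_valid_job_id and the sample values are hypothetical.

```shell
# Hypothetical helper: return 0 if the job ID starts with a letter and
# contains only letters, digits, and underscores; return 1 otherwise.
is_valid_job_id() {
  case "$1" in
    ''|[!A-Za-z]*) return 1 ;;      # empty, or first character is not a letter
    *[!A-Za-z0-9_]*) return 1 ;;    # contains a disallowed character
    *) return 0 ;;
  esac
}

# Sample values; a fixed timestamp keeps the example reproducible.
MODEL_NAME='bert_mnli'
DATE='20240101_120000'
JOB_ID="${MODEL_NAME}_${DATE}"

if is_valid_job_id "${JOB_ID}"; then
  echo "ok: ${JOB_ID}"
else
  echo "invalid job ID: ${JOB_ID}"
fi
```

A model name containing a hyphen, such as bert-mnli, would fail this check, so prefer underscores when you choose MODEL_NAME.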

Submit the training job using gcloud ai-platform jobs submit training. Adjust this generic example to work with your dataset:

   gcloud ai-platform jobs submit training $JOB_ID \
      --master-image-uri=$IMAGE_URI --scale-tier=BASIC_TPU --job-dir=$JOB_DIR \
      -- \
      --mode='train_and_eval' \
      --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
      --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
      --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
      --bert_config_file=${BERT_BASE_DIR}/bert_config.json \
      --init_checkpoint=${BERT_BASE_DIR}/bert_model.ckpt \
      --train_batch_size=32 \
      --eval_batch_size=32 \
      --learning_rate=2e-5 \
      --num_train_epochs=1 \
      --steps_per_loop=1000
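
In the command above, the bare -- separates two groups of arguments: flags before it are interpreted by gcloud, and flags after it are passed through to the BERT training container. The sketch below only illustrates that split and does not call gcloud. The function name split_at_double_dash and the sample bucket path are hypothetical.

```shell
# Hypothetical helper: label each argument by who consumes it, switching
# from "gcloud" to "trainer" at the first bare "--".
split_at_double_dash() {
  section='gcloud'
  for arg in "$@"; do
    if [ "${arg}" = '--' ] && [ "${section}" = 'gcloud' ]; then
      section='trainer'
      continue
    fi
    printf '%s: %s\n' "${section}" "${arg}"
  done
}

split_at_double_dash \
  --scale-tier=BASIC_TPU --job-dir=gs://example-bucket/out \
  -- \
  --mode=train_and_eval --learning_rate=2e-5
```

When you adapt the job, put hyperparameters such as --train_batch_size after the separator. A training flag placed before it would be parsed by gcloud instead of reaching the trainer.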

Monitor your training job with gcloud. Use gcloud ai-platform jobs describe to check the job's current status and gcloud ai-platform jobs stream-logs to view its logs as they are produced.
```
   gcloud ai-platform jobs describe ${JOB_ID}
   gcloud ai-platform jobs stream-logs ${JOB_ID}
```
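
If you want a script to wait for the job to finish, the status check above can be wrapped in a polling loop. This is a minimal sketch, assuming the job's state field takes values such as QUEUED, RUNNING, SUCCEEDED, FAILED, and CANCELLED. The helper names is_terminal_state and wait_for_job and the default polling interval are hypothetical.

```shell
# Return 0 when the given job state is terminal (the job will not change
# state again), 1 otherwise.
is_terminal_state() {
  case "$1" in
    SUCCEEDED|FAILED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll a job until it reaches a terminal state.
# $1 = job ID, $2 = seconds between polls (defaults to 60).
# Returns 0 only if the job ends in SUCCEEDED.
wait_for_job() {
  while true; do
    # Stop with an error if gcloud itself fails (for example, unknown job).
    state="$(gcloud ai-platform jobs describe "$1" --format='value(state)')" || return 1
    echo "Job $1 is ${state}"
    if is_terminal_state "${state}"; then
      break
    fi
    sleep "${2:-60}"
  done
  [ "${state}" = 'SUCCEEDED' ]
}
```

The block only defines the functions. To use them, call wait_for_job "${JOB_ID}" 120 after submitting the job, and check its exit status before you export or deploy the model.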

Further learning resources

Learn more about Cloud TPU.
Learn more about TensorFlow Model Garden.

Getting started with the BERT algorithm

BERT algorithm reference