Export model artifacts for prediction

Google Distributed Cloud (GDC) air-gapped offers prebuilt containers to serve online predictions from models trained using the following machine learning (ML) frameworks:

  • TensorFlow
  • PyTorch

To use one of these prebuilt containers, you must save your model as one or more model artifacts that comply with the requirements of the prebuilt container. These requirements apply whether or not your model artifacts are created on Distributed Cloud.

Before you begin

Before exporting model artifacts, perform the following steps:

  1. Create and train a prediction model targeting one of the supported containers.
  2. If you don't have a project, work with your Platform Administrator (PA) to create one.
  3. Work with your Infrastructure Operator (IO) to create the Prediction user cluster. The IO creates the cluster for you, associates it with your project, and assigns the appropriate node pools within the cluster based on the resources you need for online predictions.
  4. Create a storage bucket for your project.
  5. Create the Vertex AI Default Serving (vai-default-serving-sa) service identity within your project. For more information about how to create service identities, see Manage service identities.
  6. Grant the Project Bucket Object Viewer (project-bucket-object-viewer) role to the Vertex AI Default Serving (vai-default-serving-sa) service identity for the storage bucket you created. For more information about how to grant bucket access to service identities, see Grant bucket access.

Framework-specific requirements for exporting to prebuilt containers

Depending on which ML framework you plan to use for prediction, you must export model artifacts in different formats. The following sections describe the acceptable model formats for each ML framework.

TensorFlow

If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory.

There are several ways to export a SavedModel from TensorFlow training code. The following list describes a few that work for various TensorFlow APIs:

  • Keras: tf.keras.Model.save (TensorFlow SavedModel format)
  • Estimator: tf.estimator.Estimator.export_saved_model
  • Lower-level TensorFlow APIs: tf.saved_model.save

If you are not using Keras or an Estimator, make sure to use the serve tag and the serving_default signature when you export your SavedModel so that Vertex AI can use your model artifacts to serve predictions. Keras and Estimator handle this task automatically. Learn more about specifying signatures during export.
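The following is a minimal sketch of exporting a Keras model as a SavedModel directory; the model definition and output path are placeholders for your own trained model and destination:

```python
import tensorflow as tf

# Placeholder model; substitute the model you trained.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
# ... train the model here ...

# Writes saved_model.pb and the variables/ directory. For a built Keras
# model, TensorFlow attaches the serve tag and a serving_default signature.
tf.saved_model.save(model, "exported_model/")
```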

To serve predictions from these artifacts, create a Model that uses the prebuilt prediction container matching the version of TensorFlow that you used for training.

PyTorch

If you use PyTorch to train a model, you must package the model artifacts, including either a default or a custom handler, into an archive file by using the Torch model archiver. The prebuilt PyTorch images expect the archive to be named model.mar, so make sure you set the --model-name flag to model.
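The following is a minimal sketch of packaging a TorchScript model as model.mar with the Torch model archiver; the model, handler choice, and paths are placeholders, and the torch-model-archiver tool must be installed separately:

```python
import os
import subprocess
import torch

# Placeholder model; substitute the nn.Module you trained.
class TinyNet(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

# Serialize the model as TorchScript so a default handler can load it.
torch.jit.script(TinyNet()).save("model.pt")
os.makedirs("model_output", exist_ok=True)

# Package the artifacts; --model-name model produces model.mar,
# the file name the prebuilt PyTorch containers expect.
subprocess.run(
    [
        "torch-model-archiver",
        "--model-name", "model",
        "--version", "1.0",
        "--serialized-file", "model.pt",
        "--handler", "image_classifier",  # or the path to your custom handler
        "--export-path", "model_output",
    ],
    check=True,
)
```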

For information about optimizing the memory usage, latency, or throughput of a PyTorch model served with TorchServe, see the PyTorch performance guide.

Upload your model

You must upload your model to the storage bucket you created. For more information about how to upload objects to storage buckets, see Upload and download storage objects in projects.

The path to your model within the storage bucket must have the following structure:

s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID
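For example, the following is a minimal sketch that uploads a packaged model with boto3, assuming the bucket exposes an S3-compatible interface as the s3:// path suggests; the endpoint, credentials, bucket name, model ID, and version ID are placeholders for your own values:

```python
import boto3

# Placeholder endpoint and credentials for the project storage bucket.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstorage.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The object key must start with MODEL_ID/MODEL_VERSION_ID so the artifact
# lands at s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID.
s3.upload_file(
    "model_output/model.mar",
    "BUCKET_NAME",
    "MODEL_ID/MODEL_VERSION_ID/model.mar",
)
```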

For export details, see the framework-specific requirements for exporting to prebuilt containers.