Google Distributed Cloud (GDC) air-gapped offers prebuilt containers to serve online predictions from models trained using the following machine learning (ML) frameworks:
- TensorFlow
- PyTorch
To use one of these prebuilt containers, you must save your model as one or more model artifacts that comply with the requirements of the prebuilt container. These requirements apply whether or not your model artifacts are created on Distributed Cloud.
Before you begin
Before exporting model artifacts, perform the following steps:
- Create and train a prediction model targeting one of the supported containers.
- If you don't have a project, work with your Platform Administrator (PA) to create one.
- Work with your Infrastructure Operator (IO) to create the Prediction user cluster. The IO creates the cluster for you, associates it with your project, and assigns the appropriate node pools within the cluster, considering the resources you need for online predictions.
- Create a storage bucket for your project.
- Create the Vertex AI Default Serving (vai-default-serving-sa) service identity within your project. For more information about how to create service identities, see Manage service identities.
- Grant the Project Bucket Object Viewer (project-bucket-object-viewer) role to the Vertex AI Default Serving (vai-default-serving-sa) service identity for the storage bucket you created. For more information about how to grant bucket access to service identities, see Grant bucket access.
- To get the permissions that you need to access the Online Prediction service, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role.
Framework-specific requirements for exporting to prebuilt containers
Depending on which ML framework you plan to use for prediction, you must export model artifacts in different formats. The following sections describe the acceptable model formats for each ML framework.
TensorFlow
If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory.
There are several ways to export SavedModels from TensorFlow training code. The following list describes a few ways that work for various TensorFlow APIs:
- If you use Keras for training, use tf.keras.Model.save to export a SavedModel, as shown in the sketch after this list.
- If you use an Estimator for training, use tf.estimator.Estimator.export_saved_model to export a SavedModel.
- Otherwise, use tf.saved_model.save or use tf.compat.v1.saved_model.SavedModelBuilder.
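The following is a minimal sketch of the Keras path. The model architecture, training step, and export path are placeholders, not part of this guide:

```python
import tensorflow as tf

# Placeholder model; substitute your own architecture and training data.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... train the model ...

# Write a SavedModel directory that the prebuilt TensorFlow container can load.
# On TensorFlow releases that bundle Keras 3, use model.export("export/saved_model") instead.
model.save("export/saved_model")
```

The resulting directory contains a saved_model.pb file and a variables/ subdirectory, which together make up the SavedModel.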
If you are not using Keras or an Estimator, make sure to use the serve tag and serving_default signature when you export your SavedModel so that Vertex AI can use your model artifacts to serve predictions. Keras and Estimator handle this task automatically. Learn more about specifying signatures during export.
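If you export with the lower-level API, you can set the signature explicitly. The following sketch assumes a simple tf.Module with placeholder shapes; tf.saved_model.save applies the serve tag for you:

```python
import tensorflow as tf

class LinearModel(tf.Module):
    """Placeholder module; replace with your trained model logic."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([784, 10]))

    @tf.function(input_signature=[tf.TensorSpec([None, 784], tf.float32)])
    def predict(self, x):
        return {"scores": tf.matmul(x, self.w)}

module = LinearModel()
# Export with an explicit serving_default signature.
tf.saved_model.save(
    module,
    "export/saved_model",
    signatures={"serving_default": module.predict},
)
```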
To serve predictions using these artifacts, create a Model with the prebuilt container for prediction that matches the version of TensorFlow that you used for training.
PyTorch
If you use PyTorch to train a model, you must package the model artifacts, including either a default or custom handler, into an archive file by using the Torch model archiver. The prebuilt PyTorch images expect the archive to be named model.mar, so make sure you set --model-name to model when you create the archive.
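As an example, the following sketch invokes the archiver from Python. It assumes torch-model-archiver is installed; the weight file, handler choice, and output directory are placeholders:

```python
import os
import subprocess

# Ensure the output directory exists; the archiver writes model.mar into it.
os.makedirs("export", exist_ok=True)

# Setting --model-name to model produces model.mar, which the prebuilt
# PyTorch images expect. This sketch assumes the weights were exported with
# TorchScript, so no separate model definition file is needed.
subprocess.run(
    [
        "torch-model-archiver",
        "--model-name", "model",
        "--version", "1.0",
        "--serialized-file", "model.pt",   # placeholder TorchScript weights
        "--handler", "image_classifier",   # a default handler, or a path to a custom handler .py
        "--export-path", "export",
    ],
    check=True,
)
```

The command writes export/model.mar, which you then upload to your storage bucket as described in the following section.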
For information about optimizing the memory usage, latency, or throughput of a PyTorch model served with TorchServe, see the PyTorch performance guide.
Upload your model
You must upload your model to the storage bucket you created. For more information about how to upload objects to storage buckets, see Upload and download storage objects in projects.
The path to your model in the storage bucket must have the following structure:
s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID
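For illustration, the following sketch mirrors an exported artifact directory into that path with boto3. It assumes your project bucket is reachable through an S3-compatible endpoint and that your credentials are already configured; the endpoint, bucket name, model ID, version ID, and local path are placeholders:

```python
import os
import boto3

# Placeholder values; substitute your bucket endpoint, names, and local path.
s3 = boto3.client("s3", endpoint_url="https://objectstorage.example.com")
local_dir = "export"             # directory that holds your exported artifacts
bucket = "my-prediction-bucket"  # BUCKET_NAME
prefix = "my-model/1"            # MODEL_ID/MODEL_VERSION_ID

# Walk the local directory and upload each file under the required prefix.
for root, _, files in os.walk(local_dir):
    for name in files:
        local_path = os.path.join(root, name)
        key = f"{prefix}/{os.path.relpath(local_path, local_dir)}"
        s3.upload_file(local_path, bucket, key)
```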
For export details, see the framework-specific requirements for exporting to prebuilt containers.