This page explains Vertex AI's TensorFlow integration and provides resources that show you how to use TensorFlow on Vertex AI. Vertex AI's TensorFlow integration makes it easier for you to train, deploy, and orchestrate TensorFlow models in production.
Run code in notebooks
Vertex AI provides two options for running your code in notebooks: Colab Enterprise and Vertex AI Workbench. To learn more about these options, see Choose a notebook solution.
Prebuilt containers for training
Vertex AI provides prebuilt Docker container images for model training. These containers are organized by machine learning frameworks and framework versions and include common dependencies that you might want to use in your training code.
To learn about which TensorFlow versions have prebuilt training containers and how to train models with a prebuilt training container, see Prebuilt containers for custom training.
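As a minimal sketch, the following uses the Vertex AI SDK for Python to launch a custom training job in a prebuilt TensorFlow training container. The project, bucket, script name, and container image tag are placeholders; substitute a supported TensorFlow version from the prebuilt containers list.

```python
from google.cloud import aiplatform

# Placeholder project, region, and staging bucket; replace with your own values.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Run task.py inside a prebuilt TensorFlow training container.
# The image tag is illustrative; pick one that matches your TensorFlow version.
job = aiplatform.CustomTrainingJob(
    display_name="tf-training-job",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12.py310:latest",
)

job.run(
    replica_count=1,
    machine_type="n1-standard-4",
)
```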
Distributed training
You can run distributed training of TensorFlow models on Vertex AI. For multi-worker training, you can use Reduction Server to further optimize the performance of all-reduce collective operations. To learn more about distributed training on Vertex AI, see Distributed training.
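As a hedged sketch of what the training code side can look like: TensorFlow's MultiWorkerMirroredStrategy reads the TF_CONFIG environment variable, which Vertex AI sets on each replica of a multi-worker job, so the cluster needs no manual configuration here. The model and data below are toys so the example stands alone.

```python
import numpy as np
import tensorflow as tf

# MultiWorkerMirroredStrategy picks up the cluster layout from TF_CONFIG,
# which Vertex AI sets automatically for multi-worker training jobs.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across workers;
    # this toy model stands in for your own architecture.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic data so the example is self-contained.
x = np.random.rand(1024, 10).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model.fit(dataset, epochs=2)
```

When you launch such a job with the Vertex AI SDK, Reduction Server is enabled through parameters on the training job's run() method (for example, reduction_server_replica_count); treat the exact parameter names as an assumption to verify against the SDK reference.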
Prebuilt containers for predictions
Similar to prebuilt containers for training, Vertex AI provides prebuilt container images for serving predictions and explanations from TensorFlow models that you created within or outside of Vertex AI. These images provide HTTP prediction servers that you can use to serve predictions with minimal configuration.
To learn about which TensorFlow versions have prebuilt prediction containers and how to serve predictions with a prebuilt prediction container, see Prebuilt containers for prediction and explanation.
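As a minimal sketch, assuming a TensorFlow SavedModel already exported to Cloud Storage, the following uploads it to the Vertex AI Model Registry with a prebuilt prediction container and deploys it for online predictions. The project, bucket, image tag, and input instance are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Pair the SavedModel with a prebuilt TensorFlow prediction container.
# The image tag is illustrative; choose one that matches your TensorFlow version.
model = aiplatform.Model.upload(
    display_name="tf-model",
    artifact_uri="gs://my-bucket/saved_model_dir",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Deploy to an endpoint and request an online prediction.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0]])
print(prediction.predictions)
```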
Optimized TensorFlow runtime
The optimized TensorFlow runtime uses model optimizations and new proprietary Google technologies to improve the speed and lower the cost of predictions compared to Vertex AI's standard prebuilt prediction containers for TensorFlow.
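As a sketch, opting into the optimized runtime is mainly a matter of choosing its container image when you upload the model; the flow otherwise matches the standard prebuilt prediction containers. The image path below is an assumption based on the optimized runtime's naming and should be checked against the current documentation.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Same upload flow as the standard prebuilt containers, but pointing at an
# optimized TensorFlow runtime image (URI shown is an assumption; verify
# the supported image names and TensorFlow versions in the documentation).
model = aiplatform.Model.upload(
    display_name="tf-optimized-model",
    artifact_uri="gs://my-bucket/saved_model_dir",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-cpu.2-12:latest"
    ),
)
```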
TensorFlow Profiler integration
Train models at lower cost and in less time by monitoring and optimizing the performance of your training jobs with Vertex AI's TensorFlow Profiler integration. TensorFlow Profiler helps you understand the resource consumption of training operations so you can identify and eliminate performance bottlenecks.
To learn more about Vertex AI TensorFlow Profiler, see Profile model training performance using Profiler.
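As a minimal sketch of the integration inside a training script, assuming the custom job runs with a Vertex AI TensorBoard instance attached: initialize the cloud_profiler plugin, then write TensorBoard logs to the directory Vertex AI injects. The toy model and data are placeholders so the example stands alone.

```python
import os

import numpy as np
import tensorflow as tf
from google.cloud.aiplatform.training_utils import cloud_profiler

# Initialize the Vertex AI TensorFlow Profiler plugin at the start of
# the training script.
cloud_profiler.init()

# Toy model and data so the example is self-contained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="adam", loss="mse")
x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# Vertex AI injects AIP_TENSORBOARD_LOG_DIR when the job is run with a
# Vertex AI TensorBoard instance; the TensorBoard callback writes the
# profiling data there.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ["AIP_TENSORBOARD_LOG_DIR"],
)

model.fit(x, y, epochs=2, callbacks=[tensorboard_callback])
```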
Resources for using TensorFlow on Vertex AI
To learn more and start using TensorFlow on Vertex AI, see the following resources.
Prototype to Production: A video series that provides an end-to-end example of developing and deploying a custom TensorFlow model on Vertex AI.
Optimize training performance with Reduction Server on Vertex AI: A blog post on optimizing distributed training on Vertex AI by using Reduction Server.
How to optimize training performance with the TensorFlow Profiler on Vertex AI: A blog post that shows you how to identify performance bottlenecks in your training job by using Vertex AI TensorFlow Profiler.
Custom model batch prediction with feature filtering: A notebook tutorial that shows you how to use the Vertex AI SDK for Python to train a custom tabular classification model and perform batch prediction with feature filtering.
Vertex AI Pipelines: Custom training with prebuilt Google Cloud Pipeline Components: A notebook tutorial that shows you how to use Vertex AI Pipelines with prebuilt Google Cloud Pipeline Components for custom training.
Co-host TensorFlow models on the same VM for predictions: A codelab that shows you how to use the co-hosting model feature in Vertex AI to host multiple models on the same VM for online predictions.