This page explains Vertex AI's TensorFlow integration and provides resources that show you how to use TensorFlow on Vertex AI. Vertex AI's TensorFlow integration makes it easier for you to train, deploy, and orchestrate TensorFlow models in production.
Run code in notebooks
Vertex AI provides two options for running your code in notebooks: Colab Enterprise and Vertex AI Workbench. To learn more about these options, see Choose a notebook solution.
Prebuilt containers for training
Vertex AI provides prebuilt Docker container images for model training. These containers are organized by machine learning frameworks and framework versions and include common dependencies that you might want to use in your training code.
To learn about which TensorFlow versions have prebuilt training containers and how to train models with a prebuilt training container, see Prebuilt containers for custom training.
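As a minimal sketch, the following uses the Vertex AI SDK for Python to launch a custom training job in a prebuilt TensorFlow training container. The project, bucket, script name, and container image tag are placeholders; substitute a supported TensorFlow version from the prebuilt containers list.

```python
from google.cloud import aiplatform

# Placeholder project, region, and staging bucket; replace with your own values.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Run task.py inside a prebuilt TensorFlow training container.
# The image tag is illustrative; pick one that matches your TensorFlow version.
job = aiplatform.CustomTrainingJob(
    display_name="tf-training-job",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12.py310:latest",
)

job.run(
    replica_count=1,
    machine_type="n1-standard-4",
)
```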
Distributed training
You can run distributed training of TensorFlow models on Vertex AI. For multi-worker training, you can use Reduction Server to further optimize the performance of all-reduce collective operations. To learn more about distributed training on Vertex AI, see Distributed training.
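As a hedged sketch of what the training code side can look like: TensorFlow's MultiWorkerMirroredStrategy reads the TF_CONFIG environment variable, which Vertex AI sets on each replica of a multi-worker job, so the cluster needs no manual configuration here. The model and data below are toys so the example stands alone.

```python
import numpy as np
import tensorflow as tf

# MultiWorkerMirroredStrategy picks up the cluster layout from TF_CONFIG,
# which Vertex AI sets automatically for multi-worker training jobs.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across workers;
    # this toy model stands in for your own architecture.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic data so the example is self-contained.
x = np.random.rand(1024, 10).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

model.fit(dataset, epochs=2)
```

When you launch such a job with the Vertex AI SDK, Reduction Server is enabled through parameters on the training job's run() method (for example, reduction_server_replica_count); treat the exact parameter names as an assumption to verify against the SDK reference.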
Prebuilt containers for predictions
Similar to prebuilt containers for training, Vertex AI provides prebuilt container images for serving predictions and explanations from TensorFlow models that you created within or outside of Vertex AI. These images provide HTTP prediction servers that you can use to serve predictions with minimal configuration.
To learn about which TensorFlow versions have prebuilt prediction containers and how to serve predictions with a prebuilt prediction container, see Prebuilt containers for prediction and explanation.
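As a minimal sketch, assuming a TensorFlow SavedModel already exported to Cloud Storage, the following uploads it to the Vertex AI Model Registry with a prebuilt prediction container and deploys it for online predictions. The project, bucket, image tag, and input instance are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Pair the SavedModel with a prebuilt TensorFlow prediction container.
# The image tag is illustrative; choose one that matches your TensorFlow version.
model = aiplatform.Model.upload(
    display_name="tf-model",
    artifact_uri="gs://my-bucket/saved_model_dir",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Deploy to an endpoint and request an online prediction.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0]])
print(prediction.predictions)
```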
Optimized TensorFlow runtime
The optimized TensorFlow runtime uses model optimizations and new proprietary Google technologies to improve the speed and lower the cost of predictions compared to Vertex AI's standard prebuilt prediction containers for TensorFlow.
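As a sketch, opting into the optimized runtime is mainly a matter of choosing its container image when you upload the model; the flow otherwise matches the standard prebuilt prediction containers. The image path below is an assumption based on the optimized runtime's naming and should be checked against the current documentation.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Same upload flow as the standard prebuilt containers, but pointing at an
# optimized TensorFlow runtime image (URI shown is an assumption; verify
# the supported image names and TensorFlow versions in the documentation).
model = aiplatform.Model.upload(
    display_name="tf-optimized-model",
    artifact_uri="gs://my-bucket/saved_model_dir",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-cpu.2-12:latest"
    ),
)
```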
TensorFlow Profiler integration
Train models at lower cost and in less time by monitoring and optimizing the performance of your training jobs with Vertex AI's TensorFlow Profiler integration. TensorFlow Profiler helps you understand the resource consumption of training operations so you can identify and eliminate performance bottlenecks.
To learn more about Vertex AI TensorFlow Profiler, see Profile model training performance using Profiler.
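As a minimal sketch of the integration inside a training script, assuming the custom job runs with a Vertex AI TensorBoard instance attached: initialize the cloud_profiler plugin, then write TensorBoard logs to the directory Vertex AI injects. The toy model and data are placeholders so the example stands alone.

```python
import os

import numpy as np
import tensorflow as tf
from google.cloud.aiplatform.training_utils import cloud_profiler

# Initialize the Vertex AI TensorFlow Profiler plugin at the start of
# the training script.
cloud_profiler.init()

# Toy model and data so the example is self-contained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer="adam", loss="mse")
x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# Vertex AI injects AIP_TENSORBOARD_LOG_DIR when the job is run with a
# Vertex AI TensorBoard instance; the TensorBoard callback writes the
# profiling data there.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ["AIP_TENSORBOARD_LOG_DIR"],
)

model.fit(x, y, epochs=2, callbacks=[tensorboard_callback])
```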
Resources for using TensorFlow on Vertex AI
To learn more and start using TensorFlow on Vertex AI, see the following resources.
Prototype to Production: A video series that provides an end-to-end example of developing and deploying a custom TensorFlow model on Vertex AI.
Optimize training performance with Reduction Server on Vertex AI: A blog post on optimizing distributed training on Vertex AI by using Reduction Server.
How to optimize training performance with the TensorFlow Profiler on Vertex AI: A blog post that shows you how to identify performance bottlenecks in your training job by using Vertex AI TensorFlow Profiler.
Custom model batch prediction with feature filtering: A notebook tutorial that shows you how to use the Vertex AI SDK for Python to train a custom tabular classification model and perform batch prediction with feature filtering.
Vertex AI Pipelines: Custom training with prebuilt Google Cloud Pipeline Components: A notebook tutorial that shows you how to use Vertex AI Pipelines with prebuilt Google Cloud Pipeline Components for custom training.
Co-host TensorFlow models on the same VM for predictions: A codelab that shows you how to use the co-hosting model feature in Vertex AI to host multiple models on the same VM for online predictions.