# Learn about online predictions
Vertex AI offers online predictions on
Google Distributed Cloud (GDC) air-gapped through the Online Prediction API. A
prediction is the output of a trained machine-learning model. Specifically,
online predictions are synchronous requests made to your model endpoint.
Online Prediction lets you upload, deploy, serve, and make requests
using your own prediction models on
[a set of supported containers](#available-container-images).
Use Online Prediction when making requests in response to application
input or in situations requiring timely inference.
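
As a rough illustration of what a synchronous request body might look like, here is a minimal Python sketch. It assumes the endpoint accepts a JSON object with an `instances` list and an optional `parameters` object, which is the general Vertex AI online prediction convention. The helper name `build_prediction_request` is hypothetical, and the exact shape of each instance depends on your model and container.

```python
import json
from typing import Any, Optional


def build_prediction_request(
    instances: list,
    parameters: Optional[dict] = None,
) -> str:
    """Serialize a prediction request body as a JSON string.

    Assumption: the endpoint expects {"instances": [...]} with an optional
    "parameters" object. Confirm the format for your container before use.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    body: dict[str, Any] = {"instances": instances}
    if parameters is not None:
        body["parameters"] = parameters
    return json.dumps(body)


# Two example instances, each a flat list of numeric features (made-up data).
request_body = build_prediction_request([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(request_body)
```

Because the call is synchronous, the client sends this body and waits for the response that carries the predictions, which is why this mode suits requests triggered by application input.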
You can use the Online Prediction API by applying Kubernetes custom
resources to the dedicated [prediction cluster](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/prediction-user-cluster)
that your Infrastructure Operator (IO) creates for you.
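
The general pattern of applying a Kubernetes custom resource can be sketched in Python as follows. This only shows the generic manifest structure (`apiVersion`, `kind`, `metadata`, `spec`). The API group `example.invalid/v1`, the kind `ExamplePredictionResource`, and the `modelPath` field are hypothetical placeholders, not the real Online Prediction resource definitions, so take the actual values from the deployment guide.

```python
import json
from typing import Any


def build_custom_resource(name: str, namespace: str, spec: dict) -> dict:
    """Build a generic Kubernetes custom resource manifest as a dict."""
    return {
        # Hypothetical placeholders: replace with the real API group and kind.
        "apiVersion": "example.invalid/v1",
        "kind": "ExamplePredictionResource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = build_custom_resource(
    name="my-model",
    namespace="my-project",
    spec={"modelPath": "PLACEHOLDER_MODEL_PATH"},  # hypothetical field
)

# kubectl accepts JSON manifests as well as YAML, so this output could be
# saved to a file and applied with `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```

Applying the resulting file would require a kubeconfig that targets the prediction cluster your IO created.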

Before getting online predictions, you must
[export model artifacts](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-export-model-artifacts)
and [deploy the model to an endpoint](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-deploy-model).
This action associates compute resources with the model to serve online
predictions with low latency.

Then, you can get online predictions from a custom-trained model by
[formatting](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-format-online-prediction)
and [sending](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/quickstart-op)
a request.

**Preview:** Online Prediction is a Preview feature that is available as-is and is not recommended for production environments. Google provides no service-level agreements (SLA) or technical support commitments for Preview features. For more information, see GDC's [feature stages](/distributed-cloud/hosted/docs/latest/gdch/resources/feature-stages).

Available container images
--------------------------

Supported containers for Online Prediction in Distributed Cloud include
TensorFlow (version 2.14 for both CPU and GPU) and PyTorch (version 2.1 for
both CPU and GPU).

Last updated 2025-08-25 UTC.