# Learn about online predictions

*Last updated (UTC): 2025-09-04.*

| **Preview:** Online Prediction is a Preview feature that is available as-is and is not recommended for production environments. Google provides no service-level agreements (SLA) or technical support commitments for Preview features. For more information, see GDC's [feature stages](/distributed-cloud/hosted/docs/latest/gdch/resources/feature-stages).

Vertex AI offers online predictions on Google Distributed Cloud (GDC) air-gapped through the Online Prediction API. A prediction is the output of a trained machine-learning model. Specifically, online predictions are synchronous requests made to your model endpoint.

Online Prediction lets you upload, deploy, serve, and make requests using your own prediction models on [a set of supported containers](#available-container-images). Use Online Prediction when making requests in response to application input or in situations requiring timely inference.

You can use the Online Prediction API by applying Kubernetes custom resources to the dedicated [prediction cluster](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/prediction-user-cluster) that your Infrastructure Operator (IO) creates for you.

Before getting online predictions, you must [export model artifacts](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-export-model-artifacts) and [deploy the model to an endpoint](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-deploy-model). Deploying associates compute resources with the model so it can serve online predictions with low latency.

Then, you can get online predictions from a custom-trained model by [formatting](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/vertex-ai-format-online-prediction) and [sending](/distributed-cloud/hosted/docs/latest/gdch/application/ao-user/quickstart-op) a request.

Available container images
--------------------------

The following table lists the supported containers for Online Prediction in Distributed Cloud:

| Framework  | Version | Supported processors |
|------------|---------|----------------------|
| TensorFlow | 2.14    | CPU, GPU             |
| PyTorch    | 2.1     | CPU, GPU             |
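To illustrate the request-formatting step described above: online prediction request bodies in Vertex AI carry a JSON object with an `instances` array, one entry per input to predict on. The sketch below builds and serializes such a payload; the feature names and values are placeholders for illustration, since your model defines the actual input schema.

```python
import json

# Build a prediction request body: a JSON object with an "instances"
# array, one entry per input. The feature names and values here are
# hypothetical; replace them with the inputs your model expects.
payload = {
    "instances": [
        {"feature_a": 1.0, "feature_b": "first example"},
        {"feature_a": 2.5, "feature_b": "second example"},
    ]
}

# Serialize to JSON. This string becomes the HTTP request body that
# you send to your model endpoint's predict method.
body = json.dumps(payload)
print(body)
```

You would then send `body` in a POST request to your deployed endpoint, as described in the formatting and sending guides linked earlier in this page.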