[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-04。"],[],[],null,["# Fully-managed Llama models\n\n\u003cbr /\u003e\n\nLlama models on Vertex AI offer fully managed and serverless\nmodels as APIs. To use a Llama model on Vertex AI, send\na request directly to the Vertex AI API endpoint. Because\nLlama models use a managed API, there's no need to provision or\nmanage infrastructure.\n\nYou can stream your responses to reduce the end-user latency perception. A\nstreamed response uses server-sent events (SSE) to incrementally stream the\nresponse.\n\nAvailable Llama models\n----------------------\n\nThe following Llama models are available from Meta to use in\nVertex AI. To access a Llama model, go to its\nModel Garden model card.\n\nModels that are in [Preview](/products#product-launch-stages) also have self-deploy option. If you\nrequire a production-ready service, use the [self-deploy Llama\nmodels](/vertex-ai/generative-ai/docs/open-models/use-llama).\n\n### Llama 4 Maverick 17B-128E\n\nLlama 4 Maverick 17B-128E is the largest and most capable Llama 4 model that\noffers coding, reasoning, and image capabilities. It features\nMixture-of-Experts (MoE) architecture with 17 billion active parameters out of\n400 billion total parameters and 128 experts. Llama 4 Maverick 17B-128E uses\nalternating dense and MoE layers, where each token activates a shared expert\nplus one of the 128 routed experts. The model is pretrained on 200 languages and\noptimized for high-quality chat interactions through a refined post-training\npipeline.\n\nLlama 4 Maverick 17B-128E is multimodal and is suited for advanced image\ncaptioning, analysis, precise image understanding, visual questions and answers,\ncreative text generation, general-purpose AI assistants, and sophisticated\nchatbots requiring top-tier intelligence and image understanding.\n\n#### Considerations\n\n- You can include a maximum of three images per request.\n- The MaaS endpoint doesn't use Llama Guard, unlike previous versions. To use Llama Guard, deploy Llama Guard from Model Garden and then send the prompts and responses to that endpoint. However, compared to Llama 4, Llama Guard has a more limited context (128,000) and can only process requests with a single image at the beginning of the prompt.\n- Batch predictions aren't supported.\n\n[Go to the Llama 4 model card](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-4-maverick-17b-128e-instruct-maas)\n\n### Llama 4 Scout 17B-16E\n\nLlama 4 Scout 17B-16E delivers state-of-the-art results for its size class that\noutperforms previous Llama generations and other open and proprietary models on\nseveral benchmarks. 
### Llama 4 Scout 17B-16E

Llama 4 Scout 17B-16E delivers state-of-the-art results for its size class, outperforming previous Llama generations and other open and proprietary models on several benchmarks. It features a MoE architecture with 17 billion active parameters out of 109 billion total parameters and 16 experts.

Llama 4 Scout 17B-16E is suited for retrieval tasks within long contexts and for tasks that demand reasoning over large amounts of information, such as summarizing multiple large documents, analyzing extensive user interaction logs for personalization, and reasoning across large codebases.

#### Considerations

- You can include a maximum of three images per request.
- Unlike previous versions, the MaaS endpoint doesn't use Llama Guard. To use Llama Guard, deploy it from Model Garden and then send your prompts and responses to that endpoint. However, compared to Llama 4, Llama Guard has a more limited context (128,000 tokens) and can only process requests with a single image at the beginning of the prompt.
- Batch predictions aren't supported.

[Go to the Llama 4 model card](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-4-maverick-17b-128e-instruct-maas)

### Llama 3.3

| **Preview**
|
| This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
| of the [Service Specific Terms](/terms/service-terms#1).
|
| Pre-GA features are available "as is" and might have limited support.
|
| For more information, see the
| [launch stage descriptions](/products#product-launch-stages).

Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and Llama 3.2 90B when used for text-only applications.

[Go to the Llama 3.3 70B model card](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3.3-70b-instruct-maas)

During the Preview period, you are charged as you use the model (pay as you go). For pay-as-you-go pricing, see Llama model pricing on the Vertex AI [pricing page](/vertex-ai/generative-ai/pricing#partner-models).

### Llama 3.2

| **Preview**
|
| This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section
| of the [Service Specific Terms](/terms/service-terms#1).
|
| Pre-GA features are available "as is" and might have limited support.
|
| For more information, see the
| [launch stage descriptions](/products#product-launch-stages).

Llama 3.2 lets developers build and deploy the latest generative AI models and applications that use Llama's latest capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications.

[Go to the Llama 3.2 90B model card](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3.2-90b-vision-instruct-maas)

There are no charges during the Preview period. If you require a production-ready service, use the [self-hosted Llama models](/vertex-ai/generative-ai/docs/open-models/use-llama).

#### Considerations

When you use `llama-3.2-90b-vision-instruct-maas`, there are no restrictions on text-only prompts. However, if you include an image in your prompt, the image must be at the beginning of the prompt, and you can include only one image. You cannot, for example, include some text and then an image.
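To make the image-ordering constraint concrete, the sketch below shows the shape of a request body in which the single allowed image is the first content part, followed by the text. It is a hypothetical illustration: the field names follow the same OpenAI-compatible Chat Completions format as the earlier sketch, and the model ID and image URL are placeholders.

```python
# Sketch of a Llama 3.2 multimodal request body: exactly one image, and it must
# come before any text. The model ID and image URL are placeholders.
request_body = {
    "model": "meta/llama-3.2-90b-vision-instruct-maas",
    "messages": [
        {
            "role": "user",
            "content": [
                # The single image must be the first content part ...
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                # ... followed by the text portion of the prompt.
                {"type": "text", "text": "Summarize the trend shown in this chart."},
            ],
        }
    ],
}
```

Text-only prompts have no such restriction; a content list containing only text parts (or a plain string) works as usual.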
### Llama 3.1

| To see an example of getting started with Llama on Vertex AI,
| run the "Getting started with Llama 3.1" notebook in one of the following
| environments:
|
| [Open in Colab](https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_openai_api_llama3_1.ipynb) \|
| [Open in Colab Enterprise](https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fcommunity%2Fmodel_garden%2Fmodel_garden_openai_api_llama3_1.ipynb) \|
| [Open in Vertex AI Workbench](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fcommunity%2Fmodel_garden%2Fmodel_garden_openai_api_llama3_1.ipynb) \|
| [View on GitHub](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_openai_api_llama3_1.ipynb)

Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Llama 3.1 405B is [Generally Available](/products#product-launch-stages). You are charged as you use the model (pay as you go). For pay-as-you-go pricing, see Llama model pricing on the Vertex AI [pricing page](/vertex-ai/generative-ai/pricing#partner-models).

The other Llama 3.1 models are in Preview. There are no charges for the Preview models. If you require a production-ready service, use the [self-hosted Llama models](/vertex-ai/generative-ai/docs/open-models/use-llama).

[Go to the Llama 3.1 model card](https://console.cloud.google.com/vertex-ai/publishers/meta/model-garden/llama-3.1-405b-instruct-maas)

What's next
-----------

[Learn how to use Llama models](/vertex-ai/generative-ai/docs/partner-models/llama/use-llama).