Mistral AI models on Vertex AI are offered as fully managed, serverless APIs. To use a Mistral AI model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because Mistral AI models use a managed API, there's no need to provision or manage infrastructure.

You can stream your responses to reduce the end-user latency perception. A streamed response uses server-sent events (SSE) to incrementally stream the response.

You pay for Mistral AI models as you use them (pay as you go). For pay-as-you-go pricing, see Mistral AI model pricing on the Vertex AI pricing page.

Available Mistral AI models

The following models are available from Mistral AI to use in Vertex AI. To access a Mistral AI model, go to its Model Garden model card.

- Mistral OCR (25.05)
- Mistral Small 3.1 (25.03)
- Mistral Large (24.11)
- Codestral (25.01)

Mistral OCR (25.05)

Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. It excels at understanding complex document elements, including interleaved imagery, mathematical expressions, tables, and advanced layouts such as LaTeX formatting. The model enables deeper understanding of rich documents such as scientific papers with charts, graphs, equations, and figures.

Mistral OCR (25.05) is an ideal model to use in combination with a RAG system that takes multimodal documents (such as slides or complex PDFs) as input. You can couple Mistral OCR (25.05) with other Mistral models to reformat the results. This combination ensures that the extracted content is not only accurate but also presented in a structured and coherent manner, making it suitable for various downstream applications and analyses.

Go to the Mistral OCR (25.05) model card

Mistral Small 3.1 (25.03)

Mistral Small 3.1 (25.03) features multimodal capabilities and a context length of up to 128,000 tokens. The model can process and understand visual inputs and long documents, further expanding its range of applications compared to the previous Mistral AI Small model. Mistral Small 3.1 (25.03) is a versatile model designed for tasks such as programming, mathematical reasoning, document understanding, and dialogue, and it is built for low-latency applications, delivering best-in-class efficiency compared to models of the same quality.

Mistral Small 3.1 (25.03) has undergone a full post-training process to align the model with human preferences and needs, making it usable out of the box for applications that require chat or precise instruction following.

Go to the Mistral Small 3.1 (25.03) model card

Mistral Large (24.11)

Mistral Large (24.11) is the latest version of Mistral AI's Large model, now with improved reasoning and function calling capabilities.

Go to the Mistral Large (24.11) model card

Codestral (25.01)

Codestral (25.01) is designed for code generation tasks. It helps developers write and interact with code through a shared instruction and completion API endpoint. Because it masters code and can converse in a variety of languages, you can use Codestral (25.01) to design advanced AI applications for software developers.

Go to the Codestral (25.01) model card

Use Mistral AI models

You can use curl commands to send requests to the Vertex AI endpoint using the following model names:

- mistral-ocr-2505
- mistral-small-2503
- mistral-large-2411
- mistral-nemo
- codestral-2501

For more information about using the Mistral AI SDK, see the Mistral AI Vertex AI documentation.

Before you begin

To use Mistral AI models with Vertex AI, you must perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have an existing project with the Vertex AI API enabled, you can use that project instead of creating a new project.

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Vertex AI API.
Make a streaming call to a Mistral AI model

The following sample makes a streaming call to a Mistral AI model.

REST

After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- LOCATION: A region that supports Mistral AI models.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL: The model name you want to use. In the request body, exclude the @ model version number.
- ROLE: The role associated with a message. You can specify a user or an assistant. The first message must use the user role. The models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: A boolean that specifies whether the response is streamed or not. Stream your response to reduce the end-user latency perception. Set to true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict

Request JSON body:
{
"model": MODEL,
"messages": [
{
"role": "ROLE",
"content": "CONTENT"
}],
"max_tokens": MAX_TOKENS,
"stream": true
}
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict" | Select-Object -Expand Content
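Each line of a streamRawPredict response is a server-sent event. As an illustration (not an official client), the following Python sketch assembles the streamed text from the SSE `data:` lines; the chunk layout assumed here (`choices[0].delta.content`, terminated by a `[DONE]` sentinel) follows Mistral's OpenAI-style streaming format, so verify it against your actual responses:

```python
import json

def assemble_sse_text(sse_lines):
    """Collect the streamed text from server-sent event lines.

    Expects each event as a 'data: <json>' line, with a final
    'data: [DONE]' sentinel (OpenAI-style chat-chunk layout --
    an assumption; check it against real streamRawPredict output).
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Example with canned events:
events = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
print(assemble_sse_text(events))  # Hello, world
```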
Make a unary call to a Mistral AI model
The following sample makes a unary call to a Mistral AI model.
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.
Before using any of the request data, make the following replacements:
- LOCATION: A region that supports Mistral AI models.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL: The model name you want to use. In the request body, exclude the @ model version number.
- ROLE: The role associated with a message. You can specify a user or an assistant. The first message must use the user role. The models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- STREAM: A boolean that specifies whether the response is streamed or not. Stream your response to reduce the end-user latency perception. Set to true to stream the response and false to return the response all at once.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_TOKENS: Maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict
Request JSON body:
{
  "model": MODEL,
  "messages": [
    {
      "role": "ROLE",
      "content": "CONTENT"
    }],
  "max_tokens": MAX_TOKENS,
  "stream": false
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict" | Select-Object -Expand Content
You should receive a JSON response body.
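The same rawPredict request can also be assembled from Python. This is a minimal sketch, not an official SDK sample: it builds the endpoint URL and request body shown above; actually sending the request (commented out) assumes the gcloud CLI and the third-party requests package are available:

```python
import json

def build_endpoint(location, project_id, model, method="rawPredict"):
    """Build the publisher-model endpoint URL used in the curl samples."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/mistralai/models/{model}:{method}"
    )

def build_body(model, content, role="user", max_tokens=200, stream=False):
    """Build the request JSON body shown above (unary: stream=False)."""
    return {
        "model": model,
        "messages": [{"role": role, "content": content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }

url = build_endpoint("us-central1", "my-project", "mistral-large-2411")
body = build_body("mistral-large-2411", "Say hello.")
print(url)
print(json.dumps(body, indent=2))

# To actually send the request (requires gcloud credentials on PATH
# and the requests package):
#   import subprocess, requests
#   token = subprocess.check_output(
#       ["gcloud", "auth", "print-access-token"], text=True).strip()
#   resp = requests.post(url, json=body,
#                        headers={"Authorization": f"Bearer {token}"})
#   print(resp.json())
```

The project ID and model values here are placeholders; substitute your own.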
Mistral AI model region availability and quotas
For Mistral AI models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.
Model | Region | Quotas | Context length
---|---|---|---
Mistral OCR (25.05) | us-central1 | | 30 pages
Mistral OCR (25.05) | europe-west4 | | 30 pages
Mistral Small 3.1 (25.03) | us-central1 | | 128,000
Mistral Small 3.1 (25.03) | europe-west4 | | 128,000
Mistral Large (24.11) | us-central1 | | 128,000
Mistral Large (24.11) | europe-west4 | | 128,000
Mistral Nemo | us-central1 | | 128,000
Mistral Nemo | europe-west4 | | 128,000
Codestral (25.01) | us-central1 | | 32,000
Codestral (25.01) | europe-west4 | | 32,000
If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.
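The token heuristics quoted earlier (a token is approximately 3.5 characters, and TPM counts both input and output tokens) give a quick way to sanity-check a planned workload against a regional quota. The QPM and TPM limits in this sketch are hypothetical placeholders, not published quota values:

```python
def estimate_tokens(text):
    """Rough token estimate using the ~3.5 characters-per-token heuristic."""
    return max(1, round(len(text) / 3.5))

def fits_quota(requests_per_minute, input_text, max_output_tokens,
               qpm_limit, tpm_limit):
    """Check a planned workload against QPM and TPM limits.

    TPM counts both input and output tokens, so each request is
    budgeted at its input estimate plus its max_tokens ceiling.
    """
    tokens_per_request = estimate_tokens(input_text) + max_output_tokens
    tpm_needed = requests_per_minute * tokens_per_request
    return requests_per_minute <= qpm_limit and tpm_needed <= tpm_limit

# Hypothetical limits for illustration only -- check your project's
# actual quotas in the Google Cloud console.
print(fits_quota(30, "Summarize this paragraph." * 10, 500,
                 qpm_limit=60, tpm_limit=100_000))  # True
```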