MiniMax models

MiniMax models on Vertex AI offer fully managed and serverless models as APIs. To use a MiniMax model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because MiniMax models use a managed API, there's no need to provision or manage infrastructure.

You can stream your responses to reduce the end-user latency perception. A streamed response uses server-sent events (SSE) to incrementally stream the response.

Available MiniMax models

The following models are available from MiniMax to use in Vertex AI. To access a MiniMax model, go to its Model Garden model card.

MiniMax M2

MiniMax M2 is a model from MiniMax that's designed for agentic and code-related tasks. It is built for end-to-end development workflows and has strong capabilities in planning and executing complex tool-calling tasks. The model is optimized to provide a balance of performance, cost, and inference speed.

Go to the MiniMax M2 model card

Use MiniMax models

You can use curl commands to send requests to the Vertex AI endpoint using the following model names:

For MiniMax M2, use minimax-m2-maas

To learn how to make streaming and non-streaming calls to MiniMax models, see Call open model APIs.

MiniMax model region availability and quotas

For MiniMax models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM).

Model	Region	Quotas	Context length
MiniMax M2
MiniMax M2	`global endpoint`		196,608

If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.

What's next

Learn how to Call open model APIs.