Overview of language models

You can access, tune, and deploy Google's generative AI language models by using the Vertex AI PaLM API and the Vertex AI Codey APIs. You can use these language models in a variety of Vertex AI workflows, such as interacting with the models through the APIs from a Jupyter notebook. You can also customize the language models for your specific use case by performing model tuning. This page gives you an overview of the available language models, the APIs you use to interact with them, and ways to customize their behavior.

Model naming scheme

Foundation model names have two components: use case and model size. The naming convention is in the format <use case>-<model size>. For example, text-bison represents the Bison text model.

The model sizes are:

  • Unicorn: The largest model in the PaLM family. Unicorn models excel at complex tasks, such as coding and chain-of-thought (CoT) reasoning, because of the extensive knowledge embedded in the model and its reasoning capabilities.
  • Bison: The best-value PaLM model, which handles a wide range of language tasks, such as classification and summarization. It is optimized for accuracy and latency at a reasonable cost. The text, chat, code, and codechat interfaces simplify deployment and integration into your application.
  • Gecko: The smallest and lowest-cost model, for simple tasks.

You can use the stable or the latest version of a model. For more information, see Model versions and lifecycle.
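The naming convention above can be applied mechanically; a minimal sketch of splitting a model identifier into its parts (the optional @version suffix follows the versioning scheme described in Model versions and lifecycle):

```python
def parse_model_name(name: str) -> dict:
    """Split a foundation model name like 'text-bison@002' into its parts."""
    base, _, version = name.partition("@")
    use_case, _, size = base.partition("-")
    return {"use_case": use_case, "size": size, "version": version or None}

print(parse_model_name("text-bison@002"))
# {'use_case': 'text', 'size': 'bison', 'version': '002'}
```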

Vertex AI PaLM API

The Vertex AI PaLM API gives you access to the PaLM 2 family of models, which support the generation of natural language text, text embeddings, and code (we recommend using the Vertex AI Codey APIs for code generation). PaLM 2 is the second generation of the Pathways Language Model developed by Google Research. By using the Vertex AI PaLM API, you can take advantage of the MLOps tools, enterprise-level security, safety, privacy, and scalability offered by Vertex AI.

The Vertex AI PaLM API exposes PaLM 2 models by using publisher endpoints that are unique to each Google Cloud project. The following is an example of a publisher model endpoint:

https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/text-bison:predict
The Vertex AI PaLM API has publisher endpoints for the following PaLM 2 models:

  • text-bison: Optimized for performing natural language tasks, such as classification, summarization, extraction, content creation, and ideation.
  • chat-bison: Optimized for multi-turn chat, where the model keeps track of previous messages in the chat and uses them as context for generating new responses.
  • textembedding-gecko: Generates text embeddings for a given text. You can use embeddings for tasks like semantic search, recommendation, classification, and outlier detection.
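Publisher endpoint URLs for these models follow a regular pattern; a minimal sketch of assembling one, assuming a regional host of the form {region}-aiplatform.googleapis.com (see the Vertex AI API reference for the authoritative path segments):

```python
def publisher_model_endpoint(project: str, region: str, model: str,
                             action: str = "predict") -> str:
    """Build the URL of a Vertex AI publisher model endpoint.

    The host and path follow the pattern used by the Vertex AI REST API;
    treat this helper as illustrative, not an exhaustive reference.
    """
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/"
        f"publishers/google/models/{model}:{action}"
    )

print(publisher_model_endpoint("my-project", "us-central1", "text-bison"))
```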

To learn more about these models, see Available models.

Vertex AI Codey APIs

The Vertex AI Codey APIs are optimized to support code generation, code chat, and code completion for several programming languages. The Vertex AI Codey APIs are based on the PaLM 2 family of models. For more information, see Code models overview.
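Each Codey task maps to its own model; a small illustrative lookup, assuming the model names documented in the Code models overview (code-bison, codechat-bison, and code-gecko):

```python
# Codey model names by task, per the Code models overview.
CODEY_MODELS = {
    "code generation": "code-bison",
    "code chat": "codechat-bison",
    "code completion": "code-gecko",
}

def codey_model_for(task: str) -> str:
    """Return the Codey model name for a supported task."""
    try:
        return CODEY_MODELS[task]
    except KeyError:
        raise ValueError(f"Unsupported Codey task: {task!r}")
```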

Prompt design

To interact with foundation models like PaLM 2, you send natural language instructions, also called prompts, that tell the model what you want it to generate. However, LLMs can sometimes behave in unpredictable ways. Prompt design is an iterative process of trial and error that takes time and practice to become proficient in. To learn about general prompt design strategies, see Introduction to prompt design. For task-specific prompt design guidance for text, see Design text prompts.
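Iterating on a prompt often leads to a few-shot design, where you prepend input/output examples before the new input. A minimal sketch of assembling such a prompt (the classification task, labels, and template here are illustrative, not a Vertex AI requirement):

```python
def few_shot_prompt(examples, new_input):
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End with an unanswered example for the model to complete.
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("Loved it!", "positive"), ("Total waste of money.", "negative")],
    "Shipping was fast and the quality is great.",
)
print(prompt)
```

The assembled string is what you would send as the prompt in a text generation request.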

Model tuning

If you need to customize PaLM 2 models for a specific use case, you can tune the models by using a dataset of input and output examples. Tuned models are automatically deployed to a new endpoint in your project to serve requests. To learn more about model tuning, see Tune foundation models.
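A tuning dataset of input and output examples is typically supplied as JSON Lines; a sketch that writes one, assuming the input_text/output_text field names described for supervised tuning (check the Tune foundation models page for the exact schema your model version expects):

```python
import json

# Illustrative tuning examples; the input_text/output_text field names
# are an assumption based on the supervised tuning dataset format.
examples = [
    {"input_text": "Summarize: The meeting covered Q3 targets and hiring plans.",
     "output_text": "Q3 targets and hiring plans were discussed."},
    {"input_text": "Classify sentiment: I love this product.",
     "output_text": "positive"},
]

# One JSON object per line, as required by the JSON Lines format.
with open("tuning_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```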

What's next