Google Cloud offers a range of products and tools for building generative AI applications with enterprise-grade scaling, security, and observability.

Use this page to learn the stages of developing a generative AI application, choose the best products and tools for your use case, and access the documentation you need to get started.

Learn the fundamentals of generative AI development

Learn about developing a generative AI application.
Learn about common generative AI use cases and model types.
Identify whether generative AI, traditional AI, or a combination of both might suit your business use case.
Learn how to address the challenges in each stage of developing a generative AI application.

Choose infrastructure for your generative AI application

Learn which products, frameworks, and tools are the best match for building your generative AI application. Common components in a generative AI application hosted on Google Cloud include:

  1. Application hosting: Compute to host your application. Your application can use Google Cloud's client libraries and SDKs to talk to different Google Cloud products.
  2. Model hosting: Scalable and secure hosting for a generative model.
  3. Model: Generative model for text, chat, images, code, embeddings, and multimodal use cases.
  4. Grounding solution: Anchor model output to verifiable, up-to-date sources of information.
  5. Database: Store your application's data. You might reuse your existing database as your grounding solution by augmenting prompts with SQL query results, or by storing your data as vector embeddings using an extension like pgvector.
  6. Storage: Store files such as images, videos, or static web frontends. You might also use storage for the raw grounding data (for example, PDFs) that you later convert into embeddings and store in a vector database.
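The similarity search at the heart of the vector-embedding approach in item 5 can be sketched in a few lines of plain Python. The documents and 3-dimensional embeddings below are toy examples (real embedding models produce hundreds of dimensions), and a production deployment would delegate this ranking to a vector database rather than computing it in application code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_embedding, documents):
    """Return documents ranked by similarity to the query embedding."""
    return sorted(
        documents,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )

# Toy corpus with hypothetical 3-dimensional embeddings.
docs = [
    {"text": "pricing overview", "embedding": [0.9, 0.1, 0.0]},
    {"text": "support contacts", "embedding": [0.0, 0.2, 0.9]},
]
print(nearest([1.0, 0.0, 0.0], docs)[0]["text"])  # most similar document
```

With pgvector, this ranking is pushed into PostgreSQL instead, using a distance operator such as `<=>` (cosine distance) in an `ORDER BY` clause, so the database index does the heavy lifting.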

Diagram showing a high-level overview of a gen AI application hosting infrastructure, including a model and its model hosting infrastructure, grounding solution, database, storage, and application hosting.

The sections below walk through each of those components, helping you choose which Google Cloud products to try.

Application hosting infrastructure

Choose a product to host and serve your application workload, which makes calls out to the generative model.
Decision tree guiding users through the selection of an appropriate service for application hosting.

Get started with:

Model hosting infrastructure

Google Cloud provides multiple ways to host a generative model, from the flagship Vertex AI platform, to customizable and portable hosting on Google Kubernetes Engine.

Decision tree guiding users to choose the right model hosting cloud service based on their priorities and requirements.

Get started with:


Model

Google Cloud provides a set of state-of-the-art foundation models through Vertex AI, including Gemini. You can also deploy a third-party model from Vertex AI Model Garden, or self-host a third-party model on GKE, Cloud Run, or Compute Engine.

Decision tree guiding users to choose a Vertex AI service, to generate text or code, with options for using text embeddings, images, or video.

Get started with:


Grounding

To ensure informed and accurate model responses, you may want to ground your generative AI application with real-time data. This is called retrieval-augmented generation (RAG).

You can implement grounding with your own data in a vector database, which stores data in a format optimized for operations like similarity search. Google Cloud offers multiple vector database solutions for different use cases.

Note: You can also ground with traditional (non-vector) databases by querying an existing database like Cloud SQL or Firestore and using the result in your model prompt.
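The traditional-database pattern in the note above amounts to: run a query, splice the result into the prompt. The sketch below uses an in-memory SQLite table as a stand-in for Cloud SQL or Firestore; the `orders` schema and the prompt wording are hypothetical.

```python
import sqlite3

# Stand-in for an existing operational database (e.g. Cloud SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (42, 'shipped')")

def grounded_prompt(question, order_id):
    """Query structured data, then inline the result into the model prompt."""
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    context = f"Order {order_id} status: {row[0]}"
    return (
        "Answer using only this context.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

print(grounded_prompt("Where is my order?", 42))
```

The returned string would then be sent to the model; because the status comes from the database at request time, the answer stays current without retraining or embeddings.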

Decision tree guiding the user through choosing the right vector database solution for their needs.

Get started with:

Grounding with APIs

Instead of (or in addition to) grounding with your own data, you can use the APIs that many online services offer to retrieve grounding data and augment your model prompt.
Create, deploy, and manage extensions that connect large language models to the APIs of external systems.
Explore a variety of document loaders and API integrations for your gen AI apps, from YouTube to Google Scholar.
If you're using models hosted in Vertex AI, you can ground model responses using Vertex AI Search, Google Search, or inline/infile text.
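The API-grounding flow described above can be sketched as: call an external service for fresh data, then splice the response into the prompt. Here `fetch_weather` is a hypothetical stand-in for a real API client; in a production app it would make an authenticated HTTP call.

```python
import json

def fetch_weather(city):
    """Stand-in for a real weather API call; returns fresh external data."""
    return {"city": city, "temp_c": 18, "conditions": "partly cloudy"}

def prompt_with_api_grounding(question, city):
    """Augment the model prompt with data retrieved from an API."""
    data = fetch_weather(city)
    return (
        "Use this API data when answering.\n"
        f"Data: {json.dumps(data)}\n"
        f"Question: {question}"
    )

print(prompt_with_api_grounding("Do I need a jacket today?", "Zurich"))
```

Extensions and function calling automate this pattern: the model itself decides when to call the API and which arguments to pass, instead of the application hard-coding the retrieval step.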

Start building

Set up your development environment

Install the tools you need to build a generative AI application on Google Cloud.
Command line tools for setting up a local development environment and for interacting with Cloud APIs.
View API documentation and samples in your IDE, and speed up GKE and Cloud Run local development.
Learn how to authenticate to Google Cloud APIs from your local environment and from hosted workloads.
LangChain is an open-source framework for gen AI apps that lets you build context into your prompts and take action based on the model's response.
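The core pattern LangChain packages up, a prompt template composed with a model call, can be sketched in plain Python. This is a conceptual sketch, not LangChain's actual API: `fake_llm` is a hypothetical stand-in for a real model call (for example, a Vertex AI Gemini endpoint via LangChain's Google Cloud integrations).

```python
def prompt_template(template):
    """Return a function that fills the template from keyword arguments."""
    return lambda **kwargs: template.format(**kwargs)

def fake_llm(prompt):
    """Stand-in for a real model call."""
    return f"[model response to: {prompt}]"

def chain(*steps):
    """Compose steps so each one's output feeds the next (a 'chain')."""
    def run(**kwargs):
        value = steps[0](**kwargs)
        for step in steps[1:]:
            value = step(value)
        return value
    return run

qa = chain(
    prompt_template("Context: {context}\nQuestion: {question}"),
    fake_llm,
)
print(qa(context="Store hours: 9-5", question="When do you open?"))
```

In real LangChain code, the template and model are library objects composed the same way; the value of the framework is in its ready-made integrations for models, vector stores, and document loaders.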

Design prompts and evaluate models

Learn strategies for prompting, and experiment with different models.
Learn strategies for designing prompts for different kinds of data types, from text and code to multimodal.
Learn how to design, test, and manage your prompts in Vertex AI Studio.
View dozens of prompt examples spanning use cases including classification, ideation, and summarization.
Explore prompt examples for generating, developing, and communicating new ideas.

Code samples

Fork a sample on GitHub, and start building.

Learn to build a web-based question-answering chatbot, using Vertex AI Agent Builder and Firebase.

Beginner Node.js

Learn to build a simple Python Flask application that calls a pre-trained foundation model in Vertex AI.

Beginner Python

Build a web app to generate marketing campaign ideas, using Gemini on Vertex AI, Cloud Run, and Streamlit.

Beginner Python

Learn how to use Vertex AI Search and LangChain to ground model prompts to a verifiable knowledge source (Google Cloud whitepaper).

Intermediate Python

Learn how to implement function-calling, the process of using an LLM to populate a request body that you can then send to an external API.

Intermediate Python

Sample app for retrieval-augmented generation with AlloyDB for PostgreSQL and Vertex AI (blog post, codelab).

Intermediate Python

Architecture guidance and jump start solutions

Recommendations and examples of generative AI applications that are secure, efficient, resilient, high-performing, and cost-effective.

Use this reference architecture to design the infrastructure to run a generative AI application with retrieval-augmented generation (RAG) using Vertex AI and AlloyDB for PostgreSQL.


Use this reference architecture to design the infrastructure to run a generative AI application with retrieval-augmented generation (RAG) using GKE, Cloud SQL, and open source tools like Ray, Hugging Face, and LangChain.


This document provides design guidance on how to use and integrate the variety of storage options offered by Google Cloud for key AI and ML workloads.


Deploy a one-click sample application to summarize long documents with Vertex AI.


Deploy a one-click sample application that uses vector embeddings stored in Cloud SQL to improve the accuracy of responses from a chat application.


Deploy a one-click sample application that extracts question-and-answer pairs from a set of documents, along with a pipeline that triggers the application when a document is uploaded.