Host AI apps and agents on Cloud Run

This page highlights some ways to use Cloud Run as a hosting platform for AI applications and AI agents:

Host AI applications on Cloud Run

AI applications use AI models to operate or perform a specific task. For example, an AI application can use an AI model to summarize documents, or be a chat interface that uses a vector database to retrieve more context.

Cloud Run is one of the application hosting infrastructures that provides a fully managed environment for your AI application workloads. Cloud Run integrates with AI models such as Gemini API, Vertex AI endpoints, or models hosted on a GPU-enabled Cloud Run service. Cloud Run also integrates with Cloud SQL for PostgreSQL and AlloyDB for PostgreSQL, which are two databases offering the pgvector extension for Retrieval-Augmented Generation (RAG).
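The retrieval step of a RAG application can be sketched without any cloud dependencies. The snippet below is a minimal, self-contained stand-in for a pgvector-backed store: an in-memory list of documents with toy embeddings, ranked by cosine similarity. All names (`retrieve`, `documents`, the example embeddings) are illustrative, not part of any Cloud Run or pgvector API; in production the ranking would be a SQL query against Cloud SQL or AlloyDB.

```python
import math

# Toy in-memory vector store standing in for Cloud SQL/AlloyDB with pgvector.
# The equivalent pgvector retrieval would be a SQL query such as:
#   SELECT content FROM documents ORDER BY embedding <-> %s LIMIT 1;

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, store, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return [d["content"] for d in ranked[:k]]

# Hypothetical documents with hand-written 3-dimensional embeddings;
# a real application would compute embeddings with a model.
documents = [
    {"content": "Cloud Run scales to zero.", "embedding": [0.9, 0.1, 0.0]},
    {"content": "AlloyDB supports pgvector.", "embedding": [0.1, 0.9, 0.1]},
    {"content": "GPUs accelerate inference.", "embedding": [0.0, 0.2, 0.9]},
]

context = retrieve([0.8, 0.2, 0.1], documents, k=1)
print(context)  # -> ['Cloud Run scales to zero.']
```

The retrieved text is then prepended to the model prompt as additional context.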

Host AI agents on Cloud Run

AI agents combine the intelligence of advanced AI models with access to tools to take actions on the user's behalf and under the user's control.

You can implement AI agents as Cloud Run services that orchestrate a set of asynchronous tasks and provide information to users across multiple request-response interactions.
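One way to orchestrate a set of asynchronous tasks inside a single request handler is Python's `asyncio`, which both LangGraph and ADK build on. The sketch below fans out two independent, hypothetical tool calls (`summarize` and `fetch_preferences` are placeholders, not real APIs) and combines their results, the way an agent service would within one request:

```python
import asyncio

# Hypothetical tool calls; a real agent would call models, APIs, or databases.
async def summarize(doc: str) -> str:
    await asyncio.sleep(0)  # stands in for network latency
    return f"summary of {doc}"

async def fetch_preferences(user: str) -> str:
    await asyncio.sleep(0)
    return f"prefs for {user}"

async def handle_request(user: str, doc: str) -> str:
    # Run independent tool calls concurrently, then combine the results.
    summary, prefs = await asyncio.gather(summarize(doc), fetch_preferences(user))
    return f"{summary} | {prefs}"

print(asyncio.run(handle_request("alice", "report.pdf")))
# -> summary of report.pdf | prefs for alice
```

Because Cloud Run bills per instance while requests are being processed, running independent tool calls concurrently rather than sequentially also shortens the billed request duration.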

AI agent on Cloud Run architecture

A typical AI agent architecture deployed on Cloud Run can involve several components from Google Cloud and outside of Google Cloud:

Architecture of an AI agent on Cloud Run

  1. Serving and Orchestration: A Cloud Run service acts as a scalable API endpoint and can handle multiple concurrent users through automatic, on-demand, rapid scaling of instances. This service runs the core agent logic, often using an AI orchestration framework like LangGraph or Agent Development Kit (ADK). This layer coordinates calls to other components. Cloud Run supports streaming HTTP responses back to the user, as well as WebSockets for bidirectional communication. Cloud Run's built-in service identity provides secure and automatic credentials for calling Google Cloud APIs without managing API keys.
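Streaming a response means the handler yields chunks as the model produces them instead of buffering the full reply. The standard-library WSGI sketch below illustrates the pattern: the app yields server-sent-event-framed chunks, and the test harness invokes it directly without a server. The framework, tokens, and handler shape are illustrative; a real service would typically use an ASGI framework and forward model output tokens.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """WSGI app that streams the agent's reply chunk by chunk."""
    start_response("200 OK", [("Content-Type", "text/event-stream")])
    # Yielding from the iterable lets the platform flush each chunk
    # to the client as it is produced, instead of buffering the body.
    for token in ["Thinking...", "Answer: 42"]:  # placeholder model output
        yield f"data: {token}\n\n".encode()

# Invoke the app directly (no server) to inspect the streamed chunks.
environ = {}
setup_testing_defaults(environ)
status_holder = {}

def start_response(status, headers):
    status_holder["status"] = status

chunks = list(app(environ, start_response))
print(status_holder["status"], len(chunks))  # -> 200 OK 2
```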

  2. AI Models: The orchestration layer calls models for reasoning capabilities. These can be:

    • Managed models, such as the Gemini API or Vertex AI endpoints.
    • Models hosted on a GPU-enabled Cloud Run service.

  3. Memory: Agents often need memory to retain context and learn from past interactions.

    • Short-term memory can be implemented by connecting Cloud Run to Memorystore for Redis.
    • Long-term memory, for storing conversational history or remembering the user's preferences, can be implemented by connecting Cloud Run to Firestore, a scalable, serverless NoSQL database.
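The defining behavior of short-term memory is a time-to-live: session context is kept close to the agent and expires on its own, which is what Memorystore for Redis provides. The class below is a minimal in-process sketch of that contract (the class name and keys are illustrative; in production you would call a Redis client's `SET` with an expiry instead):

```python
import time

class SessionMemory:
    """In-process stand-in for Memorystore for Redis: keys expire after a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return default
        return value

memory = SessionMemory(ttl_seconds=0.05)
memory.set("session:alice", "last question: pricing")
print(memory.get("session:alice"))  # -> last question: pricing
time.sleep(0.06)
print(memory.get("session:alice"))  # -> None
```

Note that an in-process dictionary like this would not survive instance scale-down or be shared between Cloud Run instances, which is why an external store such as Memorystore is used in practice.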
  4. Databases and Retrieval: For Retrieval-Augmented Generation (RAG) or fetching structured data:

    • Cloud SQL for PostgreSQL and AlloyDB for PostgreSQL offer the pgvector extension for vector similarity search.

  5. Tools: The orchestrator uses tools to perform specific tasks that models are not suited for or to interact with external services, APIs, or websites. This can include:

    • Basic utilities: Precise math calculations, time conversions, or other similar utilities can run in the orchestrating Cloud Run service.
    • API calling: Make calls to other internal or third-party APIs (read or write access).
    • Image or chart generation: Use image generation models or run chart libraries to quickly and effectively create visual content.
    • Browser and OS automation: Run a headless browser or a full graphical operating system within container instances so the agent can browse the web, extract information from websites, or perform actions using clicks and keyboard input. The Cloud Run service returns the rendered screen pixels, and libraries like Puppeteer control the browser.
    • Code execution: Cloud Run provides a secure environment with multi-layered sandboxing and can be configured as a code execution service with minimal or no IAM permissions. Use a Cloud Run job to execute code asynchronously, or a Cloud Run service with a concurrency of 1 for synchronous execution.
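The code execution tool's innermost layer can be sketched as a subprocess boundary with a timeout and captured output. This is only an illustration of the pattern, not Cloud Run's sandbox itself: the container sandbox and minimal IAM permissions described above provide the actual isolation, and `run_snippet` is a hypothetical helper name.

```python
import subprocess
import sys

def run_snippet(code: str, timeout: float = 5.0) -> str:
    """Execute a Python snippet in a separate process and capture its stdout.

    The subprocess boundary plus timeout is only the innermost guard; the
    surrounding Cloud Run service should run with minimal or no IAM
    permissions so that executed code cannot reach other resources.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

print(run_snippet("print(21 * 2)"))  # -> 42
```

Dispatching each snippet to a Cloud Run job (asynchronous) or to a service instance with concurrency 1 (synchronous) keeps untrusted executions isolated from one another.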

What's next