Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Vertex AI in express mode lets you quickly try out core
generative AI features that are available on Vertex AI. This tutorial
shows you how to perform the following tasks by using the Vertex AI API
in express mode:
Install and initialize the Google Gen AI SDK for express mode.
Send a request to the Gemini for Google Cloud API, including the
following:
Non-streaming request
Streaming request
Function calling request
Install and initialize the Google Gen AI SDK for express mode
The Google Gen AI SDK lets you use Google generative AI models and
features to build AI-powered applications. When using Vertex AI in
express mode, install and initialize the google-genai package to
authenticate using your generated API key.
Install
To install the Google Gen AI SDK for express mode, run the following
commands:
# Developer TODO: If you're using Colab, uncomment the following lines:
# from google.colab import auth
# auth.authenticate_user()

!pip install google-genai
!pip install --force-reinstall -qq "numpy<2.0"
If you're using Colab, ignore any dependency conflicts and restart the runtime
after installation.
Initialize
Configure the API key for express mode and environment variables. For details on
getting an API key, see Vertex AI in express mode overview.
from google import genai
from google.genai import types

# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"

client = genai.Client(vertexai=True, api_key=API_KEY)
Send a request to the Gemini for Google Cloud API
You can send either streaming or non-streaming requests to the
Gemini for Google Cloud API. Streaming requests return the response in chunks as
the request is being processed. To a human user, streamed responses reduce the
perception of latency. Non-streaming requests return the response in one chunk
after the request is processed.
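The difference between the two modes can be sketched without calling the API at all. The following simulation is illustrative only: the chunk text is made up, and a real model produces tokens incrementally over the network.

```python
# Simulated contrast between streaming and non-streaming delivery.
# The chunk text is illustrative; a real model generates tokens incrementally.

CHUNKS = ["Bubble sort ", "repeatedly swaps ", "adjacent elements."]

def non_streaming():
    # The caller sees nothing until the whole response is assembled.
    return "".join(CHUNKS)

def streaming():
    # Each chunk is handed to the caller as soon as it is ready,
    # so printing can begin before generation finishes.
    for chunk in CHUNKS:
        yield chunk

print(non_streaming())
for chunk in streaming():
    print(chunk, end="")
```

Both paths produce the same final text; only the delivery differs, which is why streamed output feels faster to a human reader.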
Streaming request
To send a streaming request, call generate_content_stream and print the response in chunks.
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")

    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    for chunk in client.models.generate_content_stream(
        model="gemini-2.5-flash-lite",
        contents="Explain bubble sort to me",
        config=config,
    ):
        print(chunk.text)

generate()
Non-streaming request
The following code sample defines a function that sends a non-streaming request
to the gemini-2.5-flash-lite model. It shows you how to configure basic request
parameters and safety settings.
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")

    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents="Explain bubble sort to me",
        config=config,
    )
    print(response.text)

generate()
Function calling request
The following code sample declares a function and passes it as a tool, and then
receives a function call part in the response. After you receive the function
call part from the model, you can invoke the function and get the response, and
then pass the response to the model.
function_response_parts = [
    {
        'function_response': {
            'name': 'get_current_weather',
            'response': {
                'name': 'get_current_weather',
                'content': {'weather': 'super nice'},
            },
        },
    },
]
manual_function_calling_contents = [
    {'role': 'user', 'parts': [{'text': 'What is the weather in Boston?'}]},
    {
        'role': 'model',
        'parts': [{
            'function_call': {
                'name': 'get_current_weather',
                'args': {'location': 'Boston'},
            }
        }],
    },
    {'role': 'user', 'parts': function_response_parts},
]
function_declarations = [{
    'name': 'get_current_weather',
    'description': 'Get the current weather in a city',
    'parameters': {
        'type': 'OBJECT',
        'properties': {
            'location': {
                'type': 'STRING',
                'description': 'The location to get the weather for',
            },
            'unit': {
                'type': 'STRING',
                'enum': ['C', 'F'],
            },
        },
    },
}]

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=manual_function_calling_contents,
    config=dict(tools=[{'function_declarations': function_declarations}]),
)
print(response.text)
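The manual step described above, invoking the declared function and wrapping its output as a function_response part, can be sketched locally without calling the API. Everything here is illustrative: the dispatch registry, the get_current_weather implementation, and the weather value are placeholders, and a real application would read the function_call from the model's response parts.

```python
# Local sketch of handling a function_call part returned by the model.
# The implementation and weather value below are illustrative placeholders.

def get_current_weather(location, unit='C'):
    # Hypothetical implementation; a real app would call a weather service.
    return {'weather': 'super nice', 'location': location, 'unit': unit}

# Pretend this dict was extracted from the model's function_call response part.
function_call = {'name': 'get_current_weather', 'args': {'location': 'Boston'}}

# Dispatch the call by name and invoke the matching Python function.
registry = {'get_current_weather': get_current_weather}
result = registry[function_call['name']](**function_call['args'])

# Wrap the result as a function_response part to send back to the model.
function_response_part = {
    'function_response': {
        'name': function_call['name'],
        'response': {'name': function_call['name'], 'content': result},
    }
}
print(function_response_part)
```

A dispatch registry keyed by function name keeps this loop generic when you declare more than one tool.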
Clean up
This tutorial does not create any Google Cloud resources, so no cleanup is
needed to avoid charges.
What's next
Try the Vertex AI Studio tutorial for Vertex AI in express mode.
See the complete API reference for Vertex AI in express mode.
Last updated 2025-08-29 UTC.