The code for querying an agent is the same regardless of whether it is running locally or deployed remotely. Therefore, on this page, the term *agent* refers to either `local_agent` or `remote_agent` interchangeably. Because the set of supported operations varies across frameworks, we provide usage instructions for framework-specific templates:
Framework | Description |
---|---|
Agent Development Kit | Designed based on Google's internal best practices for developers building AI applications or teams needing to rapidly prototype and deploy robust agent-based solutions. |
Agent2Agent (preview) | The Agent2Agent (A2A) protocol is an open standard designed to enable seamless communication and collaboration between AI agents. |
LangChain | Easier to use for basic use cases because of its predefined configurations and abstractions. |
LangGraph | Graph-based approach to defining workflows, with advanced human-in-the-loop and rewind/replay capabilities. |
AG2 (formerly AutoGen) | AG2 provides multi-agent conversation framework as a high-level abstraction for building LLM workflows. |
LlamaIndex (preview) | LlamaIndex's query pipeline offers a high-level interface for creating Retrieval-Augmented Generation (RAG) workflows. |
For custom agents that are not based on one of the framework-specific templates, you can follow these steps:
- Authenticate the user.
- Get an instance of the agent.
- Look up the supported operations.
- Query the agent using a supported operation.
User authentication
Authentication follows the same instructions as when setting up your environment.
Get an instance of an agent
To query an agent, you first need an instance of an agent. You can either create a new instance or get an existing instance of an agent.
To get the agent corresponding to a specific resource ID:
Vertex AI SDK for Python
Run the following code:
agent = client.agent_engines.get(name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")
requests
Run the following code:
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

response = requests.get(
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
)
REST
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID
When using the Vertex AI SDK for Python, the `agent` object corresponds to an `AgentEngine` class that contains the following:

- an `agent.api_resource` with information about the deployed agent. You can also call `agent.operation_schemas()` to return the list of operations that the agent supports. See Supported operations for details.
- an `agent.api_client` that allows for synchronous service interactions
- an `agent.async_api_client` that allows for asynchronous service interactions
The rest of this section assumes that you have an instance named `agent`.
List supported operations
When developing the agent locally, you already know which operations it supports. To use a deployed agent, you can enumerate the operations that it supports:
Vertex AI SDK for Python
Run the following code:
agent.operation_schemas()
requests
Run the following code:
import json
json.loads(response.content).get("spec").get("classMethods")
REST
The supported operations are represented in `spec.class_methods` in the response to the curl request.
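To illustrate the shape of that REST response, the following self-contained sketch parses `classMethods` out of a hypothetical response body. The field values here are made up; only the structure mirrors the response described above:

```python
import json

# Hypothetical response body, shaped like the GET reasoningEngines
# response described above (only the fields relevant here are shown).
body = json.dumps({
    "name": "projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID",
    "spec": {
        "classMethods": [
            {"name": "query", "api_mode": ""},
            {"name": "stream_query", "api_mode": "stream"},
        ]
    },
})

# Same extraction as the requests snippet above, with defaults so a
# missing field yields an empty list instead of raising.
operations = json.loads(body).get("spec", {}).get("classMethods", [])
print([op["name"] for op in operations])  # → ['query', 'stream_query']
```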
The schema for each operation is a dictionary that documents the information of a method for the agent that you can call. The set of supported operations depends on the framework you used to develop your agent:
As an example, the following is the schema for the `query` operation of a `LangchainAgent`:
{'api_mode': '',
'name': 'query',
'description': """Queries the Agent with the given input and config.
Args:
input (Union[str, Mapping[str, Any]]):
Required. The input to be passed to the Agent.
config (langchain_core.runnables.RunnableConfig):
Optional. The config (if any) to be used for invoking the Agent.
Returns:
The output of querying the Agent with the given input and config.
""", ' ',
'parameters': {'$defs': {'RunnableConfig': {'description': 'Configuration for a Runnable.',
'properties': {'configurable': {...},
'run_id': {...},
'run_name': {...},
...},
'type': 'object'}},
'properties': {'config': {'nullable': True},
'input': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}},
'required': ['input'],
'type': 'object'}}
where:

- `name` is the name of the operation (i.e. `agent.query` for an operation named `query`).
- `api_mode` is the API mode of the operation (`""` for synchronous, `"stream"` for streaming).
- `description` is a description of the operation based on the method's docstring.
- `parameters` is the schema of the input arguments in OpenAPI schema format.
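For example, you could use the `api_mode` field to decide whether to call an operation synchronously or as a stream. The schemas below are hypothetical stand-ins for what `agent.operation_schemas()` might return; only the `api_mode` convention is taken from the description above:

```python
# Hypothetical operation schemas, reduced to the fields used here.
schemas = [
    {"api_mode": "", "name": "query"},
    {"api_mode": "stream", "name": "stream_query"},
]

# "" marks synchronous operations, "stream" marks streaming ones.
sync_ops = [s["name"] for s in schemas if s["api_mode"] == ""]
stream_ops = [s["name"] for s in schemas if s["api_mode"] == "stream"]

print(sync_ops)    # → ['query']
print(stream_ops)  # → ['stream_query']
```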
Query the agent using supported operations
For custom agents, you can use any of the query or streaming operations that you defined when developing your agent. Note that certain frameworks only support specific query or streaming operations:
Framework | Supported query operations |
---|---|
Agent Development Kit | async_stream_query |
LangChain | query , stream_query |
LangGraph | query , stream_query |
AG2 | query |
LlamaIndex | query |
Query the agent
Query the agent using the query
operation:
Vertex AI SDK for Python
agent.query(input="What is the exchange rate from US dollars to Swedish Krona today?")
requests
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        }
    })
)
REST
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:query -d '{
"class_method": "query",
"input": {
"input": "What is the exchange rate from US dollars to Swedish Krona today?"
}
}'
The query response is a string that is similar to the output of a local application test:
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
# ...
"output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
Stream responses from the agent
Stream a response from the agent using the stream_query
operation:
Vertex AI SDK for Python
agent = client.agent_engines.get(name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")
for response in agent.stream_query(
input="What is the exchange rate from US dollars to Swedish Krona today?"
):
print(response)
requests
from google import auth as google_auth
from google.auth.transport import requests as google_requests
import json
import requests

def get_identity_token():
    credentials, _ = google_auth.default()
    auth_request = google_requests.Request()
    credentials.refresh(auth_request)
    return credentials.token

requests.post(
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {get_identity_token()}",
    },
    data=json.dumps({
        "class_method": "stream_query",
        "input": {
            "input": "What is the exchange rate from US dollars to Swedish Krona today?"
        },
    }),
    stream=True,
)
REST
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID:streamQuery?alt=sse" -d '{
"class_method": "stream_query",
"input": {
"input": "What is the exchange rate from US dollars to Swedish Krona today?"
}
}'
Vertex AI Agent Engine streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
{'actions': [{'tool': 'get_exchange_rate', ...}]} # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]} # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'} # final response
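When calling the `streamQuery` endpoint with `alt=sse`, each response object arrives as a server-sent event on a `data:` line. The following sketch parses a hypothetical SSE payload into objects and picks out the final output; the payload contents are made up for illustration:

```python
import json

# Hypothetical SSE payload as it might arrive from streamQuery?alt=sse:
# each event is a "data: <json>" line followed by a blank line.
raw = (
    b'data: {"actions": [{"tool": "get_exchange_rate"}]}\n\n'
    b'data: {"output": "The exchange rate is 11.0117 SEK per USD."}\n\n'
)

# Decode each "data:" line into a Python object.
events = []
for line in raw.splitlines():
    if line.startswith(b"data: "):
        events.append(json.loads(line[len(b"data: "):]))

# Keep the final event, which carries the "output" field.
final = next(e["output"] for e in events if "output" in e)
print(final)
```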
Asynchronously query the agent
If you defined an `async_query` operation when developing the agent, the Vertex AI SDK for Python supports querying the agent asynchronously from the client:
Vertex AI SDK for Python
agent = client.agent_engines.get(name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")
response = await agent.async_query(
input="What is the exchange rate from US dollars to Swedish Krona today?"
)
print(response)
The query response is a dictionary that is the same as the output of a local test:
{"input": "What is the exchange rate from US dollars to Swedish Krona today?",
# ...
"output": "For 1 US dollar you will get 10.7345 Swedish Krona."}
Asynchronously stream responses from the agent
If you defined an `async_stream_query` operation when developing the agent, you can asynchronously stream responses from the agent:
Vertex AI SDK for Python
agent = client.agent_engines.get(name="projects/PROJECT_ID/locations/LOCATION/reasoningEngines/RESOURCE_ID")
async for response in agent.async_stream_query(
input="What is the exchange rate from US dollars to Swedish Krona today?"
):
print(response)
The `async_stream_query` operation calls the same `streamQuery` endpoint under the hood and asynchronously streams responses as a sequence of iteratively generated objects. For example, a set of three responses might look like the following:
{'actions': [{'tool': 'get_exchange_rate', ...}]} # first response
{'steps': [{'action': {'tool': 'get_exchange_rate', ...}}]} # second response
{'output': 'The exchange rate is 11.0117 SEK per USD as of 2024-12-03.'} # final response
The responses should be the same as those generated during local testing.
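A common pattern when streaming is to consume the stream and keep only the final `output` event. The sketch below substitutes a hypothetical async generator for `agent.async_stream_query` so that it runs standalone; the yielded events mimic the shape shown above:

```python
import asyncio

# Hypothetical stand-in for agent.async_stream_query: yields an
# intermediate event, then a final event carrying "output".
async def async_stream_query(input: str):
    yield {"actions": [{"tool": "get_exchange_rate"}]}
    yield {"output": "The exchange rate is 11.0117 SEK per USD."}

async def collect_final_output(question: str) -> str:
    final = None
    async for event in async_stream_query(input=question):
        if "output" in event:  # keep only the final event's text
            final = event["output"]
    return final

answer = asyncio.run(collect_final_output("USD to SEK rate?"))
print(answer)
```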
What's next
- Use a LangChain agent.
- Use a LangGraph agent.
- Use an AG2 agent.
- Use a LlamaIndex Query Pipeline agent.
- Evaluate an agent.
- Manage deployed agents.
- Get support.