Introduction

This page explains how to use the Vertex AI RAG Engine LLM parser. Vertex AI RAG Engine uses LLMs to parse documents. LLMs can effectively process and interpret document content, which significantly improves the quality of generated responses.

Supported models

The LLM parser supports only Gemini models. If you have the RAG API enabled, you have access to the supported models. For a list of supported generation models, see Generative models.

Supported file types

The following file types are supported by the LLM parser:

application/pdf
image/png
image/jpeg
image/webp
image/heic
image/heif

Pricing and quotas

For pricing details, see Vertex AI pricing. For quotas that apply, see Rate quotas.

The LLM parser calls Gemini models to parse your documents, which creates additional costs that are charged to your project. The cost can be roughly estimated using this formula:

cost = number_of_document_files * average_pages_per_document *
(average_input_tokens * input_token_pricing_of_selected_model +
average_output_tokens * output_token_pricing_of_selected_model)

For example, suppose you have 1,000 PDF files, and each PDF file has 50 pages. The average PDF page has 500 tokens, and you need an additional 100 tokens for prompting, so the average input is 600 tokens per page. The average output is 100 tokens. Gemini 2.0 Flash-Lite is used in your configuration for parsing, and it costs $0.075 per 1M input tokens and $0.30 per 1M output tokens. The cost is:

cost = 1,000 * 50 * (600 * 0.075 / 1M + 100 * 0.3 / 1M) = $3.75
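To sanity-check an estimate before you run an import, you can plug your own numbers into the formula. The following is a minimal sketch; the variable names and the prices are illustrative, so check Vertex AI pricing for current rates:

# Minimal sketch: estimate LLM parser cost from the formula above.
# All values are illustrative; check Vertex AI pricing for current rates.
num_files = 1000                # number of document files
pages_per_file = 50             # average pages per document
input_tokens_per_page = 600     # 500 page tokens + 100 prompt tokens
output_tokens_per_page = 100    # average output tokens per page
input_price_per_1m = 0.075      # USD per 1M input tokens (assumed rate)
output_price_per_1m = 0.30      # USD per 1M output tokens (assumed rate)

cost = num_files * pages_per_file * (
    input_tokens_per_page * input_price_per_1m / 1_000_000
    + output_tokens_per_page * output_price_per_1m / 1_000_000
)
print(f"Estimated parsing cost: ${cost:.2f}")  # Estimated parsing cost: $3.75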
Import files with LlmParser enabled

Replace the values in the following variables used in the code samples:

MODEL_NAME: The Gemini model used for parsing. Format:
projects/{project_id}/locations/{location}/publishers/google/models/{model_id}

MAX_PARSING_REQUESTS_PER_MIN: Optional: The maximum number of requests the job can make to the Vertex AI model per minute. For more information, see Generative AI on Vertex AI rate limits, and check the Quotas & System Limits page for your project to set an appropriate value.
REST
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE/ragFiles:import" -d '{
"import_rag_files_config": {
"gcs_source": {
"uris": ["GCS_URI", "GOOGLE_DRIVE_URI"]
},
"rag_file_chunking_config": {
"chunk_size": 512,
"chunk_overlap": 102
},
"rag_file_parsing_config": {
"llm_parser": {
"model_name": "MODEL_NAME",
"custom_parsing_prompt": "CUSTOM_PARSING_PROMPT"
"max_parsing_requests_per_min": "MAX_PARSING_REQUESTS_PER_MIN"
}
}
}
}'
Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from vertexai import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "RAG_CORPUS_RESOURCE"
LOCATION = "LOCATION"
MODEL_ID = "MODEL_ID"
MODEL_NAME = f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/{MODEL_ID}"
MAX_PARSING_REQUESTS_PER_MIN = MAX_PARSING_REQUESTS_PER_MIN  # Optional
CUSTOM_PARSING_PROMPT = "Your custom prompt"  # Optional

PATHS = ["https://drive.google.com/file/123", "gs://my_bucket/my_files_dir"]

# Initialize the Vertex AI API once per session.
vertexai.init(project=PROJECT_ID, location=LOCATION)

transformation_config = rag.TransformationConfig(
    chunking_config=rag.ChunkingConfig(
        chunk_size=1024,  # Optional
        chunk_overlap=200,  # Optional
    ),
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    max_parsing_requests_per_min=MAX_PARSING_REQUESTS_PER_MIN,  # Optional
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,  # Optional
)

rag.import_files(
    CORPUS_NAME,
    PATHS,
    llm_parser=llm_parser_config,
    transformation_config=transformation_config,
)
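After the import completes, you can confirm which files landed in the corpus. A minimal sketch, assuming the rag.list_files helper available in recent versions of the Vertex AI SDK for Python:

# Minimal sketch: list the files now present in the corpus.
# Assumes rag.list_files is available in your Vertex AI SDK version.
for rag_file in rag.list_files(corpus_name=CORPUS_NAME):
    print(rag_file.display_name, rag_file.name)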
Prompting

The Vertex AI RAG Engine LLM parser uses a predefined, tuned prompt for parsing documents. However, if you have specialized documents that might not be suitable for a general prompt, you can specify a custom parsing prompt when using the API. When you ask Gemini to parse your documents, Vertex AI RAG Engine appends your prompt to the default system prompt.
Prompt template table

To help with document parsing, the following table provides a prompt template example to guide you in creating prompts that Vertex AI RAG Engine can use to parse your documents:
Instruction: Specify role.
Template statement: You are a/an [Specify the role, such as a factual data extractor or an information retriever].
Example: You are an information retriever.

Instruction: Specify task.
Template statement: Extract [Specify the type of information, such as factual statements, key data, or specific details] from the [Specify the document source, such as a document, text, article, image, table].
Example: Extract key data from the sample.txt file.

Instruction: Explain how you want the LLM to generate the output according to your documents.
Template statement: Present each fact in a [Specify the output format, such as a structured list or text format], and link to its [Specify the source location, such as a page, paragraph, table, or row].
Example: Present each fact in a structured list, and link to its sample page.

Instruction: Highlight what should be the focus of the LLM.
Template statement: Extract [Specify the key data types, such as the names, dates, numbers, attributes, or relationships] exactly as stated.
Example: Extract names and dates.

Instruction: Highlight what you want the LLM to avoid.
Template statement: [List the actions to avoid, such as analysis, interpretation, summarizing, inferring, or giving opinions]. Extract only what the document explicitly says.
Example: No giving opinions. Extract only what the document explicitly says.
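Putting the template rows together, a custom parsing prompt might look like the following. This is a hypothetical example assembled from the Example column above, reusing the rag and MODEL_NAME names from the earlier Python sample:

# Hypothetical custom parsing prompt assembled from the template table above.
CUSTOM_PARSING_PROMPT = (
    "You are an information retriever. "
    "Extract key data from the document. "
    "Present each fact in a structured list, and link to its page. "
    "Extract names and dates exactly as stated. "
    "No analysis, interpretation, or opinions. "
    "Extract only what the document explicitly says."
)

llm_parser_config = rag.LlmParserConfig(
    model_name=MODEL_NAME,
    custom_parsing_prompt=CUSTOM_PARSING_PROMPT,
)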
General guidance

Follow these guidelines to write your prompt to send to the LLM parser.
Parsing quality analysis

This table lists results from scenarios that customers ran using Vertex AI RAG Engine. The feedback shows that the LLM parser improves the quality of document parsing: it enhances the LLM's ability to understand and reason about the context within a document, which leads to more accurate and comprehensive responses.
Scenario: Parsing information across slides and linking sections
Result: The LLM parser successfully linked section titles on one slide to the detailed information presented on subsequent slides.

Scenario: Understanding and extracting information from tables
Result: The LLM parser correctly related columns and headers within a large table to answer specific questions.

Scenario: Interpreting flowcharts
Result: The LLM parser was able to follow the logic of a flowchart and extract the correct sequence of actions and corresponding information.

Scenario: Extracting data from graphs
Result: The LLM parser could interpret different types of graphs, such as line graphs, and extract specific data points based on the query.

Scenario: Capturing relationships between headings and text
Result: The LLM parser, guided by the prompt, paid attention to heading structures and could retrieve all relevant information associated with a particular topic or section.

Scenario: Potential to overcome embedding limitations with prompt engineering
Result: While initially hampered by embedding model limitations in some use cases, additional experiments demonstrated that a well-crafted LLM parser prompt could potentially mitigate these issues and retrieve the correct information even when semantic understanding is challenging for the embedding model alone.
What's next

After you enter a prompt that's sent to a generative AI model, the retrieval component in RAG searches through its knowledge base to find information that's relevant to the query. For an example of retrieving RAG files from a corpus based on query text, see Retrieval query.