The ML.GENERATE_TEXT function
This document describes the ML.GENERATE_TEXT
function, which lets you perform
generative natural language tasks by using text from BigQuery
standard tables, or
visual content from BigQuery
object tables.
The function works by sending requests to a BigQuery ML remote model that represents one of the Vertex AI text or multimodal foundation models (LLMs), and then returning the model's response. Several of the function's arguments provide the parameters that shape the Vertex AI model's response.
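For context, a remote model that `ML.GENERATE_TEXT` can call is created with a `CREATE MODEL` statement over a Cloud resource connection. The following is a minimal sketch, assuming a hypothetical connection named `us.my_connection`, a hypothetical dataset `mydataset`, and a Gemini endpoint; see The `CREATE MODEL` statement for remote models over LLMs for the full syntax and options.

-- Sketch only: the connection, dataset, and endpoint values are placeholders.
CREATE OR REPLACE MODEL `mydataset.llm_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (ENDPOINT = 'gemini-pro');

After the remote model exists, queries pass it to `ML.GENERATE_TEXT` as MODEL `mydataset.llm_model`, as shown in the syntax and examples that follow.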
The ML.GENERATE_TEXT
function works with the Vertex AI models to
perform text processing tasks like classification, sentiment analysis,
and code generation. For more information on the types of tasks these models
can perform, see the following use cases:
- Vertex AI Gemini API multimodal model use cases
- Gemini API text model use cases
- Vertex AI PaLM API text model use cases
Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Design multimodal prompts or Design text prompts.
Syntax
`ML.GENERATE_TEXT` syntax differs depending on the Vertex AI model that your remote model targets. Choose the option appropriate for your use case.
gemini-pro
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
`ML.GENERATE_TEXT` takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The `CREATE MODEL` statement for remote models over LLMs. You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- `table`: the name of the BigQuery table that contains the prompt data. The text in the column that's named `prompt` is sent to the model. If your table does not have a `prompt` column, use a `SELECT` statement for this argument to provide an alias for an existing table column. An error occurs if no `prompt` column is available.
- `query_statement`: the GoogleSQL query that generates the prompt data.
- `max_output_tokens`: an `INT64` value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range `[1,8192]`. Specify a lower value for shorter responses and a higher value for longer responses. The default is `128`.
- `top_k`: an `INT64` value in the range `[1,40]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `40`. A `top_k` value of `1` means the next selected token is the most probable among all tokens in the model's vocabulary, while a `top_k` value of `3` means that the next token is selected from among the three most probable tokens by using the `temperature` value. For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on the `top_p` value, with the final token selected using temperature sampling.
- `top_p`: a `FLOAT64` value in the range `[0.0,1.0]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `0.95`. Tokens are selected from the most (based on the `top_k` value) to least probable until the sum of their probabilities equals the `top_p` value. For example, if tokens A, B, and C have a probability of `0.3`, `0.2`, and `0.1`, and the `top_p` value is `0.5`, then the model selects either A or B as the next token by using the `temperature` value and doesn't consider C.
- `temperature`: a `FLOAT64` value in the range `[0.0,1.0]` that is used for sampling during the response generation, which occurs when the `top_k` and `top_p` values are applied. It controls the degree of randomness in token selection. Lower `temperature` values are good for prompts that require a more deterministic and less open-ended or creative response, while higher `temperature` values can lead to more diverse or creative results. A `temperature` value of `0` is deterministic, meaning that the highest probability response is always selected. The default is `0`.
- `flatten_json_output`: a `BOOL` value that determines whether the JSON content returned by the function is parsed into separate columns. The default is `FALSE`.
- `stop_sequences`: an `ARRAY<STRING>` value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
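If you aren't sure whether the model's dataset and the input table's dataset are colocated, one way to check is to compare their locations in `INFORMATION_SCHEMA.SCHEMATA`. The following is a minimal sketch, assuming hypothetical dataset names and the US region; adjust the region qualifier to the location you use.

-- Sketch only: replace the region qualifier and dataset names with your own.
SELECT schema_name, location
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA
WHERE schema_name IN ('mydataset', 'my_prompt_dataset');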
gemini-pro-vision
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  TABLE `project_id.dataset.table`,
  STRUCT(
    prompt AS prompt
    [, max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
`ML.GENERATE_TEXT` takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The `CREATE MODEL` statement for remote models over LLMs. You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- `table`: the name of the object table that contains the visual content to analyze. For more information on what types of visual content you can analyze, see Supported visual content. The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the `ML.GENERATE_TEXT` function.
- `prompt`: a `STRING` value that contains the prompt to use to analyze the visual content. The `prompt` value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
- `max_output_tokens`: an `INT64` value that sets the maximum number of tokens that can be generated in the response. This value must be in the range `[1,2048]`. Specify a lower value for shorter responses and a higher value for longer responses. The default is `2048`.
- `top_k`: an `INT64` value in the range `[1,40]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `32`. A `top_k` value of `1` means the next selected token is the most probable among all tokens in the model's vocabulary, while a `top_k` value of `3` means that the next token is selected from among the three most probable tokens by using the `temperature` value. For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on the `top_p` value, with the final token selected using temperature sampling.
- `top_p`: a `FLOAT64` value in the range `[0.0,1.0]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `0.95`. Tokens are selected from the most (based on the `top_k` value) to least probable until the sum of their probabilities equals the `top_p` value. For example, if tokens A, B, and C have a probability of `0.3`, `0.2`, and `0.1`, and the `top_p` value is `0.5`, then the model selects either A or B as the next token by using the `temperature` value and doesn't consider C.
- `temperature`: a `FLOAT64` value in the range `[0.0,1.0]` that is used for sampling during the response generation, which occurs when the `top_k` and `top_p` values are applied. It controls the degree of randomness in token selection. Lower `temperature` values are good for prompts that require a more deterministic and less open-ended or creative response, while higher `temperature` values can lead to more diverse or creative results. A `temperature` value of `0` is deterministic, meaning that the highest probability response is always selected. The default is `0.4`.
- `flatten_json_output`: a `BOOL` value that determines whether the JSON content returned by the function is parsed into separate columns. The default is `FALSE`.
- `stop_sequences`: an `ARRAY<STRING>` value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-bison
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
`ML.GENERATE_TEXT` takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The `CREATE MODEL` statement for remote models over LLMs. You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- `table`: the name of the BigQuery table that contains the prompt data. The text in the column that's named `prompt` is sent to the model. If your table does not have a `prompt` column, use a `SELECT` statement for this argument to provide an alias for an existing table column. An error occurs if no `prompt` column is available.
- `query_statement`: the GoogleSQL query that generates the prompt data.
- `max_output_tokens`: an `INT64` value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range `[1,1024]`. Specify a lower value for shorter responses and a higher value for longer responses. The default is `128`.
- `top_k`: an `INT64` value in the range `[1,40]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `40`. A `top_k` value of `1` means the next selected token is the most probable among all tokens in the model's vocabulary, while a `top_k` value of `3` means that the next token is selected from among the three most probable tokens by using the `temperature` value. For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on the `top_p` value, with the final token selected using temperature sampling.
- `top_p`: a `FLOAT64` value in the range `[0.0,1.0]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `0.95`. Tokens are selected from the most (based on the `top_k` value) to least probable until the sum of their probabilities equals the `top_p` value. For example, if tokens A, B, and C have a probability of `0.3`, `0.2`, and `0.1`, and the `top_p` value is `0.5`, then the model selects either A or B as the next token by using the `temperature` value and doesn't consider C.
- `temperature`: a `FLOAT64` value in the range `[0.0,1.0]` that is used for sampling during the response generation, which occurs when the `top_k` and `top_p` values are applied. It controls the degree of randomness in token selection. Lower `temperature` values are good for prompts that require a more deterministic and less open-ended or creative response, while higher `temperature` values can lead to more diverse or creative results. A `temperature` value of `0` is deterministic, meaning that the highest probability response is always selected. The default is `0`.
- `flatten_json_output`: a `BOOL` value that determines whether the JSON content returned by the function is parsed into separate columns. The default is `FALSE`.
- `stop_sequences`: an `ARRAY<STRING>` value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-bison-32k
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
`ML.GENERATE_TEXT` takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The `CREATE MODEL` statement for remote models over LLMs. You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- `table`: the name of the BigQuery table that contains the prompt data. The text in the column that's named `prompt` is sent to the model. If your table does not have a `prompt` column, use a `SELECT` statement for this argument to provide an alias for an existing table column. An error occurs if no `prompt` column is available.
- `query_statement`: the GoogleSQL query that generates the prompt data.
- `max_output_tokens`: an `INT64` value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range `[1,8192]`. Specify a lower value for shorter responses and a higher value for longer responses. The default is `128`.
- `top_k`: an `INT64` value in the range `[1,40]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `40`. A `top_k` value of `1` means the next selected token is the most probable among all tokens in the model's vocabulary, while a `top_k` value of `3` means that the next token is selected from among the three most probable tokens by using the `temperature` value. For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on the `top_p` value, with the final token selected using temperature sampling.
- `top_p`: a `FLOAT64` value in the range `[0.0,1.0]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `0.95`. Tokens are selected from the most (based on the `top_k` value) to least probable until the sum of their probabilities equals the `top_p` value. For example, if tokens A, B, and C have a probability of `0.3`, `0.2`, and `0.1`, and the `top_p` value is `0.5`, then the model selects either A or B as the next token by using the `temperature` value and doesn't consider C.
- `temperature`: a `FLOAT64` value in the range `[0.0,1.0]` that is used for sampling during the response generation, which occurs when the `top_k` and `top_p` values are applied. It controls the degree of randomness in token selection. Lower `temperature` values are good for prompts that require a more deterministic and less open-ended or creative response, while higher `temperature` values can lead to more diverse or creative results. A `temperature` value of `0` is deterministic, meaning that the highest probability response is always selected. The default is `0`.
- `flatten_json_output`: a `BOOL` value that determines whether the JSON content returned by the function is parsed into separate columns. The default is `FALSE`.
- `stop_sequences`: an `ARRAY<STRING>` value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
text-unicorn
ML.GENERATE_TEXT(
  MODEL `project_id.dataset.model`,
  { TABLE `project_id.dataset.table` | (query_statement) },
  STRUCT(
    [max_output_tokens AS max_output_tokens]
    [, top_k AS top_k]
    [, top_p AS top_p]
    [, temperature AS temperature]
    [, flatten_json_output AS flatten_json_output]
    [, stop_sequences AS stop_sequences]
  )
)
Arguments
`ML.GENERATE_TEXT` takes the following arguments:

- `project_id`: your project ID.
- `dataset`: the BigQuery dataset that contains the model.
- `model`: the name of the remote model over the Vertex AI LLM. For more information about how to create this type of remote model, see The `CREATE MODEL` statement for remote models over LLMs. You can confirm what LLM is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.
- `table`: the name of the BigQuery table that contains the prompt data. The text in the column that's named `prompt` is sent to the model. If your table does not have a `prompt` column, use a `SELECT` statement for this argument to provide an alias for an existing table column. An error occurs if no `prompt` column is available.
- `query_statement`: the GoogleSQL query that generates the prompt data.
- `max_output_tokens`: an `INT64` value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range `[1,1024]`. Specify a lower value for shorter responses and a higher value for longer responses. The default is `128`.
- `top_k`: an `INT64` value in the range `[1,40]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `40`. A `top_k` value of `1` means the next selected token is the most probable among all tokens in the model's vocabulary, while a `top_k` value of `3` means that the next token is selected from among the three most probable tokens by using the `temperature` value. For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on the `top_p` value, with the final token selected using temperature sampling.
- `top_p`: a `FLOAT64` value in the range `[0.0,1.0]` that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is `0.95`. Tokens are selected from the most (based on the `top_k` value) to least probable until the sum of their probabilities equals the `top_p` value. For example, if tokens A, B, and C have a probability of `0.3`, `0.2`, and `0.1`, and the `top_p` value is `0.5`, then the model selects either A or B as the next token by using the `temperature` value and doesn't consider C.
- `temperature`: a `FLOAT64` value in the range `[0.0,1.0]` that is used for sampling during the response generation, which occurs when the `top_k` and `top_p` values are applied. It controls the degree of randomness in token selection. Lower `temperature` values are good for prompts that require a more deterministic and less open-ended or creative response, while higher `temperature` values can lead to more diverse or creative results. A `temperature` value of `0` is deterministic, meaning that the highest probability response is always selected. The default is `0`.
- `flatten_json_output`: a `BOOL` value that determines whether the JSON content returned by the function is parsed into separate columns. The default is `FALSE`.
- `stop_sequences`: an `ARRAY<STRING>` value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.
Details
The model and input table must be in the same region.
Output
`ML.GENERATE_TEXT` returns the input table plus the following columns:

Gemini API models

- `ml_generate_text_result`: the JSON response from the `projects.locations.endpoints.generateContent` call to the model. The generated text is in the `text` element. The safety attributes are in the `safety_ratings` element. This column is returned when `flatten_json_output` is `FALSE`.
- `ml_generate_text_llm_result`: a `STRING` value that contains the generated text. This column is returned when `flatten_json_output` is `TRUE`.
- `ml_generate_text_rai_result`: a `STRING` value that contains the safety attributes. This column is returned when `flatten_json_output` is `TRUE`.
- `ml_generate_text_status`: a `STRING` value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

PaLM API models

- `ml_generate_text_result`: the JSON response from the `projects.locations.endpoints.predict` call to the model. The generated text is in the `content` element. The safety attributes are in the `safetyAttributes` element. This column is returned when `flatten_json_output` is `FALSE`.
- `ml_generate_text_llm_result`: a `STRING` value that contains the generated text. This column is returned when `flatten_json_output` is `TRUE`.
- `ml_generate_text_rai_result`: a `STRING` value that contains the safety attributes. This column is returned when `flatten_json_output` is `TRUE`.
- `ml_generate_text_status`: a `STRING` value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
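When `flatten_json_output` is `FALSE`, you can still extract fields from `ml_generate_text_result` with the GoogleSQL JSON functions. The following is a minimal sketch for a Gemini API model; it assumes a hypothetical `mydataset.llm_model` remote model and that the generated text sits at `candidates[0].content.parts[0].text` in the `generateContent` response, which can vary by model version.

-- Sketch only: the model name and JSON path are assumptions.
SELECT
  JSON_VALUE(ml_generate_text_result, '$.candidates[0].content.parts[0].text') AS generated_text,
  ml_generate_text_status
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (SELECT 'What is the purpose of dreams?' AS prompt),
    STRUCT(FALSE AS flatten_json_output));

Setting `TRUE AS flatten_json_output` avoids this parsing step by returning the generated text directly in `ml_generate_text_llm_result`.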
Examples
Text analysis
Example 1
This example shows a request with the following characteristics:
- Provides a single prompt.
- Returns a longer generated text response.
- Returns a less probable generated text response.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.llm_model`, (SELECT 'What is the purpose of dreams?' AS prompt), STRUCT( 0.8 AS temperature, 1024 AS max_output_tokens, 0.95 AS top_p, 40 AS top_k));
Example 2
This example shows a request with the following characteristics:
- Provides prompt data from a table column that's named `prompt`.
- Returns a shorter generated text response.
- Returns a more probable generated text response.
- Flattens the JSON response into separate columns.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.llm_model`, TABLE `mydataset.prompt_table`, STRUCT( 0.2 AS temperature, 75 AS max_output_tokens, 0.3 AS top_p, 15 AS top_k, TRUE AS flatten_json_output));
Example 3
This example shows a request with the following characteristics:
- Provides prompt data from a table column named `question` that is aliased as `prompt`.
- Returns a moderately long generated text response.
- Returns a moderately probable generated text response.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.llm_model`, (SELECT question AS prompt FROM `mydataset.prompt_table`), STRUCT( 0.4 AS temperature, 750 AS max_output_tokens, 0.5 AS top_p, 30 AS top_k));
Example 4
This example shows a request with the following characteristics:
- Concatenates strings and a table column to provide the prompt data.
- Returns a longer generated text response.
- Returns a more probable generated text response.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.llm_model`, ( SELECT CONCAT( 'Classify the sentiment of the following text as positive or negative.Text:', input_column, 'Sentiment:') AS prompt FROM `mydataset.input_table` ), STRUCT( 0.1 AS temperature, 1000 AS max_output_tokens, 0.1 AS top_p, 10 AS top_k));
Example 5
This example shows a request with the following characteristics:
- Provides prompt data from a table column that's named `prompt`.
- Returns a shorter generated text response.
- Returns a more probable generated text response.
- Flattens the JSON response into separate columns.
- Excludes model responses that contain the strings `Golf` or `football`.
SELECT * FROM ML.GENERATE_TEXT( MODEL `mydataset.llm_model`, TABLE `mydataset.prompt_table`, STRUCT( 0.2 AS temperature, 75 AS max_output_tokens, 0.3 AS top_p, 15 AS top_k, TRUE AS flatten_json_output, ['Golf', 'football'] AS stop_sequences));
Visual content analysis
This example analyzes visual content from an object table that's named `dogs` and identifies the breed of dog contained in the content:
SELECT uri, ml_generate_text_llm_result FROM ML.GENERATE_TEXT( MODEL `mydataset.gemini_pro_vision_model`, TABLE `mydataset.dogs`, STRUCT( 'What is the breed of the dog?' AS prompt, TRUE AS flatten_json_output));
Supported visual content
When you are using a remote model over a `gemini-pro-vision` multimodal model, you can use `ML.GENERATE_TEXT` to analyze visual content that meets the following requirements:
- Content must be in the supported image and video formats that are described in the `mimeType` field of the Gemini API request body.
- Each piece of content must be no greater than 20 MB.
- The supported maximum video length is 2 minutes. If the video is longer than 2 minutes, `ML.GENERATE_TEXT` only returns results for the first 2 minutes.
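The visual content that `ML.GENERATE_TEXT` analyzes comes from a BigQuery object table over Cloud Storage. The following is a minimal sketch of creating such a table, assuming a hypothetical connection name, bucket path, and table name; as noted in the arguments section, the bucket must be in the same project where you created the model and where you call the function.

-- Sketch only: the connection name and Cloud Storage URI are placeholders.
CREATE EXTERNAL TABLE `mydataset.dogs`
  WITH CONNECTION `us.my_connection`
  OPTIONS (
    object_metadata = 'SIMPLE',
    uris = ['gs://mybucket/dog_images/*']);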
Locations
`ML.GENERATE_TEXT` must run in the same region or multi-region as the remote model that the function references. You can create remote models over Vertex AI models in all of the regions that support Generative AI APIs, and also in the `US` and `EU` multi-regions.
Quotas
See Vertex AI and Cloud AI service functions quotas and limits.
Known issues
Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:
A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
This issue occurs because BigQuery query jobs finish successfully
even if the function fails for some of the rows. The function fails when the
volume of API calls to the remote endpoint exceeds the quota limits for that
service. This issue occurs most often when you are running multiple parallel
batch queries. BigQuery retries these calls, but if the retries
fail, the resource exhausted
error message is returned.
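Because `ml_generate_text_status` is empty only for rows that succeeded, you can identify the affected rows and retry them. The following is a minimal sketch, assuming the function output was saved to a hypothetical table named `mydataset.generation_results`.

-- Sketch only: the results table is a placeholder for wherever you stored the function output.
SELECT *
FROM `mydataset.generation_results`
WHERE ml_generate_text_status != '';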
What's next
- Try a tutorial on generating text using a public dataset.
- Get step-by-step instructions on how to generate text using your own data.
- Get step-by-step instructions on how to tune an LLM and use it to generate text.
- For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
- For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.