This tutorial shows you how to export a transformer model to Open Neural Network Exchange (ONNX) format, import the ONNX model into a BigQuery dataset, and then use the model to generate embeddings from a SQL query.
This tutorial uses the `sentence-transformers/all-MiniLM-L6-v2` model. This sentence transformer model is known for its fast and effective performance at generating sentence embeddings. Sentence embeddings enable tasks like semantic search, clustering, and sentence similarity by capturing the underlying meaning of the text.
ONNX provides a uniform format that is designed to represent any machine learning (ML) framework. BigQuery ML support for ONNX lets you do the following:
- Train a model using your favorite framework.
- Convert the model into the ONNX model format.
- Import the ONNX model into BigQuery and make predictions using BigQuery ML.
Objectives
- Use the Hugging Face Optimum CLI to export the `sentence-transformers/all-MiniLM-L6-v2` model to ONNX.
- Use the `CREATE MODEL` statement to import the ONNX model into BigQuery.
- Use the `ML.PREDICT` function to generate embeddings with the imported ONNX model.
Costs
In this document, you use the following billable components of Google Cloud:

- BigQuery
- Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  Roles required to select or create a project:

  - Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (`roles/resourcemanager.projectCreator`), which contains the `resourcemanager.projects.create` permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.

- Enable the BigQuery and Cloud Storage APIs.

  Roles required to enable APIs: To enable APIs, you need the Service Usage Admin IAM role (`roles/serviceusage.serviceUsageAdmin`), which contains the `serviceusage.services.enable` permission. Learn how to grant roles.

- Ensure that you have the necessary permissions to perform the tasks in this document.
Required roles
If you create a new project, you're the project owner, and you're granted all of the required Identity and Access Management (IAM) permissions that you need to complete this tutorial.
If you're using an existing project, do the following.
Make sure that you have the following role or roles on the project:
- BigQuery Studio Admin (`roles/bigquery.studioAdmin`)
- Storage Object Creator (`roles/storage.objectCreator`)
Check for the roles

- In the Google Cloud console, go to the IAM page.
- Select the project.
- In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles

- In the Google Cloud console, go to the IAM page.
- Select the project.
- Click Grant access.
- In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- In the Select a role list, select a role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save.
For more information about IAM permissions in BigQuery, see IAM permissions.
Convert the transformer model files to ONNX
Optionally, you can follow the steps in this section to manually convert the `sentence-transformers/all-MiniLM-L6-v2` model and tokenizer to ONNX. Otherwise, you can use sample files from the public `gs://cloud-samples-data` Cloud Storage bucket that have already been converted.
If you choose to manually convert the files, you must have a local command-line environment that has Python installed. For more information on installing Python, see Python downloads.
Export the transformer model to ONNX
Use the Hugging Face Optimum CLI to export the `sentence-transformers/all-MiniLM-L6-v2` model to ONNX. For more information about exporting models with the Optimum CLI, see Export a model to ONNX with optimum.exporters.onnx.
To export the model, open a command-line environment and follow these steps:

1. Install the Optimum CLI:

   ```
   pip install optimum[onnx]
   ```

2. Export the model. The `--model` argument specifies the Hugging Face model ID. The `--opset` argument specifies the ONNX opset version, and is set to `17` to maintain compatibility with the ONNX Runtime version that BigQuery supports.

   ```
   optimum-cli export onnx \
     --model sentence-transformers/all-MiniLM-L6-v2 \
     --task sentence-similarity \
     --opset 17 all-MiniLM-L6-v2/
   ```

   The model file is exported to the `all-MiniLM-L6-v2` directory as `model.onnx`.
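Optionally, before you continue, you can inspect the exported model's input and output signature locally. The following is a minimal Python sketch that assumes you have the `onnxruntime` package installed (`pip install onnxruntime`); the exact output names depend on the export task.

```python
# Inspect the exported ONNX model's inputs and outputs.
import onnxruntime as ort

session = ort.InferenceSession("all-MiniLM-L6-v2/model.onnx")

# The sentence-similarity export takes tokenized text as input.
print([i.name for i in session.get_inputs()])   # for example: ['input_ids', 'attention_mask']
print([o.name for o in session.get_outputs()])  # for example: ['token_embeddings', 'sentence_embedding']
```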
Apply quantization to the transformer model
Use the Optimum CLI to apply quantization to the exported transformer model in order to reduce model size and speed up inference. For more information, see Quantization.
To apply quantization to the model, run the following command on the command line:

```
optimum-cli onnxruntime quantize \
  --onnx_model all-MiniLM-L6-v2/ \
  --avx512_vnni -o all-MiniLM-L6-v2_quantized
```

The quantized model file is exported to the `all-MiniLM-L6-v2_quantized` directory as `model_quantized.onnx`.
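As a quick check, you can compare the on-disk size of the original and quantized model files. This is a minimal sketch that assumes both of the preceding steps completed in the current working directory:

```python
# Compare the model file sizes before and after quantization.
import os

for path in ["all-MiniLM-L6-v2/model.onnx",
             "all-MiniLM-L6-v2_quantized/model_quantized.onnx"]:
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"{path}: {size_mb:.1f} MiB")
```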
Convert the tokenizer to ONNX
To generate embeddings using a transformer model in ONNX format, you typically use a tokenizer to produce two inputs to the model, `input_ids` and `attention_mask`.

To produce these inputs, convert the tokenizer for the `sentence-transformers/all-MiniLM-L6-v2` model to ONNX format by using the `onnxruntime-extensions` library. After you convert the tokenizer, you can perform tokenization directly on raw text inputs to generate ONNX predictions.
To convert the tokenizer, follow these steps on the command line:

1. Install the `onnxruntime-extensions` library, which provides the conversion utilities that the following script uses (the `transformers` library, installed with Optimum in the previous section, is also required):

   ```
   pip install onnxruntime-extensions
   ```

2. Using the text editor of your choice, create a file named `convert-tokenizer.py`. The following example uses the nano text editor:

   ```
   nano convert-tokenizer.py
   ```

3. Copy and paste the following Python script into the `convert-tokenizer.py` file:

   ```python
   from transformers import AutoTokenizer
   from onnxruntime_extensions import gen_processing_models

   # Load the Hugging Face tokenizer.
   tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

   # Export the tokenizer to ONNX using gen_processing_models.
   onnx_tokenizer_path = "tokenizer.onnx"

   # Generate the tokenizer ONNX model, and set the maximum token length.
   # Ensure 'max_length' is set to a value less than the model's maximum
   # sequence length; failing to do so results in an error during inference.
   tokenizer_onnx_model = gen_processing_models(tokenizer, pre_kwargs={"max_length": 256})[0]

   # Modify the tokenizer ONNX model signature, because certain tokenizers
   # don't support batch inference.
   tokenizer_onnx_model.graph.input[0].type.tensor_type.shape.dim[0].dim_value = 1

   # Save the tokenizer ONNX model.
   with open(onnx_tokenizer_path, "wb") as f:
       f.write(tokenizer_onnx_model.SerializeToString())
   ```

4. Save the `convert-tokenizer.py` file.

5. Run the Python script to convert the tokenizer:

   ```
   python convert-tokenizer.py
   ```

   The converted tokenizer is saved as `tokenizer.onnx` in the current working directory.
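Optionally, you can run the converted tokenizer locally to confirm that it produces the `input_ids` and `attention_mask` outputs. The tokenizer model uses custom operators, so the `onnxruntime-extensions` library must be registered with the ONNX Runtime session. This is a sketch that assumes the `onnxruntime` package is also installed:

```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom tokenizer operators with ONNX Runtime.
options = ort.SessionOptions()
options.register_custom_ops_library(get_library_path())
session = ort.InferenceSession("tokenizer.onnx", options)

# Tokenize a single string; the model signature was fixed to batch size 1.
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.array(["This is a test sentence."])})
print([o.name for o in session.get_outputs()])  # expect: input_ids, attention_mask
```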
Upload the converted model files to Cloud Storage
After you have converted the transformer model and the tokenizer, do the following:
- Create a Cloud Storage bucket to store the converted files.
- Upload the converted transformer model and tokenizer files to your Cloud Storage bucket.
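You can complete both steps in the Google Cloud console or with the gcloud CLI. The following is a minimal Python sketch using the `google-cloud-storage` client library (`pip install google-cloud-storage`); the project ID and bucket name are placeholders that you must replace:

```python
from google.cloud import storage

client = storage.Client(project="PROJECT_ID")  # replace with your project ID
bucket = client.create_bucket("BUCKET_NAME", location="US")  # bucket names must be globally unique

# Upload the quantized transformer model and the converted tokenizer.
for local_path in ["all-MiniLM-L6-v2_quantized/model_quantized.onnx", "tokenizer.onnx"]:
    blob = bucket.blob(local_path.split("/")[-1])
    blob.upload_from_filename(local_path)
    print(f"Uploaded gs://{bucket.name}/{blob.name}")
```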
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
1. In the Google Cloud console, go to the BigQuery page.
2. In the Explorer pane, click your project name.
3. Click View actions > Create dataset.
4. On the Create dataset page, do the following:
   - For Dataset ID, enter `bqml_tutorial`.
   - For Location type, select Multi-region, and then select US (multiple regions in United States).
   - Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the `bq mk` command with the `--location` flag. For a full list of possible parameters, see the `bq mk --dataset` command reference.

1. Create a dataset named `bqml_tutorial` with the data location set to `US` and a description of `BigQuery ML tutorial dataset`:

   ```
   bq --location=US mk -d \
    --description "BigQuery ML tutorial dataset." \
    bqml_tutorial
   ```

   Instead of using the `--dataset` flag, the command uses the `-d` shortcut. If you omit `-d` and `--dataset`, the command defaults to creating a dataset.

2. Confirm that the dataset was created:

   ```
   bq ls
   ```
API
Call the `datasets.insert` method with a defined dataset resource:

```json
{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}
```
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Import the ONNX models into BigQuery
Import the converted tokenizer and the sentence transformer models as BigQuery ML models.
Select one of the following options:
Console
1. In the Google Cloud console, open BigQuery Studio.

2. In the query editor, run the following `CREATE MODEL` statement to create the `tokenizer` model:

   ```sql
   CREATE OR REPLACE MODEL `bqml_tutorial.tokenizer`
    OPTIONS (MODEL_TYPE='ONNX', MODEL_PATH='TOKENIZER_BUCKET_PATH')
   ```

   Replace `TOKENIZER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TOKENIZER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx`.

   When the operation is complete, you see a message similar to `Successfully created model named tokenizer` in the Query results pane.

3. Click Go to model to open the Details pane.

4. Review the Feature Columns section to see the model inputs and the Label Column to see model outputs.

5. In the query editor, run the following `CREATE MODEL` statement to create the `all-MiniLM-L6-v2` model:

   ```sql
   CREATE OR REPLACE MODEL `bqml_tutorial.all-MiniLM-L6-v2`
    OPTIONS (MODEL_TYPE='ONNX', MODEL_PATH='TRANSFORMER_BUCKET_PATH')
   ```

   Replace `TRANSFORMER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TRANSFORMER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx`.

   When the operation is complete, you see a message similar to `Successfully created model named all-MiniLM-L6-v2` in the Query results pane.

6. Click Go to model to open the Details pane.

7. Review the Feature Columns section to see the model inputs and the Label Column to see model outputs.
bq
Use the bq command-line tool `query` command to run the `CREATE MODEL` statement.

1. On the command line, run the following command to create the `tokenizer` model:

   ```
   bq query --use_legacy_sql=false \
   "CREATE OR REPLACE MODEL `bqml_tutorial.tokenizer`
   OPTIONS (MODEL_TYPE='ONNX', MODEL_PATH='TOKENIZER_BUCKET_PATH')"
   ```

   Replace `TOKENIZER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TOKENIZER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx`.

   When the operation is complete, you see a message similar to the following: `Successfully created model named tokenizer`.

2. On the command line, run the following command to create the `all-MiniLM-L6-v2` model:

   ```
   bq query --use_legacy_sql=false \
   "CREATE OR REPLACE MODEL `bqml_tutorial.all-MiniLM-L6-v2`
   OPTIONS (MODEL_TYPE='ONNX', MODEL_PATH='TRANSFORMER_BUCKET_PATH')"
   ```

   Replace `TRANSFORMER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TRANSFORMER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx`.

   When the operation is complete, you see a message similar to the following: `Successfully created model named all-MiniLM-L6-v2`.

3. After you import the models, verify that the models appear in the dataset:

   ```
   bq ls -m bqml_tutorial
   ```

   The output is similar to the following:

   ```
          tableId         Type
    ------------------ -------
     tokenizer          MODEL
     all-MiniLM-L6-v2   MODEL
   ```
API
Use the `jobs.insert` method to import the models. Populate the `query` parameter of the `QueryRequest` resource in the request body with the `CREATE MODEL` statement.

1. Use the following `query` parameter value to create the `tokenizer` model:

   ```json
   {
     "query": "CREATE MODEL `PROJECT_ID.bqml_tutorial.tokenizer` OPTIONS(MODEL_TYPE='ONNX', MODEL_PATH='TOKENIZER_BUCKET_PATH')"
   }
   ```

   Replace the following:

   - `PROJECT_ID` with your project ID.
   - `TOKENIZER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TOKENIZER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx`.

2. Use the following `query` parameter value to create the `all-MiniLM-L6-v2` model:

   ```json
   {
     "query": "CREATE MODEL `PROJECT_ID.bqml_tutorial.all-MiniLM-L6-v2` OPTIONS(MODEL_TYPE='ONNX', MODEL_PATH='TRANSFORMER_BUCKET_PATH')"
   }
   ```

   Replace the following:

   - `PROJECT_ID` with your project ID.
   - `TRANSFORMER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TRANSFORMER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx`.
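For example, the following Python sketch submits the first statement through `jobs.insert` by using the Google API Python client library; this is an illustration under the same assumptions as the dataset-creation example (Application Default Credentials configured, placeholders replaced):

```python
from googleapiclient.discovery import build

bq_service = build("bigquery", "v2")

job = {
    "configuration": {
        "query": {
            "query": (
                "CREATE MODEL `PROJECT_ID.bqml_tutorial.tokenizer` "
                "OPTIONS(MODEL_TYPE='ONNX', MODEL_PATH='TOKENIZER_BUCKET_PATH')"
            ),
            "useLegacySql": False,
        }
    }
}
bq_service.jobs().insert(projectId="PROJECT_ID", body=job).execute()
```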
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Import the tokenizer and sentence transformer models by using the `ONNXModel` object:

```python
import bigframes
from bigframes.ml.imported import ONNXModel

bigframes.options.bigquery.project = PROJECT_ID
bigframes.options.bigquery.location = "US"

tokenizer = ONNXModel(model_path="TOKENIZER_BUCKET_PATH")
imported_onnx_model = ONNXModel(model_path="TRANSFORMER_BUCKET_PATH")
```

Replace the following:

- `PROJECT_ID` with your project ID.
- `TOKENIZER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TOKENIZER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx`.
- `TRANSFORMER_BUCKET_PATH` with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace `TRANSFORMER_BUCKET_PATH` with the following value: `gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx`.
Generate embeddings with the imported ONNX models
Use the imported tokenizer and the sentence transformer models to generate embeddings based on data from the `bigquery-public-data.imdb.reviews` public dataset.
Select one of the following options:
Console
Use the `ML.PREDICT` function to generate embeddings with the models.

The query uses a nested `ML.PREDICT` call to process raw text directly through the tokenizer and the embedding model, as follows:

- Tokenization (inner query): the inner `ML.PREDICT` call uses the `bqml_tutorial.tokenizer` model. It takes the `title` column from the `bigquery-public-data.imdb.reviews` public dataset as its `text` input. The `tokenizer` model converts the raw text strings into the numerical token inputs that the main model requires, including the `input_ids` and `attention_mask` inputs.
- Embedding generation (outer query): the outer `ML.PREDICT` call uses the `bqml_tutorial.all-MiniLM-L6-v2` model. The query takes the `input_ids` and `attention_mask` columns from the inner query's output as its input.

The `SELECT` statement retrieves the `sentence_embedding` column, which is an array of `FLOAT` values that represent the text's semantic embedding.

1. In the Google Cloud console, open BigQuery Studio.

2. In the query editor, run the following query:

   ```sql
   SELECT
     sentence_embedding
   FROM
     ML.PREDICT(MODEL `bqml_tutorial.all-MiniLM-L6-v2`,
       (
         SELECT
           input_ids,
           attention_mask
         FROM
           ML.PREDICT(MODEL `bqml_tutorial.tokenizer`,
             (
               SELECT title AS text
               FROM `bigquery-public-data.imdb.reviews`
               LIMIT 10
             ))
       ))
   ```

   The result is similar to the following:

   ```
   +-----------------------+
   | sentence_embedding    |
   +-----------------------+
   | -0.02361682802438736  |
   | 0.02025664784014225   |
   | 0.005168713629245758  |
   | -0.026361213997006416 |
   | 0.0655381828546524    |
   | ...                   |
   +-----------------------+
   ```
bq
Use the bq command-line tool `query` command to run a query. The query uses the `ML.PREDICT` function to generate embeddings with the models.

The query uses a nested `ML.PREDICT` call to process raw text directly through the tokenizer and the embedding model, as follows:

- Tokenization (inner query): the inner `ML.PREDICT` call uses the `bqml_tutorial.tokenizer` model. It takes the `title` column from the `bigquery-public-data.imdb.reviews` public dataset as its `text` input. The `tokenizer` model converts the raw text strings into the numerical token inputs that the main model requires, including the `input_ids` and `attention_mask` inputs.
- Embedding generation (outer query): the outer `ML.PREDICT` call uses the `bqml_tutorial.all-MiniLM-L6-v2` model. The query takes the `input_ids` and `attention_mask` columns from the inner query's output as its input.

The `SELECT` statement retrieves the `sentence_embedding` column, which is an array of `FLOAT` values that represent the text's semantic embedding.

On the command line, run the following command to run the query:

```
bq query --use_legacy_sql=false \
  'SELECT
     sentence_embedding
   FROM
     ML.PREDICT(MODEL `bqml_tutorial.all-MiniLM-L6-v2`,
       (
         SELECT
           input_ids,
           attention_mask
         FROM
           ML.PREDICT(MODEL `bqml_tutorial.tokenizer`,
             (
               SELECT title AS text
               FROM `bigquery-public-data.imdb.reviews`
               LIMIT 10
             ))
       ))'
```

The result is similar to the following:

```
+-----------------------+
| sentence_embedding    |
+-----------------------+
| -0.02361682802438736  |
| 0.02025664784014225   |
| 0.005168713629245758  |
| -0.026361213997006416 |
| 0.0655381828546524    |
| ...                   |
+-----------------------+
```
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Use the `predict` method to generate embeddings using the ONNX models:

```python
import bigframes.pandas as bpd

df = bpd.read_gbq("bigquery-public-data.imdb.reviews", max_results=10)
df_pred = df.rename(columns={"title": "text"})
tokens = tokenizer.predict(df_pred)
predictions = imported_onnx_model.predict(tokens)
predictions.peek(5)
```

The output resembles the results shown in the preceding tabs: each row contains the generated sentence embedding for one review title.
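As a hypothetical follow-on, the sketch below computes the cosine similarity between two of the generated embeddings locally. It assumes that the `predict` output exposes a `sentence_embedding` array column matching the SQL results above; adjust the column name to what `predictions.peek()` actually shows.

```python
import numpy as np

# Pull the embeddings into a local pandas Series of arrays.
embeddings = predictions["sentence_embedding"].to_pandas()

a = np.array(embeddings.iloc[0])
b = np.array(embeddings.iloc[1])

# Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal.
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity between the first two titles: {similarity:.3f}")
```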
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
Console
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
gcloud
To delete the project with the gcloud CLI, run the following command, replacing PROJECT_ID with your project ID:

```
gcloud projects delete PROJECT_ID
```
Delete individual resources
Alternatively, to remove the individual resources used in this tutorial, do the following:
Optional: Delete the dataset.
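One programmatic way to delete the dataset and the models it contains is with the `google-cloud-bigquery` client library; this is a minimal sketch (the console or the `bq rm -r bqml_tutorial` command work equally well):

```python
from google.cloud import bigquery

client = bigquery.Client(project="PROJECT_ID")  # replace with your project ID

# delete_contents=True also removes the imported tokenizer and
# all-MiniLM-L6-v2 models stored in the dataset.
client.delete_dataset("bqml_tutorial", delete_contents=True, not_found_ok=True)
```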
What's next
- Learn how to use text embeddings for semantic search and retrieval-augmented generation (RAG).
- For more information about converting transformer models to ONNX, see Export a model to ONNX with optimum.exporters.onnx.
- For more information about importing ONNX models, see The CREATE MODEL statement for ONNX models.
- For more information about performing prediction, see The ML.PREDICT function.
- For an overview of BigQuery ML, see Introduction to BigQuery ML.
- To get started using BigQuery ML, see Create machine learning models in BigQuery ML.