ML pipelines overview
This document provides an overview of the services you can use to build an ML pipeline to manage your BigQuery ML MLOps workflow.
An ML pipeline is a representation of an MLOps workflow that is composed of a series of pipeline tasks. Each pipeline task performs a specific step in the MLOps workflow to train and deploy a model. Separating each step into a standardized, reusable task lets you automate and monitor repeatable processes in your ML practice.
You can use any of the following services to create ML pipelines for BigQuery ML:
- Use Vertex AI Pipelines to create portable, extensible ML pipelines.
- Use GoogleSQL queries to create less complex SQL-based ML pipelines.
- Use Dataform to create more complex SQL-based ML pipelines, or ML pipelines where you need to use version control.
Vertex AI Pipelines
In Vertex AI Pipelines, an ML pipeline is structured as a directed acyclic graph (DAG) of containerized pipeline tasks that are interconnected using input-output dependencies. Each pipeline task is an instantiation of a pipeline component with specific inputs. When defining your ML pipeline, you connect multiple pipeline tasks to form a DAG by routing the outputs of one pipeline task to the inputs for the next pipeline task in the ML workflow. You can also use the original inputs to the ML pipeline as the inputs for a given pipeline task.
Use the BigQuery ML components of the Google Cloud Pipeline Components SDK to compose ML pipelines in Vertex AI Pipelines. To get started with BigQuery ML components, see the example notebooks in the Google Cloud Pipeline Components SDK documentation.
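To make the DAG wiring concrete, the following is a minimal sketch of a two-task pipeline, assuming the KFP SDK and the BigqueryCreateModelJobOp and BigqueryPredictModelJobOp components from the Google Cloud Pipeline Components SDK. The project, dataset, table, and model names are hypothetical placeholders.

```python
# A minimal sketch of a two-task BigQuery ML pipeline for Vertex AI Pipelines.
# All resource names below are hypothetical placeholders.
from kfp import dsl, compiler
from google_cloud_pipeline_components.v1.bigquery import (
    BigqueryCreateModelJobOp,
    BigqueryPredictModelJobOp,
)

PROJECT = "my-project"  # hypothetical project ID
LOCATION = "US"         # BigQuery location

@dsl.pipeline(name="bqml-train-and-predict")
def bqml_pipeline():
    # Task 1: train a model with a CREATE MODEL statement.
    train = BigqueryCreateModelJobOp(
        project=PROJECT,
        location=LOCATION,
        query="""
            CREATE OR REPLACE MODEL `my-project.my_dataset.my_model`
            OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
            SELECT * FROM `my-project.my_dataset.training_data`
        """,
    )

    # Task 2: batch prediction. Routing the model output of the training
    # task to this task's model input is what creates the DAG edge.
    BigqueryPredictModelJobOp(
        project=PROJECT,
        location=LOCATION,
        model=train.outputs["model"],
        table_name="my-project.my_dataset.prediction_input",
    )

# Compile to a pipeline spec that you can submit to Vertex AI Pipelines.
compiler.Compiler().compile(bqml_pipeline, "bqml_pipeline.json")
```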
GoogleSQL queries
You can use GoogleSQL procedural language to execute multiple statements in a multi-statement query. You can use a multi-statement query to:
- Run multiple statements in a sequence, with shared state.
- Automate management tasks such as creating or dropping tables.
- Implement complex logic using programming constructs such as IF and WHILE.
After creating a multi-statement query, you can save and schedule the query to automate model training, inference, and monitoring.
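As an illustration, the following is a minimal sketch that runs a multi-statement query through the BigQuery Python client. The dataset, table, and model names are hypothetical placeholders; the script uses a DECLAREd variable for shared state and an IF block so that training only runs when enough rows are available.

```python
# A minimal sketch: running a GoogleSQL multi-statement query from Python.
# Dataset, table, and model names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# The script shares state across statements (row_count) and uses IF-based
# control flow to retrain only when enough training data exists.
multi_statement_query = """
DECLARE row_count INT64;

SET row_count = (
  SELECT COUNT(*) FROM `my_dataset.training_data`
);

IF row_count >= 1000 THEN
  CREATE OR REPLACE MODEL `my_dataset.my_model`
  OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
  SELECT * FROM `my_dataset.training_data`;
END IF;
"""

# All statements run in sequence within a single query job.
client.query(multi_statement_query).result()
```

You can save a query like this one and schedule it to run the retraining check on a recurring basis.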
If your ML pipeline includes use of the ML.GENERATE_TEXT function, see Handle quota errors by calling ML.GENERATE_TEXT iteratively for more information on how to use SQL to iterate through calls to the function. Calling the function iteratively lets you address any retryable errors that occur due to exceeding the quotas and limits.
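As one illustration of the pattern, the following is a hedged sketch of driving that iteration from Python rather than the documented SQL script. It assumes hypothetical prompts and results tables, relies on the ml_generate_text_status output column (empty on success, an error message otherwise), and uses an assumed retry bound.

```python
# A minimal sketch of iterating ML.GENERATE_TEXT calls so that rows that
# fail with retryable quota errors are retried on later passes.
# Table and model names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
MAX_ITERATIONS = 5  # assumed retry bound; tune for your quota

# Process only prompts that do not yet have a row in the results table.
GENERATE_QUERY = """
INSERT INTO `my_dataset.results`
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.my_text_model`,
  (
    SELECT prompt
    FROM `my_dataset.prompts`
    WHERE prompt NOT IN (SELECT prompt FROM `my_dataset.results`)
  )
)
"""

# Drop failed rows so that the next pass retries them.
CLEANUP_QUERY = """
DELETE FROM `my_dataset.results`
WHERE ml_generate_text_status != ''
"""

for _ in range(MAX_ITERATIONS):
    client.query(GENERATE_QUERY).result()
    failed = list(client.query(
        "SELECT COUNT(*) AS n FROM `my_dataset.results` "
        "WHERE ml_generate_text_status != ''"
    ).result())[0].n
    if failed == 0:
        break  # every prompt produced a successful result
    client.query(CLEANUP_QUERY).result()
```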
Dataform
You can use Dataform to develop, test, version control, and schedule complex SQL workflows for data transformation in BigQuery. You can use Dataform for tasks such as data transformation in the extract, load, and transform (ELT) process for data integration. After raw data is extracted from source systems and loaded into BigQuery, Dataform helps you transform it into a well-defined, tested, and documented suite of data tables.
If your ML pipeline includes use of the ML.GENERATE_TEXT function, you can adapt the structured_table_ml.js example library to iterate through calls to the function. Calling the function iteratively lets you address any retryable errors that occur due to exceeding the quotas and limits that apply to the function.