Introduction to routines
This document describes how to choose a routine, which is a resource type that you use to create functions or stored procedures in BigQuery.
Supported routines
BigQuery supports the following routines:
- User-defined functions (UDFs)
- User-defined aggregate functions (UDAFs)
- Table functions
- Remote functions
- Stored procedures
How to choose a routine
This section describes factors to consider when choosing a routine and compares routines by task.
Factors to consider
To choose a routine, consider the following factors, which are described in the sections for each type of routine:
- The type of task to implement.
- The programming language to use.
- The type of persistence to implement for the routine: temporary or persistent.
- The type of reuse required for the routine: across single or multiple queries.
- Performance considerations.
- Accessing external services.
- Sharing the routine with users.
Compare routines by task
The following table shows the type of tasks you can perform for each type of routine:
Task |
Routine resource type |
---|---|
Create functions that perform general-purpose tasks in BigQuery. |
SQL or Javascript UDF SQL or Javascript UDAF |
Create functions that perform general-purpose tasks in BigQuery and that communicate with external Google Cloud systems using a Cloud resource connection. |
Python UDF |
Create functions that aggregate data. |
UDAFs |
Create a table using parameters. |
Table functions |
Create functions that use languages, libraries, or services that are unsupported in BigQuery. These functions directly integrate with Cloud Run functions and Cloud Run. |
Remote functions |
Execute multiple statements in one query as a multi-statement query using procedural language. You can use a multi-statement query to do the following:
Create and call stored procedures for Apache Spark in BigQuery. |
Stored procedures |
User-defined functions (UDFs)
A UDF lets you create a function by using a SQL expression, JavaScript code, or Python code. UDFs accept columns of input, perform actions on the input, and return the result of those actions as a value.
You can define UDFs as either persistent or temporary. You can reuse persistent UDFs across multiple queries, while temporary UDFs only exist in the scope of a single query.
You can create UDFs for use with custom masking routines, which return a column's value after applying a UDF to the column. After you create the custom masking routine, it's available as a masking rule in Create data policies.
For more information about UDFs, see the following resources:
Language-based UDFs
- SQL-based UDFs support templated UDF parameters, which can match more than one argument type when the UDF is called. SQL UDFs can also return the value of a scalar subquery.
- Javascript-based UDFs let you call code written in JavaScript from a SQL
query.
- JavaScript UDFs typically consume more slot resources as compared to standard SQL queries, decreasing job performance.
- If the function can be expressed in SQL, it's often more optimal to run the code as a standard SQL query job.
- Python-based UDFs are built and run on BigQuery managed
resources. These UDFs let you implement a function in Python and use it in a
SQL query.
- You can access a Google Cloud service or an external service from a Python UDF by using the Cloud resource connection service account.
- You can also install third-party libraries from from the Python Package Index (PyPI).
Community contributed UDFs
In addition to the UDFs you create, community contributed UDFs are available in
the bigquery-public-data.persistent_udfs
public dataset and the open source
bigquery-utils
GitHub repository.
User-defined aggregate functions (UDAFs)
A UDAF lets you create an aggregate function by using an expression that contains SQL or Javascript code. A UDAF accepts columns of input, performs a calculation on a group of rows at a time, and then returns the result of that calculation as a single value.
UDAFs can't mutate data, talk to external systems, or send logs to Google Cloud Observability or similar applications.
For more information, see the following resources:
SQL UDAFs
SQL UDAFs normally aggregate function parameters across all rows in a
group.
However, you can specify a function parameter as non-aggregate by using the
NOT AGGREGATE
keyword. A non-aggregate function parameter is a scalar function
parameter with a constant value for all rows in a group. SQL UDAFs can contain
both aggregate and non-aggregate parameters.
Javascript UDAFs
Javascript UDAFs can include Javascript libraries. The JavaScript function body can include custom JavaScript code such as JavaScript global variables and custom functions.
Because Javascript-based functions typically use more resources, consulting these performance tips can be helpful.
Javascript UDAFs have some constraints. Only specific type encodings are allowed, and there are requirements for serialization and deserialization.
Compare UDFs and UDAFs
Choosing a UDF rather than choosing a UDAF depends on the specific task that you are trying to perform.
- To perform a calculation or transformation on individual data values, use a UDF.
- To do the same on groups of data values, use a UDAF.
For example, if you want to calculate the average of a column of numbers, then use a UDAF. If you want to convert a column of strings to uppercase, then use a UDF.
UDFs and UDAFs have the following similarities:
- UDFs and UDAFs can't mutate data, talk to external systems, or send logs to Google Cloud Observability or similar applications. The exception is Python UDFs, which can access external services using a Cloud resource connection. However, Python UDFs don't support VPC service controls or customer-managed encryption keys (CMEK).
- UDAFs have the same limitations as UDFs, plus a few more.
- UDFs and UDAFs have the same quotas and limits.
UDFs and UDAFs have the following differences:
Attribute |
UDFs |
UDAFs |
---|---|---|
Definition |
User-defined functions (UDFs) accept columns of input, perform actions on the input, and return the result of those actions as a value. |
User-defined aggregate functions (UDAFs) accept columns of input, perform a calculation on a group of rows at a time, and then return the result of that calculation as a single value. |
Languages supported |
SQL, Javascript, and Python |
SQL and Javascript |
Persistence |
|
|
Arguments and data types |
UDFs accept parameter values that conform to GoogleSQL for BigQuery data types. Some SQL types have a direct mapping to JavaScript types, but others don't. See supported types for Javascript. For a SQL UDF, parameter values can be Only Javascript UDFs have a determinism specifier that provides a hint to BigQuery as to whether the query result can be cached. |
SQL and Javascript UDAFs accept parameters values that conform to GoogleSQL for BigQuery data types. Function parameters can be aggregate or non-aggregate. |
Usage |
UDFs are commonly used for data cleaning, transformation, and validation. |
UDAFs are commonly used for calculating summary statistics, such as averages, sums, and counts. |
Table functions
A table function, also called a table-valued function (TVF), is a UDF that returns a table. You can use a table function anywhere that you can use a table. Table functions behave similarly to views, but a table function can take parameters.
You can do the following with table functions:
- Pass in multiple parameters.
- Call a table function in any context where a table is valid.
- Join the output from a table function with another table.
- Use a table function in a subquery.
For more information about table functions, see Table functions, Limitations, and Quotas and limits.
Remote functions
Remote functions enable you to implement your function in languages other than SQL and Javascript, or enable you to use libraries or services that are not supported in BigQuery UDFs.
A BigQuery remote function integrates your Google SQL function with Cloud Run functions and Cloud Run using any supported language, and then invokes those functions from Google SQL queries.
The following tasks are examples of what you can do with remote functions:
Creating a remote function requires the following steps:
- Create the HTTP endpoint in Cloud Run functions or Cloud Run.
- Create a remote function in BigQuery using the
CLOUD_RESOURCE
connection type. - Use the remote function in a query just like any other UDF for BigQuery.
For more information about remote functions, see Remote functions, Limitations, and Quotas and limits.
Stored procedures
A SQL stored procedure is a collection of statements that can be called from other queries or other stored procedures. You name and store a procedure in a BigQuery dataset.
Stored procedures support procedural language statements, which let you do things like define variables and implement control flow. You can learn more about procedural language statements in the Procedural language reference.
A stored procedure can do the following:
- Take input arguments and return values as output.
- Access or modify data across multiple datasets by multiple users.
- Contain a multi-statement query.
Some stored procedures are built into BigQuery and don't need to be created. These are called system procedures, and you can learn more about them in the System procedures reference.
Stored procedures for Spark in BigQuery are also supported. These procedures have quotas and limits.
To learn more about stored procedures, see SQL stored procedures.