Specify ObjectRef columns in table schemas

This document describes how to define a BigQuery standard table schema with columns that can store ObjectRef values.

ObjectRef values provide metadata and connection information for objects in Cloud Storage. Use ObjectRef values when you need to integrate unstructured data into a standard table. For example, in a products table, you could store product images in the same row with the rest of the product information by adding a column containing ObjectRef values. You can store ObjectRef values in STRUCT columns that use the ObjectRef format, which is STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>.

For more information about working with multimodal data, see Analyze multimodal data. For a tutorial that shows how to work with ObjectRef data, see Analyze multimodal data with SQL. For information about working with multimodal data in Python, see Analyze multimodal data in Python with BigQuery DataFrames.

Prerequisites

To populate and update ObjectRef values in a standard table, the table must have a STRING column that contains URI information for the related Cloud Storage objects.

You must have a Cloud Storage bucket that contains the same objects that are identified in the URI data of the target standard table. If you want to maintain ObjectRef values in a standard table by using an object table, you must also have an object table that represents the objects in that bucket.

Maintaining ObjectRef values

You can use an object table to populate and update ObjectRef values in a standard table. If you are on the allowlist for the preview, any object tables you create have a ref column that contains an ObjectRef value for the given object. You can use the object URI to join the standard table to the object table in order to populate and update ObjectRef values. We recommend this approach for scalability, because it avoids the need to retrieve object metadata from Cloud Storage.

If you don't want to create an object table, you can use the OBJ.FETCH_METADATA and OBJ.MAKE_REF functions to populate and update ObjectRef values by fetching object metadata directly from Cloud Storage. This approach might be less scalable, because it requires the retrieval of object metadata from Cloud Storage.

Create an ObjectRef column

To create and populate an ObjectRef column in a standard table, select one of the following options:

Object table

Create and populate an ObjectRef column based on data from an object table ref column:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE OR REPLACE TABLE PROJECT_ID.DATASET_ID.TABLE_NAME
    AS
    SELECT TABLE_NAME.*, OBJECT_TABLE.ref AS objectrefcolumn
    FROM DATASET_ID.TABLE_NAME
    INNER JOIN DATASET_ID.OBJECT_TABLE
    ON OBJECT_TABLE.uri = TABLE_NAME.uri;

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • OBJECT_TABLE: the name of the object table that contains the object data that you want to integrate into the standard table.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

SQL functions

Create and populate an ObjectRef column based on output from the OBJ.FETCH_METADATA and OBJ.MAKE_REF functions:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    CREATE OR REPLACE TABLE PROJECT_ID.DATASET_ID.TABLE_NAME
    AS
    SELECT TABLE_NAME.*,
    OBJ.FETCH_METADATA(OBJ.MAKE_REF(uri, 'CONNECTION_ID')) AS objectrefcolumn
    FROM DATASET_ID.TABLE_NAME;

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • CONNECTION_ID: A STRING value that contains a Cloud resource connection that the service can use to access the objects in Cloud Storage, in the format location.connection_id. For example, us-west1.myconnection. You can get the connection ID by viewing the connection details in the Google Cloud console and copying the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.

      You must grant the Storage Object User (roles/storage.objectUser) role to the connection's service account on any Cloud Storage bucket where you are using it to access objects.

      The connection must be in the same project and region as the query where you are calling the function.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

Create an ARRAY<ObjectRef> column

You can create an ARRAY<STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>> column to contain arrays of ObjectRef values. For example, you could chunk a video into separate images, and then store these images as an array of ObjectRef values.

You can use the ARRAY_AGG function to aggregate arrays of ObjectRef values, including using the ORDER BY clause preserve object order if necessary. You can use the UNNEST operator to parse an array of ObjectRef values into individual ObjectRef values, including using the WITH OFFSET clause to preserve object order if necessary. You can use object metadata, like the URI path and object filename, to map ObjectRef values that represent object chunks to an ObjectRef value that represents the original object.

To see an example of how to work with arrays of ObjectRef values, see the Process ordered multimodal data using ARRAY<ObjectRef> values section of the Analyze multimodal data with SQL tutorial.

Update an ObjectRef column

To update an ObjectRef column in a standard table, select one of the following options:

Object table

Update an ObjectRef column by using data from an object table ref column:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    UPDATE PROJECT_ID.DATASET_ID.TABLE_NAME
    SET objectrefcolumn = (SELECT ref FROM DATASET_ID.OBJECT_TABLE WHERE OBJECT_TABLE.uri = TABLE_NAME.uri)
    WHERE uri != "";

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • OBJECT_TABLE: the name of the object table that contains the same object data as the standard table ObjectRef column.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

SQL functions

Update an ObjectRef column by using output from the OBJ.FETCH_METADATA and OBJ.MAKE_REF functions:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    UPDATE PROJECT_ID.DATASET_ID.TABLE_NAME
    SET objectrefcolumn = (SELECT OBJ.FETCH_METADATA(OBJ.MAKE_REF(uri, 'CONNECTION_ID')))
    WHERE uri != "";

    Replace the following:

    • PROJECT_ID: your project ID. You can skip this argument if you are creating the table in your current project.
    • DATASET_ID: the ID of the dataset that you are creating.
    • TABLE_NAME: the name of the standard table that you are recreating.
    • CONNECTION_ID: A STRING value that contains a Cloud resource connection that the service can use to access the objects in Cloud Storage, in the format location.connection_id. For example, us-west1.myconnection. You can get the connection ID by viewing the connection details in the Google Cloud console and copying the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.

      You must grant the Storage Object User (roles/storage.objectUser) role to the connection's service account on any Cloud Storage bucket where you are using it to access objects.

      The connection must be in the same project and region as the query where you are calling the function.

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

What's next