使用 SQL 和 Python 分析多模態資料

本教學課程將說明如何使用 SQL 查詢和 Python 使用者定義函式 (UDF)分析多模態資料

本教學課程使用公開的 Cymbal 寵物商店資料集產品目錄。

目標

  • 使用 ObjectRef 值,在 BigQuery 標準資料表中儲存圖片資料和結構化資料。
  • 使用 AI.GENERATE_TABLE 函式,根據標準資料表中的圖片資料產生文字。
  • 使用 Python UDF 轉換現有圖片,以建立新的圖片。
  • 使用 Python UDF 將 PDF 分割成多個區塊,以利後續分析。
  • 使用 Gemini 模型和 ML.GENERATE_TEXT 函式分析分割的 PDF 資料。
  • 使用 ML.GENERATE_EMBEDDING 函式,根據標準資料表中的圖片資料產生嵌入項目。
  • 使用 ObjectRef 值陣列處理排序的多模態資料。

費用

在本文件中,您會使用 Google Cloud的下列計費元件:

  • BigQuery: you incur costs for the data that you process in BigQuery.
  • BigQuery Python UDFs: you incur costs for using Python UDFs.
  • Cloud Storage: you incur costs for the objects stored in Cloud Storage.
  • Vertex AI: you incur costs for calls to Vertex AI models.

您可以使用 Pricing Calculator 根據預測用量產生預估費用。 新 Google Cloud 使用者可能符合申請免費試用的資格。

如需詳細資訊,請參閱下列定價頁面:

事前準備

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the BigQuery, BigQuery Connection, Cloud Storage, and Vertex AI APIs.

    Enable the APIs

必要的角色

如要取得完成本教學課程所需的權限,請管理員授予您下列 IAM 角色:

如要進一步瞭解如何授予角色,請參閱「管理專案、資料夾和機構的存取權」。

您或許還可透過自訂角色或其他預先定義的角色取得必要權限。

設定

在本節中,您將建立本教學課程中使用的資料集、連線、資料表和模型。

建立資料集

建立 BigQuery 資料集,以便容納您在本教學課程中建立的物件:

  1. 前往 Google Cloud 控制台的「BigQuery」頁面。

    前往 BigQuery

  2. 在「Explorer」窗格中選取專案。

  3. 展開 「動作」選項,然後按一下「建立資料集」。「Create dataset」窗格隨即開啟。

  4. 在「資料集 ID」部分,輸入 cymbal_pets

  5. 點選「建立資料集」

建立值區

建立 Cloud Storage 值區來儲存轉換後的物件:

  1. 前往「Buckets」(值區) 頁面。

    前往「Buckets」(值區) 頁面

  2. 按一下 「Create」(建立)

  3. 在「Create a bucket」(建立值區) 頁面的「Get started」(開始使用) 專區中,輸入符合值區名稱規定的全球專屬名稱。

  4. 按一下 [建立]。

建立連線

建立 Cloud 資源連線,並取得連線的服務帳戶。BigQuery 會使用此連線存取 Cloud Storage 中的物件:

  1. 前往「BigQuery」頁面

    前往 BigQuery

  2. 在「Explorer」窗格中,按一下 「Add data」

    「Add data」對話方塊隨即開啟。

  3. 在「Filter By」窗格中的「Data Source Type」部分,選取「Business Applications」

    或者,您也可以在「Search for data sources」欄位中輸入 Vertex AI

  4. 在「精選資料來源」部分,按一下「Vertex AI」

  5. 按一下「Vertex AI 模型:BigQuery 聯盟」解決方案資訊卡。

  6. 在「連線類型」清單中,選取「Vertex AI 遠端模型、遠端函式和 BigLake (Cloud 資源)」

  7. 在「連線 ID」欄位中輸入 cymbal_conn

  8. 點選「建立連線」

  9. 按一下「前往連線」

  10. 在「連線資訊」窗格中,複製服務帳戶 ID,以便在後續步驟中使用。

將權限授予連線的服務帳戶

將適當的角色授予連線的服務帳戶,以便存取其他服務。您必須在「開始前」一節中建立或選取的專案中授予這些角色。在其他專案中授予角色會導致 bqcx-1234567890-xxxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com does not have the permission to access resource 錯誤。

授予 Cloud Storage 值區的權限

授予服務帳戶存取權,以便使用您建立的值區中的物件:

  1. 前往「Buckets」(值區) 頁面。

    前往「Buckets」(值區) 頁面

  2. 按一下您建立的值區名稱。

  3. 按一下 [權限]

  4. 按一下 「授予存取權」。系統會隨即開啟「授予存取權」對話方塊。

  5. 在「新增主體」欄位,輸入先前複製的服務帳戶 ID。

  6. 在「請選擇角色」欄位中,依序選取「Cloud Storage」和「Storage Object User」(Storage 物件使用者)

  7. 按一下 [儲存]

授予使用 Vertex AI 模型的權限

授予服務帳戶使用 Vertex AI 模型的權限:

  1. 前往「IAM & Admin」(IAM 與管理) 頁面。

    前往「IAM & Admin」(IAM 與管理)

  2. 按一下 「授予存取權」。系統會隨即開啟「授予存取權」對話方塊。

  3. 在「新增主體」欄位,輸入先前複製的服務帳戶 ID。

  4. 在「請選擇角色」欄位中,依序選取「Vertex AI」和「Vertex AI 使用者」

  5. 按一下 [儲存]

建立範例資料的資料表

建立資料表來儲存 Cymbal 寵物產品資訊。

建立 products 資料表

建立包含 Cymbal 寵物產品資訊的標準資料表:

  1. 前往 Google Cloud 控制台的「BigQuery」頁面。

    前往 BigQuery

  2. 在查詢編輯器中執行下列查詢,建立 products 資料表:

    LOAD DATA OVERWRITE cymbal_pets.products
    FROM
      FILES(
        format = 'avro',
        uris = [
          'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro']);

建立 product_images 資料表

建立包含 Cymbal 寵物產品圖片的物件資料表:

  • BigQuery 頁面的查詢編輯器中,執行下列查詢來建立 product_images 資料表:

    CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_images
      WITH CONNECTION `us.cymbal_conn`
      OPTIONS (
        object_metadata = 'SIMPLE',
        uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png'],
        max_staleness = INTERVAL 30 MINUTE,
        metadata_cache_mode = AUTOMATIC);

建立 product_manuals 資料表

建立包含 Cymbal 寵物產品手冊的物件資料表:

  • BigQuery 頁面的查詢編輯器中,執行下列查詢來建立 product_manuals 資料表:

    CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_manuals
      WITH CONNECTION `us.cymbal_conn`
      OPTIONS (
        object_metadata = 'SIMPLE',
        uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf']);

建立文字產生模型

建立代表 Vertex AI Gemini 模型的 BigQuery ML 遠端模型

  • BigQuery 頁面的查詢編輯器中執行下列查詢,建立遠端模型:

    CREATE OR REPLACE MODEL `cymbal_pets.gemini`
      REMOTE WITH CONNECTION `us.cymbal_conn`
      OPTIONS (ENDPOINT = 'gemini-2.0-flash');

建立嵌入產生模型

建立代表 Vertex AI 多模態嵌入模型的 BigQuery ML 遠端模型:

  • BigQuery 頁面的查詢編輯器中執行下列查詢,建立遠端模型:

    CREATE OR REPLACE MODEL `cymbal_pets.embedding_model`
      REMOTE WITH CONNECTION `us.cymbal_conn`
      OPTIONS (ENDPOINT = 'multimodalembedding@001');

使用多模態資料建立 products_mm 資料表

建立 products_mm 資料表,其中包含 image 資料欄,並填入來自 product_images 物件資料表的產品圖片。建立的 image 欄是使用 ObjectRef 格式的 STRUCT 欄。

  1. BigQuery 頁面的查詢編輯器中,執行下列查詢,建立 products_mm 資料表並填入 image 欄:

    CREATE OR REPLACE TABLE cymbal_pets.products_mm
    AS
    SELECT products.* EXCEPT (uri), ot.ref AS image FROM cymbal_pets.products
    INNER JOIN cymbal_pets.product_images ot
    ON ot.uri = products.uri;
  2. BigQuery 頁面的查詢編輯器中,執行下列查詢,查看 image 欄資料:

    SELECT product_name, image
    FROM cymbal_pets.products_mm`

    結果如下所示:

    +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
    | product_name                   | image.uri                            | image.version | image.authorizer              | image.details                                  |
    +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
    |  AquaClear Aquarium Background | gs://cloud-samples-data/bigquery/    | 1234567891011 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png",   |
    |                                | tutorials/cymbal-pets/images/        |               |                               | "md5_hash":"494f63b9b137975ff3e7a11b060edb1d", |
    |                                | aquaclear-aquarium-background.png    |               |                               | "size":1282805,"updated":1742492680017000}}    |
    +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
    |  AquaClear Aquarium            | gs://cloud-samples-data/bigquery/    | 2345678910112 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png",   |
    |  Gravel Vacuum                 | tutorials/cymbal-pets/images/        |               |                               | "md5_hash":"b7bfc2e2641a77a402a1937bcf0003fd", |
    |                                | aquaclear-aquarium-gravel-vacuum.png |               |                               | "size":820254,"updated":1742492682411000}}     |
    +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
    | ...                            | ...                                  | ...           |                               | ...                                            |
    +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
    

使用 Gemini 模型產生產品資訊

使用 Gemini 模型為寵物用品店產品產生下列資料:

  • products_mm 資料表中新增 image_description 欄。
  • products_mm 資料表填入 animal_typesearch_keywordssubcategory 欄。
  • 執行查詢,傳回每個產品品牌的說明,以及該品牌的產品數量。系統會分析該品牌所有產品的產品資訊 (包括產品圖片),產生品牌說明。
  1. BigQuery 頁面的查詢編輯器中,執行下列查詢,建立並填入 image_description 資料欄:

    CREATE OR REPLACE TABLE cymbal_pets.products_mm
    AS
    SELECT
      product_id,
      product_name,
      brand,
      category,
      subcategory,
      animal_type,
      search_keywords,
      price,
      description,
      inventory_level,
      supplier_id,
      average_rating,
      image,
      image_description
    FROM
      AI.GENERATE_TABLE(
        MODEL `cymbal_pets.gemini`,
        (
          SELECT
            ('Can you describe the following image?', OBJ.GET_ACCESS_URL(image, 'r')) AS prompt,
            *
          FROM
            cymbal_pets.products_mm
        ),
        STRUCT('image_description STRING' AS output_schema));
  2. BigQuery 頁面的查詢編輯器中,執行下列查詢,以產生的資料更新 animal_typesearch_keywordssubcategory 欄:

    UPDATE cymbal_pets.products_mm p
    SET
      p.animal_type = s.animal_type,
      p.search_keywords = s.search_keywords,
      p.subcategory = s.subcategory
    FROM
      (
        SELECT
          animal_type,
          search_keywords,
          subcategory,
          uri
        FROM
          AI.GENERATE_TABLE(
            MODEL `cymbal_pets.gemini`,
            (
              SELECT
                (
                  'For the image of a pet product, concisely generate the following metadata.'
                    '1) animal_type and 2) 5 SEO search keywords, and 3) product subcategory',
                  OBJ.GET_ACCESS_URL(image, 'r'),
                  description) AS prompt,
                image.uri AS uri,
              FROM cymbal_pets.products_mm
            ),
            STRUCT(
              'animal_type STRING, search_keywords ARRAY<STRING>, subcategory STRING' AS output_schema,
              100 AS max_output_tokens))
      ) s
    WHERE p.image.uri = s.uri;
  3. BigQuery 頁面的查詢編輯器中執行下列查詢,即可查看產生的資料:

    SELECT
      product_name,
      image_description,
      animal_type,
      search_keywords,
      subcategory,
    FROM cymbal_pets.products_mm;

    結果如下所示:

    +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
    | product_name                   | image.description                   | animal_type | search_keywords        | subcategory      |
    +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
    |  AquaClear Aquarium Background | The image shows a colorful coral    | fish        | aquarium background    | aquarium decor   |
    |                                | reef backdrop. The background is a  |             | fish tank backdrop     |                  |
    |                                | blue ocean with a bright light...   |             | coral reef decor       |                  |
    |                                |                                     |             | underwater scenery     |                  |
    |                                |                                     |             | aquarium decoration    |                  |
    +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
    |  AquaClear Aquarium            | The image shows a long, clear       | fish        | aquarium gravel vacuum | aquarium         |
    |  Gravel Vacuum                 | plastic tube with a green hose      |             | aquarium cleaning      | cleaning         |
    |                                | attached to one end. The tube...    |             | aquarium maintenance   |                  |
    |                                |                                     |             | fish tank cleaning     |                  |
    |                                |                                     |             | gravel siphon          |                  |
    +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
    | ...                            | ...                                 | ...         |  ...                   | ...              |
    +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
    
  4. BigQuery 頁面的查詢編輯器中,執行下列查詢,產生每個產品品牌的說明,以及該品牌的產品數量計數:

    SELECT
      brand,
      brand_description,
      cnt
    FROM
      AI.GENERATE_TABLE(
        MODEL `cymbal_pets.gemini`,
        (
          SELECT
            brand,
            COUNT(*) AS cnt,
            (
              'Use the images and text to give one concise brand description for a website brand page.'
                'Return the description only.',
              ARRAY_AGG(OBJ.GET_ACCESS_URL(image, 'r')),
              ARRAY_AGG(description),
              ARRAY_AGG(category),
              ARRAY_AGG(subcategory)) AS prompt
          FROM cymbal_pets.products_mm
          GROUP BY brand
        ),
        STRUCT('brand_description STRING' AS output_schema))
    ORDER BY cnt DESC;

    結果如下所示:

    +--------------+-------------------------------------+-----+
    | brand        | brand.description                   | cnt |
    +--------------+-------------------------------------+-----+
    |  AquaClear   | AquaClear is a brand of aquarium    | 33  |
    |              | and pond care products that offer   |     |
    |              | a wide range of solutions for...    |     |
    +--------------+-------------------------------------+-----+
    |  Ocean       | Ocean Bites is a brand of cat food  | 28  |
    |  Bites       | that offers a variety of recipes    |     |
    |              | and formulas to meet the specific.. |     |
    +--------------+-------------------------------------+-----+
    |  ...         | ...                                 |...  |
    +--------------+-------------------------------------+-----+
    

建立 Python UDF 來轉換產品圖片

建立 Python UDF,將產品圖片轉換為灰階。

Python UDF 會使用開放原始碼程式庫,並且會使用並行執行功能同時轉換多個圖片。

  1. BigQuery 頁面的查詢編輯器中執行下列查詢,建立 to_grayscale UDF:

    CREATE OR REPLACE FUNCTION cymbal_pets.to_grayscale(src_json STRING, dst_json STRING)
    RETURNS STRING
    LANGUAGE python
    WITH CONNECTION `us.cymbal_conn`
    OPTIONS (entry_point='to_grayscale', runtime_version='python-3.11', packages=['numpy', 'opencv-python'])
    AS """
    
    import cv2 as cv
    import numpy as np
    from urllib.request import urlopen, Request
    import json
    
    # Transform the image to grayscale.
    def to_grayscale(src_ref, dst_ref):
      src_json = json.loads(src_ref)
      srcUrl = src_json["access_urls"]["read_url"]
    
      dst_json = json.loads(dst_ref)
      dstUrl = dst_json["access_urls"]["write_url"]
    
      req = urlopen(srcUrl)
      arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
      img = cv.imdecode(arr, -1) # 'Load it as it is'
    
      # Convert the image to grayscale
      gray_image = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    
      # Send POST request to the URL
      _, img_encoded = cv.imencode('.png', gray_image)
    
      req = Request(url=dstUrl, data=img_encoded.tobytes(), method='PUT', headers = {
          "Content-Type": "image/png",
      })
      with urlopen(req) as f:
          pass
      return dst_ref
    """;

變換產品圖片

建立 products_grayscale 資料表,其中包含 ObjectRef 資料欄,其中包含灰階圖片的目的路徑和授權者。目的地路徑是從原始圖片路徑衍生而來。

建立資料表後,請執行 to_grayscale 函式來建立灰階圖片、將圖片寫入 Cloud Storage 值區,然後傳回包含灰階圖片存取網址和中繼資料的 ObjectRefRuntime 值。

  1. BigQuery 頁面的查詢編輯器中,執行下列查詢來建立 products_grayscale 資料表:

    CREATE OR REPLACE TABLE cymbal_pets.products_grayscale
    AS
    SELECT
      product_id,
      product_name,
      image,
      OBJ.MAKE_REF(
        CONCAT('gs://BUCKET/cymbal-pets-images/grayscale/', REGEXP_EXTRACT(image.uri, r'([^/]+)$')),
        'us.cymbal_conn') AS gray_image
    FROM cymbal_pets.products_mm;

    BUCKET 替換為您建立的值區名稱

  2. BigQuery 頁面的查詢編輯器中,執行下列查詢,建立灰階圖片、將圖片寫入 Cloud Storage 值區,然後傳回包含灰階圖片存取網址和中繼資料的 ObjectRefRuntime 值:

    SELECT cymbal_pets.to_grayscale(
      TO_JSON_STRING(OBJ.GET_ACCESS_URL(image, 'r')),
      TO_JSON_STRING(OBJ.GET_ACCESS_URL(gray_image, 'rw')))
    FROM cymbal_pets.products_grayscale;

    結果如下所示:

    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | f0                                                                                                                                                                    |
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z",                                                                                                                 |
    | "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_read URL_information",       |
    | "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write URL_information"},   |
    | "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images/grayscale/ocean-bites-salmon-&-tuna-cat-food.png"}}                |
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z",                                                                                                                 |
    | "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png?additional _read URL_information",             |
    | "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write_URL_information"},   |
    | "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png"}}                 |
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |  ...                                                                                                                                                                  |
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

建立 Python UDF 來分割 PDF 資料

建立 Python UDF,將含有 Cymbal pets 產品手冊的 PDF 物件分割成多個部分。

PDF 通常很大,可能無法在單一呼叫中傳送至生成式 AI 模型。透過分割 PDF,您可以將 PDF 資料儲存在模型可用的格式中,方便進行分析。

  1. BigQuery 頁面的查詢編輯器中執行下列查詢,建立 chunk_pdf UDF:

    -- This function chunks the product manual PDF into multiple parts.
    -- The function accepts an ObjectRefRuntime value for the PDF file and the chunk size.
    -- It then parses the PDF, chunks the contents, and returns an array of chunked text.
    CREATE OR REPLACE FUNCTION cymbal_pets.chunk_pdf(src_json STRING, chunk_size INT64, overlap_size INT64)
    RETURNS ARRAY<STRING>
    LANGUAGE python
    WITH CONNECTION `us.cymbal_conn`
    OPTIONS (entry_point='chunk_pdf', runtime_version='python-3.11', packages=['pypdf'])
    AS """
    import io
    import json
    
    from pypdf import PdfReader  # type: ignore
    from urllib.request import urlopen, Request
    
    def chunk_pdf(src_ref: str, chunk_size: int, overlap_size: int) -> str:
      src_json = json.loads(src_ref)
      srcUrl = src_json["access_urls"]["read_url"]
    
      req = urlopen(srcUrl)
      pdf_file = io.BytesIO(bytearray(req.read()))
      reader = PdfReader(pdf_file, strict=False)
    
      # extract and chunk text simultaneously
      all_text_chunks = []
      curr_chunk = ""
      for page in reader.pages:
          page_text = page.extract_text()
          if page_text:
              curr_chunk += page_text
              # split the accumulated text into chunks of a specific size with overlaop
              # this loop implements a sliding window approach to create chunks
              while len(curr_chunk) >= chunk_size:
                  split_idx = curr_chunk.rfind(" ", 0, chunk_size)
                  if split_idx == -1:
                      split_idx = chunk_size
                  actual_chunk = curr_chunk[:split_idx]
                  all_text_chunks.append(actual_chunk)
                  overlap = curr_chunk[split_idx + 1 : split_idx + 1 + overlap_size]
                  curr_chunk = overlap + curr_chunk[split_idx + 1 + overlap_size :]
      if curr_chunk:
          all_text_chunks.append(curr_chunk)
    
      return all_text_chunks
    """;

分析 PDF 資料

執行 chunk_pdf 函式,將 product_manuals 資料表中的 PDF 資料分割成多個區塊,然後建立 product_manual_chunk_strings 資料表,每列包含一個 PDF 區塊。在 product_manual_chunk_strings 資料上使用 Gemini 模型,以便摘要產品手冊中的法律資訊。

  1. BigQuery 頁面的查詢編輯器中,執行下列查詢來建立 product_manual_chunk_strings 資料表:

    CREATE OR REPLACE TABLE cymbal_pets.product_manual_chunk_strings
    AS
    SELECT chunked
    FROM cymbal_pets.product_manuals,
    UNNEST (cymbal_pets.chunk_pdf(
      TO_JSON_STRING(
        OBJ.GET_ACCESS_URL(OBJ.MAKE_REF(uri, 'us.cymbal_conn'), 'r')),
        1000,
        100
    )) as chunked;
  2. BigQuery 頁面的查詢編輯器中,執行下列查詢,使用 Gemini 模型分析 PDF 資料:

    SELECT
      ml_generate_text_llm_result
    FROM
      ML.GENERATE_TEXT(
        MODEL `cymbal_pets.gemini`,
        (
          SELECT
            (
              'Can you summarize the product manual as bullet points? Highlight the legal clauses',
              chunked) AS prompt,
          FROM cymbal_pets.product_manual_chunk_strings
        ),
        STRUCT(
          TRUE AS FLATTEN_JSON_OUTPUT));

    結果如下所示:

    +-------------------------------------------------------------------------------------------------------------------------------------------+
    | ml_generate_text_llm_result                                                                                                               |
    +-------------------------------------------------------------------------------------------------------------------------------------------+
    | ## CritterCuisine Pro 5000 Automatic Pet Feeder Manual Summary:                                                                           |
    |                                                                                                                                           |
    | **Safety:**                                                                                                                               |
    |                                                                                                                                           |
    | * **Stability:** Place feeder on a level, stable surface to prevent tipping.                                                              |
    | * **Power Supply:** Only use the included AC adapter. Using an incompatible adapter can damage the unit and void the warranty.            |
    | * **Cord Safety:** Keep the power cord out of reach of pets to prevent chewing or entanglement.                                           |
    | * **Children:** Supervise children around the feeder. This is not a toy.                                                                  |
    | * **Pet Health:** Consult your veterinarian before using an automatic feeder if your pet has special dietary needs, health conditions, or |
    +-------------------------------------------------------------------------------------------------------------------------------------------+
    | ## Product Manual Summary:                                                                                                                |
    |                                                                                                                                           |
    | **6.3 Manual Feeding:**                                                                                                                   |
    |                                                                                                                                           |
    | * Press MANUAL button to dispense a single portion (Meal 1 size). **(Meal Enabled)**                                                      |
    |                                                                                                                                           |
    | **6.4 Recording a Voice Message:**                                                                                                        |
    |                                                                                                                                           |
    | * Press and hold VOICE button.                                                                                                            |
    | * Speak clearly into the microphone (up to 10 seconds).                                                                                   |
    | * Release VOICE button to finish recording.                                                                                               |
    | * Briefly press VOICE button to play back the recording.                                                                                  |
    | * To disable the voice message, record a blank message (hold VOICE button for 10 seconds without speaking). **(Meal Enabled)**            |
    |                                                                                                                                           |
    | **6.5 Low Food Level Indicator:**                                                                                                         |
    +-------------------------------------------------------------------------------------------------------------------------------------------+
    | ...                                                                                                                                       |
    +-------------------------------------------------------------------------------------------------------------------------------------------+
    

從圖片資料產生嵌入項目,然後使用向量搜尋功能,透過嵌入項目傳回類似圖片。

在實際工作情境中,建議您先建立向量索引,再執行向量搜尋。向量索引可讓您更快速地執行向量搜尋,但代價是降低喚回率,因此會傳回較為近似的結果。

  1. BigQuery 頁面的查詢編輯器中執行下列查詢,建立 products_embeddings 資料表:

    CREATE OR REPLACE TABLE cymbal_pets.products_embedding
    AS
    SELECT product_id, ml_generate_embedding_result as embedding, content as image
    FROM ML.GENERATE_EMBEDDING(
    MODEL `cymbal_pets.embedding_model`,
      (
        SELECT OBJ.GET_ACCESS_URL(image, 'r') as content, image, product_id
        FROM cymbal_pets.products_mm
      ),
      STRUCT ()
    );
  2. BigQuery 頁面的查詢編輯器中,執行以下查詢來執行向量搜尋,以便傳回與指定輸入圖片相似的產品圖片:

    SELECT *
    FROM
    VECTOR_SEARCH(
      TABLE cymbal_pets.products_embedding,
      'embedding',
      (SELECT ml_generate_embedding_result as embedding FROM ML.GENERATE_EMBEDDING(
        MODEL `cymbal_pets.embedding_model`,
        (SELECT OBJ.FETCH_METADATA(OBJ.MAKE_REF('gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/cozy-naps-cat-scratching-post-with-condo.png', 'us.cymbal_conn')) as content)
      ))
    );

    結果如下所示:

    +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
    | query.embedding | base.product_id | base.embedding | base.image.uri                               | base.image.version | base.image.authorizer         | base.image.details                             | distance       |
    +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
    | -0.0112330541   | 181             | -0.0112330541  | gs://cloud-samples-data/bigquery/            | 12345678910        | myproject.region.myconnection | {"gcs_metadata":{"content_type":               | 0.0            |
    | 0.0142525584    |                 |  0.0142525584  | tutorials/cymbal-pets/images/                |                    |                               | "image/png","md5_hash":"21234567hst16555w60j", |                |
    | 0.0135886827    |                 |  0.0135886827  | cozy-naps-cat-scratching-post-with-condo.png |                    |                               | "size":828318,"updated":1742492688982000}}     |                |
    | 0.0149955815    |                 |  0.0149955815  |                                              |                    |                               |                                                |                |
    | ...             |                 |  ...           |                                              |                    |                               |                                                |                |
    |                 |                 |                |                                              |                    |                               |                                                |                |
    |                 |                 |                |                                              |                    |                               |                                                |                |
    +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
    | -0.0112330541   | 187             | -0.0190353896  | gs://cloud-samples-data/bigquery/            | 23456789101        | myproject.region.myconnection | {"gcs_metadata":{"content_type":               | 0.4216330832.. |
    | 0.0142525584    |                 |  0.0116206668  | tutorials/cymbal-pets/images/                |                    |                               | "image/png","md5_hash":"7328728fhakd9937djo4", |                |
    | 0.0135886827    |                 |  0.0136198215  | cozy-naps-cat-scratching-post-with-bed.png   |                    |                               | "size":860113,"updated":1742492688774000}}     |                |
    | 0.0149955815    |                 |  0.0173457414  |                                              |                    |                               |                                                |                |
    | ...             |                 |  ...           |                                              |                    |                               |                                                |                |
    |                 |                 |                |                                              |                    |                               |                                                |                |
    |                 |                 |                |                                              |                    |                               |                                                |                |
    +---------C--------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
    | ...             | ...             | ...            | ...                                          | ...                | ...                           | ...                                            | ...            |
    +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
    

使用 ObjectRef 值陣列處理排序的多模態資料

本節說明如何完成下列工作:

  1. 重新建立 product_manuals 資料表,讓該資料表包含 Crittercuisine 5000 產品手冊的 PDF 檔案,以及該手冊每個頁面的 PDF 檔案。
  2. 建立表格,將手冊與其章節對應。代表完整手冊的 ObjectRef 值會儲存在 STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>> 欄中。代表手冊頁面的 ObjectRef 值會儲存在 ARRAY<STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>> 資料欄中。
  3. 分析一組 ObjectRef 值,以便傳回單一產生的值。
  4. 分別分析 ObjectRef 值陣列,並針對每個陣列值傳回產生的值。

在分析工作中,您可以將 ObjectRef 值陣列轉換為 ObjectRefRuntime 值的排序清單,然後將該清單傳遞至 Gemini 模型,並指定 ObjectRefRuntime 值做為提示的一部分。ObjectRefRuntime 值會提供已簽署的網址,讓模型用於存取 Cloud Storage 中的物件資訊。

請按照下列步驟,使用 ObjectRef 值陣列處理排序的多模態資料:

  1. 前往「BigQuery」頁面

    前往 BigQuery

  2. 在查詢編輯器中執行下列查詢,重新建立 product_manuals 資料表:

    CREATE OR REPLACE EXTERNAL TABLE `cymbal_pets.product_manuals`
      WITH CONNECTION `us.cymbal_conn`
      OPTIONS (
        object_metadata = 'SIMPLE',
        uris = [
            'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf',
            'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/document_chunks/*.pdf']);
  3. 在查詢編輯器中執行下列查詢,將 PDF 資料寫入 map_manual_to_chunks 資料表:

    -- Extract the file and chunks into a single table.
    -- Store the chunks in the chunks column as array of ObjectRefs (ordered by page number)
    CREATE OR REPLACE TABLE cymbal_pets.map_manual_to_chunks
    AS
    SELECT ARRAY_AGG(m1.ref)[0] manual, ARRAY_AGG(m2.ref ORDER BY m2.ref.uri) chunks
    FROM cymbal_pets.product_manuals m1
    JOIN cymbal_pets.product_manuals m2
      ON
        REGEXP_EXTRACT(m1.uri, r'.*/([^.]*).[^/]+')
        = REGEXP_EXTRACT(m2.uri, r'.*/([^.]*)_page[0-9]+.[^/]+')
    GROUP BY m1.uri;
  4. 在查詢編輯器中執行下列查詢,即可查看 map_manual_to_chunks 資料表中的 PDF 資料:

    SELECT *
    FROM cymbal_pets.map_manual_to_chunks;

    結果如下所示:

    +-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
    | manual.uri                          | manual.version                 | manual.authorizer                 | manual.details                                       | chunks.uri                                | chunks.version                  | chunks.authorizer                  | chunks.details                                        |
    +-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
    | gs://cloud-samples-data/bigquery/   | 1742492785900455               | myproject.region.myconnection     | {"gcs_metadata":{"content_type":"application/pef",   | gs://cloud-samples-data/bigquery/         | 1745875761227129                | myproject.region.myconnection      | {"gcs_metadata":{"content_type":"application/pdf",    |
    | tutorials/cymbal-pets/documents/    |                                |                                   | "md5_hash":"c9032b037693d15a33210d638c763d0e",       | tutorials/cymbal-pets/documents/          |                                 |                                    | "md5_hash":"5a1116cce4978ec1b094d8e8b49a1d7c",        |
    | crittercuisine_5000_user_manual.pdf |                                |                                   | "size":566105,"updated":1742492785941000}}           | crittercuisine_5000_user_manual_page1.pdf |                                 |                                    | "size":504583,"updated":1745875761266000}}            |
    |                                     |                                |                                   |                                                      +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
    |                                     |                                |                                   |                                                      | crittercuisine_5000_user_manual_page1.pdf | 1745875760613874                | myproject.region.myconnection      | {"gcs_metadata":{"content_type":"application/pdf",    |
    |                                     |                                |                                   |                                                      | tutorials/cymbal-pets/documents/          |                                 |                                    | "md5_hash":"94d03ec65d28b173bc87eac7e587b325",        |
    |                                     |                                |                                   |                                                      | crittercuisine_5000_user_manual_page2.pdf |                                 |                                    | "size":94622,"updated":1745875760649000}}             |
    |                                     |                                |                                   |                                                      +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
    |                                     |                                |                                   |                                                      | ...                                       | ...                             |  ...                               | ...                                                   |
    +-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
    
  5. 在查詢編輯器中執行下列查詢,根據 ObjectRef 值陣列的分析結果,從 Gemini 模型產生單一回應:

    WITH
      manuals AS (
        SELECT
          OBJ.GET_ACCESS_URL(manual, 'r') AS manual,
          ARRAY(
            SELECT OBJ.GET_ACCESS_URL(chunk, 'r') AS chunk
            FROM UNNEST(m1.chunks) AS chunk WITH OFFSET AS idx
            ORDER BY idx
          ) AS chunks
        FROM cymbal_pets.map_manual_to_chunks AS m1
      )
    SELECT ml_generate_text_llm_result AS Response
    FROM
      ML.GENERATE_TEXT(
        MODEL `cymbal_pets.gemini`,
        (
          SELECT
            (
              'Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order',
              manuals.chunks) AS prompt,
          FROM manuals
        ),
        STRUCT(TRUE AS FLATTEN_JSON_OUTPUT));

    結果如下所示:

    +-------------------------------------------+
    | Response                                  |
    +-------------------------------------------+
    | Page 1: This manual is for the            |
    | CritterCuisine Pro 5000 automatic         |
    | pet feeder.                               |
    | Page 2: The manual covers safety          |
    | precautions, what's included,             |
    | and product overview.                     |
    | Page 3: The manual covers assembly,       |
    | initial setup, and programming the clock. |
    +-------------------------------------------+
    
  6. 在查詢編輯器中執行下列查詢,根據 ObjectRef 值陣列的分析結果,從 Gemini 模型產生多個回應:

    WITH
      input_chunked_objrefs AS (
        SELECT row_id, offset, chunk_ref
        FROM
          (
            SELECT ROW_NUMBER() OVER () AS row_id, * FROM `cymbal_pets.map_manual_to_chunks`
          ) AS indexed_table
        LEFT JOIN
          UNNEST(indexed_table.chunks) AS chunk_ref
          WITH OFFSET
      ),
      get_access_urls AS (
        SELECT row_id, offset, chunk_ref, OBJ.GET_ACCESS_URL(chunk_ref, 'r') AS ObjectRefRuntime
        FROM input_chunked_objrefs
      ),
      valid_get_access_urls AS (
        SELECT *
        FROM get_access_urls
        WHERE ObjectRefRuntime['runtime_errors'] IS NULL
      ),
      ordered_output_objrefruntime_array AS (
        SELECT ARRAY_AGG(ObjectRefRuntime ORDER BY offset) AS ObjectRefRuntimeArray
        FROM valid_get_access_urls
        GROUP BY row_id
      )
    SELECT
      page1_summary,
      page2_summary,
      page3_summary
    FROM
      AI.GENERATE_TABLE(
        MODEL `cymbal_pets.gemini`,
        (
          SELECT
            (
              'Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order',
              ObjectRefRuntimeArray) AS prompt,
          FROM ordered_output_objrefruntime_array
        ),
        STRUCT(
          'page1_summary STRING, page2_summary STRING, page3_summary STRING' AS output_schema));

    結果如下所示:

    +-----------------------------------------------+-------------------------------------------+----------------------------------------------------+
    | page1_summary                                 | page2_summary                             | page3_summary                                      |
    +-----------------------------------------------+-------------------------------------------+----------------------------------------------------+
    | This manual provides an overview of the       | This section explains how to program      | This page covers connecting the feeder to Wi-Fi    |
    | CritterCuisine Pro 5000 automatic pet feeder, | the feeder's clock, set feeding           | using the CritterCuisine Connect app,  remote      |
    | including its features, safety precautions,   | schedules, copy and delete meal settings, | feeding, managing feeding schedules, viewing       |
    | assembly instructions, and initial setup.     | manually feed your pet, record            | feeding logs, receiving low food alerts,           |
    |                                               | a voice message, and understand           | updating firmware, creating multiple pet profiles, |
    |                                               | the low food level indicator.             | sharing access with other users, and cleaning      |
    |                                               |                                           | and maintaining the feeder.                        |
    +-----------------------------------------------+-------------------------------------------+----------------------------------------------------+
    

清除所用資源

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.