使用 SQL 和 Python 分析多模態資料
本教學課程將說明如何使用 SQL 查詢和 Python 使用者定義函式 (UDF),分析多模態資料。
本教學課程使用公開的 Cymbal 寵物商店資料集產品目錄。
目標
- 使用
ObjectRef
值,在 BigQuery 標準資料表中儲存圖片資料和結構化資料。 - 使用
AI.GENERATE_TABLE
函式,根據標準資料表中的圖片資料產生文字。 - 使用 Python UDF 轉換現有圖片,以建立新的圖片。
- 使用 Python UDF 將 PDF 分割成多個區塊,以利後續分析。
- 使用 Gemini 模型和
ML.GENERATE_TEXT
函式分析分割的 PDF 資料。 - 使用
ML.GENERATE_EMBEDDING
函式,根據標準資料表中的圖片資料產生嵌入項目。 - 使用
ObjectRef
值陣列處理排序的多模態資料。
費用
在本文件中,您會使用 Google Cloud的下列計費元件:
- BigQuery: you incur costs for the data that you process in BigQuery.
- BigQuery Python UDFs: you incur costs for using Python UDFs.
- Cloud Storage: you incur costs for the objects stored in Cloud Storage.
- Vertex AI: you incur costs for calls to Vertex AI models.
您可以使用 Pricing Calculator 根據預測用量產生預估費用。
如需詳細資訊,請參閱下列定價頁面:
事前準備
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery, BigQuery Connection, Cloud Storage, and Vertex AI APIs.
必要的角色
如要取得完成本教學課程所需的權限,請管理員授予您下列 IAM 角色:
-
建立連線:
BigQuery 連線管理員 (
roles/bigquery.connectionAdmin
) -
將權限授予連線的服務帳戶:
專案 IAM 管理員 (
roles/resourcemanager.projectIamAdmin
) -
建立 Cloud Storage 值區:
Storage Admin (
roles/storage.admin
) -
建立資料集、模型、自訂函式和資料表,並執行 BigQuery 作業:
BigQuery 管理員 (
roles/bigquery.admin
) -
建立可讓您讀取及修改 Cloud Storage 物件的網址:
BigQuery ObjectRef 管理員 (
roles/bigquery.objectRefAdmin
)
如要進一步瞭解如何授予角色,請參閱「管理專案、資料夾和機構的存取權」。
設定
在本節中,您將建立本教學課程中使用的資料集、連線、資料表和模型。
建立資料集
建立 BigQuery 資料集,以便容納您在本教學課程中建立的物件:
前往 Google Cloud 控制台的「BigQuery」頁面。
在「Explorer」窗格中選取專案。
展開
「動作」選項,然後按一下「建立資料集」。「Create dataset」窗格隨即開啟。在「資料集 ID」部分,輸入
cymbal_pets
。點選「建立資料集」。
建立值區
建立 Cloud Storage 值區來儲存轉換後的物件:
前往「Buckets」(值區) 頁面。
按一下
「Create」(建立)。在「Create a bucket」(建立值區) 頁面的「Get started」(開始使用) 專區中,輸入符合值區名稱規定的全球專屬名稱。
按一下 [建立]。
建立連線
建立 Cloud 資源連線,並取得連線的服務帳戶。BigQuery 會使用此連線存取 Cloud Storage 中的物件:
前往「BigQuery」頁面
在「Explorer」窗格中,按一下
「Add data」。「Add data」對話方塊隨即開啟。
在「Filter By」窗格中的「Data Source Type」部分,選取「Business Applications」。
或者,您也可以在「Search for data sources」欄位中輸入
Vertex AI
。在「精選資料來源」部分,按一下「Vertex AI」。
按一下「Vertex AI 模型:BigQuery 聯盟」解決方案資訊卡。
在「連線類型」清單中,選取「Vertex AI 遠端模型、遠端函式和 BigLake (Cloud 資源)」。
在「連線 ID」欄位中輸入
cymbal_conn
。點選「建立連線」。
按一下「前往連線」。
在「連線資訊」窗格中,複製服務帳戶 ID,以便在後續步驟中使用。
將權限授予連線的服務帳戶
將適當的角色授予連線的服務帳戶,以便存取其他服務。您必須在「開始前」一節中建立或選取的專案中授予這些角色。在其他專案中授予角色會導致 bqcx-1234567890-xxxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com
does not have the permission to access resource
錯誤。
授予 Cloud Storage 值區的權限
授予服務帳戶存取權,以便使用您建立的值區中的物件:
前往「Buckets」(值區) 頁面。
按一下您建立的值區名稱。
按一下 [權限]。
按一下
「授予存取權」。系統會隨即開啟「授予存取權」對話方塊。在「新增主體」欄位,輸入先前複製的服務帳戶 ID。
在「請選擇角色」欄位中,依序選取「Cloud Storage」和「Storage Object User」(Storage 物件使用者)。
按一下 [儲存]。
授予使用 Vertex AI 模型的權限
授予服務帳戶使用 Vertex AI 模型的權限:
前往「IAM & Admin」(IAM 與管理) 頁面。
按一下
「授予存取權」。系統會隨即開啟「授予存取權」對話方塊。在「新增主體」欄位,輸入先前複製的服務帳戶 ID。
在「請選擇角色」欄位中,依序選取「Vertex AI」和「Vertex AI 使用者」。
按一下 [儲存]。
建立範例資料的資料表
建立資料表來儲存 Cymbal 寵物產品資訊。
建立 products
資料表
建立包含 Cymbal 寵物產品資訊的標準資料表:
前往 Google Cloud 控制台的「BigQuery」頁面。
在查詢編輯器中執行下列查詢,建立
products
資料表:LOAD DATA OVERWRITE cymbal_pets.products FROM FILES( format = 'avro', uris = [ 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro']);
建立 product_images
資料表
建立包含 Cymbal 寵物產品圖片的物件資料表:
在 BigQuery 頁面的查詢編輯器中,執行下列查詢來建立
product_images
資料表:CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_images WITH CONNECTION `us.cymbal_conn` OPTIONS ( object_metadata = 'SIMPLE', uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png'], max_staleness = INTERVAL 30 MINUTE, metadata_cache_mode = AUTOMATIC);
建立 product_manuals
資料表
建立包含 Cymbal 寵物產品手冊的物件資料表:
在 BigQuery 頁面的查詢編輯器中,執行下列查詢來建立
product_manuals
資料表:CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_manuals WITH CONNECTION `us.cymbal_conn` OPTIONS ( object_metadata = 'SIMPLE', uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf']);
建立文字產生模型
建立代表 Vertex AI Gemini 模型的 BigQuery ML 遠端模型:
在 BigQuery 頁面的查詢編輯器中執行下列查詢,建立遠端模型:
CREATE OR REPLACE MODEL `cymbal_pets.gemini` REMOTE WITH CONNECTION `us.cymbal_conn` OPTIONS (ENDPOINT = 'gemini-2.0-flash');
建立嵌入產生模型
建立代表 Vertex AI 多模態嵌入模型的 BigQuery ML 遠端模型:
在 BigQuery 頁面的查詢編輯器中執行下列查詢,建立遠端模型:
CREATE OR REPLACE MODEL `cymbal_pets.embedding_model` REMOTE WITH CONNECTION `us.cymbal_conn` OPTIONS (ENDPOINT = 'multimodalembedding@001');
使用多模態資料建立 products_mm
資料表
建立 products_mm
資料表,其中包含 image
資料欄,並填入來自 product_images
物件資料表的產品圖片。建立的 image
欄是使用 ObjectRef
格式的 STRUCT
欄。
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,建立
products_mm
資料表並填入image
欄:CREATE OR REPLACE TABLE cymbal_pets.products_mm AS SELECT products.* EXCEPT (uri), ot.ref AS image FROM cymbal_pets.products INNER JOIN cymbal_pets.product_images ot ON ot.uri = products.uri;
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,查看
image
欄資料:SELECT product_name, image FROM cymbal_pets.products_mm`
結果如下所示:
+--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+ | product_name | image.uri | image.version | image.authorizer | image.details | +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+ | AquaClear Aquarium Background | gs://cloud-samples-data/bigquery/ | 1234567891011 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png", | | | tutorials/cymbal-pets/images/ | | | "md5_hash":"494f63b9b137975ff3e7a11b060edb1d", | | | aquaclear-aquarium-background.png | | | "size":1282805,"updated":1742492680017000}} | +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+ | AquaClear Aquarium | gs://cloud-samples-data/bigquery/ | 2345678910112 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"image/png", | | Gravel Vacuum | tutorials/cymbal-pets/images/ | | | "md5_hash":"b7bfc2e2641a77a402a1937bcf0003fd", | | | aquaclear-aquarium-gravel-vacuum.png | | | "size":820254,"updated":1742492682411000}} | +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+ | ... | ... | ... | | ... | +--------------------------------+--------------------------------------+-----------------------------------------------+------------------------------------------------+
使用 Gemini 模型產生產品資訊
使用 Gemini 模型為寵物用品店產品產生下列資料:
- 在
products_mm
資料表中新增image_description
欄。 - 為
products_mm
資料表填入animal_type
、search_keywords
和subcategory
欄。 - 執行查詢,傳回每個產品品牌的說明,以及該品牌的產品數量。系統會分析該品牌所有產品的產品資訊 (包括產品圖片),產生品牌說明。
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,建立並填入
image_description
資料欄:CREATE OR REPLACE TABLE cymbal_pets.products_mm AS SELECT product_id, product_name, brand, category, subcategory, animal_type, search_keywords, price, description, inventory_level, supplier_id, average_rating, image, image_description FROM AI.GENERATE_TABLE( MODEL `cymbal_pets.gemini`, ( SELECT ('Can you describe the following image?', OBJ.GET_ACCESS_URL(image, 'r')) AS prompt, * FROM cymbal_pets.products_mm ), STRUCT('image_description STRING' AS output_schema));
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,以產生的資料更新
animal_type
、search_keywords
和subcategory
欄:UPDATE cymbal_pets.products_mm p SET p.animal_type = s.animal_type, p.search_keywords = s.search_keywords, p.subcategory = s.subcategory FROM ( SELECT animal_type, search_keywords, subcategory, uri FROM AI.GENERATE_TABLE( MODEL `cymbal_pets.gemini`, ( SELECT ( 'For the image of a pet product, concisely generate the following metadata.' '1) animal_type and 2) 5 SEO search keywords, and 3) product subcategory', OBJ.GET_ACCESS_URL(image, 'r'), description) AS prompt, image.uri AS uri, FROM cymbal_pets.products_mm ), STRUCT( 'animal_type STRING, search_keywords ARRAY<STRING>, subcategory STRING' AS output_schema, 100 AS max_output_tokens)) ) s WHERE p.image.uri = s.uri;
在 BigQuery 頁面的查詢編輯器中執行下列查詢,即可查看產生的資料:
SELECT product_name, image_description, animal_type, search_keywords, subcategory, FROM cymbal_pets.products_mm;
結果如下所示:
+--------------------------------+-------------------------------------+-------------+------------------------+------------------+ | product_name | image.description | animal_type | search_keywords | subcategory | +--------------------------------+-------------------------------------+-------------+------------------------+------------------+ | AquaClear Aquarium Background | The image shows a colorful coral | fish | aquarium background | aquarium decor | | | reef backdrop. The background is a | | fish tank backdrop | | | | blue ocean with a bright light... | | coral reef decor | | | | | | underwater scenery | | | | | | aquarium decoration | | +--------------------------------+-------------------------------------+-------------+------------------------+------------------+ | AquaClear Aquarium | The image shows a long, clear | fish | aquarium gravel vacuum | aquarium | | Gravel Vacuum | plastic tube with a green hose | | aquarium cleaning | cleaning | | | attached to one end. The tube... | | aquarium maintenance | | | | | | fish tank cleaning | | | | | | gravel siphon | | +--------------------------------+-------------------------------------+-------------+------------------------+------------------+ | ... | ... | ... | ... | ... | +--------------------------------+-------------------------------------+-------------+------------------------+------------------+
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,產生每個產品品牌的說明,以及該品牌的產品數量計數:
SELECT brand, brand_description, cnt FROM AI.GENERATE_TABLE( MODEL `cymbal_pets.gemini`, ( SELECT brand, COUNT(*) AS cnt, ( 'Use the images and text to give one concise brand description for a website brand page.' 'Return the description only.', ARRAY_AGG(OBJ.GET_ACCESS_URL(image, 'r')), ARRAY_AGG(description), ARRAY_AGG(category), ARRAY_AGG(subcategory)) AS prompt FROM cymbal_pets.products_mm GROUP BY brand ), STRUCT('brand_description STRING' AS output_schema)) ORDER BY cnt DESC;
結果如下所示:
+--------------+-------------------------------------+-----+ | brand | brand.description | cnt | +--------------+-------------------------------------+-----+ | AquaClear | AquaClear is a brand of aquarium | 33 | | | and pond care products that offer | | | | a wide range of solutions for... | | +--------------+-------------------------------------+-----+ | Ocean | Ocean Bites is a brand of cat food | 28 | | Bites | that offers a variety of recipes | | | | and formulas to meet the specific.. | | +--------------+-------------------------------------+-----+ | ... | ... |... | +--------------+-------------------------------------+-----+
建立 Python UDF 來轉換產品圖片
建立 Python UDF,將產品圖片轉換為灰階。
Python UDF 會使用開放原始碼程式庫,並且會使用並行執行功能同時轉換多個圖片。
在 BigQuery 頁面的查詢編輯器中執行下列查詢,建立
to_grayscale
UDF:CREATE OR REPLACE FUNCTION cymbal_pets.to_grayscale(src_json STRING, dst_json STRING) RETURNS STRING LANGUAGE python WITH CONNECTION `us.cymbal_conn` OPTIONS (entry_point='to_grayscale', runtime_version='python-3.11', packages=['numpy', 'opencv-python']) AS """ import cv2 as cv import numpy as np from urllib.request import urlopen, Request import json # Transform the image to grayscale. def to_grayscale(src_ref, dst_ref): src_json = json.loads(src_ref) srcUrl = src_json["access_urls"]["read_url"] dst_json = json.loads(dst_ref) dstUrl = dst_json["access_urls"]["write_url"] req = urlopen(srcUrl) arr = np.asarray(bytearray(req.read()), dtype=np.uint8) img = cv.imdecode(arr, -1) # 'Load it as it is' # Convert the image to grayscale gray_image = cv.cvtColor(img, cv.COLOR_BGR2GRAY) # Send POST request to the URL _, img_encoded = cv.imencode('.png', gray_image) req = Request(url=dstUrl, data=img_encoded.tobytes(), method='PUT', headers = { "Content-Type": "image/png", }) with urlopen(req) as f: pass return dst_ref """;
變換產品圖片
建立 products_grayscale
資料表,其中包含 ObjectRef
資料欄,其中包含灰階圖片的目的路徑和授權者。目的地路徑是從原始圖片路徑衍生而來。
建立資料表後,請執行 to_grayscale
函式來建立灰階圖片、將圖片寫入 Cloud Storage 值區,然後傳回包含灰階圖片存取網址和中繼資料的 ObjectRefRuntime
值。
在 BigQuery 頁面的查詢編輯器中,執行下列查詢來建立
products_grayscale
資料表:CREATE OR REPLACE TABLE cymbal_pets.products_grayscale AS SELECT product_id, product_name, image, OBJ.MAKE_REF( CONCAT('gs://BUCKET/cymbal-pets-images/grayscale/', REGEXP_EXTRACT(image.uri, r'([^/]+)$')), 'us.cymbal_conn') AS gray_image FROM cymbal_pets.products_mm;
將
BUCKET
替換為您建立的值區名稱。在 BigQuery 頁面的查詢編輯器中,執行下列查詢,建立灰階圖片、將圖片寫入 Cloud Storage 值區,然後傳回包含灰階圖片存取網址和中繼資料的
ObjectRefRuntime
值:SELECT cymbal_pets.to_grayscale( TO_JSON_STRING(OBJ.GET_ACCESS_URL(image, 'r')), TO_JSON_STRING(OBJ.GET_ACCESS_URL(gray_image, 'rw'))) FROM cymbal_pets.products_grayscale;
結果如下所示:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | f0 | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z", | | "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_read URL_information", | | "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write URL_information"}, | | "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images/grayscale/ocean-bites-salmon-&-tuna-cat-food.png"}} | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"access_urls":{"expiry_time":"2025-04-26T03:00:48Z", | | "read_url":"https://storage.googleapis.com/mybucket/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png?additional _read URL_information", | | "write_url":"https://storage.googleapis.com/myproject/cymbal-pets-images%2Fgrayscale%2Focean-bites-salmon-%26-tuna-cat-food.png?additional_write_URL_information"}, | | "objectref":{"authorizer":"myproject.region.myconnection","uri":"gs://myproject/cymbal-pets-images%2Fgrayscale%2Ffluffy-buns-guinea-pig-tunnel.png"}} | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ... | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
建立 Python UDF 來分割 PDF 資料
建立 Python UDF,將含有 Cymbal pets 產品手冊的 PDF 物件分割成多個部分。
PDF 通常很大,可能無法在單一呼叫中傳送至生成式 AI 模型。透過分割 PDF,您可以將 PDF 資料儲存在模型可用的格式中,方便進行分析。
在 BigQuery 頁面的查詢編輯器中執行下列查詢,建立
chunk_pdf
UDF:-- This function chunks the product manual PDF into multiple parts. -- The function accepts an ObjectRefRuntime value for the PDF file and the chunk size. -- It then parses the PDF, chunks the contents, and returns an array of chunked text. CREATE OR REPLACE FUNCTION cymbal_pets.chunk_pdf(src_json STRING, chunk_size INT64, overlap_size INT64) RETURNS ARRAY<STRING> LANGUAGE python WITH CONNECTION `us.cymbal_conn` OPTIONS (entry_point='chunk_pdf', runtime_version='python-3.11', packages=['pypdf']) AS """ import io import json from pypdf import PdfReader # type: ignore from urllib.request import urlopen, Request def chunk_pdf(src_ref: str, chunk_size: int, overlap_size: int) -> str: src_json = json.loads(src_ref) srcUrl = src_json["access_urls"]["read_url"] req = urlopen(srcUrl) pdf_file = io.BytesIO(bytearray(req.read())) reader = PdfReader(pdf_file, strict=False) # extract and chunk text simultaneously all_text_chunks = [] curr_chunk = "" for page in reader.pages: page_text = page.extract_text() if page_text: curr_chunk += page_text # split the accumulated text into chunks of a specific size with overlaop # this loop implements a sliding window approach to create chunks while len(curr_chunk) >= chunk_size: split_idx = curr_chunk.rfind(" ", 0, chunk_size) if split_idx == -1: split_idx = chunk_size actual_chunk = curr_chunk[:split_idx] all_text_chunks.append(actual_chunk) overlap = curr_chunk[split_idx + 1 : split_idx + 1 + overlap_size] curr_chunk = overlap + curr_chunk[split_idx + 1 + overlap_size :] if curr_chunk: all_text_chunks.append(curr_chunk) return all_text_chunks """;
分析 PDF 資料
執行 chunk_pdf
函式,將 product_manuals
資料表中的 PDF 資料分割成多個區塊,然後建立 product_manual_chunk_strings
資料表,每列包含一個 PDF 區塊。在 product_manual_chunk_strings
資料上使用 Gemini 模型,以便摘要產品手冊中的法律資訊。
在 BigQuery 頁面的查詢編輯器中,執行下列查詢來建立
product_manual_chunk_strings
資料表:CREATE OR REPLACE TABLE cymbal_pets.product_manual_chunk_strings AS SELECT chunked FROM cymbal_pets.product_manuals, UNNEST (cymbal_pets.chunk_pdf( TO_JSON_STRING( OBJ.GET_ACCESS_URL(OBJ.MAKE_REF(uri, 'us.cymbal_conn'), 'r')), 1000, 100 )) as chunked;
在 BigQuery 頁面的查詢編輯器中,執行下列查詢,使用 Gemini 模型分析 PDF 資料:
SELECT ml_generate_text_llm_result FROM ML.GENERATE_TEXT( MODEL `cymbal_pets.gemini`, ( SELECT ( 'Can you summarize the product manual as bullet points? Highlight the legal clauses', chunked) AS prompt, FROM cymbal_pets.product_manual_chunk_strings ), STRUCT( TRUE AS FLATTEN_JSON_OUTPUT));
結果如下所示:
+-------------------------------------------------------------------------------------------------------------------------------------------+ | ml_generate_text_llm_result | +-------------------------------------------------------------------------------------------------------------------------------------------+ | ## CritterCuisine Pro 5000 Automatic Pet Feeder Manual Summary: | | | | **Safety:** | | | | * **Stability:** Place feeder on a level, stable surface to prevent tipping. | | * **Power Supply:** Only use the included AC adapter. Using an incompatible adapter can damage the unit and void the warranty. | | * **Cord Safety:** Keep the power cord out of reach of pets to prevent chewing or entanglement. | | * **Children:** Supervise children around the feeder. This is not a toy. | | * **Pet Health:** Consult your veterinarian before using an automatic feeder if your pet has special dietary needs, health conditions, or | +-------------------------------------------------------------------------------------------------------------------------------------------+ | ## Product Manual Summary: | | | | **6.3 Manual Feeding:** | | | | * Press MANUAL button to dispense a single portion (Meal 1 size). **(Meal Enabled)** | | | | **6.4 Recording a Voice Message:** | | | | * Press and hold VOICE button. | | * Speak clearly into the microphone (up to 10 seconds). | | * Release VOICE button to finish recording. | | * Briefly press VOICE button to play back the recording. | | * To disable the voice message, record a blank message (hold VOICE button for 10 seconds without speaking). **(Meal Enabled)** | | | | **6.5 Low Food Level Indicator:** | +-------------------------------------------------------------------------------------------------------------------------------------------+ | ... | +-------------------------------------------------------------------------------------------------------------------------------------------+
產生嵌入項目並執行向量搜尋
從圖片資料產生嵌入項目,然後使用向量搜尋功能,透過嵌入項目傳回類似圖片。
在實際工作情境中,建議您先建立向量索引,再執行向量搜尋。向量索引可讓您更快速地執行向量搜尋,但代價是降低喚回率,因此會傳回較為近似的結果。
在 BigQuery 頁面的查詢編輯器中執行下列查詢,建立
products_embeddings
資料表:CREATE OR REPLACE TABLE cymbal_pets.products_embedding AS SELECT product_id, ml_generate_embedding_result as embedding, content as image FROM ML.GENERATE_EMBEDDING( MODEL `cymbal_pets.embedding_model`, ( SELECT OBJ.GET_ACCESS_URL(image, 'r') as content, image, product_id FROM cymbal_pets.products_mm ), STRUCT () );
在 BigQuery 頁面的查詢編輯器中,執行以下查詢來執行向量搜尋,以便傳回與指定輸入圖片相似的產品圖片:
SELECT * FROM VECTOR_SEARCH( TABLE cymbal_pets.products_embedding, 'embedding', (SELECT ml_generate_embedding_result as embedding FROM ML.GENERATE_EMBEDDING( MODEL `cymbal_pets.embedding_model`, (SELECT OBJ.FETCH_METADATA(OBJ.MAKE_REF('gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/cozy-naps-cat-scratching-post-with-condo.png', 'us.cymbal_conn')) as content) )) );
結果如下所示:
+-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+ | query.embedding | base.product_id | base.embedding | base.image.uri | base.image.version | base.image.authorizer | base.image.details | distance | +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+ | -0.0112330541 | 181 | -0.0112330541 | gs://cloud-samples-data/bigquery/ | 12345678910 | myproject.region.myconnection | {"gcs_metadata":{"content_type": | 0.0 | | 0.0142525584 | | 0.0142525584 | tutorials/cymbal-pets/images/ | | | "image/png","md5_hash":"21234567hst16555w60j", | | | 0.0135886827 | | 0.0135886827 | cozy-naps-cat-scratching-post-with-condo.png | | | "size":828318,"updated":1742492688982000}} | | | 0.0149955815 | | 0.0149955815 | | | | | | | ... | | ... | | | | | | | | | | | | | | | | | | | | | | | | +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+ | -0.0112330541 | 187 | -0.0190353896 | gs://cloud-samples-data/bigquery/ | 23456789101 | myproject.region.myconnection | {"gcs_metadata":{"content_type": | 0.4216330832.. | | 0.0142525584 | | 0.0116206668 | tutorials/cymbal-pets/images/ | | | "image/png","md5_hash":"7328728fhakd9937djo4", | | | 0.0135886827 | | 0.0136198215 | cozy-naps-cat-scratching-post-with-bed.png | | | "size":860113,"updated":1742492688774000}} | | | 0.0149955815 | | 0.0173457414 | | | | | | | ... | | ... | | | | | | | | | | | | | | | | | | | | | | | | +---------C--------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+ | ... | ... | ... | ... | ... | ... | ... | ... | +-----------------+-----------------+----------------+----------------------------------------------+--------------------+-------------------------------+------------------------------------------------+----------------+
使用 ObjectRef
值陣列處理排序的多模態資料
本節說明如何完成下列工作:
- 重新建立
product_manuals
資料表,讓該資料表包含Crittercuisine 5000
產品手冊的 PDF 檔案,以及該手冊每個頁面的 PDF 檔案。 - 建立表格,將手冊與其章節對應。代表完整手冊的
ObjectRef
值會儲存在STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>>
欄中。代表手冊頁面的ObjectRef
值會儲存在ARRAY<STRUCT<uri STRING, version STRING, authorizer STRING, details JSON>>
資料欄中。 - 分析一組
ObjectRef
值,以便傳回單一產生的值。 - 分別分析
ObjectRef
值陣列,並針對每個陣列值傳回產生的值。
在分析工作中,您可以將 ObjectRef
值陣列轉換為 ObjectRefRuntime
值的排序清單,然後將該清單傳遞至 Gemini 模型,並指定 ObjectRefRuntime
值做為提示的一部分。ObjectRefRuntime
值會提供已簽署的網址,讓模型用於存取 Cloud Storage 中的物件資訊。
請按照下列步驟,使用 ObjectRef
值陣列處理排序的多模態資料:
前往「BigQuery」頁面
在查詢編輯器中執行下列查詢,重新建立
product_manuals
資料表:CREATE OR REPLACE EXTERNAL TABLE `cymbal_pets.product_manuals` WITH CONNECTION `us.cymbal_conn` OPTIONS ( object_metadata = 'SIMPLE', uris = [ 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*.pdf', 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/document_chunks/*.pdf']);
在查詢編輯器中執行下列查詢,將 PDF 資料寫入
map_manual_to_chunks
資料表:-- Extract the file and chunks into a single table. -- Store the chunks in the chunks column as array of ObjectRefs (ordered by page number) CREATE OR REPLACE TABLE cymbal_pets.map_manual_to_chunks AS SELECT ARRAY_AGG(m1.ref)[0] manual, ARRAY_AGG(m2.ref ORDER BY m2.ref.uri) chunks FROM cymbal_pets.product_manuals m1 JOIN cymbal_pets.product_manuals m2 ON REGEXP_EXTRACT(m1.uri, r'.*/([^.]*).[^/]+') = REGEXP_EXTRACT(m2.uri, r'.*/([^.]*)_page[0-9]+.[^/]+') GROUP BY m1.uri;
在查詢編輯器中執行下列查詢,即可查看
map_manual_to_chunks
資料表中的 PDF 資料:SELECT * FROM cymbal_pets.map_manual_to_chunks;
結果如下所示:
+-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+ | manual.uri | manual.version | manual.authorizer | manual.details | chunks.uri | chunks.version | chunks.authorizer | chunks.details | +-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+ | gs://cloud-samples-data/bigquery/ | 1742492785900455 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pef", | gs://cloud-samples-data/bigquery/ | 1745875761227129 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pdf", | | tutorials/cymbal-pets/documents/ | | | "md5_hash":"c9032b037693d15a33210d638c763d0e", | tutorials/cymbal-pets/documents/ | | | "md5_hash":"5a1116cce4978ec1b094d8e8b49a1d7c", | | crittercuisine_5000_user_manual.pdf | | | "size":566105,"updated":1742492785941000}} | crittercuisine_5000_user_manual_page1.pdf | | | "size":504583,"updated":1745875761266000}} | | | | | +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+ | | | | | crittercuisine_5000_user_manual_page1.pdf | 1745875760613874 | myproject.region.myconnection | {"gcs_metadata":{"content_type":"application/pdf", | | | | | | tutorials/cymbal-pets/documents/ | | | "md5_hash":"94d03ec65d28b173bc87eac7e587b325", | | | | | | crittercuisine_5000_user_manual_page2.pdf | | | "size":94622,"updated":1745875760649000}} | | | | | +-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+ | | | | | ... | ... | ... | ... | +-------------------------------------+--------------------------------+-----------------------------------+------------------------------------------------------+-------------------------------------------+---------------------------------+------------------------------------+-------------------------------------------------------+
在查詢編輯器中執行下列查詢,根據
ObjectRef
值陣列的分析結果,從 Gemini 模型產生單一回應:WITH manuals AS ( SELECT OBJ.GET_ACCESS_URL(manual, 'r') AS manual, ARRAY( SELECT OBJ.GET_ACCESS_URL(chunk, 'r') AS chunk FROM UNNEST(m1.chunks) AS chunk WITH OFFSET AS idx ORDER BY idx ) AS chunks FROM cymbal_pets.map_manual_to_chunks AS m1 ) SELECT ml_generate_text_llm_result AS Response FROM ML.GENERATE_TEXT( MODEL `cymbal_pets.gemini`, ( SELECT ( 'Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order', manuals.chunks) AS prompt, FROM manuals ), STRUCT(TRUE AS FLATTEN_JSON_OUTPUT));
結果如下所示:
+-------------------------------------------+ | Response | +-------------------------------------------+ | Page 1: This manual is for the | | CritterCuisine Pro 5000 automatic | | pet feeder. | | Page 2: The manual covers safety | | precautions, what's included, | | and product overview. | | Page 3: The manual covers assembly, | | initial setup, and programming the clock. | +-------------------------------------------+
在查詢編輯器中執行下列查詢,根據
ObjectRef
值陣列的分析結果,從 Gemini 模型產生多個回應:WITH input_chunked_objrefs AS ( SELECT row_id, offset, chunk_ref FROM ( SELECT ROW_NUMBER() OVER () AS row_id, * FROM `cymbal_pets.map_manual_to_chunks` ) AS indexed_table LEFT JOIN UNNEST(indexed_table.chunks) AS chunk_ref WITH OFFSET ), get_access_urls AS ( SELECT row_id, offset, chunk_ref, OBJ.GET_ACCESS_URL(chunk_ref, 'r') AS ObjectRefRuntime FROM input_chunked_objrefs ), valid_get_access_urls AS ( SELECT * FROM get_access_urls WHERE ObjectRefRuntime['runtime_errors'] IS NULL ), ordered_output_objrefruntime_array AS ( SELECT ARRAY_AGG(ObjectRefRuntime ORDER BY offset) AS ObjectRefRuntimeArray FROM valid_get_access_urls GROUP BY row_id ) SELECT page1_summary, page2_summary, page3_summary FROM AI.GENERATE_TABLE( MODEL `cymbal_pets.gemini`, ( SELECT ( 'Can you provide a page by page summary for the first 3 pages of the attached manual? Only write one line for each page. The pages are provided in serial order', ObjectRefRuntimeArray) AS prompt, FROM ordered_output_objrefruntime_array ), STRUCT( 'page1_summary STRING, page2_summary STRING, page3_summary STRING' AS output_schema));
結果如下所示:
+-----------------------------------------------+-------------------------------------------+----------------------------------------------------+ | page1_summary | page2_summary | page3_summary | +-----------------------------------------------+-------------------------------------------+----------------------------------------------------+ | This manual provides an overview of the | This section explains how to program | This page covers connecting the feeder to Wi-Fi | | CritterCuisine Pro 5000 automatic pet feeder, | the feeder's clock, set feeding | using the CritterCuisine Connect app, remote | | including its features, safety precautions, | schedules, copy and delete meal settings, | feeding, managing feeding schedules, viewing | | assembly instructions, and initial setup. | manually feed your pet, record | feeding logs, receiving low food alerts, | | | a voice message, and understand | updating firmware, creating multiple pet profiles, | | | the low food level indicator. | sharing access with other users, and cleaning | | | | and maintaining the feeder. | +-----------------------------------------------+-------------------------------------------+----------------------------------------------------+
清除所用資源
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.