Introduction to the BigQuery entity resolution framework
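This document describes the architecture of the BigQuery entity resolution framework. Entity resolution is the ability to match records across shared data where no common identifier exists, or to augment shared data by using an identity service from a Google Cloud partner.
This document is intended for entity resolution end users (hereafter referred to as end users) and identity providers. For implementation details, see Configure and use entity resolution in BigQuery.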
You can use BigQuery entity resolution on any data that you prepare before contributing it to a data clean room.
Entity resolution is available in both the on-demand and capacity pricing models, and in all BigQuery editions.
Benefits
As an end user, you can benefit from entity resolution in the following ways:
You can resolve entities in place without incurring data transfer fees, because a subscriber or Google Cloud partner matches your data against their identity table and writes the match results to a dataset in your project.
You don't need to manage extract, transform, and load (ETL) jobs.
As an identity provider, you can benefit from entity resolution in the following ways:
You can offer entity resolution as a managed software as a service (SaaS) offering on Google Cloud Marketplace.
You can use your proprietary identity graphs and match logic without revealing them to users.
Architecture
BigQuery implements entity resolution by using remote function calls that activate entity resolution processes in the identity provider's environment. Your data does not need to be copied or moved during this process.
The following diagram and explanation describe the workflow for entity resolution:
The end user grants the identity provider's service account read access to their input dataset and write access to their output dataset.
The user calls the remote function that matches their input data with the provider's identity graph data. Matching parameters are passed to the provider through the remote function.
The provider's service account reads the input dataset and processes it.
The provider's service account writes the entity resolution results to the user's output dataset.
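To make the four workflow steps concrete, the following Python sketch models them in memory. It is not BigQuery or IAM code: `Dataset`, `read_table`, `write_table`, `provider_resolve`, the service account name, and the table and column names are all hypothetical stand-ins, and the remote function call in the second step is modeled as a plain function call. What it illustrates is that the provider can read the input and write the output only because the end user granted those permissions first.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a principal lacks the dataset permission it needs."""


@dataclass
class Dataset:
    """In-memory stand-in for a BigQuery dataset and its access grants."""
    name: str
    tables: dict = field(default_factory=dict)  # table name -> list of row dicts
    readers: set = field(default_factory=set)   # principals with read access
    writers: set = field(default_factory=set)   # principals with write access


# Hypothetical service account of the identity provider.
PROVIDER_SA = "matcher@provider-project.iam.gserviceaccount.com"


def read_table(ds, table, principal):
    if principal not in ds.readers:
        raise AccessDenied(f"{principal} cannot read {ds.name}")
    return list(ds.tables[table])


def write_table(ds, table, rows, principal):
    if principal not in ds.writers:
        raise AccessDenied(f"{principal} cannot write {ds.name}")
    ds.tables[table] = list(rows)


def provider_resolve(input_ds, output_ds, params, identity_graph):
    """Runs in the provider's environment when the remote function is called."""
    # Step 3: the provider's service account reads and processes the input dataset.
    rows = read_table(input_ds, params["input_table"], PROVIDER_SA)
    key = params["match_column"]
    results = [
        {"record_id": r["record_id"],
         "entity_id": identity_graph.get(r[key].strip().lower())}
        for r in rows
    ]
    # Step 4: the provider's service account writes results to the user's output dataset.
    write_table(output_ds, params["output_table"], results, PROVIDER_SA)
    return len(results)


# Step 1: the end user grants read on the input dataset and write on the output dataset.
input_ds = Dataset("customer_input", tables={"customers": [
    {"record_id": "r1", "email": " Ana@Example.com"},
    {"record_id": "r2", "email": "zed@example.com"},
]})
output_ds = Dataset("match_output")
input_ds.readers.add(PROVIDER_SA)
output_ds.writers.add(PROVIDER_SA)

# Step 2: the user's remote function call passes the matching parameters to the provider.
graph = {"ana@example.com": "ENT-001"}  # hypothetical provider identity graph
written = provider_resolve(
    input_ds, output_ds,
    {"input_table": "customers", "output_table": "matches", "match_column": "email"},
    graph,
)
```

In this sketch, calling `provider_resolve` with an output dataset that has no write grant raises `AccessDenied`, which roughly mirrors a provider job failing when the end user has not granted the required access.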
The following sections describe the end-user components and the identity provider components.
End-user components
End-user components include the following:
Remote function call: a call that runs a procedure defined and implemented by the identity provider. This call starts the entity resolution process.
Input dataset: the source dataset that contains the data to be matched. Optionally, the dataset can contain a metadata table with additional parameters. The provider specifies the schema requirements for the input dataset.
Output dataset: the destination dataset where the provider stores the matched results as an output table. Optionally, the provider can also write a job status table containing details of the entity resolution job to this dataset. The output dataset can be the same as the input dataset.
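The input and output dataset contracts can be sketched in Python as follows, assuming a provider that publishes required column types for the input table, reads optional parameters from a key/value metadata table, and records each run in a job status table. `REQUIRED_INPUT_SCHEMA`, the `key`/`value` metadata layout, and the `JobStatus` fields are hypothetical; each provider defines its own requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical schema requirement published by a provider:
# column name -> expected BigQuery data type.
REQUIRED_INPUT_SCHEMA = {"record_id": "STRING", "email": "STRING"}


def check_input_schema(actual_schema, required=REQUIRED_INPUT_SCHEMA):
    """Return a list of problems; an empty list means the input table conforms."""
    problems = []
    for column, expected_type in required.items():
        if column not in actual_schema:
            problems.append(f"missing column {column}")
        elif actual_schema[column] != expected_type:
            problems.append(
                f"column {column} is {actual_schema[column]}, expected {expected_type}"
            )
    return problems


def read_metadata(metadata_rows):
    """Collapse the optional key/value metadata table into a parameter dict."""
    return {row["key"]: row["value"] for row in metadata_rows}


@dataclass
class JobStatus:
    """One row of the optional job status table written to the output dataset."""
    job_id: str
    state: str
    input_rows: int
    matched_rows: int
    finished_at: str


def make_status(job_id, input_rows, matched_rows, problems):
    """A run with schema problems is recorded as FAILED with no matched rows."""
    failed = bool(problems)
    return JobStatus(
        job_id=job_id,
        state="FAILED" if failed else "SUCCEEDED",
        input_rows=input_rows,
        matched_rows=0 if failed else matched_rows,
        finished_at=datetime.now(timezone.utc).isoformat(),
    )


# Example: an input table whose email column has the wrong type fails validation.
problems = check_input_schema({"record_id": "STRING", "email": "INT64"})
params = read_metadata([{"key": "match_column", "value": "email"}])
status = make_status("job-001", input_rows=10, matched_rows=7, problems=problems)
```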
Identity provider components
Identity provider components include the following:
Control plane: contains a BigQuery remote function that orchestrates the matching process. This function can be implemented as a Cloud Run job or a Cloud Run function. The control plane can also contain other services, such as authentication and authorization.
Data plane: contains the identity graph dataset and the stored procedure that implements the provider's matching logic. The stored procedure can be implemented as a SQL stored procedure or an Apache Spark stored procedure. The identity graph dataset contains the tables that end-user data is matched against.
Note: Identity graphs can also be stored in some external databases.
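The following Python sketch shows how a control plane and a data plane might fit together. `handle_remote_function_request` follows the JSON contract of BigQuery remote functions: BigQuery sends a request whose `calls` field holds one argument list per invocation, and the endpoint returns `replies` with exactly one value per call, or an `errorMessage`. Everything else is an assumption for illustration: the two-argument call layout (input table, output table), the in-memory `INPUT_TABLES` and `OUTPUT_TABLES`, and `match_procedure`, which stands in for the provider's SQL or Spark stored procedure by matching SHA-256 hashes of normalized email addresses. A real control plane would run on Cloud Run and start the stored procedure through BigQuery rather than calling it in process.

```python
import hashlib
import json


def _digest(email):
    """SHA-256 of a trimmed, lowercased email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Data plane (hypothetical): identity graph keyed by hashed email -> entity ID.
IDENTITY_GRAPH = {
    _digest("ana@example.com"): "ENT-001",
    _digest("budi@example.com"): "ENT-002",
}

# Hypothetical stand-ins for tables the provider's service account can access.
INPUT_TABLES = {"user_proj.input_ds.customers": ["Ana@Example.com ", "zed@example.com"]}
OUTPUT_TABLES = {}


def match_procedure(emails):
    """Stand-in for the provider's stored procedure: exact match on hashed email."""
    return [IDENTITY_GRAPH.get(_digest(e)) for e in emails]


def handle_remote_function_request(body):
    """Control plane entry point for a BigQuery remote function.

    The request's "calls" list holds one argument list per invocation; the
    response must contain one entry in "replies" per call, or an "errorMessage".
    """
    try:
        request = json.loads(body)
        replies = []
        for input_table, output_table in request["calls"]:
            emails = INPUT_TABLES[input_table]
            matches = match_procedure(emails)
            # Matched rows go to the output table, not back through the reply.
            OUTPUT_TABLES[output_table] = list(zip(emails, matches))
            matched = sum(m is not None for m in matches)
            replies.append(f"matched {matched} of {len(emails)} rows")
        return json.dumps({"replies": replies})
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing "calls" field, or an unknown table.
        return json.dumps({"errorMessage": f"bad request: {exc!r}"})


response = handle_remote_function_request(json.dumps({
    "requestId": "example-request",
    "calls": [["user_proj.input_ds.customers", "user_proj.output_ds.matches"]],
}))
```

In this sketch the reply carries only a short summary per call, while the matched rows are written to the output table, in line with the architecture above where results land in the user's output dataset.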
What's next
To learn how to use entity resolution in your project, see Configure and use entity resolution in BigQuery.
Last updated on 2025-08-17 UTC.