Tetap teratur dengan koleksi
Simpan dan kategorikan konten berdasarkan preferensi Anda.
Menjelajahi dan memvisualisasikan data di BigQuery dari dalam JupyterLab
Halaman ini menunjukkan beberapa contoh cara menjelajahi dan memvisualisasikan data
yang disimpan di BigQuery dari dalam antarmuka JupyterLab
dari instance notebook terkelola Vertex AI Workbench Anda.
Membuat kueri data menggunakan perintah magic %%bigquery
Di bagian ini, Anda akan menulis SQL langsung di sel notebook dan membaca data dari BigQuery ke dalam notebook Python.
Perintah magic yang menggunakan karakter persentase tunggal atau ganda (% atau %%) memungkinkan Anda menggunakan sintaksis minimal untuk berinteraksi dengan BigQuery dalam notebook. Library klien BigQuery untuk Python otomatis diinstal di instance notebook terkelola. Di balik layar, perintah magic %%bigquery menggunakan library klien BigQuery agar Python dapat menjalankan kueri yang diberikan, mengonversi hasilnya menjadi DataFrame pandas, menyimpan hasil ke variabel secara opsional, dan kemudian menampilkan hasil tersebut.
Catatan: Mulai versi 1.26.0 paket Python google-cloud-bigquery, BigQuery Storage API digunakan secara default untuk mendownload hasil dari perintah magic %%bigquery.
Untuk membuka file notebook, pilih File > New > Notebook.
Dalam dialog Select Kernel, pilih Python (Local), lalu klik Select.
File IPYNB baru Anda akan terbuka.
Untuk mendapatkan jumlah region berdasarkan negara dalam set data international_top_terms, masukkan pernyataan berikut:
Query complete after 0.07s: 100%|██████████| 4/4 [00:00<00:00, 1440.60query/s]
Downloading: 100%|██████████| 41/41 [00:02<00:00, 20.21rows/s]
country_code country_name num_regions
0 TR Turkey 81
1 TH Thailand 77
2 VN Vietnam 63
3 JP Japan 47
4 RO Romania 42
5 NG Nigeria 37
6 IN India 36
7 ID Indonesia 34
8 CO Colombia 33
9 MX Mexico 32
10 BR Brazil 27
11 EG Egypt 27
12 UA Ukraine 27
13 CH Switzerland 26
14 AR Argentina 24
15 FR France 22
16 SE Sweden 21
17 HU Hungary 20
18 IT Italy 20
19 PT Portugal 20
20 NO Norway 19
21 FI Finland 18
22 NZ New Zealand 17
23 PH Philippines 17
...
Pada sel berikutnya (di bawah output dari sel sebelumnya), masukkan perintah berikut untuk menjalankan kueri yang sama, tetapi kali ini simpan hasilnya ke DataFrame pandas baru yang bernama regions_by_country. Anda memberikan nama tersebut menggunakan argumen dengan perintah magic %%bigquery.
Membuat kueri data dengan langsung menggunakan library klien BigQuery
<
Mendapatkan ringkasan data dalam tabel BigQuery
Di bagian ini, Anda akan menggunakan pintasan notebook guna mendapatkan ringkasan statistik dan
visualisasi untuk semua kolom tabel BigQuery. Ini bisa
menjadi cara cepat untuk membuat profil data Anda sebelum mempelajarinya lebih lanjut.
Library klien BigQuery menyediakan perintah magic,
%bigquery_stats, yang dapat Anda panggil dengan nama tabel tertentu untuk memberikan
ringkasan tabel dan statistik mendetail dari setiap kolom
tabel.
Di sel berikutnya, masukkan kode berikut untuk menjalankan analisis tersebut di
tabel top_terms AS:
Setelah berjalan selama beberapa waktu, gambar akan muncul dengan berbagai statistik untuk
masing-masing dari 7 variabel dalam tabel top_terms. Gambar berikut menunjukkan
bagian dari beberapa contoh output:
Memvisualisasikan data BigQuery
Di bagian ini, Anda akan menggunakan kemampuan plot untuk memvisualisasikan hasil dari
kueri yang sebelumnya Anda jalankan di notebook Jupyter.
Di sel berikutnya, masukkan kode berikut untuk menggunakan metode DataFrame.plot() pandas
untuk membuat diagram batang yang memvisualisasikan hasil
kueri yang menampilkan jumlah region berdasarkan negara:
Di sel berikutnya, masukkan kode berikut untuk menggunakan metode
DataFrame.plot() pandas guna membuat diagram sebar yang memvisualisasikan
hasil dari kueri untuk persentase tumpang tindih dengan istilah penelusuran teratas
dari beberapa hari sebelumnya:
Outputnya serupa dengan yang berikut ini: Ukuran setiap titik mencerminkan
jumlah pasangan tanggal yang berjarak beberapa hari dalam data. Misalnya,
ada lebih banyak pasangan yang berjarak 1 hari daripada yang berjarak 30 hari
karena istilah penelusuran teratas muncul setiap hari selama sekitar satu bulan.
Untuk mengetahui informasi selengkapnya tentang visualisasi data, lihat
dokumentasi pandas.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-02 UTC."],[],[],null,["# Explore and visualize data in BigQuery from within JupyterLab\n=============================================================\n\n\n| Vertex AI Workbench managed notebooks is\n| [deprecated](/vertex-ai/docs/deprecations). On\n| April 14, 2025, support for\n| managed notebooks will end and the ability to create managed notebooks instances\n| will be removed. Existing instances will continue to function\n| but patches, updates, and upgrades won't be available. To continue using\n| Vertex AI Workbench, we recommend that you\n| [migrate\n| your managed notebooks instances to Vertex AI Workbench instances](/vertex-ai/docs/workbench/managed/migrate-to-instances).\n\n\u003cbr /\u003e\n\nThis page shows you some examples of how to explore and visualize data\nthat is stored in BigQuery from within the JupyterLab interface\nof your Vertex AI Workbench managed notebooks instance.\n\nBefore you begin\n----------------\n\nIf you haven't already,\n[create\na managed notebooks instance](/vertex-ai/docs/workbench/managed/create-instance#create).\n\nOpen JupyterLab\n---------------\n\n1. In the Google Cloud console, go to the **Managed notebooks** page.\n\n [Go to Managed notebooks](https://console.cloud.google.com/vertex-ai/workbench/managed)\n2. Next to your managed notebooks instance's name,\n click **Open JupyterLab**.\n\n Your managed notebooks instance opens JupyterLab.\n\nRead data from BigQuery\n-----------------------\n\nIn the next two sections, you read data from BigQuery\nthat you will use to visualize later. These steps are identical to those\nin [Query data in BigQuery from\nwithin JupyterLab](/vertex-ai/docs/workbench/managed/bigquery), so if you've completed\nthem already, you can skip to\n[Get a summary of data in a BigQuery table](#summary).\n\n### Query data by using the %%bigquery magic command\n\nIn this section, you write SQL directly in notebook cells and read data from\nBigQuery into the Python notebook.\n\nMagic commands that use a single or double percentage character (`%` or `%%`)\nlet you use minimal syntax to interact with BigQuery within the\nnotebook. The BigQuery client library for Python is automatically\ninstalled in a managed notebooks instance. Behind the scenes, the `%%bigquery` magic\ncommand uses the BigQuery client library for Python to run the\ngiven query, convert the results to a pandas DataFrame, optionally save the\nresults to a variable, and then display the results.\n\n**Note** : As of version 1.26.0 of the `google-cloud-bigquery` Python package,\nthe [BigQuery Storage API](/bigquery/docs/reference/storage)\nis used by default to download results from the `%%bigquery` magics.\n\n1. To open a notebook file, select **File \\\u003e New \\\u003e\n Notebook**.\n\n \u003cbr /\u003e\n\n2. In the **Select Kernel** dialog, select **Python (Local)** , and then click\n **Select**.\n\n Your new IPYNB file opens.\n3. To get the number of regions by country in the `international_top_terms`\n dataset, enter the following statement:\n\n ```sql\n %%bigquery\n SELECT\n country_code,\n country_name,\n COUNT(DISTINCT region_code) AS num_regions\n FROM\n `bigquery-public-data.google_trends.international_top_terms`\n WHERE\n refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)\n GROUP BY\n country_code,\n country_name\n ORDER BY\n num_regions DESC;\n ```\n4. Click play_circle_filled **Run cell**.\n\n The output is similar to the following: \n\n ```\n Query complete after 0.07s: 100%|██████████| 4/4 [00:00\u003c00:00, 1440.60query/s]\n Downloading: 100%|██████████| 41/41 [00:02\u003c00:00, 20.21rows/s]\n country_code country_name num_regions\n 0 TR Turkey 81\n 1 TH Thailand 77\n 2 VN Vietnam 63\n 3 JP Japan 47\n 4 RO Romania 42\n 5 NG Nigeria 37\n 6 IN India 36\n 7 ID Indonesia 34\n 8 CO Colombia 33\n 9 MX Mexico 32\n 10 BR Brazil 27\n 11 EG Egypt 27\n 12 UA Ukraine 27\n 13 CH Switzerland 26\n 14 AR Argentina 24\n 15 FR France 22\n 16 SE Sweden 21\n 17 HU Hungary 20\n 18 IT Italy 20\n 19 PT Portugal 20\n 20 NO Norway 19\n 21 FI Finland 18\n 22 NZ New Zealand 17\n 23 PH Philippines 17\n ...\n ```\n | **Note:** Your results might differ from what is above as the `google_trends` dataset being queried is refreshed with new data on an ongoing basis.\n5. In the next cell (below the output from the previous cell), enter the\n following command to run the same query, but this time save the results to\n a new pandas DataFrame that's named `regions_by_country`. You provide that\n name by using an argument with the `%%bigquery` magic command.\n\n ```sql\n %%bigquery regions_by_country\n SELECT\n country_code,\n country_name,\n COUNT(DISTINCT region_code) AS num_regions\n FROM\n `bigquery-public-data.google_trends.international_top_terms`\n WHERE\n refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)\n GROUP BY\n country_code, country_name\n ORDER BY\n num_regions DESC;\n ```\n\n **Note:** For more information about available arguments for the\n `%%bigquery` command, see the [client library magics documentation](/python/docs/reference/bigquery/latest/magics).\n6. Click play_circle_filled **Run cell**.\n\n7. In the next cell, enter the following command to look at the first few\n rows of the query results that you just read in:\n\n regions_by_country.head()\n\n8. Click play_circle_filled **Run cell**.\n\n The pandas DataFrame `regions_by_country` is ready to plot.\n\n### Query data by using the BigQuery client library directly\n\n\\\u003c\n\nGet a summary of data in a BigQuery table\n-----------------------------------------\n\n\nIn this section, you use a notebook shortcut to get summary statistics and\nvisualizations for all fields of a BigQuery table. This can\nbe a fast way to profile your data before exploring further.\n\nThe BigQuery client library provides a magic command,\n`%bigquery_stats`, that you can call with a specific table name to provide an\noverview of the table and detailed statistics on each of the table's\ncolumns.\n\n1. In the next cell, enter the following code to run that analysis on the US\n [`top_terms` table](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=google_trends&t=top_terms&page=table):\n\n %bigquery_stats bigquery-public-data.google_trends.top_terms\n\n2. Click play_circle_filled **Run cell**.\n\n After running for some time, an image appears with various statistics on\n each of the 7 variables in the `top_terms` table. The following image shows\n part of some example output:\n\n| **Note:** Your results might differ from what is above as the `google_trends` dataset being queried is refreshed with new data on an ongoing basis.\n\nVisualize BigQuery data\n-----------------------\n\nIn this section, you use plotting capabilities to visualize the results from\nthe queries that you previously ran in your Jupyter notebook.\n\n1. In the next cell, enter the following code to use the pandas\n `DataFrame.plot()` method to create a bar chart that visualizes the results\n of the query that returns the number of regions by country:\n\n regions_by_country.plot(kind=\"bar\", x=\"country_name\", y=\"num_regions\", figsize=(15, 10))\n\n2. Click play_circle_filled **Run cell**.\n\n The chart is similar to the following:\n\n3. In the next cell, enter the following code to use the pandas\n `DataFrame.plot()` method to create a scatter plot that visualizes the\n results from the query for the percentage of overlap in the top search terms\n by days apart:\n\n pct_overlap_terms_by_days_apart.plot(\n kind=\"scatter\",\n x=\"days_apart\",\n y=\"pct_overlap_terms\",\n s=len(pct_overlap_terms_by_days_apart[\"num_date_pairs\"]) * 20,\n figsize=(15, 10)\n )\n\n4. Click play_circle_filled **Run cell**.\n\n The chart is similar to the following. The size of each point reflects\n the number of date pairs that are that many days apart in the data. For\n example, there are more pairs that are 1 day apart than 30 days apart\n because the top search terms are surfaced daily over about a month's time.\n\nFor more information about data visualization, see the\n[pandas documentation](https://pandas.pydata.org/pandas-docs/stable/visualization.html)."]]