Stay organized with collections
Save and categorize content based on your preferences.
Explore and visualize data in BigQuery from within JupyterLab
This page shows you some examples of how to explore and visualize data
that is stored in BigQuery from within the JupyterLab interface
of your Vertex AI Workbench managed notebooks instance.
In this section, you write SQL directly in notebook cells and read data from
BigQuery into the Python notebook.
Magic commands that use a single or double percentage character (% or %%)
let you use minimal syntax to interact with BigQuery within the
notebook. The BigQuery client library for Python is automatically
installed in a managed notebooks instance. Behind the scenes, the %%bigquery magic
command uses the BigQuery client library for Python to run the
given query, convert the results to a pandas DataFrame, optionally save the
results to a variable, and then display the results.
Note: As of version 1.26.0 of the google-cloud-bigquery Python package,
the BigQuery Storage API
is used by default to download results from the %%bigquery magics.
To open a notebook file, select File > New >
Notebook.
In the Select Kernel dialog, select Python (Local), and then click
Select.
Your new IPYNB file opens.
To get the number of regions by country in the international_top_terms
dataset, enter the following statement:
Query complete after 0.07s: 100%|██████████| 4/4 [00:00<00:00, 1440.60query/s]
Downloading: 100%|██████████| 41/41 [00:02<00:00, 20.21rows/s]
country_code country_name num_regions
0 TR Turkey 81
1 TH Thailand 77
2 VN Vietnam 63
3 JP Japan 47
4 RO Romania 42
5 NG Nigeria 37
6 IN India 36
7 ID Indonesia 34
8 CO Colombia 33
9 MX Mexico 32
10 BR Brazil 27
11 EG Egypt 27
12 UA Ukraine 27
13 CH Switzerland 26
14 AR Argentina 24
15 FR France 22
16 SE Sweden 21
17 HU Hungary 20
18 IT Italy 20
19 PT Portugal 20
20 NO Norway 19
21 FI Finland 18
22 NZ New Zealand 17
23 PH Philippines 17
...
In the next cell (below the output from the previous cell), enter the
following command to run the same query, but this time save the results to
a new pandas DataFrame that's named regions_by_country. You provide that
name by using an argument with the %%bigquery magic command.
In the next cell, enter the following command to look at the first few
rows of the query results that you just read in:
regions_by_country.head()
Click play_circle_filledRun cell.
The pandas DataFrame regions_by_country is ready to plot.
Query data by using the BigQuery client library directly
<
Get a summary of data in a BigQuery table
In this section, you use a notebook shortcut to get summary statistics and
visualizations for all fields of a BigQuery table. This can
be a fast way to profile your data before exploring further.
The BigQuery client library provides a magic command,
%bigquery_stats, that you can call with a specific table name to provide an
overview of the table and detailed statistics on each of the table's
columns.
In the next cell, enter the following code to run that analysis on the US
top_terms table:
After running for some time, an image appears with various statistics on
each of the 7 variables in the top_terms table. The following image shows
part of some example output:
Visualize BigQuery data
In this section, you use plotting capabilities to visualize the results from
the queries that you previously ran in your Jupyter notebook.
In the next cell, enter the following code to use the pandas
DataFrame.plot() method to create a bar chart that visualizes the results
of the query that returns the number of regions by country:
In the next cell, enter the following code to use the pandas
DataFrame.plot() method to create a scatter plot that visualizes the
results from the query for the percentage of overlap in the top search terms
by days apart:
The chart is similar to the following. The size of each point reflects
the number of date pairs that are that many days apart in the data. For
example, there are more pairs that are 1 day apart than 30 days apart
because the top search terms are surfaced daily over about a month's time.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-28 UTC."],[],[],null,["# Explore and visualize data in BigQuery from within JupyterLab\n=============================================================\n\n\n| Vertex AI Workbench managed notebooks is\n| [deprecated](/vertex-ai/docs/deprecations). On\n| April 14, 2025, support for\n| managed notebooks will end and the ability to create managed notebooks instances\n| will be removed. Existing instances will continue to function\n| but patches, updates, and upgrades won't be available. To continue using\n| Vertex AI Workbench, we recommend that you\n| [migrate\n| your managed notebooks instances to Vertex AI Workbench instances](/vertex-ai/docs/workbench/managed/migrate-to-instances).\n\n\u003cbr /\u003e\n\nThis page shows you some examples of how to explore and visualize data\nthat is stored in BigQuery from within the JupyterLab interface\nof your Vertex AI Workbench managed notebooks instance.\n\nBefore you begin\n----------------\n\nIf you haven't already,\n[create\na managed notebooks instance](/vertex-ai/docs/workbench/managed/create-instance#create).\n\nOpen JupyterLab\n---------------\n\n1. In the Google Cloud console, go to the **Managed notebooks** page.\n\n [Go to Managed notebooks](https://console.cloud.google.com/vertex-ai/workbench/managed)\n2. Next to your managed notebooks instance's name,\n click **Open JupyterLab**.\n\n Your managed notebooks instance opens JupyterLab.\n\nRead data from BigQuery\n-----------------------\n\nIn the next two sections, you read data from BigQuery\nthat you will use to visualize later. These steps are identical to those\nin [Query data in BigQuery from\nwithin JupyterLab](/vertex-ai/docs/workbench/managed/bigquery), so if you've completed\nthem already, you can skip to\n[Get a summary of data in a BigQuery table](#summary).\n\n### Query data by using the %%bigquery magic command\n\nIn this section, you write SQL directly in notebook cells and read data from\nBigQuery into the Python notebook.\n\nMagic commands that use a single or double percentage character (`%` or `%%`)\nlet you use minimal syntax to interact with BigQuery within the\nnotebook. The BigQuery client library for Python is automatically\ninstalled in a managed notebooks instance. Behind the scenes, the `%%bigquery` magic\ncommand uses the BigQuery client library for Python to run the\ngiven query, convert the results to a pandas DataFrame, optionally save the\nresults to a variable, and then display the results.\n\n**Note** : As of version 1.26.0 of the `google-cloud-bigquery` Python package,\nthe [BigQuery Storage API](/bigquery/docs/reference/storage)\nis used by default to download results from the `%%bigquery` magics.\n\n1. To open a notebook file, select **File \\\u003e New \\\u003e\n Notebook**.\n\n \u003cbr /\u003e\n\n2. In the **Select Kernel** dialog, select **Python (Local)** , and then click\n **Select**.\n\n Your new IPYNB file opens.\n3. To get the number of regions by country in the `international_top_terms`\n dataset, enter the following statement:\n\n ```sql\n %%bigquery\n SELECT\n country_code,\n country_name,\n COUNT(DISTINCT region_code) AS num_regions\n FROM\n `bigquery-public-data.google_trends.international_top_terms`\n WHERE\n refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)\n GROUP BY\n country_code,\n country_name\n ORDER BY\n num_regions DESC;\n ```\n4. Click play_circle_filled **Run cell**.\n\n The output is similar to the following: \n\n ```\n Query complete after 0.07s: 100%|██████████| 4/4 [00:00\u003c00:00, 1440.60query/s]\n Downloading: 100%|██████████| 41/41 [00:02\u003c00:00, 20.21rows/s]\n country_code country_name num_regions\n 0 TR Turkey 81\n 1 TH Thailand 77\n 2 VN Vietnam 63\n 3 JP Japan 47\n 4 RO Romania 42\n 5 NG Nigeria 37\n 6 IN India 36\n 7 ID Indonesia 34\n 8 CO Colombia 33\n 9 MX Mexico 32\n 10 BR Brazil 27\n 11 EG Egypt 27\n 12 UA Ukraine 27\n 13 CH Switzerland 26\n 14 AR Argentina 24\n 15 FR France 22\n 16 SE Sweden 21\n 17 HU Hungary 20\n 18 IT Italy 20\n 19 PT Portugal 20\n 20 NO Norway 19\n 21 FI Finland 18\n 22 NZ New Zealand 17\n 23 PH Philippines 17\n ...\n ```\n | **Note:** Your results might differ from what is above as the `google_trends` dataset being queried is refreshed with new data on an ongoing basis.\n5. In the next cell (below the output from the previous cell), enter the\n following command to run the same query, but this time save the results to\n a new pandas DataFrame that's named `regions_by_country`. You provide that\n name by using an argument with the `%%bigquery` magic command.\n\n ```sql\n %%bigquery regions_by_country\n SELECT\n country_code,\n country_name,\n COUNT(DISTINCT region_code) AS num_regions\n FROM\n `bigquery-public-data.google_trends.international_top_terms`\n WHERE\n refresh_date = DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY)\n GROUP BY\n country_code, country_name\n ORDER BY\n num_regions DESC;\n ```\n\n **Note:** For more information about available arguments for the\n `%%bigquery` command, see the [client library magics documentation](/python/docs/reference/bigquery/latest/magics).\n6. Click play_circle_filled **Run cell**.\n\n7. In the next cell, enter the following command to look at the first few\n rows of the query results that you just read in:\n\n regions_by_country.head()\n\n8. Click play_circle_filled **Run cell**.\n\n The pandas DataFrame `regions_by_country` is ready to plot.\n\n### Query data by using the BigQuery client library directly\n\n\\\u003c\n\nGet a summary of data in a BigQuery table\n-----------------------------------------\n\n\nIn this section, you use a notebook shortcut to get summary statistics and\nvisualizations for all fields of a BigQuery table. This can\nbe a fast way to profile your data before exploring further.\n\nThe BigQuery client library provides a magic command,\n`%bigquery_stats`, that you can call with a specific table name to provide an\noverview of the table and detailed statistics on each of the table's\ncolumns.\n\n1. In the next cell, enter the following code to run that analysis on the US\n [`top_terms` table](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=google_trends&t=top_terms&page=table):\n\n %bigquery_stats bigquery-public-data.google_trends.top_terms\n\n2. Click play_circle_filled **Run cell**.\n\n After running for some time, an image appears with various statistics on\n each of the 7 variables in the `top_terms` table. The following image shows\n part of some example output:\n\n| **Note:** Your results might differ from what is above as the `google_trends` dataset being queried is refreshed with new data on an ongoing basis.\n\nVisualize BigQuery data\n-----------------------\n\nIn this section, you use plotting capabilities to visualize the results from\nthe queries that you previously ran in your Jupyter notebook.\n\n1. In the next cell, enter the following code to use the pandas\n `DataFrame.plot()` method to create a bar chart that visualizes the results\n of the query that returns the number of regions by country:\n\n regions_by_country.plot(kind=\"bar\", x=\"country_name\", y=\"num_regions\", figsize=(15, 10))\n\n2. Click play_circle_filled **Run cell**.\n\n The chart is similar to the following:\n\n3. In the next cell, enter the following code to use the pandas\n `DataFrame.plot()` method to create a scatter plot that visualizes the\n results from the query for the percentage of overlap in the top search terms\n by days apart:\n\n pct_overlap_terms_by_days_apart.plot(\n kind=\"scatter\",\n x=\"days_apart\",\n y=\"pct_overlap_terms\",\n s=len(pct_overlap_terms_by_days_apart[\"num_date_pairs\"]) * 20,\n figsize=(15, 10)\n )\n\n4. Click play_circle_filled **Run cell**.\n\n The chart is similar to the following. The size of each point reflects\n the number of date pairs that are that many days apart in the data. For\n example, there are more pairs that are 1 day apart than 30 days apart\n because the top search terms are surfaced daily over about a month's time.\n\nFor more information about data visualization, see the\n[pandas documentation](https://pandas.pydata.org/pandas-docs/stable/visualization.html)."]]