Manage BigQuery DataFrames sessions and I/O
This document explains how to manage sessions and perform input and output (I/O) operations when you use BigQuery DataFrames. You will learn how to create and use sessions, work with in-memory data, and read from and write to files and BigQuery tables.
BigQuery sessions
BigQuery DataFrames uses a local session object internally to manage metadata. Each DataFrame and Series object connects to a session, each session connects to a location, and each query in a session runs in the location where you created the session. Use the following code sample to manually create a session and use it for loading data:
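A minimal sketch of creating a session by hand, assuming the bigframes package is installed and credentials are configured. The function name, the project_id argument, and the choice of the public penguins table are illustrative placeholders, not part of the original text.

```python
def load_with_explicit_session(project_id: str, location: str = "US"):
    """Create a dedicated session and load data through it."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes
    import bigframes.pandas as bpd

    # Bind the session to one project and one location.
    context = bigframes.BigQueryOptions(project=project_id, location=location)
    session = bigframes.Session(context=context)

    # Load a BigQuery table through the session. The public penguins table
    # is stored in the US multi-region, which is why "US" is the default.
    table_df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")

    # Create a DataFrame from local data on the same session.
    local_df = bpd.DataFrame({"my_col": [1, 2, 3]}, session=session)
    return session, table_df, local_df
```

Both returned DataFrames belong to the same session, so they can be combined with each other.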
You can't combine data from multiple session instances, even if you initialize them with the same settings. The following code sample shows that trying to combine data from different session instances causes an error:
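A hedged sketch of that failure mode follows. The function name is hypothetical, and the exception type (ValueError) is an assumption about the library's behavior; check the error raised by your installed version.

```python
def combine_across_sessions(project_id: str, location: str = "US") -> str:
    """Try to add Series from two sessions and report what happens."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes
    import bigframes.pandas as bpd

    context = bigframes.BigQueryOptions(project=project_id, location=location)

    # Two sessions built from identical options are still distinct sessions.
    session1 = bigframes.Session(context=context)
    session2 = bigframes.Session(context=context)

    series1 = bpd.Series([1, 2, 3], session=session1)
    series2 = bpd.Series([1, 2, 3], session=session2)

    try:
        series1 + series2
    except ValueError as exc:  # Assumed exception type.
        return str(exc)
    return "no error raised"
```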
Global session
BigQuery DataFrames provides a default global session that you can access with the bigframes.pandas.get_global_session() method. In Colab, you must provide a project ID for the bigframes.pandas.options.bigquery.project attribute before you use the global session. You can also set a location with the bigframes.pandas.options.bigquery.location attribute, which defaults to the US multi-region.
The following code sample shows how to set options for the global session:
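A minimal sketch of setting both options, assuming it runs before the global session has been used. The function name and its arguments are placeholders.

```python
def configure_global_session(project_id: str, location: str = "US") -> None:
    """Set the project and location used by the global session."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    # Project that the global session runs queries in.
    bpd.options.bigquery.project = project_id

    # Location for every query in the global session.
    bpd.options.bigquery.location = location
```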
To reset the global session's location or project, close the current session by running the bigframes.pandas.close_session() method.
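As a sketch of that reset, the hypothetical helper below closes the global session and then assigns a new location. It assumes that options become writable again once the session is closed.

```python
def switch_global_location(new_location: str) -> None:
    """Close the global session, then point it at a different location."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    # Objects created on the old session should not be reused afterwards.
    bpd.close_session()

    # The next operation that needs a session starts one in this location.
    bpd.options.bigquery.location = new_location
```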
Many BigQuery DataFrames built-in functions use the global session by default. The following code sample shows how built-in functions use the global session:
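The sketch below illustrates the equivalence, assuming the global session is already configured. The function name is a placeholder.

```python
def read_penguins_two_ways():
    """Read the same table through the module and through the session."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    table = "bigquery-public-data.ml_datasets.penguins"

    # Module-level function: uses the global session implicitly.
    df_implicit = bpd.read_gbq(table)

    # Same call made explicitly on the global session object.
    df_explicit = bpd.get_global_session().read_gbq(table)
    return df_implicit, df_explicit
```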
In-memory data
You can create DataFrame and Series objects with built-in Python or NumPy data structures, similar to how you create objects with pandas. Use the following code sample to create an object:
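A minimal sketch, assuming NumPy and bigframes are installed and the global session is configured. The column names are illustrative.

```python
def build_in_memory_objects():
    """Create Series and DataFrame objects from local Python and NumPy data."""
    # Imported lazily so that defining this function needs no credentials.
    import numpy as np
    import bigframes.pandas as bpd

    # Series from a Python list.
    s_from_list = bpd.Series([1, 2, 3])

    # DataFrame from a Python dict of columns.
    df_from_dict = bpd.DataFrame({"col_1": [1, 2, 3], "col_2": [4, 5, 6]})

    # Series from a NumPy array.
    s_from_numpy = bpd.Series(np.arange(10))
    return s_from_list, df_from_dict, s_from_numpy
```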
To convert pandas objects to BigQuery DataFrames objects by using the read_pandas() method or the constructors, use the following code sample:
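The sketch below shows both conversion routes for a DataFrame and a Series. The sample values and the function name are made up for illustration.

```python
def pandas_to_bigframes():
    """Convert pandas objects with read_pandas() and with constructors."""
    # Imported lazily so that defining this function needs no credentials.
    import pandas as pd
    import bigframes.pandas as bpd

    pd_df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
    pd_s = pd.Series([1, 2, 3])

    # Route 1: the read_pandas() function.
    df_from_reader = bpd.read_pandas(pd_df)
    s_from_reader = bpd.read_pandas(pd_s)

    # Route 2: the DataFrame and Series constructors.
    df_from_ctor = bpd.DataFrame(pd_df)
    s_from_ctor = bpd.Series(pd_s)
    return df_from_reader, s_from_reader, df_from_ctor, s_from_ctor
```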
To load BigQuery DataFrames data into local memory with the to_pandas() method, use the following code sample:
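A short sketch of the download step, using a small locally created DataFrame as a stand-in for real data. The function name is a placeholder.

```python
def bigframes_to_pandas():
    """Download a BigQuery DataFrames object as a pandas DataFrame."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    bf_df = bpd.DataFrame({"my_col": [1, 2, 3]})

    # Materializes the data in local memory as a pandas.DataFrame.
    pd_df = bf_df.to_pandas()
    return pd_df
```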
Cost estimation with the dry_run parameter
Loading a large amount of data can take a lot of time and resources. To see how much data is being processed, use the dry_run=True parameter in the to_pandas() call. Use the following code sample to perform a dry run:
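A hedged sketch of a dry run follows. It assumes that the dry run returns a statistics object with a totalBytesProcessed entry; verify the key name against your installed version. The helper bytes_to_gib and both function names are hypothetical.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes for easier reading."""
    return num_bytes / 2**30


def estimate_download_bytes(
    table_id: str = "bigquery-public-data.ml_datasets.penguins",
) -> int:
    """Return the bytes a to_pandas() call would process, without running it."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    df = bpd.read_gbq(table_id)

    # dry_run=True returns statistics instead of downloading the data.
    stats = df.to_pandas(dry_run=True)

    # Assumed key name for the processed-bytes statistic.
    return int(stats["totalBytesProcessed"])
```

For example, bytes_to_gib(estimate_download_bytes()) gives the estimate in GiB.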
Read and write files
You can read data from compatible files into BigQuery DataFrames. These files can be on your local machine or in Cloud Storage. Use the following code sample to read data from a CSV file:
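A minimal sketch covering both sources. The function name and both path arguments are placeholders; pass a local file path and a gs:// URI that you can read.

```python
def read_csv_examples(local_path: str, gcs_uri: str):
    """Read one CSV file from local disk and one from Cloud Storage."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    # Local file, for example "data.csv".
    local_df = bpd.read_csv(local_path)

    # Cloud Storage object, for example "gs://my-bucket/data.csv".
    gcs_df = bpd.read_csv(gcs_uri)
    return local_df, gcs_df
```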
To save a BigQuery DataFrames DataFrame to a local file or to Cloud Storage by using the to_csv() method, use the following code sample:
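The sketch below writes one DataFrame to both destinations. It assumes that Cloud Storage paths need a * wildcard so the export can be split into several files, and that local paths are supported by the installed version. The function name, file name pattern, and bucket argument are placeholders.

```python
def write_csv_examples(df, local_path: str, bucket: str) -> None:
    """Write a DataFrame to a local CSV file and to Cloud Storage."""
    # Local file on the machine that runs this code.
    df.to_csv(local_path)

    # Cloud Storage export; the wildcard is replaced per output file.
    df.to_csv(f"gs://{bucket}/myfile*.csv")
```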
Read and write BigQuery tables
To create BigQuery DataFrames using BigQuery table references and the bigframes.pandas.read_gbq() function, use the following code sample:
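A minimal sketch of reading by table reference, using a public dataset as the default. The function name is a placeholder.

```python
def read_table(table_id: str = "bigquery-public-data.ml_datasets.penguins"):
    """Create a DataFrame from a table reference (project.dataset.table)."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    return bpd.read_gbq(table_id)
```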
To use a SQL string with the read_gbq() function to read data into BigQuery DataFrames, use the following code sample:
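A sketch of the SQL route follows. The query text, the constant name, and the function name are illustrative; the column names are assumed to match the public penguins table.

```python
# Illustrative query against a public dataset.
PENGUIN_SQL = """
SELECT species, island, body_mass_g
FROM `bigquery-public-data.ml_datasets.penguins`
WHERE sex = 'FEMALE'
"""


def read_from_sql(sql: str = PENGUIN_SQL):
    """Run a SQL query and return the result as a DataFrame."""
    # Imported lazily so that defining this function needs no credentials.
    import bigframes.pandas as bpd

    return bpd.read_gbq(sql)
```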
To save your DataFrame object to a BigQuery table, use the to_gbq() method of your DataFrame object. The following code sample shows how to do that:
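A minimal sketch, assuming you have write access to the destination dataset. The function name and the destination_table argument are placeholders, and the choice of if_exists="replace" is only one option; it overwrites an existing table.

```python
def save_to_table(df, destination_table: str) -> None:
    """Write a DataFrame to a table such as "my-project.my_dataset.my_table"."""
    # "replace" overwrites an existing table; other modes fail or append.
    df.to_gbq(destination_table, if_exists="replace")
```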
What's next
- Learn how to use BigQuery DataFrames.
- Learn how to work with data types in BigQuery DataFrames.
- Learn how to visualize graphs using BigQuery DataFrames.
- Explore the BigQuery DataFrames API reference.