Use the BigQuery DataFrames data type system
The BigQuery DataFrames data type system is built upon BigQuery data types. This design ensures seamless integration and alignment with the Google Cloud data warehouse, reflecting the built-in types used for data storage in BigQuery.
Type mappings
The following table shows data type equivalents in BigQuery, BigQuery DataFrames, and other Python libraries as well as their levels of support:
Data type | BigQuery | BigQuery DataFrames | Python built-in | PyArrow |
---|---|---|---|---|
Boolean | BOOL | pandas.BooleanDtype() | bool | bool_() |
Integer | INT64 | pandas.Int64Dtype() | int | int64() |
Float | FLOAT64 | pandas.Float64Dtype() | float | float64() |
String | STRING | pandas.StringDtype(storage="pyarrow") | str | string() |
Bytes | BYTES | pandas.ArrowDtype(pyarrow.binary()) | bytes | binary() |
Date | DATE | pandas.ArrowDtype(pyarrow.date32()) | datetime.date | date32() |
Time | TIME | pandas.ArrowDtype(pyarrow.time64("us")) | datetime.time | time64("us") |
Datetime | DATETIME | pandas.ArrowDtype(pyarrow.timestamp("us")) | datetime.datetime | timestamp("us") |
Timestamp | TIMESTAMP | pandas.ArrowDtype(pyarrow.timestamp("us", tz="UTC")) | datetime.datetime with time zone | timestamp("us", tz="UTC") |
Numeric | NUMERIC | pandas.ArrowDtype(pyarrow.decimal128(38, 9)) | decimal.Decimal | decimal128(38, 9) |
Big numeric | BIGNUMERIC | pandas.ArrowDtype(pyarrow.decimal256(76, 38)) | decimal.Decimal | decimal256(76, 38) |
List | ARRAY<T> | pandas.ArrowDtype(pyarrow.list_(T)) | list[T] | list_(T) |
Struct | STRUCT | pandas.ArrowDtype(pyarrow.struct()) | dict | struct() |
JSON | JSON | pandas.ArrowDtype(pyarrow.json_(pyarrow.string())) in pandas version 3.0 or later and PyArrow version 19.0 or later; otherwise, JSON columns are exposed as pandas.ArrowDtype(db_dtypes.JSONArrowType()). This feature is in Preview. | Not supported | json_() (Preview) |
Geography | GEOGRAPHY | geopandas.array.GeometryDtype(). Supported by to_pandas() only. | Not supported | Not supported |
Timedelta | Not supported | pandas.ArrowDtype(pyarrow.duration("us")) | datetime.timedelta | duration("us") |
Type conversions
When used with local data, BigQuery DataFrames converts data types to their corresponding BigQuery DataFrames equivalents wherever a type mapping is defined, as shown in the following example:
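The sketch below is illustrative; the column names and values are placeholders, and the printed dtypes follow the mappings in the preceding table:

```python
import datetime

import bigframes.pandas as bpd

# Local Python values are converted to the corresponding BigQuery DataFrames
# dtypes wherever a mapping is defined:
# int -> Int64, str -> string[pyarrow], datetime.date -> date32[day][pyarrow].
df = bpd.DataFrame(
    {
        "int_col": [1, 2, 3],
        "string_col": ["a", "b", "c"],
        "date_col": [
            datetime.date(2025, 1, 1),
            datetime.date(2025, 1, 2),
            datetime.date(2025, 1, 3),
        ],
    }
)
print(df.dtypes)
```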
PyArrow dictates behavior when there are discrepancies between the data type equivalents. In rare cases when the Python built-in type functions differently from its PyArrow counterpart, BigQuery DataFrames generally favors the PyArrow behavior to ensure consistency.
The following code sample uses the datetime.date + timedelta operation to show that, unlike the Python datetime library, which returns a date instance, BigQuery DataFrames follows the PyArrow behavior and returns a timestamp instance:
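A minimal sketch of that comparison, assuming the Series constructor accepts datetime.date values and pandas Timedelta scalars; the exact printed dtype string can vary by version:

```python
import datetime

import pandas as pd
import bigframes.pandas as bpd

# Python's datetime library keeps the result as a date.
py_result = datetime.date(2025, 1, 1) + datetime.timedelta(days=1)
print(type(py_result))  # <class 'datetime.date'>

# BigQuery DataFrames follows PyArrow: date + duration produces a timestamp.
s = bpd.Series([datetime.date(2025, 1, 1)])
bf_result = s + pd.Timedelta(days=1)
print(bf_result.dtype)  # expected: timestamp[us][pyarrow]
```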
Special types
The following sections describe the special data types that BigQuery DataFrames uses.
JSON
Within BigQuery DataFrames, columns using the BigQuery JSON format (a lightweight standard) are represented by pandas.ArrowDtype. The exact underlying Arrow type depends on your library versions. Older environments typically use db_dtypes.JSONArrowType() for compatibility, which is an Arrow extension type that acts as a light wrapper around pa.string(). In contrast, newer setups (pandas 3.0 and later and PyArrow 19.0 and later) use the more recent pa.json_(pa.string()) representation.
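The following is an illustrative sketch of that version dependency, not the library's internal logic; it assumes the packaging and db_dtypes packages are available:

```python
import pandas as pd
import pyarrow as pa
from packaging.version import Version

# Pick the Arrow-backed dtype that JSON columns surface as,
# based on the installed pandas and PyArrow versions.
if Version(pd.__version__) >= Version("3.0") and Version(pa.__version__) >= Version("19.0"):
    json_dtype = pd.ArrowDtype(pa.json_(pa.string()))
else:
    import db_dtypes  # extension type that wraps pa.string()

    json_dtype = pd.ArrowDtype(db_dtypes.JSONArrowType())

print(json_dtype)
```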
timedelta
The timedelta
type lacks a direct equivalent within the
BigQuery native type system. To manage duration data,
BigQuery DataFrames utilizes the INT64
type as the underlying storage
format in BigQuery tables. You can expect the results of your computations to be consistent with the behavior of equivalent operations in the pandas library.
You can directly load timedelta values into BigQuery DataFrames DataFrame and Series objects, as shown in the following example:
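An illustrative sketch; the values are arbitrary, and the printed dtype string can vary slightly across versions:

```python
import pandas as pd
import bigframes.pandas as bpd

# Timedelta values are stored with the duration[us][pyarrow] dtype.
s = bpd.Series(
    [pd.Timedelta(hours=1), pd.Timedelta(minutes=30), pd.Timedelta(seconds=15)]
)
print(s.dtype)
print(s)
```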
Unlike pandas, BigQuery DataFrames only supports timedelta
values with
microsecond precision. If your data includes nanoseconds, you must round them to
avoid potential exceptions, as shown in the following example:
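One way to do the rounding, sketched here with pandas Timedelta.round before loading; the value is a placeholder:

```python
import pandas as pd
import bigframes.pandas as bpd

# This value has sub-microsecond (nanosecond) precision.
value = pd.Timedelta(1500, unit="ns")

# Round to microsecond precision before loading; loading the raw
# nanosecond value could raise an exception.
s = bpd.Series([value.round("us")])
print(s)
```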
You can use the bigframes.pandas.to_timedelta
function to cast a
BigQuery DataFrames Series
object to the timedelta
type, as shown
in the following example:
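A minimal sketch, assuming a unit argument of seconds; the input values are placeholders:

```python
import bigframes.pandas as bpd

# Interpret the integer values as a number of seconds and cast them to timedelta.
s = bpd.Series([1, 2, 3])
durations = bpd.to_timedelta(s, unit="s")
print(durations.dtype)  # expected: duration[us][pyarrow]
```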
When you load data containing timedelta
values to a BigQuery table, the
values are converted to microseconds and stored in INT64
columns. To
preserve the type information, BigQuery DataFrames appends the
#microseconds
string to the descriptions of these columns. Some operations,
such as SQL query executions and UDF invocations, don't preserve column
descriptions, and the timedelta
type information is lost after these
operations are completed.
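For example, a sketch of writing a timedelta column to a table; the destination table ID is a placeholder:

```python
import bigframes.pandas as bpd

df = bpd.DataFrame({"duration": bpd.to_timedelta(bpd.Series([1, 2]), unit="s")})

# The duration column is written as an INT64 column holding microseconds,
# and "#microseconds" is appended to the column description so that the
# timedelta type can be recovered when the table is read back.
df.to_gbq("your-project.your_dataset.durations")  # placeholder destination table
```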
Tools for composite types
For certain composite types, BigQuery DataFrames provides tools that let you access and process the elemental values within those types.
List accessor
The ListAccessor
object can help you perform operations on each list element
by using the list property of the Series
object, as shown in the
following example:
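A minimal sketch, assuming the Series constructor infers a list dtype from nested Python lists:

```python
import bigframes.pandas as bpd

s = bpd.Series([[1, 2, 3], [4, 5], [6]])

# Operate on each list element through the list property.
print(s.list.len())  # number of items in each list
print(s.list[0])     # first item of each list
```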
Struct accessor
The StructAccessor object can access and process fields in a series of structs. The API accessor object is series.struct, as shown in the following example:
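A minimal sketch, assuming the Series constructor infers a struct dtype from Python dictionaries; the field names are placeholders:

```python
import bigframes.pandas as bpd

s = bpd.Series([
    {"city": "Seattle", "state": "WA"},
    {"city": "Austin", "state": "TX"},
])

# Access struct fields through the struct property.
print(s.struct.field("city"))  # extract a single field as a Series
print(s.struct.explode())      # expand all fields into a DataFrame
```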
If the struct field you plan to access is unambiguous from other Series properties, you can skip calling struct, as shown in the following example:
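A sketch of that shortcut, assuming attribute-style access on the same kind of struct Series:

```python
import bigframes.pandas as bpd

s = bpd.Series([
    {"city": "Seattle", "state": "WA"},
    {"city": "Austin", "state": "TX"},
])

# "city" doesn't clash with any Series property, so the struct call
# can be skipped and the field accessed directly.
print(s.city)
```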
However, it's a best practice to use struct
for accessing fields, because
it makes your code easier to understand and less error-prone.
String accessor
You can access the StringAccessor
object with the str
property on a Series
object, as shown in the following example:
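A minimal sketch using two common pandas-style string methods; the values are placeholders:

```python
import bigframes.pandas as bpd

s = bpd.Series(["Hello", "World", "bigframes"])

# Pandas-style string methods are available through the str property.
print(s.str.upper())
print(s.str.contains("o"))
```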
Geography accessor
BigQuery DataFrames provides a GeographyAccessor
object that shares
similar APIs with the GeoSeries structure provided by the GeoPandas library. You
can invoke the GeographyAccessor
object with the geo
property on a Series
object, as shown in the following example:
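A minimal sketch, assuming the Series constructor accepts shapely point geometries; in practice you might instead read a GEOGRAPHY column from a BigQuery table, and the coordinates here are placeholders:

```python
import bigframes.pandas as bpd
from shapely.geometry import Point

s = bpd.Series([Point(-122.33, 47.61), Point(-97.74, 30.27)])

# GeoSeries-like operations are available through the geo property.
print(s.geo.x)  # longitude of each point
print(s.geo.y)  # latitude of each point
```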
What's next
- Learn how to use BigQuery DataFrames.
- Learn how to visualize graphs using BigQuery DataFrames.
- Explore the BigQuery DataFrames API reference.