Data type enforcement in Bigtable

Bigtable's flexible schema lets you store data of any type – strings, dates, numbers, JSON documents, or even images or PDFs – in a Bigtable table.

This document describes when Bigtable enforces a data type, which requires you to encode or decode the data in your application code. For a list of Bigtable data types, see Type in the Data API reference documentation.

Enforced types

Data type is enforced for the following data:

Aggregate column families (counters)
Timestamps
Continuous materialized views

Aggregates

For the aggregate data type, encoding depends on the aggregation type. When you create an aggregate column family, you must specify an aggregation type.

This table shows the input type and encoding for each aggregation type.

Aggregate type	Input type	Encoding
Sum	Int64	`BigEndianBytes`
Min	Int64	`BigEndianBytes`
Max	Int64	`BigEndianBytes`
HLL	Bytes	Zetasketch HLL++

When you query the data in aggregate cells using SQL, SQL automatically incorporates type information.

When you read the data in aggregate cells using the Data API's ReadRows method, Bigtable returns bytes, so your application must decode the values using the encoding that Bigtable used to map the typed data to bytes.
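As a minimal sketch of the decoding step for Sum, Min, and Max cells, the following Python function unpacks an 8-byte big-endian signed integer. The function name `decode_int64_aggregate` and the sample bytes are illustrative assumptions, not part of any Bigtable client library. HLL cells use the Zetasketch HLL++ format and need a compatible library, which this sketch doesn't cover.

```python
import struct


def decode_int64_aggregate(cell_value: bytes) -> int:
    """Decode the bytes of a Sum, Min, or Max aggregate cell.

    Hypothetical helper: assumes the cell holds exactly 8 bytes that
    encode a signed 64-bit integer in big-endian order.
    """
    if len(cell_value) != 8:
        raise ValueError(f"expected 8 bytes, got {len(cell_value)}")
    # ">q" means big-endian signed 64-bit integer.
    (value,) = struct.unpack(">q", cell_value)
    return value


# Sample bytes standing in for a cell value returned by ReadRows.
raw = b"\x00\x00\x00\x00\x00\x00\x00\x2a"
print(decode_int64_aggregate(raw))  # 42
```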

You can't convert a column family that contains non-aggregate data into an aggregate column family. Columns in aggregate column families can't contain non-aggregate cells, and standard column families can't contain aggregate cells.

For more information about creating tables with aggregate column families, see Create a table. For code samples that show how to increment an aggregate cell with encoded values, see Increment a value.

Timestamps

Each Bigtable cell has an Int64 timestamp that represents the number of microseconds since the Unix epoch, 1970-01-01 00:00:00 UTC. The value is expressed in microseconds but can have at most millisecond precision, so it must be a multiple of 1,000. For example, Bigtable rejects the timestamp 3023483279876543 because it has microsecond precision. The acceptable value in this case is 3023483279876000.
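One way to meet the precision requirement is to truncate a microsecond value to the nearest lower millisecond before you write it. The following is a small sketch. The helper names are hypothetical, and truncating rather than rounding is an assumption that you can change to suit your application.

```python
MICROS_PER_MILLI = 1_000


def has_millisecond_precision(timestamp_micros: int) -> bool:
    """Return True if the microsecond value is a whole number of milliseconds."""
    return timestamp_micros % MICROS_PER_MILLI == 0


def to_millisecond_precision(timestamp_micros: int) -> int:
    """Truncate a microsecond timestamp down to millisecond precision."""
    return timestamp_micros - (timestamp_micros % MICROS_PER_MILLI)


ts = 3023483279876543
print(has_millisecond_precision(ts))  # False
print(to_millisecond_precision(ts))   # 3023483279876000
```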

Continuous materialized views

Continuous materialized views are read-only resources that you can read by using SQL or with a ReadRows Data API call. Data in a materialized view is typed based on the query that defines it. For an overview, see Continuous materialized views.

When you use SQL to query a continuous materialized view, SQL automatically incorporates type information.

When you read from a continuous materialized view using a Data API ReadRows request, you must know each column's type and decode it in your application code.

Aggregated values in a continuous materialized view are stored using the encoding described in the following table, based on the output type of the column in the view definition.

Type	Encoding
BOOL	1 byte value, 1 = true, 0 = false
BYTES	No encoding
INT64 (or INT, SMALLINT, INTEGER, BIGINT, TINYINT, BYTEINT)	64-bit big-endian
FLOAT64	64-bit IEEE 754, excluding NaN and +/-inf
STRING	UTF-8
TIME/TIMESTAMP	64-bit integer representing the number of microseconds since the Unix epoch (consistent with GoogleSQL)

For more information, see Encoding in the Data API reference.
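The encodings in the preceding table can be decoded in application code along the following lines. This is a sketch, not a client library API: `decode_view_value` is a hypothetical helper, and it assumes that FLOAT64 and TIMESTAMP values use big-endian byte order like INT64, which the table doesn't state explicitly. Verify the byte order against the Encoding reference before relying on it.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_view_value(raw: bytes, sql_type: str):
    """Decode one cell from a continuous materialized view (hypothetical helper)."""
    sql_type = sql_type.upper()
    if sql_type == "BOOL":
        # 1-byte value: 1 = true, 0 = false.
        return raw == b"\x01"
    if sql_type == "BYTES":
        # No encoding: return the bytes unchanged.
        return raw
    if sql_type == "INT64":
        # 64-bit big-endian signed integer.
        return struct.unpack(">q", raw)[0]
    if sql_type == "FLOAT64":
        # 64-bit IEEE 754; big-endian byte order is an assumption.
        return struct.unpack(">d", raw)[0]
    if sql_type == "STRING":
        return raw.decode("utf-8")
    if sql_type == "TIMESTAMP":
        # Microseconds since the Unix epoch; big-endian is an assumption.
        micros = struct.unpack(">q", raw)[0]
        return UNIX_EPOCH + timedelta(microseconds=micros)
    raise ValueError(f"unsupported type: {sql_type}")


print(decode_view_value(b"\x00\x00\x00\x00\x00\x00\x00\x07", "INT64"))  # 7
print(decode_view_value("total".encode("utf-8"), "STRING"))  # total
```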

Structured row keys

Structured row keys let you access your data using multi-column keys, similar to composite keys in relational databases.

The type and encoding for structured row keys are defined by a row key schema that you can optionally add to a Bigtable table. Structured row key data is stored as bytes, but GoogleSQL for Bigtable automatically uses the type and encoding defined in the row key schema when you execute a SQL query on the table.

Using a row key schema to query a table with a ReadRows request isn't supported. A continuous materialized view has a row key schema by default. For more information about structured row keys, see Manage row key schemas.

Unenforced types

If no type information is provided, then Bigtable treats each cell as bytes with an unknown encoding.

When you query column families that were created without type enforcement, you must provide type information at read time to ensure that the data is read correctly. This matters for database functions whose behavior depends on the data type. GoogleSQL for Bigtable offers CAST functions to do type conversions at query time. These functions convert from bytes to the types that various functions expect.

While Bigtable doesn't enforce types, certain operations assume a data type. Knowing this helps you ensure that your data is written in a way that can be processed within the database. The following are examples:

Increments using ReadModifyWriteRow assume the cell contains a 64-bit big-endian signed integer.
The TO_VECTOR64 function in SQL expects the cell to contain a byte array that's a concatenation of the big-endian bytes of 64-bit floating point numbers.
The TO_VECTOR32 function in SQL expects the cell to contain a byte array that's a concatenation of the big-endian bytes of 32-bit floating point numbers.
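To write values that the operations listed above can process, your application can produce the expected byte layouts before writing the cell. The following sketch uses Python's `struct` module. The helper names are hypothetical and the sketch only builds the byte strings; it doesn't call any Bigtable API.

```python
import struct
from typing import Sequence


def encode_int64(value: int) -> bytes:
    """Encode a counter as a 64-bit big-endian signed integer."""
    return struct.pack(">q", value)


def encode_vector64(values: Sequence[float]) -> bytes:
    """Concatenate the big-endian bytes of 64-bit floats (for TO_VECTOR64)."""
    return struct.pack(f">{len(values)}d", *values)


def encode_vector32(values: Sequence[float]) -> bytes:
    """Concatenate the big-endian bytes of 32-bit floats (for TO_VECTOR32)."""
    return struct.pack(f">{len(values)}f", *values)


print(encode_int64(1).hex())                      # 0000000000000001
print(len(encode_vector64([1.0, 2.0, 3.0])))      # 24
print(encode_vector32([1.0]).hex())               # 3f800000
```

Each 64-bit element occupies 8 bytes and each 32-bit element occupies 4 bytes, so the length of the cell value determines the number of vector elements.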
