Cassandra interface

This page compares the architectures of Apache Cassandra and Spanner, and helps you understand the capabilities and limitations of Spanner's Cassandra interface. It assumes you're familiar with Cassandra and want to migrate existing applications or design new applications that use Spanner as their database.

Cassandra and Spanner are both large-scale distributed databases built for applications requiring high scalability and low latency. While both databases can support demanding NoSQL workloads, Spanner provides advanced features for data modeling, querying, and transactional operations. For more information about how Spanner meets NoSQL database criteria, see Spanner for non-relational workloads.

Core concepts

This section compares key Cassandra and Spanner concepts.

Terminology

Cassandra	Spanner	Description
Cluster	Instance	A Cassandra cluster is equivalent to a Spanner instance: a collection of servers and storage resources. Because Spanner is a managed service, you don't have to configure the underlying hardware or software. You only need to specify the number of nodes you want to reserve for your instance, or use autoscaling to scale the instance automatically. An instance acts like a container for your databases. You also choose the data replication topology (regional, dual-region, or multi-region) at the instance level.
Keyspace	Database	A Cassandra keyspace is equivalent to a Spanner database, which is a collection of tables and other schema elements (for example, indexes and roles). Unlike with a keyspace, you don't need to configure the replication location. Spanner automatically replicates your data to the region or regions designated for your instance.
Table	Table	In both Cassandra and Spanner, a table is a collection of rows identified by a primary key specified in the table schema.
Partition	Split	Both Cassandra and Spanner scale by sharding data. In Cassandra, each shard is called a partition, while in Spanner, each shard is called a split. Cassandra uses hash-partitioning, which means that each row is independently assigned to a storage node based on a hash of its partition key. Spanner is range-sharded, which means that rows that are contiguous in the primary key space are contiguous in storage as well (except at split boundaries). Spanner takes care of splitting and merging based on load and storage, and this is transparent to the application. The key implication is that, unlike in Cassandra, range scans over a prefix of the primary key are efficient operations in Spanner.
Row	Row	In both Cassandra and Spanner, a row is a collection of columns identified uniquely by a primary key. Like Cassandra, Spanner supports composite primary keys. Unlike Cassandra, Spanner doesn't make a distinction between partition key and sort key, because data is range-sharded. One can think of Spanner as only having sort keys, with partitioning managed behind the scenes.
Column	Column	In both Cassandra and Spanner, a column is a set of data values that have the same type. There is one value for each row of a table. For more information about comparing Cassandra column types to Spanner, see Data types.
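
The difference between hash-partitioning and range-sharding described in the Partition/Split row can be sketched in a few lines of Python. This is a toy model, not how either database is implemented: the function names, the MD5-based hash, the key format, and the split boundaries are all assumptions chosen for illustration.

```python
import bisect
import hashlib

def hash_placement(key, num_nodes):
    """Hash-partitioning: derive a node index from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def range_placement(key, split_points):
    """Range-sharding: index of the split whose key range contains the key."""
    return bisect.bisect_right(split_points, key)

def prefix_scan(sorted_keys, prefix):
    """Read every key with the given prefix as one contiguous slice."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return sorted_keys[lo:hi]

keys = sorted(["user1#a", "user1#b", "user2#a", "user3#a", "user3#b"])
split_points = ["user2", "user3"]  # hypothetical split boundaries

# Neighboring keys stay together under range-sharding ...
range_splits = [range_placement(k, split_points) for k in keys]
# ... while a hash assigns each key independently of its neighbors.
hash_nodes = [hash_placement(k, 3) for k in keys]
# A prefix scan over sorted data touches one contiguous run of keys.
user1_rows = prefix_scan(keys, "user1#")
```

In this model, `range_splits` never decreases as the keys increase, which is why a prefix scan can be served from adjacent storage, whereas the values in `hash_nodes` have no relationship to key order.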

Architecture

A Cassandra cluster consists of a set of servers and storage colocated with those servers. A hash function maps rows from a partition keyspace to a virtual node (vnode). A set of vnodes is then randomly assigned to each server to serve a portion of the cluster keyspace. Storage for the vnodes is locally attached to the serving node. Client drivers connect directly to the serving nodes and handle load balancing and query routing.

A Spanner instance consists of a set of servers in a replication topology. Spanner dynamically shards each table into row ranges based on CPU and disk usage. Shards are assigned to compute nodes for serving. Data is physically stored on Colossus, Google's distributed file system, separate from the compute nodes. Client drivers connect to Spanner's frontend servers, which perform request routing and load balancing. To learn more, see the Life of Spanner reads and writes whitepaper.

At a high level, both architectures scale as resources are added to the underlying cluster. Spanner's separation of compute and storage lets load rebalance across compute nodes faster in response to workload changes. Unlike in Cassandra, moving a shard doesn't involve moving data, because the data stays on Colossus. Moreover, Spanner's range-based partitioning might be more natural for applications that expect data to be sorted by partition key. The flip side of range-based partitioning is that workloads that write to one end of the key space (for example, tables keyed by the current timestamp) might create hotspots unless the schema is designed to avoid them. For more information about techniques for overcoming hotspots, see Schema design best practices.
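
The hotspot risk mentioned above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Spanner's actual splitting logic: the split boundaries, the `NUM_SHARDS` value, and the use of a simple modulo in place of a real hash function are all hypothetical.

```python
import bisect
from collections import Counter

NUM_SHARDS = 4  # hypothetical number of logical shards

def split_for(key, split_points):
    """Index of the range split that holds key, given sorted boundaries."""
    return bisect.bisect_right(split_points, key)

def sharded_key(ts):
    """Prefix the timestamp with a shard ID (modulo stands in for a hash)."""
    return (ts % NUM_SHARDS, ts)

new_timestamps = range(1_000, 1_100)  # 100 monotonically increasing writes

# Timestamp-only keys: the boundaries come from older data, so every new
# write sorts after the last boundary and lands in the same split.
plain_counts = Counter(split_for(ts, [250, 500, 750]) for ts in new_timestamps)

# Shard-prefixed keys: boundaries fall between shard IDs, so the same
# writes spread evenly across the splits.
shard_boundaries = [(1, 0), (2, 0), (3, 0)]
sharded_counts = Counter(
    split_for(sharded_key(ts), shard_boundaries) for ts in new_timestamps
)
```

Here `plain_counts` puts all 100 writes on the last split, while `sharded_counts` puts 25 on each of four splits. The trade-off is that reading rows in timestamp order then requires a scan per shard prefix.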

Consistency

With Cassandra, you must specify a consistency level for each operation. If you use the quorum consistency level, a replica node majority must respond to the coordinator node for the operation to be considered successful. If you use a consistency level of one, Cassandra needs a single replica node to respond for the operation to be considered successful.
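
As a worked example of the two consistency levels described above, the following sketch computes how many replica responses the coordinator waits for. The function name is hypothetical, and only the two levels discussed here are modeled.

```python
def required_responses(consistency_level, replication_factor):
    """Number of replica responses needed for an operation to succeed."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of replicas: more than half of the replication factor.
        return replication_factor // 2 + 1
    raise ValueError("level not modeled in this sketch: %s" % consistency_level)

quorum_rf3 = required_responses("QUORUM", 3)
quorum_rf5 = required_responses("QUORUM", 5)
one_rf3 = required_responses("ONE", 3)
```

With a replication factor of 3, a quorum operation needs 2 responses; with a replication factor of 5, it needs 3.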

Spanner provides strong consistency. The Spanner API doesn't expose replicas to the client. Spanner clients interact with Spanner as if it were a single-machine database. A write is always written to a majority of replicas before Spanner reports its success to the user. Any subsequent read reflects the newly written data. Applications can choose to read a snapshot of the database at a time in the past, which might have performance benefits over strong reads. For more information about the consistency properties of Spanner, see the Transactions overview.

Spanner was built to support the consistency and availability needed in large scale applications. Spanner provides strong consistency at scale and with high performance. For use cases that require it, Spanner supports snapshot (stale) reads that relax freshness requirements.

Cassandra interface

The Cassandra interface lets you take advantage of Spanner's fully managed, scalable, and highly available infrastructure using familiar Cassandra tools and syntax. This page helps you understand the capabilities and limitations of the Cassandra interface.

Benefits of the Cassandra interface

Portability: the Cassandra interface provides access to the breadth of Spanner features, using schemas, queries, and clients that are compatible with Cassandra. This simplifies moving an application built on Spanner to another Cassandra environment or vice versa. This portability provides deployment flexibility and supports disaster recovery scenarios, such as a stressed exit.
Familiarity: if you already use Cassandra, you can quickly get started with Spanner using many of the same CQL statements and types.
Uncompromisingly Spanner: because it's built on Spanner's existing foundation, the Cassandra interface provides all of Spanner's existing availability, consistency, and price-performance benefits without having to compromise on any of the capabilities available in the complementary GoogleSQL ecosystem.

CQL compatibility

CQL dialect support: Spanner provides a subset of the CQL dialect, including Data Query Language (DQL), Data Manipulation Language (DML), lightweight transactions (LWT), and aggregate and datetime functions.
Supported Cassandra functionality: the Cassandra interface supports many of the most commonly used features of Cassandra. This includes core parts of the schema and type system, many common query shapes, a variety of functions and operators, and the key aspects of Cassandra's system catalog. Applications can use many Cassandra clients or drivers by connecting over Spanner's implementation of the Cassandra wire protocol.
Client and wire protocol support: Spanner supports the core query capabilities of the Cassandra wire protocol v4 using the Cassandra Adapter, a lightweight client that runs alongside your application. This lets many Cassandra clients work as-is with a Spanner Cassandra interface database, while using Spanner's global endpoint, connection management, and IAM authentication.

Supported Cassandra data types

The following table shows supported Cassandra data types and maps each data type to the equivalent Spanner GoogleSQL data type.

	Supported Cassandra data types	Spanner GoogleSQL data type
Numeric types	`tinyint` (8-bit signed integer)	`INT64` (64-bit signed integer). Spanner supports a single 64-bit wide data type for signed integers.
	`smallint` (16-bit signed integer)	`INT64`
	`int` (32-bit signed integer)	`INT64`
	`bigint` (64-bit signed integer)	`INT64`
	`float` (32-bit IEEE-754 floating point)	`FLOAT32` (32-bit IEEE-754 floating point)
	`double` (64-bit IEEE-754 floating point)	`FLOAT64` (64-bit IEEE-754 floating point)
	`decimal`	`NUMERIC`. For fixed precision decimal numbers, use the `NUMERIC` data type (precision 38, scale 9).
	`varint` (variable precision integer)	`NUMERIC`
String types	`text`	`STRING(MAX)`. Both `text` and `varchar` store and validate UTF-8 strings. In Spanner, `STRING` columns need to specify their maximum length. There is no impact on storage; this is for validation purposes.
	`varchar`	`STRING(MAX)`
	`ascii`	`STRING(MAX)`
	`uuid`	`STRING(MAX)`
	`inet`	`STRING(MAX)`
	`blob`	`BYTES(MAX)`. To store binary data, use the `BYTES` data type.
Date and time types	`date`	`DATE`
	`time`	`INT64`. Spanner doesn't support a dedicated time data type. Use `INT64` to store a duration in nanoseconds.
	`timestamp`	`TIMESTAMP`
Container types	`set`	`ARRAY`. Spanner doesn't support a dedicated `set` data type. Use `ARRAY` columns to represent a `set`.
	`list`	`ARRAY`. Use `ARRAY` to store a list of typed objects.
	`map`	`JSON`. Spanner doesn't support a dedicated map type. Use `JSON` columns to represent maps.
Other types	`boolean`	`BOOL`
	`counter`	`INT64`
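
When translating many schemas, it can help to hold the table above as a lookup. The following sketch transcribes it into a Python dictionary. The helper name `spanner_type` is hypothetical, the sketch assumes `varint` shares the `NUMERIC` row with `decimal`, and it handles only collections of scalar element types.

```python
import re

# Scalar mappings transcribed from the table above.
SCALAR_TYPES = {
    "tinyint": "INT64", "smallint": "INT64", "int": "INT64", "bigint": "INT64",
    "float": "FLOAT32", "double": "FLOAT64",
    "decimal": "NUMERIC", "varint": "NUMERIC",
    "text": "STRING(MAX)", "varchar": "STRING(MAX)", "ascii": "STRING(MAX)",
    "uuid": "STRING(MAX)", "inet": "STRING(MAX)",
    "blob": "BYTES(MAX)",
    "date": "DATE", "time": "INT64", "timestamp": "TIMESTAMP",
    "boolean": "BOOL", "counter": "INT64",
}

def spanner_type(cassandra_type):
    """Return the Spanner GoogleSQL type for a Cassandra type name."""
    name = cassandra_type.strip().lower()
    if re.fullmatch(r"map<.+>", name):
        return "JSON"  # maps are represented as JSON columns
    collection = re.fullmatch(r"(?:set|list)<\s*(\w+)\s*>", name)
    if collection:
        # Sets and lists both become arrays of the element's Spanner type.
        return "ARRAY<%s>" % SCALAR_TYPES[collection.group(1)]
    return SCALAR_TYPES[name]  # raises KeyError for unlisted types

artists_type = spanner_type("set<varchar>")
tags_type = spanner_type("map<varchar, varchar>")
songs_type = spanner_type("tinyint")
```

Raising `KeyError` for unlisted types is a deliberate choice in this sketch: an unmapped type is better surfaced during schema translation than silently defaulted.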

Data type annotations

The cassandra_type column option lets you define mappings between Cassandra and Spanner data types. When you create a table in Spanner that you intend to query through the Cassandra interface, use the cassandra_type option to specify the corresponding Cassandra data type for each column. Spanner uses this mapping to interpret and convert data correctly when it passes between the two type systems.

For example, if there's a table in Cassandra with the following schema:

CREATE TABLE Albums (
  albumId uuid,
  title varchar,
  artists set<varchar>,
  tags  map<varchar, varchar>,
  numberOfSongs tinyint,
  releaseDate date,
  copiesSold bigint,
  ....
  PRIMARY KEY(albumId)
)

In Spanner, you use type annotations to map to the Cassandra data types, as shown in the following:

CREATE TABLE Albums (
  albumId       STRING(MAX) OPTIONS (cassandra_type = 'uuid'),
  title         STRING(MAX) OPTIONS (cassandra_type = 'varchar'),
  artists       ARRAY<STRING(MAX)> OPTIONS (cassandra_type = 'set<varchar>'),
  tags          JSON OPTIONS (cassandra_type = 'map<varchar, varchar>'),
  numberOfSongs INT64 OPTIONS (cassandra_type = 'tinyint'),
  releaseDate   DATE OPTIONS (cassandra_type = 'date'),
  copiesSold    INT64 OPTIONS (cassandra_type = 'bigint')
  ...
) PRIMARY KEY (albumId);

In the previous example, the OPTIONS clause maps the column's Spanner data type to its corresponding Cassandra data type.

albumId (Spanner STRING(MAX)) is mapped to uuid in Cassandra.
title (Spanner STRING(MAX)) is mapped to varchar in Cassandra.
artists (Spanner ARRAY<STRING(MAX)>) is mapped to set<varchar> in Cassandra.
tags (Spanner JSON) is mapped to map<varchar,varchar> in Cassandra.
numberOfSongs (Spanner INT64) is mapped to tinyint in Cassandra.
releaseDate (Spanner DATE) is mapped to date in Cassandra.
copiesSold (Spanner INT64) is mapped to bigint in Cassandra.
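
If you generate schemas programmatically, the annotated column definitions can be assembled from (name, Spanner type, Cassandra type) triples. The following is a minimal sketch: `column_ddl` and `create_table_ddl` are hypothetical helpers, and they do no quoting or validation, so they assume trusted identifiers.

```python
def column_ddl(name, spanner_type, cassandra_type):
    """Render one column definition with its cassandra_type annotation."""
    return "%s %s OPTIONS (cassandra_type = '%s')" % (
        name, spanner_type, cassandra_type)

def create_table_ddl(table, columns, primary_key):
    """Render a CREATE TABLE statement from (name, type, cassandra_type) triples."""
    body = ",\n".join("  " + column_ddl(*column) for column in columns)
    return "CREATE TABLE %s (\n%s\n) PRIMARY KEY (%s);" % (
        table, body, ", ".join(primary_key))

columns = [
    ("albumId", "STRING(MAX)", "uuid"),
    ("numberOfSongs", "INT64", "tinyint"),
]
ddl = create_table_ddl("Albums", columns, ["albumId"])
```

The resulting string follows the same shape as the Spanner statement shown earlier, limited to the two columns passed in.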

Modify the `cassandra_type` option

You can use the ALTER TABLE statement to add or modify the cassandra_type option on existing columns.

To add a cassandra_type option to a column that doesn't have it yet, use the following statement:

ALTER TABLE Albums ALTER COLUMN albumId SET OPTIONS (cassandra_type='uuid');

In this example, the albumId column in the Albums table is updated with the cassandra_type option set to uuid.

To modify an existing cassandra_type option, use the ALTER TABLE statement with the new cassandra_type value. For example, to change the cassandra_type of the numberOfSongs column in the Albums table from tinyint to bigint, use the following statement:

ALTER TABLE Albums ALTER COLUMN numberOfSongs SET OPTIONS (cassandra_type='bigint');

You can change an existing cassandra_type option only between the following types:

From	To
tinyint	smallint, int, bigint
smallint	int, bigint
int	bigint
float	double
text	varchar
ascii	varchar, text
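
The table of permitted changes can be encoded as a lookup so that an invalid change is caught before a statement is sent. This is a sketch: `can_change` and `alter_statement` are hypothetical helper names, and the generated statement mirrors the ALTER TABLE form shown earlier.

```python
# Permitted cassandra_type changes, transcribed from the table above.
ALLOWED_CHANGES = {
    "tinyint": {"smallint", "int", "bigint"},
    "smallint": {"int", "bigint"},
    "int": {"bigint"},
    "float": {"double"},
    "text": {"varchar"},
    "ascii": {"varchar", "text"},
}

def can_change(current_type, new_type):
    """True if the table above lists new_type as a valid target."""
    return new_type in ALLOWED_CHANGES.get(current_type, set())

def alter_statement(table, column, current_type, new_type):
    """Build the ALTER TABLE statement, refusing changes the table omits."""
    if not can_change(current_type, new_type):
        raise ValueError("cannot change %s to %s" % (current_type, new_type))
    return "ALTER TABLE %s ALTER COLUMN %s SET OPTIONS (cassandra_type='%s');" % (
        table, column, new_type)

statement = alter_statement("Albums", "numberOfSongs", "tinyint", "bigint")
```

Every permitted change goes to a type that can hold all values of the original type, which is why `can_change("bigint", "int")` returns `False`.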

Direct and nuanced mappings

In many cases, the mapping between Spanner and Cassandra data types is straightforward. For example, a Spanner STRING(MAX) maps to a Cassandra varchar, and a Spanner INT64 maps to a Cassandra bigint.

However, there are situations where the mapping requires more consideration and adjustment. For example, because Spanner has a single 64-bit type for signed integers, you might need to map a Cassandra smallint to a Spanner INT64.
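
One consequence of mapping a narrower Cassandra integer type to INT64 is that the Spanner column can hold values outside the Cassandra type's range. This page doesn't describe how the Cassandra interface treats such values, so the following is only a client-side check an application might run. The helper name `fits` is hypothetical; the bit widths come from the data types table.

```python
# Bit widths of the signed Cassandra integer types that map to INT64.
INTEGER_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}

def fits(cassandra_type, value):
    """True if value is within the signed range of the Cassandra type."""
    bits = INTEGER_BITS[cassandra_type]
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest

smallint_max_ok = fits("smallint", 32767)
smallint_overflow = fits("smallint", 32768)
```

For example, 32767 fits in a smallint, but 32768 doesn't, even though both are valid INT64 values.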

Supported Cassandra functions

The following list shows the Cassandra functions that Spanner supports:

All aggregate functions
All datetime functions except for currentTimeUUID
All cast functions except for blob conversion functions
All lightweight transaction functions except for BATCH conditions
None of the following query functions:

Unsupported Cassandra features on Spanner

It's important to understand that the Cassandra interface provides the capabilities of Spanner through schemas, types, queries, and clients that are compatible with Cassandra. It doesn't support all of the features of Cassandra. Migrating an existing Cassandra application to Spanner, even using the Cassandra interface, likely requires some rework to accommodate unsupported Cassandra capabilities or differences in behavior, like query optimization or primary key design. However, once it's migrated, your workloads can take advantage of Spanner's reliability and unique multi-model capabilities.

The following list provides more information on unsupported Cassandra features:

Some CQL language features aren't supported: user-defined types and functions, TimeUUID, TTL, and write timestamps.
Spanner and Spanner control plane: databases with Cassandra interfaces use Spanner and Google Cloud tools to provision, secure, monitor, and optimize instances. Spanner doesn't support tools such as nodetool for administrative activities.

DDL support

CQL DDL statements aren't directly supported by the Cassandra interface. To make DDL changes, use the Google Cloud console, the gcloud CLI, or the client libraries.

Connectivity

Cassandra client support

Spanner lets you connect to databases from a variety of clients:
- The Cassandra Adapter can be used as an in-process helper or as a sidecar proxy to connect your Cassandra applications to the Cassandra interface. For more information, see Connect to Spanner using the Cassandra Adapter.
- The Cassandra Adapter can be started locally as a standalone process and connected to using CQLSH. For more information, see Connect the Cassandra Adapter to your application.

Access control with Identity and Access Management

You need to have the spanner.databases.adapt, spanner.databases.select, and spanner.databases.write permissions to perform read and write operations against the Cassandra endpoint. For more information, see the IAM overview.

For more information about how to grant Spanner IAM permissions, see Apply IAM roles.

Monitoring

Spanner provides the following metrics to help you monitor the Cassandra Adapter:

spanner.googleapis.com/api/adapter_request_count: captures and exposes the number of adapter requests that Spanner performs per second, or the number of errors that occur on the Spanner server per second.
spanner.googleapis.com/api/adapter_request_latencies: captures and exposes the amount of time that Spanner takes to handle adapter requests.

You can create a custom Cloud Monitoring dashboard to display and monitor metrics for the Cassandra Adapter. The custom dashboard contains the following charts:

P99 Request Latencies: The 99th percentile distribution of server request latencies per message_type for your database.
P50 Request Latencies: The 50th percentile distribution of server request latencies per message_type for your database.
API Request Count by Message Type: The API request count per message_type for your database.
API Request Count by Operation Type: The API request count per op_type for your database.
Error Rates: The API error rates for your database.

Google Cloud console

Download the cassandra-adapter-dashboard.json file. This file has the information needed to populate a custom dashboard in Monitoring.
In the Google Cloud console, go to the Dashboards page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the Dashboards Overview page, click Create Custom Dashboard.
In the dashboard toolbar, click the Dashboard settings icon. Then select JSON, followed by JSON Editor.
In the JSON Editor pane, copy the contents of the cassandra-adapter-dashboard.json file you downloaded and paste it in the editor.
To apply your changes to the dashboard, click Apply changes. If you don't want to use this dashboard, navigate back to the Dashboards Overview page.
After the dashboard is created, click Add Filter. Then select either project_id or instance_id to monitor the Cassandra Adapter.

gcloud CLI

Download the cassandra-adapter-dashboard.json file. This file has the information needed to populate a custom dashboard in Monitoring.
To create a dashboard in a project, use the gcloud monitoring dashboards create command:
```
gcloud monitoring dashboards create --config-from-file=cassandra-adapter-dashboard.json
```
For more information, see the gcloud monitoring dashboards create reference.

Additionally, the following Spanner metrics are helpful for monitoring the adapter:

CPU utilization metrics provide information about CPU usage for user and system tasks with breakdowns by priority and operation type.
Storage utilization metrics provide information about database and backup storage.
Spanner built-in statistics tables provide insights about queries, transactions, and reads to help you discover issues in your databases.

For a complete list of system insights, see Monitor instances with system insights. To learn more about monitoring your Spanner resources, see Monitor instances with Cloud Monitoring.

Pricing

There is no additional charge for using the Cassandra endpoint. You are charged the standard Spanner pricing for the amount of compute capacity that your instance uses and the amount of storage that your database uses.

For more information, see Spanner pricing.

What's next

Learn how to Migrate from Cassandra to Spanner.
Learn how to Connect to Spanner using the Cassandra Adapter.
