Cassandra interface

This page compares the architectures of Apache Cassandra and Spanner and describes the capabilities and limitations of Spanner's Cassandra interface. It assumes that you're familiar with Cassandra and want to migrate existing applications, or design new ones, with Spanner as your database.

Cassandra and Spanner are both large-scale distributed databases built for applications requiring high scalability and low latency. While both databases can support demanding NoSQL workloads, Spanner provides advanced features for data modeling, querying, and transactional operations. For more information about how Spanner meets NoSQL database criteria, see Spanner for non-relational workloads.

Core concepts

This section compares key Cassandra and Spanner concepts.

Terminology

Cassandra: Cluster
Spanner: Instance

A Cassandra cluster is equivalent to a Spanner instance, which is a collection of servers and storage resources. Because Spanner is a managed service, you don't have to configure the underlying hardware or software. You only need to specify the number of nodes you want to reserve for your instance, or use autoscaling to scale the instance automatically. An instance acts as a container for your databases. You also choose the data replication topology (regional, dual-region, or multi-region) at the instance level.

Cassandra: Keyspace
Spanner: Database

A Cassandra keyspace is equivalent to a Spanner database, which is a collection of tables and other schema elements (for example, indexes and roles). Unlike with a keyspace, you don't need to configure the replication location for a database. Spanner automatically replicates your data to the region or regions designated for your instance.

Cassandra: Table
Spanner: Table

In both Cassandra and Spanner, a table is a collection of rows identified by a primary key specified in the table schema.

Cassandra: Partition
Spanner: Split

Both Cassandra and Spanner scale by sharding data. In Cassandra, each shard is called a partition, while in Spanner, each shard is called a split. Cassandra uses hash partitioning, which means that each row is independently assigned to a storage node based on a hash of its partition key. Spanner is range-sharded, which means that rows that are contiguous in the primary key space are also contiguous in storage (except at split boundaries). Spanner splits and merges ranges based on load and storage, and this is transparent to the application. The key implication is that, unlike in Cassandra, a range scan over a prefix of the primary key is an efficient operation in Spanner.

Cassandra: Row
Spanner: Row

In both Cassandra and Spanner, a row is a collection of columns identified uniquely by a primary key. Like Cassandra, Spanner supports composite primary keys. Unlike Cassandra, Spanner doesn't distinguish between a partition key and a clustering (sort) key, because data is range-sharded. You can think of Spanner as having only sort keys, with partitioning managed behind the scenes.

Cassandra: Column
Spanner: Column

In both Cassandra and Spanner, a column is a set of data values that have the same type. There is one value for each row of a table. For more information about comparing Cassandra column types to Spanner, see Data types.
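
The difference between hash partitioning and range sharding described in the Partition and Split comparison can be illustrated with a toy model. The following is a minimal sketch, not how either database is implemented: the function names, the MD5-based hash, the node count, and the split point are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Sequence


def hash_partition(key: str, num_nodes: int) -> int:
    """Toy hash partitioning: pick a node from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def range_split(key: str, split_points: Sequence[str]) -> int:
    """Toy range sharding: split i holds the keys that sort before split_points[i]."""
    return bisect.bisect_right(split_points, key)


keys = ["user#1001", "user#1002", "user#1003", "user#2001"]
split_points = ["user#2000"]  # hypothetical boundary between split 0 and split 1

# With hashing, neighboring keys can land on unrelated nodes.
hash_nodes = [hash_partition(k, 4) for k in keys]
# With range sharding, neighboring keys stay together until a split boundary.
range_splits = [range_split(k, split_points) for k in keys]

print("hash nodes:  ", hash_nodes)
print("range splits:", range_splits)
```

In this model, a scan over the prefix `user#1` touches a single split under range sharding, while under hashing the same keys may be spread across several nodes.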

Architecture

A Cassandra cluster consists of a set of servers and storage colocated with those servers. A hash function maps rows from a partition keyspace to a virtual node (vnode). A set of vnodes is then randomly assigned to each server to serve a portion of the cluster keyspace. Storage for the vnodes is locally attached to the serving node. Client drivers connect directly to the serving nodes and handle load balancing and query routing.

A Spanner instance consists of a set of servers in a replication topology. Spanner dynamically shards each table into row ranges based on CPU and disk usage. Shards are assigned to compute nodes for serving. Data is physically stored on Colossus, Google's distributed file system, separate from the compute nodes. Client drivers connect to Spanner's frontend servers, which perform request routing and load balancing. To learn more, see the Life of Spanner reads and writes whitepaper.

At a high level, both architectures scale as resources are added to the underlying cluster. Because Spanner separates compute from storage, it can rebalance load between compute nodes faster in response to workload changes. Unlike in Cassandra, moving a shard doesn't involve moving data, because the data stays on Colossus. Moreover, Spanner's range-based partitioning might be more natural for applications that expect data to be sorted by partition key. The flip side of range-based partitioning is that workloads that write to one end of the key space (for example, tables keyed by the current timestamp) might create hotspots unless the schema is designed to avoid them. For more information about techniques for overcoming hotspots, see Schema design best practices.
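
One common way to avoid a write hotspot on a timestamp-ordered key is to lead the key with a small hash-derived shard number, so that consecutive writes spread across several key ranges. The following is a minimal sketch of that idea under stated assumptions: `NUM_SHARDS`, the CRC32 hash, and the key layout are hypothetical choices for illustration, not a prescribed Spanner schema.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count; tune to the write throughput you need


def sharded_key(timestamp_micros: int, event_id: str) -> tuple:
    """Build a composite key (shard_id, timestamp, event_id).

    The shard_id is derived from a hash of the event ID, so rows written at
    nearly the same time are spread across NUM_SHARDS separate key ranges.
    """
    shard_id = zlib.crc32(event_id.encode("utf-8")) % NUM_SHARDS
    return (shard_id, timestamp_micros, event_id)


# Simulate 1,000 events written in timestamp order.
events = [(1_700_000_000_000_000 + i, f"event-{i}") for i in range(1000)]
keys = [sharded_key(ts, event_id) for ts, event_id in events]

shards_used = {key[0] for key in keys}
print("distinct shard prefixes used:", len(shards_used))
```

The trade-off is that reading a time range now requires one scan per shard prefix, so this layout suits write-heavy tables better than tables that are mostly read in timestamp order.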

Consistency

With Cassandra, you must specify a consistency level for each operation. If you use the QUORUM consistency level, a majority of replica nodes must respond to the coordinator node for the operation to be considered successful. If you use a consistency level of ONE, only a single replica node needs to respond for the operation to be considered successful.
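
The arithmetic behind these consistency levels can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and the ALL level is included for comparison even though it isn't discussed above. The overlap rule (read acknowledgements plus write acknowledgements must exceed the replication factor) is the usual condition for a read to be guaranteed to reach at least one replica that has the latest write.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(level: str, replication_factor: int) -> int:
    """Number of replica responses needed at a given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True if every read set must intersect every write set (R + W > RF)."""
    r = acks_required(read_level, replication_factor)
    w = acks_required(write_level, replication_factor)
    return r + w > replication_factor


print(acks_required("QUORUM", 3))                # 2
print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True
print(read_overlaps_write("ONE", "ONE", 3))        # False
```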

Spanner provides strong consistency. The Spanner API doesn't expose replicas to the client. Spanner clients interact with Spanner as if it were a single-machine database. A write is always written to a majority of replicas before Spanner reports its success to the user. Any subsequent read reflects the newly written data. Applications can choose to read a snapshot of the database at a time in the past, which might have performance benefits over strong reads. For more information about the consistency properties of Spanner, see the Transactions overview.

Spanner was built to support the consistency and availability needed in large-scale applications. Spanner provides strong consistency at scale and with high performance. For use cases that don't need the latest data, Spanner supports snapshot (stale) reads, which relax freshness requirements.

Cassandra interface

The Cassandra interface lets you take advantage of Spanner's fully managed, scalable, and highly available infrastructure using familiar Cassandra tools and syntax. The following sections describe the capabilities and limitations of the Cassandra interface.

Benefits of the Cassandra interface

  • Portability: the Cassandra interface provides access to the breadth of Spanner features, using schemas, queries, and clients that are compatible with Cassandra. This simplifies moving an application built on Spanner to another Cassandra environment, or vice versa. This portability provides deployment flexibility and supports disaster recovery scenarios, such as a stressed exit.
  • Familiarity: if you already use Cassandra, you can quickly get started with Spanner using many of the same CQL statements and types.
  • Uncompromisingly Spanner: because it's built on Spanner's existing foundation, the Cassandra interface provides all of Spanner's existing availability, consistency, and price-performance benefits without having to compromise on any of the capabilities available in the complementary GoogleSQL ecosystem.

CQL compatibility

  • CQL dialect support: Spanner provides a subset of the CQL dialect, including Data Query Language (DQL), Data Manipulation Language (DML), lightweight transactions (LWT), and aggregate and datetime functions.

  • Supported Cassandra functionality: the Cassandra interface supports many of the most commonly used features of Cassandra. This includes core parts of the schema and type system, many common query shapes, a variety of functions and operators, and the key aspects of Cassandra's system catalog. Applications can use many Cassandra clients or drivers by connecting over Spanner's implementation of the Cassandra wire protocol.

  • Client and wire protocol support: Spanner supports the core query capabilities of the Cassandra wire protocol v4 using Cassandra Adapter, a lightweight client that runs alongside your application. This lets many Cassandra clients work as-is with a Spanner Cassandra interface database, while leveraging Spanner's global endpoint and connection management and IAM authentication.

Supported Cassandra data types

The following lists show the supported Cassandra data types, grouped by category, and map each data type to the equivalent Spanner GoogleSQL data type.

Numeric types

  • tinyint (8-bit signed integer), smallint (16-bit signed integer), int (32-bit signed integer), bigint (64-bit signed integer): INT64 (64-bit signed integer). Spanner supports a single 64-bit wide data type for signed integers.
  • float (32-bit IEEE-754 floating point): FLOAT32 (32-bit IEEE-754 floating point).
  • double (64-bit IEEE-754 floating point): FLOAT64 (64-bit IEEE-754 floating point).
  • decimal, varint (variable precision integer): NUMERIC. For fixed precision decimal numbers, use the NUMERIC data type (precision 38 scale 9).

String types

  • text, varchar: STRING(MAX). Both text and varchar store and validate UTF-8 strings. In Spanner, STRING columns need to specify their maximum length. There is no impact on storage; this is for validation purposes.
  • ascii: STRING(MAX).
  • uuid: STRING(MAX).
  • inet: STRING(MAX).
  • blob: BYTES(MAX). To store binary data, use the BYTES data type.

Date and time types

  • date: DATE.
  • time: INT64. Spanner doesn't support a dedicated time data type. Use INT64 to store nanosecond duration.
  • timestamp: TIMESTAMP.

Container types

  • set: ARRAY. Spanner doesn't support a dedicated set data type. Use ARRAY columns to represent a set.
  • list: ARRAY. Use ARRAY to store a list of typed objects.
  • map: JSON. Spanner doesn't support a dedicated map type. Use JSON columns to represent maps.

Other types

  • boolean: BOOL.
  • counter: INT64.

Data type annotations

The cassandra_type column option lets you define mappings between Cassandra and Spanner data types. When you create a table in Spanner that you intend to query using Cassandra-compatible statements, you can use the cassandra_type option to specify the corresponding Cassandra data type for each column. Spanner then uses this mapping to correctly interpret and convert data when transferring it between the two database systems.

For example, if there's a table in Cassandra with the following schema:

CREATE TABLE Albums (
  albumId uuid,
  title varchar,
  artists set<varchar>,
  tags  map<varchar, varchar>,
  numberOfSongs tinyint,
  releaseDate date,
  copiesSold bigint,
  ....
  PRIMARY KEY(albumId)
)

In Spanner, you use type annotations to map to the Cassandra data types, as shown in the following:

CREATE TABLE Albums (
  albumId       STRING(MAX) OPTIONS (cassandra_type = 'uuid'),
  title         STRING(MAX) OPTIONS (cassandra_type = 'varchar'),
  artists       ARRAY<STRING(MAX)> OPTIONS (cassandra_type = 'set<varchar>'),
  tags          JSON OPTIONS (cassandra_type = 'map<varchar, varchar>'),
  numberOfSongs INT64 OPTIONS (cassandra_type = 'tinyint'),
  releaseDate   DATE OPTIONS (cassandra_type = 'date'),
  copiesSold    INT64 OPTIONS (cassandra_type = 'bigint')
  ...
) PRIMARY KEY (albumId);

In the previous example, the OPTIONS clause maps the column's Spanner data type to its corresponding Cassandra data type.

  • albumId (Spanner STRING(MAX)) is mapped to uuid in Cassandra.
  • title (Spanner STRING(MAX)) is mapped to varchar in Cassandra.
  • artists (Spanner ARRAY<STRING(MAX)>) is mapped to set<varchar> in Cassandra.
  • tags (Spanner JSON) is mapped to map<varchar,varchar> in Cassandra.
  • numberOfSongs (Spanner INT64) is mapped to tinyint in Cassandra.
  • releaseDate (Spanner DATE) is mapped to date in Cassandra.
  • copiesSold (Spanner INT64) is mapped to bigint in Cassandra.
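
When many columns need annotating, the column definitions can be generated from the type mapping listed earlier on this page. The following is a minimal sketch: the helper names `spanner_type` and `column_ddl` are hypothetical, the parsing handles only the simple `set<...>`, `list<...>`, and `map<...>` forms, and the varint to NUMERIC entry is an assumption based on how that mapping is grouped with decimal above.

```python
# Scalar Cassandra types and their Spanner GoogleSQL equivalents,
# following the mapping described on this page.
SCALAR_TYPES = {
    "tinyint": "INT64", "smallint": "INT64", "int": "INT64", "bigint": "INT64",
    "float": "FLOAT32", "double": "FLOAT64",
    "decimal": "NUMERIC", "varint": "NUMERIC",  # varint entry is an assumption
    "text": "STRING(MAX)", "varchar": "STRING(MAX)", "ascii": "STRING(MAX)",
    "uuid": "STRING(MAX)", "inet": "STRING(MAX)", "blob": "BYTES(MAX)",
    "date": "DATE", "time": "INT64", "timestamp": "TIMESTAMP",
    "boolean": "BOOL", "counter": "INT64",
}


def spanner_type(cassandra_type: str) -> str:
    """Return the Spanner type for a Cassandra type name."""
    t = cassandra_type.strip().lower()
    if t.startswith("map<") and t.endswith(">"):
        return "JSON"  # maps are represented as JSON columns
    for container in ("set<", "list<"):
        if t.startswith(container) and t.endswith(">"):
            element = t[len(container):-1]
            return f"ARRAY<{spanner_type(element)}>"
    if t not in SCALAR_TYPES:
        raise ValueError(f"no mapping known for Cassandra type: {cassandra_type}")
    return SCALAR_TYPES[t]


def column_ddl(name: str, cassandra_type: str) -> str:
    """Build one annotated column definition for a Spanner CREATE TABLE."""
    return (f"{name} {spanner_type(cassandra_type)} "
            f"OPTIONS (cassandra_type = '{cassandra_type}')")


for name, ctype in [("albumId", "uuid"), ("artists", "set<varchar>"),
                    ("tags", "map<varchar, varchar>"), ("numberOfSongs", "tinyint")]:
    print(column_ddl(name, ctype))
```

Types that the interface doesn't support, such as user-defined types, have no entry in the mapping and raise an error instead of producing a guess.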

Direct and nuanced mappings

In many cases, the mapping between Spanner and Cassandra data types is straightforward. For example, a Spanner STRING(MAX) maps to a Cassandra varchar, and a Spanner INT64 maps to a Cassandra bigint.

However, there are situations where the mapping requires more consideration and adjustment. For example, you might need to map a Cassandra smallint to a Spanner INT64.
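
Because INT64 is wider than tinyint, smallint, and int, a value that is valid in Spanner can fall outside the range of the annotated Cassandra type. The following is a minimal client-side sketch of a range check, assuming the standard signed two's-complement ranges for these Cassandra types. The function name is hypothetical, and this page doesn't state how Spanner itself handles out-of-range values.

```python
# Bit widths of the Cassandra signed integer types.
INT_BITS = {"tinyint": 8, "smallint": 16, "int": 32, "bigint": 64}


def fits_cassandra_type(value: int, cassandra_type: str) -> bool:
    """Check whether an integer fits the signed range of a Cassandra integer type."""
    bits = INT_BITS[cassandra_type]
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


print(fits_cassandra_type(32767, "smallint"))  # True
print(fits_cassandra_type(32768, "smallint"))  # False
```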

Supported Cassandra functions

This section lists the Cassandra functions supported in Spanner.

The following list shows Spanner support for Cassandra functions.

Unsupported Cassandra features on Spanner

The Cassandra interface provides the capabilities of Spanner through schemas, types, queries, and clients that are compatible with Cassandra. It doesn't support all of the features of Cassandra. Migrating an existing Cassandra application to Spanner, even using the Cassandra interface, likely requires some rework to accommodate unsupported Cassandra capabilities or differences in behavior, such as query optimization or primary key design. However, once your workloads are migrated, they can take advantage of Spanner's reliability and unique multi-model capabilities.

The following list provides more information on unsupported Cassandra features:

  • Some CQL language features aren't supported: user-defined types and functions, TimeUUID, TTL, and write timestamps.
  • Spanner and Spanner control plane: databases with Cassandra interfaces use Spanner and Google Cloud tools to provision, secure, monitor, and optimize instances. Spanner doesn't support Cassandra administrative tools such as nodetool.

DDL support

CQL DDL statements aren't directly supported through the Cassandra interface. For DDL changes, you must use the Google Cloud console, the gcloud command, or the client libraries.

Connectivity

Access control with Identity and Access Management

You need to have the spanner.databases.adapt, spanner.databases.select, and spanner.databases.write permissions to perform read and write operations against the Cassandra endpoint. For more information, see the IAM overview.

For more information about how to grant Spanner IAM permissions, see Apply IAM roles.

Monitoring

Spanner provides the following metrics to help you monitor the Cassandra Adapter:

  • spanner.googleapis.com/api/adapter_request_count: captures and exposes the number of adapter requests that Spanner performs per second, or the number of errors that occur on the Spanner server per second.
  • spanner.googleapis.com/api/adapter_request_latencies: captures and exposes the amount of time that Spanner takes to handle adapter requests.

Additionally, the following Spanner metrics are helpful for monitoring the adapter:

For a complete list of system insights, see Monitor instances with system insights. To learn more about monitoring your Spanner resources, see Monitor instances with Cloud Monitoring.

Pricing

There is no additional charge for using the Cassandra endpoint. You are charged the standard Spanner pricing for the amount of compute capacity that your instance uses and the amount of storage that your database uses.

For more information, see Spanner pricing.

What's next