Migrate from Apache Cassandra to Bigtable

This document guides you through the process of migrating data from Apache Cassandra to Bigtable with minimal disruption. It describes how to use open-source tools, such as the Cassandra to Bigtable proxy adapter or the Cassandra to Bigtable client for Java, to perform the migration. Before you begin, make sure you're familiar with Bigtable for Cassandra users.

Cassandra to Bigtable proxy adapter

The Cassandra to Bigtable proxy adapter lets you connect Cassandra-based applications to Bigtable. The proxy adapter functions as a wire-compatible Cassandra interface, and it lets your application interact with Bigtable using Cassandra Query Language (CQL). Using the proxy adapter doesn't require you to change Cassandra drivers, and configuration adjustments are minimal.

To set up and configure the proxy adapter, see Cassandra to Bigtable proxy adapter.

To learn which Cassandra versions support the proxy adapter, see Supported Cassandra versions.

Cassandra keyspace

A Cassandra keyspace stores your tables and manages resources in a way that's similar to a Bigtable instance. The Cassandra to Bigtable proxy adapter handles keyspace naming transparently, so you can continue to query using the same keyspace names. However, to achieve the same logical grouping of your tables, you must create a new Bigtable instance. You must also configure Bigtable replication separately.

Supported data types

The following table shows how supported Cassandra CQL data types map to their Bigtable equivalents.

CQL type        | Bigtable mapping
--------------- | ----------------
text            | RAW BYTES
blob            | RAW BYTES
timestamp       | RAW BYTES
int             | RAW BYTES
bigint          | RAW BYTES
float           | RAW BYTES
double          | RAW BYTES
boolean         | RAW BYTES
MAP<key, value> | The column name in Cassandra is used as the column family name in Bigtable. The key of the map is used as the column qualifier, and the value of the map is stored as the cell value.
SET<item>       | The column name in Cassandra is used as the column family name in Bigtable. Each item in the set is used as a column qualifier, and the cell value is left empty.
LIST<item>      | The column name in Cassandra is used as the column family name in Bigtable. The current timestamp is used as the column qualifier, and the list items are stored as the cell value.

For more information about mapping data types in Bigtable, see GoogleSQL for Bigtable overview.
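The collection mappings in the preceding table can be sketched as flat key-value layouts. The following Java snippet is an illustrative model only, not the proxy adapter's implementation; the class, method, and row names are hypothetical. It shows how a MAP column's name becomes the column family, its keys become column qualifiers, and its values become cell values:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration of how a CQL MAP column flattens into Bigtable-style cells. */
public class CollectionMappingSketch {

  /**
   * Flattens a CQL MAP<text, text> column into cells keyed by
   * "family:qualifier". The CQL column name becomes the column family,
   * each map key becomes a column qualifier, and each map value becomes
   * the cell value.
   */
  static Map<String, String> flattenMapColumn(String columnName, Map<String, String> cqlMap) {
    Map<String, String> cells = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : cqlMap.entrySet()) {
      cells.put(columnName + ":" + entry.getKey(), entry.getValue());
    }
    return cells;
  }

  public static void main(String[] args) {
    // A hypothetical CQL column `phones MAP<text, text>`.
    Map<String, String> phones = new LinkedHashMap<>();
    phones.put("home", "555-0100");
    phones.put("work", "555-0199");

    // The column family is "phones"; map keys are qualifiers,
    // map values are cell values.
    System.out.println(flattenMapColumn("phones", phones));
    // prints {phones:home=555-0100, phones:work=555-0199}
  }
}
```

A SET column follows the same layout with empty cell values, and a LIST column uses the write timestamp as the qualifier instead of a map key.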

Unsupported data types

The following Cassandra data types aren't supported:

  • counter
  • date
  • decimal
  • duration
  • frozen
  • inet
  • smallint
  • time
  • timeuuid
  • tinyint
  • ascii
  • user-defined types (UDT)
  • uuid
  • varint

DDL support

The Cassandra to Bigtable proxy adapter supports Data Definition Language (DDL) operations. DDL operations let you create and manage tables directly through CQL commands. We recommend this approach for setting up your schema because the syntax is similar to SQL and you don't need to define your schema in configuration files and then run scripts to create tables.

The following examples show how the Cassandra to Bigtable proxy adapter supports DDL operations:

  • To create a Cassandra table using CQL, run the CREATE TABLE command:

    CREATE TABLE keyspace.table (
        id bigint,
        name text,
        age int,
        PRIMARY KEY ((id), name)
    );
    
  • To add a new column to the table, run the ALTER TABLE command:

    ALTER TABLE keyspace.table ADD email text;
    
  • To delete a table, run the DROP TABLE command:

    DROP TABLE keyspace.table;
    

For more information, see DDL Support for Schema Creation (Recommended Method).

DML support

The Cassandra to Bigtable proxy adapter supports Data Manipulation Language (DML) operations such as INSERT, DELETE, UPDATE, and SELECT.

When you run raw DML queries, enclose all values except numeric values in single quotes, as shown in the following examples:

  • SELECT * FROM keyspace.table WHERE name='john doe';
    
  • INSERT INTO keyspace.table (id, name) VALUES (1, 'john doe');
    
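The quoting rule above can be sketched in code. The following helper is an illustration, not part of the proxy adapter or any driver API; it emits numeric values as-is and wraps everything else in single quotes, escaping embedded single quotes by doubling them, per CQL syntax:

```java
/** Sketch of the quoting rule for values in raw CQL DML statements. */
public class CqlQuotingSketch {

  /** Numerics pass through unquoted; other values are single-quoted, with ' escaped as ''. */
  static String toCqlLiteral(Object value) {
    if (value instanceof Number) {
      return value.toString();
    }
    return "'" + value.toString().replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    System.out.println("SELECT * FROM keyspace.table WHERE name="
        + toCqlLiteral("john doe") + ";");
    // prints SELECT * FROM keyspace.table WHERE name='john doe';

    System.out.println("INSERT INTO keyspace.table (id, name) VALUES ("
        + toCqlLiteral(1) + ", " + toCqlLiteral("o'brien") + ");");
    // prints INSERT INTO keyspace.table (id, name) VALUES (1, 'o''brien');
  }
}
```

In production code, prefer prepared statements with bound parameters, as shown in the client library example later in this document, over manual string quoting.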

Achieve zero downtime migration

Use the Cassandra to Bigtable proxy adapter with the open-source Zero Downtime Migration (ZDM) proxy tool and the Cassandra data migrator tool to migrate data with minimal downtime.

The following diagram shows the steps for migrating from Cassandra to Bigtable using the proxy adapter:

Figure 1. The process of migrating from Cassandra to Bigtable.

To migrate Cassandra to Bigtable, follow these steps:

  1. Connect your Cassandra application to the ZDM proxy tool.
  2. Enable dual writes to Cassandra and Bigtable.
  3. Move data in bulk using the Cassandra data migrator tool.
  4. Validate your migration. After validation, you can terminate the connection to Cassandra and connect directly to Bigtable.
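The dual-write phase (step 2) can be sketched conceptually. In production, the ZDM proxy tool performs this routing; in the following illustration, two in-memory maps stand in for Cassandra and Bigtable, and all class and key names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

/** Conceptual sketch of the dual-write migration phase. */
public class DualWriteSketch {

  private final Map<String, String> cassandra = new HashMap<>(); // existing primary store
  private final Map<String, String> bigtable = new HashMap<>();  // migration target

  /** Writes go to both stores so the target stays in sync with the primary. */
  void write(String key, String value) {
    cassandra.put(key, value);
    bigtable.put(key, value);
  }

  /** Reads stay on the primary until the migration is validated. */
  String read(String key) {
    return cassandra.get(key);
  }

  /** Validation (step 4): confirm both stores hold identical data. */
  boolean isConsistent() {
    return cassandra.equals(bigtable);
  }

  public static void main(String[] args) {
    DualWriteSketch proxy = new DualWriteSketch();
    proxy.write("user:1", "john doe");
    System.out.println(proxy.read("user:1")); // prints john doe
    System.out.println(proxy.isConsistent()); // prints true
  }
}
```

Historical data written before dual writes began isn't covered by this pattern, which is why step 3 uses the Cassandra data migrator tool for the bulk backfill.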

When using the proxy adapter with the ZDM proxy tool, the following migration capabilities are supported:

  • Dual writes: maintain data availability during migration
  • Asynchronous reads: scale and stress-test your Bigtable instance
  • Automated data verification and reporting: ensure data integrity throughout the process
  • Data mapping: map field and data types to meet your production standards

To practice migrating Cassandra to Bigtable, see the Migration from Cassandra to Bigtable with a Dual-Write Proxy codelab.

Cassandra to Bigtable client for Java

If you want to replace your Cassandra drivers and integrate directly with Bigtable, the Cassandra to Bigtable client for Java library lets Cassandra-based Java applications interact with Bigtable using CQL.

For instructions on building the library and including the dependency in application code, see Cassandra to Bigtable Client for Java.

The following example shows how to configure your application with the Cassandra to Bigtable client for Java:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;
import com.google.bigtable.cassandra.BigtableCqlConfiguration;
import com.google.bigtable.cassandra.BigtableCqlSessionFactory;

/**
 * Example using Bigtable CQLSession
 */
public class ExampleWithBigtableCqlSession {

  public static void main(String[] args) {

    // Construct BigtableCqlConfiguration
    BigtableCqlConfiguration bigtableCqlConfiguration = BigtableCqlConfiguration.builder()
        .setProjectId("example-project-id")
        .setInstanceId("example-instance-id")
        .setDefaultColumnFamily("example-column-family")
        .setBigtableChannelPoolSize(4)
        .build();

    // Create CqlSession with BigtableCqlConfiguration
    BigtableCqlSessionFactory bigtableCqlSessionFactory = new BigtableCqlSessionFactory(bigtableCqlConfiguration);

    // Create CqlSession
    try (CqlSession session = bigtableCqlSessionFactory.newSession()) {

      // Create a table
      String createTableQuery = "CREATE TABLE <KEYSPACE>.<TABLE_NAME> (<COLUMN> <TYPE> PRIMARY KEY);";
      session.execute(createTableQuery);

      // Prepare an insert statement
      PreparedStatement preparedInsert = session.prepare(
          "INSERT INTO <KEYSPACE>.<TABLE_NAME> (<COLUMN>) VALUES (?)" // replace with your keyspace, table and columns
      );

      // Insert
      BoundStatement boundInsert = preparedInsert
          .bind()
          .setString("<COLUMN>", "<VALUE>");
      session.execute(boundInsert);

      // Query for all entries
      ResultSet resultSet = session.execute("SELECT <COLUMN> FROM <KEYSPACE>.<TABLE_NAME>;");
      // Print results
      for (Row row : resultSet) {
        System.out.println(row);
      }

    }

  }

}

Additional Cassandra open-source tools

The wire compatibility of the Cassandra to Bigtable proxy adapter with CQL lets you use additional tools in the Cassandra open-source ecosystem. These tools include the following:

  • cqlsh: The CQL shell lets you connect directly to Bigtable through the proxy adapter. You can use it for debugging and quick data lookups using CQL.
  • Cassandra Data Migrator (CDM): This Spark-based tool is suitable for migrating large volumes (up to billions of rows) of historical data. The tool provides validation, diff reporting, and replay capabilities, and it's fully compatible with the proxy adapter.

What's next