Migrate primary keys

This document offers instructions to migrate primary keys from your source database tables to Spanner. You must be familiar with the information available in Primary key migration overview.

Before you begin

  • To get the permissions that you need to migrate primary keys to Spanner, ask your administrator to grant you the Cloud Spanner Database Admin (roles/spanner.databaseAdmin) IAM role on instance.

Migrate auto-generated sequential keys

If you are migrating from a database that uses sequential monotonic keys, such as AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL, or the standard IDENTITY type in SQL Server or Oracle, consider the following high-level migration strategy:

  1. In Spanner, replicate the table structure from your source database, using an integer primary key.
  2. For each column in Spanner that contains sequential values, create a sequence and assign the GET_NEXT_SEQUENCE_VALUE (GoogleSQL, PostgreSQL) function as the default value for the column.
  3. Migrate existing data with original keys from the source database into Spanner. Consider using the Spanner migration tool or a Dataflow template.
  4. Optionally, you can establish foreign key constraints for any dependent tables.
  5. Before you insert new data, adjust the Spanner sequence to skip the range of existing key values.
  6. Insert new data, allowing the sequence to generate unique keys automatically.

Sample migration workflow

The following code defines the table structure and related sequence in Spanner using a SEQUENCE object and sets the object as the default primary value of the destination table:

GoogleSQL

CREATE SEQUENCE singer_id_sequence OPTIONS (
     SequenceKind = 'bit_reversed_positive'
  );

CREATE TABLE Singers (
     SingerId INT64 DEFAULT
     (GET_NEXT_SEQUENCE_VALUE(SEQUENCE SingerIdSequence)),
     Name STRING(1024),
     Biography STRING(MAX),
  ) PRIMARY KEY (SingerId);

CREATE TABLE Albums (
     AlbumId INT64,
     SingerId INT64,
     AlbumName STRING(1024),
     SongList STRING(MAX),
     CONSTRAINT FK_singer_album
     FOREIGN KEY (SingerId)
       REFERENCES Singers (SingerId)
  ) PRIMARY KEY (AlbumId);

PostgreSQL

CREATE SEQUENCE SingerIdSequence BIT_REVERSED_POSITIVE;

CREATE TABLE Singers (
  SingerId BIGINT DEFAULT nextval('SingerIdSequence') PRIMARY KEY,
  Name VARCHAR(1024) NOT NULL,
  Biography TEXT
);

CREATE TABLE Albums (
  AlbumId BIGINT PRIMARY KEY,
  SingerId BIGINT,
  AlbumName VARCHAR(1024),
  SongList TEXT,
  CONSTRAINT FK_singer_album FOREIGN KEY (SingerId) REFERENCES Singers (SingerId)
);

The bit_reversed_positive option indicates that the values generated by the sequence are of type INT64, are greater than zero, and are not sequential.

As you migrate existing rows from your source database to Spanner, the primary keys remain unchanged.

For new inserts that don't specify a primary key, Spanner automatically retrieves a new value by calling the GET_NEXT_SEQUENCE_VALUE()(GoogleSQL or PostgreSQL) function.

These values are uniformly distributed across the range [1, 263] and there could be possible collisions with the existing keys. To prevent this, you can configure the sequence using ALTER_SEQUENCE (GoogleSQL or PostgreSQL) to skip the range of values covered by the existing keys.

Assume the singers table was migrated from PostgreSQL, where its primary key singer_id is of SERIAL type. The following PostgreSQL shows your source database DDL:

PostgreSQL

CREATE TABLE Singers (
SingerId SERIAL PRIMARY KEY,
Name varchar(1024),
Biography varchar
);

The primary key values are monotonically increasing. After the migration, you can retrieve the maximum value of the primary key singer_id on Spanner. Use the following code in Spanner:

GoogleSQL

SELECT MAX(SingerId) FROM Singers;

PostgreSQL

SELECT MAX(SingerId) FROM Singers;

Assume the returned value is 20,000. You can configure the Spanner sequence to skip the range [1, 21000]. The additional 1,000 serves as a buffer to accommodate writes to the source database after the initial migration. New keys generated in Spanner don't conflict with the range of primary keys generated in the source PostgreSQL database. Use the following code in Spanner:

GoogleSQL

ALTER SEQUENCE SingerIdSequence SET OPTIONS (
skip_range_min = 1,
skip_range_max = 21000
);

PostgreSQL

ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 21000;

Use Spanner and your source database

You can use the skip range concept to support scenarios where either Spanner or your source database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover.

To support this, both databases generate primary keys and the data is synchronized between them. You can configure each database to create primary keys in non-overlapping key ranges. When you define a range for your source database, you can configure Spanner sequence to skip over that range.

For example, after the migration of the music tracks application, replicate the data from PostgreSQL to Spanner to reduce the time taken to cut over.

Once you've updated and tested the application on Spanner, you can stop using your source PostgreSQL database and use Spanner, making it the system of record for updates and new primary keys. Once Spanner takes over, you can reverse the flow of data between the databases into the PostgreSQL instance.

Assume your source PostgreSQL database uses SERIAL primary keys, which are 32-bit signed integers. Spanner primary keys are larger 64-bit numbers. In PostgreSQL, alter the primary key column to be a 64-bit column, or bigint. Use the following code on your source PostgreSQL database:

PostgreSQL

ALTER TABLE Singers ALTER COLUMN SingerId TYPE bigint;

You can set a CHECK constraint to the table in the source PostgreSQL database to ensure the values of the SingerId primary key are always smaller than or equal to 231-1. Use the following code on your source PostgreSQL database:

PostgreSQL

ALTER TABLE Singers ADD CHECK (SingerId <= 2147483647);

In Spanner, we can alter the sequence to skip the [1, 231-1] range. Use the following code in Spanner:

GoogleSQL

ALTER SEQUENCE SingerIdSequence SET OPTIONS (
skip_range_min = 1,
skip_range_max = 2147483647 -- 231-1
);

PostgreSQL

ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 2147483648;

Your source PostgreSQL database always generates keys in the 32-bit integer space, while Spanner keys are restricted to the 64-bit integer space, larger than all of the 32-bit integer values. This ensures both your databases can independently generate primary keys that don't conflict.

Migrate UUID key columns

UUIDv4 keys are effectively unique regardless of where they are generated. UUID keys generated elsewhere integrate with new UUID keys generated in Spanner.

Consider the following high-level strategy to migrate UUID keys to Spanner:

  1. Define your UUID keys in Spanner using string columns with a default expression. Use the GENERATE_UUID() function (GoogleSQL, PostgreSQL).
  2. Export the data from the source system, serializing the UUID keys as strings.
  3. Import the primary keys into Spanner.
  4. Optional: Enable foreign keys.

Here's a sample migration workflow:

In Spanner, define a UUID primary key column as a STRING or TEXT type and assign GENERATE_UUID() (GoogleSQL or PostgreSQL) as its default value. Migrate all the data from your source database to Spanner. After migration, as new rows are inserted, Spanner calls GENERATE_UUID() to generate new UUID values for the primary keys. For example, the primary key FanClubId gets a UUIDv4 value when a new row is inserted in the table FanClubs. Use the following code in Spanner:

GoogleSQL

CREATE TABLE Fanclubs (
FanClubId STRING(36) DEFAULT (GENERATE_UUID()),
ClubName STRING(1024),
) PRIMARY KEY (FanClubId);

INSERT INTO FanClubs (ClubName) VALUES ("SwiftFanClub");

PostgreSQL

CREATE TABLE FanClubs (
  FanClubId TEXT DEFAULT spanner.generate_uuid() PRIMARY KEY,
  ClubName VARCHAR(1024)
);

INSERT INTO FanClubs (ClubName) VALUES ('SwiftFanClub');

Migrate your own primary keys

Your application may rely on the primary key order to determine how recent the data is or to sequence newly created data. To use externally generated sequential keys in Spanner, you can create a composite key that combines a uniformly distributed value, such as a hash, as the first component and your sequential key as the second component. This way, you can preserve the sequential key values, without creating hot spots at scale. Consider the following migration workflow:

Assume you need to migrate a MySQL table students with an AUTO_INCREMENT primary key to Spanner. Use the following code in your source MySQL database:

MySQL

CREATE TABLE Students (
StudentId INT NOT NULL AUTO_INCREMENT,
Info VARCHAR(2048),
PRIMARY KEY (StudentId)
);

In Spanner, you can add a generated column StudentIdHash by creating a hash of the StudentId column. For example:

  StudentIdHash = FARM_FINGERPRINT(CAST(StudentId AS STRING))

You can use the following code in Spanner:

GoogleSQL

CREATE TABLE student (
  StudentIdHash INT64 AS (FARM_FINGERPRINT(cast(StudentId as string))) STORED,
  StudentId INT64 NOT NULL,
  Info STRING(2048),
) PRIMARY KEY(StudentIdHash, StudentId);

PostgreSQL

CREATE TABLE Student (
  StudentIdHash bigint GENERATED ALWAYS AS
  (FARM_FINGERPRINT(cast(StudentId AS varchar))) STORED,
  StudentId bigint NOT NULL,
  Info varchar(2048),
  PRIMARY KEY (StudentIdHash, StudentId)
);

Migrate sequential key columns

If your source database system generates sequential values for a key column, you can use a bit-reversed positive sequence (GoogleSQL, PostgreSQL) in your Spanner schema to generate values that distribute evenly across the positive 64-bit integer number space. To prevent the Spanner sequence from generating a value that overlaps with a migrated value, you can define a skipped range for it.

For example, you can skip the range from 1 to 4,294,967,296 (2^32) for the following two sequences, if you know the source database only generates 32-bit integers:

GoogleSQL

CREATE SEQUENCE MyFirstSequence OPTIONS (
  sequence_kind = "bit_reversed_positive",
  skip_range_min = 1,
  skip_range_max = 4294967296
);

ALTER SEQUENCE MySecondSequence SET OPTIONS (
  skip_range_min = 1,
  skip_range_max = 4294967296
);

PostgreSQL

CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE
  SKIP RANGE 1 4294967296;

ALTER SEQUENCE MySecondSequence SKIP RANGE 1 4294967296;

If you're using IDENTITY columns to automatically generate integer values for your key columns, you can set skip ranges:

GoogleSQL

To set a skip range, use the GENERATED BY DEFAULT AS IDENTITY command:

ALTER DATABASE db SET OPTIONS (
  default_sequence_kind = 'bit_reversed_positive',
);

CREATE TABLE MyFirstTable (
  Id INT64 GENERATED BY DEFAULT AS IDENTITY (SKIP RANGE 1, 4294967296),
  Name STRING(MAX),
) PRIMARY KEY (Id);

ALTER TABLE MyFirstTable ALTER COLUMN Id ALTER IDENTITY SET SKIP RANGE 1, 4294967296;

PostgreSQL

To set a skip range, use the GENERATED BY DEFAULT AS IDENTITY command:

ALTER DATABASE db
    SET spanner.default_sequence_kind = 'bit_reversed_positive';

CREATE TABLE MyFirstTable (
  Id bigint GENERATED BY DEFAULT AS IDENTITY (SKIP RANGE 1 4294967296),
  Name text,
  PRIMARY KEY (Id)
);

ALTER TABLE MyFirstTable ALTER COLUMN Id SET SKIP RANGE 1 4294967296;

Migrate bit-reversed key columns

If you already bit-reversed your key values to avoid hot spot issues in your source database, you can also use a Spanner bit-reversed positive sequence (GoogleSQL, PostgreSQL) to continue generating such values. To avoid generating duplicate values, you can configure the sequence to start its counter from a custom number.

For example, if you reversed numbers from 1 to 1000 to generate primary key values, the Spanner sequence can start its counter from any number larger than 10,000. Optionally, you can choose a high number to leave a buffer for new writes that occur in the source database after data migration. In the following example, the counters start at 11,000:

GoogleSQL

CREATE SEQUENCE MyFirstSequence OPTIONS (
  sequence_kind = "bit_reversed_positive",
  start_with_counter = 11000
);

ALTER SEQUENCE MySecondSequence SET OPTIONS (
  start_with_counter = 11000
);

PostgreSQL

CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE
  START COUNTER 11000;

ALTER SEQUENCE MySecondSequence RESTART COUNTER 11000;

If you're using IDENTITY columns to automatically generate integer values for your key columns, you can set a start counter:

GoogleSQL

To set a start counter, use the GENERATED BY DEFAULT AS IDENTITY command:

ALTER DATABASE db SET OPTIONS (
  default_sequence_kind = 'bit_reversed_positive',
);

CREATE TABLE MyFirstTable (
  Id INT64 GENERATED BY DEFAULT AS IDENTITY (START COUNTER WITH 11000),
  Name STRING(MAX),
) PRIMARY KEY (Id);

ALTER TABLE MyFirstTable ALTER COLUMN Id ALTER IDENTITY RESTART COUNTER WITH 11000;

PostgreSQL

To set a start counter, use the GENERATED BY DEFAULT AS IDENTITY command:

ALTER DATABASE db
    SET spanner.default_sequence_kind = 'bit_reversed_positive';

CREATE TABLE MyFirstTable (
  Id bigint GENERATED BY DEFAULT AS IDENTITY (START COUNTER WITH 11000),
  Name text,
  PRIMARY KEY (Id)
);

ALTER TABLE MyFirstTable ALTER COLUMN Id RESTART COUNTER WITH 11000;

What's next