This page describes how to migrate primary keys from your source database tables to Spanner GoogleSQL-dialect databases and PostgreSQL-dialect databases. Before performing the procedures on the page, review the Primary key migration overview.
Before you begin
-
To get the permissions that you need to migrate primary keys to Spanner, ask your administrator to grant you the Cloud Spanner Database Admin (
roles/spanner.databaseAdmin
) IAM role on instance.
Migrate auto-generated sequential keys
If you are migrating from a database that uses sequential monotonic keys, such
as AUTO_INCREMENT
in MySQL, SERIAL
in PostgreSQL, or the standard IDENTITY
type in SQL Server or
Oracle, consider the following high-level migration strategy:
- In Spanner, replicate the table structure from your source database, using an integer primary key.
- For each column in Spanner that contains sequential values,
create a sequence and assign the
GET_NEXT_SEQUENCE_VALUE
( GoogleSQL, PostgreSQL) function as the default value for the column. - Migrate existing data with original keys from the source database into
Spanner. Consider using the Spanner migration
tool or a
Dataflow
template.
- Optionally, you can establish foreign key constraints for any dependent tables.
- Before you insert new data, adjust the Spanner sequence to skip the range of existing key values.
- Insert new data, allowing the sequence to generate unique keys automatically.
Sample migration workflow
The following code defines the table structure and related sequence in
Spanner using a SEQUENCE
object and sets the object as the default primary value of the destination
table:
GoogleSQL
CREATE SEQUENCE singer_id_sequence OPTIONS ( SequenceKind = 'bit_reversed_positive' ); CREATE TABLE Singers ( SingerId INT64 DEFAULT (GET_NEXT_SEQUENCE_VALUE(SEQUENCE SingerIdSequence)), Name STRING(1024), Biography STRING(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( AlbumId INT64, SingerId INT64, AlbumName STRING(1024), SongList STRING(MAX), CONSTRAINT FK_singer_album FOREIGN KEY (SingerId) REFERENCES Singers (SingerId) ) PRIMARY KEY (AlbumId);
PostgreSQL
CREATE SEQUENCE SingerIdSequence BIT_REVERSED_POSITIVE; CREATE TABLE Singers ( SingerId BIGINT DEFAULT nextval('SingerIdSequence') PRIMARY KEY, Name VARCHAR(1024) NOT NULL, Biography TEXT ); CREATE TABLE Albums ( AlbumId BIGINT PRIMARY KEY, SingerId BIGINT, AlbumName VARCHAR(1024), SongList TEXT, CONSTRAINT FK_singer_album FOREIGN KEY (SingerId) REFERENCES Singers (SingerId) );
The bit_reversed_positive
option indicates that the values generated by the
sequence are of type INT64
, are greater than zero, and aren't sequential.
As you migrate existing rows from your source database to Spanner, the primary keys remain unchanged.
For new inserts that don't specify a primary key, Spanner
automatically retrieves a new value by calling the GET_NEXT_SEQUENCE_VALUE
(GoogleSQL
or PostgreSQL)
function.
These values are uniformly distributed across the range
[1, 263]
and there could be possible collisions with the
existing keys. To prevent this, you can configure the sequence using ALTER_SEQUENCE
(GoogleSQL
or PostgreSQL)
to skip the range of values covered by the existing keys.
Assume the singers
table was migrated from PostgreSQL, where its
primary key singer_id
is of SERIAL
type. The following PostgreSQL
shows your source database DDL:
PostgreSQL
CREATE TABLE Singers ( SingerId SERIAL PRIMARY KEY, Name varchar(1024), Biography varchar );
The primary key values are monotonically increasing. After the migration, you
can retrieve the maximum value of the primary key singer_id
on
Spanner. Use the following code in Spanner:
GoogleSQL
SELECT MAX(SingerId) FROM Singers;
PostgreSQL
SELECT MAX(SingerId) FROM Singers;
Assume the returned value is 20,000. You can configure the
Spanner sequence to skip the range [1, 21000]
. The additional
1,000 serves as a buffer to accommodate writes to the source database after the
initial migration. New keys generated in Spanner don't conflict
with the range of primary keys generated in the source PostgreSQL
database. Use the following code in Spanner:
GoogleSQL
ALTER SEQUENCE SingerIdSequence SET OPTIONS ( skip_range_min = 1, skip_range_max = 21000 );
PostgreSQL
ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 21000;
Use Spanner and your source database
You can use the skip range concept to support scenarios where either Spanner or your source database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover.
To support this, both databases generate primary keys and the data is synchronized between them. You can configure each database to create primary keys in non-overlapping key ranges. When you define a range for your source database, you can configure Spanner sequence to skip over that range.
For example, after the migration of the music tracks application, replicate the data from PostgreSQL to Spanner to reduce the time taken to cut over.
After you've updated and tested the application on Spanner, you can stop using your source PostgreSQL database and use Spanner, making it the system of record for updates and new primary keys. When Spanner takes over, you can reverse the flow of data between the databases into the PostgreSQL instance.
Assume your source PostgreSQL database uses SERIAL
primary keys,
which are 32-bit signed integers. Spanner primary keys are
larger 64-bit numbers. In PostgreSQL, alter the primary key column to
be a 64-bit column, or bigint
. Use the following code on your source
PostgreSQL database:
PostgreSQL
ALTER TABLE Singers ALTER COLUMN SingerId TYPE bigint;
You can set a CHECK
constraint to the table in the source PostgreSQL
database to ensure the values of the SingerId
primary key are always smaller
than or equal to 231-1
.
Use the following code on your source PostgreSQL database:
PostgreSQL
ALTER TABLE Singers ADD CHECK (SingerId <= 2147483647);
In Spanner, we can alter the sequence to skip the
[1, 231-1]
range.
Use the following code in Spanner:
GoogleSQL
ALTER SEQUENCE SingerIdSequence SET OPTIONS (
skip_range_min = 1,
skip_range_max = 2147483647 -- 231-1
);
PostgreSQL
ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 2147483648;
Your source PostgreSQL database always generates keys in the 32-bit integer space, while Spanner keys are restricted to the 64-bit integer space, larger than all of the 32-bit integer values. This ensures both your databases can independently generate primary keys that don't conflict.
Migrate UUID key columns
UUIDv4 keys are effectively unique regardless of where they are generated. UUID keys generated elsewhere integrate with new UUID keys generated in Spanner.
Consider the following high-level strategy to migrate UUID keys to Spanner:
- Define your UUID keys in Spanner using string columns with a
default expression. Use the
GENERATE_UUID()
function (GoogleSQL, PostgreSQL). - Export the data from the source system, serializing the UUID keys as strings.
- Import the primary keys into Spanner.
- Optional: Enable foreign keys.
Here's a sample migration workflow:
In Spanner, define a UUID primary key column as a STRING
or
TEXT
type and assign GENERATE_UUID()
(GoogleSQL
or
PostgreSQL)
as its default value. Migrate all the data from your source database to
Spanner. After migration, as new rows are inserted,
Spanner calls GENERATE_UUID()
to generate new UUID values for
the primary keys. For example, the primary key FanClubId
gets a UUIDv4 value
when a new row is inserted in the table FanClubs
. Use the following code in
Spanner:
GoogleSQL
CREATE TABLE Fanclubs ( FanClubId STRING(36) DEFAULT (GENERATE_UUID()), ClubName STRING(1024), ) PRIMARY KEY (FanClubId); INSERT INTO FanClubs (ClubName) VALUES ("SwiftFanClub");
PostgreSQL
CREATE TABLE FanClubs ( FanClubId TEXT DEFAULT spanner.generate_uuid() PRIMARY KEY, ClubName VARCHAR(1024) ); INSERT INTO FanClubs (ClubName) VALUES ('SwiftFanClub');
Migrate your own primary keys
Your application might rely on the primary key order to determine how recent the data is or to sequence newly created data. To use externally generated sequential keys in Spanner, you can create a composite key that combines a uniformly distributed value, such as a hash, as the first component and your sequential key as the second component. This way, you can preserve the sequential key values, without creating hot spots at scale. Consider the following migration workflow:
Assume you need to migrate a MySQL table students
with an AUTO_INCREMENT
primary key to Spanner. Use the following code in your source
MySQL database:
MySQL
CREATE TABLE Students ( StudentId INT NOT NULL AUTO_INCREMENT, Info VARCHAR(2048), PRIMARY KEY (StudentId) );
In Spanner, you can add a generated column StudentIdHash
by
creating a hash of the StudentId
column. For example:
StudentIdHash = FARM_FINGERPRINT(CAST(StudentId AS STRING))
You can use the following code in Spanner:
GoogleSQL
CREATE TABLE student ( StudentIdHash INT64 AS (FARM_FINGERPRINT(cast(StudentId as string))) STORED, StudentId INT64 NOT NULL, Info STRING(2048), ) PRIMARY KEY(StudentIdHash, StudentId);
PostgreSQL
CREATE TABLE Student ( StudentIdHash bigint GENERATED ALWAYS AS (FARM_FINGERPRINT(cast(StudentId AS varchar))) STORED, StudentId bigint NOT NULL, Info varchar(2048), PRIMARY KEY (StudentIdHash, StudentId) );
Migrate sequential key columns
If your source database system generates sequential values for a key column, you can use a bit-reversed positive sequence (GoogleSQL, PostgreSQL) in your Spanner schema to generate values that distribute evenly across the positive 64-bit integer number space. To prevent the Spanner sequence from generating a value that overlaps with a migrated value, you can define a skipped range for it.
For example, you can skip the range from 1 to 4,294,967,296 (2^32) for the following two sequences, if you know the source database only generates 32-bit integers:
GoogleSQL
CREATE SEQUENCE MyFirstSequence OPTIONS ( sequence_kind = "bit_reversed_positive", skip_range_min = 1, skip_range_max = 4294967296 ); ALTER SEQUENCE MySecondSequence SET OPTIONS ( skip_range_min = 1, skip_range_max = 4294967296 );
PostgreSQL
CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE SKIP RANGE 1 4294967296; ALTER SEQUENCE MySecondSequence SKIP RANGE 1 4294967296;
If you're using IDENTITY
columns to
automatically generate integer values for your key columns, you can set skip
ranges:
GoogleSQL
To set a skip range, use the
GENERATED BY DEFAULT AS IDENTITY
command:
ALTER DATABASE db SET OPTIONS (
default_sequence_kind = 'bit_reversed_positive',
);
CREATE TABLE MyFirstTable (
Id INT64 GENERATED BY DEFAULT AS IDENTITY (SKIP RANGE 1, 4294967296),
Name STRING(MAX),
) PRIMARY KEY (Id);
ALTER TABLE MyFirstTable ALTER COLUMN Id ALTER IDENTITY SET SKIP RANGE 1, 4294967296;
PostgreSQL
To set a skip range, use the
GENERATED BY DEFAULT AS IDENTITY
command:
ALTER DATABASE db
SET spanner.default_sequence_kind = 'bit_reversed_positive';
CREATE TABLE MyFirstTable (
Id bigint GENERATED BY DEFAULT AS IDENTITY (SKIP RANGE 1 4294967296),
Name text,
PRIMARY KEY (Id)
);
ALTER TABLE MyFirstTable ALTER COLUMN Id SET SKIP RANGE 1 4294967296;
Migrate bit-reversed key columns
If you already bit-reversed your key values to avoid hot spot issues in your source database, you can also use a Spanner bit-reversed positive sequence (GoogleSQL, PostgreSQL) to continue generating such values. To avoid generating duplicate values, you can configure the sequence to start its counter from a custom number.
For example, if you reversed numbers from 1 to 1000 to generate primary key values, the Spanner sequence can start its counter from any number larger than 10,000. Optionally, you can choose a high number to leave a buffer for new writes that occur in the source database after data migration. In the following example, the counters start at 11,000:
GoogleSQL
CREATE SEQUENCE MyFirstSequence OPTIONS ( sequence_kind = "bit_reversed_positive", start_with_counter = 11000 ); ALTER SEQUENCE MySecondSequence SET OPTIONS ( start_with_counter = 11000 );
PostgreSQL
CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE START COUNTER 11000; ALTER SEQUENCE MySecondSequence RESTART COUNTER 11000;
If you're using IDENTITY
columns to
automatically generate integer values for your key columns, you can set a start
counter:
GoogleSQL
To set a start counter, use the
GENERATED BY DEFAULT AS IDENTITY
command:
ALTER DATABASE db SET OPTIONS (
default_sequence_kind = 'bit_reversed_positive',
);
CREATE TABLE MyFirstTable (
Id INT64 GENERATED BY DEFAULT AS IDENTITY (START COUNTER WITH 11000),
Name STRING(MAX),
) PRIMARY KEY (Id);
ALTER TABLE MyFirstTable ALTER COLUMN Id ALTER IDENTITY RESTART COUNTER WITH 11000;
PostgreSQL
To set a start counter, use the
GENERATED BY DEFAULT AS IDENTITY
command:
ALTER DATABASE db
SET spanner.default_sequence_kind = 'bit_reversed_positive';
CREATE TABLE MyFirstTable (
Id bigint GENERATED BY DEFAULT AS IDENTITY (START COUNTER WITH 11000),
Name text,
PRIMARY KEY (Id)
);
ALTER TABLE MyFirstTable ALTER COLUMN Id RESTART COUNTER WITH 11000;