Mappings of unified types to source and destination data types

Overview

Unified types are the data types that appear in the Avro or JSON events. They are a Datastream-specific, unified representation of a data type across multiple data sources and destinations, such as a source Oracle database, MySQL database, or PostgreSQL database, and a BigQuery or Cloud Storage destination.

The unified types are the superset of all type representations across all supported source types, which represent the original source type in a generic but lossless way.

The following tables list:

  • The unified types associated with Datastream
  • The mappings between the data types for an Oracle database, a MySQL database, or a PostgreSQL database, and the Datastream unified types

Refer to the BigQuery destination documentation for information about the mappings between the data types for the different sources and BigQuery.

Datastream unified types

Type name Info Avro definition JSON definition
BOOLEAN boolean boolean boolean
BYTES A sequence of unsigned bytes bytes string
DATE Days since the epoch A date logical type string [ISO-8601]
DATETIME The date (in days since the epoch) and time (in microseconds since midnight)

A custom type

{
  "type": "record",
  "name": "datetime",
  "fields": [
    {"name": "date",
     "type": "int",
     "logicalType": "date"},
    {"name": "time",
     "type": "long",
     "logicalType": "time-micros"}
  ]
}
    
string [ISO-8601]
DECIMAL (p,s) An arbitrary-precision signed decimal number A decimal logical type number
DOUBLE 64-bit floating point numbers double number
FLOAT 32-bit floating point numbers float number
INTEGER A 32-bit integer int number
INTERVAL Duration between two events (in months, hours, and microseconds)

A custom type

{
  "type": "record",
  "name": "interval",
  "fields": [
    {"name": "months",
     "type": "int"}
    {"name": "hours",
     "type": "int"},
    {"name": "micros",
     "type": "long"}
  ]
}
    
string [ISO-8601]
JSON A JSON object

A custom logical type

{
  "type": "string",
  "logicalType": "json"
}
nested JSON
LONG A 64-bit integer long number
NUMBER A numeric data type

A custom logical type

{
  "type": "string",
  "logicalType": "number"
}
string
STRING An unlimited string length string string
TIME How many microseconds elapsed since midnight, regardless of timezone. A time-micros logical type string [ISO-8601]
TIME_INTERVAL How many microseconds elapsed between two events

A custom logical type

{
  "type": "long",
  "logicalType": "time-interval-micros"
}
long
TIMESTAMP How many microseconds elapsed since the epoch, regardless of timezone A timestamp logical type string [ISO-8601]
TIMESTAMP WITH TIME ZONE How many microseconds elapsed since the epoch with a specific timezone offset in milliseconds

A custom type

{
  "type": "record",
  "name": "timestampTz",
  "fields": [
    {"name": "timestamp",
     "type": "long"
     "logicalType": "timestamp-micros"},
    {"name": "offset",
     "type": "int"
     "logicalType": "time-millis"}
  ]
}
    
string [ISO-8601]
TIME WITH TIME ZONE How many microseconds elapsed since midnight with a specific timezone offset

A custom type

{
  "type": "record",
  "name": "timeTz",
  "fields": [
    {"name": "time",
     "type": "long"
     "logicalType": "time-micros"},
    {"name": "offset",
     "type": "int",
     "logicalType": "time-millis"}
  ]
}
    
string [ISO-8601]
UNION A varying data type Union array
UNSUPPORTED An unsupported data type

A custom logical type

{
  "type": "null",
  "logicalType": "unsupported"
}
null
VARCHAR A string with a maximum length of n characters

A custom logical type

{
  "type": "string",
  "logicalType": "varchar"
  "length": N
}
string

Map Oracle data types to Datastream unified types

Oracle data type Datastream unified type
ANYDATA UNSUPPORTED
BFILE STRING
BINARY DOUBLE DOUBLE
BINARY FLOAT FLOAT
BLOB BYTES
CHAR VARCHAR
CLOB STRING
DATE DATETIME
DOUBLE PRECISION DOUBLE
FLOAT(p) DOUBLE
INTERVAL DAY TO SECOND UNSUPPORTED
INTERVAL YEAR TO MONTH UNSUPPORTED
LONG/LONG RAW UNSUPPORTED
NCHAR STRING
NCLOB STRING
NUMBER NUMBER
NUMBER(p,s<=0)

If p<=18, then LONG. If p>18 or p=*, then NUMBER.

NUMBER(p,s>0)

If p=*, then DECIMAL(38,s), else DECIMAL(p,s).

NVARCHAR2 STRING
RAW STRING
ROWID STRING
SDO_GEOMETRY UNSUPPORTED
SMALLINT INTEGER
TIMESTAMP TIMESTAMP
TIMESTAMP WITH TIME ZONE TIMESTAMP WITH TIME ZONE
UDT (user-defined type) UNSUPPORTED
UROWID UNSUPPORTED
VARCHAR VARCHAR
VARCHAR2 VARCHAR
XMLTYPE UNSUPPORTED

Map MySQL data types to Datastream unified types

MySQL data type Datastream unified type
BIGINT(size) SIGNED LONG
BIGINT(size) UNSIGNED If the destination is BigQuery, then DECIMAL, if Cloud Storage, then NUMBER
BINARY(size) STRING (hex encoded)
BIT(size) LONG
BLOB(size) STRING (hex encoded)
BOOL INTEGER
CHAR(size) STRING
DATE If the destination is BigQuery, then DATE, if Cloud Storage, then TIMESTAMP
DATETIME(fsp) If the destination is BigQuery, then DATETIME, if Cloud Storage, then TIMESTAMP
DECIMAL(size, d) DECIMAL(size, d)
DOUBLE(size, d) DOUBLE
ENUM(val1, val2, val3, ...) STRING
FLOAT(p) FLOAT
FLOAT(size, d) FLOAT
GEOMETRY UNSUPPORTED
INTEGER(size) SIGNED INTEGER
INTEGER(size) UNSIGNED LONG
JSON If the destination is BigQuery, then JSON, if Cloud Storage, then STRING
LONGBLOB STRING (hex encoded)
LONGTEXT STRING
MEDIUMBLOB STRING (hex encoded)
MEDIUMINT(size) INTEGER
MEDIUMTEXT STRING
SET(val1, val2, val3, ...) STRING
SMALLINT(size) INTEGER
TEXT(size) STRING
TIME(fsp) If the destination is BigQuery, then INTERVAL, if Cloud Storage, then TIME_INTERVAL
TIMESTAMP(fsp) TIMESTAMP
TINYBLOB STRING (hex encoded)
TINYINT(size) INTEGER
TINYTEXT STRING
VARBINARY(size) STRING (hex encoded)
VARCHAR STRING
YEAR INTEGER

Map PostgreSQL data types to Datastream unified types

PostgreSQL data type Datastream unified type
ARRAY JSON
BIGINT LONG
BIT BYTES
BIT_VARYING BYTES
BOOLEAN BOOLEAN
BOX UNSUPPORTED
BYTEA BYTES
CHARACTER
  • If there is limit for length, then map to VARCHAR
  • Otherwise, map to STRING
CHARACTER_VARYING
  • If there is limit for length, then map to VARCHAR
  • Otherwise, map to STRING
CIDR STRING
CIRCLE UNSUPPORTED
CITEXT STRING
DATE DATE
DOUBLE_PRECISION DOUBLE
ENUM STRING
INET STRING
INTEGER INTEGER
INTERVAL INTERVAL
JSON JSON
JSONB JSON
LINE UNSUPPORTED
LSEG UNSUPPORTED
MACADDR STRING
MONEY DOUBLE
NUMERIC
  • If precision = -1 and scale = -1 then map to NUMBER
  • Otherwise, map to DECIMAL
OID LONG
PATH UNSUPPORTED
POINT UNSUPPORTED
POLYGON UNSUPPORTED
REAL FLOAT
SMALLINT INTEGER
SMALLSERIAL INTEGER
SERIAL INTEGER
TEXT STRING
TIME TIME
TIMESTAMP TIMESTAMP
TIMESTAMP_WITH_TIMEZONE TIMESTAMP_WITH_TIMEZONE
TIME_WITH_TIMEZONE TIME_WITH_TIMEZONE
TSQUERY STRING
TSVECTOR STRING
TXID_SNAPSHOT STRING
UUID STRING
XML STRING

Map SQL Server data types to Datastream unified types

SQL Server data type Datastream unified type
BIGINT LONG
BINARY BYTES
BIT BOOLEAN
CHAR STRING
DATE DATE
DATETIME2 DATETIME
DATETIME DATETIME
DATETIMEOFFSET TIMESTAMP WITH TIMEZONE
DECIMAL DECIMAL (p, s)
FLOAT DOUBLE
INT INTEGER
IMAGE BYTES
MONEY DECIMAL
NCHAR STRING
NVARCHAR STRING
NVARCHAR(MAX) STRING
NTEXT STRING
NUMERIC DECIMAL (p, s)
REAL FLOAT
SMALLDATETIME DATETIME
SMALLINT INTEGER
SMALLMONEY DECIMAL
TEXT STRING
TINYINT INTEGER
TIME TIME
TIMESTAMP/ROWVERSION BYTES
UNIQUEIDENTIFIER STRING
VARCHAR STRING
VARCHAR(MAX) STRING
VARBINARY BYTES
VARBINARY(MAX) BYTES
XML STRING

What's next