Mainframe Connector transcodes Queued Sequential Access Method (QSAM)
flat files to Google Cloud compatible formats, and the other way around, using
the qsam commands. The qsam commands perform the following transcoding
operations:
- The qsam decode command decodes mainframe data to Google Cloud.
- The qsam encode command encodes Google Cloud data to the mainframe.
These operations are symmetric: they move the same data to and from Google Cloud. You define the structure of a QSAM file in a copybook file using the COBOL data structure definition. You can also define advanced transformations using the Mainframe Connector transcoder configuration file. The following diagrams describe these operations in detail.


This page provides an overview of the transcoding process using the
qsam decode and qsam encode commands, the physical and logical types of
mainframe data, and the Optimized Row Columnar (ORC) and BigQuery type
mappings.
Physical types
Physical types define how field data is laid out on disk. Physical types are converted to Mainframe Connector logical types, which can then be mapped to database types (ORC or BigQuery).
Alphanumeric fields
Alphanumeric fields are used to process alphanumeric strings. The data is
treated as a series of characters and is stored as strings with a specific
encoding, for example, Extended Binary Coded Decimal Interchange Code (EBCDIC).
The transcoding process doesn't terminate if any errors occur during the
encoding or decoding of alphanumeric fields. Instead, a SUB
character for the encoding is placed at the location where the error occurred,
and the transcoding process continues.
Picture symbols | Picture attributes | Logical type |
---|---|---|
A, B, G, N, U, X, 9 | DISPLAY, DISPLAY-1, NATIONAL, UTF-8 | String |
Example
01 REC.
  02 STR PIC X(10).
  02 NATIONAL PIC N(10).
  02 UTF8 PIC U(1) USAGE UTF-8.
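The substitution behavior described in this section can be illustrated with Python's codec machinery. This is a minimal sketch, not Mainframe Connector's implementation: the `cp037` code page stands in for EBCDIC, and the handler name `ebcdic_sub` and the function `encode_alphanumeric` are hypothetical names chosen for illustration.

```python
import codecs

# In EBCDIC code page 037, the SUB control character (U+001A) is byte 0x3F.
SUB = "\x1a"


def _substitute(error):
    """Replace each unencodable character with SUB and keep going."""
    if isinstance(error, UnicodeEncodeError):
        return SUB * (error.end - error.start), error.end
    raise error


# "ebcdic_sub" is a hypothetical handler name used only in this sketch.
codecs.register_error("ebcdic_sub", _substitute)


def encode_alphanumeric(text, encoding="cp037"):
    """Encode text for a PIC X field without failing on bad characters."""
    return text.encode(encoding, errors="ebcdic_sub")
```

With this handler, a character that the code page cannot represent becomes a single SUB byte, and the characters around it are encoded normally.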
Encoding format
Alphanumeric fields are encoded as follows:
- X fields default to EBCDIC encoding
- National (N) fields default to Unicode Transformation Format 16-bit (UTF-16 BE) encoding
- UTF8 fields default to Unicode Transformation Format-8 (UTF-8) encoding
Mainframe Connector supports most single-byte character set (SBCS) and double-byte character set (DBCS) encodings. You can also define your own custom SBCS encoding, if needed.
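As a rough illustration of these defaults, the following sketch decodes raw field bytes according to the picture symbol. The table `DEFAULT_CODECS` and the function `decode_field` are hypothetical, and `cp037` is an assumed stand-in for EBCDIC, because this page does not name a specific EBCDIC code page.

```python
# Default codec per picture symbol. "cp037" stands in for EBCDIC here and is
# an assumption; the page does not name a specific EBCDIC code page.
DEFAULT_CODECS = {
    "X": "cp037",      # alphanumeric, EBCDIC
    "N": "utf-16-be",  # national, UTF-16 big-endian
    "U": "utf-8",      # UTF-8
}


def decode_field(raw, picture_symbol):
    """Decode raw field bytes using the default codec for the symbol."""
    return raw.decode(DEFAULT_CODECS[picture_symbol])
```

For example, `decode_field(b"\xc8\xc9", "X")` and `decode_field(b"\x00H\x00I", "N")` both yield the string `HI`, even though the stored bytes differ.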
Binary fields (COMPUTATIONAL)
Binary fields are stored as either signed or unsigned big-endian integers. Mainframe Connector always stores binary fields logically as signed 64-bit integers. Therefore, unsigned long inputs must use only the lower 63 bits; otherwise, the transcoding process fails.
Picture symbols | Picture attributes | Logical type |
---|---|---|
S, 9 | COMP, COMPUTATIONAL | Long (signed 64-bit integer) |
Example
01 REC.
  02 INT PIC S9(8) COMP.
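The 63-bit restriction follows directly from storing every binary field as a signed 64-bit value. The sketch below shows the idea in Python; the function name `decode_binary` is hypothetical, and the sketch assumes the caller already knows the field width (a PIC S9(8) COMP field typically occupies four bytes).

```python
INT64_MAX = 2**63 - 1


def decode_binary(raw, signed):
    """Decode a big-endian COMP field into a signed 64-bit logical Long."""
    value = int.from_bytes(raw, byteorder="big", signed=signed)
    if value > INT64_MAX:
        # An unsigned 8-byte value with the top bit set cannot be a Long.
        raise OverflowError("value does not fit in a signed 64-bit integer")
    return value
```

Signed inputs of up to eight bytes always fit. Only an unsigned eight-byte input with its highest bit set triggers the failure path.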
Packed decimal fields (COMP-3)
Packed decimal fields are fully supported. During the transcoding process, Mainframe Connector selects the most performant logical type based on the specified precision and scale.
Picture symbols | Picture attributes | Logical type |
---|---|---|
S, 9, V | COMP-3 | Long (signed 64-bit integer), BigInteger, Decimal64, BigDecimal |
Example
01 REC.
  02 DEC PIC S9(2)V9(8) COMP-3.
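To make the packed layout concrete, the following sketch decodes packed decimal bytes in Python. It assumes the conventional layout of two decimal digits per byte with the sign in the final nibble (0xB or 0xD for negative values). The function name `decode_packed` is hypothetical, and the sketch always returns a `Decimal` rather than choosing among the logical types listed above.

```python
from decimal import Decimal


def decode_packed(raw, scale):
    """Decode COMP-3 bytes: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()  # the final nibble carries the sign
    if sign < 0x0A or any(n > 9 for n in nibbles):
        raise ValueError("malformed packed decimal")
    negative = sign in (0x0B, 0x0D)
    # Build the value exactly from (sign, digits, exponent).
    return Decimal((1 if negative else 0, tuple(nibbles), -scale))
```

For the field in the example, ten digits plus the sign nibble occupy six bytes, so the bytes `01 23 45 67 89 0C` decode to 12.34567890 with a scale of 8.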
Zoned decimal fields (DISPLAY)
Zoned decimal fields are fully supported. During the transcoding process, Mainframe Connector selects the most performant logical type based on the specified precision and scale.
Picture symbols | Picture attributes | Logical type |
---|---|---|
S, 9, V | DISPLAY | Long (signed 64-bit integer), BigInteger, Decimal64, BigDecimal |
Example
01 REC.
  02 DEC PIC S9(2)V9(8) DISPLAY.
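A zoned decimal stores one digit per byte. The sketch below decodes such a field in Python, assuming EBCDIC zoned digits with the sign embedded in the zone nibble of the last byte (no SIGN SEPARATE clause). The function name `decode_zoned` is hypothetical.

```python
from decimal import Decimal


def decode_zoned(raw, scale):
    """Decode EBCDIC zoned decimal: one digit per byte, sign in last zone."""
    digits = tuple(byte & 0x0F for byte in raw)
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit in zoned decimal")
    sign_zone = raw[-1] >> 4  # 0xD (or 0xB) marks a negative value
    negative = sign_zone in (0x0B, 0x0D)
    return Decimal((1 if negative else 0, digits, -scale))
```

The field in the example occupies ten bytes. The bytes `F1 F2 F3 F4 F5 F6 F7 F8 F9 D0` decode to -12.34567890 because the last zone nibble is 0xD.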
Hexadecimal floating point fields (COMP-4)
Hexadecimal floating point (HFP) fields are fully supported. Mainframe Connector uses both single and double precision formats for HFP fields.
Picture symbols | Picture attributes | Logical type |
---|---|---|
S, 9, V | COMP-4, COMP-5 | Double (64-bit signed floating point) |
Example
01 REC.
  03 HFP-SINGLE PIC 9(4) COMP-4.
  03 HFP-DOUBLE PIC 9(8) COMP-5.
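For readers unfamiliar with the HFP format, the following sketch converts HFP bytes to a Python float. It assumes the conventional IBM layout: one sign bit, a 7-bit excess-64 exponent in base 16, and a 24-bit (single) or 56-bit (double) fraction. The function name `decode_hfp` is hypothetical, and the sketch says nothing about how Mainframe Connector performs the conversion internally.

```python
def decode_hfp(raw):
    """Decode a 4- or 8-byte IBM hexadecimal floating point value."""
    if len(raw) not in (4, 8):
        raise ValueError("HFP values are 4 or 8 bytes long")
    sign = -1.0 if raw[0] & 0x80 else 1.0
    exponent = (raw[0] & 0x7F) - 64        # base-16 exponent, excess-64
    fraction_bits = 8 * (len(raw) - 1)     # 24 or 56 fraction bits
    fraction = int.from_bytes(raw[1:], "big") / (1 << fraction_bits)
    return sign * fraction * 16.0**exponent
```

Both precisions decode to the same logical type, a 64-bit double. A 56-bit fraction can carry more bits than a double's mantissa holds, so the last bits of some double precision HFP values are rounded in this sketch.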
Lists (OCCURS)
Lists are ordered collections of elements of the same type. Mainframe Connector supports the following types of lists:
Fixed lists
Fixed lists are used when the exact number of items (item count) that will be a part of the list is known in advance, and this number always remains the same. The items in a fixed list can be of a variable size.
Fixed lists are defined as follows in a copybook:
01 REC.
  02 LIST OCCURS 5 TIMES PIC X(1).
  02 FLD PIC X(5).
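Because the item count never changes, every record in this layout has the same length and the fields sit at fixed offsets. A minimal Python sketch of decoding one such record follows; the function name `decode_fixed_list_record` is hypothetical and `cp037` is an assumed EBCDIC code page.

```python
def decode_fixed_list_record(raw, encoding="cp037"):
    """Split a 10-byte REC into LIST (5 x PIC X(1)) and FLD (PIC X(5))."""
    if len(raw) != 10:
        raise ValueError("REC is expected to be 10 bytes long")
    text = raw.decode(encoding)  # single-byte code page: 1 byte per char
    return {"LIST": list(text[:5]), "FLD": text[5:]}
```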
The following image shows the layout of a fixed list with an item count of 5.

Dynamic lists
Dynamic lists are used when the maximum number of items that will be a part of the list is known in advance. However, the actual item count is unknown and is dependent on another field. The items in a dynamic list can be of variable size.
The properties of dynamic lists are as follows:
- The length field can be converted to an integer without loss of precision.
- The length field must be in scope.
- The minimum item count is not enforced during the transcoding process.
Dynamic lists are defined as follows in a copybook:
01 REC.
  02 LEN PIC S9(2) BINARY.
  02 LIST OCCURS 1 TO 5 TIMES
      DEPENDING ON LEN PIC X(1).
  02 FLD PIC X(5).
The following image shows the layout of a dynamic list with a maximum number of items of five.

Packed dynamic lists
Packed dynamic lists are used when the maximum number of items that will be a part of the list is dependent on another field, and the items are packed.
The properties of packed dynamic lists are as follows:
- The length field can be converted to an integer without loss of precision.
- The length field must be in scope.
- The minimum item count is not enforced during the transcoding process.
Packed dynamic lists are defined as follows in a copybook:
01 REC.
  02 LEN PIC S9(2) BINARY.
  02 LIST OCCURS UNBOUNDED
      DEPENDING ON LEN PIC X(1).
  02 FLD PIC X(5).
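In a packed layout, the offset of FLD depends on the value of LEN, so a decoder has to read the length field first. The sketch below illustrates this in Python under two assumptions: LEN is a two-byte big-endian integer, and the items are stored back to back with no padding after the last one. The function name `decode_packed_dynamic_record` is hypothetical and `cp037` is an assumed EBCDIC code page.

```python
def decode_packed_dynamic_record(raw, encoding="cp037"):
    """Decode LEN (2-byte binary), LEN one-byte items, then FLD (5 bytes)."""
    count = int.from_bytes(raw[:2], "big", signed=True)
    if count < 0:
        raise ValueError("list length must not be negative")
    expected = 2 + count + 5
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    items = list(raw[2:2 + count].decode(encoding))
    fld = raw[2 + count:].decode(encoding)
    return {"LEN": count, "LIST": items, "FLD": fld}
```

A record with LEN set to 0 is accepted here, which mirrors the property that the minimum item count is not enforced.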
The following image shows the layout of a packed dynamic list.

Redefinitions (REDEFINES)
Redefinition is a COBOL feature that allows the same data to be decoded in multiple ways. During the decoding process, redefinitions appear as additional columns in the resulting table, and the data is decoded multiple times.
The properties of redefinitions are as follows:
- Redefinitions of the same underlying data are not sibling fields and thus are not in scope of each other.
- Redefined fields are decoded when the underlying field is decoded, not when they are declared. The underlying field also determines the scope of the redefined fields.
- All redefined fields must have the same size and must have a fixed size. This means that you can't use variable length text fields and packed dynamic lists in redefined fields.
Redefinitions are defined as follows in a copybook:
01 Rec.
  05 Field-1 PIC X(100).
  05 Group-1 REDEFINES Field-1.
    10 Field-2 PIC 9(4) comp-3.
    10 Field-3 PIC X(96).
  05 Group-2 REDEFINES Field-1.
    10 Field-4 PIC 9(4) comp-5.
    10 Field-5 PIC X(50).
    10 Field-6 PIC X(46).
The following image shows the layout of a redefined field.

You can use redefinitions in many ways, including the following most common ways:
View the same data in two different ways: This is the most common way redefines are used. During the encoding process, the order in which the data is filled in is undefined, so you must ensure that the data in BigQuery retains its integrity when exported.
Example
01 REC.
  02 FULL-NAME PIC X(12).
  02 NAME REDEFINES FULL-NAME.
    05 FIRST-NAME PIC X(6).
    05 LAST-NAME PIC X(6).
Use a tagged union: Tagged unions are a common way of using redefines when you need only one of the interpretations of the data in any record, depending on a field. You can use null indicators to mark unneeded interpretations as null. Because null indicators are evaluated lazily, this also prevents the unneeded interpretations from being parsed. The properties of tagged unions are as follows:
- The encoding process fails if more than one redefine is defined.
- Only equality and non-equality checks are implemented.
Example
01 REC.
  05 TYPE PIC X(5).
  05 DATA PIC X(100).
  05 VARIANT-1 REDEFINES DATA.
    10 Field-2 PIC 9(4) comp-3.
    10 Field-3 PIC X(96).
  05 VARIANT-2 REDEFINES DATA.
    10 Field-4 PIC 9(4) comp-5.
    10 Field-5 PIC X(50).
    10 Field-6 PIC X(46).
You can use the following example to implement a tagged union:
{
  "field_override": [
    {
      "field": "VARIANT-1",
      "modifier": {
        "null_if": {
          "target_field": "TYPE",
          "non_null_value": "VAR1"
        }
      }
    },
    {
      "field": "VARIANT-2",
      "modifier": {
        "null_if": {
          "target_field": "TYPE",
          "non_null_value": "VAR2"
        }
      }
    }
  ],
  "transformations": [
    {
      "field": "DATA",
      "transformation": {
        "exclude": {}
      }
    }
  ]
}
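The intended effect of this configuration can be modeled in a few lines of Python. This is only an illustration of the semantics as the field names suggest them: a variant stays non-null only when the tag field equals its `non_null_value`, and the excluded field is dropped. The function `apply_config` is hypothetical, it operates on an already decoded row instead of skipping the parse lazily, and ignoring the trailing padding of the PIC X(5) tag is an assumption.

```python
import json

# The override configuration from the tagged union example.
CONFIG = json.loads("""
{
  "field_override": [
    {"field": "VARIANT-1",
     "modifier": {"null_if": {"target_field": "TYPE", "non_null_value": "VAR1"}}},
    {"field": "VARIANT-2",
     "modifier": {"null_if": {"target_field": "TYPE", "non_null_value": "VAR2"}}}
  ],
  "transformations": [
    {"field": "DATA", "transformation": {"exclude": {}}}
  ]
}
""")


def apply_config(row, config):
    """Model the effect of null_if and exclude on an already decoded row."""
    result = dict(row)
    for override in config["field_override"]:
        rule = override["modifier"]["null_if"]
        # Assumption: trailing padding of the PIC X tag is ignored.
        tag = result[rule["target_field"]].rstrip()
        if tag != rule["non_null_value"]:
            result[override["field"]] = None
    for transformation in config["transformations"]:
        if "exclude" in transformation["transformation"]:
            result.pop(transformation["field"], None)
    return result
```

For a row whose TYPE is `VAR1`, only VARIANT-1 keeps its value, VARIANT-2 becomes null, and DATA does not appear in the output.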
Logical types
To transcode data to and from multiple formats, Mainframe Connector converts all data to an intermediate representation (IR) that is based on logical types. Input and output formats define how data is converted to and from any logical type. The following table lists all the logical types supported by Mainframe Connector.
Logical type | Description |
---|---|
Long | Represents a signed 64-bit number. |
String | Represents a string of Unicode characters unrelated to any specific encoding. Any valid Unicode code point is representable. However, some characters may not be encodable in all encoding processes. Logical strings are of variable length. |
Decimal64 | Represents a decimal with a range that can fit in a 64-bit signed integer of any scale. |
BigInteger | Represents integers of any size. |
BigDecimal | Represents decimal numbers of any scale and precision. |
Bytes | Represents an array of bytes of variable sizes. |
Date | Represents a date independent of a specific time zone. |
Double | Represents a double precision floating point number as described in IEEE Standard for Floating-Point Arithmetic (IEEE 754). |
List | Represents a list of items of a specific type. The list can contain an arbitrary number of items. |
Record | Represents a fixed series of fields of varying types. |
Timestamp | Represents a timestamp independent of a specific time zone. |
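The packed and zoned decimal sections state that the logical type is chosen from the precision and scale, but this page does not spell out the rule. The sketch below shows one plausible rule that is consistent with the descriptions in the table. It is an assumption, not the documented behavior: the function name `choose_logical_type` is hypothetical, and the 18-digit threshold is used only because every 18-digit unscaled value fits in a signed 64-bit integer.

```python
def choose_logical_type(precision, scale):
    """Pick a logical type for a decimal field of the given precision/scale."""
    fits_in_int64 = precision <= 18  # 10**18 - 1 < 2**63 - 1
    if scale == 0:
        return "Long" if fits_in_int64 else "BigInteger"
    return "Decimal64" if fits_in_int64 else "BigDecimal"
```

Under this rule, PIC S9(2)V9(8) (precision 10, scale 8) would map to Decimal64, and PIC S9(8) would map to Long.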
ORC type mapping
The following table maps Mainframe Connector logical types to ORC types.
Logical type | ORC type |
---|---|
Long | 64-bit integer (bigint) |
String | UTF-8 encoded string |
Decimal64 | decimal64 |
BigInteger | decimal |
BigDecimal | decimal |
Bytes | binary blob |
Date | date |
Double | float64 |
List | list |
Record | struct |
Timestamp | timestamp (without local timezone) |
BigQuery type mapping
The following table maps Mainframe Connector logical types to BigQuery data types.
Logical type | BigQuery data type | Comments |
---|---|---|
Long | INT64 | |
String | STRING | |
Decimal64 | NUMERIC | |
BigInteger | NUMERIC | |
BigDecimal | NUMERIC | |
Bytes | BYTES | |
Date | DATE | |
Double | FLOAT64 | |
List | ARRAY | Nested lists and lists of maps are not supported. |
Record | STRUCT | When a union only has one variant, it's converted to a NULLABLE field. Otherwise, a union is converted to a RECORD with a list of NULLABLE fields. NULLABLE fields have suffixes such as field_0, field_1. Only one of these fields is assigned a value when the data is read. |
Timestamp | TIMESTAMP | |
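The mapping above can be expressed as a lookup table. The following Python sketch builds a schema-style description of a single column from its logical type. The helper `bigquery_field` and its dictionary layout are hypothetical. The sketch relies on the fact that BigQuery schemas express an ARRAY column as a field with the REPEATED mode, and it rejects nested lists, as noted in the table.

```python
BIGQUERY_TYPES = {
    "Long": "INT64", "String": "STRING", "Decimal64": "NUMERIC",
    "BigInteger": "NUMERIC", "BigDecimal": "NUMERIC", "Bytes": "BYTES",
    "Date": "DATE", "Double": "FLOAT64", "Record": "STRUCT",
    "Timestamp": "TIMESTAMP",
}


def bigquery_field(name, logical_type, item_type=None):
    """Build a schema-style dict for one column (hypothetical helper)."""
    if logical_type == "List":
        if item_type in (None, "List"):
            raise ValueError("nested lists are not supported")
        # BigQuery models ARRAY columns as REPEATED fields of the item type.
        return {"name": name, "type": BIGQUERY_TYPES[item_type],
                "mode": "REPEATED"}
    return {"name": name, "type": BIGQUERY_TYPES[logical_type],
            "mode": "NULLABLE"}
```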
Field scope
A field is considered to be in scope for another field if it is one of the following:
- A sibling field that is defined before the field requiring it.
- A field in a parent record that is defined before the field requiring it.
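These two rules can be checked mechanically. The sketch below models a copybook as a nested list of (name, children) pairs in declaration order and returns the fields that are in scope for a given field. The representation and the function name `fields_in_scope` are hypothetical, and the sketch reads the second rule as covering earlier fields at every enclosing level.

```python
def fields_in_scope(tree, target):
    """Return names of fields that are in scope for the field `target`.

    `tree` is a list of (name, children) pairs in declaration order.
    """
    def walk(nodes, visible):
        earlier = []  # siblings declared before the current node
        for name, children in nodes:
            if name == target:
                return visible + earlier
            found = walk(children, visible + earlier)
            if found is not None:
                return found
            earlier.append(name)
        return None

    result = walk(tree, [])
    if result is None:
        raise KeyError(target)
    return result
```

For example, with a record that declares LEN, then a group containing A, LIST, and B, and then FLD, the fields in scope for LIST are LEN (from the parent record) and A (an earlier sibling). B and FLD are declared later and are not in scope.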