Configure the connector for CDC replication through Pub/Sub

This guide shows you how to use the BigQuery Connector for SAP to configure change data capture (CDC) replication through Pub/Sub.

Before you begin

Learn about the Pub/Sub service and its terminology.

Make sure that you or your administrators have completed the following prerequisites:

Install the BigQuery Connector for SAP.
Set up authentication.
Create a dead-letter topic.

Install the BigQuery Connector for SAP

To replicate data to BigQuery using Pub/Sub, install the BigQuery Connector for SAP version 2.9 or later in your SAP environment. For installation instructions, see Install the BigQuery Connector for SAP.

If you're using streaming data replication with BigQuery Connector for SAP version 2.8 or earlier and want to use CDC replication through Pub/Sub, you must first update your connector to the latest version and then follow the migration steps.

For information about updating your connector, see Update the BigQuery Connector for SAP.
For information about migrating your replication, see Migrate to CDC replication through Pub/Sub.

Set up authentication

To access the Pub/Sub API, you need to set up authentication to Google Cloud. For information about the supported authentication methods and how to set up authentication, see Authentication overview for the BigQuery Connector for SAP.

Create a dead-letter topic

To handle messages that a BigQuery subscription fails to write to the target BigQuery table, we recommend that you create a dead-letter topic. This dead-letter topic serves as a separate queue to which Pub/Sub automatically publishes undeliverable messages after all delivery attempts fail.

To create the dead-letter topic, create a standard Pub/Sub topic. This dead-letter topic is functionally identical to any other Pub/Sub topic, but its purpose is specifically for dead-letter messages. You can use a single dead-letter topic for multiple replication loads, or configure dedicated dead-letter topics for each replication load based on your requirements.

You specify this dead-letter topic when you configure CDC replication so that the dead-letter topic receives any messages that Pub/Sub fails to write to BigQuery through the BigQuery subscription.
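To illustrate the routing, the following Python sketch simulates a subscription that retries a failed BigQuery write and then moves the message to a dead-letter queue instead of dropping it. It isn't a Pub/Sub client: `deliver_with_dead_letter` and `write_row` are hypothetical names, and the default of five attempts is an assumption that mirrors the default maximum delivery attempts of a Pub/Sub dead-letter policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class DeliveryOutcome:
    written: List[str] = field(default_factory=list)
    dead_lettered: List[str] = field(default_factory=list)


def deliver_with_dead_letter(
    messages: Iterable[str],
    write_row: Callable[[str], bool],
    max_delivery_attempts: int = 5,
) -> DeliveryOutcome:
    """Retry each message, then route it to the dead-letter queue if every attempt fails.

    write_row stands in for the BigQuery subscription's write and returns True on success.
    """
    outcome = DeliveryOutcome()
    for message in messages:
        for _attempt in range(max_delivery_attempts):
            if write_row(message):
                outcome.written.append(message)
                break
        else:
            # The for-else branch runs only when no attempt succeeded:
            # keep the message for later analysis instead of dropping it.
            outcome.dead_lettered.append(message)
    return outcome


if __name__ == "__main__":
    result = deliver_with_dead_letter(
        ["row-1", "bad-row-2", "row-3"],
        write_row=lambda m: not m.startswith("bad"),
    )
    print(result.written)        # ['row-1', 'row-3']
    print(result.dead_lettered)  # ['bad-row-2']
```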

To analyze failed messages, reprocess them, or debug issues, you can create another BigQuery subscription on the dead-letter topic. This automates the process of taking messages from the dead-letter topic and writing them into a dedicated BigQuery table. For more information, see Monitor dead-letter topic for failed messages.

For more information about dead-letter topics, see Dead-letter topic.

Create SAP roles and authorizations for BigQuery Connector for SAP

To work with BigQuery Connector for SAP, in addition to the standard SAP LT Replication Server authorizations, users need access to the custom transactions that are provided with BigQuery Connector for SAP: /GOOG/SLT_SETTINGS and /GOOG/REPLIC_VALID.

By default, users who have access to the custom transactions /GOOG/SLT_SETTINGS and /GOOG/REPLIC_VALID can modify the settings of any configuration. If needed, you can restrict access to specific configurations. For users who only need to view the BigQuery Connector for SAP settings, you can grant read-only access to the custom transaction /GOOG/SLT_SETT_DISP.

The BigQuery Connector for SAP transport files include the Google BigQuery Settings Authorization object, ZGOOG_MTID, for authorizations that are specific to BigQuery Connector for SAP.

To grant access to the custom transactions and restrict access to specific configurations, perform the following steps:

Using SAP transaction code PFCG, define a role for the BigQuery Connector for SAP.
Grant the role access to the custom transactions /GOOG/SLT_SETTINGS and /GOOG/REPLIC_VALID.
To limit the access of a role, specify the authorization group of each configuration that the role can access by using the ZGOOG_MTID authorization object. For example:
- Authorization object for BigQuery Connector for SAP (ZGOOG_MTID):
  - Activity 01
  - Authorization Group AUTH_GROUP_1,AUTH_GROUP_N
AUTH_GROUP_1 and AUTH_GROUP_N are authorization group values that are defined in the SAP LT Replication Server configuration.

The authorization groups specified for ZGOOG_MTID must match the authorization groups that are specified for the role in the SAP S_DMIS_SLT authorization object.
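The following Python sketch models the authorization rules described above so that you can reason about role design before you build roles in PFCG. It's a hypothetical model, not SAP's AUTHORITY-CHECK: `ZgoogMtidValues`, `can_access`, and `groups_match_s_dmis_slt` are illustrative names, and the sketch treats `*` as matching every authorization group, as in the display role in the next section.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

ACTIVITY_MAINTAIN = "01"  # activity used for the role that maintains settings
ACTIVITY_DISPLAY = "03"   # activity used for the read-only role


@dataclass(frozen=True)
class ZgoogMtidValues:
    """Hypothetical model of the ZGOOG_MTID values maintained in one role."""
    activities: FrozenSet[str]
    auth_groups: FrozenSet[str]  # "*" stands for every authorization group


def can_access(values: ZgoogMtidValues, activity: str, config_group: str) -> bool:
    """True if the role grants this activity for the configuration's authorization group."""
    if activity not in values.activities:
        return False
    return "*" in values.auth_groups or config_group in values.auth_groups


def groups_match_s_dmis_slt(values: ZgoogMtidValues, s_dmis_slt_groups: Iterable[str]) -> bool:
    """The ZGOOG_MTID groups must match the groups in the role's S_DMIS_SLT object."""
    return set(values.auth_groups) == set(s_dmis_slt_groups)


if __name__ == "__main__":
    editor = ZgoogMtidValues(frozenset({ACTIVITY_MAINTAIN}),
                             frozenset({"AUTH_GROUP_1", "AUTH_GROUP_N"}))
    print(can_access(editor, ACTIVITY_MAINTAIN, "AUTH_GROUP_1"))  # True
    print(can_access(editor, ACTIVITY_MAINTAIN, "AUTH_GROUP_2"))  # False
    print(groups_match_s_dmis_slt(editor, ["AUTH_GROUP_1"]))      # False
```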

Create SAP roles and authorizations for viewing BigQuery Connector for SAP settings

To grant read-only access for the custom transaction /GOOG/SLT_SETT_DISP, perform the following steps:

Using SAP transaction code PFCG, define a role for viewing the BigQuery Connector for SAP settings.
Grant the role access to the custom transaction /GOOG/SLT_SETT_DISP.
Add the authorization object for BigQuery Connector for SAP (ZGOOG_MTID) with the following attributes:
- Activity 03
- Authorization Group = *
Generate the role profile and assign relevant users to the role.

Configure replication

To configure replication, you specify both BigQuery Connector for SAP and SAP LT Replication Server settings.

Create an SAP LT Replication Server replication configuration

Use SAP transaction LTRC to create an SAP LT Replication Server replication configuration.

If SAP LT Replication Server runs on a different server than the source SAP system, confirm that an RFC connection exists between the two systems before you create a replication configuration.

Some of the settings in the replication configuration affect performance. To determine appropriate setting values for your installation, see the Performance Optimization Guide for your version of SAP LT Replication Server in the SAP Help Portal.

The interface and configuration options for SAP LT Replication Server might be slightly different depending on which version you are using.

To configure replication, use the procedure for your version of SAP LT Replication Server:

Configure replication in DMIS 2011 SP17, DMIS 2018 SP02, or later
Configure replication in DMIS 2011 SP16, DMIS 2018 SP01, or earlier

Configure replication in DMIS 2011 SP17, DMIS 2018 SP02, or later

The following steps configure replication in later versions of SAP LT Replication Server. If you are using an earlier version, see Configure replication in DMIS 2011 SP16, DMIS 2018 SP01, or earlier.

In the SAP GUI, enter transaction code LTRC.
Click the Create configuration icon. The Create Configuration wizard opens.
In the Configuration Name and Description fields, enter a name and a description for the configuration, and then click Next.

Optionally, in the Authorization Group field, specify an authorization group to restrict access to this configuration. You can also specify it later.
In the Source System Connection Details panel:
- Select the RFC Connection radio button.
- In the RFC Destination field, specify the name of the RFC connection to the source system.
- Select the checkboxes for Allow Multiple Usage and Read from Single Client as appropriate. For more information, see the SAP LT Replication Server documentation.
- Click Next.
These steps are for an RFC connection. If your source is a database and you have already defined a connection by using transaction DBACOCKPIT, you can select DB Connection instead.
In the Target System Connection Details panel:
- Select the radio button for Other.
- In the Scenario field, select SLT SDK from the drop-down menu.
- Click Next.
On the Specify Transfer Settings panel:
1. In the Application field of the Data Transfer Settings section, enter /GOOG/SLT_BQ or ZGOOG_SLT_BQ.
2. In the Job options section, enter starting values in each of the following fields:
  - Number of Data Transfer Jobs
  - Number of Initial Load Jobs
  - Number of Calculation Jobs
3. In the Replication Options section, select the Real Time radio button.
4. Click Next.
After reviewing the configuration, click Save.
Make a note of the three-digit ID in the Mass Transfer column. You use it in a later step.

For more information, see the PDF attached to SAP Note 2652704: Replicating Data Using SLT SDK - DMIS 2011 SP17, DMIS 2018 SP02.pdf.

Configure replication in DMIS 2011 SP16, DMIS 2018 SP01, or earlier

The following steps configure replication in earlier versions of SAP LT Replication Server. If you are using a later version, see Configure replication in DMIS 2011 SP17, DMIS 2018 SP02, or later.

In the SAP GUI, enter transaction code LTRC.
Click New. A dialog opens for specifying a new configuration.
In the step Specify Source System:
- Choose RFC Connection as the connection type.
- Enter the RFC connection name.
- Ensure that the field Allow Multiple Usage is selected.
In the step Specify Target System:
- Enter the connection data to the target system.
- Choose RFC Connection as the connection type.
- In the field Scenario for RFC Communication, select the value Write Data to Target Using BAdI from the drop-down list. The RFC connection is automatically set to NONE.
In the step Specify Transfer Settings, press F4 Help. The application that you defined previously is displayed in the Application field.
Make a note of the three-digit ID in the Mass Transfer column. You use it in a later step.

For more information, see the PDF attached to SAP Note 2652704: Replicating Data Using SLT SDK - DMIS 2011 SP15, SP16, DMIS 2018 SP00, SP01.pdf.

Create a mass transfer configuration for BigQuery

Use the custom /GOOG/SLT_SETTINGS transaction to configure a mass transfer for BigQuery and specify the table and field mappings.

Select the initial mass transfer options

When you first enter the /GOOG/SLT_SETTINGS transaction, you select which part of the BigQuery mass transfer configuration you need to edit.

To select the part of the mass transfer configuration, do the following:

In the SAP GUI, enter the /GOOG/SLT_SETTINGS transaction preceded by /n:
```
/n/GOOG/SLT_SETTINGS
```
From the drop-down menu in the Google Cloud Partner field, select BigQuery CDC via Pub/Sub.
From the Settings Table drop-down menu in the launch screen for the /GOOG/SLT_SETTINGS transaction, select Mass Transfers.

For a new mass transfer configuration, leave the Mass Transfer Key field blank.
Click the Execute icon. The BigQuery Settings Maintenance - Mass Transfers screen displays.

Specify table creation and other general attributes

In the initial section of a BigQuery mass transfer configuration, you identify the mass transfer configuration and specify the associated client key, as well as certain properties related to the creation of the target BigQuery table.

SAP LT Replication Server saves the mass transfer configuration as a record in the /GOOG/BQ_MASTR custom configuration table.

In the BigQuery Settings Maintenance - Mass Transfers screen, click the Append Row icon.
In the Mass Transfer Key field, define a name for this transfer. This name becomes the primary key of the mass transfer.
In the Mass Transfer ID field, enter the three-digit ID that was generated when you created the corresponding SAP LT Replication Server replication configuration.
To use the labels or short descriptions of the source fields as the names for the target fields in BigQuery, click the Use Custom Names Flag checkbox. For more information about field names, see Default naming options for fields.
To store the type of change that triggered an insert and to enable the validation of record counts between the source table, SAP LT Replication Server statistics, and the BigQuery table, select the Extra Fields Flag checkbox.

When this flag is set, BigQuery Connector for SAP adds columns to the BigQuery table schema. For more information, see Extra fields for record changes and count queries.
The Break at First Error Flag checkbox, which stops sending data when a record with a data error is encountered, is selected by default. We recommend that you keep it selected. For more information, see The BREAK flag.
Optionally, to automatically reduce the chunk size when the byte size of a chunk exceeds the maximum byte size for HTTP requests that Pub/Sub accepts, click the Dynamic Chunk Size Flag checkbox. For more information about dynamic chunk size, see Dynamic chunk size.
To skip records that have data errors and continue inserting records into the BigQuery table, select the Skip Invalid Records Flag checkbox. We recommend that you leave this checkbox cleared. For more information, see The SKIP flag.
In the Google Cloud Key Name field, enter the name of the client key specified in the /GOOG/CLIENT_KEY configuration.

BigQuery Connector for SAP retrieves the Google Cloud Project Identifier automatically from the /GOOG/CLIENT_KEY configuration.
In the BigQuery Dataset field, enter the name of the target BigQuery dataset.
In the Is Setting Active Flag field, enable the mass transfer configuration by clicking the checkbox.
Click Save.

A mass transfer record is appended to the /GOOG/BQ_MASTR table, and the Changed By, Changed On, and Changed At fields are automatically populated.
Click Display Table.

The new mass transfer record is displayed, followed by the table attribute entry panel.
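The Dynamic Chunk Size flag works together with the Chunk Size field that you set in the table attributes. The following Python sketch shows one way that such a mechanism can behave: it splits records into chunks of at most `chunk_size` records and halves the chunk size whenever a chunk's encoded size exceeds a byte limit. The halving strategy, the JSON size estimate, and the 10,000,000-byte default for `max_request_bytes` are assumptions for illustration, not the connector's documented algorithm.

```python
import json
from typing import Dict, Iterator, List, Sequence

Record = Dict[str, object]


def chunk_bytes(chunk: Sequence[Record]) -> int:
    """Estimate a chunk's request size as the UTF-8 length of its JSON encoding."""
    return len(json.dumps(list(chunk)).encode("utf-8"))


def dynamic_chunks(
    records: Sequence[Record],
    chunk_size: int = 1000,
    max_request_bytes: int = 10_000_000,
) -> Iterator[List[Record]]:
    """Yield chunks of at most chunk_size records, halving the size when a chunk is too large."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start, size = 0, chunk_size
    while start < len(records):
        chunk = list(records[start:start + size])
        if chunk_bytes(chunk) > max_request_bytes and size > 1:
            # Too many bytes: shrink the chunk and retry the same records.
            size = max(1, size // 2)
            continue
        # Either the chunk fits, or it's a single record that can't be split further.
        yield chunk
        start += len(chunk)


if __name__ == "__main__":
    rows = [{"id": i, "payload": "x" * 100} for i in range(10)]
    print([len(c) for c in dynamic_chunks(rows, chunk_size=8, max_request_bytes=300)])
    # [2, 2, 2, 2, 2]
```

In this sketch, the reduced chunk size stays in effect for the remaining records, which avoids repeatedly retrying oversized chunks. A single record that exceeds the limit on its own is still yielded, because no chunk size can make it smaller.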

Specify table attributes

In the second section of the /GOOG/SLT_SETTINGS transaction, you specify table attributes, such as the table name and table partitioning, and the number of records to include in each transmission, or chunk, that is sent to Pub/Sub.

The settings that you specify are stored as a record in the /GOOG/BQ_TABLE configuration table.

To specify table attributes, do the following:

Click the Append row icon.
In the SAP Table Name field, enter the name of the source SAP table.
In the External Table Name field, provide the target BigQuery table name. If the target table doesn't already exist, BigQuery Connector for SAP creates it based on the SAP table definition. It also defines the primary keys in this table to enable UPSERT or DELETE operations. The table supports composite primary keys containing up to 16 columns. For the BigQuery naming conventions for tables, see Table naming.
Skip the Send Uncompressed Flag field. Record compression isn't supported for CDC replication through Pub/Sub.
Optionally, in the Chunk Size field, specify the maximum number of records to include in each chunk that is sent to Pub/Sub. We recommend that you use the default chunk size of 1,000 records, which is the maximum number of records that Pub/Sub allows in a single request.

If the source records have a large number of fields, the number of fields can increase the overall byte size of the chunks, which can cause chunk errors. If this occurs, try reducing the chunk size to reduce the byte size. For more information, see Chunk size in the BigQuery Connector for SAP. Alternatively, to automatically adjust the chunk size, enable dynamic chunk size. For more information, see Dynamic chunk size.
Optionally, in the Partition Type field, specify an increment of time to use for partitioning. Valid values are HOUR, DAY, MONTH, or YEAR. For more information, see Table partitioning.
Optionally, in the Partition Field field, specify the name of a field in the target BigQuery table that contains a timestamp to use for partitioning. When you specify Partition Field, you must also specify Partition Type. For more information, see Table partitioning.
In the Is Setting Active Flag field, enable the table attributes by clicking the checkbox. If the Is Setting Active Flag box is not selected, BigQuery Connector for SAP creates the BigQuery table with the name of the SAP source table, the default chunk size, and no partitioning.
For CDC replication through Pub/Sub, specify the following attributes:
1. To enable CDC replication to BigQuery through Pub/Sub, select the Repl CPS (Replication Cloud Pub/Sub) checkbox.
  
  If this checkbox is cleared, the BigQuery Connector for SAP uses streaming data replication, and Pub/Sub-related configuration fields are disabled.
2. In the Pub/Sub Schema field, enter a name for the Pub/Sub schema. The BigQuery Connector for SAP automatically creates this Avro schema, or updates an existing one, to match the SAP table definition.
3. In the Pub/Sub Topic field, enter a name for the Pub/Sub topic where messages are published. The BigQuery Connector for SAP automatically creates a topic.
4. In the Pub/Sub Subscription field, enter a name for the BigQuery subscription that consumes messages from the Pub/Sub topic and writes to BigQuery. The BigQuery Connector for SAP automatically creates a subscription and links it to the topic specified in the preceding step.
5. In the Dead Letter Queue field, enter the name of the Pub/Sub dead-letter topic. This dead-letter topic receives messages that Pub/Sub fails to write to BigQuery through the BigQuery subscription. For more information about dead-letter topics, see Create a dead-letter topic.
  
  Warning: Not configuring a dead-letter topic can result in data loss that you cannot recover.
6. To enable the Change Data Capture (CDC) functionality for the table, select the Act. CDC (Activate CDC) checkbox.
  
  Note: When active, the system validates that the corresponding BigQuery table has primary keys. If primary keys don't exist in the BigQuery table, the system automatically creates them based on the SAP table definition. Additionally, the system flags any incompatible changes, such as a primary key that was deleted from the SAP source, to help maintain data consistency.
7. To optimize performance for initial loads, select the Cache Val (Cache Validation) checkbox.
  
  When you select this checkbox, the connector provides maximum performance for large data transfers. The system runs Google Cloud pipeline validations at intervals and caches the results. The connector provides rapid data transformation and sends data directly to Pub/Sub. For more information, see Cache Validations.
  
  Note: For replication mode, don't select this checkbox.
Click Save.

Your attributes are stored as a record in the /GOOG/BQ_TABLE configuration table and the Changed By, Changed On, and Changed At fields are automatically populated.
Click Display Fields.

The new table attribute record is displayed, followed by the field mapping entry panel.
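When Act. CDC is selected, the primary keys of the target table determine which row each change applies to. The following Python sketch applies UPSERT and DELETE changes to an in-memory table keyed by a composite primary key, and enforces the 16-column limit on composite keys. It illustrates the intended outcome only: `apply_changes` and the `(change_type, row)` pair format are hypothetical, and BigQuery applies CDC changes with its own mechanism.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]
MAX_KEY_COLUMNS = 16  # composite primary keys can contain up to 16 columns


def apply_changes(
    table: Dict[Key, Row],
    changes: Iterable[Tuple[str, Row]],
    key_columns: Sequence[str],
) -> Dict[Key, Row]:
    """Apply (change_type, row) pairs to an in-memory table keyed by primary key."""
    if not 1 <= len(key_columns) <= MAX_KEY_COLUMNS:
        raise ValueError("a CDC table needs between 1 and 16 primary key columns")
    for change_type, row in changes:
        key = tuple(row[col] for col in key_columns)
        if change_type == "UPSERT":
            table[key] = dict(row)  # insert a new row or replace the existing one
        elif change_type == "DELETE":
            table.pop(key, None)    # deleting a key that isn't present is a no-op
        else:
            raise ValueError(f"unsupported change type: {change_type}")
    return table


if __name__ == "__main__":
    target: Dict[Key, Row] = {}
    apply_changes(target, [
        ("UPSERT", {"MANDT": "100", "MATNR": "A1", "QTY": 5}),
        ("UPSERT", {"MANDT": "100", "MATNR": "A1", "QTY": 7}),  # same key: replaces
        ("UPSERT", {"MANDT": "100", "MATNR": "B2", "QTY": 1}),
        ("DELETE", {"MANDT": "100", "MATNR": "B2"}),
    ], key_columns=["MANDT", "MATNR"])
    print(target)  # {('100', 'A1'): {'MANDT': '100', 'MATNR': 'A1', 'QTY': 7}}
```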

Customize the default field mapping

If the source SAP table contains timestamp fields or booleans, change the default data type mapping to accurately reflect the data type in the target BigQuery table.

You can also change other data types, as well as the names that are used for target fields.

You can edit the default mapping directly in the SAP GUI or you can export the default mapping to a spreadsheet or a text file so that others can edit the values without requiring access to SAP LT Replication Server.

For more information about the default field mapping and the changes you can make, see Data type mapping.
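Timestamps need a mapping change because SAP stores UTC timestamps as packed decimal numbers, so a value such as 20240131235959 lands in a NUMERIC column as a plain number. The following Python sketch shows the conversion that a TIMESTAMP mapping implies, assuming the SAP short format YYYYMMDDhhmmss and the long format with seven fractional digits. `sap_timestamp_to_datetime` is a hypothetical helper, and because Python's `datetime` stores only microseconds, the seventh fractional digit is truncated.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional, Union


def sap_timestamp_to_datetime(value: Union[int, str, Decimal]) -> Optional[datetime]:
    """Convert an SAP UTC timestamp (YYYYMMDDhhmmss[.fffffff]) to an aware datetime.

    Pass strings or Decimal values, not floats, so that fractional digits stay exact.
    """
    number = Decimal(str(value))
    if number == 0:
        return None  # SAP's initial value 0 means "no timestamp"
    whole = int(number)
    # datetime supports microseconds only, so the 7th fractional digit is dropped.
    micros = int((number - whole) * 1_000_000)
    base = datetime.strptime(f"{whole:014d}", "%Y%m%d%H%M%S")
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


if __name__ == "__main__":
    print(sap_timestamp_to_datetime("20240131235959.1234567"))
    # 2024-01-31 23:59:59.123456+00:00
    print(sap_timestamp_to_datetime(0))  # None
```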

To customize the default mapping for the target BigQuery fields:

In the BigQuery Settings Maintenance - Fields page of the transaction /GOOG/SLT_SETTINGS, display the default field mappings for the mass transfer you are configuring.
Edit the target Avro type for a field in the AVRO type column as needed. In particular, change the target Avro type for the following kinds of fields:
- Booleans. Change the default target Avro type from STRING to BOOLEAN.
- Hexadecimals. Change the default target Avro type from STRING to BYTES.
To edit the Avro type:
1. On the row of the field that you need to edit, click the Avro type field.
2. In the dialog for Avro type, select the Avro type that you need.
3. Confirm your changes, and then click Save.
Edit the default target data types in the External Data Element column as needed. In particular, change the target data type for the following data types:
- Timestamps. Change the default target data type from NUMERIC to TIMESTAMP or TIMESTAMP (LONG).
- Booleans. Change the default target data type from STRING to BOOLEAN.
- Hexadecimals. Change the default target data type from STRING to BYTES.
To edit the default data type mapping:
1. On the row of the field that you need to edit, click the External Data Element field.
2. In the dialog for data types, select the BigQuery data type that you need.
3. Confirm your changes, and then click Save.
If you specified the Custom Names flag in the BigQuery Settings Maintenance page, edit the default target field names in the Temporary Field Name column as needed.

The values that you specify override the default names that are shown in the External Field Name column.
Edit the default target field descriptions in the Field Description column as needed.
Optionally, export the field map for external editing. For instructions, see Edit the BigQuery field map in a CSV file.
After all changes are complete and any externally edited values have been uploaded, confirm that the Is Setting Active Flag checkbox is selected. If Is Setting Active Flag is not selected, BigQuery Connector for SAP creates target tables with the default values.
Click Save.

The changes are stored in the /GOOG/BQ_FIELD configuration table and the Changed By, Changed On, and Changed At fields are automatically populated.
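To see how the AVRO type edits affect the Pub/Sub schema, the following Python sketch builds an Avro record schema from a list of fields and applies per-field type overrides, such as changing a Boolean flag from `string` to `boolean`. It's a simplified sketch: `build_avro_schema` and the field names in the example are hypothetical, and it omits details that a real schema might need, such as nullable union types.

```python
import json
from typing import Dict, List, Optional, Tuple

# Avro primitive types other than "null".
AVRO_PRIMITIVES = {"string", "boolean", "bytes", "int", "long", "float", "double"}


def build_avro_schema(
    record_name: str,
    fields: List[Tuple[str, str]],
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Return an Avro record schema as JSON, replacing default types with overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - {name for name, _ in fields}
    if unknown:
        raise ValueError(f"overrides for unknown fields: {sorted(unknown)}")
    avro_fields = []
    for name, default_type in fields:
        avro_type = overrides.get(name, default_type)
        if avro_type not in AVRO_PRIMITIVES:
            raise ValueError(f"unsupported Avro type for {name}: {avro_type}")
        avro_fields.append({"name": name, "type": avro_type})
    return json.dumps({"type": "record", "name": record_name, "fields": avro_fields})


if __name__ == "__main__":
    # Hypothetical fields: the flag and hex fields start as "string", the default described above.
    fields = [("MATNR", "string"), ("ACTIVE_FLAG", "string"), ("GUID_HEX", "string")]
    print(build_avro_schema("MARA", fields, {"ACTIVE_FLAG": "boolean", "GUID_HEX": "bytes"}))
```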

Enable token caching

To improve replication performance, we recommend that you enable caching for the access token that you retrieve from Google Cloud.

Enabling token caching makes sure that an access token is reused until the access token expires or is revoked, which in turn reduces the number of HTTP calls made to retrieve new access tokens.

To enable token caching, select the Token Caching flag in the client key table /GOOG/CLIENT_KEY.

When you enable token caching, the access token is cached in the Shared Memory of your SAP LT Replication Server application server for the duration that is set for the Token Refresh Seconds field in the /GOOG/CLIENT_KEY table. If Token Refresh Seconds is not specified or is set to 0, then the access token is cached for the value specified in the CMD_SECS_DEFLT parameter in advanced settings.
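The caching rules above can be sketched in Python as follows. `TokenCache` is a hypothetical class: `fetch_token` stands in for the HTTP call that retrieves a token, `token_refresh_seconds` and `cmd_secs_deflt` mirror the Token Refresh Seconds field and the CMD_SECS_DEFLT parameter, and the injectable clock exists only to make the example testable. Unlike the connector's cache in SAP shared memory, this sketch is local to one Python process and isn't thread-safe.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Reuse an access token until its cache lifetime elapses (hypothetical sketch)."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        cmd_secs_deflt: int,
        token_refresh_seconds: Optional[int] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        # Token Refresh Seconds wins; if it's unset or 0, fall back to CMD_SECS_DEFLT.
        self._ttl = token_refresh_seconds or cmd_secs_deflt
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only after the lifetime ends."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token

    def clear(self) -> None:
        """Drop the cached token, similar in spirit to /GOOG/R_CLEAR_TOKEN_CACHE."""
        self._token = None
        self._expires_at = 0.0
```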

For SAP workloads that are not running on Google Cloud, cached access tokens also prevent technical issues that can arise when you replicate large data loads, because several SAP LT Replication Server processes might request an access token at the same time.

For SAP workloads that are running on Google Cloud and use a user-managed service account to access BigQuery or Pub/Sub, token caching can significantly improve performance, because retrieving an access token in this scenario requires two HTTP calls.

Clear the cached access token

When token caching is enabled and you update the roles assigned to the service account that BigQuery Connector for SAP uses to access BigQuery or Pub/Sub, the new access token that corresponds to the updated roles is retrieved only after the existing cached token expires. In such situations, you can clear the access token manually.

To clear the cached access token, enter transaction SE38 and then run the program /GOOG/R_CLEAR_TOKEN_CACHE.

Test replication configuration

Test the replication configuration by starting data provisioning:

Open the SAP LT Replication Server Cockpit (transaction LTRC) in the SAP GUI.
Click the mass transfer configuration for the table replication that you are testing.
Click Data Provisioning.
In the Data Provisioning panel, start data provisioning:
1. Enter the name of the source table.
2. Click the radio button for the type of data provisioning that you want to test. For example, Start Load.
3. Click the Execute icon. The data transfer begins and the progress is displayed on the Participating objects screen.
  
  If the table doesn't exist in BigQuery, the BigQuery Connector for SAP creates the table from a schema that it builds from the table and field attributes that you previously defined with the /GOOG/SLT_SETTINGS transaction.
  
  The length of time that an initial load of a table takes depends on the size of the table and its records.
  
  Messages are written to the SAP LT Replication Server Application Logs section in transaction LTRC.

Validate replication

You can validate replication using the following methods:

In SAP LT Replication Server:
- Monitor the replication on the Data Provisioning screen.
- Check for error messages in the Application Logs screen.
On the table information tab in BigQuery:
- Check the Schema tab to verify that the field names and data types are correct.
- Check the Preview tab to see a preview of the inserted rows.
- Check the Details tab for the number of rows inserted, the size of the table, and other information.

Check replication in SAP LT Replication Server

Use transaction LTRC to see the progress of initial load or replication jobs after you start them and to check for error messages.

You can see the status of the load under the Load Statistics tab and the progress of the job under the Data Transfer Monitor tab in SAP LT Replication Server.

On the Application Logs screen of transaction LTRC, you can see all of the messages that are returned by BigQuery, the BigQuery Connector for SAP, and SAP LT Replication Server.

Messages that are issued by BigQuery Connector for SAP code in SAP LT Replication Server start with the prefix /GOOG/SLT. Messages that are returned from the BigQuery API start with the prefix /GOOG/MSG.

Messages that are returned by SAP LT Replication Server do not start with a /GOOG/ prefix.
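If you export the application log for analysis, you can sort messages by origin by using these prefix rules. The following Python sketch assumes that each exported line begins with its message prefix; `message_source`, `summarize`, and the sample messages are hypothetical.

```python
from collections import Counter
from typing import Iterable


def message_source(message: str) -> str:
    """Classify an LTRC application log message by its prefix."""
    if message.startswith("/GOOG/SLT"):
        return "BigQuery Connector for SAP"
    if message.startswith("/GOOG/MSG"):
        return "BigQuery API"
    if message.startswith("/GOOG/"):
        return "Other /GOOG/ message"  # not covered by the rules above
    return "SAP LT Replication Server"


def summarize(lines: Iterable[str]) -> Counter:
    """Count exported log lines per message source."""
    return Counter(message_source(line.strip()) for line in lines)
```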

Check replication in BigQuery

In the Google Cloud console, confirm that the table was created and that BigQuery is inserting data into it.

In the Google Cloud console, go to the BigQuery page.

In the search field of the Explorer section, type the name of the target BigQuery table, and then press Enter.

The table information is displayed under a tab in the content pane on the right side of the page.
In the table information section, click the following headings to check the table and row insertion:
- Preview, which shows the rows and fields that are inserted into the BigQuery table.
- Schema, which shows the field names and data types.
- Details, which shows the table size, the total number of rows, and other details.

Troubleshoot

For information about diagnosing and resolving issues that you might encounter when you configure the BigQuery Connector for SAP, see the BigQuery Connector for SAP troubleshooting guide.

Get support

If you need help resolving problems with configuring the BigQuery Connector for SAP, collect all available diagnostic information and contact Cloud Customer Care. For information about contacting Customer Care, see Getting support for SAP on Google Cloud.