Understand data scope and duration

AML AI is set up to assess money laundering risk for one line of business (LoB). An LoB is associated with either your retail or your commercial customers.

When creating a dataset for use with an LoB, you will need to include several tables. Each table should cover a sufficient time range. This page gives an overview of the tables you will need and shows how to determine the time range that each should cover.

Tables to use

The BigQuery dataset used with AML AI should contain the following tables:

  • Party: All parties relevant to that LoB
    • Retail LoB: All retail banking customers that have held accounts at any point in the required time range
    • Commercial LoB: All commercial banking customers (legal and natural entities) that have held accounts at any point in the required time range
  • AccountPartyLink: Full history of which accounts were held by which parties. This should cover all accounts for products and services where any party in the Party table was the primary account holder at any point in the required time range.
  • Transaction: All transactions for accounts in the AccountPartyLink table for the required time range.
  • RiskCaseEvent: All risk case events (see event type values) for any risk case and party in the Party table with an AML_PROCESS_START (start of investigation) in the required time range. This table may include events that have an event time earlier or later than the required time range.
  • PartySupplementaryData: (If used) For 0 to 100 unique party_supplementary_data_id values, include a full history of the values of these fields for all parties in the Party table for the required time range.
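
As a quick sanity check before running an operation, the table list above can be encoded and compared against the tables actually present in a dataset. The following is a minimal sketch only: the table names come from the list above, while the helper name `missing_tables` and the idea of passing in a plain list of table names are assumptions made for illustration, not part of AML AI.

```python
# Table names are taken from the list above.
# PartySupplementaryData is only needed if supplementary data is used.
REQUIRED_TABLES = {"Party", "AccountPartyLink", "Transaction", "RiskCaseEvent"}
OPTIONAL_TABLES = {"PartySupplementaryData"}


def missing_tables(present):
    """Return the required table names absent from `present`, sorted.

    `present` is any iterable of table names found in the dataset
    (hypothetical input; how you list the tables is up to you).
    """
    return sorted(REQUIRED_TABLES - set(present))


if __name__ == "__main__":
    print(missing_tables(["Party", "Transaction", "PartySupplementaryData"]))
```

The names of the tables present could come from whichever listing mechanism you already use for your BigQuery dataset.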

Using additional data

See Supplementary data if you have additional data on parties (not otherwise covered in the schema) that is relevant to identifying money laundering risk.

Dataset time range

The time range that any table in a dataset should cover can be worked out as follows for any given operation. You will need to know:

  • The end time. This is the latest time from which labels are used and from which data is used to generate features for tuning.
  • The Engine Version (see list of engine versions) you will use.
  • The operation you will conduct: tune, train, predict or backtest.
  • For predict or backtest operations, the number of periods for which you will conduct the operation, to be specified in the API call.

First you should work out the number of periods the operation will use. This is the number of consecutive months ending in the last full calendar month prior to the specified end time, for which AML AI will evaluate model features.

  • For predict and backtest operations, this is the number of prediction periods or backtest periods specified in the API call.
  • For other operations this depends on the Engine Version and the operation. For example, v004.004 Engine Versions use 18 periods for tuning and 15 for training.

Next you should work out the lookback window for each table. This is the maximum number of months of data needed from that table for AML AI to calculate model features for a given period.

  • For example, for v004.004 Engine Versions, this is 13 months for Transaction and AccountPartyLink tables, 12 months for the RiskCaseEvent table and 0 months for Party and PartySupplementaryData tables.

The dataset will need to cover the lookback window for all of the periods used by the chosen operation. You can calculate the number of full calendar months of data prior to the end time that you will need for a given operation with the following formula:

  • number of periods + lookback window - 1
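
The formula can be sketched in a few lines of Python. This is an illustrative sketch, not an AML AI API: the function names are hypothetical, and the lookback values in `LOOKBACK_V004_004` are the v004.004 figures quoted above, so they would need to be replaced for other Engine Versions.

```python
# Lookback windows in months, as quoted above for v004.004 Engine Versions.
LOOKBACK_V004_004 = {
    "Transaction": 13,
    "AccountPartyLink": 13,
    "RiskCaseEvent": 12,
    "Party": 0,
    "PartySupplementaryData": 0,
}


def months_required(num_periods: int, lookback_window: int) -> int:
    """Full calendar months of data needed prior to the end time.

    Implements: number of periods + lookback window - 1.
    """
    if num_periods < 1:
        raise ValueError("num_periods must be at least 1")
    if lookback_window < 0:
        raise ValueError("lookback_window must not be negative")
    return num_periods + lookback_window - 1


def months_required_by_table(num_periods: int, lookbacks: dict) -> dict:
    """Apply the formula to every table in a lookback mapping."""
    return {
        table: months_required(num_periods, lookback)
        for table, lookback in lookbacks.items()
    }


if __name__ == "__main__":
    # 18 periods is the tuning figure quoted above for v004.004.
    for table, months in months_required_by_table(18, LOOKBACK_V004_004).items():
        print(f"{table}: {months} months")
```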

For example, tuning with v004.004 Engine Versions requires:

  • 18 + 13 - 1 = 30 months of data from the Transaction and AccountPartyLink tables,
  • 18 + 12 - 1 = 29 months of data from the RiskCaseEvent table, as well as any more recent events for risk cases in the table,
  • and 18 + 0 - 1 = 17 months of data from the Party and PartySupplementaryData tables.
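
To turn a month count into a calendar date, you can step back from the end time by that many months. The sketch below assumes, in line with the definition of periods above, that the required months are the consecutive full calendar months ending with the last full month before the end time. The function name `earliest_month_start` and the sample end time are hypothetical. It does not account for RiskCaseEvent rows that fall outside the range, which that table may also contain.

```python
from datetime import date


def earliest_month_start(end_time: date, months_needed: int) -> date:
    """First day of the earliest calendar month the data must cover.

    Counts back `months_needed` full calendar months from the month
    containing `end_time` (that month itself is not a full month
    prior to the end time, so it is not counted).
    """
    if months_needed < 1:
        raise ValueError("months_needed must be at least 1")
    # Represent the month as a single integer, step back, then convert back.
    month_index = end_time.year * 12 + (end_time.month - 1) - months_needed
    return date(month_index // 12, month_index % 12 + 1, 1)


if __name__ == "__main__":
    end_time = date(2024, 6, 15)  # hypothetical end time
    for table, months in [("Transaction", 30), ("RiskCaseEvent", 29), ("Party", 17)]:
        print(table, earliest_month_start(end_time, months))
```

With this sample end time, the last full calendar month is May 2024, so 30 months of Transaction data would start in December 2021.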