# Understand the AML data model and requirements
At the core of AML AI is a detailed and up-to-date understanding
of parties of the bank and their activity, covering, in particular, the
following data:
- Transactional activity
- Account holdings
- Party demographics
- Risk investigation data
This page covers the creation and management of data used by
AML AI, including details of the data model, data schema, and
data requirements for AML. The schema itself, including details for the
individual fields, appears in the [AML input data model](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model)
([CSV file](/static/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model.csv)). A synthetic example dataset is also
available through the [Quickstart](/financial-services/anti-money-laundering/docs/train-models-to-detect-money-laundering).
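If you want to inspect the field-level schema programmatically, a minimal sketch like the following can download and print the published CSV. The full URL and the CSV's column headers are assumptions based on the link above; adjust them to match the actual file.

```python
import csv
import io
import urllib.request

# URL of the published schema CSV (assumed reachable without authentication).
SCHEMA_CSV_URL = (
    "https://cloud.google.com/static/financial-services/"
    "anti-money-laundering/docs/reference/schemas/aml-input-data-model.csv"
)

with urllib.request.urlopen(SCHEMA_CSV_URL) as response:
    text = response.read().decode("utf-8")

reader = csv.DictReader(io.StringIO(text))
print("Schema CSV columns:", reader.fieldnames)

# Print the first few field definitions; the column headers in each row
# depend on the published file.
for i, row in enumerate(reader):
    if i >= 5:
        break
    print(row)
```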
The following prerequisites are not covered on this page:

- Setup to use AML AI with an AML dataset (see [Set up a project and permissions](/financial-services/anti-money-laundering/docs/set-up-project-permissions))
- Security and compliance features (see pages under [Security and compliance features](/financial-services/anti-money-laundering/docs/concepts/security-and-compliance-features))

## Overview of data requirements
The AML data model accepts information on retail or commercial parties, their
accounts and transactions, and detailed information on risk cases related to
these parties. This section introduces important aspects of the data model that
are valid across the different entities.
The AML data model schema is arranged into three areas: core banking data, risk
investigation data, and supplementary data.
### Core banking data

- **Tables**: [Party](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#party), [AccountPartyLink](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#accountpartylink), [Transaction](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#transaction)
- **Purpose**: Serves as a structured collection of data on your customers and their banking activity, used in detection of risk. All parties, accounts, and transactions to be monitored should be included. Provide either retail or commercial data in an AML AI dataset.

### Risk investigation data

- **Table**: [RiskCaseEvent](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#riskcaseevent)
- **Purpose**:
  - Serves as a structured collection of data on risk investigation processes and parties previously identified as risky
  - Assists in the creation of training labels for AML risk models
### Supplementary data

- **Table**: [PartySupplementaryData](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#partysupplementarydata)
- **Purpose**: Optional table that can contain additional information relevant to identifying money laundering risk that is not covered in the rest of the schema. You should start using AML AI without providing any supplementary data.
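As a quick sanity check before building a dataset, you might confirm that your BigQuery dataset contains a table for each part of the schema. The sketch below assumes snake_case table names that mirror the schema tables and uses placeholder project and dataset IDs; the actual names are up to you.

```python
from google.cloud import bigquery

# Assumed snake_case table names mirroring the schema tables; use whatever
# names your own BigQuery dataset actually contains.
EXPECTED_TABLES = {
    "party", "account_party_link", "transaction",  # core banking data
    "risk_case_event",                             # risk investigation data
    "party_supplementary_data",                    # supplementary data (optional)
}

client = bigquery.Client(project="my-project")     # placeholder project ID
found = {table.table_id for table in client.list_tables("my_aml_dataset")}  # placeholder dataset ID

missing = EXPECTED_TABLES - found
print("Missing tables:", sorted(missing) if missing else "none")
```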
### Table relationships

The following diagram describes the table relationships, primary keys, and
foreign keys.

**Note:** Since AML AI uses data over time, primary keys may include `validity_start_time` to allow accurate representation of data over time. For example, AML AI can capture when a customer was added to or removed from a joint account. For more details, see [understanding how data changes over time](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#data-changes-over-time).
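To see these relationships in practice, the following sketch joins the core banking tables on the keys implied by the diagram. The project, dataset, table, and column names are placeholders or assumptions; confirm the exact field names against the AML input data model.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Count transactions per party by walking Party -> AccountPartyLink -> Transaction.
query = """
SELECT
  p.party_id,
  apl.account_id,
  COUNT(t.transaction_id) AS transaction_count
FROM `my-project.my_aml_dataset.party` AS p
JOIN `my-project.my_aml_dataset.account_party_link` AS apl
  ON apl.party_id = p.party_id
JOIN `my-project.my_aml_dataset.transaction` AS t
  ON t.account_id = apl.account_id
GROUP BY p.party_id, apl.account_id
"""

for row in client.query(query).result():
    print(row.party_id, row.account_id, row.transaction_count)
```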
## Errors

When you create a dataset, AML AI automatically performs data
validation checks. For information about these checks, the error messages, and
how to fix them, see
[Data validation errors](/financial-services/anti-money-laundering/docs/reference/data-validation-errors).
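AML AI's own checks run when the dataset is created, but you can catch some obvious issues earlier with a lightweight pre-check in BigQuery. The sketch below looks for NULL key columns; the key column assumed for each table and the project and dataset IDs are placeholders, so verify them against the schema reference.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Assumed key column per table; confirm against the AML input data model.
KEY_COLUMNS = {
    "party": "party_id",
    "account_party_link": "account_id",
    "transaction": "transaction_id",
    "risk_case_event": "risk_case_event_id",
}

for table, key in KEY_COLUMNS.items():
    query = f"""
        SELECT COUNT(*) AS null_keys
        FROM `my-project.my_aml_dataset.{table}`
        WHERE {key} IS NULL
    """
    result = list(client.query(query).result())[0]
    print(f"{table}: {result.null_keys} rows with NULL {key}")
```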
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[[["\u003cp\u003eThis page details the data model, schema, and requirements for AML AI, which relies on understanding a bank's parties and their activities.\u003c/p\u003e\n"],["\u003cp\u003eThe AML data model is divided into three main areas: core banking data (parties, accounts, transactions), risk investigation data (risk cases), and optional supplementary data.\u003c/p\u003e\n"],["\u003cp\u003eThe data model includes tables such as Party, AccountPartyLink, Transaction, RiskCaseEvent, and PartySupplementaryData, each serving a specific purpose in risk detection and model training.\u003c/p\u003e\n"],["\u003cp\u003eAML AI performs data validation checks when a dataset is created, with details on errors and fixes available in the Data Validation Errors section.\u003c/p\u003e\n"],["\u003cp\u003eData lineage is important, and it's recommended to take a snapshot of your BigQuery tables to preserve data integrity for AML AI operations.\u003c/p\u003e\n"]]],[],null,["# Understand the AML data model and requirements\n\nAt the core of AML AI is a detailed and up-to-date understanding\nof parties of the bank and their activity, covering, in particular, the\nfollowing data:\n\n- Transactional activity\n- Account holdings\n- Party demographics\n- Risk investigation data\n\nThis page covers the creation and management of data used by\nAML AI, including details of the data model, data schema, and\ndata requirements for AML. The schema itself, including details for the\nindividual fields, appears in the [AML input data model](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model)\n([CSV file](/static/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model.csv)). A synthetic example dataset is also\navailable through the [Quickstart](/financial-services/anti-money-laundering/docs/train-models-to-detect-money-laundering).\n\nThe following prerequisites are not covered on this page:\n\n- Setup to use AML AI with an AML dataset (see [Set up a project and permissions](/financial-services/anti-money-laundering/docs/set-up-project-permissions))\n- Security and compliance features (see pages under [Security and compliance features](/financial-services/anti-money-laundering/docs/concepts/security-and-compliance-features))\n\nOverview of data requirements\n-----------------------------\n\nThe AML data model accepts information on retail or commercial parties, their\naccounts and transactions, and detailed information on risk cases related to\nthese parties. 
This section introduces important aspects of the data model that\nare valid across the different entities.\n\nThe AML data model schema is arranged into three areas: core banking data, risk\ninvestigation data, and supplementary data.\n\n### Core banking data\n\n- **Tables** : [Party](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#party), [AccountPartyLink](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#accountpartylink), [Transaction](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#transaction)\n- **Purpose**: Serves as a structured collection of data on your customers and their banking activity, used in detection of risk. All parties, accounts and transactions to be monitored should be included. Provide either retail or commercial data in an AML AI dataset\n\n### Risk investigation data\n\n- **Table** : [RiskCaseEvent](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#riskcaseevent)\n- **Purpose** :\n - Serves as a structured collection of data on risk investigation processes and parties previously identified as risky\n - Assists in the creation of training labels for AML risk models\n\n### Supplementary data\n\n- **Table** : [PartySupplementaryData](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#partysupplementarydata)\n- **Purpose**: Optional table that can contain additional information relevant to identifying money laundering risk that is not covered in the rest of the schema. You should start using AML AI without providing any supplementary data.\n\n### Table relationships\n\nThe following diagram describes the table relationships, primary keys, and\nforeign keys.\n| **Note:** Since AML AI uses data over time, primary keys may include `validity_start_time` to allow accurate representation of data over time. For example, AML AI can capture when a customer was added to or removed from a joint account. For more details, see [understanding how data changes over time](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#data-changes-over-time).\n\nErrors\n------\n\nWhen you create a dataset, AML AI automatically performs data\nvalidation checks. For information about these checks, the error messages and\nhow to fix them, see\n[Data validation errors](/financial-services/anti-money-laundering/docs/reference/data-validation-errors).\n\nFor more information about the technical schema, see\n[AML input data model](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model)\n([CSV file](/static/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model.csv)).\nTo understand the data duration requirements and scope, see\n[Understand data scope and duration](/financial-services/anti-money-laundering/docs/understand-data-scope-duration).\nWhen you have tables ready in BigQuery, you can use\nAML AI to\n[create and manage a dataset](/financial-services/anti-money-laundering/docs/create-and-manage-datasets).\n| **Note:** Most model governance policies define a requirement to track data lineage used across all ML operations from engine configuration, training, and evaluation. To ensure data remains unchanged, we recommend that you create a snapshot of your BigQuery tables. 
In the case that you don't snapshot data, AML AI operations read the BigQuery tables each time an operation uses the dataset, so changes to the dataset could impact tuning, training, backtesting, and predictions."]]
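One way to take such a snapshot, sketched here with placeholder project, dataset, and table names, is BigQuery's `CREATE SNAPSHOT TABLE` DDL driven from the Python client:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Snapshot one source table so the data AML AI reads stays fixed.
# The expiration option is optional; omit it to keep the snapshot indefinitely.
ddl = """
CREATE SNAPSHOT TABLE `my-project.my_aml_dataset.party_20250829`
CLONE `my-project.my_aml_dataset.party`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
)
"""

client.query(ddl).result()  # runs the DDL and waits for completion
print("Snapshot created.")
```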