# Process overview

Last updated (UTC): 2025-08-17.

Overview
--------

This page provides an overview of the AML AI processes and covers key concepts
for customers to understand. It is aimed primarily at teams who will use
AML AI to train, test, and deploy models.

AML AI enables banks to automatically train, test, and deploy
models for detecting money laundering.
The AML AI guides are
split into five left-navigation sections that correspond to the following five
steps:

1. Set up AML AI
2. Prepare data
3. Generate and evaluate models
4. Generate risk scores and explainability
5. Prepare for model and risk governance

The core AML AI technical operations to create, test, and deploy
models are as follows. They support steps 2-4 in the preceding list.

1. [Create AML AI Dataset](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.datasets) - creates a structured set of BigQuery input data tables for AML AI
2. [Engine Configuration](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.engineConfigs) - tunes an AML AI engine to an AML AI dataset, including hyperparameter tuning
3. [Model Training](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.models) - trains an AML AI model using an Engine Configuration and a dataset
4. [Backtest](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.backtestResults) - tests an AML AI model against historical data in a dataset and summarizes its performance
5. [Register Parties](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances/importRegisteredParties) - registers parties (bank customers who hold banking products and send or receive transactions) so they can be scored in prediction
6. [Prediction](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.predictionResults) - produces party scores and explainability for use in production

Engine Configuration, Model Training, Model Backtest, and Model Prediction all
require an AML AI dataset as input and return corresponding artifacts
that are used in other operations.

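The operations above depend on each other's artifacts, and together they form the dependency tree for AML AI processes. The following minimal sketch reads the dependency edges directly from the operation descriptions above. It is not an official AML AI specification, and operation names such as `train_model` are hypothetical labels rather than API methods. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) for two tasks: deriving a valid run order, and listing every operation upstream of a given one, which is also the set you would trace for lineage.

```python
from graphlib import TopologicalSorter

# Hypothetical operation names. Each maps to the operations whose artifacts it
# consumes, as read from the operation descriptions; not an official spec.
DEPENDENCIES: dict[str, set[str]] = {
    "create_dataset": set(),
    "engine_config": {"create_dataset"},
    "train_model": {"create_dataset", "engine_config"},
    "backtest": {"create_dataset", "train_model"},
    "register_parties": set(),
    "predict": {"create_dataset", "train_model", "register_parties"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one order in which every operation runs after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


def upstream(op: str, deps: dict[str, set[str]]) -> set[str]:
    """Return all operations that `op` depends on, directly or transitively."""
    seen: set[str] = set()
    stack = list(deps[op])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
    # Operations upstream of prediction in this sketch.
    print(sorted(upstream("predict", DEPENDENCIES)))
```

In this sketch `register_parties` has no predecessors, so it could run alongside engine configuration and training. After calling `prepare()`, `TopologicalSorter.get_ready()` would expose independent operations like this one for parallel scheduling.
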

For example, [Model Training](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.models/create)
returns a reference to a [trained AML AI model](/financial-services/anti-money-laundering/docs/reference/rest/v1/projects.locations.instances.models), which can be used
for backtest or prediction. For technical details of the operations, see the
[REST Reference Overview](/financial-services/anti-money-laundering/docs/reference/rest).

### Dependency tree for AML AI processes

Important considerations when using AML AI
------------------------------------------

This section introduces the key concepts of AML AI and recommends some best
practices. Each topic is covered in more detail in a dedicated guide, linked
for further reading.

### Date consistency

AML AI uses different time periods for different operations.
Choose the dates for each operation carefully to ensure reliable results.
In particular, to avoid biased results, make sure that the months used for
training an AML AI model don't overlap with the months used for backtesting.

Because an AML AI dataset contains many months of data, a single dataset
can be used for multiple operations, provided the dates for each operation
are chosen correctly. The following diagram illustrates a development cycle
using AML AI, where different time periods within a single dataset spanning
42 months are used for engine configuration (hyperparameter tuning), training,
and backtesting. All of these processes use lookback windows, which provide
context to the model and can safely overlap with data used for other operations.

| **Note:** More recent Engine Versions may require fewer months of data.
Check the documentation of the specific [Engine Version](/financial-services/anti-money-laundering/docs/reference/engine-versions) you will use.

For more information on AML AI datasets and time windows for different
operations, see [Understand data scope and duration](/financial-services/anti-money-laundering/docs/understand-data-scope-duration).

To ensure you record changes to your data over time correctly, see [Data changes over time](/financial-services/anti-money-laundering/docs/reference/schemas/aml-input-data-model#data-changes-over-time).

### Production batch frequency

In prediction, AML AI produces AML risk scores on a calendar-month basis.
Customers commonly use AML AI as part of a monthly batch process. Where
possible, run predictions only on months with complete transaction data.

| **Note:** Running months with some transactions missing may affect the quality of party risk scores.

### Field consistency

As with any machine learning process, data should be as consistent as possible
between training data and test data. Inconsistently populated fields may cause
unreliable results. We strongly recommend taking steps to ensure that fields
are populated consistently for each operation in a development cycle,
especially if a different dataset is used for each operation. For more
information, see [dataset consistency](/financial-services/anti-money-laundering/docs/overview-model-preparation#dataset-consistency).

| **Note:** The skew metric in Prediction Results and Backtest Results resources can be used to monitor for changes in the data between model training and prediction.

### Engine configs

Once an engine config has been created, it's not normally necessary to re-create
it for every new dataset or in every development cycle.
The hyperparameters
chosen in an engine config for one dataset generally perform well on similar
datasets.

| **Note:** If the datasets change substantially, creating a new engine config may improve performance. This is particularly true when deploying to a new market, or when the Risk Case Event data or Party Supplementary Data are significantly updated.

The following diagram illustrates iterative development cycles, in contrast to
the preceding diagram, which uses a single dataset for both Model Training and
Backtest operations.

For more information, see [when to tune or inherit](/financial-services/anti-money-laundering/docs/configure-engine#tune-or-inherit).

### Data lineage

Most model governance policies require tracking the lineage of the data used
across all ML operations: engine configuration, training, evaluation, and
prediction. Customers are responsible for tracking this data lineage.

We recommend using a unique identifier in the names of all input data,
AML AI resources, and output data to track lineage across stages.
This helps ensure strong links between the resources in a particular run.
Customers can also label all AML AI resources to meet lineage
requirements.

Additionally, we recommend using BigQuery snapshots in API requests to
ensure accurate data lineage.

These practices help answer questions like "where did this engine
configuration come from?" and "where did this model come from?", and they
help teams investigate and resolve incidents.

For details of how to create and manage AML AI resources, see the
[REST API](/financial-services/anti-money-laundering/docs/reference/rest) pages.

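One way to combine the naming, labeling, and snapshot recommendations above is to derive every name in a run from a single identifier. The following is a minimal Python sketch of a hypothetical naming scheme. `LineagePlan`, the label keys `lineage_run` and `stage`, the `{kind}-{run_id}` resource ID format, and the table names in the usage lines are all illustrative assumptions, not AML AI API requirements. The run ID is restricted to lowercase letters and digits, on the assumption that this keeps it valid in BigQuery table names, AML AI resource IDs, and label values. Check the exact limits for your resources. The generated DDL uses BigQuery's `CREATE SNAPSHOT TABLE ... CLONE` statement.

```python
import re
from dataclasses import dataclass

# Assumption: lowercase letters and digits only, starting with a letter, so the
# same ID is usable in BigQuery table names, resource IDs, and label values.
_RUN_ID = re.compile(r"[a-z][a-z0-9]{0,30}")


@dataclass(frozen=True)
class LineagePlan:
    """Hypothetical scheme that derives every name in one run from a single run ID."""

    project: str
    bq_dataset: str
    run_id: str

    def __post_init__(self) -> None:
        if not _RUN_ID.fullmatch(self.run_id):
            raise ValueError(f"invalid run_id: {self.run_id!r}")

    def snapshot_table(self, table: str) -> str:
        """Fully qualified name of this run's BigQuery snapshot of an input table."""
        return f"{self.project}.{self.bq_dataset}.{table}_{self.run_id}"

    def snapshot_ddl(self, table: str) -> str:
        """BigQuery DDL that snapshots the current state of an input table."""
        source = f"{self.project}.{self.bq_dataset}.{table}"
        return (
            f"CREATE SNAPSHOT TABLE `{self.snapshot_table(table)}` "
            f"CLONE `{source}`"
        )

    def resource_id(self, kind: str) -> str:
        """Hypothetical ID for an AML AI resource (engine config, model, ...) in this run."""
        return f"{kind}-{self.run_id}"

    def labels(self, stage: str) -> dict[str, str]:
        """Hypothetical labels so every resource can be filtered by run and stage."""
        return {"lineage_run": self.run_id, "stage": stage}


if __name__ == "__main__":
    plan = LineagePlan(project="my-project", bq_dataset="aml_input", run_id="run42")
    for table in ("party", "transaction"):  # illustrative input table names
        print(plan.snapshot_ddl(table))
    print(plan.resource_id("model"), plan.labels("training"))
```

With this scheme, the snapshot tables (rather than the live tables) are referenced in API requests, and every resource carries the same run label. Each engine config, model, and result can then be traced back to the exact inputs of its run.
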