本页介绍了创建和管理 AML AI 数据集的步骤。数据集用作引擎配置、训练、回测和预测流水线的输入。AML AI 数据集包含对 Google Cloud 项目中与 AML AI 输入数据模型匹配的 BigQuery 表的引用。
前提条件
-
如需获得创建和管理数据集所需的权限,请让您的管理员为您授予项目的 Financial Services Admin (
financialservices.admin
) IAM 角色。 如需详细了解如何授予角色,请参阅管理对项目、文件夹和组织的访问权限。 - 创建实例
-
某些 API 方法会返回长时间运行的操作 (LRO)。这些方法是异步的,并会返回一个 Operation 对象;如需了解详情,请参阅 REST 参考文档。当方法返回响应时,操作可能尚未完成。对于这些方法,请发送请求,然后检查结果。通常,所有 POST、PUT、UPDATE 和 DELETE 操作都是长时间运行的操作。
创建数据集
如需创建数据集,请发送创建请求,然后检查 LRO 的结果。
发送请求
如需创建数据集,请使用 projects.locations.instances.datasets.create
方法。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用其中一个受支持的区域显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
INSTANCE_ID
:实例的用户定义标识符DATASET_ID
:AML AI 数据集的用户定义标识符;仅使用小写字母、数字、短划线和下划线(例如train_jan2018_apr2020
)BQ_INPUT_DATASET_NAME
:BigQuery 输入数据集名称PARTY_TABLE
:BigQuery 输入数据集中的 Party 表ACCOUNT_PARTY_LINK_TABLE
:BigQuery 输入数据集中的 AccountPartyLink 表TRANSACTION_TABLE
:BigQuery 输入数据集中的 Transaction 表RISK_CASE_EVENT_TABLE
:BigQuery 输入数据集中的 RiskCaseEvent 表PARTY_SUPPLEMENTARY_DATA
:BigQuery 输入数据集中的 PartySupplementaryData 表;此表是可选的,可以从请求 JSON 中移除DATA_START_DATE
:要在数据集中使用的日期的开始日期和时间;使用 RFC3339 UTC“Zulu”格式(例如2014-10-02T15:01:23Z
)DATA_END_DATE
:要在数据集中使用的日期的结束日期和时间;使用 RFC3339 UTC“Zulu”格式(例如2014-10-02T15:01:23Z
)
请求 JSON 正文:
{ "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
如需发送请求,请选择以下方式之一:
curl
将请求正文保存在名为 request.json
的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:
cat > request.json << 'EOF' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } EOF
然后,执行以下命令以发送 REST 请求:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"
PowerShell
将请求正文保存在名为 request.json
的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:
@' { "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "dateRange": { "startTime": "DATA_START_DATE", "endTime": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } '@ | Out-File -FilePath request.json -Encoding utf8
然后,执行以下命令以发送 REST 请求:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
复制返回的 OPERATION_ID
以便在下一部分中使用。
检查结果
使用 projects.locations.operations.get
方法检查是否已创建数据集。如果响应包含 "done": false
,请重复该命令,直到响应包含 "done": true
。这些操作可能需要几分钟到几小时才能完成。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用某个受支持的地区显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
OPERATION_ID
:操作的标识符
如需发送请求,请选择以下方式之一:
curl
执行以下命令:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
PowerShell
执行以下命令:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": "2023-03-14T15:52:55.358979323Z", "endTime": "2023-03-14T16:52:55.358979323Z", "target": "projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID", "verb": "create", "requestedCancellation": false, "apiVersion": "v1" }, "done": true, "response": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.Dataset", "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } }
获取数据集
如需获取数据集,请使用 projects.locations.instances.datasets.get
方法。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用某个受支持的地区显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
INSTANCE_ID
:实例的用户定义标识符DATASET_ID
:用户定义的数据集标识符
如需发送请求,请选择以下方式之一:
curl
执行以下命令:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
执行以下命令:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } }
更新数据集
如需更新数据集,请使用 projects.locations.instances.datasets.patch
方法。
您只能更新 AML AI 中的标签字段。 以下示例会更新与数据集关联的用户标签键值对。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用某个受支持的地区显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
INSTANCE_ID
:实例的用户定义标识符DATASET_ID
:用户定义的数据集标识符KEY
:键值对中的键,用于整理数据集。如需了解详情,请参阅labels
。VALUE
:键值对中的值,用于整理数据集。如需了解详情,请参阅labels
。
请求 JSON 正文:
{ "labels": { "KEY": "VALUE" } }
如需发送请求,请选择以下方式之一:
curl
将请求正文保存在名为 request.json
的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:
cat > request.json << 'EOF' { "labels": { "KEY": "VALUE" } } EOF
然后,执行以下命令以发送 REST 请求:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels"
PowerShell
将请求正文保存在名为 request.json
的文件中。在终端中运行以下命令,在当前目录中创建或覆盖此文件:
@' { "labels": { "KEY": "VALUE" } } '@ | Out-File -FilePath request.json -Encoding utf8
然后,执行以下命令以发送 REST 请求:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "update", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
如需详细了解如何获取长时间运行的操作 (LRO) 的结果,请参阅检查结果。
列出数据集
如需列出指定实例的数据集,请使用 projects.locations.instances.datasets.list
方法。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用某个受支持的地区显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
INSTANCE_ID
:实例的用户定义标识符
如需发送请求,请选择以下方式之一:
curl
执行以下命令:
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets"
PowerShell
执行以下命令:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "datasets": [ { "name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "createTime": CREATE_TIME, "updateTime": UPDATE_TIME, "tableSpecs": { "party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE", "account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE", "transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE", "risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE", "party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA" }, "state": "ACTIVE", "dateRange": { "start_time": "DATA_START_DATE", "end_time": "DATA_END_DATE" }, "timeZone": { "id": "UTC" } } ] }
删除数据集
如要删除数据集,请使用 projects.locations.instances.datasets.delete
方法。
在使用任何请求数据之前,请先进行以下替换:
PROJECT_ID
:IAM 设置中列出的 Google Cloud 项目 IDLOCATION
:实例的位置;请使用某个受支持的地区显示位置us-central1
us-east1
asia-south1
europe-west1
europe-west2
europe-west4
northamerica-northeast1
southamerica-east1
australia-southeast1
INSTANCE_ID
:实例的用户定义标识符DATASET_ID
:用户定义的数据集标识符
如需发送请求,请选择以下方式之一:
curl
执行以下命令:
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
执行以下命令:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
您应该收到类似以下内容的 JSON 响应:
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata", "createTime": CREATE_TIME, "target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID", "verb": "delete", "requestedCancellation": false, "apiVersion": "v1" }, "done": false }
如需详细了解如何获取长时间运行的操作 (LRO) 的结果,请参阅检查结果。