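Check data quality for media recommendations

This page describes how to determine whether the metrics for your media data meet their required thresholds.

About media data quality checks

Because recent user events are important for media recommendations, you must regularly check the quality of your ingested data and user events. To do so, open the Optimization tab of your media recommendations app and identify improvements to your data that can lead to better recommendations.

If a metric doesn't meet its threshold, the metric is in the Warning state. Review the metric and its description to determine what steps to take to improve the quality of your media data.

All models and objectives must pass the general quality metric thresholds. Some models and objectives have additional app-specific quality metrics and thresholds. The general quality metrics are the same for all apps that use the same data store, but the app-specific quality metrics vary with the app's model and objective.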
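As a rough illustration of the rule described above, the following Python sketch treats a metric as being in the Warning state when its value is not below its threshold, and treats the summary status as Warning when any metric is. The function names and the "OK" label are hypothetical, and the strict less-than comparison is an assumption (each requirement's own expression defines the actual comparison). The sample numbers are taken from the REST example on this page.

```python
def metric_state(value, threshold):
    """Return "WARNING" when the metric does not stay below its threshold."""
    # Assumption: the requirement has the form "value < threshold".
    # Some app-specific requirements use "<=" instead.
    return "OK" if value < threshold else "WARNING"


def summary_state(metrics):
    """Return "WARNING" if any (value, threshold) pair is in the Warning state."""
    states = [metric_state(value, limit) for value, limit in metrics.values()]
    return "WARNING" if "WARNING" in states else "OK"


# (value, threshold) pairs, both expressed as percentages.
general = {
    "doc_with_same_title_percentage": (30.47, 1.0),
    "most_common_visitor_id_percentage": (0.8, 5.0),
}
print(summary_state(general))
```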

For information about recommendations models and objectives, see About media app recommendation types.

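Check data quality

Console

To check the quality of your media recommendations data, follow these steps:

In the Google Cloud console, go to the AI Applications page.

AI Applications
Click the name of the media recommendations app whose data quality you want to check.
In the navigation menu, click Data quality, and then click the Optimization tab. This page shows the status of the metrics for the data associated with your app.
Review the General quality and App-specific quality statuses at the top of the page. If one or more metrics exceed their thresholds, the summary status at the top of the page shows Warning.

The two metrics tables (General quality and App-specific quality) list the individual metrics.
In a metrics table, click View details for any metric in the Warning state to learn more about it.
Optional: To see the threshold for a metric that meets its requirement, click View details. The metrics tables don't display thresholds for metrics that meet their requirements.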

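REST

Use the requirements:checkRequirement method to check the quality of your media recommendations data, as follows.

To check quality from the command line, follow these steps:

Find your data store ID. If you already have your data store ID, skip to the next step.
1. In the Google Cloud console, go to the AI Applications page, and then click Data Stores in the navigation menu.
  
  Go to the Data Stores page
2. Click the name of your data store.
3. On the Data page for your data store, get the data store ID.

Run the following curl command to find out whether your media recommendations data meets the thresholds for the general metrics: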

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-GFE-SSL: yes" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/requirements:checkRequirement" \
-d '{
      "location": "projects/PROJECT_ID/locations/global",
      "requirementType": "discoveryengine.googleapis.com/media_recs/general/all/warning",
      "resources": [
        {
          "labels": {
            "branch_id": "0",
            "collection_id": "default_collection",
            "datastore_id": "DATA_STORE_ID",
            "location_id": "global",
            "project_number": "PROJECT_ID"
          },
          "type": "discoveryengine.googleapis.com/Branch"
        },
        {
          "labels": {
            "collection_id": "default_collection",
            "datastore_id": "DATA_STORE_ID",
            "location_id": "global",
            "project_number": "PROJECT_ID"
          },
          "type": "discoveryengine.googleapis.com/DataStore"
        }
      ]
    }'

Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store.
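For readers who prefer to script the call, the following Python sketch assembles the same request as the curl command above without sending it. The helper name `build_check_request` and its parameters are hypothetical. The URL, headers, and body fields are copied from the curl command, and the access token is assumed to come from `gcloud auth print-access-token`.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_check_request(project_id, data_store_id, access_token,
                        requirement="general/all"):
    """Build (but do not send) the checkRequirement POST request."""
    location = f"projects/{project_id}/locations/global"
    # Labels shared by the Branch and DataStore resources.
    base_labels = {
        "collection_id": "default_collection",
        "datastore_id": data_store_id,
        "location_id": "global",
        "project_number": project_id,
    }
    body = {
        "location": location,
        "requirementType":
            f"discoveryengine.googleapis.com/media_recs/{requirement}/warning",
        "resources": [
            {"labels": {"branch_id": "0", **base_labels},
             "type": "discoveryengine.googleapis.com/Branch"},
            {"labels": base_labels,
             "type": "discoveryengine.googleapis.com/DataStore"},
        ],
    }
    return urllib.request.Request(
        f"{API_ROOT}/{location}/requirements:checkRequirement",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-GFE-SSL": "yes",
            "X-Goog-User-Project": project_id,
        },
        method="POST",
    )


# "TOKEN" is a stand-in for a real access token.
req = build_check_request("my-project-123", "my-data-store", "TOKEN")
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. Building it separately keeps the request shape easy to inspect before any call is made.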

Example command and result

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-GFE-SSL: yes" \
-H "X-Goog-User-Project: my-project-123" \
"https://discoveryengine.googleapis.com/v1alpha/projects/my-project-123/locations/global/requirements:checkRequirement" \
-d '{
  "location": "projects/123456/locations/global",
  "requirementType": "discoveryengine.googleapis.com/media_recs/general/all/warning",
  "resources": [
    {
      "labels": {
        "branch_id": "0",
        "collection_id": "default_collection",
        "datastore_id": "my-data-store",
        "location_id": "global",
        "project_number": "123456"
      },
      "type": "discoveryengine.googleapis.com/Branch"
    },
    {
      "labels": {
        "collection_id": "default_collection",
        "datastore_id": "my-data-store",
        "location_id": "global",
        "project_number": "123456"
      },
      "type": "discoveryengine.googleapis.com/DataStore"
    }
  ]
}'
    
{
"requirement": {
"type": "discoveryengine.googleapis.com/media_recs/general/all/warning",
"displayName": "Warning level requirements for all models and all business objectives.",
"description": "Requirements for the media recommendations model that will result in performance issue if not met for all media recommendations models and all business objectives.",
"condition": {
  "expression": "doc_with_same_title_percentage \u003c doc_with_same_title_percentage_threshold && most_common_visitor_id_percentage \u003c most_common_visitor_id_percentage_threshold && short_term_unjoined_events_percentage \u003c short_term_unjoined_events_percentage_threshold && long_term_unjoined_events_percentage \u003c long_term_unjoined_events_percentage_threshold"
},
"metricBindings": [
  {
    "variableId": "doc_with_same_title_percentage",
    "resourceType": "discoveryengine.googleapis.com/Branch",
    "metricFilter": "metric.type = 'discoveryengine.googleapis.com/branch/documents/items_with_same_title' AND metric.labels.is_percentage = 'True' AND resource.labels.project_number = '123456' AND resource.labels.branch_id = '0' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
    "description": "The percentage of the documents with the same title in a branch.",
    "category": "Document"
  },
  {
    "variableId": "most_common_visitor_id_percentage",
    "resourceType": "discoveryengine.googleapis.com/DataStore",
    "metricFilter": "metric.type = 'discoveryengine.googleapis.com/branch/datastore/user_events/most_used_visitor_id_events' AND metric.labels.is_percentage = 'True' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
    "description": "The percentage of the events with the same visitor id.",
    "category": "DataStore"
  },
  {
    "variableId": "short_term_unjoined_events_percentage",
    "resourceType": "discoveryengine.googleapis.com/DataStore",
    "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids' AND metric.labels.is_percentage = 'True' AND metric.conditions.time_range = 'WEEK' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
    "description": "The percentage of events refers to a document id that is not in the catalog in the last 7 days.",
    "category": "DataStore"
  },
  {
    "variableId": "long_term_unjoined_events_percentage",
    "resourceType": "discoveryengine.googleapis.com/DataStore",
    "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids' AND metric.labels.is_percentage = 'True' AND metric.conditions.time_range = 'NINETY_DAYS' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
    "description": "The percentage of events refers to a document id that is not in the catalog in the last 90 days.",
    "category": "DataStore"
  }
],
"thresholdBindings": [
  {
    "variableId": "doc_with_same_title_percentage_threshold",
    "thresholdValues": [
      {
        "severity": "WARNING",
        "value": 1.0
      }
    ],
    "description": "The threshold for the percentage of the documents with the same title in a branch."
  },
  {
    "variableId": "most_common_visitor_id_percentage_threshold",
    "thresholdValues": [
      {
        "severity": "WARNING",
        "value": 5.0
      }
    ],
    "description": "The threshold for the percentage of the events with the same visitor id."
  },
  {
    "variableId": "short_term_unjoined_events_percentage_threshold",
    "thresholdValues": [
      {
        "severity": "WARNING",
        "value": 5.0
      }
    ],
    "description": "The threshold for the percentage of the events refers to a document id that is not in the catalog in the last 7 days."
  },
  {
    "variableId": "long_term_unjoined_events_percentage_threshold",
    "thresholdValues": [
      {
        "severity": "WARNING",
        "value": 2.0
      }
    ],
    "description": "The threshold for the percentage of the events refers to a document id that is not in the catalog in the last 90 days"
  }
]
},
"result": "WARNING",
"requirementCondition": {
"expression": "doc_with_same_title_percentage \u003c doc_with_same_title_percentage_threshold && most_common_visitor_id_percentage \u003c most_common_visitor_id_percentage_threshold && short_term_unjoined_events_percentage \u003c short_term_unjoined_events_percentage_threshold && long_term_unjoined_events_percentage \u003c long_term_unjoined_events_percentage_threshold"
},
"metricResults": [
{
  "name": "short_term_unjoined_events_percentage",
  "value": {
    "doubleValue": 0
  },
  "timestamp": "2024-06-06T03:03:13.416900898Z",
  "unit": "%",
  "metricType": "discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids"
},
{
  "name": "long_term_unjoined_events_percentage",
  "value": {
    "doubleValue": 0
  },
  "timestamp": "2024-06-06T03:03:13.417962744Z",
  "unit": "%",
  "metricType": "discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids"
},
{
  "name": "most_common_visitor_id_percentage",
  "value": {
    "doubleValue": 0.8
  },
  "timestamp": "2024-06-06T03:03:16.090037135Z",
  "unit": "%",
  "metricType": "discoveryengine.googleapis.com/datastore/user_events/most_used_visitor_id_events"
},
{
  "name": "doc_with_same_title_percentage",
  "value": {
    "doubleValue": 30.47
  },
  "timestamp": "2024-06-06T03:03:17.599458357Z",
  "unit": "%",
  "metricType": "discoveryengine.googleapis.com/documents/items_with_same_title"
}
],
"oldestMetricTimestamp": "2024-06-06T03:03:13.416900898Z"
}

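Review the output:
1. Look for the value of result:
  - If the value is SUCCESS, your data meets the general requirements. Continue to the step that checks the requirements for your app's model and objective.
  - If the value is WARNING, continue to the next step and examine the expression.
  - If you don't see result in the output, there are several possible causes:
    - The PROJECT_ID or DATA_STORE_ID in the request is incorrect.
    - Some metric values aren't available. Try again in 6 hours, or contact your customer engineer for help.
2. Look at the expression (requirement.condition.expression). If this expression evaluates to false, there is a problem with your data.
  
  Note: In the expression, the less-than sign appears as the Unicode escape \u003c instead of <.
  
  The metric values are in the metricResults.value field, and the warning thresholds are in the requirement.thresholdBindings.thresholdValues field. The description fields can help you understand what each metric measures.
  
  For example, the value of doc_with_same_title_percentage is 30.47, and the warning threshold doc_with_same_title_percentage_threshold is 1. The data has a problem that needs investigating: many of the titles in the data store are the same.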
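The review steps above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names are hypothetical, the sample is a trimmed copy of the example response, and pairing each metric with a threshold variable named `<metric name>_threshold` is an assumption based on the names in that response.

```python
import json

# Trimmed copy of the example response (two of the four metrics).
SAMPLE = json.loads("""
{
  "requirement": {
    "thresholdBindings": [
      {"variableId": "doc_with_same_title_percentage_threshold",
       "thresholdValues": [{"severity": "WARNING", "value": 1.0}]},
      {"variableId": "most_common_visitor_id_percentage_threshold",
       "thresholdValues": [{"severity": "WARNING", "value": 5.0}]}
    ]
  },
  "result": "WARNING",
  "metricResults": [
    {"name": "most_common_visitor_id_percentage", "value": {"doubleValue": 0.8}},
    {"name": "doc_with_same_title_percentage", "value": {"doubleValue": 30.47}}
  ]
}
""")


def warning_thresholds(response):
    """Map each threshold variableId to its WARNING value."""
    found = {}
    for binding in response.get("requirement", {}).get("thresholdBindings", []):
        # Accept either spelling of the field, as a list or a single object.
        values = binding.get("thresholdValues") or binding.get("threshold_values") or []
        if isinstance(values, dict):
            values = [values]
        for entry in values:
            if entry.get("severity") == "WARNING":
                found[binding["variableId"]] = entry["value"]
    return found


def failing_metrics(response):
    """Return (name, value, threshold) for metrics at or above their threshold."""
    thresholds = warning_thresholds(response)
    failing = []
    for metric in response.get("metricResults", []):
        # Assumption: the threshold variable is "<metric name>_threshold".
        limit = thresholds.get(metric["name"] + "_threshold")
        value = metric["value"].get("doubleValue")
        if limit is not None and value is not None and value >= limit:
            failing.append((metric["name"], value, limit))
    return failing


if SAMPLE.get("result") is None:
    print("No result: check the IDs in the request, or retry later.")
else:
    print(SAMPLE["result"], failing_metrics(SAMPLE))
```

The `>=` test mirrors the `<` comparisons in the general quality expression. A requirement whose expression uses `<=` would need `>` here instead.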

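If the model and objective combination used by your recommendations app appears in the following table, you also need to call the checkRequirement method again, using the value for your model and objective:

Model	Objective	`MODEL_OBJ`
Others you may like	Conversion rate	`oyml/cvr`
Recommended for you	Conversion rate	`rfy/cvr`
More like this	Conversion rate	`mlt/cvr`
Most popular	Conversion rate	`mp/cvr`
Others you may like	Watch duration per session	`oyml/wdps`
Recommended for you	Watch duration per session	`rfy/wdps`
More like this	Watch duration per session	`mlt/wdps`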

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-GFE-SSL: yes" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/requirements:checkRequirement" \
-d '{
      "location": "projects/PROJECT_ID/locations/global",
      "requirementType": "discoveryengine.googleapis.com/media_recs/MODEL_OBJ/warning",
      "resources": [
        {
          "labels": {
            "branch_id": "0",
            "collection_id": "default_collection",
            "datastore_id": "DATA_STORE_ID",
            "location_id": "global",
            "project_number": "PROJECT_ID"
          },
          "type": "discoveryengine.googleapis.com/Branch"
        },
        {
          "labels": {
            "collection_id": "default_collection",
            "datastore_id": "DATA_STORE_ID",
            "location_id": "global",
            "project_number": "PROJECT_ID"
          },
          "type": "discoveryengine.googleapis.com/DataStore"
        }
      ]
    }'

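Replace the following:

PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of the Vertex AI Search data store.
MODEL_OBJ: the value for your recommendations app's model and objective, taken from the preceding table.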
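To make the choice of `MODEL_OBJ` concrete, the following Python sketch lists the requirement types to check for a given app. The lookup mirrors the table above, with English model and objective names as keys. The function name `requirement_types` is hypothetical, and the sketch assumes that combinations missing from the table need only the general check.

```python
PREFIX = "discoveryengine.googleapis.com/media_recs"

# (model, objective) -> MODEL_OBJ value, mirroring the table.
MODEL_OBJ = {
    ("Others you may like", "Conversion rate"): "oyml/cvr",
    ("Recommended for you", "Conversion rate"): "rfy/cvr",
    ("More like this", "Conversion rate"): "mlt/cvr",
    ("Most popular", "Conversion rate"): "mp/cvr",
    ("Others you may like", "Watch duration per session"): "oyml/wdps",
    ("Recommended for you", "Watch duration per session"): "rfy/wdps",
    ("More like this", "Watch duration per session"): "mlt/wdps",
}


def requirement_types(model, objective):
    """Return the requirementType values to check for an app."""
    # Every app needs the general check.
    types = [f"{PREFIX}/general/all/warning"]
    model_obj = MODEL_OBJ.get((model, objective))
    if model_obj is not None:
        # Combinations in the table also need the app-specific check.
        types.append(f"{PREFIX}/{model_obj}/warning")
    return types


for requirement_type in requirement_types("More like this",
                                          "Watch duration per session"):
    print(requirement_type)
```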

Example command and result

This example is for the More like this model and the watch duration per session objective:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-GFE-SSL: yes" \
-H "X-Goog-User-Project: my-project-123" \
"https://discoveryengine.googleapis.com/v1alpha/projects/my-project-123/locations/global/requirements:checkRequirement" \
-d '{
  "location": "projects/my-project-123/locations/global",
  "requirementType": "discoveryengine.googleapis.com/media_recs/mlt/wdps/warning",
  "resources": [
    {
      "labels": {
        "branch_id": "0",
        "collection_id": "default_collection",
        "datastore_id": "my-data-store",
        "location_id": "global",
        "project_number": "my-project-123"
      },
      "type": "discoveryengine.googleapis.com/Branch"
    },
    {
      "labels": {
        "collection_id": "default_collection",
        "datastore_id": "my-data-store",
        "location_id": "global",
        "project_number": "my-project-123"
      },
      "type": "discoveryengine.googleapis.com/DataStore"
    }
  ]
}'
    
{
"requirement": {
"type": "discoveryengine.googleapis.com/media_recs/mlt/wdps/warning",
"displayName": "Warning level requirements for 'More Like This' models and 'Watch duration per session' business objectives.",
"description": "Requirements for the media recommendations model that will result in performance issue if not met for the 'More Like This' model and the 'Watch duration per session' business objective.",
"condition": {
  "expression": "invalid_sequence_percentage \u003c= invalid_sequence_percentage_threshold"
},
"metricBindings": [
  {
    "variableId": "invalid_sequence_percentage",
    "resourceType": "discoveryengine.googleapis.com/DataStore",
    "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/invalid_sequences_media_play_media_complete' AND metric.labels.is_percentage = 'True'  AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection' AND resource.labels.project_number = '123456' AND resource.labels.datastore_id = 'my-data-store'",
    "description": "The percentage of invalid sequences for media play and media complete events sampled by randomly selected visitor ids.",
    "category": "DataStore"
  }
],
"thresholdBindings": [
  {
    "variableId": "invalid_sequence_percentage_threshold",
    "thresholdValues": [
      {
        "severity": "WARNING",
        "value": 50
      }
    ],
    "description": "The threshold for the percentage of invalid sequences sampled among all media play and media complete events."
  }
]
},
"result": "SUCCESS",
"requirementCondition": {
"expression": "invalid_sequence_percentage \u003c= invalid_sequence_percentage_threshold"
},
"metricResults": [
{
  "name": "invalid_sequence_percentage",
  "value": {
    "doubleValue": 0
  },
  "timestamp": "2024-06-06T02:32:00.460056386Z",
  "unit": "%",
  "metricType": "discoveryengine.googleapis.com/datastore/user_events/invalid_sequences_media_play_media_complete"
}
],
"oldestMetricTimestamp": "2024-06-06T02:32:00.460056386Z"
}

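Review the output:
1. Look for the value of result:
  - If the value is SUCCESS, your data is good enough.
  - If the value is WARNING, continue to the next step and examine the expression.
  - If you don't see result in the output, there are several possible causes:
    - The PROJECT_ID or DATA_STORE_ID in the request is incorrect.
    - Some metric values aren't available. Try again in 6 hours, or contact your customer engineer for help.
2. Look at the expression (requirement.condition.expression). If this expression evaluates to false, there is a problem with your data.
  
  Note: In the expression, the less-than sign appears as the Unicode escape \u003c instead of <.
  
  The metric values are in the metricResults.value field, and the warning thresholds are in the requirement.thresholdBindings.thresholdValues field. The description fields can help you understand what each metric measures.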
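The note about `\u003c` matters mostly when reading the raw output by eye: a JSON parser decodes the escape back to `<`. The following Python sketch shows the decoding and then evaluates the expression against a metric value and its threshold. The `evaluate` helper is hypothetical and handles only the forms that appear in the examples on this page (comparisons with `<` or `<=` joined by `&&`). The service's full expression syntax is not covered here.

```python
import json
import re

# Raw JSON text as returned by the API, with "<" escaped as \u003c.
raw = r'{"expression": "invalid_sequence_percentage \u003c= invalid_sequence_percentage_threshold"}'
expression = json.loads(raw)["expression"]
print(expression)  # invalid_sequence_percentage <= invalid_sequence_percentage_threshold


def evaluate(expression, variables):
    """Evaluate a conjunction of "a < b" or "a <= b" comparisons."""
    for clause in expression.split("&&"):
        match = re.fullmatch(r"\s*(\w+)\s*(<=|<)\s*(\w+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        left, operator, right = match.groups()
        a, b = variables[left], variables[right]
        if not (a <= b if operator == "<=" else a < b):
            return False
    return True


# Values from the example response: metric 0, warning threshold 50.
values = {
    "invalid_sequence_percentage": 0,
    "invalid_sequence_percentage_threshold": 50,
}
print(evaluate(expression, values))  # True
```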
