Filter media search

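If you have a media search app, you can use metadata to filter search queries. This page describes how to use metadata fields to restrict a search to a specific set of documents.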

Before you begin

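Make sure that you have created a media app and data store and have ingested your data. For more information, see Create a media data store and Create a media app.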

Sample documents

Refer to the following sample media documents as you read this page.

{"id":"172851","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: Creating the World of Pandora (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/172851\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"243308","schemaId":"default_schema","jsonData":"{\"title\":\"Capturing Avatar (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/243308\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"280218","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: The Way of Water (2022)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\"],\"uri\":\"http://mytestdomain.movie/content/280218\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"72998","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar (2009)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\",\"IMAX\"],\"uri\":\"http://mytestdomain.movie/content/72998\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}

Filter expression syntax

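Make sure that you understand the filter expression syntax that you'll use to define search filters. The syntax can be summarized in the following Extended Backus-Naur Form (EBNF):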

  # A single expression or multiple expressions that are joined by "AND" or "OR".
  filter = expression, { ( " AND " | " OR " ), expression };
  # Expressions can be prefixed with "-" or "NOT" to express a negation.
  expression = [ "-" | "NOT " ],
    # A parenthetical expression.
    ( "(", expression, ")"
    # A simple expression applying to a text field.
    # Function "ANY" returns true if the field contains any of the literals.
    | text_field, ":", "ANY", "(", literal, { ",", literal }, ")"
    # A simple expression applying to a numerical field. Function "IN" returns true
    # if a field value is within the range. By default, lower_bound is inclusive and
    # upper_bound is exclusive.
    | numerical_field, ":", "IN", "(", lower_bound, ",", upper_bound, ")"
    # A simple expression that applies to a numerical field and compares with a double value.
    | numerical_field, comparison, double
    # A simple expression that applies to a datetime field and compares with a datetime value.
    | datetime_field, comparison, literal_iso_8601_datetime_format );
  # A lower_bound is either a double or "*", which represents negative infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  lower_bound = ( double, [ "e" | "i" ] ) | "*";
  # An upper_bound is either a double or "*", which represents infinity.
  # Explicitly specify inclusive bound with the character 'i' or exclusive bound
  # with the character 'e'.
  upper_bound = ( double, [ "e" | "i" ] ) | "*";
  # Supported comparison operators.
  comparison = "<=" | "<" | ">=" | ">" | "=";
  # A literal is any double quoted string. You must escape backslash (\) and
  # quote (") characters.
  literal = double quoted string;
  text_field = text field - for example, categories;
  numerical_field = numerical field - for example, score;
  datetime_field = field of datetime data type - for example, available_time;
  literal_iso_8601_datetime_format = either a double quoted string representing an ISO 8601 datetime or a number representing microseconds since the Unix epoch;
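The following shell sketch shows how the grammar maps to concrete filter strings for the sample documents. The helper names quote_literal and any_of are hypothetical. The numerical field score is also hypothetical, because none of the sample documents has one. The backslash escaping in quote_literal is an assumption about how the required escaping is written. The script only prints strings and does not validate them against the service.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (quote_literal, any_of) that assemble filter strings
# following the grammar above. They only build text; nothing is sent to the API.

# Escape backslashes and double quotes (assumed backslash escaping), then wrap
# the value in double quotes, as the grammar requires for a literal.
quote_literal() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '"%s"' "$escaped"
}

# Build FIELD: ANY("v1", "v2", ...), which is true when the text field
# contains any of the listed literals.
any_of() {
  local field=$1 joined="" value
  shift
  for value in "$@"; do
    if [[ -n $joined ]]; then
      joined+=", "
    fi
    joined+=$(quote_literal "$value")
  done
  printf '%s: ANY(%s)' "$field" "$joined"
}

# Documentary titles only.
printf '%s\n' "$(any_of categories Documentary)"
# Action or Adventure titles, excluding anything tagged IMAX.
printf '%s AND NOT %s\n' "$(any_of categories Action Adventure)" "$(any_of categories IMAX)"
# Titles available on or after 2023-01-01, using an ISO 8601 datetime literal.
printf 'available_time >= %s\n' "$(quote_literal 2023-01-01T00:00:00Z)"
# Hypothetical numerical field "score": 0.5 <= score < 1.0 (default bounds).
printf '%s\n' 'score: IN(0.5, 1.0)'
# Same hypothetical field with an explicitly exclusive lower bound and no upper bound.
printf '%s\n' 'score: IN(0.5e, *)'
```

Each printed line is a complete filter expression. Before you send one as the filter value of a search request, it still needs JSON escaping.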

To filter media search results by metadata, follow these steps:

  1. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the Agent Builder page. In the navigation menu, click Data Stores.

    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.
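If you prefer the command line, you can probably also read the data store ID from the Discovery Engine REST API. The following sketch makes these assumptions: the dataStores list method exists under the same v1beta path that the search request on this page uses, the gcloud CLI is authenticated, and PROJECT_ID is exported. The helper functions are hypothetical. The sketch relies on the fact that a data store's resource name ends with its ID, as the name field in the sample response later on this page shows.

```shell
#!/usr/bin/env bash
# Sketch, not part of the console steps above: list the data stores in
# default_collection through the Discovery Engine REST API and print their IDs.
# Assumes gcloud is installed and authenticated and PROJECT_ID is exported.
# All function names here are hypothetical.

# URL of the dataStores collection (same path prefix as the search request on this page).
data_stores_url() {
  printf 'https://discoveryengine.googleapis.com/v1beta/projects/%s/locations/global/collections/default_collection/dataStores' "$1"
}

# The ID is the last segment of a resource name such as
# projects/PROJECT_NUMBER/locations/global/collections/default_collection/dataStores/DATA_STORE_ID.
data_store_id_from_name() {
  printf '%s\n' "${1##*/}"
}

# Read a list response on stdin and print one data store ID per line.
# Only "name" values that end directly after /dataStores/ID are kept.
extract_data_store_ids() {
  grep -o '"name": *"[^"]*/dataStores/[^"/]*"' |
    sed -e 's/^"name": *"//' -e 's/"$//' |
    while IFS= read -r name; do
      data_store_id_from_name "$name"
    done
}

# Call the API only when a project ID is provided.
if [[ -n ${PROJECT_ID:-} ]]; then
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(data_stores_url "$PROJECT_ID")" | extract_data_store_ids
fi
```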

  2. Determine which document field or fields to filter on. For example, for the sample documents earlier on this page, you can filter on the categories field.

    You can use only indexable fields in a filter expression. To determine whether a field is indexable, do the following:

    1. In the Google Cloud console, go to the Agent Builder page. In the navigation menu, click Data Stores.

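    2. Click the name of your data store.

    3. Click the Schema tab to view the schema for your data store. If the Indexable setting for a field is:

      • Selected: the field can already be used to filter search results. Skip to step 4.

      • Not selected: continue with step 3 to make the field indexable.

      • Unavailable: the field can't be indexed.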
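You can probably also inspect the schema outside the console. The following sketch makes these assumptions: the Discovery Engine REST API provides a schemas get method, the schema ID is default_schema (the schemaId that appears in the sample documents), the gcloud CLI is authenticated, and PROJECT_ID and DATA_STORE_ID are exported. This page doesn't describe the layout of the returned schema, so the script prints it unmodified rather than parsing it.

```shell
#!/usr/bin/env bash
# Sketch, not part of the console steps above: fetch a data store schema through
# the Discovery Engine REST API so you can inspect field settings.
# Assumes gcloud is authenticated and PROJECT_ID and DATA_STORE_ID are exported.
# schema_url is a hypothetical helper; default_schema comes from the sample documents.

# URL of a schema resource; the optional third argument overrides the schema ID.
schema_url() {
  printf 'https://discoveryengine.googleapis.com/v1beta/projects/%s/locations/global/collections/default_collection/dataStores/%s/schemas/%s' \
    "$1" "$2" "${3:-default_schema}"
}

if [[ -n ${PROJECT_ID:-} && -n ${DATA_STORE_ID:-} ]]; then
  # Print the schema unmodified; look for an "indexable" setting on the field you plan to filter on.
  curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "$(schema_url "$PROJECT_ID" "$DATA_STORE_ID")"
fi
```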

  3. To make a field filterable (for example, the categories field), do the following:

    1. In the Google Cloud console, go to the Agent Builder page. In the navigation menu, click Apps.

    2. Click your media search app.

    3. In the navigation menu, click Data.

    4. Click the Schema tab. This tab shows the current field settings.

    5. Click Edit.

    6. If it isn't already selected, select the Indexable checkbox in the categories row, and then click Save.

    7. Wait six hours for the schema change to take effect, and then continue with the next step.

  4. Get search results.

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/servingConfigs/default_search:search" \
    -d '{
    "query": "QUERY",
    "filter": "FILTER"
    }'
    
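    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of your data store.
    • QUERY: the query text to search for.
    • FILTER: a filter expression, written in the syntax described earlier on this page, that restricts the search results. Because the expression is embedded in a JSON string, escape any double quotes and backslashes in it, as in the example that follows.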

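    For example, suppose that you want to search the movies in the sample documents earlier on this page, and you want the results to include only movies that (1) contain the word "Avatar" and (2) are in the "Documentary" category. To do this, include the following in your request body: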

    "query": "avatar",
    "filter": "categories: ANY(\"Documentary\")"
    

    For more information, see the search method.
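The filter is sent inside a JSON string, so every double quote in the expression must be escaped. That is why the example writes \"Documentary\". The following shell sketch builds on the same curl request and performs that escaping, so you can write the filter exactly as the grammar defines it. The function names json_escape, search_body, and run_search are hypothetical. json_escape handles only backslashes and double quotes, not control characters. The script assumes an authenticated gcloud CLI and exported PROJECT_ID and DATA_STORE_ID variables.

```shell
#!/usr/bin/env bash
# Sketch: send the step 4 search request with QUERY and FILTER given as plain strings.
# Assumes gcloud is authenticated and PROJECT_ID and DATA_STORE_ID are exported.
# json_escape, search_body, and run_search are hypothetical helper names.

# Escape backslashes and double quotes so the value can sit inside a JSON string.
# Control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body {"query": ..., "filter": ...}.
search_body() {
  printf '{"query": "%s", "filter": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# POST the request to the default_search serving config, as in the curl command above.
run_search() {
  curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/${PROJECT_ID}/locations/global/collections/default_collection/dataStores/${DATA_STORE_ID}/servingConfigs/default_search:search" \
    -d "$(search_body "$1" "$2")"
}

if [[ -n ${PROJECT_ID:-} && -n ${DATA_STORE_ID:-} ]]; then
  # The filter is written as in the grammar; search_body adds the JSON escaping.
  run_search "avatar" 'categories: ANY("Documentary")'
fi
```

Keeping the filter unescaped in the script means you can copy it straight into or out of the filter syntax, and only the body builder has to handle JSON rules.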

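    If you run a search similar to the one above, you should receive a response similar to the following. Note that the response contains only the two Avatar documentaries.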

    {
      "results": [
        {
          "id": "243308",
          "document": {
            "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/243308",
            "id": "243308",
            "structData": {
              "categories": [
                "Documentary"
              ],
              "title": "Capturing Avatar (2010)",
              "uri": "http://mytestdomain.movie/content/243308",
              "media_type": "movie"
            }
          }
        },
        {
          "id": "172851",
          "document": {
            "name": "projects/431678329718/locations/global/collections/default_collection/dataStores/rdds3_1698205785399/branches/0/documents/172851",
            "id": "172851",
            "structData": {
              "categories": [
                "Documentary"
              ],
              "uri": "http://mytestdomain.movie/content/172851",
              "media_type": "movie",
              "title": "Avatar: Creating the World of Pandora (2010)"
            }
          }
        }
      ],
      "totalSize": 2,
      "attributionToken": "XfBcCgwIvIzJqwYQ2_qNxwMSJDY1NzEzNmY1LTAwMDAtMmFhMy05YWU3LTE0MjIzYmIwOGVkMiIFTUVESUEqII6-nRXFy_MXnIaOIsLwnhXUsp0VpovvF6OAlyKiho4i",
      "guidedSearchResult": {},
      "summary": {}
    }
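To see why only the two documentaries come back, the following shell sketch applies the ANY test from the filter grammar locally to the four sample documents. It is an illustration only. The function name is hypothetical, and the sketch checks nothing but the categories array. It does not model query matching or ranking, which the service performs itself, so it lists documents in file order rather than in the ranked order of the response.

```shell
#!/usr/bin/env bash
# Local illustration only: mimic categories: ANY(...) on the sample documents to
# see which IDs such a filter keeps. Query matching and ranking are not modeled.

# The sample documents from the top of this page, one JSON object per line.
sample_docs() {
  cat <<'EOF'
{"id":"172851","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: Creating the World of Pandora (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/172851\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"243308","schemaId":"default_schema","jsonData":"{\"title\":\"Capturing Avatar (2010)\",\"categories\":[\"Documentary\"],\"uri\":\"http://mytestdomain.movie/content/243308\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"280218","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar: The Way of Water (2022)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\"],\"uri\":\"http://mytestdomain.movie/content/280218\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
{"id":"72998","schemaId":"default_schema","jsonData":"{\"title\":\"Avatar (2009)\",\"categories\":[\"Action\",\"Adventure\",\"Sci-Fi\",\"IMAX\"],\"uri\":\"http://mytestdomain.movie/content/72998\",\"available_time\":\"2023-01-01T00:00:00Z\",\"media_type\":\"movie\"}"}
EOF
}

# Print the ID of each sample document whose categories array contains any of
# the given values, mimicking categories: ANY("v1", "v2", ...).
ids_matching_any_category() {
  local line id cats value needle
  sample_docs | while IFS= read -r line; do
    id=$(printf '%s\n' "$line" | sed -n 's/^{"id":"\([^"]*\)".*/\1/p')
    # jsonData is a JSON-encoded string, so its inner quotes appear as \" here.
    cats=$(printf '%s\n' "$line" | sed -n 's/.*\\"categories\\":\[\([^]]*\)].*/\1/p')
    for value in "$@"; do
      needle='\"'"$value"'\"'
      if [[ $cats == *"$needle"* ]]; then
        printf '%s\n' "$id"
        break
      fi
    done
  done
}

# categories: ANY("Documentary") keeps the two documentaries: prints 172851, then 243308.
ids_matching_any_category Documentary
```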