Parse and chunk documents

This page describes how to use Vertex AI Search to parse and chunk your documents.

You can configure parsing or chunking settings to:

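  • Specify how Vertex AI Search parses content. You can specify how to parse unstructured content when you upload it to Vertex AI Search. Vertex AI Search provides a digital parser, an optical character recognition (OCR) parser for PDF files, and a layout parser. You can also bring your own parsed documents. The layout parser is recommended when you need to extract rich content and structural elements (such as sections, paragraphs, tables, and lists) from documents for search and answer generation.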

    See Improve content detection with parsing.

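  • Use Vertex AI Search for retrieval-augmented generation (RAG). Improve the output of LLMs with relevant data that you've uploaded to your Vertex AI Search app. To do this, you turn on document chunking, which indexes your data as chunks to improve relevance and reduce the computational load on LLMs. You also turn on the layout parser, which detects document elements such as headings and lists to improve how documents are chunked.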

    For information about chunking for RAG and how to return chunks in search requests, see Chunk documents for RAG.

Parse documents

You can control content parsing in the following ways:

  • Specify the parser type. You can specify the type of parsing to apply based on file type:

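    • Digital parser. The digital parser is on by default for all file types unless a different parser type is specified. The digital parser processes ingested documents if no other default parser is specified for the data store, or if the specified parser doesn't support the file type of an ingested document.
    • OCR parsing for PDFs. If you plan to upload scanned PDFs or PDFs with text inside images, you can turn on the OCR parser to improve PDF indexing. See the OCR parser for PDFs section of this document.
    • Layout parser. Turn on the layout parser for HTML, PDF, or DOCX files if you plan to use Vertex AI Search for RAG. For information about this parser and how to turn it on, see Chunk documents for RAG.
  • Bring your own parsed document. (Preview with allowlist) If you've already parsed your unstructured documents, you can import that pre-parsed content into Vertex AI Search. See Bring your own parsed document.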

Parser availability comparison

The following table lists the availability of each parser by document file type and shows which elements each parser can detect and parse.

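File type | Digital parser | OCR parser | Layout parser
HTML | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements
PDF | Detects paragraph (digital text) elements | Detects paragraph elements | Detects paragraph, table, title, and heading elements
DOCX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements
PPTX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, list, title, and heading elements
TXT | Detects paragraph elements | N/A | Detects paragraph, table, title, and heading elements
XLSX (Preview) | Detects paragraph elements | N/A | Detects paragraph, table, title, and heading elements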

Digital parser

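The digital parser extracts machine-readable text from documents. It detects text blocks, but it doesn't detect document elements such as tables, lists, and headings.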

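The digital parser is used as the default if you don't specify a different parser as the default during data store creation, or if the specified parser doesn't support the file type being uploaded.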

OCR parser for PDFs

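If you have non-searchable PDFs (scanned PDFs, or PDFs with text inside images such as infographics), Google recommends turning on optical character recognition (OCR) processing when you create the data store. This allows Vertex AI Search to extract paragraph elements.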

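If you have searchable PDFs or other digital formats that consist mostly of machine-readable text, you typically don't need the OCR parser. However, if your PDFs contain both non-searchable text (such as scanned text or infographics) and machine-readable text, you can set the useNativeText field to true when you specify the OCR parser. In this case, the machine-readable text is merged with the OCR parsing output to improve text extraction quality.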

OCR processing is available for generic search apps with unstructured data stores.

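The OCR processor can parse a maximum of 500 pages per PDF file. For longer PDFs, the OCR processor parses the first 500 pages and the default parser parses the remaining pages.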
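The 500-page rule above amounts to a simple page split. The following Python sketch only models that documented behavior and does not call any Vertex AI Search API; the function name split_pages_for_ocr is hypothetical, and pages are assumed to be numbered from 1.

```python
from typing import Optional, Tuple

# Per-PDF page limit for the OCR processor, as stated above.
OCR_PAGE_LIMIT = 500

PageRange = Tuple[int, int]  # (first_page, last_page), 1-based and inclusive


def split_pages_for_ocr(
    num_pages: int, limit: int = OCR_PAGE_LIMIT
) -> Tuple[PageRange, Optional[PageRange]]:
    """Return the pages parsed by OCR and the pages left to the default parser.

    Hypothetical helper: OCR handles the first `limit` pages and the
    default parser handles any remaining pages.
    """
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    ocr_last = min(num_pages, limit)
    ocr_pages = (1, ocr_last)
    # Only PDFs longer than the limit leave pages for the default parser.
    remaining = (ocr_last + 1, num_pages) if num_pages > limit else None
    return ocr_pages, remaining


print(split_pages_for_ocr(730))  # ((1, 500), (501, 730))
```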

Layout parser

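Layout parsing lets Vertex AI Search detect the layout of PDF and HTML files. Support for DOCX files is in Preview. Vertex AI Search can then identify content elements such as text blocks, tables, and lists, and structural elements such as titles and headings, and use them to define the organization and hierarchy of a document.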

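You can turn on layout parsing for all file types, or specify which file types to turn it on for. The layout parser detects content elements such as paragraphs, tables, and lists, and structural elements such as titles, headings, headers, and footnotes.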

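The layout parser is available only when you use document chunking for RAG. When document chunking is turned on, Vertex AI Search breaks documents into chunks at ingestion time and can return documents as chunks. Detecting document layout enables content-aware chunking and improves search and answer generation related to document elements. For more information about chunking documents for RAG, see Chunk documents for RAG.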

The layout parser supports PDF files up to 40 MB.

Specify a default parser

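You can specify a default parser for a data store by including the documentProcessingConfig object when you create the data store. If you don't include documentProcessingConfig.defaultParsingConfig, the digital parser is used. The digital parser is also used if the specified parser isn't available for a file type.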

REST

To specify a default parser:

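  1. When you create a search data store using the API, include documentProcessingConfig.defaultParsingConfig in the data store creation request. You can specify the OCR parser, the layout parser, or the digital parser: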

    • To specify the OCR parser for PDFs:

      "documentProcessingConfig": {
        "defaultParsingConfig": {
          "ocrParsingConfig": {
            "useNativeText": NATIVE_TEXT_BOOLEAN
          }
        }
      }

      • NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDF files. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.
    • To specify the layout parser:

      "documentProcessingConfig": {
        "defaultParsingConfig": {
          "layoutParsingConfig": {}
        }
      }
      
    • To specify the digital parser:

       "documentProcessingConfig": {
          "defaultParsingConfig": { "digitalParsingConfig": {} }
       }
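If you generate request bodies in code, the three variants above can come from a single helper. This is a minimal Python sketch using only the standard library; the function name and the shorthand parser labels ("ocr", "layout", "digital") are hypothetical conveniences, while the JSON field names are the ones shown above. useNativeText is emitted as a JSON boolean.

```python
import json


def default_parsing_config(parser: str, use_native_text: bool = False) -> dict:
    """Build documentProcessingConfig.defaultParsingConfig for one parser.

    `parser` is a hypothetical shorthand: "ocr", "layout", or "digital".
    """
    if parser == "ocr":
        # useNativeText only applies to the OCR parser (PDF files).
        parsing = {"ocrParsingConfig": {"useNativeText": use_native_text}}
    elif parser == "layout":
        parsing = {"layoutParsingConfig": {}}
    elif parser == "digital":
        parsing = {"digitalParsingConfig": {}}
    else:
        raise ValueError(f"unknown parser: {parser!r}")
    return {"documentProcessingConfig": {"defaultParsingConfig": parsing}}


# OCR parser with machine-readable text merging turned on.
print(json.dumps(default_parsing_config("ocr", use_native_text=True), indent=2))
```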
      

Example

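The following example specifies during data store creation that the OCR parser is the default parser. Because the OCR parser applies only to PDF files, all ingested PDF files are processed by the OCR parser, and all other file types are processed by the digital parser.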

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "ocrParsingConfig": {
        "useNativeText": false
      }
    }
  }
}'
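The same data store creation call can be assembled in Python with only the standard library. This sketch builds, but does not send, a urllib.request.Request; the helper name is hypothetical, and the access token is a placeholder that you would obtain, for example, with gcloud auth print-access-token as in the curl command. Passing the returned request to urllib.request.urlopen would perform the call.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_create_data_store_request(
    project_id: str,
    data_store_id: str,
    display_name: str,
    access_token: str,
    document_processing_config: dict,
) -> urllib.request.Request:
    """Build the POST request from the curl example (not sent here)."""
    parent = f"projects/{project_id}/locations/global/collections/default_collection"
    query = urllib.parse.urlencode({"dataStoreId": data_store_id})
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
        "contentConfig": "CONTENT_REQUIRED",
        "documentProcessingConfig": document_processing_config,
    }
    return urllib.request.Request(
        f"{API_ROOT}/{parent}/dataStores?{query}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": project_id,
        },
    )


# Same values as the curl example: OCR parser as the default parser.
req = build_create_data_store_request(
    "exampleproject",
    "datastore123",
    "exampledatastore",
    "ACCESS_TOKEN",  # placeholder; substitute a real token before sending
    {"defaultParsingConfig": {"ocrParsingConfig": {"useNativeText": False}}},
)
print(req.get_method(), req.full_url)
```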

Specify parser overrides for file types

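You can specify that particular file types (PDF, HTML, or DOCX) are parsed by a parser other than the default parser. To do so, include the documentProcessingConfig field in your data store creation request and specify the override parser. If you don't specify a default parser, the digital parser is the default.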

REST

To specify a file-type-specific parser override:

  1. When you create a search data store using the API, include documentProcessingConfig.parsingConfigOverrides in the data store creation request.

    You can specify a parser for pdf, html, or docx:

    "documentProcessingConfig": {
      "parsingConfigOverrides": {
        "FILE_TYPE": { PARSING_CONFIG }
      }
    }
    

    Replace the following:

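    • FILE_TYPE: Accepted values are pdf, html, and docx.
    • PARSING_CONFIG: The configuration for the parser that you want to apply to the file type. You can specify the OCR parser, the layout parser, or the digital parser: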

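      • To specify the OCR parser for PDFs:

        "ocrParsingConfig": {
          "useNativeText": NATIVE_TEXT_BOOLEAN
        }

        • NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDF files. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.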
      • To specify the layout parser:

        "layoutParsingConfig": {}
        
      • To specify the digital parser:

        "digitalParsingConfig": {}
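The override rules above (only pdf, html, and docx keys, and the OCR parser only for PDFs) can be checked before you send the request. A minimal Python sketch; the function name and the shorthand parser labels are hypothetical, and the validation covers only the constraints stated on this page, not necessarily every check the service performs.

```python
import json
from typing import Dict

# File types that accept parser overrides, per this section.
OVERRIDE_FILE_TYPES = {"pdf", "html", "docx"}


def parsing_config_overrides(overrides: Dict[str, str]) -> dict:
    """Map file type -> "ocr" | "layout" | "digital" to a documentProcessingConfig."""
    result = {}
    for file_type, parser in overrides.items():
        if file_type not in OVERRIDE_FILE_TYPES:
            raise ValueError(f"unsupported override file type: {file_type!r}")
        if parser == "ocr":
            if file_type != "pdf":
                raise ValueError("the OCR parser applies only to PDF files")
            result[file_type] = {"ocrParsingConfig": {"useNativeText": False}}
        elif parser == "layout":
            result[file_type] = {"layoutParsingConfig": {}}
        elif parser == "digital":
            result[file_type] = {"digitalParsingConfig": {}}
        else:
            raise ValueError(f"unknown parser: {parser!r}")
    return {"documentProcessingConfig": {"parsingConfigOverrides": result}}


# PDF files through OCR, HTML files through the layout parser.
print(json.dumps(parsing_config_overrides({"pdf": "ocr", "html": "layout"}), indent=2))
```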
        

Example

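The following example specifies during data store creation that PDF files are processed by the OCR parser and HTML files are processed by the layout parser. In this case, all files other than PDF and HTML files are processed by the digital parser.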

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "parsingConfigOverrides": {
      "pdf": {
        "ocrParsingConfig": {
          "useNativeText": false
        }
      },
      "html": {
        "layoutParsingConfig": {}
      }
    }
  }
}'
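Taken together, per-file-type overrides, the default parser, and the digital-parser fallback determine which parser handles each file. The following Python sketch models that selection rule as this page describes it; it is not the service's implementation, the function name is hypothetical, and the supported-type sets are transcribed from the parser availability table (Preview file types included).

```python
from typing import Dict, Optional

# Parsers and the file types they can handle, transcribed from the parser
# availability table on this page (Preview file types included).
SUPPORTED_FILE_TYPES = {
    "digital": {"html", "pdf", "docx", "pptx", "txt", "xlsx"},
    "ocr": {"pdf"},
    "layout": {"html", "pdf", "docx", "pptx", "txt", "xlsx"},
}


def effective_parser(
    file_type: str,
    default_parser: Optional[str] = None,
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Model which parser handles a file, following the rules on this page."""
    ft = file_type.lower()
    # 1. A parser override for this file type takes precedence.
    if overrides and ft in overrides:
        return overrides[ft]
    # 2. Otherwise the default parser applies, if it supports the file type.
    if default_parser and ft in SUPPORTED_FILE_TYPES.get(default_parser, set()):
        return default_parser
    # 3. In every other case the digital parser is used.
    return "digital"


# Overrides matching the curl example: OCR for PDF, layout for HTML, no default.
example_overrides = {"pdf": "ocr", "html": "layout"}
for file_type in ("pdf", "html", "docx", "txt"):
    print(file_type, "->", effective_parser(file_type, overrides=example_overrides))
```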

Get parsed documents in JSON

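You can get a parsed document in JSON format by calling the getProcessedDocument method and specifying PARSED_DOCUMENT as the processed document type. Getting parsed documents in JSON can be helpful if you need to upload them elsewhere, or if you decide to reimport them into Vertex AI Agent Builder using the bring your own parsed document feature.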

REST

To get a parsed document in JSON:

  1. Call the getProcessedDocument method:

    curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=PARSED_DOCUMENT"
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of your data store.
    • DOCUMENT_ID: the ID of the document to get.
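When scripting this call, the URL can be built from the three IDs. A minimal Python sketch; the helper name is hypothetical, and the global location, default_collection, and branch 0 are hard-coded exactly as in the curl command above.

```python
import urllib.parse

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def parsed_document_url(project_id: str, data_store_id: str, document_id: str) -> str:
    """Return the getProcessedDocument URL that requests PARSED_DOCUMENT output."""
    document = (
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores/{data_store_id}/branches/0/documents/{document_id}"
    )
    query = urllib.parse.urlencode({"processed_document_type": "PARSED_DOCUMENT"})
    return f"{API_ROOT}/{document}:getProcessedDocument?{query}"


print(parsed_document_url("my-project", "my-data-store", "doc-1"))
```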

Bring your own parsed document

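You can import pre-parsed unstructured documents into Vertex AI Search data stores. For example, instead of importing a raw PDF file, you can parse the PDF yourself and import the parsing result. This lets you import your documents in a structured way, so that search and answer generation have information about the document's layout and elements.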

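A parsed unstructured document is represented by JSON that describes the document as a sequence of text, table, and list blocks. You import JSON files containing parsed unstructured document data the same way you import other types of unstructured documents, such as PDFs. When this feature is turned on, any uploaded JSON file that is identified by either the application/json MIME type or the .json extension is treated as a parsed document.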

To turn on this feature and to learn how to use it, contact your Google account team.

Chunk documents for RAG

By default, Vertex AI Search is optimized for document retrieval, where your search app returns a document, such as a PDF or web page, with each search result.

Document chunking features are available for generic search apps with unstructured data stores.

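You can instead optimize Vertex AI Search for RAG, where your search app is primarily used to augment LLM output with your custom data. When document chunking is turned on, Vertex AI Search breaks your documents into chunks. In search results, your search app can return relevant chunks of data instead of full documents. Using chunked data for RAG increases the relevance of LLM answers and reduces the computational load on LLMs.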

To use Vertex AI Search for RAG:

  1. Turn on document chunking when you create your data store.

    Alternatively, if you've already chunked your own documents, you can upload your own chunks (Preview with allowlist).

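  2. Retrieve and view your chunks in the following ways:

    • List a document's chunks
    • Get chunks in JSON from a processed document
    • Get specific chunks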

  3. Return chunks in search requests.

Limitations

The following limitations apply to chunking:

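  • Document chunking can't be turned on or off after the data store is created.
  • You can make search requests for documents instead of chunks against a data store that has document chunking turned on. However, such data stores aren't optimized for returning documents: documents are returned by aggregating chunks back into documents.
  • When document chunking is turned on, search summaries and search with follow-ups are supported in Public preview but not in the generally available (GA) version.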

Document chunking options

This section describes the options that you specify to turn on document chunking.

When you create the data store, turn on the following options so that Vertex AI Search can index your documents as chunks.

  • Layout-aware document chunking. To turn on this option, include the documentProcessingConfig field in your data store creation request and specify ChunkingConfig.LayoutBasedChunkingConfig.

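    When layout-aware document chunking is turned on, Vertex AI Search detects a document's layout and takes it into account during chunking. This improves semantic coherence and reduces noise in the content when it is used for retrieval and LLM generation. All text in a chunk comes from the same layout entity, such as a heading, subheading, or list.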

  • Layout parsing. To turn on this option, specify ParsingConfig.LayoutParsingConfig when you create the data store.

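    The layout parser detects the layout of PDF, HTML, and DOCX files. It identifies elements such as text blocks, tables, lists, titles, and headings, and uses them to define the organization and hierarchy of a document.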

    For more information about layout parsing, see Layout parser.

Turn on document chunking

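You can turn on document chunking by including the documentProcessingConfig object in your data store creation request and turning on both layout-aware document chunking and layout parsing.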

REST

To turn on document chunking:

  1. When you create a search data store using the API, include the documentProcessingConfig.chunkingConfig object in the data store creation request.

     "documentProcessingConfig": {
       "chunkingConfig": {
         "layoutBasedChunkingConfig": {
           "chunkSize": CHUNK_SIZE_LIMIT,
           "includeAncestorHeadings": HEADINGS_BOOLEAN
         }
       },
       "defaultParsingConfig": {
         "layoutParsingConfig": {}
       }
     }
    

    Replace the following:

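    • CHUNK_SIZE_LIMIT: Optional. The token size limit for each chunk. The default value is 500. Supported values are 100 to 500, inclusive.
    • HEADINGS_BOOLEAN: Optional. Determines whether headings are included in each chunk. The default value is false. Appending the title and headings at all levels to chunks from the middle of a document can help prevent loss of context in chunk retrieval and ranking.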
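Because both optional fields have documented defaults and chunkSize has a documented range, a small helper can validate them and leave unset values out so the service defaults apply. A minimal Python sketch; the function name is hypothetical, and it always pairs layout-based chunking with the layout parser, as in the request body above.

```python
import json
from typing import Optional


def chunking_processing_config(
    chunk_size: Optional[int] = None,
    include_ancestor_headings: Optional[bool] = None,
) -> dict:
    """Build a documentProcessingConfig with layout-based chunking and layout parsing.

    Unset arguments are omitted so the documented defaults apply
    (chunkSize 500, includeAncestorHeadings false).
    """
    layout_chunking = {}
    if chunk_size is not None:
        # Documented range: 100 to 500 tokens, inclusive.
        if not 100 <= chunk_size <= 500:
            raise ValueError("chunkSize must be between 100 and 500 (inclusive)")
        layout_chunking["chunkSize"] = chunk_size
    if include_ancestor_headings is not None:
        layout_chunking["includeAncestorHeadings"] = include_ancestor_headings
    return {
        "documentProcessingConfig": {
            "chunkingConfig": {"layoutBasedChunkingConfig": layout_chunking},
            "defaultParsingConfig": {"layoutParsingConfig": {}},
        }
    }


print(json.dumps(chunking_processing_config(300, True), indent=2))
```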

Bring your own chunks (Preview with allowlist)

If you've already chunked your own documents, you can upload those chunks to Vertex AI Search instead of turning on the document chunking options.

Bring your own chunks is a Preview feature with allowlist. To use this feature, contact your Google account team.

List a document's chunks

To list all chunks of a specific document, call the Chunks.list method.

REST

To list the chunks of a document:

  1. Call the Chunks.list method:

    curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks"
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of your data store.
    • DOCUMENT_ID: the ID of the document whose chunks you want to list.
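The Chunks.list call above can also be prepared from Python. This sketch builds an authenticated GET urllib.request.Request without sending it; the helper name is hypothetical, and the token is a placeholder that you would obtain, for example, with gcloud auth print-access-token.

```python
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def list_chunks_request(
    project_id: str, data_store_id: str, document_id: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) the Chunks.list request for one document."""
    document = (
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores/{data_store_id}/branches/0/documents/{document_id}"
    )
    return urllib.request.Request(
        f"{API_ROOT}/{document}/chunks",
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )


req = list_chunks_request("my-project", "my-data-store", "doc-1", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
# Sending: urllib.request.urlopen(req) returns a response whose body is JSON.
```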

Get chunks in JSON from a processed document

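You can get all the chunks of a specific document in JSON format by calling the getProcessedDocument method. Getting chunks in JSON can be helpful if you need to upload the chunks elsewhere, or if you decide to reimport them into Vertex AI Agent Builder using the bring your own chunks feature.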

REST

To get the chunks of a document in JSON:

  1. Call the getProcessedDocument method:

    curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=CHUNKED_DOCUMENT"
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of your data store.
    • DOCUMENT_ID: the ID of the document to get chunks from.

Get specific chunks

To get a specific chunk, call the Chunks.get method.

REST

To get a specific chunk:

  1. Call the Chunks.get method:

    curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks/CHUNK_ID"
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • DATA_STORE_ID: the ID of your data store.
    • DOCUMENT_ID: the ID of the document that the chunk belongs to.
    • CHUNK_ID: the ID of the chunk to return.
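Chunks returned elsewhere on this page, such as in the search response example, carry a full resource name, and the Chunks.get URL above is that name appended to the API root. The following Python sketch splits such a name into its IDs and builds the Chunks.get URL; the regular expression and helper names are hypothetical and cover only the name layout shown on this page.

```python
import re
from typing import Dict

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"

# Matches chunk resource names in the layout used on this page.
CHUNK_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"collections/(?P<collection>[^/]+)/dataStores/(?P<data_store>[^/]+)/"
    r"branches/(?P<branch>[^/]+)/documents/(?P<document>[^/]+)/chunks/(?P<chunk>[^/]+)$"
)


def parse_chunk_name(name: str) -> Dict[str, str]:
    """Split a chunk resource name into its component IDs."""
    match = CHUNK_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a chunk resource name: {name!r}")
    return match.groupdict()


def chunk_get_url(name: str) -> str:
    """Return the Chunks.get URL for a chunk resource name."""
    parse_chunk_name(name)  # raises ValueError on an unexpected layout
    return f"{API_ROOT}/{name}"


name = (
    "projects/961309680810/locations/global/collections/default_collection/"
    "dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/"
    "documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17"
)
print(parse_chunk_name(name)["chunk"])  # c17
```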

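Return chunks in search requests

After you confirm that your data has been chunked correctly, Vertex AI Search can return chunked data in its search results.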

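The response returns a chunk that is relevant to the search query. You can also choose to return the adjacent chunks that appear before and after the relevant chunk in the source document. Adjacent chunks can add context and improve accuracy.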

REST

To get chunked data:

  1. When you make a search request, set ContentSearchSpec.SearchResultMode to CHUNKS.

    "contentSearchSpec": {
      "searchResultMode": "RESULT_MODE",
      "chunkSpec": {
           "numPreviousChunks": NUMBER_OF_PREVIOUS_CHUNKS,
           "numNextChunks": NUMBER_OF_NEXT_CHUNKS
       }
    }
    
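    • RESULT_MODE: Determines whether search results are returned as full documents or as chunks. To get chunks, the data store must have document chunking turned on. Accepted values are DOCUMENTS and CHUNKS. If document chunking is turned on for your data store, the default value is CHUNKS.
    • NUMBER_OF_PREVIOUS_CHUNKS: The number of chunks to return from immediately before the relevant chunk. The maximum allowed value is 5.
    • NUMBER_OF_NEXT_CHUNKS: The number of chunks to return from immediately after the relevant chunk. The maximum allowed value is 5.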
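The constraints above (two result modes, at most 5 adjacent chunks on each side) can be enforced when you build the request. A minimal Python sketch; the helper name is hypothetical, and emitting chunkSpec only in CHUNKS mode is an assumption made for clarity, not a documented requirement.

```python
import json

MAX_ADJACENT_CHUNKS = 5  # documented maximum for previous and next chunks


def content_search_spec(
    result_mode: str = "CHUNKS",
    num_previous_chunks: int = 0,
    num_next_chunks: int = 0,
) -> dict:
    """Build a contentSearchSpec, validating the documented limits."""
    if result_mode not in {"DOCUMENTS", "CHUNKS"}:
        raise ValueError(f"unknown result mode: {result_mode!r}")
    spec = {"searchResultMode": result_mode}
    if result_mode == "CHUNKS":
        for field, value in (
            ("numPreviousChunks", num_previous_chunks),
            ("numNextChunks", num_next_chunks),
        ):
            if not 0 <= value <= MAX_ADJACENT_CHUNKS:
                raise ValueError(f"{field} must be between 0 and {MAX_ADJACENT_CHUNKS}")
        spec["chunkSpec"] = {
            "numPreviousChunks": num_previous_chunks,
            "numNextChunks": num_next_chunks,
        }
    return {"contentSearchSpec": spec}


print(json.dumps(content_search_spec("CHUNKS", 1, 1), indent=2))
```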

Example

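The following example search request sets SearchResultMode to CHUNKS, requests one previous chunk and one next chunk, and uses pageSize to limit the results to a single relevant chunk.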

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1alpha/projects/exampleproject/locations/global/collections/default_collection/dataStores/datastore123/servingConfigs/default_search:search" \
-d '{
  "query": "animal",
  "pageSize": 1,
  "contentSearchSpec": {
    "searchResultMode": "CHUNKS",
    "chunkSpec": {
           "numPreviousChunks": 1,
           "numNextChunks": 1
       }
  }
}'
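The same search call can be assembled in Python with the standard library. The sketch below builds, but does not send, the POST request; the helper name is hypothetical, the default_search serving config is taken from the curl example, and the token is a placeholder that you would obtain, for example, with gcloud auth print-access-token.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_chunk_search_request(
    project_id: str,
    data_store_id: str,
    query: str,
    access_token: str,
    page_size: int = 1,
    num_previous_chunks: int = 1,
    num_next_chunks: int = 1,
) -> urllib.request.Request:
    """Build (but do not send) a search request that returns chunks."""
    serving_config = (
        f"projects/{project_id}/locations/global/collections/default_collection/"
        f"dataStores/{data_store_id}/servingConfigs/default_search"
    )
    body = {
        "query": query,
        "pageSize": page_size,
        "contentSearchSpec": {
            "searchResultMode": "CHUNKS",
            "chunkSpec": {
                "numPreviousChunks": num_previous_chunks,
                "numNextChunks": num_next_chunks,
            },
        },
    }
    return urllib.request.Request(
        f"{API_ROOT}/{serving_config}:search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "X-Goog-User-Project": project_id,
        },
    )


req = build_chunk_search_request("exampleproject", "datastore123", "animal", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```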

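The following example shows the response returned for the example query. The response contains the relevant chunk, the previous and next chunks, the original document's metadata, and the range of document pages that each chunk was derived from.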

Response

{
  "results": [
    {
      "chunk": {
        "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17",
        "id": "c17",
        "content": "\n# ESS10: Stakeholder Engagement and Information Disclosure\nReaders should also refer to ESS10 and its guidance notes, plus the template available for a stakeholder engagement plan. More detail on stakeholder engagement in projects with risks related to animal health is contained in section 4 below. The type of stakeholders (men and women) that can be engaged by the Borrower as part of the project's environmental and social assessment and project design and implementation are diverse and vary based on the type of intervention. The stakeholders can include: Pastoralists, farmers, herders, women's groups, women farmers, community members, fishermen, youths, etc. Cooperatives members, farmer groups, women's livestock associations, water user associations, community councils, slaughterhouse workers, traders, etc. Veterinarians, para-veterinary professionals, animal health workers, community animal health workers, faculties and students in veterinary colleges, etc. 8 \n# 4. Good Practice in Animal Health Risk Assessment and Management\n\n# Approach\nRisk assessment provides the transparent, adequate and objective evaluation needed by interested parties to make decisions on health-related risks associated with project activities involving live animals. As the ESF requires, it is conducted throughout the project cycle, to provide or indicate likelihood and impact of a given hazard, identify factors that shape the risk, and find proportionate and appropriate management options. 
The level of risk may be reduced by mitigation measures, such as infrastructure (e.g., diagnostic laboratories, border control posts, quarantine stations), codes of practice (e.g., good animal husbandry practices, on-farm biosecurity, quarantine, vaccination), policies and regulations (e.g., rules for importing live animals, ban on growth hormones and promotors, feed standards, distance required between farms, vaccination), institutional capacity (e.g., veterinary services, surveillance and monitoring), changes in individual behavior (e.g., hygiene, hand washing, care for animals). Annex 2 provides examples of mitigation practices. This list is not an exhaustive one but a compendium of most practiced interventions and activities. The cited measures should take into account social, economic, as well as cultural, gender and occupational aspects, and other factors that may affect the acceptability of mitigation practices by project beneficiaries and other stakeholders. Risk assessment is reviewed and updated through the project cycle (for example to take into account increased trade and travel connectivity between rural and urban settings and how this may affect risks of disease occurrence and/or outbreak). Projects monitor changes in risks (likelihood and impact) b               by using data, triggers or indicators. ",
        "documentMetadata": {
          "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
          "title": "AnimalHealthGoodPracticeNote"
        },
        "pageSpan": {
          "pageStart": 14,
          "pageEnd": 15
        },
        "chunkMetadata": {
          "previousChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c16",
              "id": "c16",
              "content": "\n# ESS6: Biodiversity Conservation and Sustainable Management of Living Natural Resources\nThe risks associated with livestock interventions under ESS6 include animal welfare (in relation to housing, transport, and slaughter); diffusion of pathogens from domestic animals to wildlife, with risks for endemic species and biodiversity (e.g., sheep and goat plague in Mongolia affecting the saiga, an endemic species of wild antelope); the introduction of new breeds with potential risk of introducing exotic or new diseases; and the release of new species that are not endemic with competitive advantage, potentially putting endemic species at risk of extinction. Animal welfare relates to how an animal is coping with the conditions in which it lives. An animal is in a good state of welfare if it is healthy, comfortable, well nourished, safe, able to express innate behavior, 7 Good Practice Note - Animal Health and related risks and is not suffering from unpleasant states such as pain, fear or distress. Good animal welfare requires appropriate animal care, disease prevention and veterinary treatment; appropriate shelter, management and nutrition; humane handling, slaughter or culling. The OIE provides standards for animal welfare on farms, during transport and at the time of slaughter, for their welfare and for purposes of disease control, in its Terrestrial and Aquatic Codes. The 2014 IFC Good Practice Note: Improving Animal Welfare in Livestock Operations is another example of practical guidance provided to development practitioners for implementation in investments and operations. Pastoralists rely heavily on livestock as a source of food, income and social status. 
Emergency projects to restock the herds of pastoralists affected by drought, disease or other natural disaster should pay particular attention to animal welfare (in terms of transport, access to water, feed, and animal health) to avoid potential disease transmission and ensure humane treatment of animals. Restocking also entails assessing the assets of pastoralists and their ability to maintain livestock in good conditions (access to pasture and water, social relationship, technical knowledge, etc.). Pastoralist communities also need to be engaged by the project to determine the type of animals and breed and the minimum herd size to be considered for restocking. \n# Box 5. Safeguarding the welfare of animals and related risks in project activities\nIn Haiti, the RESEPAG project (Relaunching Agriculture: Strengthening Agriculture Public Services) financed housing for goats and provided technical recommendations for improving their welfare, which is critical to avoid the respiratory infections, including pneumonia, that are serious diseases for goats. To prevent these diseases, requires optimal sanitation and air quality in herd housing. This involves ensuring that buildings have adequate ventilation and dust levels are reduced to minimize the opportunity for infection. Good nutrition, water and minerals are also needed to support the goats' immune function. The project paid particular attention to: (i) housing design to ensure good ventilation; (ii) locating housing close to water sources and away from human habitation and noisy areas; (iii) providing mineral blocks for micronutrients; (iv) ensuring availability of drinking water and clean food troughs. ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 13,
                "pageEnd": 14
              }
            }
          ],
          "nextChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c18",
              "id": "c18",
              "content": "\n# Scoping of risks\nEarly scoping of risks related to animal health informs decisions to initiate more comprehensive risk assessment according to the type of livestock interventions and activities. It can be based on the following considerations: • • • • Type of livestock interventions supported by the project (such as expansion of feed resources, improvement of animal genetics, construction/upgrading and management of post-farm-gate facilities, etc. – see also Annex 2); Geographic scope and scale of the livestock interventions; Human and animal populations that are likely to be affected (farmers, women, children, domestic animals, wildlife, etc.); and Changes in the project or project context (such as emerging disease outbreak, extreme weather or climatic conditions) that would require a re-assessment of risk levels, mitigation measures and their likely effect on risk reduction. Scenario planning can also help to identify project-specific vulnerabilities, country-wide or locally, and help shape pragmatic analyses that address single or multiple hazards. In this process, some populations may be identified as having disproportionate exposure or vulnerability to certain risks because of occupation, gender, age, cultural or religious affiliation, socio-economic or health status. For example, women and children may be the main caretakers of livestock in the case of 9 Good Practice Note - Animal Health and related risks household farming, which puts them into close contact with animals and animal products. In farms and slaughterhouses, workers and veterinarians are particularly exposed, as they may be in direct contact with sick animals (see Box 2 for an illustration). Fragility, conflict, and violence (FCV) can exacerbate risk, in terms of likelihood and impact. 
Migrants new to a geographic area may be immunologically naïve to endemic zoonotic diseases or they may inadvertently introduce exotic diseases; and refugees or internally displaced populations may have high population density with limited infrastructure, leaving them vulnerable to disease exposure. Factors such as lack of access to sanitation, hygiene, housing, and health and veterinary services may also affect disease prevalence, contributing to perpetuation of poverty in some populations. Risk assessment should identify populations at risk and prioritize vulnerable populations and circumstances where risks may be increased. It should be noted that activities that seem minor can still have major consequences. See Box 6 for an example illustrating how such small interventions in a project may have large-scale consequences. It highlights the need for risk assessment, even for simple livestock interventions and activities, and how this can help during the project cycle (from concept to implementation). ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 15,
                "pageEnd": 16
              }
            }
          ]
        }
      }
    }
  ],
  "totalSize": 61,
  "attributionToken": "jwHwjgoMCICPjbAGEISp2J0BEiQ2NjAzMmZhYS0wMDAwLTJjYzEtYWQxYS1hYzNlYjE0Mzc2MTQiB0dFTkVSSUMqUMLwnhXb7Ygtq8SKLa3Eii3d7Ygtj_enIqOAlyLm7Ygtt7eMLduPmiKN96cijr6dFcXL8xfdj5oi9-yILdSynRWCspoi-eyILYCymiLk7Ygt",
  "nextPageToken": "ANxYzNzQTMiV2MjFWLhFDZh1SMjNmMtADMwATL5EmZyMDM2YDJaMQv3yagQYAsciPgIwgExEgC",
  "guidedSearchResult": {},
  "summary": {}
}
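For RAG, a common next step is to stitch each relevant chunk together with its adjacent chunks before passing the text to an LLM. The following Python sketch works on results shaped like the response above; the helper names are hypothetical, and it assumes that previousChunks and nextChunks are each listed in document order (with one chunk on each side, as in this example, the order doesn't matter).

```python
from typing import Any, Dict, List, Tuple


def ordered_chunks(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return previous chunks, the relevant chunk, and next chunks for one result."""
    chunk = result["chunk"]
    metadata = chunk.get("chunkMetadata", {})
    # Assumption: each adjacent-chunk list is already in document order.
    return metadata.get("previousChunks", []) + [chunk] + metadata.get("nextChunks", [])


def context_text(result: Dict[str, Any]) -> str:
    """Join chunk contents into one context string for an LLM prompt."""
    return "\n".join(c.get("content", "") for c in ordered_chunks(result))


def page_range(result: Dict[str, Any]) -> Tuple[int, int]:
    """Return (first page, last page) covered by the chunk and its neighbors."""
    spans = [c["pageSpan"] for c in ordered_chunks(result) if "pageSpan" in c]
    if not spans:
        raise ValueError("no pageSpan information in this result")
    return min(s["pageStart"] for s in spans), max(s["pageEnd"] for s in spans)


# A trimmed-down result with the same structure as the response above.
sample = {
    "chunk": {
        "id": "c17",
        "content": "relevant text",
        "pageSpan": {"pageStart": 14, "pageEnd": 15},
        "chunkMetadata": {
            "previousChunks": [
                {"id": "c16", "content": "previous text", "pageSpan": {"pageStart": 13, "pageEnd": 14}}
            ],
            "nextChunks": [
                {"id": "c18", "content": "next text", "pageSpan": {"pageStart": 15, "pageEnd": 16}}
            ],
        },
    }
}
print(context_text(sample))
print(page_range(sample))  # (13, 16)
```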

What's next