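Parse and chunk documents

The default parser for Gemini Enterprise is the layout parser. It is usually the best choice because it detects and understands document hierarchy, which leads to better chunking and, ultimately, better answer generation and retrieval. In some cases, however, you might need to change the default parser, for example when you have a large amount of scanned data to upload to a data store. This page describes how to change the parser.

When you upload unstructured content to Gemini Enterprise, you can specify how it is parsed. Gemini Enterprise provides a layout parser (default, recommended), a digital parser, and an OCR parser for PDFs. You can also bring your own parsed documents.

See Improve content detection with parsing.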

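Parse documents

You can control content parsing in the following ways:

Specify the parser type. You can specify the type of parsing to apply based on file type:
- Layout parser. The layout parser is the default parser. To learn about this parser, see Chunk documents for RAG.
- OCR parsing for PDFs. If you plan to upload scanned PDFs or PDFs with text inside images, you can turn on the OCR parser to improve PDF indexing. See the OCR parser for PDFs section of this document.
- Digital parser. The digital parser is a basic parser that extracts machine-readable text from documents. For more information, see Digital parser.
Bring your own parsed document. (Preview with allowlist) If you have already parsed your unstructured documents, you can import that pre-parsed content into Gemini Enterprise. See Bring your own parsed document.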

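Parser availability comparison

The following table lists the document file types that each parser can process and shows the elements that each parser can detect and parse.

File type	Digital parser	OCR parser	Layout parser
HTML	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
PDF	Detects paragraph elements (digital text)	Detects paragraph elements	Detects paragraph, table, title, image, and heading elements
DOCX	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
PPTX	Detects paragraph elements	N/A	Detects paragraph, table, image, list, title, and heading elements
TXT	Detects paragraph elements	N/A	N/A
XLSX	Detects paragraph elements	N/A	Detects paragraph, table, title, and heading elements
XLSM	Detects paragraph elements	N/A	Detects paragraph, table, title, and heading elements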

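OCR parser for PDFs

If you have non-searchable PDFs (scanned PDFs or PDFs with text inside images, such as infographics), Google recommends turning on optical character recognition (OCR) processing during data store creation. This lets Gemini Enterprise extract paragraph elements.

If you have searchable PDFs or other digital formats that consist mostly of machine-readable text, you typically don't need the OCR parser. However, if your PDFs contain both non-searchable text (such as scanned text or infographics) and machine-readable text, you can set the useNativeText field to true when you specify the OCR parser. In that case, the machine-readable text is merged with the OCR parsing output to improve text extraction quality.

OCR processing is available for custom search apps with unstructured data stores.

The OCR processor can parse up to the first 500 pages of a PDF file. Pages beyond the 500-page limit are not processed.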

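Layout parser

This is the default parser. With this parser, Gemini Enterprise can detect the layout of PDF, HTML, DOCX, PPTX, XLSX, and XLSM files. Gemini Enterprise can then identify content elements such as text blocks, tables, and lists, as well as structural elements such as titles and headings, and use them to define the organization and hierarchy of a document.

The layout parser can detect content elements such as paragraphs, tables, and lists, and structural elements such as titles, headings, headers, and footnotes.

Gemini Enterprise splits documents into chunks at ingestion time and can return documents as chunks. Detecting document layout enables content-aware chunking and improves search and answer generation related to document elements. For more information about chunking documents for RAG, see Chunk documents for RAG.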

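Image annotation (Preview)

With image annotation, when an image is detected in a source document, a description (annotation) of the image and the image itself are assigned to a chunk. The annotation is used to determine whether the chunk should be returned in search results. If an answer is generated, the annotation can be a source for the answer.

The layout parser can detect the following image types: BMP, GIF, JPEG, PNG, and TIFF.

Table annotation (Preview)

With table annotation, when a table is detected in a source document, a description (annotation) of the table and the table itself are assigned to a chunk. The annotation is used to determine whether the chunk should be returned in search results. If an answer is generated, the annotation can be a source for the answer.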

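Exclude HTML content

When you use the layout parser for HTML documents, you can exclude specific parts of the HTML content from processing. To improve data quality for search and RAG applications, you can exclude boilerplate or sections such as navigation menus, headers, footers, or sidebars.

layoutParsingConfig provides the following fields for this purpose:

excludeHtmlElements: A list of HTML tags to exclude. Content inside these tags is excluded.
excludeHtmlClasses: A list of HTML class attributes to exclude. HTML elements that contain these class attributes, along with their content, are excluded.
excludeHtmlIds: A list of HTML element ID attributes to exclude. HTML elements that have these ID attributes, along with their content, are excluded.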
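
To get a feel for what these three fields remove, the following is a rough local approximation written with Python's standard html.parser module. It is not the service's implementation: the ExclusionPreview class and its attribute names are hypothetical, and it assumes well-formed HTML in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ExclusionPreview(HTMLParser):
    """Collect the text that survives tag, class, and ID exclusions (local sketch)."""

    def __init__(self, elements=(), classes=(), ids=()):
        super().__init__()
        self.elements = set(elements)
        self.classes = set(classes)
        self.ids = set(ids)
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.kept = []

    def _excluded(self, tag, attrs):
        attrs = dict(attrs)
        tag_classes = set((attrs.get("class") or "").split())
        return (tag in self.elements
                or bool(tag_classes & self.classes)
                or attrs.get("id") in self.ids)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside an excluded element, every nested element is skipped too.
        if self.skip_depth or self._excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())
```

Feeding a page to `ExclusionPreview(elements=["nav"], classes=["overlays"], ids=["cookie-banner"])` and reading `kept` gives a quick preview of which text would remain before you commit the same lists to a data store configuration.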

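Digital parser

The digital parser extracts machine-readable text from documents. It can detect text blocks but cannot detect document elements such as tables, lists, and headings.

If the specified parser doesn't support the type of file being uploaded, the digital parser is used.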
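
The fallback rule can be sketched as a small lookup. This is an illustration only: the file-type sets are transcribed from the parser availability comparison table in this document, and the names PARSER_FILE_TYPES and effective_parser are hypothetical helpers, not part of any API.

```python
# File types each parser can process, transcribed from the comparison table.
PARSER_FILE_TYPES = {
    "digital": {"html", "pdf", "docx", "pptx", "txt", "xlsx", "xlsm"},
    "ocr": {"pdf"},
    "layout": {"html", "pdf", "docx", "pptx", "xlsx", "xlsm"},
}


def effective_parser(requested: str, file_type: str) -> str:
    """Return the parser that handles file_type, falling back to the digital parser."""
    if requested not in PARSER_FILE_TYPES:
        raise ValueError(f"unknown parser: {requested!r}")
    if file_type.lower() in PARSER_FILE_TYPES[requested]:
        return requested
    # The requested parser doesn't support this file type.
    return "digital"
```

For example, requesting the OCR parser for a DOCX file, or the layout parser for a TXT file, resolves to the digital parser in this sketch.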

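Specify a default parser

When you create a data store, you can specify a default parser for it by including a documentProcessingConfig object. If you don't include documentProcessingConfig.defaultParsingConfig, the digital parser is used. The digital parser is also used when the specified parser doesn't apply to a given file type.

REST

To specify a default parser:

When you create a data store using the API, include documentProcessingConfig.defaultParsingConfig in the data store creation request. You can specify the OCR parser, the layout parser, or the digital parser:
- To specify the OCR parser for PDFs:
```
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "ocrParsingConfig": {
      "useNativeText": NATIVE_TEXT_BOOLEAN
    }
  }
}
```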
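  - NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDFs. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.
- To specify the layout parser: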
```
"documentProcessingConfig": {
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
```
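- To specify the digital parser:
  
  Note: You usually don't need to specify the digital parser as the defaultParsingConfig. If no other parser is explicitly specified, the digital parser is used by default.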
```
 "documentProcessingConfig": {
    "defaultParsingConfig": { "digitalParsingConfig": {} }
 }
```
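
The three fragments above differ only in the key under defaultParsingConfig, so they can be generated from one helper. The following is a minimal sketch; default_parsing_config and its short parser labels ("ocr", "layout", "digital") are hypothetical names, while the JSON field names are the ones shown above.

```python
import json


def default_parsing_config(parser: str, use_native_text: bool = False) -> dict:
    """Build a documentProcessingConfig fragment for the chosen default parser."""
    if parser == "ocr":
        # useNativeText is a JSON boolean, so it serializes as true or false.
        parsing = {"ocrParsingConfig": {"useNativeText": use_native_text}}
    elif parser == "layout":
        parsing = {"layoutParsingConfig": {}}
    elif parser == "digital":
        parsing = {"digitalParsingConfig": {}}
    else:
        raise ValueError(f"unknown parser: {parser!r}")
    return {"documentProcessingConfig": {"defaultParsingConfig": parsing}}


print(json.dumps(default_parsing_config("ocr", use_native_text=True), indent=2))
```

The returned dictionary can be merged into the rest of a data store creation request body before it is serialized.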

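Console

When you create a search data store through the console, you can specify the default parser.

Example

The following example specifies, during data store creation, that the OCR parser is the default parser. Because the OCR parser applies only to PDF files, all ingested PDF files are processed by the OCR parser, and all other file types are processed by the digital parser.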

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "defaultParsingConfig": {
      "ocrParsingConfig": {
        "useNativeText": false
      }
    }
  }
}'

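Specify parser overrides for file types

You can specify that a particular file type is parsed by a parser other than the default parser. To do so, include the documentProcessingConfig field in the data store creation request and specify the overriding parser. If you don't specify a default parser, the digital parser is the default.

REST

To specify a file-type-specific parser override:

When you create a data store using the API, include documentProcessingConfig.parsingConfigOverrides in the data store creation request.

You can specify a parser for a given file type:
```
"documentProcessingConfig": {
  "parsingConfigOverrides": {
    "FILE_TYPE": { PARSING_CONFIG }
  }
}
```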
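Replace the following:
- FILE_TYPE: Accepted values are pdf, html, docx, pptx, xlsm, and xlsx.
- PARSING_CONFIG: The configuration of the parser that you want to apply to the file type. You can specify the OCR parser, the layout parser, or the digital parser:
  - To specify the OCR parser for PDFs:
```
"ocrParsingConfig": {
  "useNativeText": NATIVE_TEXT_BOOLEAN
}
```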
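    - NATIVE_TEXT_BOOLEAN: Optional. Set only if you are ingesting PDFs. If set to true, machine-readable text processing is turned on for the OCR parser. The default is false.
  - To specify the layout parser: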
```
"layoutParsingConfig": {}
```
  - To specify the digital parser:
```
"digitalParsingConfig": {}
```
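
As a sketch of how the FILE_TYPE and PARSING_CONFIG placeholders combine, the following helper assembles a parsingConfigOverrides object and rejects combinations that this page rules out. The function name, the short parser labels, and the decision to raise on an OCR override for non-PDF files are assumptions made for illustration; the service itself may handle such input differently.

```python
ALLOWED_FILE_TYPES = {"pdf", "html", "docx", "pptx", "xlsm", "xlsx"}
PARSER_KEYS = {
    "ocr": "ocrParsingConfig",
    "layout": "layoutParsingConfig",
    "digital": "digitalParsingConfig",
}


def parsing_config_overrides(overrides: dict) -> dict:
    """Turn a mapping such as {"pdf": "ocr", "html": "layout"} into a config fragment."""
    result = {}
    for file_type, parser in overrides.items():
        if file_type not in ALLOWED_FILE_TYPES:
            raise ValueError(f"unsupported file type: {file_type!r}")
        if parser not in PARSER_KEYS:
            raise ValueError(f"unknown parser: {parser!r}")
        if parser == "ocr" and file_type != "pdf":
            raise ValueError("the OCR parser applies only to PDF files")
        # An empty object keeps every optional field at its default value.
        result[file_type] = {PARSER_KEYS[parser]: {}}
    return {"documentProcessingConfig": {"parsingConfigOverrides": result}}
```

Validating locally like this catches typos in file types before the creation request is sent.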

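Console

When you create a search data store through the console, you can specify parser overrides for specific file types.

Example

The following example specifies, during data store creation, that PDF files are processed by the OCR parser and HTML files are processed by the layout parser. In this case, all files other than PDF and HTML files are processed by the digital parser.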

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED",
  "documentProcessingConfig": {
    "parsingConfigOverrides": {
      "pdf": {
        "ocrParsingConfig": {
          "useNativeText": false
        }
      },
      "html": {
        "layoutParsingConfig": {}
      }
    }
  }
}'

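Edit document parsing for existing data stores

If you already have a data store, you can change the default parser and add file format exceptions. However, the updated parser settings apply only to documents that are newly imported into the data store. Documents that are already in the data store are not reparsed with the new settings.

To change the document parsing settings for a data store, do the following:

In the Google Cloud console, go to the Gemini Enterprise page.
In the navigation menu, click Data Stores.
In the Name column, click the data store that you want to edit.
On the Processing config tab, edit the Document parsing settings.

The Document chunking settings can't be changed. If document chunking isn't enabled for the data store, you can't choose the layout parser.
Click Submit.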

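Configure the layout parser to exclude HTML content

You can configure the layout parser to exclude HTML content by specifying excludeHtmlElements, excludeHtmlClasses, or excludeHtmlIds in documentProcessingConfig.defaultParsingConfig.layoutParsingConfig.

REST

To exclude certain HTML content from being processed by the layout parser, follow these steps:

When you create a search data store using the API, include documentProcessingConfig.defaultParsingConfig.layoutParsingConfig in the data store creation request.

To exclude specific HTML tag types, use: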

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlElements": ["HTML_TAG_1","HTML_TAG_2","HTML_TAG_N"]
   }
  }
 }

Replace the HTML_TAG variables with tag names, for example nav and footer.

To exclude specific HTML element class attributes, use:

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlClasses": ["HTML_CLASS_1","HTML_CLASS_2","HTML_CLASS_N"]
   }
  }
 }

Replace the HTML_CLASS variables with class attributes, for example overlay and screenreader.

To exclude specific HTML element ID attributes, use:

"documentProcessingConfig": {
  "defaultParsingConfig": {
   "layoutParsingConfig": {
    "excludeHtmlIds": ["HTML_ID_1","HTML_ID_2","HTML_ID_N"]
   }
  }
 }

Replace the HTML_ID variables with ID attributes, for example cookie-banner.

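Example

This example specifies that when HTML files are processed by the layout parser, the parser skips the following:

The HTML element tags header, footer, nav, and aside
HTML elements with the class attributes overlays and screenreader
Any element with the ID attribute cookie-banner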

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores?dataStoreId=datastore123&createAdvancedSiteSearch=true" \
-d '{
  "displayName": "exampledatastore",
  "industryVertical": "GENERIC",
  "contentConfig": "PUBLIC_WEBSITE",
  "documentProcessingConfig": {
    "chunkingConfig": {
      "layoutBasedChunkingConfig": {}
    },
    "defaultParsingConfig": {
      "layoutParsingConfig": {
       "excludeHtmlElements": ["header", "footer", "nav", "aside"],
       "excludeHtmlClasses": ["overlays", "screenreader"],
       "excludeHtmlIds": ["cookie-banner"]
      }
    }
  }
}'

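Get parsed documents in JSON

You can get a parsed document in JSON format by calling the getProcessedDocument method and specifying PARSED_DOCUMENT as the processed document type. Getting parsed documents in JSON is helpful if you need to upload the parsed document elsewhere, or if you plan to re-import parsed documents into Gemini Enterprise by using the bring your own parsed document feature.

REST

To get parsed documents in JSON, follow this step:

Call the getProcessedDocument method: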

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=PARSED_DOCUMENT"

Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get.
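
If you build this URL in code rather than by hand, the placeholders should be percent-encoded. The following sketch does that with the standard library. The helper name processed_document_url is hypothetical; the path layout and the two processed document types (PARSED_DOCUMENT and CHUNKED_DOCUMENT) are the ones that appear in the curl commands on this page.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"
DOCUMENT_TYPES = {"PARSED_DOCUMENT", "CHUNKED_DOCUMENT"}


def processed_document_url(project_id: str, data_store_id: str, document_id: str,
                           document_type: str = "PARSED_DOCUMENT") -> str:
    """Build the getProcessedDocument URL for a document on branch 0."""
    if document_type not in DOCUMENT_TYPES:
        raise ValueError(f"unexpected processed document type: {document_type!r}")
    path = (
        f"projects/{quote(project_id, safe='')}/locations/global/"
        f"collections/default_collection/dataStores/{quote(data_store_id, safe='')}/"
        f"branches/0/documents/{quote(document_id, safe='')}:getProcessedDocument"
    )
    query = urlencode({"processed_document_type": document_type})
    return f"{API_ROOT}/{path}?{query}"
```

The sketch only builds the URL. Sending the request still requires an access token in the Authorization header, as in the curl command.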

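Bring your own parsed document

You can import pre-parsed unstructured documents into a Gemini Enterprise data store. For example, instead of importing a raw PDF document, you can parse the PDF yourself and import the parsing result. This lets you import documents in a structured way, so that search and answer generation have information about the document's layout and elements.

A parsed unstructured document is represented by JSON that describes the document as a sequence of text, table, and list blocks. You import JSON files that contain your parsed unstructured document data in the same way that you import other types of unstructured documents, such as PDFs. When this feature is turned on, any JSON file that is uploaded and identified by the application/json MIME type or a .JSON extension is treated as a parsed document.

To turn on this feature and to learn how to use it, contact your Google account team.

Chunk documents for RAG

By default, Gemini Enterprise is optimized for document retrieval, where your search app returns a document, such as a PDF or a web page, with each search result.

However, Gemini Enterprise can instead be optimized for RAG, where your search app is used primarily to augment LLM output with your custom data. When document chunking is turned on, Gemini Enterprise splits your documents into chunks. In search results, your search app can return relevant chunks of data instead of full documents. Using chunked data for RAG increases the relevance of LLM answers and reduces the computational load on LLMs.

Document chunking is available for custom search apps with unstructured data stores.

To use Gemini Enterprise for RAG:

When you create your data store, turn on document chunking. Alternatively, if you have already chunked your documents, you can upload your own chunks (Preview with allowlist).
You can retrieve and view chunks in the following way:
Return chunks in a search request.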

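Limitations

The following limitations apply to chunking:

Document chunking can't be turned on or off after the data store is created.
You can make search requests for documents instead of chunks from a data store with document chunking turned on. However, data stores with document chunking turned on aren't optimized for returning documents. Documents are returned by aggregating chunks into documents.
When document chunking is turned on, search summaries and search with follow-ups are supported in Public preview, but not in GA.

Document chunking options

This section describes the options that you specify to turn on document chunking.

During data store creation, turn on the following options so that Gemini Enterprise can index your documents as chunks.

Layout-aware document chunking. To turn this option on, include the documentProcessingConfig field in your data store creation request and specify ChunkingConfig.LayoutBasedChunkingConfig.

When layout-aware document chunking is turned on, Gemini Enterprise detects a document's layout and takes it into account during chunking. This improves semantic coherence and reduces noise in the content when it is used for retrieval and LLM generation. All text in a chunk comes from the same layout entity, such as headings, subheadings, and lists.
Layout parsing. To turn this option on, specify ParsingConfig.LayoutParsingConfig during data store creation.

The layout parser detects the layout of PDF, HTML, DOCX, PPTX, XLSX, and XLSM files. It identifies elements such as text blocks, tables, lists, titles, and headings, and uses them to define the organization and hierarchy of a document.

For more information about layout parsing, see the Layout parser section.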

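Turn on document chunking

You can turn on document chunking by including the documentProcessingConfig object in your data store creation request and turning on layout-aware document chunking and layout parsing.

REST

To turn on document chunking:

When you create a search data store using the API, include the documentProcessingConfig.chunkingConfig object in the data store creation request.
```
"documentProcessingConfig": {
  "chunkingConfig": {
    "layoutBasedChunkingConfig": {
      "chunkSize": CHUNK_SIZE_LIMIT,
      "includeAncestorHeadings": HEADINGS_BOOLEAN
    }
  },
  "defaultParsingConfig": {
    "layoutParsingConfig": {}
  }
}
```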
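Replace the following:
- CHUNK_SIZE_LIMIT: Optional. The token size limit for each chunk. The default value is 500. Supported values are 100 to 500 (inclusive).
- HEADINGS_BOOLEAN: Optional. Determines whether headings are included in each chunk. The default value is false. Appending the title and headings at all levels to chunks from the middle of a document can help prevent context loss during chunk retrieval and ranking.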
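
The documented range for chunkSize can be checked before a request is sent. The following is a minimal sketch under that assumption; chunking_config is a hypothetical helper, and the defaults mirror the values stated above.

```python
def chunking_config(chunk_size: int = 500,
                    include_ancestor_headings: bool = False) -> dict:
    """Build a documentProcessingConfig that turns on layout-aware chunking."""
    if not 100 <= chunk_size <= 500:
        raise ValueError("chunkSize must be between 100 and 500 (inclusive)")
    return {
        "documentProcessingConfig": {
            "chunkingConfig": {
                "layoutBasedChunkingConfig": {
                    "chunkSize": chunk_size,
                    "includeAncestorHeadings": include_ancestor_headings,
                }
            },
            # Layout parsing is turned on together with layout-aware chunking.
            "defaultParsingConfig": {"layoutParsingConfig": {}},
        }
    }
```

Because chunking can't be changed after the data store is created, validating these values up front avoids having to recreate the data store.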

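Console

When you create a search data store through the console, you can turn on document chunking.

Bring your own chunks (Preview with allowlist)

If you have already chunked your own documents, you can upload them to Gemini Enterprise instead of turning on the document chunking options.

Bring your own chunks is a Preview with allowlist feature. To use this feature, contact your Google account team.

List a document's chunks

To list all chunks for a specific document, call the Chunks.list method.

REST

To list a document's chunks, follow this step:

Call the Chunks.list method: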

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks"

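Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document whose chunks you want to list.

Get chunks in JSON from a processed document

You can get all the chunks from a specific document in JSON format by calling the getProcessedDocument method. Getting chunks in JSON is helpful if you need to upload the chunks elsewhere, or if you plan to re-import chunks into Gemini Enterprise by using the bring your own chunks feature.

REST

To get JSON chunks for a document, follow this step:

Call the getProcessedDocument method: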

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID:getProcessedDocument?processed_document_type=CHUNKED_DOCUMENT"

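Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document to get chunks from.

Get specific chunks

To get a specific chunk, call the Chunks.get method.

REST

To get a specific chunk, follow this step:

Call the Chunks.get method: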

curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID/chunks/CHUNK_ID"

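Replace the following:

PROJECT_ID: The ID of your project.
DATA_STORE_ID: The ID of your data store.
DOCUMENT_ID: The ID of the document that the chunk came from.
CHUNK_ID: The ID of the chunk to return.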

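Return chunks in search requests

After you confirm that your data has been chunked correctly, Gemini Enterprise can return chunked data in search results.

The response returns chunks that are relevant to the search query. In addition, you can choose to return adjacent chunks that appear before and after the relevant chunk in the source document. Adjacent chunks can add context and improve accuracy.

REST

To get chunked data:

When you make a search request, specify ContentSearchSpec.SearchResultMode as chunks.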
```
"contentSearchSpec": {
  "searchResultMode": "RESULT_MODE",
  "chunkSpec": {
       "numPreviousChunks": NUMBER_OF_PREVIOUS_CHUNKS,
       "numNextChunks": NUMBER_OF_NEXT_CHUNKS
   }
}
```
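- RESULT_MODE: Determines whether search results are returned as full documents or as chunks. To get chunks, the data store must have document chunking turned on. Accepted values are documents and chunks. If document chunking is turned on for the data store, the default value is chunks.
- NUMBER_OF_PREVIOUS_CHUNKS: The number of chunks to return that immediately precede the relevant chunk. The maximum allowed value is 5.
- NUMBER_OF_NEXT_CHUNKS: The number of chunks to return that immediately follow the relevant chunk. The maximum allowed value is 5.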
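
A request body that respects these limits can be assembled as follows. This is a sketch: chunk_search_request is a hypothetical helper, the mode is written as "CHUNKS" to match the curl example on this page, and rejecting negative counts is an assumption rather than a documented rule.

```python
def chunk_search_request(query: str, num_previous: int = 0, num_next: int = 0,
                         page_size=None) -> dict:
    """Build a search request body that asks for chunks plus adjacent chunks."""
    for name, value in (("numPreviousChunks", num_previous),
                        ("numNextChunks", num_next)):
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5")
    body = {
        "query": query,
        "contentSearchSpec": {
            "searchResultMode": "CHUNKS",
            "chunkSpec": {
                "numPreviousChunks": num_previous,
                "numNextChunks": num_next,
            },
        },
    }
    if page_size is not None:
        body["pageSize"] = page_size
    return body
```

Calling `chunk_search_request("animal", 1, 1, page_size=1)` produces a body with the same fields as the example request in this section.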

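Example

The following example search request sets SearchResultMode to chunks, requests one previous chunk and one next chunk, and uses pageSize to limit the results to a single relevant chunk.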

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: exampleproject" \
"https://discoveryengine.googleapis.com/v1/projects/exampleproject/locations/global/collections/default_collection/dataStores/datastore123/servingConfigs/default_search:search" \
-d '{
  "query": "animal",
  "pageSize": 1,
  "contentSearchSpec": {
    "searchResultMode": "CHUNKS",
    "chunkSpec": {
           "numPreviousChunks": 1,
           "numNextChunks": 1
       }
  }
}'

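The following example shows the response that is returned for the example query. The response contains the relevant chunk, the previous and next chunks, the metadata of the original document, and the span of document pages that each chunk was derived from.

Response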

{
  "results": [
    {
      "chunk": {
        "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c17",
        "id": "c17",
        "content": "\n# ESS10: Stakeholder Engagement and Information Disclosure\nReaders should also refer to ESS10 and its guidance notes, plus the template available for a stakeholder engagement plan. More detail on stakeholder engagement in projects with risks related to animal health is contained in section 4 below. The type of stakeholders (men and women) that can be engaged by the Borrower as part of the project's environmental and social assessment and project design and implementation are diverse and vary based on the type of intervention. The stakeholders can include: Pastoralists, farmers, herders, women's groups, women farmers, community members, fishermen, youths, etc. Cooperatives members, farmer groups, women's livestock associations, water user associations, community councils, slaughterhouse workers, traders, etc. Veterinarians, para-veterinary professionals, animal health workers, community animal health workers, faculties and students in veterinary colleges, etc. 8 \n# 4. Good Practice in Animal Health Risk Assessment and Management\n\n# Approach\nRisk assessment provides the transparent, adequate and objective evaluation needed by interested parties to make decisions on health-related risks associated with project activities involving live animals. As the ESF requires, it is conducted throughout the project cycle, to provide or indicate likelihood and impact of a given hazard, identify factors that shape the risk, and find proportionate and appropriate management options. 
The level of risk may be reduced by mitigation measures, such as infrastructure (e.g., diagnostic laboratories, border control posts, quarantine stations), codes of practice (e.g., good animal husbandry practices, on-farm biosecurity, quarantine, vaccination), policies and regulations (e.g., rules for importing live animals, ban on growth hormones and promotors, feed standards, distance required between farms, vaccination), institutional capacity (e.g., veterinary services, surveillance and monitoring), changes in individual behavior (e.g., hygiene, hand washing, care for animals). Annex 2 provides examples of mitigation practices. This list is not an exhaustive one but a compendium of most practiced interventions and activities. The cited measures should take into account social, economic, as well as cultural, gender and occupational aspects, and other factors that may affect the acceptability of mitigation practices by project beneficiaries and other stakeholders. Risk assessment is reviewed and updated through the project cycle (for example to take into account increased trade and travel connectivity between rural and urban settings and how this may affect risks of disease occurrence and/or outbreak). Projects monitor changes in risks (likelihood and impact) b               by using data, triggers or indicators. ",
        "documentMetadata": {
          "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
          "title": "AnimalHealthGoodPracticeNote"
        },
        "pageSpan": {
          "pageStart": 14,
          "pageEnd": 15
        },
        "chunkMetadata": {
          "previousChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c16",
              "id": "c16",
              "content": "\n# ESS6: Biodiversity Conservation and Sustainable Management of Living Natural Resources\nThe risks associated with livestock interventions under ESS6 include animal welfare (in relation to housing, transport, and slaughter); diffusion of pathogens from domestic animals to wildlife, with risks for endemic species and biodiversity (e.g., sheep and goat plague in Mongolia affecting the saiga, an endemic species of wild antelope); the introduction of new breeds with potential risk of introducing exotic or new diseases; and the release of new species that are not endemic with competitive advantage, potentially putting endemic species at risk of extinction. Animal welfare relates to how an animal is coping with the conditions in which it lives. An animal is in a good state of welfare if it is healthy, comfortable, well nourished, safe, able to express innate behavior, 7 Good Practice Note - Animal Health and related risks and is not suffering from unpleasant states such as pain, fear or distress. Good animal welfare requires appropriate animal care, disease prevention and veterinary treatment; appropriate shelter, management and nutrition; humane handling, slaughter or culling. The OIE provides standards for animal welfare on farms, during transport and at the time of slaughter, for their welfare and for purposes of disease control, in its Terrestrial and Aquatic Codes. The 2014 IFC Good Practice Note: Improving Animal Welfare in Livestock Operations is another example of practical guidance provided to development practitioners for implementation in investments and operations. Pastoralists rely heavily on livestock as a source of food, income and social status. 
Emergency projects to restock the herds of pastoralists affected by drought, disease or other natural disaster should pay particular attention to animal welfare (in terms of transport, access to water, feed, and animal health) to avoid potential disease transmission and ensure humane treatment of animals. Restocking also entails assessing the assets of pastoralists and their ability to maintain livestock in good conditions (access to pasture and water, social relationship, technical knowledge, etc.). Pastoralist communities also need to be engaged by the project to determine the type of animals and breed and the minimum herd size to be considered for restocking. \n# Box 5. Safeguarding the welfare of animals and related risks in project activities\nIn Haiti, the RESEPAG project (Relaunching Agriculture: Strengthening Agriculture Public Services) financed housing for goats and provided technical recommendations for improving their welfare, which is critical to avoid the respiratory infections, including pneumonia, that are serious diseases for goats. To prevent these diseases, requires optimal sanitation and air quality in herd housing. This involves ensuring that buildings have adequate ventilation and dust levels are reduced to minimize the opportunity for infection. Good nutrition, water and minerals are also needed to support the goats' immune function. The project paid particular attention to: (i) housing design to ensure good ventilation; (ii) locating housing close to water sources and away from human habitation and noisy areas; (iii) providing mineral blocks for micronutrients; (iv) ensuring availability of drinking water and clean food troughs. ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 13,
                "pageEnd": 14
              }
            }
          ],
          "nextChunks": [
            {
              "name": "projects/961309680810/locations/global/collections/default_collection/dataStores/allie-pdf-adjacent-chunks_1711394998841/branches/0/documents/0d8619f429d7f20b3575b14cd0ad0813/chunks/c18",
              "id": "c18",
              "content": "\n# Scoping of risks\nEarly scoping of risks related to animal health informs decisions to initiate more comprehensive risk assessment according to the type of livestock interventions and activities. It can be based on the following considerations: • • • • Type of livestock interventions supported by the project (such as expansion of feed resources, improvement of animal genetics, construction/upgrading and management of post-farm-gate facilities, etc. – see also Annex 2); Geographic scope and scale of the livestock interventions; Human and animal populations that are likely to be affected (farmers, women, children, domestic animals, wildlife, etc.); and Changes in the project or project context (such as emerging disease outbreak, extreme weather or climatic conditions) that would require a re-assessment of risk levels, mitigation measures and their likely effect on risk reduction. Scenario planning can also help to identify project-specific vulnerabilities, country-wide or locally, and help shape pragmatic analyses that address single or multiple hazards. In this process, some populations may be identified as having disproportionate exposure or vulnerability to certain risks because of occupation, gender, age, cultural or religious affiliation, socio-economic or health status. For example, women and children may be the main caretakers of livestock in the case of 9 Good Practice Note - Animal Health and related risks household farming, which puts them into close contact with animals and animal products. In farms and slaughterhouses, workers and veterinarians are particularly exposed, as they may be in direct contact with sick animals (see Box 2 for an illustration). Fragility, conflict, and violence (FCV) can exacerbate risk, in terms of likelihood and impact. 
Migrants new to a geographic area may be immunologically naïve to endemic zoonotic diseases or they may inadvertently introduce exotic diseases; and refugees or internally displaced populations may have high population density with limited infrastructure, leaving them vulnerable to disease exposure. Factors such as lack of access to sanitation, hygiene, housing, and health and veterinary services may also affect disease prevalence, contributing to perpetuation of poverty in some populations. Risk assessment should identify populations at risk and prioritize vulnerable populations and circumstances where risks may be increased. It should be noted that activities that seem minor can still have major consequences. See Box 6 for an example illustrating how such small interventions in a project may have large-scale consequences. It highlights the need for risk assessment, even for simple livestock interventions and activities, and how this can help during the project cycle (from concept to implementation). ",
              "documentMetadata": {
                "uri": "gs://table_eval_set/pdf/worldbank/AnimalHealthGoodPracticeNote.pdf",
                "title": "AnimalHealthGoodPracticeNote"
              },
              "pageSpan": {
                "pageStart": 15,
                "pageEnd": 16
              }
            }
          ]
        }
      }
    }
  ],
  "totalSize": 61,
  "attributionToken": "jwHwjgoMCICPjbAGEISp2J0BEiQ2NjAzMmZhYS0wMDAwLTJjYzEtYWQxYS1hYzNlYjE0Mzc2MTQiB0dFTkVSSUMqUMLwnhXb7Ygtq8SKLa3Eii3d7Ygtj_enIqOAlyLm7Ygtt7eMLduPmiKN96cijr6dFcXL8xfdj5oi9-yILdSynRWCspoi-eyILYCymiLk7Ygt",
  "nextPageToken": "ANxYzNzQTMiV2MjFWLhFDZh1SMjNmMtADMwATL5EmZyMDM2YDJaMQv3yagQYAsciPgIwgExEgC",
  "guidedSearchResult": {},
  "summary": {}
}
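
When such a response is passed to an LLM, the adjacent chunks usually need to be put back in reading order. The following sketch walks the structure shown above. The function name is hypothetical, and it assumes that previousChunks and nextChunks are each listed in document order, which holds trivially when one chunk is requested on each side.

```python
def chunks_in_reading_order(response: dict) -> list:
    """Return (chunk_id, role) pairs: previous chunks, the relevant chunk, then next chunks."""
    ordered = []
    for result in response.get("results", []):
        chunk = result.get("chunk", {})
        metadata = chunk.get("chunkMetadata", {})
        for previous in metadata.get("previousChunks", []):
            ordered.append((previous["id"], "previous"))
        ordered.append((chunk.get("id"), "relevant"))
        for following in metadata.get("nextChunks", []):
            ordered.append((following["id"], "next"))
    return ordered
```

For the sample response, this yields the chunk IDs c16, c17, and c18 in that order. The same traversal can collect each chunk's content or pageSpan instead of its ID.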

What's next

Create a first-party data store