Cloud Data Loss Prevention (Cloud DLP) 现已成为敏感数据保护功能的一部分。API 名称保持不变：Cloud Data Loss Prevention API (DLP API)。如需了解构成敏感数据保护的服务，请参阅敏感数据保护概览。

此页面由 Cloud Translation API 翻译。

创建大型自定义字典检测器

本主题介绍了如何创建和重建大型自定义字典。本文档还介绍了多种错误场景。

何时选择自定义大字典而非常规自定义字典

如果您想要在内容中扫描数万个敏感字词或短语，那么常规自定义字典检测器就足够了。如果您有更多字词，或者您的字词列表经常更改，请考虑创建一个大型自定义字典，该字典最多可支持数千万个字词。

大型自定义字典与其他自定义 infoType 的区别

大型自定义字典不同于其他自定义 infoType，因为每个大型自定义字典都由两部分组成：

您创建和定义的短语列表。该列表以文本文件的形式存储在 Cloud Storage 中或以列的形式存储在 BigQuery 表中。
敏感数据保护功能生成并存储在 Cloud Storage 中的字典文件。字典文件由您的字词列表的副本和有助于进行搜索和匹配的布隆过滤器组成。

创建自定义大字典

本部分介绍如何创建、修改和重建大型自定义字典。

创建术语列表

创建一个列表，其中包含您希望新的 infoType 检测器搜索的所有字词和短语。执行下列其中一项操作：

将每个字词或短语各自位于一行的文本文件放在 Cloud Storage 存储分区中。
将 BigQuery 表的一列指定为字词和短语的容器。在列中为每个条目提供自己的行。只要所有字典字词和短语都位于一列中，您就可以使用现有 BigQuery 表。

您可能会组建出过大的 Sensitive Data Protection 无法处理的术语列表。如果您看到错误消息，请参阅本主题后面的排查错误。

创建存储的 infoType

创建术语列表后，使用 Sensitive Data Protection 创建字典：

控制台

在 Cloud Storage 存储桶中，创建一个新文件夹，用于存储敏感数据保护生成的字典。

Sensitive Data Protection 会在您指定的位置创建包含字典文件的文件夹。
在 Google Cloud 控制台中，前往创建 infoType 页面。

前往“创建 infoType”
在类型部分，选择大型自定义字典。
对于 InfoType ID，请输入存储的 infoType 的标识符。

在配置检查和去标识化作业时，您将使用此标识符。您可以在名称中使用字母、数字、连字符和下划线。
在 InfoType 显示名字段中，输入存储的 infoType 的名称。

您可以在名称中使用空格和标点符号。
在说明部分，输入有关存储的 infoType 所检测到内容的说明。
对于存储类型，请选择词汇列表的位置：
- BigQuery：输入项目 ID、数据集 ID 和表 ID。在字段名称字段中，输入列标识符。您最多只能指定表中的一列。
- Google Cloud Storage：输入文件的路径。
对于输出存储桶或文件夹，请输入您在第 1 步中创建的文件夹的 Cloud Storage 位置。

注意： 请勿将文件放入敏感数据保护在构建字典时创建的文件夹中。在创建或重建字典文件时，Sensitive Data Protection 将删除存储在其所创建文件夹中的任何其他文件。与敏感数据保护创建的文件夹位于同一级层的所有文件夹都将保留。
点击创建。

系统会显示存储的 infoType 的摘要。当字典生成完毕且新的存储 infoType 可供使用时，该 infoType 的状态会显示为准备就绪。

C#

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class CreateStoredInfoTypes
{
    public static StoredInfoType Create(
        string projectId,
        string outputPath,
        string storedInfoTypeId)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config by specifying the public table and 
        // cloud storage output path.
        var storedInfoTypeConfig = new StoredInfoTypeConfig
        {
            DisplayName = "Github Usernames",
            Description = "Dictionary of Github usernames used in commits.",
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                BigQueryField = new BigQueryField
                {
                    Table = new BigQueryTable
                    {
                        DatasetId = "samples",
                        ProjectId = "bigquery-public-data",
                        TableId = "github_nested"
                    },
                    Field = new FieldId
                    {
                        Name = "actor"
                    }
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            },
        };

        // Construct the request.
        var request = new CreateStoredInfoTypeRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            Config = storedInfoTypeConfig,
            StoredInfoTypeId = storedInfoTypeId
        };

        // Call the API.
        StoredInfoType response = dlp.CreateStoredInfoType(request);

        // Inspect the response.
        Console.WriteLine($"Created the stored infotype at path: {response.Name}");

        return response;
    }
}

Go

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// createStoredInfoType creates a custom stored info type based on your input data.
func createStoredInfoType(w io.Writer, projectID, outputPath string) error {
	// projectId := "my-project-id"
	// outputPath := "gs://" + "your-bucket-name" + "path/to/directory"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the name you want to give the dictionary.
	displayName := "Github Usernames"

	// Specify a description of the dictionary.
	description := "Dictionary of GitHub usernames used in commits"

	// Specify the path to the location in a Cloud Storage
	// bucket to store the created dictionary.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: outputPath,
	}

	// Specify your term list is stored in BigQuery.
	bigQueryField := &dlppb.BigQueryField{
		Table: &dlppb.BigQueryTable{
			ProjectId: "bigquery-public-data",
			DatasetId: "samples",
			TableId:   "github_nested",
		},
		Field: &dlppb.FieldId{
			Name: "actor",
		},
	}

	// Specify the configuration of the large custom dictionary.
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_BigQueryField{
			BigQueryField: bigQueryField,
		},
	}

	// Specify the configuration for stored infoType.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		DisplayName: displayName,
		Description: description,
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Combine configurations into a request for the service.
	req := &dlppb.CreateStoredInfoTypeRequest{
		Parent:           fmt.Sprintf("projects/%s/locations/global", projectID),
		Config:           storedInfoTypeConfig,
		StoredInfoTypeId: "github-usernames",
	}

	// Send the request and receive response from the service.
	resp, err := client.CreateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil

}

Java

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.BigQueryField;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.CreateStoredInfoTypeRequest;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import java.io.IOException;

public class CreateStoredInfoType {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.

    //The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "gs://" + "your-bucket-name" + "path/to/directory";
    createStoredInfoType(projectId, outputPath);
  }

  // Creates a custom stored info type that contains GitHub usernames used in commits.
  public static void createStoredInfoType(String projectId, String outputPath)
      throws IOException {
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Optionally set a display name and a description.
      String displayName = "GitHub usernames";
      String description = "Dictionary of GitHub usernames used in commits";

      // The output path where the custom dictionary containing the GitHub usernames will be stored.
      CloudStoragePath cloudStoragePath =
          CloudStoragePath.newBuilder()
              .setPath(outputPath)
              .build();

      // The reference to the table containing the GitHub usernames.
      BigQueryTable table = BigQueryTable.newBuilder()
              .setProjectId("bigquery-public-data")
              .setDatasetId("samples")
              .setTableId("github_nested")
              .build();

      // The reference to the BigQuery field that contains the GitHub usernames.
      BigQueryField bigQueryField = BigQueryField.newBuilder()
              .setTable(table)
              .setField(FieldId.newBuilder().setName("actor").build())
              .build();

      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setBigQueryField(bigQueryField)
              .build();

      StoredInfoTypeConfig storedInfoTypeConfig = StoredInfoTypeConfig.newBuilder()
              .setDisplayName(displayName)
              .setDescription(description)
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Combine configurations into a request for the service.
      CreateStoredInfoTypeRequest createStoredInfoType = CreateStoredInfoTypeRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setConfig(storedInfoTypeConfig)
              .setStoredInfoTypeId("github-usernames")
              .build();

      // Send the request and receive response from the service.
      StoredInfoType response = dlp.createStoredInfoType(createStoredInfoType);

      // Print the results.
      System.out.println("Created Stored InfoType: " + response.getName());
    }
  }
}

Node.js

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// Field ID to be used for constructing dictionary
// const fieldName = 'field_name';

async function createStoredInfoType() {
  // The name you want to give the dictionary.
  const displayName = 'GitHub usernames';
  // A description of the dictionary.
  const description = 'Dictionary of GitHub usernames used in commits';

  // Specify configuration for the large custom dictionary
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    bigQueryField: {
      table: {
        datasetId: datasetId,
        projectId: dataProjectId,
        tableId: tableId,
      },
      field: {
        name: fieldName,
      },
    },
  };

  // Stored infoType configuration that uses large custom dictionary.
  const storedInfoTypeConfig = {
    displayName: displayName,
    description: description,
    largeCustomDictionary: largeCustomDictionaryConfig,
  };

  // Construct the job creation request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    config: storedInfoTypeConfig,
    storedInfoTypeId: infoTypeId,
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.createStoredInfoType(request);

  // Print results
  console.log(`InfoType stored successfully: ${response.name}`);
}
await createStoredInfoType();

PHP

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


use Google\Cloud\Dlp\V2\BigQueryField;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\CreateStoredInfoTypeRequest;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;

/**
 * Create a stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the custom stored info type.
 * @param string $displayName       The human-readable name to give the stored infoType.
 * @param string $description       A description for the stored infoType to be created.
 */
function create_stored_infotype(
    string $callingProjectId,
    string $outputgcsPath,
    string $storedInfoTypeId,
    string $displayName,
    string $description
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The reference to the table containing the GitHub usernames.
    // The reference to the BigQuery field that contains the GitHub usernames.
    // Note: we have used public data
    $bigQueryField = (new BigQueryField())
        ->setTable((new BigQueryTable())
            ->setDatasetId('samples')
            ->setProjectId('bigquery-public-data')
            ->setTableId('github_nested'))
        ->setField((new FieldId())
            ->setName('actor'));

    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        // The output path where the custom dictionary containing the GitHub usernames will be stored.
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setBigQueryField($bigQueryField);

    // Configure the StoredInfoType we want the service to perform.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setDisplayName($displayName)
        ->setDescription($description)
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.
    $parent = "projects/$callingProjectId/locations/global";
    $createStoredInfoTypeRequest = (new CreateStoredInfoTypeRequest())
        ->setParent($parent)
        ->setConfig($storedInfoTypeConfig)
        ->setStoredInfoTypeId($storedInfoTypeId);
    $response = $dlp->createStoredInfoType($createStoredInfoTypeRequest);

    // Print results.
    printf('Successfully created Stored InfoType : %s', $response->getName());
}

Python

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import google.cloud.dlp


def create_stored_infotype(
    project: str,
    stored_info_type_id: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to create stored infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier for large custom dictionary.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            that would store the created dictionary.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType Configuration dictionary. This example creates
    # a stored infoType from a term list stored in a publicly available BigQuery
    # database (bigquery-public-data.samples.github_nested).
    # The database contains all GitHub usernames used in commits.
    stored_info_type_config = {
        "display_name": "GitHub usernames",
        "description": "Dictionary of GitHub usernames used in commits",
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            # We can either use bigquery field or gcs file as a term list input option.
            "big_query_field": {
                "table": {
                    "project_id": "bigquery-public-data",
                    "dataset_id": "samples",
                    "table_id": "github_nested",
                },
                "field": {"name": "actor"},
            },
        },
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.create_stored_info_type(
        request={
            "parent": parent,
            "config": stored_info_type_config,
            "stored_info_type_id": stored_info_type_id,
        }
    )

    # Print the result
    print(f"Created Stored InfoType: {response.name}")

REST

在 Cloud Storage 存储分区中为字典创建一个新文件夹。 Sensitive Data Protection 会在您指定的位置创建包含字典文件的文件夹。

使用 storedInfoTypes.create 方法创建字典。create 方法采用以下参数：
- StoredInfoTypeConfig 对象，该对象包含存储的 infoType 的配置。具体内容如下：
  - description：字典的说明。
  - displayName：您要为字典指定的名称。
  - LargeCustomDictionaryConfig：包含大型自定义字典的配置。具体内容如下：
    - BigQueryField：如果您的术语列表存储在 BigQuery 中，则指定此参数。包括对存储列表的表的引用以及含有每个字典短语的字段。
    - CloudStorageFileSet：如果您的术语列表存储在 Cloud Storage 中，则指定此参数。包含 Cloud Storage 中来源位置的网址，格式如下："gs://[PATH_TO_GS]"。支持使用通配符。
    - outputPath：Cloud Storage 存储分区中用于存储所创建字典的位置的路径。
- storedInfoTypeId：存储的 infoType 的标识符。在重建、删除存储的 infoType 或在检查或去标识化作业中使用该 infoType 时，您将通过此标识符引用该 infoType。如果将此字段留空，系统会为您生成一个标识符。

以下是示例 JSON，将该 JSON 发送到 storedInfoTypes.create 方法时，会新建一个存储的 infoType，具体而言，是一个大型自定义字典检测器。此示例根据存储在公开可用的 BigQuery 数据库 (bigquery-public-data.samples.github_nested) 中的术语列表创建存储的 infoType。该数据库包含提交中使用的所有 GitHub 用户名。生成字典的输出路径设置为名为 dlptesting 的 Cloud Storage 存储桶，且存储的 infoType 命名为 github-usernames。

JSON 输入

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes

{
  "config":{
    "displayName":"GitHub usernames",
    "description":"Dictionary of GitHub usernames used in commits",
    "largeCustomDictionary":{
      "outputPath":{
        "path":"gs://[PATH_TO_GS]"
      },
      "bigQueryField":{
        "table":{
          "datasetId":"samples",
          "projectId":"bigquery-public-data",
          "tableId":"github_nested"
        }
      }
    }
  },
  "storedInfoTypeId":"github-usernames"
}

重建字典

如果您想更新字典，请先更新源术语列表，然后指示 Sensitive Data Protection 重新构建存储的 infoType。

更新 Cloud Storage 或 BigQuery 中的现有源术语列表。

根据需要添加、移除或更改术语或短语。
使用 Google Cloud 控制台或 storedInfoTypes.patch 方法“重建”存储的 infoType，以创建其新版本。

重新构建会创建字典的新版本，该版本将取代旧版字典。

将存储的 infoType 重新构建为新版本后，系统会删除旧版本。在 Sensitive Data Protection 正在更新存储的 infoType 时，该 infoType 处于“待处理”状态。在此期间，存储的 infoType 的旧版本仍然存在。在存储的 infoType 处于待处理状态时运行的任何扫描都将使用存储的 infoType 的旧版本运行。

如需重建存储的 infoType，请执行以下操作：

控制台

更新并保存 Cloud Storage 或 BigQuery 中的术语列表。
在 Google Cloud 控制台中，前往存储的 infoType 列表。

前往存储的 infoType
点击您要更新的存储的 infoType 的 ID。
在 InfoType 详情页面上，点击重新构建数据。

Sensitive Data Protection 会使用您对源术语列表所做的更改重新构建存储的 infoType。存储的 infoType 的状态为“准备就绪”后，您就可以使用它了。任何使用存储的 infoType 的模板或作业触发器都将自动使用重新构建的版本。

C#

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


using System;
using Google.Cloud.Dlp.V2;
using Google.Protobuf.WellKnownTypes;

public class UpdateStoredInfoTypes
{
    public static StoredInfoType Update(
        string gcsFileUri,
        string storedInfoTypePath,
        string outputPath)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config. Here, we will change the source from bigquery table to GCS file.
        var storedConfig = new StoredInfoTypeConfig
        {
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                CloudStorageFileSet = new CloudStorageFileSet
                {
                    Url = gcsFileUri
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            }
        };

        // Construct the request using the stored config by specifying the update mask object
        // which represent the path of field to be updated.
        var request = new UpdateStoredInfoTypeRequest
        {
            Config = storedConfig,
            Name = storedInfoTypePath,
            UpdateMask = new FieldMask
            {
                Paths =
                {
                    "large_custom_dictionary.cloud_storage_file_set.url"
                }
            }
        };

        // Call the API.
        StoredInfoType response = dlp.UpdateStoredInfoType(request);

        // Inspect the result.
        Console.WriteLine(response);
        return response;
    }
}

Go

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

// updateStoredInfoType uses the Data Loss Prevention API to update stored infoType
// detector by changing the source term list from one stored in Bigquery
// to one stored in Cloud Storage.
func updateStoredInfoType(w io.Writer, projectID, gcsUri, fileSetUrl, infoTypeId string) error {
	// projectId := "your-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// fileSetUrl := "your-cloud-storage-file-set"
	// infoTypeId := "your-stored-info-type-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Set path in Cloud Storage.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: gcsUri,
	}
	cloudStorageFileSet := &dlppb.CloudStorageFileSet{
		Url: fileSetUrl,
	}

	// Configuration for a custom dictionary created from a data source of any size
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_CloudStorageFileSet{
			CloudStorageFileSet: cloudStorageFileSet,
		},
	}

	// Set configuration for stored infoTypes.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Set mask to control which fields get updated.
	fieldMask := &fieldmaskpb.FieldMask{
		Paths: []string{"large_custom_dictionary.cloud_storage_file_set.url"},
	}
	// Construct the job creation request to be sent by the client.
	req := &dlppb.UpdateStoredInfoTypeRequest{
		Name:       fmt.Sprint("projects/" + projectID + "/storedInfoTypes/" + infoTypeId),
		Config:     storedInfoTypeConfig,
		UpdateMask: fieldMask,
	}

	// Use the client to send the API request.
	resp, err := client.UpdateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil
}

Java

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CloudStorageFileSet;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import com.google.privacy.dlp.v2.StoredInfoTypeName;
import com.google.privacy.dlp.v2.UpdateStoredInfoTypeRequest;
import com.google.protobuf.FieldMask;
import java.io.IOException;

public class UpdateStoredInfoType {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to file in GCS bucket that holds a collection of words and phrases to be searched by
    // the new infoType detector.
    String filePath = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "your-cloud-storage-file-set";
    // The name of the stored InfoType which is to be updated.
    String infoTypeId = "your-stored-info-type-id";
    updateStoredInfoType(projectId, filePath, outputPath, infoTypeId);
  }

  // Update the stored info type rebuilding the Custom dictionary.
  public static void updateStoredInfoType(
      String projectId, String filePath, String outputPath, String infoTypeId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Set path in Cloud Storage.
      CloudStoragePath cloudStoragePath = CloudStoragePath.newBuilder().setPath(outputPath).build();
      CloudStorageFileSet cloudStorageFileSet =
          CloudStorageFileSet.newBuilder().setUrl(filePath).build();

      // Configuration for a custom dictionary created from a data source of any size
      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setCloudStorageFileSet(cloudStorageFileSet)
              .build();

      // Set configuration for stored infoTypes.
      StoredInfoTypeConfig storedInfoTypeConfig =
          StoredInfoTypeConfig.newBuilder()
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Set mask to control which fields get updated.
      // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
      FieldMask fieldMask =
          FieldMask.newBuilder()
              .addPaths("large_custom_dictionary.cloud_storage_file_set.url")
              .build();

      // Construct the job creation request to be sent by the client.
      UpdateStoredInfoTypeRequest updateStoredInfoTypeRequest =
          UpdateStoredInfoTypeRequest.newBuilder()
              .setName(
                  StoredInfoTypeName.ofProjectStoredInfoTypeName(projectId, infoTypeId).toString())
              .setConfig(storedInfoTypeConfig)
              .setUpdateMask(fieldMask)
              .build();

      // Send the job creation request and process the response.
      StoredInfoType response = dlp.updateStoredInfoType(updateStoredInfoTypeRequest);

      // Print the results.
      System.out.println("Updated stored InfoType successfully: " + response.getName());
    }
  }
}

Node.js

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// Path of file containing term list
// const cloudStorageFileSet = 'gs://[PATH_TO_GS]';

async function updateStoredInfoType() {
  // Specify configuration of the large custom dictionary including cloudStorageFileSet and outputPath
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    cloudStorageFileSet: {
      url: fileSetUrl,
    },
  };

  // Construct the job creation request to be sent by the client.
  const updateStoredInfoTypeRequest = {
    name: `projects/${projectId}/storedInfoTypes/${infoTypeId}`,
    config: {
      largeCustomDictionary: largeCustomDictionaryConfig,
    },
    updateMask: {
      paths: ['large_custom_dictionary.cloud_storage_file_set.url'],
    },
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.updateStoredInfoType(
    updateStoredInfoTypeRequest
  );

  // Print the results.
  console.log(`InfoType updated successfully: ${JSON.stringify(response)}`);
}
await updateStoredInfoType();

PHP

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageFileSet;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;
use Google\Cloud\Dlp\V2\UpdateStoredInfoTypeRequest;
use Google\Protobuf\FieldMask;

/**
 * Rebuild/Update the stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $gcsPath           The path to file in GCS bucket that holds a collection of words and phrases to be searched by the new infoType detector.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the stored InfoType which is to be updated.
 *
 */
function update_stored_infotype(
    string $callingProjectId,
    string $gcsPath,
    string $outputgcsPath,
    string $storedInfoTypeId
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Set path in Cloud Storage.
    $cloudStorageFileSet = (new CloudStorageFileSet())
        ->setUrl($gcsPath);

    // Configuration for a custom dictionary created from a data source of any size
    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setCloudStorageFileSet($cloudStorageFileSet);

    // Set configuration for stored infoTypes.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.

    $name = "projects/$callingProjectId/locations/global/storedInfoTypes/" . $storedInfoTypeId;
    // Set mask to control which fields get updated.
    // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
    $fieldMask = (new FieldMask())
        ->setPaths([
            'large_custom_dictionary.cloud_storage_file_set.url'
        ]);

    // Run request
    $updateStoredInfoTypeRequest = (new UpdateStoredInfoTypeRequest())
        ->setName($name)
        ->setConfig($storedInfoTypeConfig)
        ->setUpdateMask($fieldMask);
    $response = $dlp->updateStoredInfoType($updateStoredInfoTypeRequest);

    // Print results
    printf('Successfully update Stored InforType : %s' . PHP_EOL, $response->getName());
}

Python

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import google.cloud.dlp


def update_stored_infotype(
    project: str,
    stored_info_type_id: str,
    gcs_input_file_path: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to update stored infoType
    detector by changing the source term list from one stored in Bigquery
    to one stored in Cloud Storage.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier of stored infoType which is to
            be updated.
        gcs_input_file_path: The url in the format <bucket>/<path_to_file>
            for the location of the source term list.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            where large dictionary is stored.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType configuration dictionary.
    stored_info_type_config = {
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            "cloud_storage_file_set": {"url": f"gs://{gcs_input_file_path}"},
        }
    }

    # Set mask to control which fields get updated. For more details, refer
    # https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask
    # for constructing the field mask paths.
    field_mask = {"paths": ["large_custom_dictionary.cloud_storage_file_set.url"]}

    # Convert the stored infoType id into a full resource id.
    stored_info_type_name = (
        f"projects/{project}/locations/global/storedInfoTypes/{stored_info_type_id}"
    )

    # Call the API.
    response = dlp.update_stored_info_type(
        request={
            "name": stored_info_type_name,
            "config": stored_info_type_config,
            "update_mask": field_mask,
        }
    )

    # Print the result
    print(f"Updated stored infoType successfully: {response.name}")

REST

更新术语列表

如果您仅更新大型自定义字典中的字词列表，则 storedInfoTypes.patch 请求仅需要 name 字段。提供要重建的已存储 infoType 的完整资源名称。

以下模式表示 name 字段的有效条目：

organizations/ORGANIZATION_ID/storedInfoTypes/STORED_INFOTYPE_ID
projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

将 STORED_INFOTYPE_ID 替换为要重建的已存储 infoType 的标识符。

如果您不知道存储的 infoType 的标识符，请调用 storedInfoTypes.list 方法，查看所有当前存储的 infoType 的列表。

示例

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

在这种情况下，不需要请求正文。

切换来源字词列表

您可以将存储的 infoType 的源术语列表从存储在 BigQuery 中的列表更改为存储在 Cloud Storage 中的列表。使用 storedInfoTypes.patch 方法，但要在 LargeCustomDictionaryConfig 中（您之前在其中使用过 BigQueryField 对象）包含一个 CloudStorageFileSet 对象。然后，采用 FieldMask 格式将 updateMask 参数设置为已重新创建的存储的 infoType 参数。例如，以下 JSON 会在 updateMask 参数中声明 Cloud Storage 路径的网址已更新 (large_custom_dictionary.cloud_storage_file_set.url)：

示例

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/github-usernames

{
  "config":{
    "largeCustomDictionary":{
      "cloudStorageFileSet":{
        "url":"gs://[BUCKET_NAME]/[PATH_TO_FILE]"
      }
    }
  },
  "updateMask":"large_custom_dictionary.cloud_storage_file_set.url"
}

同样，您也可以将术语列表从存储在 BigQuery 表中更改为存储在 Cloud Storage 存储桶中。

使用大型自定义字典检测器扫描内容

使用大型自定义字典检测器扫描内容类似于使用任何其他自定义 infoType 检测器扫描内容。

此过程假设您已有存储的 infoType。如需了解详情，请参阅本页面上的创建存储的 infoType。

控制台

您可以在执行以下操作时应用大型自定义字典检测器：

创建新的作业
创建或修改作业触发器
创建或修改模板
配置数据分析

在该页面的配置检测部分中的 InfoType 子部分中，您可以指定大型自定义字典 infoType。

点击管理 infoType。
在 InfoType 窗格中，点击自定义标签页。
点击添加自定义 infoType。
在添加自定义 infoType 窗格中，执行以下操作：
1. 对于类型，请选择存储的 infoType。
2. 对于 InfoType，请输入自定义 infoType 的名称。您可以使用字母、数字和下划线。
3. 对于可能性，请选择要分配给与此自定义 infoType 匹配的所有发现结果的默认可能性级别。您可以使用热词规则进一步微调各个发现结果的可能性级别。
  
  如果您未指定默认值，则默认似然水平会设置为 VERY_LIKELY。如需了解详情，请参阅匹配可能性。
4. 对于敏感度，请选择您要分配给与此自定义 infoType 匹配的所有发现结果的敏感度级别。如果您未指定值，则这些发现的敏感度级别会设置为 HIGH。
  
  敏感度得分用于数据剖析。在分析数据时，Sensitive Data Protection 会使用 infoType 的敏感度得分来计算敏感度级别。
5. 对于存储的 infoType 名称，请选择要作为新自定义 infoType 基础的存储的 infoType。
6. 点击完成以关闭添加自定义 infoType 窗格。
可选：在内置标签页上，修改您选择的内置 infoType。
点击完成以关闭 infoType 窗格。

自定义 infoType 会添加到 Sensitive Data Protection 扫描的 infoType 列表中。不过，在您保存作业、作业触发器、模板或扫描配置之前，此选择并非最终选择。
创建或修改完配置后，点击保存。

C#

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectDataWithStoredInfotypes
{
    public static InspectContentResponse Inspect(
        string projectId,
        string storedInfotypePath,
        string text,
        InfoType infoType = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the infotype if null.
        var infotype = infoType ?? new InfoType { Name = "GITHUB_LOGINS" };

        // Construct the inspect config using stored infotype.
        var inspectConfig = new InspectConfig
        {
            CustomInfoTypes =
            {
                new CustomInfoType
                {
                    InfoType = infotype,
                    StoredType = new StoredType { Name = storedInfotypePath }
                }
            },
            IncludeQuote = true
        };

        // Construct the request using inspect config.
        var request = new InspectContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        InspectContentResponse response = dlp.InspectContent(request);

        // Inspect the results.
        var findings = response.Result.Findings;
        Console.WriteLine($"Findings: {findings.Count}");
        foreach (var f in findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Go

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// inspectWithStoredInfotype inspects the given text using the specified stored infoType detector.
func inspectWithStoredInfotype(w io.Writer, projectID, infoTypeId, textToDeidentify string) error {
	// projectId := "your-project-id"
	// infoTypeId := "your-info-type-id"
	// textToDeidentify := "This commit was made by kewin2010"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be inspected.
	contentItem := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: textToDeidentify,
		},
	}

	// Specify the info type the inspection will look for.
	infoType := &dlppb.InfoType{
		Name: "GITHUB_LOGINS",
	}

	// Specify the stored info type the inspection will look for.
	storedType := &dlppb.StoredType{
		Name: infoTypeId,
	}

	customInfoType := &dlppb.CustomInfoType{
		InfoType: infoType,
		Type: &dlppb.CustomInfoType_StoredType{
			StoredType: storedType,
		},
	}

	// Specify how the content should be inspected.
	inspectConfig := &dlppb.InspectConfig{
		CustomInfoTypes: []*dlppb.CustomInfoType{
			customInfoType,
		},
		IncludeQuote: true,
	}

	// Construct the Inspect request to be sent by the client.
	req := &dlppb.InspectContentRequest{
		Parent:        fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: inspectConfig,
		Item:          contentItem,
	}

	// Use the client to send the API request.
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return err
	}

	// Process the results.
	fmt.Fprintf(w, "Findings: %d\n", len(resp.Result.Findings))
	for _, f := range resp.Result.Findings {
		fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

Java

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.ProjectStoredInfoTypeName;
import com.google.privacy.dlp.v2.StoredType;
import java.io.IOException;

public class InspectWithStoredInfotype {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The sample assumes that you have an existing stored infoType.
    // To create a stored InfoType refer:
    // https://cloud.google.com/dlp/docs/creating-stored-infotypes#create-storedinfotye 
    String storedInfoTypeId = "your-info-type-id";
    // The string to de-identify.
    String textToInspect =
        "My phone number is (223) 456-7890 and my email address is gary@example.com.";
    inspectWithStoredInfotype(projectId, storedInfoTypeId, textToInspect);
  }

  //  Inspects the given text using the specified stored infoType detector.
  public static void inspectWithStoredInfotype(
      String projectId, String storedInfoTypeId, String textToInspect) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify the content to be inspected.
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToInspect).build();

      InfoType infoType = InfoType.newBuilder().setName("STORED_TYPE").build();

      // Reference to the existing StoredInfoType to inspect the data.
      StoredType storedType = StoredType.newBuilder()
              .setName(ProjectStoredInfoTypeName.of(projectId, storedInfoTypeId).toString())
              .build();

      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(infoType).setStoredType(storedType).build();

      // Construct the configuration for the Inspect request.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest inspectContentRequest =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectConfig(inspectConfig)
              .setItem(contentItem)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(inspectContentRequest);

      // Parse the response and process results.
      System.out.println("Findings: " + "" + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfoType: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood() + "\n");
      }
    }
  }
}

Node.js

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = 'your-project-id';

// The custom info-type id created and stored in the bucket.
// const infoTypeId = 'your-info-type-id';

// The string to inspect.
// const string = 'My phone number is (223) 456-7890 and my email address is gary@example.com.';

async function inspectWithStoredInfotype() {
  // Reference to the existing StoredInfoType to inspect the data.
  const customInfoType = {
    infoType: {
      name: 'GITHUB_LOGINS',
    },
    storedType: {
      name: infoTypeId,
    },
  };

  // Construct the configuration for the Inspect request.
  const inspectConfig = {
    customInfoTypes: [customInfoType],
    includeQuote: true,
  };

  // Construct the Inspect request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: inspectConfig,
    item: {
      value: string,
    },
  };
  // Run request
  const [response] = await dlp.inspectContent(request);

  // Print Findings
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log(`Findings: ${findings.length}\n`);
    findings.forEach(finding => {
      console.log(`InfoType: ${finding.infoType.name}`);
      console.log(`\tQuote: ${finding.quote}`);
      console.log(`\tLikelihood: ${finding.likelihood} \n`);
    });
  } else {
    console.log('No findings.');
  }
}
await inspectWithStoredInfotype();

PHP

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectContentRequest;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\StoredType;

/**
 * Inspect with stored infoType.
 * Scan content using a large custom dictionary detector.
 *
 * @param string $projectId             The Google Cloud Project ID to run the API call under.
 * @param string $storedInfoTypeName    The name of the stored infotype whose This value must be in the format
 * projects/projectName/(locations/locationId)/storedInfoTypes/storedInfoTypeName.
 * @param string $textToInspect         The string to inspect.
 */
function inspect_with_stored_infotype(
    string $projectId,
    string $storedInfoTypeName,
    string $textToInspect
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$projectId/locations/global";

    // Specify the content to be inspected.
    $item = (new ContentItem())
        ->setValue($textToInspect);

    // Reference to the existing StoredInfoType to inspect the data.
    $customInfoType = (new CustomInfoType())
        ->setInfoType((new InfoType())
            ->setName('STORED_TYPE'))
        ->setStoredType((new StoredType())
            ->setName($storedInfoTypeName));

    // Construct the configuration for the Inspect request.
    $inspectConfig = (new InspectConfig())
        ->setCustomInfoTypes([$customInfoType])
        ->setIncludeQuote(true);

    // Run request.
    $inspectContentRequest = (new InspectContentRequest())
        ->setParent($parent)
        ->setInspectConfig($inspectConfig)
        ->setItem($item);
    $response = $dlp->inspectContent($inspectContentRequest);

    // Print the results.
    $findings = $response->getResult()->getFindings();
    if (count($findings) == 0) {
        printf('No findings.' . PHP_EOL);
    } else {
        printf('Findings:' . PHP_EOL);
        foreach ($findings as $finding) {
            printf('  Quote: %s' . PHP_EOL, $finding->getQuote());
            printf('  Info type: %s' . PHP_EOL, $finding->getInfoType()->getName());
            printf('  Likelihood: %s' . PHP_EOL, Likelihood::name($finding->getLikelihood()));
        }
    }
}

Python

如需了解如何安装和使用 Sensitive Data Protection 客户端库，请参阅 Sensitive Data Protection 客户端库。

如需向 Sensitive Data Protection 进行身份验证，请设置应用默认凭据。如需了解详情，请参阅为本地开发环境设置身份验证。

import google.cloud.dlp


def inspect_with_stored_infotype(
    project: str,
    stored_info_type_id: str,
    content_string: str,
) -> None:
    """Uses the Data Loss Prevention API to inspect/scan content using stored
    infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        stored_info_type_id: The identifier of stored infoType used to inspect.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert stored infoType id into full resource id
    stored_type_name = f"projects/{project}/storedInfoTypes/{stored_info_type_id}"

    # Construct a custom info type dictionary using stored infoType.
    custom_info_types = [
        {
            "info_type": {"name": "STORED_TYPE"},
            "stored_type": {
                "name": stored_type_name,
            },
        }
    ]

    # Construct the inspection configuration dictionary.
    inspect_config = {
        "custom_info_types": custom_info_types,
        "include_quote": True,
    }

    # Construct the `item` to be inspected using stored infoType.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")

REST

将以下示例发送到 content.inspect 方法时，该示例会使用指定的存储的 infoType 检测器扫描指定的文本。infoType 为必需参数，因为所有自定义 infoType 的名称都不得与内置的 infoType 或其他自定义 infoType 冲突。storedType 参数包含存储的 infoType 的完整资源路径。

JSON 输入

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect

{
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"GITHUB_LOGINS"
        },
        "storedType":{
          "name":"projects/PROJECT_ID/storedInfoTypes/github-logins"
        }
      }
    ]
  },
  "item":{
    "value":"The commit was made by githubuser."
  }
}

排查错误

如果您在尝试从 Cloud Storage 中存储的术语列表创建存储的 infoType 时遇到错误，可能是因为以下原因：

您已经达到存储的 infoType 的上限。根据具体的问题，您可以选择多种解决方法：
- 如果已达到 Cloud Storage 中单个输入文件的大小上限 (200 MB)，请尝试将文件拆分为多个文件。您可以使用多个文件来组建单个自定义字典，只要所有文件的总大小不超过 1 GB 即可。
- BigQuery 与 Cloud Storage 的限制不同。请考虑将术语移动到 BigQuery 表中。BigQuery 中自定义字典列的大小上限为 1 GB，最大行数为 5,000,000。
- 如果术语列表文件超出了源术语列表的所有适用限制，则必须将术语列表文件拆分为多个文件，并为每个文件创建一个字典。然后，为每个字典创建单独的扫描作业。
您有一条或多条术语未包含至少一个字母或数字。 Sensitive Data Protection 无法扫描仅包含空格或符号的术语。术语必须至少含有一个字母或数字。请查看您的术语列表是否包含任何此类术语，如果有，请进行更正或将其删除。
您的术语列表包含一个含有太多“成分”的短语。此处所述的“成分”是指仅包含字母、仅包含数字或仅包含非字母和非数字字符（如空格或符号）的连续序列。请查看您的术语列表是否包含任何此类术语，如果有，请进行更正或将其删除。
Sensitive Data Protection 服务代理无权访问字典源数据或用于存储字典文件的 Cloud Storage 存储桶。如需解决此问题，请为 Sensitive Data Protection 服务代理授予 Storage Admin (roles/storage.admin) 角色或 BigQuery Data Owner (roles/bigquery.dataOwner) 和 BigQuery Job User (roles/bigquery.jobUser) 角色。

API 概览

如果您要创建大型自定义字典检测器，则必须创建存储的 infoType。

存储的 infoType 在 Sensitive Data Protection 中由 StoredInfoType 对象表示。它包含以下相关对象：

StoredInfoTypeVersion 包含创建日期和时间，以及创建当前版本时出现的最近五条错误消息。
- StoredInfoTypeConfig 包含存储的 infoType 的配置，包括其名称和说明。对于大型自定义字典，type 必须是 LargeCustomDictionaryConfig。
  - LargeCustomDictionaryConfig 指定以下两项：
    - Cloud Storage 或 BigQuery 中存储短语列表的位置。
    - Cloud Storage 中用于存储生成的字典文件的位置。
- StoredInfoTypeState 包含存储的 infoType 的最新版本以及任何待处理版本的状态。状态信息包括存储的 infoType 是正在重新构建、可供使用还是无效。

字典匹配详情

以下是关于 Sensitive Data Protection 如何与字典字词和短语进行匹配的指南。这些要点同时适用于常规自定义字典和大型自定义字典：

字典字词不区分大小写。如果您的字典包含 Abby，将匹配 abby、ABBY、Abby 等。
扫描匹配项时，字典或要扫描的内容中，除了 Unicode 基本多语言平面中包含的字母、数字和其他字母字符外，其他所有字符一律被视为空格。如果您的字典扫描 Abby Abernathy，将匹配 abby abernathy、Abby, Abernathy、Abby (ABERNATHY) 等。
任何匹配项两侧的字符均必须与字词中相邻字符的类型（字母或数字）不同。如果您的字典扫描 Abi，将匹配 Abi904 的前三个字符，但不匹配 Abigail 的前三个字符。
如果字典字词包含 Unicode 标准的辅助多语言平面中的字符，则可能会产生意外结果。此类字符的示例包括表情符号、科学符号和历史脚本。

字母、数字和其他字母字符的定义如下：

字母：Unicode 规范中具有常规类别 Lu、Ll、Lt、Lm 或 Lo 的字符
数字：Unicode 规范中具有一般类别 Nd 的字符
其他字母字符：Unicode 规范中具有一般类别 Nl 或具有贡献属性 Other_Alphabetic（如 Unicode 标准所定义）的字符

如需创建、修改或删除存储的 infoType，请使用以下方法：

storedInfoTypes.create：通过您指定的 StoredInfoTypeConfig，新建存储的 infoType。
storedInfoTypes.patch：使用您指定的新 StoredInfoTypeConfig 重新构建存储的 infoType。如果未指定，此方法会使用现有的 StoredInfoTypeConfig 创建存储的 infoType 的新版本。
storedInfoTypes.get：检索指定存储的 infoType 的 StoredInfoTypeConfig 和任何待处理版本。
storedInfoTypes.list：列出所有当前存储的 infoType。
storedInfoTypes.delete：删除指定的存储的 infoType。

创建大型自定义字典检测器 使用集合让一切井井有条 根据您的偏好保存内容并对其进行分类。

何时选择自定义大字典而非常规自定义字典

大型自定义字典与其他自定义 infoType 的区别

创建自定义大字典

创建术语列表

创建存储的 infoType

控制台

C#

Go

Java

Node.js

PHP

Python

REST

重建字典

控制台

C#

Go

Java

Node.js

PHP

Python

REST

更新术语列表

示例

切换来源字词列表

示例

使用大型自定义字典检测器扫描内容

控制台

C#

Go

Java

Node.js

PHP

Python

REST

排查错误

API 概览

字典匹配详情

创建大型自定义字典检测器