O Cloud Data Loss Prevention (Cloud DLP) agora faz parte da Proteção de dados sensíveis. O nome da API continua o mesmo: API Cloud Data Loss Prevention (DLP). Para saber mais sobre os serviços que compõem a Proteção de dados sensíveis, consulte Visão geral da Proteção de dados sensíveis.

Esta página foi traduzida pela API Cloud Translation.

Criar um detector de dicionário personalizado grande

Este tópico descreve como criar e recriar dicionários personalizados grandes. Ele também abrange vários cenários de erro.

Quando escolher um dicionário personalizado grande em vez de um regular

Os detectores de dicionários personalizados regulares são suficientes quando você tem dezenas de milhares de palavras ou frases sensíveis que quer verificar no conteúdo. Se você tiver mais ou se sua lista de termos mudar com frequência, crie um dicionário personalizado grande, que pode aceitar dezenas de milhões de termos.

Como os dicionários personalizados grandes são diferentes de outros infoTypes personalizados

Os dicionários personalizados grandes são diferentes de outros infoTypes personalizados porque cada um deles tem dois componentes:

Uma lista de frases que você cria e define. Essa lista é armazenada como um arquivo de texto no Cloud Storage ou como uma coluna em uma tabela do BigQuery.
Os arquivos de dicionário, que a Proteção de Dados Sensíveis gera e armazena no Cloud Storage. Os arquivos de dicionário são compostos por uma cópia da sua lista de termos e filtros de Bloom, que ajudam na pesquisa e na correspondência.

Criar um dicionário personalizado grande

Esta seção descreve como criar, editar e recriar um dicionário personalizado grande.

Criar uma lista de termos

Crie uma lista com todas as palavras e frases que você quer que o novo detector de infoType procure. Escolha uma destas opções:

Insira um arquivo de texto com cada palavra ou frase em sua própria linha em um bucket do Cloud Storage.
Atribua uma coluna de uma tabela do BigQuery como o contêiner das palavras e frases. Mantenha uma entrada por linha na coluna. É possível usar uma tabela atual do BigQuery, desde que todas as palavras e frases do dicionário estejam em uma única coluna.

É possível montar uma lista de termos muito grande para a Proteção de Dados Sensíveis processar. Se for exibida uma mensagem de erro, consulte Como solucionar erros mais adiante neste tópico.

Criar um infoType armazenado

Depois de criar sua lista de termos, use a Proteção de dados sensíveis para criar um dicionário:

Console

Em um bucket do Cloud Storage, crie uma pasta em que a proteção de dados sensíveis vai armazenar o dicionário gerado.

A Proteção de dados sensíveis cria pastas com os arquivos de dicionário no local especificado.
No console Google Cloud , acesse a página Criar infoType.

Acessar "Criar infoType"
Em Tipo, selecione Dicionário personalizado grande.
Em ID do infoType, insira um identificador para o infoType armazenado.

Você vai usar esse identificador ao configurar os jobs de inspeção e desidentificação. É possível usar letras, números, hifens e sublinhados no nome.
Em Nome de exibição do InfoType, insira um nome para o infoType armazenado.

É possível usar espaços e pontuação no nome.
Em Descrição, insira uma descrição do que o infoType armazenado detecta.
Em Tipo de armazenamento, selecione o local da sua lista de termos:
- BigQuery: insira o ID do projeto, do conjunto de dados e da tabela. No campo Nome, insira o identificador da coluna. É possível designar, no máximo, uma coluna da tabela.
- Google Cloud Storage: insira o caminho para o arquivo.
Em Bucket ou pasta de saída, insira o local do Cloud Storage da pasta criada na etapa 1.

Observação : ao desenvolver o dicionário, não coloque arquivos nas pastas geradas pela Proteção de dados sensíveis. Quando a Proteção de Dados Sensíveis cria ou recria arquivos de dicionário, ela exclui qualquer outro arquivo armazenado nas pastas criadas. Todas as pastas que estão no mesmo nível que as pastas criadas pela Proteção de Dados Sensíveis são mantidas.
Clique em Criar.

Um resumo do infoType armazenado aparece. Quando o dicionário é gerado e o novo infoType armazenado está pronto para uso, o status do infoType mostra Pronto.

C#

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

Para autenticar na Proteção de dados sensíveis, configure o Application Default Credentials. Para mais informações, consulte Configurar a autenticação para um ambiente de desenvolvimento local.


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class CreateStoredInfoTypes
{
    public static StoredInfoType Create(
        string projectId,
        string outputPath,
        string storedInfoTypeId)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config by specifying the public table and 
        // cloud storage output path.
        var storedInfoTypeConfig = new StoredInfoTypeConfig
        {
            DisplayName = "Github Usernames",
            Description = "Dictionary of Github usernames used in commits.",
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                BigQueryField = new BigQueryField
                {
                    Table = new BigQueryTable
                    {
                        DatasetId = "samples",
                        ProjectId = "bigquery-public-data",
                        TableId = "github_nested"
                    },
                    Field = new FieldId
                    {
                        Name = "actor"
                    }
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            },
        };

        // Construct the request.
        var request = new CreateStoredInfoTypeRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            Config = storedInfoTypeConfig,
            StoredInfoTypeId = storedInfoTypeId
        };

        // Call the API.
        StoredInfoType response = dlp.CreateStoredInfoType(request);

        // Inspect the response.
        Console.WriteLine($"Created the stored infotype at path: {response.Name}");

        return response;
    }
}

Go

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// createStoredInfoType creates a custom stored info type based on your input data.
func createStoredInfoType(w io.Writer, projectID, outputPath string) error {
	// projectId := "my-project-id"
	// outputPath := "gs://" + "your-bucket-name" + "path/to/directory"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the name you want to give the dictionary.
	displayName := "Github Usernames"

	// Specify a description of the dictionary.
	description := "Dictionary of GitHub usernames used in commits"

	// Specify the path to the location in a Cloud Storage
	// bucket to store the created dictionary.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: outputPath,
	}

	// Specify your term list is stored in BigQuery.
	bigQueryField := &dlppb.BigQueryField{
		Table: &dlppb.BigQueryTable{
			ProjectId: "bigquery-public-data",
			DatasetId: "samples",
			TableId:   "github_nested",
		},
		Field: &dlppb.FieldId{
			Name: "actor",
		},
	}

	// Specify the configuration of the large custom dictionary.
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_BigQueryField{
			BigQueryField: bigQueryField,
		},
	}

	// Specify the configuration for stored infoType.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		DisplayName: displayName,
		Description: description,
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Combine configurations into a request for the service.
	req := &dlppb.CreateStoredInfoTypeRequest{
		Parent:           fmt.Sprintf("projects/%s/locations/global", projectID),
		Config:           storedInfoTypeConfig,
		StoredInfoTypeId: "github-usernames",
	}

	// Send the request and receive response from the service.
	resp, err := client.CreateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil

}

Java

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.BigQueryField;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.CreateStoredInfoTypeRequest;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import java.io.IOException;

public class CreateStoredInfoType {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.

    //The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "gs://" + "your-bucket-name" + "path/to/directory";
    createStoredInfoType(projectId, outputPath);
  }

  // Creates a custom stored info type that contains GitHub usernames used in commits.
  public static void createStoredInfoType(String projectId, String outputPath)
      throws IOException {
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Optionally set a display name and a description.
      String displayName = "GitHub usernames";
      String description = "Dictionary of GitHub usernames used in commits";

      // The output path where the custom dictionary containing the GitHub usernames will be stored.
      CloudStoragePath cloudStoragePath =
          CloudStoragePath.newBuilder()
              .setPath(outputPath)
              .build();

      // The reference to the table containing the GitHub usernames.
      BigQueryTable table = BigQueryTable.newBuilder()
              .setProjectId("bigquery-public-data")
              .setDatasetId("samples")
              .setTableId("github_nested")
              .build();

      // The reference to the BigQuery field that contains the GitHub usernames.
      BigQueryField bigQueryField = BigQueryField.newBuilder()
              .setTable(table)
              .setField(FieldId.newBuilder().setName("actor").build())
              .build();

      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setBigQueryField(bigQueryField)
              .build();

      StoredInfoTypeConfig storedInfoTypeConfig = StoredInfoTypeConfig.newBuilder()
              .setDisplayName(displayName)
              .setDescription(description)
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Combine configurations into a request for the service.
      CreateStoredInfoTypeRequest createStoredInfoType = CreateStoredInfoTypeRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setConfig(storedInfoTypeConfig)
              .setStoredInfoTypeId("github-usernames")
              .build();

      // Send the request and receive response from the service.
      StoredInfoType response = dlp.createStoredInfoType(createStoredInfoType);

      // Print the results.
      System.out.println("Created Stored InfoType: " + response.getName());
    }
  }
}

Node.js

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// Field ID to be used for constructing dictionary
// const fieldName = 'field_name';

async function createStoredInfoType() {
  // The name you want to give the dictionary.
  const displayName = 'GitHub usernames';
  // A description of the dictionary.
  const description = 'Dictionary of GitHub usernames used in commits';

  // Specify configuration for the large custom dictionary
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    bigQueryField: {
      table: {
        datasetId: datasetId,
        projectId: dataProjectId,
        tableId: tableId,
      },
      field: {
        name: fieldName,
      },
    },
  };

  // Stored infoType configuration that uses large custom dictionary.
  const storedInfoTypeConfig = {
    displayName: displayName,
    description: description,
    largeCustomDictionary: largeCustomDictionaryConfig,
  };

  // Construct the job creation request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    config: storedInfoTypeConfig,
    storedInfoTypeId: infoTypeId,
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.createStoredInfoType(request);

  // Print results
  console.log(`InfoType stored successfully: ${response.name}`);
}
await createStoredInfoType();

PHP

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


use Google\Cloud\Dlp\V2\BigQueryField;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\CreateStoredInfoTypeRequest;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;

/**
 * Create a stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the custom stored info type.
 * @param string $displayName       The human-readable name to give the stored infoType.
 * @param string $description       A description for the stored infoType to be created.
 */
function create_stored_infotype(
    string $callingProjectId,
    string $outputgcsPath,
    string $storedInfoTypeId,
    string $displayName,
    string $description
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The reference to the table containing the GitHub usernames.
    // The reference to the BigQuery field that contains the GitHub usernames.
    // Note: we have used public data
    $bigQueryField = (new BigQueryField())
        ->setTable((new BigQueryTable())
            ->setDatasetId('samples')
            ->setProjectId('bigquery-public-data')
            ->setTableId('github_nested'))
        ->setField((new FieldId())
            ->setName('actor'));

    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        // The output path where the custom dictionary containing the GitHub usernames will be stored.
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setBigQueryField($bigQueryField);

    // Configure the StoredInfoType we want the service to perform.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setDisplayName($displayName)
        ->setDescription($description)
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.
    $parent = "projects/$callingProjectId/locations/global";
    $createStoredInfoTypeRequest = (new CreateStoredInfoTypeRequest())
        ->setParent($parent)
        ->setConfig($storedInfoTypeConfig)
        ->setStoredInfoTypeId($storedInfoTypeId);
    $response = $dlp->createStoredInfoType($createStoredInfoTypeRequest);

    // Print results.
    printf('Successfully created Stored InfoType : %s', $response->getName());
}

Python

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import google.cloud.dlp


def create_stored_infotype(
    project: str,
    stored_info_type_id: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to create stored infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier for large custom dictionary.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            that would store the created dictionary.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType Configuration dictionary. This example creates
    # a stored infoType from a term list stored in a publicly available BigQuery
    # database (bigquery-public-data.samples.github_nested).
    # The database contains all GitHub usernames used in commits.
    stored_info_type_config = {
        "display_name": "GitHub usernames",
        "description": "Dictionary of GitHub usernames used in commits",
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            # We can either use bigquery field or gcs file as a term list input option.
            "big_query_field": {
                "table": {
                    "project_id": "bigquery-public-data",
                    "dataset_id": "samples",
                    "table_id": "github_nested",
                },
                "field": {"name": "actor"},
            },
        },
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.create_stored_info_type(
        request={
            "parent": parent,
            "config": stored_info_type_config,
            "stored_info_type_id": stored_info_type_id,
        }
    )

    # Print the result
    print(f"Created Stored InfoType: {response.name}")

REST

Crie uma nova pasta para o dicionário em um bucket do Cloud Storage. A Proteção de dados sensíveis cria pastas com os arquivos de dicionário no local especificado.

Crie o dicionário usando o método storedInfoTypes.create. O método create utiliza os seguintes parâmetros:
- Um objeto StoredInfoTypeConfig, que contém a configuração do infoType armazenado. Ele inclui:
  - description: uma descrição do dicionário.
  - displayName: o nome que você quer dar ao dicionário.
  - LargeCustomDictionaryConfig: contém a configuração do dicionário personalizado grande. Ele inclui:
    - BigQueryField: especificado se a lista de termos estiver armazenada no BigQuery. Inclui uma referência à tabela em que sua lista está armazenada, além do campo que contém cada frase do dicionário.
    - CloudStorageFileSet: especificado se a lista de termos estiver armazenada no Cloud Storage. Inclui o URL ao local de origem no Cloud Storage, no seguinte formato: "gs://[PATH_TO_GS]". Não há suporte para caracteres curinga.
    - outputPath: o caminho para o local em um bucket do Cloud Storage para armazenar o dicionário criado.
- storedInfoTypeId: o identificador do infoType armazenado. Você usa esse identificador para se referir ao infoType armazenado ao reconstruí-lo, excluí-lo ou usá-lo em um job de inspeção ou desidentificação. Se deixar esse campo em branco, o sistema gerará um identificador para você.

Veja a seguir o JSON de exemplo que, quando é enviado para o método storedInfoTypes.create, cria um novo infoType armazenado, especificamente um detector de dicionário personalizado grande. Este exemplo cria um infoType armazenado com base em uma lista de termos armazenada em um banco de dados público do BigQuery (bigquery-public-data.samples.github_nested). O banco de dados contém todos os nomes de usuário do GitHub usados em commits. O caminho de saída do dicionário gerado é definido como um bucket do Cloud Storage chamado dlptesting, e o infoType armazenado é chamado de github-usernames.

Entrada JSON

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes

{
  "config":{
    "displayName":"GitHub usernames",
    "description":"Dictionary of GitHub usernames used in commits",
    "largeCustomDictionary":{
      "outputPath":{
        "path":"gs://[PATH_TO_GS]"
      },
      "bigQueryField":{
        "table":{
          "datasetId":"samples",
          "projectId":"bigquery-public-data",
          "tableId":"github_nested"
        }
      }
    }
  },
  "storedInfoTypeId":"github-usernames"
}

Recriar o dicionário

Se quiser atualizar o dicionário, primeiro atualize a lista de termos de origem e instrua a Proteção de dados sensíveis a recriar o infoType armazenado.

Atualize a lista atual de termos de origem no Cloud Storage ou no BigQuery.

Adicione, remova ou altere os termos ou frases conforme necessário.
"Recrie" uma nova versão do infoType armazenado usando o console Google Cloud ou o método storedInfoTypes.patch.

A recriação cria uma nova versão do dicionário, que substitui a antiga.

Quando você recria um infoType armazenado para uma nova versão, a versão antiga é excluída. Enquanto a Proteção de dados sensíveis atualiza o infoType armazenado, o status dele é "pendente". Durante esse período, a versão antiga do infoType armazenado ainda existe. Todas as verificações que forem executadas enquanto o infoType armazenado estiver pendente serão feitas usando a versão antiga dele.

Para recriar o infoType armazenado:

Console

Atualize e salve sua lista de termos no Cloud Storage ou no BigQuery.
No console Google Cloud , acesse sua lista de infoTypes armazenados.

Acessar infoTypes armazenados
Clique no ID do infoType armazenado que você quer atualizar.
Na página Detalhes do infoType, clique em Reconstruir dados.

A proteção de dados sensíveis recria o infoType armazenado com as mudanças feitas na lista de termos de origem. Depois que o status do infoType armazenado estiver "Pronto", você poderá usá-lo. Todos os modelos ou gatilhos de jobs que usam o infoType armazenado vão usar automaticamente a versão reconstruída.

C#

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


using System;
using Google.Cloud.Dlp.V2;
using Google.Protobuf.WellKnownTypes;

public class UpdateStoredInfoTypes
{
    public static StoredInfoType Update(
        string gcsFileUri,
        string storedInfoTypePath,
        string outputPath)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config. Here, we will change the source from bigquery table to GCS file.
        var storedConfig = new StoredInfoTypeConfig
        {
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                CloudStorageFileSet = new CloudStorageFileSet
                {
                    Url = gcsFileUri
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            }
        };

        // Construct the request using the stored config by specifying the update mask object
        // which represent the path of field to be updated.
        var request = new UpdateStoredInfoTypeRequest
        {
            Config = storedConfig,
            Name = storedInfoTypePath,
            UpdateMask = new FieldMask
            {
                Paths =
                {
                    "large_custom_dictionary.cloud_storage_file_set.url"
                }
            }
        };

        // Call the API.
        StoredInfoType response = dlp.UpdateStoredInfoType(request);

        // Inspect the result.
        Console.WriteLine(response);
        return response;
    }
}

Go

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

// updateStoredInfoType uses the Data Loss Prevention API to update stored infoType
// detector by changing the source term list from one stored in Bigquery
// to one stored in Cloud Storage.
func updateStoredInfoType(w io.Writer, projectID, gcsUri, fileSetUrl, infoTypeId string) error {
	// projectId := "your-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// fileSetUrl := "your-cloud-storage-file-set"
	// infoTypeId := "your-stored-info-type-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Set path in Cloud Storage.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: gcsUri,
	}
	cloudStorageFileSet := &dlppb.CloudStorageFileSet{
		Url: fileSetUrl,
	}

	// Configuration for a custom dictionary created from a data source of any size
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_CloudStorageFileSet{
			CloudStorageFileSet: cloudStorageFileSet,
		},
	}

	// Set configuration for stored infoTypes.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Set mask to control which fields get updated.
	fieldMask := &fieldmaskpb.FieldMask{
		Paths: []string{"large_custom_dictionary.cloud_storage_file_set.url"},
	}
	// Construct the job creation request to be sent by the client.
	req := &dlppb.UpdateStoredInfoTypeRequest{
		Name:       fmt.Sprint("projects/" + projectID + "/storedInfoTypes/" + infoTypeId),
		Config:     storedInfoTypeConfig,
		UpdateMask: fieldMask,
	}

	// Use the client to send the API request.
	resp, err := client.UpdateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil
}

Java

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CloudStorageFileSet;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import com.google.privacy.dlp.v2.StoredInfoTypeName;
import com.google.privacy.dlp.v2.UpdateStoredInfoTypeRequest;
import com.google.protobuf.FieldMask;
import java.io.IOException;

public class UpdateStoredInfoType {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to file in GCS bucket that holds a collection of words and phrases to be searched by
    // the new infoType detector.
    String filePath = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "your-cloud-storage-file-set";
    // The name of the stored InfoType which is to be updated.
    String infoTypeId = "your-stored-info-type-id";
    updateStoredInfoType(projectId, filePath, outputPath, infoTypeId);
  }

  // Update the stored info type rebuilding the Custom dictionary.
  public static void updateStoredInfoType(
      String projectId, String filePath, String outputPath, String infoTypeId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Set path in Cloud Storage.
      CloudStoragePath cloudStoragePath = CloudStoragePath.newBuilder().setPath(outputPath).build();
      CloudStorageFileSet cloudStorageFileSet =
          CloudStorageFileSet.newBuilder().setUrl(filePath).build();

      // Configuration for a custom dictionary created from a data source of any size
      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setCloudStorageFileSet(cloudStorageFileSet)
              .build();

      // Set configuration for stored infoTypes.
      StoredInfoTypeConfig storedInfoTypeConfig =
          StoredInfoTypeConfig.newBuilder()
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Set mask to control which fields get updated.
      // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
      FieldMask fieldMask =
          FieldMask.newBuilder()
              .addPaths("large_custom_dictionary.cloud_storage_file_set.url")
              .build();

      // Construct the job creation request to be sent by the client.
      UpdateStoredInfoTypeRequest updateStoredInfoTypeRequest =
          UpdateStoredInfoTypeRequest.newBuilder()
              .setName(
                  StoredInfoTypeName.ofProjectStoredInfoTypeName(projectId, infoTypeId).toString())
              .setConfig(storedInfoTypeConfig)
              .setUpdateMask(fieldMask)
              .build();

      // Send the job creation request and process the response.
      StoredInfoType response = dlp.updateStoredInfoType(updateStoredInfoTypeRequest);

      // Print the results.
      System.out.println("Updated stored InfoType successfully: " + response.getName());
    }
  }
}

Node.js

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// Path of file containing term list
// const cloudStorageFileSet = 'gs://[PATH_TO_GS]';

async function updateStoredInfoType() {
  // Specify configuration of the large custom dictionary including cloudStorageFileSet and outputPath
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    cloudStorageFileSet: {
      url: fileSetUrl,
    },
  };

  // Construct the job creation request to be sent by the client.
  const updateStoredInfoTypeRequest = {
    name: `projects/${projectId}/storedInfoTypes/${infoTypeId}`,
    config: {
      largeCustomDictionary: largeCustomDictionaryConfig,
    },
    updateMask: {
      paths: ['large_custom_dictionary.cloud_storage_file_set.url'],
    },
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.updateStoredInfoType(
    updateStoredInfoTypeRequest
  );

  // Print the results.
  console.log(`InfoType updated successfully: ${JSON.stringify(response)}`);
}
await updateStoredInfoType();

PHP

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageFileSet;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;
use Google\Cloud\Dlp\V2\UpdateStoredInfoTypeRequest;
use Google\Protobuf\FieldMask;

/**
 * Rebuild/Update the stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $gcsPath           The path to file in GCS bucket that holds a collection of words and phrases to be searched by the new infoType detector.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the stored InfoType which is to be updated.
 *
 */
function update_stored_infotype(
    string $callingProjectId,
    string $gcsPath,
    string $outputgcsPath,
    string $storedInfoTypeId
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Set path in Cloud Storage.
    $cloudStorageFileSet = (new CloudStorageFileSet())
        ->setUrl($gcsPath);

    // Configuration for a custom dictionary created from a data source of any size
    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setCloudStorageFileSet($cloudStorageFileSet);

    // Set configuration for stored infoTypes.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.

    $name = "projects/$callingProjectId/locations/global/storedInfoTypes/" . $storedInfoTypeId;
    // Set mask to control which fields get updated.
    // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
    $fieldMask = (new FieldMask())
        ->setPaths([
            'large_custom_dictionary.cloud_storage_file_set.url'
        ]);

    // Run request
    $updateStoredInfoTypeRequest = (new UpdateStoredInfoTypeRequest())
        ->setName($name)
        ->setConfig($storedInfoTypeConfig)
        ->setUpdateMask($fieldMask);
    $response = $dlp->updateStoredInfoType($updateStoredInfoTypeRequest);

    // Print results
    printf('Successfully update Stored InforType : %s' . PHP_EOL, $response->getName());
}

Python

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import google.cloud.dlp


def update_stored_infotype(
    project: str,
    stored_info_type_id: str,
    gcs_input_file_path: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to update stored infoType
    detector by changing the source term list from one stored in Bigquery
    to one stored in Cloud Storage.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier of stored infoType which is to
            be updated.
        gcs_input_file_path: The url in the format <bucket>/<path_to_file>
            for the location of the source term list.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            where large dictionary is stored.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType configuration dictionary.
    stored_info_type_config = {
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            "cloud_storage_file_set": {"url": f"gs://{gcs_input_file_path}"},
        }
    }

    # Set mask to control which fields get updated. For more details, refer
    # https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask
    # for constructing the field mask paths.
    field_mask = {"paths": ["large_custom_dictionary.cloud_storage_file_set.url"]}

    # Convert the stored infoType id into a full resource id.
    stored_info_type_name = (
        f"projects/{project}/locations/global/storedInfoTypes/{stored_info_type_id}"
    )

    # Call the API.
    response = dlp.update_stored_info_type(
        request={
            "name": stored_info_type_name,
            "config": stored_info_type_config,
            "update_mask": field_mask,
        }
    )

    # Print the result
    print(f"Updated stored infoType successfully: {response.name}")

REST

Atualizar a lista de termos

Se você estiver atualizando apenas a lista de termos no grande dicionário personalizado, a solicitação storedInfoTypes.patch exigirá apenas o campo name. Forneça o nome completo do recurso do infoType armazenado que você quer recriar.

Os seguintes padrões representam entradas válidas para o campo name:

organizations/ORGANIZATION_ID/storedInfoTypes/STORED_INFOTYPE_ID
projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

Substitua STORED_INFOTYPE_ID pelo identificador do infoType armazenado que você quer recriar.

Se você não souber o identificador do infoType armazenado, chame o método storedInfoTypes.list para ver uma lista de todos os infoTypes armazenados no momento.

Exemplo

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

Nesse caso, um corpo de solicitação não é necessário.

Trocar a lista de termos de origem

É possível mudar a lista de termos de origem de um infoType armazenado de um armazenado no BigQuery para um armazenado no Cloud Storage. Use o método storedInfoTypes.patch, mas inclua um objeto CloudStorageFileSet em LargeCustomDictionaryConfig onde você usou um objeto BigQueryField antes. Em seguida, defina o parâmetro updateMask como o infoType armazenado que você reconstruiu, no formato FieldMask. Por exemplo, o JSON a seguir declara no parâmetro updateMask que o URL do caminho do Cloud Storage foi atualizado (large_custom_dictionary.cloud_storage_file_set.url):

Exemplo

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/github-usernames

{
  "config":{
    "largeCustomDictionary":{
      "cloudStorageFileSet":{
        "url":"gs://[BUCKET_NAME]/[PATH_TO_FILE]"
      }
    }
  },
  "updateMask":"large_custom_dictionary.cloud_storage_file_set.url"
}

Da mesma forma, é possível mudar sua lista de termos de uma tabela do BigQuery para um bucket do Cloud Storage.

Verificar conteúdo usando um detector de dicionário personalizado grande

A verificação de conteúdo com um detector de dicionário personalizado grande é semelhante à verificação usando qualquer outro detector de infoType personalizado.

Este procedimento pressupõe que você já tenha um infoType armazenado. Para mais informações, consulte Criar um infoType armazenado nesta página.

Console

É possível aplicar um detector de dicionário personalizado grande ao fazer o seguinte:

Como criar um novo job
Criar ou editar um gatilho de job
Criar ou editar um modelo
Configurar a criação de perfil de dados

Na seção Configurar detecção da página, na subseção InfoTypes, é possível especificar o infoType do dicionário personalizado grande.

Clique em Gerenciar infoTypes.
No painel InfoTypes, clique na guia Personalizar.
Clique em Adicionar infoType personalizado.
No painel Adicionar infoType personalizado, faça o seguinte:
1. Em Tipo, selecione InfoType armazenado.
2. Em InfoType, insira um nome para o infoType personalizado. É possível usar letras, números e sublinhados.
3. Em Probabilidade, selecione o nível padrão que você quer atribuir a todas as descobertas que correspondem a esse infoType personalizado. É possível ajustar ainda mais o nível de probabilidade de descobertas individuais usando regras de hotword.
  
  Se você não especificar um valor padrão, o nível de probabilidade padrão será definido como VERY_LIKELY. Para mais informações, consulte Probabilidade de correspondência.
4. Em Sensibilidade, selecione o nível que você quer atribuir a todas as descobertas que correspondem a esse infoType personalizado. Se você não especificar um valor, os níveis de sensibilidade dessas descobertas serão definidos como HIGH.
  
  As pontuações de sensibilidade são usadas em perfis de dados. Ao criar perfis dos seus dados, a Proteção de dados sensíveis usa as pontuações de sensibilidade dos infoTypes para calcular o nível de sensibilidade.
5. Em Nome do infoType armazenado, selecione o infoType armazenado que você quer usar como base para o novo infoType personalizado.
6. Clique em Concluído para fechar o painel Adicionar infoType personalizado.
Opcional: na guia Integrado, edite sua seleção de infoTypes integrados.
Clique em Concluído para fechar o painel InfoTypes.

O infoType personalizado é adicionado à lista de infoTypes que a Proteção de dados sensíveis verifica. No entanto, essa seleção não é final até que você salve o job, o acionador de jobs, o modelo ou a configuração de verificação.
Quando terminar de criar ou editar a configuração, clique em Salvar.

C#

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectDataWithStoredInfotypes
{
    public static InspectContentResponse Inspect(
        string projectId,
        string storedInfotypePath,
        string text,
        InfoType infoType = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the infotype if null.
        var infotype = infoType ?? new InfoType { Name = "GITHUB_LOGINS" };

        // Construct the inspect config using stored infotype.
        var inspectConfig = new InspectConfig
        {
            CustomInfoTypes =
            {
                new CustomInfoType
                {
                    InfoType = infotype,
                    StoredType = new StoredType { Name = storedInfotypePath }
                }
            },
            IncludeQuote = true
        };

        // Construct the request using inspect config.
        var request = new InspectContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        InspectContentResponse response = dlp.InspectContent(request);

        // Inspect the results.
        var findings = response.Result.Findings;
        Console.WriteLine($"Findings: {findings.Count}");
        foreach (var f in findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Go

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// inspectWithStoredInfotype inspects the given text using the specified stored infoType detector.
func inspectWithStoredInfotype(w io.Writer, projectID, infoTypeId, textToDeidentify string) error {
	// projectId := "your-project-id"
	// infoTypeId := "your-info-type-id"
	// textToDeidentify := "This commit was made by kewin2010"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be inspected.
	contentItem := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: textToDeidentify,
		},
	}

	// Specify the info type the inspection will look for.
	infoType := &dlppb.InfoType{
		Name: "GITHUB_LOGINS",
	}

	// Specify the stored info type the inspection will look for.
	storedType := &dlppb.StoredType{
		Name: infoTypeId,
	}

	customInfoType := &dlppb.CustomInfoType{
		InfoType: infoType,
		Type: &dlppb.CustomInfoType_StoredType{
			StoredType: storedType,
		},
	}

	// Specify how the content should be inspected.
	inspectConfig := &dlppb.InspectConfig{
		CustomInfoTypes: []*dlppb.CustomInfoType{
			customInfoType,
		},
		IncludeQuote: true,
	}

	// Construct the Inspect request to be sent by the client.
	req := &dlppb.InspectContentRequest{
		Parent:        fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: inspectConfig,
		Item:          contentItem,
	}

	// Use the client to send the API request.
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return err
	}

	// Process the results.
	fmt.Fprintf(w, "Findings: %d\n", len(resp.Result.Findings))
	for _, f := range resp.Result.Findings {
		fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

Java

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.ProjectStoredInfoTypeName;
import com.google.privacy.dlp.v2.StoredType;
import java.io.IOException;

public class InspectWithStoredInfotype {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The sample assumes that you have an existing stored infoType.
    // To create a stored InfoType refer:
    // https://cloud.google.com/dlp/docs/creating-stored-infotypes#create-storedinfotye 
    String storedInfoTypeId = "your-info-type-id";
    // The string to de-identify.
    String textToInspect =
        "My phone number is (223) 456-7890 and my email address is gary@example.com.";
    inspectWithStoredInfotype(projectId, storedInfoTypeId, textToInspect);
  }

  //  Inspects the given text using the specified stored infoType detector.
  public static void inspectWithStoredInfotype(
      String projectId, String storedInfoTypeId, String textToInspect) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify the content to be inspected.
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToInspect).build();

      InfoType infoType = InfoType.newBuilder().setName("STORED_TYPE").build();

      // Reference to the existing StoredInfoType to inspect the data.
      StoredType storedType = StoredType.newBuilder()
              .setName(ProjectStoredInfoTypeName.of(projectId, storedInfoTypeId).toString())
              .build();

      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(infoType).setStoredType(storedType).build();

      // Construct the configuration for the Inspect request.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest inspectContentRequest =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectConfig(inspectConfig)
              .setItem(contentItem)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(inspectContentRequest);

      // Parse the response and process results.
      System.out.println("Findings: " + "" + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfoType: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood() + "\n");
      }
    }
  }
}

Node.js

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = 'your-project-id';

// The custom info-type id created and stored in the bucket.
// const infoTypeId = 'your-info-type-id';

// The string to inspect.
// const string = 'My phone number is (223) 456-7890 and my email address is gary@example.com.';

async function inspectWithStoredInfotype() {
  // Reference to the existing StoredInfoType to inspect the data.
  const customInfoType = {
    infoType: {
      name: 'GITHUB_LOGINS',
    },
    storedType: {
      name: infoTypeId,
    },
  };

  // Construct the configuration for the Inspect request.
  const inspectConfig = {
    customInfoTypes: [customInfoType],
    includeQuote: true,
  };

  // Construct the Inspect request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: inspectConfig,
    item: {
      value: string,
    },
  };
  // Run request
  const [response] = await dlp.inspectContent(request);

  // Print Findings
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log(`Findings: ${findings.length}\n`);
    findings.forEach(finding => {
      console.log(`InfoType: ${finding.infoType.name}`);
      console.log(`\tQuote: ${finding.quote}`);
      console.log(`\tLikelihood: ${finding.likelihood} \n`);
    });
  } else {
    console.log('No findings.');
  }
}
await inspectWithStoredInfotype();

PHP

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectContentRequest;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\StoredType;

/**
 * Inspect with stored infoType.
 * Scan content using a large custom dictionary detector.
 *
 * @param string $projectId             The Google Cloud Project ID to run the API call under.
 * @param string $storedInfoTypeName    The name of the stored infotype whose This value must be in the format
 * projects/projectName/(locations/locationId)/storedInfoTypes/storedInfoTypeName.
 * @param string $textToInspect         The string to inspect.
 */
function inspect_with_stored_infotype(
    string $projectId,
    string $storedInfoTypeName,
    string $textToInspect
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$projectId/locations/global";

    // Specify the content to be inspected.
    $item = (new ContentItem())
        ->setValue($textToInspect);

    // Reference to the existing StoredInfoType to inspect the data.
    $customInfoType = (new CustomInfoType())
        ->setInfoType((new InfoType())
            ->setName('STORED_TYPE'))
        ->setStoredType((new StoredType())
            ->setName($storedInfoTypeName));

    // Construct the configuration for the Inspect request.
    $inspectConfig = (new InspectConfig())
        ->setCustomInfoTypes([$customInfoType])
        ->setIncludeQuote(true);

    // Run request.
    $inspectContentRequest = (new InspectContentRequest())
        ->setParent($parent)
        ->setInspectConfig($inspectConfig)
        ->setItem($item);
    $response = $dlp->inspectContent($inspectContentRequest);

    // Print the results.
    $findings = $response->getResult()->getFindings();
    if (count($findings) == 0) {
        printf('No findings.' . PHP_EOL);
    } else {
        printf('Findings:' . PHP_EOL);
        foreach ($findings as $finding) {
            printf('  Quote: %s' . PHP_EOL, $finding->getQuote());
            printf('  Info type: %s' . PHP_EOL, $finding->getInfoType()->getName());
            printf('  Likelihood: %s' . PHP_EOL, Likelihood::name($finding->getLikelihood()));
        }
    }
}

Python

Para saber como instalar e usar a biblioteca de cliente da Proteção de dados sensíveis, consulte Bibliotecas de cliente da Proteção de dados sensíveis.

import google.cloud.dlp


def inspect_with_stored_infotype(
    project: str,
    stored_info_type_id: str,
    content_string: str,
) -> None:
    """Uses the Data Loss Prevention API to inspect/scan content using stored
    infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        stored_info_type_id: The identifier of stored infoType used to inspect.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert stored infoType id into full resource id
    stored_type_name = f"projects/{project}/storedInfoTypes/{stored_info_type_id}"

    # Construct a custom info type dictionary using stored infoType.
    custom_info_types = [
        {
            "info_type": {"name": "STORED_TYPE"},
            "stored_type": {
                "name": stored_type_name,
            },
        }
    ]

    # Construct the inspection configuration dictionary.
    inspect_config = {
        "custom_info_types": custom_info_types,
        "include_quote": True,
    }

    # Construct the `item` to be inspected using stored infoType.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")

REST

Quando enviado ao método content.inspect, o exemplo a seguir verifica o texto fornecido usando o detector de infoType armazenado especificado. O parâmetro infoType é obrigatório porque todos os infoTypes personalizados precisam ter um nome que não seja conflitante com infoTypes incorporados ou outros infoTypes personalizados. O parâmetro storedType contém o caminho completo do recurso do infoType armazenado.

Entrada JSON

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect

{
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"GITHUB_LOGINS"
        },
        "storedType":{
          "name":"projects/PROJECT_ID/storedInfoTypes/github-logins"
        }
      }
    ]
  },
  "item":{
    "value":"The commit was made by githubuser."
  }
}

Solucionar erros

Se você receber um erro ao tentar criar um infoType armazenado com base em uma lista de termos armazenada no Cloud Storage, estas são algumas causas possíveis:

Você atingiu o limite máximo para infoTypes armazenados. Dependendo do problema, há várias soluções alternativas:
- Se você atingir o limite máximo de um único arquivo de entrada no Cloud Storage (200 MB), tente dividir o arquivo em vários. É possível usar vários arquivos para montar um único dicionário personalizado, desde que o tamanho combinado de todos os arquivos não exceda 1 GB.
- O BigQuery não tem os mesmos limites que o Cloud Storage. Pense na possibilidade de mover os termos para uma tabela do BigQuery. O tamanho máximo de uma coluna de dicionário personalizado no BigQuery é de 1 GB, e o número máximo de linhas é de 5 milhões.
- Se o arquivo de lista de termos exceder todos os limites aplicáveis para listas de termos de origem, divida o arquivo em vários e crie um dicionário para cada um. Em seguida, crie um job de verificação separado para cada dicionário.
Falta pelo menos uma letra ou um número em um ou mais dos termos. A Proteção de Dados Sensíveis não pode procurar termos que sejam compostos exclusivamente por espaços ou símbolos. Eles precisam ter pelo menos uma letra ou um número. Analise sua lista e veja se há alguns desses termos incluídos e, em seguida, corrija-os ou exclua-os.
A lista de termos contém uma frase com muitos "componentes". Um componente, neste contexto, é uma sequência contínua composta apenas de letras, de números ou de caracteres que não são letras nem números, como espaços ou símbolos. Analise sua lista e veja se há alguns desses termos incluídos e, em seguida, corrija-os ou exclua-os.
O agente de serviço da Proteção de Dados Sensíveis não tem acesso aos dados de origem do dicionário ou ao bucket do Cloud Storage para armazenar arquivos de dicionário. Para corrigir esse problema, conceda ao agente de serviço da Proteção de dados sensíveis o papel de administrador do Storage (roles/storage.admin) ou os papéis de proprietário de dados do BigQuery (roles/bigquery.dataOwner) e usuário de jobs do BigQuery (roles/bigquery.jobUser).

Visão geral da API

É necessário criar um infoType armazenado se você estiver criando um detector de dicionário personalizado grande.

Um infoType armazenado é representado na proteção de dados sensíveis pelo objeto StoredInfoType. Ela consiste nos seguintes objetos relacionados:

StoredInfoTypeVersion inclui a data e a hora de criação e as últimas cinco mensagens de erro que ocorreram quando a versão atual foi criada.
- StoredInfoTypeConfig contém a configuração do infoType armazenado, incluindo nome e descrição. Para um dicionário personalizado grande, o type precisa ser um LargeCustomDictionaryConfig.
  - LargeCustomDictionaryConfig especifica estas opções:
    - o local no Cloud Storage ou no BigQuery em que sua lista de frases está armazenada;
    - o local no Cloud Storage para armazenar os arquivos de dicionários gerados.
- StoredInfoTypeState contém o estado da versão mais atual e quaisquer versões pendentes do infoType armazenado. As informações de estado avisam se o infoType armazenado está sendo recriado, está pronto para uso ou é inválido.

Especificações de correspondência de dicionário

Veja a seguir orientações sobre como a Proteção de dados sensíveis combina palavras e frases do dicionário. As observações abaixo se aplicam a dicionários personalizados tanto grandes como regulares:

As palavras do dicionário são indiferentes a maiúsculas. Se o dicionário incluir Abby, ele corresponderá a abby, ABBY, Abby e assim por diante.
Todos os caracteres, tanto nos dicionários quanto no conteúdo que será verificado, diferentes de letras, dígitos e outros caracteres alfabéticos contidos no Plano Multilíngue Básico Unicode são considerados espaços em branco durante a verificação de correspondências. Se o dicionário procurar por Abby Abernathy, ele corresponderá a abby abernathy, Abby, Abernathy, Abby (ABERNATHY) e assim por diante.
Os caracteres ao redor das correspondências precisam ser de um tipo diferente (letras ou dígitos) dos caracteres adjacentes da palavra. Se o dicionário procurar Abi, ele corresponderá aos três primeiros caracteres de Abi904, mas não de Abigail.
As palavras do dicionário que contêm caracteres no Plano Multilíngue Suplementar do padrão Unicode podem gerar descobertas inesperadas. Exemplos desses caracteres são emojis, símbolos científicos e scripts históricos.

Letras, dígitos e outros caracteres alfabéticos são definidos da seguinte maneira:

Letras: caracteres com categorias gerais Lu, Ll, Lt, Lm ou Lo na especificação Unicode
Dígitos: caracteres com categoria geral Nd na especificação Unicode
Outros caracteres alfabéticos: caracteres com categoria geral Nl na especificação Unicode ou com propriedade contributiva Other_Alphabetic, conforme definido pelo padrão Unicode

Use os métodos a seguir para editar ou excluir um infoType armazenado:

storedInfoTypes.create: cria um novo infoType armazenado, conforme o StoredInfoTypeConfig especificado.
storedInfoTypes.patch: recria o infoType armazenado com um novo StoredInfoTypeConfig especificado. Se nenhum for especificado, esse método vai criar uma nova versão do infoType armazenado com o StoredInfoTypeConfig atual.
storedInfoTypes.get: recupera o StoredInfoTypeConfig e todas as versões pendentes do infoType armazenado especificado.
storedInfoTypes.list: lista todos os infoTypes armazenados no momento.
storedInfoTypes.delete: exclui o infoType armazenado especificado.

Criar um detector de dicionário personalizado grande Mantenha tudo organizado com as coleções Salve e categorize o conteúdo com base nas suas preferências.

Quando escolher um dicionário personalizado grande em vez de um regular

Como os dicionários personalizados grandes são diferentes de outros infoTypes personalizados

Criar um dicionário personalizado grande

Criar uma lista de termos

Criar um infoType armazenado

Console

C#

Go

Java

Node.js

PHP

Python

REST

Recriar o dicionário

Console

C#

Go

Java

Node.js

PHP

Python

REST

Atualizar a lista de termos

Exemplo

Trocar a lista de termos de origem

Exemplo

Verificar conteúdo usando um detector de dicionário personalizado grande

Console

C#

Go

Java

Node.js

PHP

Python

REST

Solucionar erros

Visão geral da API

Especificações de correspondência de dicionário

Criar um detector de dicionário personalizado grande