Cloud Data Loss Prevention (Cloud DLP) ora fa parte di Sensitive Data Protection. Il nome dell'API rimane invariato: API Cloud Data Loss Prevention (API DLP). Per informazioni sui servizi che compongono Sensitive Data Protection, consulta la panoramica di Sensitive Data Protection.

Questa pagina è stata tradotta dall'API Cloud Translation.

Creare un rilevatore di dizionari personalizzati di grandi dimensioni

Questo argomento descrive come creare e ricompilare dizionari personalizzati di grandi dimensioni. Copre anche diversi scenari di errore.

Quando scegliere un dizionario personalizzato grande anziché uno normale

I rilevatori di dizionari personalizzati regolari sono sufficienti quando hai decine di migliaia di parole o frasi sensibili che vuoi utilizzare per la scansione dei tuoi contenuti. Se hai più termini o se il tuo elenco di termini cambia di frequente, valuta la possibilità di creare un dizionario personalizzato di grandi dimensioni, che può supportare decine di milioni di termini.

Differenze tra i dizionari personalizzati di grandi dimensioni e gli altri infoType personalizzati

I dizionari personalizzati di grandi dimensioni sono diversi dagli altri infoType personalizzati in quanto ognuno è composto da due componenti:

Un elenco di frasi che crei e definisci. L'elenco viene archiviato come file di testo in Cloud Storage o come colonna in una tabella BigQuery.
I file di dizionario, che Sensitive Data Protection genera e archivia in Cloud Storage. I file del dizionario sono composti da una copia dell'elenco dei termini più i filtri Bloom, che aiutano nella ricerca e nella corrispondenza.

Creare un dizionario personalizzato grande

Questa sezione descrive come creare, modificare e ricompilare un dizionario personalizzato di grandi dimensioni.

Creare un elenco di termini

Crea un elenco contenente tutte le parole e le frasi che vuoi che il nuovo rilevatore di infoType cerchi. Esegui una di queste operazioni:

Inserisci un file di testo con ogni parola o frase su una riga separata in un bucket Cloud Storage.
Designa una colonna di una tabella BigQuery come contenitore per le parole e le frasi. Assegna a ogni voce una riga nella colonna. Puoi utilizzare una tabella BigQuery esistente, a condizione che tutte le parole e le frasi del dizionario si trovino in una singola colonna.

È possibile assemblare un elenco di termini troppo grande per essere elaborato da Sensitive Data Protection. Se viene visualizzato un messaggio di errore, vedi Risoluzione degli errori più avanti in questo argomento.

Creare un infoType archiviato

Dopo aver creato l'elenco dei termini, utilizza Sensitive Data Protection per creare un dizionario:

Console

In un bucket Cloud Storage, crea una nuova cartella in cui Sensitive Data Protection archivierà il dizionario generato.

Sensitive Data Protection crea cartelle contenenti i file del dizionario nella posizione specificata.
Nella console Google Cloud , vai alla pagina Crea tipo di informazioni.

Vai a Crea infoType
Per Tipo, seleziona Dizionario personalizzato grande.
Per ID infoType, inserisci un identificatore per l'infoType archiviato.

Utilizzerai questo identificatore durante la configurazione dei job di ispezione e anonimizzazione. Puoi utilizzare lettere, numeri, trattini e trattini bassi nel nome.
In Nome visualizzato InfoType, inserisci un nome per l'infoType archiviato.

Puoi utilizzare spazi e punteggiatura nel nome.
In Descrizione, inserisci una descrizione di ciò che rileva l'infoType archiviato.
Per Tipo di archiviazione, seleziona la posizione dell'elenco dei termini:
- BigQuery: inserisci l'ID progetto, l'ID set di dati e l'ID tabella. Nel campo Nome campo, inserisci l'identificatore della colonna. Puoi designare al massimo una colonna della tabella.
- Google Cloud Storage: inserisci il percorso del file.
Per Bucket o cartella di output, inserisci la posizione Cloud Storage della cartella che hai creato nel passaggio 1.

Nota: non inserire file all'interno delle cartelle create da Sensitive Data Protection durante la creazione del dizionario. Quando la protezione dei dati sensibili crea o ricompila i file dizionario, elimina tutti gli altri file archiviati nelle cartelle che crea. Vengono mantenute tutte le cartelle allo stesso livello delle cartelle create da Sensitive Data Protection.
Fai clic su Crea.

Viene visualizzato un riepilogo dell'infoType archiviato. Quando il dizionario viene generato e il nuovo infoType archiviato è pronto all'uso, lo stato dell'infoType mostra Pronto.

C#

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

Per eseguire l'autenticazione in Sensitive Data Protection, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class CreateStoredInfoTypes
{
    public static StoredInfoType Create(
        string projectId,
        string outputPath,
        string storedInfoTypeId)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config by specifying the public table and 
        // cloud storage output path.
        var storedInfoTypeConfig = new StoredInfoTypeConfig
        {
            DisplayName = "Github Usernames",
            Description = "Dictionary of Github usernames used in commits.",
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                BigQueryField = new BigQueryField
                {
                    Table = new BigQueryTable
                    {
                        DatasetId = "samples",
                        ProjectId = "bigquery-public-data",
                        TableId = "github_nested"
                    },
                    Field = new FieldId
                    {
                        Name = "actor"
                    }
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            },
        };

        // Construct the request.
        var request = new CreateStoredInfoTypeRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            Config = storedInfoTypeConfig,
            StoredInfoTypeId = storedInfoTypeId
        };

        // Call the API.
        StoredInfoType response = dlp.CreateStoredInfoType(request);

        // Inspect the response.
        Console.WriteLine($"Created the stored infotype at path: {response.Name}");

        return response;
    }
}

Go

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// createStoredInfoType creates a custom stored info type based on your input data.
func createStoredInfoType(w io.Writer, projectID, outputPath string) error {
	// projectId := "my-project-id"
	// outputPath := "gs://" + "your-bucket-name" + "path/to/directory"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the name you want to give the dictionary.
	displayName := "Github Usernames"

	// Specify a description of the dictionary.
	description := "Dictionary of GitHub usernames used in commits"

	// Specify the path to the location in a Cloud Storage
	// bucket to store the created dictionary.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: outputPath,
	}

	// Specify your term list is stored in BigQuery.
	bigQueryField := &dlppb.BigQueryField{
		Table: &dlppb.BigQueryTable{
			ProjectId: "bigquery-public-data",
			DatasetId: "samples",
			TableId:   "github_nested",
		},
		Field: &dlppb.FieldId{
			Name: "actor",
		},
	}

	// Specify the configuration of the large custom dictionary.
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_BigQueryField{
			BigQueryField: bigQueryField,
		},
	}

	// Specify the configuration for stored infoType.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		DisplayName: displayName,
		Description: description,
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Combine configurations into a request for the service.
	req := &dlppb.CreateStoredInfoTypeRequest{
		Parent:           fmt.Sprintf("projects/%s/locations/global", projectID),
		Config:           storedInfoTypeConfig,
		StoredInfoTypeId: "github-usernames",
	}

	// Send the request and receive response from the service.
	resp, err := client.CreateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil

}

Java

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.BigQueryField;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.CreateStoredInfoTypeRequest;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import java.io.IOException;

public class CreateStoredInfoType {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.

    //The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "gs://" + "your-bucket-name" + "path/to/directory";
    createStoredInfoType(projectId, outputPath);
  }

  // Creates a custom stored info type that contains GitHub usernames used in commits.
  public static void createStoredInfoType(String projectId, String outputPath)
      throws IOException {
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Optionally set a display name and a description.
      String displayName = "GitHub usernames";
      String description = "Dictionary of GitHub usernames used in commits";

      // The output path where the custom dictionary containing the GitHub usernames will be stored.
      CloudStoragePath cloudStoragePath =
          CloudStoragePath.newBuilder()
              .setPath(outputPath)
              .build();

      // The reference to the table containing the GitHub usernames.
      BigQueryTable table = BigQueryTable.newBuilder()
              .setProjectId("bigquery-public-data")
              .setDatasetId("samples")
              .setTableId("github_nested")
              .build();

      // The reference to the BigQuery field that contains the GitHub usernames.
      BigQueryField bigQueryField = BigQueryField.newBuilder()
              .setTable(table)
              .setField(FieldId.newBuilder().setName("actor").build())
              .build();

      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setBigQueryField(bigQueryField)
              .build();

      StoredInfoTypeConfig storedInfoTypeConfig = StoredInfoTypeConfig.newBuilder()
              .setDisplayName(displayName)
              .setDescription(description)
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Combine configurations into a request for the service.
      CreateStoredInfoTypeRequest createStoredInfoType = CreateStoredInfoTypeRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setConfig(storedInfoTypeConfig)
              .setStoredInfoTypeId("github-usernames")
              .build();

      // Send the request and receive response from the service.
      StoredInfoType response = dlp.createStoredInfoType(createStoredInfoType);

      // Print the results.
      System.out.println("Created Stored InfoType: " + response.getName());
    }
  }
}

Node.js

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// The project ID the table is stored under
// This may or (for public datasets) may not equal the calling project ID
// const dataProjectId = 'my-project';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// Field ID to be used for constructing dictionary
// const fieldName = 'field_name';

async function createStoredInfoType() {
  // The name you want to give the dictionary.
  const displayName = 'GitHub usernames';
  // A description of the dictionary.
  const description = 'Dictionary of GitHub usernames used in commits';

  // Specify configuration for the large custom dictionary
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    bigQueryField: {
      table: {
        datasetId: datasetId,
        projectId: dataProjectId,
        tableId: tableId,
      },
      field: {
        name: fieldName,
      },
    },
  };

  // Stored infoType configuration that uses large custom dictionary.
  const storedInfoTypeConfig = {
    displayName: displayName,
    description: description,
    largeCustomDictionary: largeCustomDictionaryConfig,
  };

  // Construct the job creation request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    config: storedInfoTypeConfig,
    storedInfoTypeId: infoTypeId,
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.createStoredInfoType(request);

  // Print results
  console.log(`InfoType stored successfully: ${response.name}`);
}
await createStoredInfoType();

PHP

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


use Google\Cloud\Dlp\V2\BigQueryField;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\CreateStoredInfoTypeRequest;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;

/**
 * Create a stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the custom stored info type.
 * @param string $displayName       The human-readable name to give the stored infoType.
 * @param string $description       A description for the stored infoType to be created.
 */
function create_stored_infotype(
    string $callingProjectId,
    string $outputgcsPath,
    string $storedInfoTypeId,
    string $displayName,
    string $description
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The reference to the table containing the GitHub usernames.
    // The reference to the BigQuery field that contains the GitHub usernames.
    // Note: we have used public data
    $bigQueryField = (new BigQueryField())
        ->setTable((new BigQueryTable())
            ->setDatasetId('samples')
            ->setProjectId('bigquery-public-data')
            ->setTableId('github_nested'))
        ->setField((new FieldId())
            ->setName('actor'));

    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        // The output path where the custom dictionary containing the GitHub usernames will be stored.
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setBigQueryField($bigQueryField);

    // Configure the StoredInfoType we want the service to perform.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setDisplayName($displayName)
        ->setDescription($description)
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.
    $parent = "projects/$callingProjectId/locations/global";
    $createStoredInfoTypeRequest = (new CreateStoredInfoTypeRequest())
        ->setParent($parent)
        ->setConfig($storedInfoTypeConfig)
        ->setStoredInfoTypeId($storedInfoTypeId);
    $response = $dlp->createStoredInfoType($createStoredInfoTypeRequest);

    // Print results.
    printf('Successfully created Stored InfoType : %s', $response->getName());
}

Python

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import google.cloud.dlp


def create_stored_infotype(
    project: str,
    stored_info_type_id: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to create stored infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier for large custom dictionary.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            that would store the created dictionary.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType Configuration dictionary. This example creates
    # a stored infoType from a term list stored in a publicly available BigQuery
    # database (bigquery-public-data.samples.github_nested).
    # The database contains all GitHub usernames used in commits.
    stored_info_type_config = {
        "display_name": "GitHub usernames",
        "description": "Dictionary of GitHub usernames used in commits",
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            # We can either use bigquery field or gcs file as a term list input option.
            "big_query_field": {
                "table": {
                    "project_id": "bigquery-public-data",
                    "dataset_id": "samples",
                    "table_id": "github_nested",
                },
                "field": {"name": "actor"},
            },
        },
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.create_stored_info_type(
        request={
            "parent": parent,
            "config": stored_info_type_config,
            "stored_info_type_id": stored_info_type_id,
        }
    )

    # Print the result
    print(f"Created Stored InfoType: {response.name}")

REST

Crea una nuova cartella per il dizionario in un bucket Cloud Storage. Sensitive Data Protection crea cartelle contenenti i file del dizionario nella posizione specificata.

Crea il dizionario utilizzando il metodo storedInfoTypes.create. Il metodo create accetta i seguenti parametri:
- Un oggetto StoredInfoTypeConfig che contiene la configurazione dell'infoType archiviato. Include:
  - description: una descrizione del dizionario.
  - displayName: il nome che vuoi dare al dizionario.
  - LargeCustomDictionaryConfig: Contiene la configurazione del dizionario personalizzato di grandi dimensioni. Include:
    - BigQueryField: Specifica se l'elenco dei termini è archiviato in BigQuery. Include un riferimento alla tabella in cui è memorizzato l'elenco, oltre al campo che contiene ogni frase del dizionario.
    - CloudStorageFileSet: Specificato se l'elenco dei termini è archiviato in Cloud Storage. Include l'URL della posizione di origine in Cloud Storage, nel seguente formato: "gs://[PATH_TO_GS]". I caratteri jolly sono supportati.
    - outputPath: il percorso della posizione in un bucket Cloud Storage in cui archiviare il dizionario creato.
- storedInfoTypeId: L'identificatore dell'infoType memorizzato. Utilizzi questo identificatore per fare riferimento all'infoType archiviato quando lo ricompili, lo elimini o lo utilizzi in un'ispezione o in un job di deidentificazione. Se lasci vuoto questo campo, il sistema genera un identificatore per te.

Di seguito è riportato un esempio di JSON che, se inviato al metodo storedInfoTypes.create, crea un nuovo infoType archiviato, in particolare un rilevatore di dizionari personalizzati di grandi dimensioni. Questo esempio crea un tipo di infoType memorizzato da un elenco di termini archiviato in un database BigQuery disponibile pubblicamente (bigquery-public-data.samples.github_nested). Il database contiene tutti i nomi utente GitHub utilizzati nei commit. Il percorso di output per il dizionario generato è impostato su un bucket Cloud Storage denominato dlptesting e l'infoType archiviato è denominato github-usernames.

Input JSON

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes

{
  "config":{
    "displayName":"GitHub usernames",
    "description":"Dictionary of GitHub usernames used in commits",
    "largeCustomDictionary":{
      "outputPath":{
        "path":"gs://[PATH_TO_GS]"
      },
      "bigQueryField":{
        "table":{
          "datasetId":"samples",
          "projectId":"bigquery-public-data",
          "tableId":"github_nested"
        }
      }
    }
  },
  "storedInfoTypeId":"github-usernames"
}

Ricostruire il dizionario

Se vuoi aggiornare il dizionario, devi prima aggiornare l'elenco dei termini di origine e poi devi indicare a Sensitive Data Protection di ricompilare l'infoType archiviato.

Aggiorna l'elenco dei termini di origine esistente in Cloud Storage o BigQuery.

Aggiungi, rimuovi o modifica i termini o le frasi in base alle esigenze.
Crea una nuova versione dell'infoType archiviato "ricostruendolo" utilizzando la console Google Cloud o il metodo storedInfoTypes.patch.

La ricompilazione crea una nuova versione del dizionario, che sostituisce il vecchio dizionario.

Quando ricompili un InfoType archiviato in una nuova versione, quella precedente viene eliminata. Mentre Sensitive Data Protection aggiorna l'InfoType archiviato, il suo stato è "In attesa". Durante questo periodo, la vecchia versione dell'infoType archiviato esiste ancora. Tutte le scansioni eseguite mentre l'infoType archiviato è in stato in attesa verranno eseguite utilizzando la versione precedente dell'infoType archiviato.

Per ricreare l'infoType archiviato:

Console

Aggiorna e salva l'elenco dei termini in Cloud Storage o BigQuery.
Nella console Google Cloud , vai all'elenco degli infoType archiviati.

Vai agli infoType archiviati
Fai clic sull'ID dell'InfoType archiviato che vuoi aggiornare.
Nella pagina Dettagli infoType, fai clic su Ricostruisci dati.

Sensitive Data Protection ricompila l'infoType memorizzato con le modifiche apportate all'elenco dei termini di origine. Una volta che lo stato dell'infoType archiviato è "Pronto", puoi utilizzarlo. Tutti i modelli o gli attivatori di job che utilizzano l'infoType archiviato utilizzeranno automaticamente la versione ricompilata.

C#

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


using System;
using Google.Cloud.Dlp.V2;
using Google.Protobuf.WellKnownTypes;

public class UpdateStoredInfoTypes
{
    public static StoredInfoType Update(
        string gcsFileUri,
        string storedInfoTypePath,
        string outputPath)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the stored infotype config. Here, we will change the source from bigquery table to GCS file.
        var storedConfig = new StoredInfoTypeConfig
        {
            LargeCustomDictionary = new LargeCustomDictionaryConfig
            {
                CloudStorageFileSet = new CloudStorageFileSet
                {
                    Url = gcsFileUri
                },
                OutputPath = new CloudStoragePath
                {
                    Path = outputPath
                }
            }
        };

        // Construct the request using the stored config by specifying the update mask object
        // which represent the path of field to be updated.
        var request = new UpdateStoredInfoTypeRequest
        {
            Config = storedConfig,
            Name = storedInfoTypePath,
            UpdateMask = new FieldMask
            {
                Paths =
                {
                    "large_custom_dictionary.cloud_storage_file_set.url"
                }
            }
        };

        // Call the API.
        StoredInfoType response = dlp.UpdateStoredInfoType(request);

        // Inspect the result.
        Console.WriteLine(response);
        return response;
    }
}

Go

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
	"google.golang.org/protobuf/types/known/fieldmaskpb"
)

// updateStoredInfoType uses the Data Loss Prevention API to update stored infoType
// detector by changing the source term list from one stored in Bigquery
// to one stored in Cloud Storage.
func updateStoredInfoType(w io.Writer, projectID, gcsUri, fileSetUrl, infoTypeId string) error {
	// projectId := "your-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// fileSetUrl := "your-cloud-storage-file-set"
	// infoTypeId := "your-stored-info-type-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Set path in Cloud Storage.
	cloudStoragePath := &dlppb.CloudStoragePath{
		Path: gcsUri,
	}
	cloudStorageFileSet := &dlppb.CloudStorageFileSet{
		Url: fileSetUrl,
	}

	// Configuration for a custom dictionary created from a data source of any size
	largeCustomDictionaryConfig := &dlppb.LargeCustomDictionaryConfig{
		OutputPath: cloudStoragePath,
		Source: &dlppb.LargeCustomDictionaryConfig_CloudStorageFileSet{
			CloudStorageFileSet: cloudStorageFileSet,
		},
	}

	// Set configuration for stored infoTypes.
	storedInfoTypeConfig := &dlppb.StoredInfoTypeConfig{
		Type: &dlppb.StoredInfoTypeConfig_LargeCustomDictionary{
			LargeCustomDictionary: largeCustomDictionaryConfig,
		},
	}

	// Set mask to control which fields get updated.
	fieldMask := &fieldmaskpb.FieldMask{
		Paths: []string{"large_custom_dictionary.cloud_storage_file_set.url"},
	}
	// Construct the job creation request to be sent by the client.
	req := &dlppb.UpdateStoredInfoTypeRequest{
		Name:       fmt.Sprint("projects/" + projectID + "/storedInfoTypes/" + infoTypeId),
		Config:     storedInfoTypeConfig,
		UpdateMask: fieldMask,
	}

	// Use the client to send the API request.
	resp, err := client.UpdateStoredInfoType(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.Name)
	return nil
}

Java

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CloudStorageFileSet;
import com.google.privacy.dlp.v2.CloudStoragePath;
import com.google.privacy.dlp.v2.LargeCustomDictionaryConfig;
import com.google.privacy.dlp.v2.StoredInfoType;
import com.google.privacy.dlp.v2.StoredInfoTypeConfig;
import com.google.privacy.dlp.v2.StoredInfoTypeName;
import com.google.privacy.dlp.v2.UpdateStoredInfoTypeRequest;
import com.google.protobuf.FieldMask;
import java.io.IOException;

public class UpdateStoredInfoType {
  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The path to file in GCS bucket that holds a collection of words and phrases to be searched by
    // the new infoType detector.
    String filePath = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    // The path to the location in a GCS bucket to store the created dictionary.
    String outputPath = "your-cloud-storage-file-set";
    // The name of the stored InfoType which is to be updated.
    String infoTypeId = "your-stored-info-type-id";
    updateStoredInfoType(projectId, filePath, outputPath, infoTypeId);
  }

  // Update the stored info type rebuilding the Custom dictionary.
  public static void updateStoredInfoType(
      String projectId, String filePath, String outputPath, String infoTypeId) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Set path in Cloud Storage.
      CloudStoragePath cloudStoragePath = CloudStoragePath.newBuilder().setPath(outputPath).build();
      CloudStorageFileSet cloudStorageFileSet =
          CloudStorageFileSet.newBuilder().setUrl(filePath).build();

      // Configuration for a custom dictionary created from a data source of any size
      LargeCustomDictionaryConfig largeCustomDictionaryConfig =
          LargeCustomDictionaryConfig.newBuilder()
              .setOutputPath(cloudStoragePath)
              .setCloudStorageFileSet(cloudStorageFileSet)
              .build();

      // Set configuration for stored infoTypes.
      StoredInfoTypeConfig storedInfoTypeConfig =
          StoredInfoTypeConfig.newBuilder()
              .setLargeCustomDictionary(largeCustomDictionaryConfig)
              .build();

      // Set mask to control which fields get updated.
      // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
      FieldMask fieldMask =
          FieldMask.newBuilder()
              .addPaths("large_custom_dictionary.cloud_storage_file_set.url")
              .build();

      // Construct the job creation request to be sent by the client.
      UpdateStoredInfoTypeRequest updateStoredInfoTypeRequest =
          UpdateStoredInfoTypeRequest.newBuilder()
              .setName(
                  StoredInfoTypeName.ofProjectStoredInfoTypeName(projectId, infoTypeId).toString())
              .setConfig(storedInfoTypeConfig)
              .setUpdateMask(fieldMask)
              .build();

      // Send the job creation request and process the response.
      StoredInfoType response = dlp.updateStoredInfoType(updateStoredInfoTypeRequest);

      // Print the results.
      System.out.println("Updated stored InfoType successfully: " + response.getName());
    }
  }
}

Node.js

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

// Import the required libraries
const dlp = require('@google-cloud/dlp');

// Create a DLP client
const dlpClient = new dlp.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = "your-project-id";

// The identifier for the stored infoType
// const infoTypeId = 'github-usernames';

// The path to the location in a Cloud Storage bucket to store the created dictionary
// const outputPath = 'cloud-bucket-path';

// Path of file containing term list
// const cloudStorageFileSet = 'gs://[PATH_TO_GS]';

async function updateStoredInfoType() {
  // Specify configuration of the large custom dictionary including cloudStorageFileSet and outputPath
  const largeCustomDictionaryConfig = {
    outputPath: {
      path: outputPath,
    },
    cloudStorageFileSet: {
      url: fileSetUrl,
    },
  };

  // Construct the job creation request to be sent by the client.
  const updateStoredInfoTypeRequest = {
    name: `projects/${projectId}/storedInfoTypes/${infoTypeId}`,
    config: {
      largeCustomDictionary: largeCustomDictionaryConfig,
    },
    updateMask: {
      paths: ['large_custom_dictionary.cloud_storage_file_set.url'],
    },
  };

  // Send the job creation request and process the response.
  const [response] = await dlpClient.updateStoredInfoType(
    updateStoredInfoTypeRequest
  );

  // Print the results.
  console.log(`InfoType updated successfully: ${JSON.stringify(response)}`);
}
await updateStoredInfoType();

PHP

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageFileSet;
use Google\Cloud\Dlp\V2\CloudStoragePath;
use Google\Cloud\Dlp\V2\LargeCustomDictionaryConfig;
use Google\Cloud\Dlp\V2\StoredInfoTypeConfig;
use Google\Cloud\Dlp\V2\UpdateStoredInfoTypeRequest;
use Google\Protobuf\FieldMask;

/**
 * Rebuild/Update the stored infoType.
 *
 * @param string $callingProjectId  The Google Cloud Project ID to run the API call under.
 * @param string $gcsPath           The path to file in GCS bucket that holds a collection of words and phrases to be searched by the new infoType detector.
 * @param string $outputgcsPath     The path to the location in a Cloud Storage bucket to store the created dictionary.
 * @param string $storedInfoTypeId  The name of the stored InfoType which is to be updated.
 *
 */
function update_stored_infotype(
    string $callingProjectId,
    string $gcsPath,
    string $outputgcsPath,
    string $storedInfoTypeId
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Set path in Cloud Storage.
    $cloudStorageFileSet = (new CloudStorageFileSet())
        ->setUrl($gcsPath);

    // Configuration for a custom dictionary created from a data source of any size
    $largeCustomDictionaryConfig = (new LargeCustomDictionaryConfig())
        ->setOutputPath((new CloudStoragePath())
            ->setPath($outputgcsPath))
        ->setCloudStorageFileSet($cloudStorageFileSet);

    // Set configuration for stored infoTypes.
    $storedInfoTypeConfig = (new StoredInfoTypeConfig())
        ->setLargeCustomDictionary($largeCustomDictionaryConfig);

    // Send the stored infoType creation request and process the response.

    $name = "projects/$callingProjectId/locations/global/storedInfoTypes/" . $storedInfoTypeId;
    // Set mask to control which fields get updated.
    // Refer https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask for constructing the field mask paths.
    $fieldMask = (new FieldMask())
        ->setPaths([
            'large_custom_dictionary.cloud_storage_file_set.url'
        ]);

    // Run request
    $updateStoredInfoTypeRequest = (new UpdateStoredInfoTypeRequest())
        ->setName($name)
        ->setConfig($storedInfoTypeConfig)
        ->setUpdateMask($fieldMask);
    $response = $dlp->updateStoredInfoType($updateStoredInfoTypeRequest);

    // Print results
    printf('Successfully update Stored InforType : %s' . PHP_EOL, $response->getName());
}

Python

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import google.cloud.dlp


def update_stored_infotype(
    project: str,
    stored_info_type_id: str,
    gcs_input_file_path: str,
    output_bucket_name: str,
) -> None:
    """Uses the Data Loss Prevention API to update stored infoType
    detector by changing the source term list from one stored in Bigquery
    to one stored in Cloud Storage.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        stored_info_type_id: The identifier of stored infoType which is to
            be updated.
        gcs_input_file_path: The url in the format <bucket>/<path_to_file>
            for the location of the source term list.
        output_bucket_name: The name of the bucket in Google Cloud Storage
            where large dictionary is stored.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the stored infoType configuration dictionary.
    stored_info_type_config = {
        "large_custom_dictionary": {
            "output_path": {"path": f"gs://{output_bucket_name}"},
            "cloud_storage_file_set": {"url": f"gs://{gcs_input_file_path}"},
        }
    }

    # Set mask to control which fields get updated. For more details, refer
    # https://protobuf.dev/reference/protobuf/google.protobuf/#field-mask
    # for constructing the field mask paths.
    field_mask = {"paths": ["large_custom_dictionary.cloud_storage_file_set.url"]}

    # Convert the stored infoType id into a full resource id.
    stored_info_type_name = (
        f"projects/{project}/locations/global/storedInfoTypes/{stored_info_type_id}"
    )

    # Call the API.
    response = dlp.update_stored_info_type(
        request={
            "name": stored_info_type_name,
            "config": stored_info_type_config,
            "update_mask": field_mask,
        }
    )

    # Print the result
    print(f"Updated stored infoType successfully: {response.name}")

REST

Aggiornare l'elenco dei termini

Se aggiorni solo l'elenco dei termini nel dizionario personalizzato di grandi dimensioni, la tua richiesta storedInfoTypes.patch richiede solo il campo name. Fornisci il nome completo della risorsa dell'infoType archiviato che vuoi ricreare.

I seguenti pattern rappresentano voci valide per il campo name:

organizations/ORGANIZATION_ID/storedInfoTypes/STORED_INFOTYPE_ID
projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

Sostituisci STORED_INFOTYPE_ID con l'identificatore dell'infoType archiviato che vuoi ricreare.

Se non conosci l'identificatore dell'infoType archiviato, chiama il metodo storedInfoTypes.list per visualizzare un elenco di tutti gli infoType archiviati correnti.

Esempio

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/STORED_INFOTYPE_ID

In questo caso, non è necessario un corpo della richiesta.

Cambia l'elenco dei termini di origine

Puoi modificare l'elenco dei termini di origine per un tipo di infoType memorizzato da uno memorizzato in BigQuery a uno memorizzato in Cloud Storage. Utilizza il metodo storedInfoTypes.patch, ma includi un oggetto CloudStorageFileSet in LargeCustomDictionaryConfig dove prima utilizzavi un oggetto BigQueryField. Quindi, imposta il parametro updateMask sul parametro infoType memorizzato che hai ricreato nel formato FieldMask. Ad esempio, il seguente JSON indica nel parametro updateMask che l'URL del percorso Cloud Storage è stato aggiornato (large_custom_dictionary.cloud_storage_file_set.url):

Esempio

PATCH https://dlp.googleapis.com/v2/projects/PROJECT_ID/storedInfoTypes/github-usernames

{
  "config":{
    "largeCustomDictionary":{
      "cloudStorageFileSet":{
        "url":"gs://[BUCKET_NAME]/[PATH_TO_FILE]"
      }
    }
  },
  "updateMask":"large_custom_dictionary.cloud_storage_file_set.url"
}

Allo stesso modo, puoi passare da un elenco di termini archiviato in una tabella BigQuery a uno archiviato in un bucket Cloud Storage.

Esegui la scansione dei contenuti utilizzando un rilevatore di dizionari personalizzato di grandi dimensioni

La scansione dei contenuti utilizzando un rilevatore di dizionario personalizzato di grandi dimensioni è simile alla scansione dei contenuti utilizzando qualsiasi altro rilevatore di infoType personalizzato.

Questa procedura presuppone che tu disponga di un infoType archiviato esistente. Per ulteriori informazioni, vedi Creare un infoType archiviato in questa pagina.

Console

Puoi applicare un rilevatore di dizionari personalizzato di grandi dimensioni quando:

Creazione di un nuovo job
Creazione o modifica di un trigger di job
Creazione o modifica di un modello
Configurazione della profilazione dei dati

Nella sezione Configura il rilevamento della pagina, nella sottosezione InfoType, puoi specificare l'infoType del dizionario personalizzato di grandi dimensioni.

Fai clic su Gestisci infoType.
Nel riquadro InfoType, fai clic sulla scheda Personalizzato.
Fai clic su Aggiungi infoType personalizzato.
Nel riquadro Aggiungi infoType personalizzato, segui questi passaggi:
1. In Type (Tipo), seleziona Stored infoType (infoType archiviato).
2. In InfoType, inserisci un nome per l'infoType personalizzato. Puoi utilizzare lettere, numeri e trattini bassi.
3. Per Probabilità, seleziona il livello di probabilità predefinito che vuoi assegnare a tutti i risultati che corrispondono a questo infoType personalizzato. Puoi ottimizzare ulteriormente il livello di probabilità dei singoli risultati utilizzando le regole hotword.
  
  Se non specifichi un valore predefinito, il livello di probabilità predefinito è impostato su VERY_LIKELY. Per saperne di più, consulta Probabilità di corrispondenza.
4. Per Sensibilità, seleziona il livello di sensibilità che vuoi assegnare a tutti i risultati che corrispondono a questo infoType personalizzato. Se non specifichi un valore, i livelli di sensibilità di questi risultati vengono impostati su HIGH.
  
  I punteggi di sensibilità vengono utilizzati nei profili dei dati. Durante la profilazione dei dati, Sensitive Data Protection utilizza i punteggi di sensibilità degli infoType per calcolare il livello di sensibilità.
5. Per Nome infoType archiviato, seleziona l'infoType archiviato su cui vuoi basare il nuovo infoType personalizzato.
6. Fai clic su Fine per chiudere il riquadro Aggiungi InfoType personalizzato.
(Facoltativo) Nella scheda Integrati, modifica la selezione degli infoType integrati.
Fai clic su Fine per chiudere il riquadro InfoType.

L'infoType personalizzato viene aggiunto all'elenco degli infoType che vengono scansionati da Sensitive Data Protection. Tuttavia, questa selezione non è definitiva finché non salvi il job, il trigger del job, il modello o la configurazione della scansione.
Al termine della creazione o della modifica della configurazione, fai clic su Salva.

C#

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectDataWithStoredInfotypes
{
    public static InspectContentResponse Inspect(
        string projectId,
        string storedInfotypePath,
        string text,
        InfoType infoType = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Construct the infotype if null.
        var infotype = infoType ?? new InfoType { Name = "GITHUB_LOGINS" };

        // Construct the inspect config using stored infotype.
        var inspectConfig = new InspectConfig
        {
            CustomInfoTypes =
            {
                new CustomInfoType
                {
                    InfoType = infotype,
                    StoredType = new StoredType { Name = storedInfotypePath }
                }
            },
            IncludeQuote = true
        };

        // Construct the request using inspect config.
        var request = new InspectContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        InspectContentResponse response = dlp.InspectContent(request);

        // Inspect the results.
        var findings = response.Result.Findings;
        Console.WriteLine($"Findings: {findings.Count}");
        foreach (var f in findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Go

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// inspectWithStoredInfotype inspects the given text using the specified stored infoType detector.
func inspectWithStoredInfotype(w io.Writer, projectID, infoTypeId, textToDeidentify string) error {
	// projectId := "your-project-id"
	// infoTypeId := "your-info-type-id"
	// textToDeidentify := "This commit was made by kewin2010"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be inspected.
	contentItem := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: textToDeidentify,
		},
	}

	// Specify the info type the inspection will look for.
	infoType := &dlppb.InfoType{
		Name: "GITHUB_LOGINS",
	}

	// Specify the stored info type the inspection will look for.
	storedType := &dlppb.StoredType{
		Name: infoTypeId,
	}

	customInfoType := &dlppb.CustomInfoType{
		InfoType: infoType,
		Type: &dlppb.CustomInfoType_StoredType{
			StoredType: storedType,
		},
	}

	// Specify how the content should be inspected.
	inspectConfig := &dlppb.InspectConfig{
		CustomInfoTypes: []*dlppb.CustomInfoType{
			customInfoType,
		},
		IncludeQuote: true,
	}

	// Construct the Inspect request to be sent by the client.
	req := &dlppb.InspectContentRequest{
		Parent:        fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: inspectConfig,
		Item:          contentItem,
	}

	// Use the client to send the API request.
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return err
	}

	// Process the results.
	fmt.Fprintf(w, "Findings: %d\n", len(resp.Result.Findings))
	for _, f := range resp.Result.Findings {
		fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

Java

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.ProjectStoredInfoTypeName;
import com.google.privacy.dlp.v2.StoredType;
import java.io.IOException;

public class InspectWithStoredInfotype {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The sample assumes that you have an existing stored infoType.
    // To create a stored InfoType refer:
    // https://cloud.google.com/dlp/docs/creating-stored-infotypes#create-storedinfotye 
    String storedInfoTypeId = "your-info-type-id";
    // The string to de-identify.
    String textToInspect =
        "My phone number is (223) 456-7890 and my email address is gary@example.com.";
    inspectWithStoredInfotype(projectId, storedInfoTypeId, textToInspect);
  }

  //  Inspects the given text using the specified stored infoType detector.
  public static void inspectWithStoredInfotype(
      String projectId, String storedInfoTypeId, String textToInspect) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify the content to be inspected.
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToInspect).build();

      InfoType infoType = InfoType.newBuilder().setName("STORED_TYPE").build();

      // Reference to the existing StoredInfoType to inspect the data.
      StoredType storedType = StoredType.newBuilder()
              .setName(ProjectStoredInfoTypeName.of(projectId, storedInfoTypeId).toString())
              .build();

      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(infoType).setStoredType(storedType).build();

      // Construct the configuration for the Inspect request.
      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest inspectContentRequest =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectConfig(inspectConfig)
              .setItem(contentItem)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(inspectContentRequest);

      // Parse the response and process results.
      System.out.println("Findings: " + "" + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfoType: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood() + "\n");
      }
    }
  }
}

Node.js

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under.
// const projectId = 'your-project-id';

// The custom info-type id created and stored in the bucket.
// const infoTypeId = 'your-info-type-id';

// The string to inspect.
// const string = 'My phone number is (223) 456-7890 and my email address is gary@example.com.';

async function inspectWithStoredInfotype() {
  // Reference to the existing StoredInfoType to inspect the data.
  const customInfoType = {
    infoType: {
      name: 'GITHUB_LOGINS',
    },
    storedType: {
      name: infoTypeId,
    },
  };

  // Construct the configuration for the Inspect request.
  const inspectConfig = {
    customInfoTypes: [customInfoType],
    includeQuote: true,
  };

  // Construct the Inspect request to be sent by the client.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: inspectConfig,
    item: {
      value: string,
    },
  };
  // Run request
  const [response] = await dlp.inspectContent(request);

  // Print Findings
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log(`Findings: ${findings.length}\n`);
    findings.forEach(finding => {
      console.log(`InfoType: ${finding.infoType.name}`);
      console.log(`\tQuote: ${finding.quote}`);
      console.log(`\tLikelihood: ${finding.likelihood} \n`);
    });
  } else {
    console.log('No findings.');
  }
}
await inspectWithStoredInfotype();

PHP

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectContentRequest;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\StoredType;

/**
 * Inspect with stored infoType.
 * Scan content using a large custom dictionary detector.
 *
 * @param string $projectId             The Google Cloud Project ID to run the API call under.
 * @param string $storedInfoTypeName    The name of the stored infotype whose This value must be in the format
 * projects/projectName/(locations/locationId)/storedInfoTypes/storedInfoTypeName.
 * @param string $textToInspect         The string to inspect.
 */
function inspect_with_stored_infotype(
    string $projectId,
    string $storedInfoTypeName,
    string $textToInspect
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$projectId/locations/global";

    // Specify the content to be inspected.
    $item = (new ContentItem())
        ->setValue($textToInspect);

    // Reference to the existing StoredInfoType to inspect the data.
    $customInfoType = (new CustomInfoType())
        ->setInfoType((new InfoType())
            ->setName('STORED_TYPE'))
        ->setStoredType((new StoredType())
            ->setName($storedInfoTypeName));

    // Construct the configuration for the Inspect request.
    $inspectConfig = (new InspectConfig())
        ->setCustomInfoTypes([$customInfoType])
        ->setIncludeQuote(true);

    // Run request.
    $inspectContentRequest = (new InspectContentRequest())
        ->setParent($parent)
        ->setInspectConfig($inspectConfig)
        ->setItem($item);
    $response = $dlp->inspectContent($inspectContentRequest);

    // Print the results.
    $findings = $response->getResult()->getFindings();
    if (count($findings) == 0) {
        printf('No findings.' . PHP_EOL);
    } else {
        printf('Findings:' . PHP_EOL);
        foreach ($findings as $finding) {
            printf('  Quote: %s' . PHP_EOL, $finding->getQuote());
            printf('  Info type: %s' . PHP_EOL, $finding->getInfoType()->getName());
            printf('  Likelihood: %s' . PHP_EOL, Likelihood::name($finding->getLikelihood()));
        }
    }
}

Python

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, consulta la sezione Librerie client Sensitive Data Protection.

import google.cloud.dlp


def inspect_with_stored_infotype(
    project: str,
    stored_info_type_id: str,
    content_string: str,
) -> None:
    """Uses the Data Loss Prevention API to inspect/scan content using stored
    infoType.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        stored_info_type_id: The identifier of stored infoType used to inspect.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert stored infoType id into full resource id
    stored_type_name = f"projects/{project}/storedInfoTypes/{stored_info_type_id}"

    # Construct a custom info type dictionary using stored infoType.
    custom_info_types = [
        {
            "info_type": {"name": "STORED_TYPE"},
            "stored_type": {
                "name": stored_type_name,
            },
        }
    ]

    # Construct the inspection configuration dictionary.
    inspect_config = {
        "custom_info_types": custom_info_types,
        "include_quote": True,
    }

    # Construct the `item` to be inspected using stored infoType.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API.
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")

REST

Se inviato al metodo content.inspect, l'esempio seguente analizza il testo specificato utilizzando il rilevatore infoType archiviato specificato. Il parametro infoType è obbligatorio perché tutti gli infoType personalizzati devono avere un nome che non sia in conflitto con gli infoType integrati o altri infoType personalizzati. Il parametro storedType contiene il percorso completo della risorsa dell'infoType archiviato.

Input JSON

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect

{
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"GITHUB_LOGINS"
        },
        "storedType":{
          "name":"projects/PROJECT_ID/storedInfoTypes/github-logins"
        }
      }
    ]
  },
  "item":{
    "value":"The commit was made by githubuser."
  }
}

Risolvere gli errori

Se si verifica un errore durante il tentativo di creare un infoType archiviato da un elenco di termini archiviato in Cloud Storage, le seguenti sono le possibili cause:

Hai raggiunto un limite massimo per i tipi di informazioni memorizzati. A seconda del problema, esistono diverse soluzioni alternative:
- Se raggiungi il limite superiore per un singolo file di input in Cloud Storage (200 MB), prova a dividere il file in più file. Puoi utilizzare più file per assemblare un unico dizionario personalizzato, purché le dimensioni combinate di tutti i file non superino 1 GB.
- BigQuery non ha gli stessi limiti di Cloud Storage. Valuta la possibilità di spostare i termini in una tabella BigQuery. La dimensione massima di una colonna del dizionario personalizzato in BigQuery è 1 GB e il numero massimo di righe è 5.000.000.
- Se il file dell'elenco dei termini supera tutti i limiti applicabili per gli elenchi dei termini di origine, devi suddividerlo in più file e creare un dizionario per ogni file. Poi, crea un job di scansione separato per ogni dizionario.
Uno o più dei tuoi termini non contengono almeno una lettera o un numero. Sensitive Data Protection non può eseguire la scansione di termini composti esclusivamente da spazi o simboli. Deve contenere almeno una lettera o un numero. Controlla l'elenco dei termini per verificare se ne sono inclusi di questo tipo, quindi correggili o eliminali.
L'elenco dei termini contiene una frase con troppi "componenti". In questo contesto, un componente è una sequenza continua contenente solo lettere, solo numeri o solo caratteri non alfabetici e non numerici, come spazi o simboli. Controlla l'elenco dei termini per verificare se sono inclusi termini di questo tipo, quindi correggili o eliminali.
L'agente di servizio Sensitive Data Protection non ha accesso ai dati di origine del dizionario o al bucket Cloud Storage per l'archiviazione dei file del dizionario. Per risolvere il problema, concedi all'agente di servizio Sensitive Data Protection il ruolo Amministratore Storage (roles/storage.admin) o i ruoli Proprietario dati BigQuery (roles/bigquery.dataOwner) e Utente job BigQuery (roles/bigquery.jobUser).

Panoramica dell'API

La creazione di un infoType archiviato è obbligatoria se crei un rilevatore di dizionari personalizzato di grandi dimensioni.

Un infoType archiviato è rappresentato in Sensitive Data Protection dall'oggetto StoredInfoType. È composto dai seguenti oggetti correlati:

StoredInfoTypeVersion include la data e l'ora di creazione e gli ultimi cinque messaggi di errore che si sono verificati durante la creazione della versione corrente.
- StoredInfoTypeConfig contiene la configurazione dell'infoType archiviato, inclusi nome e descrizione. Per un dizionario personalizzato grande, il type deve essere un LargeCustomDictionaryConfig.
  - LargeCustomDictionaryConfig specifica entrambi i seguenti elementi:
    - La posizione all'interno di Cloud Storage o BigQuery in cui è archiviato l'elenco di frasi.
    - La posizione in Cloud Storage in cui archiviare i file del dizionario generati.
- StoredInfoTypeState contiene lo stato della versione più recente e di eventuali versioni in attesa dell'infoType archiviato. Le informazioni sullo stato indicano se il infoType memorizzato è in fase di ricostruzione, è pronto per l'uso o non è valido.

Specifiche della corrispondenza del dizionario

Di seguito sono riportate indicazioni su come Sensitive Data Protection abbina parole e frasi del dizionario. Questi punti si applicano sia ai dizionari personalizzati normali che a quelli di grandi dimensioni:

Le parole del dizionario non fanno distinzione tra maiuscole e minuscole. Se il tuo dizionario include Abby, la corrispondenza verrà trovata per abby, ABBY, Abby e così via.
Tutti i caratteri, nei dizionari o nei contenuti da scansionare, diversi da lettere, cifre e altri caratteri alfabetici contenuti nel Basic Multilingual Plane di Unicode, sono considerati spazi durante la scansione per trovare corrispondenze. Se il dizionario cerca Abby Abernathy, troverà corrispondenze con abby abernathy, Abby, Abernathy, Abby (ABERNATHY) e così via.
I caratteri che circondano qualsiasi corrispondenza devono essere di un tipo diverso (lettere o cifre) rispetto ai caratteri adiacenti all'interno della parola. Se il dizionario cerca Abi, troverà una corrispondenza con i primi tre caratteri di Abi904, ma non di Abigail.
Le parole del dizionario contenenti caratteri nel Supplementary Multilingual Plane dello standard Unicode possono produrre risultati imprevisti. Esempi di questi caratteri sono emoji, simboli scientifici e caratteri storici.

Lettere, cifre e altri caratteri alfabetici sono definiti come segue:

Lettere: caratteri con categorie generali Lu, Ll, Lt, Lm o Lo nella specifica Unicode
Cifre: caratteri con categoria generale Nd nella specifica Unicode
Altri caratteri alfabetici: caratteri con categoria generale Nl nella specifica Unicode o con proprietà contributiva Other_Alphabetic come definito dallo standard Unicode

Per creare, modificare o eliminare un infoType archiviato, utilizza i seguenti metodi:

storedInfoTypes.create: crea un nuovo infoType archiviato in base al StoredInfoTypeConfig che specifichi.
storedInfoTypes.patch: Ricostruisce l'infoType archiviato con un nuovo StoredInfoTypeConfig che specifichi. Se non ne viene specificato nessuno, questo metodo crea una nuova versione di infoType archiviato con StoredInfoTypeConfig esistente.
storedInfoTypes.get: Recupera StoredInfoTypeConfig e tutte le versioni in attesa dell'infoType archiviato specificato.
storedInfoTypes.list: Elenca tutti gli infoType archiviati correnti.
storedInfoTypes.delete: Elimina l'infoType archiviato specificato.