Cloud Data Loss Prevention (Cloud DLP) ora fa parte di Sensitive Data Protection. Il nome dell'API rimane invariato: API Cloud Data Loss Prevention (API DLP). Per informazioni sui servizi che compongono Sensitive Data Protection, consulta la panoramica di Sensitive Data Protection.

Questa pagina è stata tradotta dall'API Cloud Translation.

Creare copie anonimizzate dei dati archiviati in Cloud Storage utilizzando l'API

Questa pagina descrive come ispezionare una risorsa Cloud Storage e creare copie dei dati sottoposte a deidentificazione utilizzando l'API Cloud Data Loss Prevention.

Questa operazione contribuisce a garantire che i file utilizzati nei processi aziendali non contengano dati sensibili, come informazioni che consentono l'identificazione personale (PII). Sensitive Data Protection può ispezionare i file in un bucket Cloud Storage alla ricerca di dati sensibili e creare copie anonimizzate di questi file in un bucket separato. Puoi quindi utilizzare le copie anonimizzate nei tuoi processi aziendali.

Per saperne di più su questa funzionalità, consulta Anonimizzazione dei dati sensibili in Cloud Storage.

Prima di iniziare

Questa pagina presuppone quanto segue:

Hai abilitato la fatturazione.

Scopri come attivare la fatturazione
Hai attivato Sensitive Data Protection.

Abilita Sensitive Data Protection
Hai un bucket Cloud Storage con dati che vuoi anonimizzare.
Sai come inviare una richiesta HTTP all'API DLP. Per ulteriori informazioni, vedi Ispeziona il testo sensibile utilizzando l'API DLP.

Scopri le limitazioni e i punti da considerare per questa operazione.

L'ispezione dello spazio di archiviazione richiede il seguente ambito OAuth: https://www.googleapis.com/auth/cloud-platform. Per ulteriori informazioni, consulta la pagina Autenticazione nell'API DLP.

Ruoli IAM richiesti

Se tutte le risorse per questa operazione si trovano nello stesso progetto, è sufficiente il ruolo Agente di servizio dell'API DLP (roles/dlp.serviceAgent) sull'agente di servizio. Con questo ruolo, puoi:

Crea il job di ispezione
Leggere i file nella directory di input
Scrivere i file deidentificati nella directory di output
Scrivi i dettagli della trasformazione in una tabella BigQuery

Le risorse pertinenti includono il job di ispezione, i modelli di anonimizzazione, il bucket di input, il bucket di output e la tabella dei dettagli di trasformazione.

Se devi avere le risorse in progetti separati, assicurati che l'agente di servizio del tuo progetto disponga anche dei seguenti ruoli:

Il ruolo Storage Object Viewer (roles/storage.objectViewer) nel bucket di input o nel progetto che lo contiene.
Il ruolo Storage Object Creator (roles/storage.objectCreator) nel bucket di output o nel progetto che lo contiene.
Il ruolo Editor dati BigQuery (roles/bigquery.dataEditor) nella tabella dei dettagli di trasformazione o nel progetto che la contiene.

Per concedere un ruolo all'agente di servizio, vedi Concedere un singolo ruolo. Puoi anche controllare l'accesso ai seguenti livelli:

Panoramica dell'API

Per creare copie anonimizzate dei contenuti archiviati in Cloud Storage, configura un job di ispezione che cerca dati sensibili in base ai criteri specificati. Poi, all'interno del lavoro di ispezione, fornisci le istruzioni per la deidentificazione sotto forma di azione Deidentify.

Se vuoi eseguire la scansione solo di un sottoinsieme dei file nel bucket, puoi limitare i file analizzati dal job. Le opzioni supportate per i job con de-identificazione sono il filtro dei file per tipo (FileType) e l'espressione regolare (FileSet).

Quando abiliti l'azione Deidentify, per impostazione predefinita Sensitive Data Protection crea copie anonimizzate (trasformate) di tutti i tipi di file supportati inclusi nella scansione. Tuttavia, puoi configurare il job in modo che trasformi solo un sottoinsieme dei tipi di file supportati.

(Facoltativo) Crea modelli di anonimizzazione

Se vuoi controllare la trasformazione dei risultati, crea i seguenti modelli. Questi modelli forniscono istruzioni sulla trasformazione dei risultati in file strutturati, non strutturati e immagini.

Modello di anonimizzazione:un DeidentifyTemplate predefinito da utilizzare per i file non strutturati, come i file di testo in formato libero. Questo tipo di DeidentifyTemplate non può contenere un oggetto RecordTransformations, che è supportato solo per i contenuti strutturati. Se questo modello non è presente, Sensitive Data Protection utilizza il metodo ReplaceWithInfoTypeConfig per trasformare i file non strutturati.
Modello di deidentificazione strutturata:un DeidentifyTemplate da utilizzare per file strutturati, come i file CSV. Questo DeidentifyTemplate può contenere RecordTransformations. Se questo modello non è presente, Sensitive Data Protection utilizza il modello di anonimizzazione predefinito che hai creato. Se non è presente, la protezione dei dati sensibili utilizza il metodo ReplaceWithInfoTypeConfig per trasformare i file strutturati.
Modello di oscuramento delle immagini:un DeidentifyTemplate da utilizzare per le immagini. Questo modello deve contenere un oggetto ImageTransformations. Se questo modello non è presente, la protezione dei dati sensibili oscura tutti i risultati nelle immagini con una casella nera.

Scopri di più sulla creazione di un modello di anonimizzazione.

Crea un job di ispezione con un'azione di anonimizzazione

L'oggetto DlpJob fornisce istruzioni su cosa ispezionare, quali tipi di dati contrassegnare come sensibili e cosa fare con i risultati. Per anonimizzare i dati sensibili in una directory Cloud Storage, il tuo DlpJob deve definire almeno quanto segue:

Un oggetto StorageConfig, che specifica la directory Cloud Storage da ispezionare.
Un oggetto InspectConfig, che contiene i tipi di dati da cercare e istruzioni di ispezione aggiuntive su come trovare i dati sensibili.
Un'azione Deidentify che contiene quanto segue:
- Un oggetto TransformationConfig, che specifica tutti i modelli che hai creato per l'anonimizzazione dei dati in file strutturati e non strutturati. Puoi anche includere la configurazione per l'oscuramento dei dati sensibili nelle immagini.
  
  Se non includi un oggetto TransformationConfig, Sensitive Data Protection sostituisce i dati sensibili nel testo con il relativo infoType. Nelle immagini, copre i dati sensibili con un riquadro nero.
- Un oggetto TransformationDetailsStorageConfig, che specifica una tabella BigQuery in cui Sensitive Data Protection deve memorizzare i dettagli di ogni trasformazione. Per ogni trasformazione, i dettagli includono una descrizione, un codice di successo o di errore, eventuali dettagli dell'errore, il numero di byte trasformati, la posizione dei contenuti trasformati e il nome del job di ispezione in cui Sensitive Data Protection ha eseguito la trasformazione. Questa tabella non memorizza i contenuti deidentificati effettivi.
Quando i dati vengono scritti in una tabella BigQuery, la fatturazione e l'utilizzo delle quote vengono applicati al progetto che contiene la tabella di destinazione.

Una volta anonimizzati i contenuti copiati, il job di anonimizzazione termina. Il job contiene un riepilogo del numero di volte in cui sono state applicate le trasformazioni specificate, che puoi recuperare utilizzando il metodo projects.dlpJobs.get su DlpJob. L'oggetto DlpJob restituito include sia un oggetto DeidentifyDataSourceDetails sia un oggetto InspectDataSourceDetails. Questi oggetti contengono rispettivamente i risultati di un'azione Deidentify e del lavoro di ispezione.

Se hai incluso un oggetto TransformationDetailsStorageConfig nel tuo DlpJob, viene creata una tabella BigQuery contenente i metadati relativi ai dettagli della trasformazione. Per ogni trasformazione che si verifica, Sensitive Data Protection scrive una riga di metadati nella tabella. Per ulteriori informazioni sui contenuti della tabella, consulta Riferimento ai dettagli della trasformazione.

Esempi di codice

Gli esempi seguenti mostrano come utilizzare l'API DLP per creare copie anonimizzate dei file Cloud Storage.

Metodo HTTP e URL

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs

C#

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.

Per eseguire l'autenticazione in Sensitive Data Protection, configura le Credenziali predefinite dell'applicazione. Per ulteriori informazioni, consulta Configura l'autenticazione per un ambiente di sviluppo locale.


using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using System.Linq;

public class DeidentifyDataStoredInCloudStorage
{
    public static DlpJob Deidentify(
        string projectId,
        string gcsInputPath,
        string unstructuredDeidentifyTemplatePath,
        string structuredDeidentifyTemplatePath,
        string imageRedactionTemplatePath,
        string gcsOutputPath,
        string datasetId,
        string tableId)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        //Construct the storage config by specifying the input directory.
        var storageConfig = new StorageConfig
        {
            CloudStorageOptions = new CloudStorageOptions
            {
                FileSet = new CloudStorageOptions.Types.FileSet
                {
                    Url = gcsInputPath
                }
            }
        };

        // Construct the inspect config by specifying the type of info to be inspected.
        var inspectConfig = new InspectConfig
        {
            InfoTypes =
            {
                new InfoType[]
                {
                    new InfoType { Name = "PERSON_NAME" },
                    new InfoType { Name = "EMAIL_ADDRESS" }
                }
            },
            IncludeQuote = true
        };

        // Construct the actions to take after the inspection portion of the job is completed.
        // Specify how Cloud DLP must de-identify sensitive data in structured files, unstructured files and images
        // using Transformation config.
        // The de-identified files will be written to the the GCS bucket path specified in gcsOutputPath and the details of 
        // transformations performed will be written to BigQuery table specified in datasetId and tableId.
        var actions = new Action[]
        {
            new Action
            {
                Deidentify = new Action.Types.Deidentify
                {
                    CloudStorageOutput = gcsOutputPath,
                    TransformationConfig = new TransformationConfig
                    {
                        DeidentifyTemplate = unstructuredDeidentifyTemplatePath,
                        ImageRedactTemplate = imageRedactionTemplatePath,
                        StructuredDeidentifyTemplate = structuredDeidentifyTemplatePath,
                    },
                    TransformationDetailsStorageConfig = new TransformationDetailsStorageConfig
                    {
                        Table = new BigQueryTable
                        {
                            ProjectId = projectId,
                            DatasetId = datasetId,
                            TableId = tableId
                        }
                    }
                }
            }
        };

        // Construct the inspect job config using created storage config, inspect config and actions.
        var inspectJob = new InspectJobConfig
        {
            StorageConfig = storageConfig,
            InspectConfig = inspectConfig,
            Actions = { actions }
        };

        // Create the dlp job and call the API.
        DlpJob response = dlp.CreateDlpJob(new CreateDlpJobRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectJob = inspectJob
        });

        return response;
    }
}

Go

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

func deidentifyCloudStorage(w io.Writer, projectID, gcsUri, tableId, datasetId, outputDirectory, deidentifyTemplateId, structuredDeidentifyTemplateId, imageRedactTemplateId string) error {
	// projectId := "my-project-id"
	// gcsUri := "gs://" + "your-bucket-name" + "/path/to/your/file.txt"
	// tableId := "your-bigquery-table-id"
	// datasetId := "your-bigquery-dataset-id"
	// outputDirectory := "your-output-directory"
	// deidentifyTemplateId := "your-deidentify-template-id"
	// structuredDeidentifyTemplateId := "your-structured-deidentify-template-id"
	// imageRedactTemplateId := "your-image-redact-template-id"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Set path in Cloud Storage.
	cloudStorageOptions := &dlppb.CloudStorageOptions{
		FileSet: &dlppb.CloudStorageOptions_FileSet{
			Url: gcsUri,
		},
	}

	// Define the storage config options for cloud storage options.
	storageConfig := &dlppb.StorageConfig{
		Type: &dlppb.StorageConfig_CloudStorageOptions{
			CloudStorageOptions: cloudStorageOptions,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
	infoTypes := []*dlppb.InfoType{
		{Name: "PERSON_NAME"},
		{Name: "EMAIL_ADDRESS"},
	}

	// inspectConfig holds the configuration settings for data inspection and analysis
	// within the context of the Google Cloud Data Loss Prevention (DLP) API.
	inspectConfig := &dlppb.InspectConfig{
		InfoTypes:    infoTypes,
		IncludeQuote: true,
	}

	// Types of files to include for de-identification.
	fileTypesToTransform := []dlppb.FileType{
		dlppb.FileType_CSV,
		dlppb.FileType_IMAGE,
		dlppb.FileType_TEXT_FILE,
	}

	// Specify the BigQuery table to be inspected.
	table := &dlppb.BigQueryTable{
		ProjectId: projectID,
		DatasetId: datasetId,
		TableId:   tableId,
	}

	// transformationDetailsStorageConfig holds configuration settings for storing transformation
	// details in the context of the Google Cloud Data Loss Prevention (DLP) API.
	transformationDetailsStorageConfig := &dlppb.TransformationDetailsStorageConfig{
		Type: &dlppb.TransformationDetailsStorageConfig_Table{
			Table: table,
		},
	}

	transformationConfig := &dlppb.TransformationConfig{
		DeidentifyTemplate:           deidentifyTemplateId,
		ImageRedactTemplate:          imageRedactTemplateId,
		StructuredDeidentifyTemplate: structuredDeidentifyTemplateId,
	}

	// Action to execute on the completion of a job.
	deidentify := &dlppb.Action_Deidentify{
		TransformationConfig:               transformationConfig,
		TransformationDetailsStorageConfig: transformationDetailsStorageConfig,
		Output: &dlppb.Action_Deidentify_CloudStorageOutput{
			CloudStorageOutput: outputDirectory,
		},
		FileTypesToTransform: fileTypesToTransform,
	}

	action := &dlppb.Action{
		Action: &dlppb.Action_Deidentify_{
			Deidentify: deidentify,
		},
	}

	// Configure the inspection job we want the service to perform.
	inspectJobConfig := &dlppb.InspectJobConfig{
		StorageConfig: storageConfig,
		InspectConfig: inspectConfig,
		Actions: []*dlppb.Action{
			action,
		},
	}

	// Construct the job creation request to be sent by the client.
	req := &dlppb.CreateDlpJobRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Job: &dlppb.CreateDlpJobRequest_InspectJob{
			InspectJob: inspectJobConfig,
		},
	}

	// Send the request.
	resp, err := client.CreateDlpJob(ctx, req)
	if err != nil {
		fmt.Fprintf(w, "error after resp: %v", err)
		return err
	}

	// Print the results.
	fmt.Fprint(w, "Job created successfully: ", resp.Name)
	return nil

}

Java

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CloudStorageOptions;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FileType;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.ProjectDeidentifyTemplateName;
import com.google.privacy.dlp.v2.StorageConfig;
import com.google.privacy.dlp.v2.TransformationConfig;
import com.google.privacy.dlp.v2.TransformationDetailsStorageConfig;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class DeidentifyCloudStorage {

  // Set the timeout duration in minutes.
  private static final int TIMEOUT_MINUTES = 15;

  public static void main(String[] args) throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // Specify the cloud storage directory that you want to inspect.
    String gcsPath = "gs://" + "your-bucket-name" + "/path/to/your/file.txt";
    // Specify the big query dataset id to store the transformation details.
    String datasetId = "your-bigquery-dataset-id";
    // Specify the big query table id to store the transformation details.
    String tableId = "your-bigquery-table-id";
    // Specify the cloud storage directory to store the de-identified files.
    String outputDirectory = "your-output-directory";
    // Specify the de-identify template ID for unstructured files.
    String deidentifyTemplateId = "your-deidentify-template-id";
    // Specify the de-identify template ID for structured files.
    String structuredDeidentifyTemplateId = "your-structured-deidentify-template-id";
    // Specify the de-identify template ID for images.
    String imageRedactTemplateId = "your-image-redact-template-id";
    deidentifyCloudStorage(
        projectId,
        gcsPath,
        tableId,
        datasetId,
        outputDirectory,
        deidentifyTemplateId,
        structuredDeidentifyTemplateId,
        imageRedactTemplateId);
  }

  public static void deidentifyCloudStorage(
      String projectId,
      String gcsPath,
      String tableId,
      String datasetId,
      String outputDirectory,
      String deidentifyTemplateId,
      String structuredDeidentifyTemplateId,
      String imageRedactTemplateId)
      throws IOException, InterruptedException {

    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Set path in Cloud Storage.
      CloudStorageOptions cloudStorageOptions =
          CloudStorageOptions.newBuilder()
              .setFileSet(CloudStorageOptions.FileSet.newBuilder().setUrl(gcsPath))
              .build();

      // Set storage config indicating the type of cloud storage.
      StorageConfig storageConfig =
          StorageConfig.newBuilder().setCloudStorageOptions(cloudStorageOptions).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"PERSON_NAME", "EMAIL_ADDRESS"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();

      // Types of files to include for de-identification.
      List<FileType> fileTypesToTransform =
          Arrays.asList(
              FileType.valueOf("IMAGE"), FileType.valueOf("CSV"), FileType.valueOf("TEXT_FILE"));

      // Specify the big query table to store the transformation details.
      BigQueryTable table =
          BigQueryTable.newBuilder()
              .setProjectId(projectId)
              .setTableId(tableId)
              .setDatasetId(datasetId)
              .build();

      TransformationDetailsStorageConfig transformationDetailsStorageConfig =
          TransformationDetailsStorageConfig.newBuilder().setTable(table).build();

      // Specify the de-identify template used for the transformation.
      TransformationConfig transformationConfig =
          TransformationConfig.newBuilder()
              .setDeidentifyTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, deidentifyTemplateId).toString())
              .setImageRedactTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, imageRedactTemplateId).toString())
              .setStructuredDeidentifyTemplate(
                  ProjectDeidentifyTemplateName.of(projectId, structuredDeidentifyTemplateId)
                      .toString())
              .build();

      Action.Deidentify deidentify =
          Action.Deidentify.newBuilder()
              .setCloudStorageOutput(outputDirectory)
              .setTransformationConfig(transformationConfig)
              .setTransformationDetailsStorageConfig(transformationDetailsStorageConfig)
              .addAllFileTypesToTransform(fileTypesToTransform)
              .build();

      Action action = Action.newBuilder().setDeidentify(deidentify).build();

      // Configure the long-running job we want the service to perform.
      InspectJobConfig inspectJobConfig =
          InspectJobConfig.newBuilder()
              .setInspectConfig(inspectConfig)
              .setStorageConfig(storageConfig)
              .addActions(action)
              .build();

      // Construct the job creation request to be sent by the client.
      CreateDlpJobRequest createDlpJobRequest =
          CreateDlpJobRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectJob(inspectJobConfig)
              .build();

      // Send the job creation request.
      DlpJob response = dlp.createDlpJob(createDlpJobRequest);

      // Get the current time.
      long startTime = System.currentTimeMillis();

      // Check if the job state is DONE.
      while (response.getState() != DlpJob.JobState.DONE) {
        // Sleep for 30 second.
        Thread.sleep(30000);

        // Get the updated job status.
        response = dlp.getDlpJob(response.getName());

        // Check if the timeout duration has exceeded.
        long elapsedTime = System.currentTimeMillis() - startTime;
        if (TimeUnit.MILLISECONDS.toMinutes(elapsedTime) >= TIMEOUT_MINUTES) {
          System.out.printf("Job did not complete within %d minutes.%n", TIMEOUT_MINUTES);
          break;
        }
      }
      // Print the results.
      System.out.println("Job status: " + response.getState());
      System.out.println("Job name: " + response.getName());
      InspectDataSourceDetails.Result result = response.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }
}

Node.js

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.

// Imports the Google Cloud client library
const DLP = require('@google-cloud/dlp');
// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The Cloud Storage directory that needs to be inspected
// const inputDirectory = 'your-google-cloud-storage-path';

// The ID of the dataset to inspect, e.g. 'my_dataset'
// const datasetId = 'my_dataset';

// The ID of the table to inspect, e.g. 'my_table'
// const tableId = 'my_table';

// The Cloud Storage directory that will be used to store the de-identified files
// const outputDirectory = 'your-output-directory';

// The full resource name of the default de-identify template
// const deidentifyTemplateId = 'your-deidentify-template-id';

// The full resource name of the de-identify template for structured files
// const structuredDeidentifyTemplateId = 'your-structured-deidentify-template-id';

// The full resource name of the image redaction template for images
// const imageRedactTemplateId = 'your-image-redact-template-id';

async function deidentifyCloudStorage() {
  // Specify storage configuration that uses file set.
  const storageConfig = {
    cloudStorageOptions: {
      fileSet: {
        url: inputDirectory,
      },
    },
  };

  // Specify the type of info the inspection will look for.
  const infoTypes = [{name: 'PERSON_NAME'}, {name: 'EMAIL_ADDRESS'}];

  // Construct inspect configuration
  const inspectConfig = {
    infoTypes: infoTypes,
    includeQuote: true,
  };

  // Types of files to include for de-identification.
  const fileTypesToTransform = [
    {fileType: 'IMAGE'},
    {fileType: 'CSV'},
    {fileType: 'TEXT_FILE'},
  ];

  // Specify the big query table to store the transformation details.
  const transformationDetailsStorageConfig = {
    table: {
      projectId: projectId,
      tableId: tableId,
      datasetId: datasetId,
    },
  };

  // Specify the de-identify template used for the transformation.
  const transformationConfig = {
    deidentifyTemplate: deidentifyTemplateId,
    structuredDeidentifyTemplate: structuredDeidentifyTemplateId,
    imageRedactTemplate: imageRedactTemplateId,
  };

  // Construct action to de-identify sensitive data.
  const action = {
    deidentify: {
      cloudStorageOutput: outputDirectory,
      transformationConfig: transformationConfig,
      transformationDetailsStorageConfig: transformationDetailsStorageConfig,
      fileTypes: fileTypesToTransform,
    },
  };

  // Construct the inspect job configuration.
  const inspectJobConfig = {
    inspectConfig: inspectConfig,
    storageConfig: storageConfig,
    actions: [action],
  };

  // Construct the job creation request to be sent by the client.
  const createDlpJobRequest = {
    parent: `projects/${projectId}/locations/global`,
    inspectJob: inspectJobConfig,
  };
  // Send the job creation request and process the response.
  const [response] = await dlp.createDlpJob(createDlpJobRequest);
  const jobName = response.name;

  // Waiting for a maximum of 15 minutes for the job to get complete.
  let job;
  let numOfAttempts = 30;
  while (numOfAttempts > 0) {
    // Fetch DLP Job status
    [job] = await dlp.getDlpJob({name: jobName});

    // Check if the job has completed.
    if (job.state === 'DONE') {
      break;
    }
    if (job.state === 'FAILED') {
      console.log('Job Failed, Please check the configuration.');
      return;
    }
    // Sleep for a short duration before checking the job status again.
    await new Promise(resolve => {
      setTimeout(() => resolve(), 30000);
    });
    numOfAttempts -= 1;
  }

  // Print out the results.
  const infoTypeStats = job.inspectDetails.result.infoTypeStats;
  if (infoTypeStats.length > 0) {
    infoTypeStats.forEach(infoTypeStat => {
      console.log(
        `  Found ${infoTypeStat.count} instance(s) of infoType ${infoTypeStat.infoType.name}.`
      );
    });
  } else {
    console.log('No findings.');
  }
}
await deidentifyCloudStorage();

PHP

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.

use Google\Cloud\Dlp\V2\Action;
use Google\Cloud\Dlp\V2\Action\Deidentify;
use Google\Cloud\Dlp\V2\BigQueryTable;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\CloudStorageOptions;
use Google\Cloud\Dlp\V2\CloudStorageOptions\FileSet;
use Google\Cloud\Dlp\V2\CreateDlpJobRequest;
use Google\Cloud\Dlp\V2\DlpJob\JobState;
use Google\Cloud\Dlp\V2\FileType;
use Google\Cloud\Dlp\V2\GetDlpJobRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectJobConfig;
use Google\Cloud\Dlp\V2\StorageConfig;
use Google\Cloud\Dlp\V2\TransformationConfig;
use Google\Cloud\Dlp\V2\TransformationDetailsStorageConfig;

/**
 * De-identify sensitive data stored in Cloud Storage using the API.
 * Create an inspection job that has a de-identification action.
 *
 * @param string $callingProjectId                  The project ID to run the API call under.
 * @param string $inputgcsPath                       The Cloud Storage directory that you want to de-identify.
 * @param string $outgcsPath                        The Cloud Storage directory where you want to store the
 *                                                  de-identified files.
 * @param string $deidentifyTemplateName            The full resource name of the default de-identify template — for
 *                                                  unstructured and structured files — if you created one. This value
 *                                                  must be in the format
 *                                                  `projects/projectName/(locations/locationId)/deidentifyTemplates/templateName`.
 * @param string $structuredDeidentifyTemplateName  The full resource name of the de-identify template for structured
 *                                                  files if you created one. This value must be in the format
 *                                                  `projects/projectName/(locations/locationId)/deidentifyTemplates/templateName`.
 * @param string $imageRedactTemplateName           The full resource name of the image redaction template for images if
 *                                                  you created one. This value must be in the format
 *                                                  `projects/projectName/(locations/locationId)/deidentifyTemplates/templateName`.
 * @param string $datasetId                         The ID of the BigQuery dataset where you want to store
 *                                                  the transformation details. If you don't provide a table ID, the
 *                                                  system automatically creates one.
 * @param string $tableId                           The ID of the BigQuery table where you want to store the
 *                                                  transformation details.
 */
function deidentify_cloud_storage(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $inputgcsPath = 'gs://YOUR_GOOGLE_STORAGE_BUCKET',
    string $outgcsPath = 'gs://YOUR_GOOGLE_STORAGE_BUCKET',
    string $deidentifyTemplateName = 'YOUR_DEIDENTIFY_TEMPLATE_NAME',
    string $structuredDeidentifyTemplateName = 'YOUR_STRUCTURED_DEIDENTIFY_TEMPLATE_NAME',
    string $imageRedactTemplateName = 'YOUR_IMAGE_REDACT_DEIDENTIFY_TEMPLATE_NAME',
    string $datasetId = 'YOUR_DATASET_ID',
    string $tableId = 'YOUR_TABLE_ID'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$callingProjectId/locations/global";

    // Specify the GCS Path to be de-identify.
    $cloudStorageOptions = (new CloudStorageOptions())
        ->setFileSet((new FileSet())
            ->setUrl($inputgcsPath));
    $storageConfig = (new StorageConfig())
        ->setCloudStorageOptions(($cloudStorageOptions));

    // Specify the type of info the inspection will look for.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes([
            (new InfoType())->setName('PERSON_NAME'),
            (new InfoType())->setName('EMAIL_ADDRESS')
        ]);

    // Specify the big query table to store the transformation details.
    $transformationDetailsStorageConfig = (new TransformationDetailsStorageConfig())
        ->setTable((new BigQueryTable())
            ->setProjectId($callingProjectId)
            ->setDatasetId($datasetId)
            ->setTableId($tableId));

    // Specify the de-identify template used for the transformation.
    $transformationConfig = (new TransformationConfig())
        ->setDeidentifyTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $deidentifyTemplateName)
        )
        ->setStructuredDeidentifyTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $structuredDeidentifyTemplateName)
        )
        ->setImageRedactTemplate(
            DlpServiceClient::projectDeidentifyTemplateName($callingProjectId, $imageRedactTemplateName)
        );

    $deidentify = (new Deidentify())
        ->setCloudStorageOutput($outgcsPath)
        ->setTransformationConfig($transformationConfig)
        ->setTransformationDetailsStorageConfig($transformationDetailsStorageConfig)
        ->setFileTypesToTransform([FileType::TEXT_FILE, FileType::IMAGE, FileType::CSV]);

    $action = (new Action())
        ->setDeidentify($deidentify);

    // Configure the inspection job we want the service to perform.
    $inspectJobConfig = (new InspectJobConfig())
        ->setInspectConfig($inspectConfig)
        ->setStorageConfig($storageConfig)
        ->setActions([$action]);

    // Send the job creation request and process the response.
    $createDlpJobRequest = (new CreateDlpJobRequest())
        ->setParent($parent)
        ->setInspectJob($inspectJobConfig);
    $job = $dlp->createDlpJob($createDlpJobRequest);

    $numOfAttempts = 10;
    do {
        printf('Waiting for job to complete' . PHP_EOL);
        sleep(30);
        $getDlpJobRequest = (new GetDlpJobRequest())
            ->setName($job->getName());
        $job = $dlp->getDlpJob($getDlpJobRequest);
        if ($job->getState() == JobState::DONE) {
            break;
        }
        $numOfAttempts--;
    } while ($numOfAttempts > 0);

    // Print finding counts.
    printf('Job %s status: %s' . PHP_EOL, $job->getName(), JobState::name($job->getState()));
    switch ($job->getState()) {
        case JobState::DONE:
            $infoTypeStats = $job->getInspectDetails()->getResult()->getInfoTypeStats();
            if (count($infoTypeStats) === 0) {
                printf('No findings.' . PHP_EOL);
            } else {
                foreach ($infoTypeStats as $infoTypeStat) {
                    printf(
                        '  Found %s instance(s) of infoType %s' . PHP_EOL,
                        $infoTypeStat->getCount(),
                        $infoTypeStat->getInfoType()->getName()
                    );
                }
            }
            break;
        case JobState::FAILED:
            printf('Job %s had errors:' . PHP_EOL, $job->getName());
            $errors = $job->getErrors();
            foreach ($errors as $error) {
                var_dump($error->getDetails());
            }
            break;
        case JobState::PENDING:
            printf('Job has not completed. Consider a longer timeout or an asynchronous execution model' . PHP_EOL);
            break;
        default:
            printf('Unexpected job state. Most likely, the job is either running or has not yet started.');
    }
}

Python

Per scoprire come installare e utilizzare la libreria client per Sensitive Data Protection, vedi Librerie client di Sensitive Data Protection.

import time
from typing import List

import google.cloud.dlp


def deidentify_cloud_storage(
    project: str,
    input_gcs_bucket: str,
    output_gcs_bucket: str,
    info_types: List[str],
    deid_template_id: str,
    structured_deid_template_id: str,
    image_redact_template_id: str,
    dataset_id: str,
    table_id: str,
    timeout: int = 300,
) -> None:
    """
    Uses the Data Loss Prevention API to de-identify files in a Google Cloud
    Storage directory.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_gcs_bucket: The name of google cloud storage bucket to inspect.
        output_gcs_bucket: The name of google cloud storage bucket where
            de-identified files would be stored.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        deid_template_id: The name of the de-identify template for
            unstructured and structured files.
        structured_deid_template_id: The name of the de-identify template
            for structured files.
        image_redact_template_id: The name of the image redaction template
            for images.
        dataset_id: The identifier of the BigQuery dataset where transformation
            details would be stored.
        table_id: The identifier of the BigQuery table where transformation
            details would be stored.
        timeout: The number of seconds to wait for a response from the API.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the configuration dictionary.
    # Specify the type of info the inspection will look for.
    # See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct cloud_storage_options dictionary with the bucket's URL.
    storage_config = {
        "cloud_storage_options": {"file_set": {"url": f"gs://{input_gcs_bucket}"}}
    }

    # Specify the big query table to store the transformation details.
    big_query_table = {
        "project_id": project,
        "dataset_id": dataset_id,
        "table_id": table_id,
    }

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct Transformation Configuration with de-identify Templates used
    # for transformation.
    transformation_config = {
        "deidentify_template": f"{parent}/deidentifyTemplates/{deid_template_id}",
        "structured_deidentify_template": f"{parent}/deidentifyTemplates/{structured_deid_template_id}",
        "image_redact_template": f"{parent}/deidentifyTemplates/{image_redact_template_id}",
    }

    # Tell the API where to send notification when the job is completed.
    actions = [
        {
            "deidentify": {
                "cloud_storage_output": f"gs://{output_gcs_bucket}",
                "transformation_config": transformation_config,
                "transformation_details_storage_config": {"table": big_query_table},
                "file_types_to_transform": ["IMAGE", "CSV", "TEXT_FILE"],
            }
        }
    ]

    # Construct the job definition.
    inspect_job = {
        "inspect_config": inspect_config,
        "storage_config": storage_config,
        "actions": actions,
    }

    # Call the API.
    response = dlp.create_dlp_job(
        request={
            "parent": parent,
            "inspect_job": inspect_job,
        }
    )

    job_name = response.name
    print(f"Inspection Job started : {job_name}")

    # Waiting for the job to get completed.
    job = dlp.get_dlp_job(request={"name": job_name})
    # Since the sleep time is kept as 30s, number of calls would be timeout/30.
    no_of_attempts = timeout // 30
    while no_of_attempts != 0:
        # Check if the job has completed.
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.DONE:
            break
        if job.state == google.cloud.dlp_v2.DlpJob.JobState.FAILED:
            print("Job Failed, Please check the configuration.")
            break

        # Sleep for a short duration before checking the job status again.
        time.sleep(30)
        no_of_attempts -= 1

        # Get DLP job status.
        job = dlp.get_dlp_job(request={"name": job_name})

    if job.state != google.cloud.dlp_v2.DlpJob.JobState.DONE:
        print(f"Job did not complete within {timeout} minutes.")
        return

    # Print out the results.
    print(f"Job name: {job.name}")
    result = job.inspect_details.result
    print(f"Processed Bytes: {result.processed_bytes}")
    if result.info_type_stats:
        for stats in result.info_type_stats:
            print(f"Info type: {stats.info_type.name}")
            print(f"Count: {stats.count}")
    else:
        print("No findings.")

REST

Input JSON

{
   "inspect_job": {
     "storage_config": {
       "cloud_storage_options": {
         "file_set": {
           "url": "INPUT_DIRECTORY"
         }
       }
     },
     "inspect_config": {
       "info_types": [
         {
           "name": "PERSON_NAME"
         }
       ]
     },
     "actions": {
       "deidentify": {
         "cloud_storage_output": "OUTPUT_DIRECTORY",
         "transformation_config": {
           "deidentify_template": "DEIDENTIFY_TEMPLATE_NAME",
           "structured_deidentify_template": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
           "image_redact_template": "IMAGE_REDACTION_TEMPLATE_NAME"
         },
         "transformation_details_storage_config": {
           "table": {
             "project_id": "TRANSFORMATION_DETAILS_PROJECT_ID",
             "dataset_id": "TRANSFORMATION_DETAILS_DATASET_ID",
             "table_id": "TRANSFORMATION_DETAILS_TABLE_ID"
           }
         },
         "fileTypesToTransform": ["IMAGE","CSV", "TEXT_FILE"]
       }
     }
   }
 }

Sostituisci quanto segue:

PROJECT_ID: l'ID del progetto in cui vuoi archiviare il job di ispezione.
INPUT_DIRECTORY: la directory Cloud Storage che vuoi esaminare, ad esempio gs://input-bucket/folder1/folder1a. Se l'URL termina con una barra finale, le sottodirectory all'interno di INPUT_DIRECTORY non vengono analizzate.
OUTPUT_DIRECTORY: la directory Cloud Storage in cui vuoi archiviare i file anonimizzati. Questa directory non deve trovarsi nello stesso bucket Cloud Storage di INPUT_DIRECTORY.
DEIDENTIFY_TEMPLATE_NAME: il nome completo della risorsa del modello di deidentificazione predefinito per i file non strutturati e strutturati, se ne hai creato uno. Questo valore deve essere nel formato projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
STRUCTURED_DEIDENTIFY_TEMPLATE_NAME: il nome completo della risorsa del modello di deidentificazione per i file strutturati, se ne hai creato uno. Questo valore deve essere nel formato projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
IMAGE_REDACTION_TEMPLATE_NAME: il nome completo della risorsa del modello di oscuramento delle immagini per le immagini, se ne hai creato uno. Questo valore deve essere nel formato projects/projectName/(locations/locationId)/deidentifyTemplates/templateName.
TRANSFORMATION_DETAILS_PROJECT_ID: l'ID del progetto in cui vuoi archiviare i dettagli della trasformazione.
TRANSFORMATION_DETAILS_DATASET_ID: l'ID del set di dati BigQuery in cui vuoi archiviare i dettagli della trasformazione. Se non fornisci un ID tabella, il sistema ne crea uno automaticamente.
TRANSFORMATION_DETAILS_TABLE_ID: l'ID della tabella BigQuery in cui vuoi archiviare i dettagli della trasformazione.

Tieni presente i seguenti oggetti:

inspectJob: l'oggetto di configurazione del job (DlpJob). Questo oggetto contiene la configurazione per le fasi di ispezione e deidentificazione.
storageConfig: la posizione dei contenuti da esaminare (StorageConfig). Questo esempio specifica un bucket Cloud Storage CloudStorageOptions.
inspectConfig: Informazioni sui dati sensibili che vuoi ispezionare per (InspectConfig). Questo esempio ispeziona i contenuti corrispondenti all'infoType integrato PERSON_NAME.
actions: le azioni da intraprendere dopo il completamento della parte di ispezione del lavoro (Action).
deidentify: se specifichi questa azione, Sensitive Data Protection deidentifica i dati sensibili corrispondenti in base alla configurazione specificata all'interno di (Deidentify).
cloud_storage_output: specifica l'URL della directory Cloud Storage che vuoi esaminare.
transformation_config: specifica in che modo Sensitive Data Protection deve anonimizzare i dati sensibili in file strutturati, non strutturati e immagini (TransformationConfig).

Se non includi un oggetto TransformationConfig, Sensitive Data Protection sostituisce i dati sensibili nel testo con il relativo infoType. Nelle immagini, copre i dati sensibili con un riquadro nero.
transformation_details_storage_config: specifica che Sensitive Data Protection deve archiviare i metadati relativi a ogni trasformazione eseguita per questo job. Inoltre, specifica la posizione e il nome della tabella in cui Sensitive Data Protection deve archiviare i metadati (TransformationDetailsStorageConfig).
fileTypesToTransform: limita l'operazione di anonimizzazione solo ai tipi di file che elenchi. Se non imposti questo campo, tutti i tipi di file supportati inclusi nell'operazione di ispezione vengono inclusi anche nell'operazione di deidentificazione. In questo esempio, Sensitive Data Protection de-identifica solo i file di immagine, CSV e di testo, anche se hai configurato DlpJob per esaminare tutti i tipi di file supportati.

Crea un job di ispezione tramite l'API REST

Per creare il job di ispezione (DlpJob), invia una richiesta projects.dlpJobs.create. Per inviare la richiesta utilizzando cURL, salva l'esempio REST precedente come file JSON ed esegui il comando seguente:

curl -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "X-Goog-User-Project: PROJECT_ID" \
https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs \
-d @PATH_TO_JSON_FILE

Sostituisci quanto segue:

PROJECT_ID: l'ID del progetto in cui hai archiviato DlpJob.
PATH_TO_JSON_FILE: il percorso del file JSON contenente il corpo della richiesta.

Sensitive Data Protection restituisce l'identificatore del nuovo DlpJob, il suo stato e uno snapshot della configurazione di ispezione che hai impostato.

{
  "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
  "type": "INSPECT_JOB",
  "state": "PENDING",
  ...
}

Recupera i risultati del job di ispezione

Per recuperare i risultati di DlpJob, invia una richiesta projects.dlpJobs.get:

curl -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "X-Goog-User-Project: PROJECT_ID" \
https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID

Sostituisci quanto segue:

PROJECT_ID: l'ID del progetto in cui hai archiviato DlpJob.
JOB_ID: l'ID del job restituito quando hai creato DlpJob.

Se l'operazione è completata, riceverai una risposta simile alla seguente:

{
  "name": "projects/PROJECT_ID/dlpJobs/JOB_ID",
  "type": "INSPECT_JOB",
  "state": "DONE",
  "inspectDetails": {
    "requestedOptions": {
      "snapshotInspectTemplate": {},
      "jobConfig": {
        "storageConfig": {
          "cloudStorageOptions": {
            "fileSet": {
              "url": "INPUT_DIRECTORY"
            }
          }
        },
        "inspectConfig": {
          "infoTypes": [
            {
              "name": "PERSON_NAME"
            }
          ],
          "limits": {}
        },
        "actions": [
          {
            "deidentify": {
              "transformationDetailsStorageConfig": {
                "table": {
                  "projectId": "TRANSFORMATION_DETAILS_PROJECT_ID",
                  "datasetId": "TRANSFORMATION_DETAILS_DATASET_ID",
                  "tableId": "TRANSFORMATION_DETAILS_TABLE_ID"
                }
              },
              "transformationConfig": {
                "deidentifyTemplate": "DEIDENTIFY_TEMPLATE_NAME",
                "structuredDeidentifyTemplate": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
                "imageRedactTemplate": "IMAGE_REDACTION_TEMPLATE_NAME"
              },
              "fileTypesToTransform": [
                "IMAGE",
                "CSV",
                "TEXT_FILE"
              ],
              "cloudStorageOutput": "OUTPUT_DIRECTORY"
            }
          }
        ]
      }
    },
    "result": {
      "processedBytes": "25242",
      "totalEstimatedBytes": "25242",
      "infoTypeStats": [
        {
          "infoType": {
            "name": "PERSON_NAME"
          },
          "count": "114"
        }
      ]
    }
  },
  "createTime": "2022-06-09T23:00:53.380Z",
  "startTime": "2022-06-09T23:01:27.986383Z",
  "endTime": "2022-06-09T23:02:00.443536Z",
  "actionDetails": [
    {
      "deidentifyDetails": {
        "requestedOptions": {
          "snapshotDeidentifyTemplate": {
            "name": "DEIDENTIFY_TEMPLATE_NAME",
            "createTime": "2022-06-09T17:46:34.208923Z",
            "updateTime": "2022-06-09T17:46:34.208923Z",
            "deidentifyConfig": {
              "infoTypeTransformations": {
                "transformations": [
                  {
                    "primitiveTransformation": {
                      "characterMaskConfig": {
                        "maskingCharacter": "*",
                        "numberToMask": 25
                      }
                    }
                  }
                ]
              }
            },
            "locationId": "global"
          },
          "snapshotStructuredDeidentifyTemplate": {
            "name": "STRUCTURED_DEIDENTIFY_TEMPLATE_NAME",
            "createTime": "2022-06-09T20:51:12.411456Z",
            "updateTime": "2022-06-09T21:07:53.633149Z",
            "deidentifyConfig": {
              "recordTransformations": {
                "fieldTransformations": [
                  {
                    "fields": [
                      {
                        "name": "Name"
                      }
                    ],
                    "primitiveTransformation": {
                      "replaceConfig": {
                        "newValue": {
                          "stringValue": "[redacted]"
                        }
                      }
                    }
                  }
                ]
              }
            },
            "locationId": "global"
          },
          "snapshotImageRedactTemplate": {
            "name": "IMAGE_REDACTION_TEMPLATE_NAME",
            "createTime": "2022-06-09T20:52:25.453564Z",
            "updateTime": "2022-06-09T20:52:25.453564Z",
            "deidentifyConfig": {},
            "locationId": "global"
          }
        },
        "deidentifyStats": {
          "transformedBytes": "3972",
          "transformationCount": "110"
        }
      }
    }
  ],
  "locationId": "global"
}

Passaggi successivi

Scopri di più sulla procedura di anonimizzazione dei dati archiviati.
Scopri come anonimizzare i dati nell'archivio utilizzando la console Google Cloud .
Segui il codelab Creazione di una copia anonimizzata dei dati in Cloud Storage.
Scopri di più sulle trasformazioni di anonimizzazione.
Scopri come ispezionare lo spazio di archiviazione per individuare dati sensibili.