Questa pagina è stata tradotta dall'API Cloud Translation.

Elaborazione di immagini satellitari Landsat con le GPU

Questo tutorial mostra come utilizzare le GPU su Dataflow per elaborare le immagini satellitari Landsat 8 e visualizzarle come file JPEG. Il tutorial si basa sull'esempio Elaborazione di immagini satellitari Landsat con le GPU.

Obiettivi

Crea un'immagine Docker per Dataflow che includa TensorFlow con supporto GPU.
Esegui un job Dataflow con le GPU.

Costi

Questo tutorial utilizza componenti fatturabili di Google Cloud, tra cui:

Cloud Storage
Dataflow
Artifact Registry

Utilizza il calcolatore prezzi per generare una stima dei costi in base all'utilizzo previsto.

Prima di iniziare

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the Dataflow, Cloud Build, and Artifact Registry APIs:

gcloud services enable dataflow cloudbuild.googleapis.com artifactregistry.googleapis.com

If you're using a local shell, then create local authentication credentials for your user account:

gcloud auth application-default login

You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace PROJECT_ID with your project ID.
Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
Replace ROLE with each individual role.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the Dataflow, Cloud Build, and Artifact Registry APIs:

gcloud services enable dataflow cloudbuild.googleapis.com artifactregistry.googleapis.com

If you're using a local shell, then create local authentication credentials for your user account:

gcloud auth application-default login

You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace PROJECT_ID with your project ID.
Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
Replace ROLE with each individual role.

Concedi ruoli al account di servizio Compute Engine predefinito. Esegui il seguente comando una volta per ciascuno dei seguenti ruoli IAM: roles/dataflow.admin, roles/dataflow.worker, roles/bigquery.dataEditor, roles/pubsub.editor, roles/storage.objectAdmin, e roles/artifactregistry.reader.
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role=SERVICE_ACCOUNT_ROLE
```
- Sostituisci PROJECT_ID con l'ID progetto.
- Sostituisci PROJECT_NUMBER con il numero del tuo progetto. Per trovare il numero di progetto, consulta Identificare i progetti.
- Sostituisci SERVICE_ACCOUNT_ROLE con ogni singolo ruolo.
Per archiviare i file immagine JPEG di output di questo tutorial, crea un bucket Cloud Storage:
1. In the Google Cloud console, go to the Cloud Storage Buckets page.
  Go to Buckets
2. Click Create.
3. On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
  1. For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
  2. In the Choose where to store your data section, do the following:
    1. Select a Location type.
    2. Choose a location where your bucket's data is permanently stored from the Location type drop-down menu.
      - If you select the dual-region location type, you can also choose to enable turbo replication by using the relevant checkbox.
    3. To set up cross-bucket replication, select Add cross-bucket replication via Storage Transfer Service and follow these steps:
      Set up cross-bucket replication
      
      In the Bucket menu, select a bucket.
      
      In the Replication settings section, click Configure to configure settings for the replication job.
      
      The Configure cross-bucket replication pane appears.
      
      To filter objects to replicate by object name prefix, enter a prefix that you want to include or exclude objects from, then click Add a prefix.
      
      To set a storage class for the replicated objects, select a storage class from the Storage class menu. If you skip this step, the replicated objects will use the destination bucket's storage class by default.
      
      Click Done.
  3. In the Choose how to store your data section, do the following:
    1. In the Set a default class section, select the following: Standard.
    2. To enable hierarchical namespace, in the Optimize storage for data-intensive workloads section, select Enable hierarchical namespace on this bucket.
      Note: You cannot enable hierarchical namespace in existing buckets.
  4. In the Choose how to control access to objects section, select whether or not your bucket enforces public access prevention, and select an access control method for your bucket's objects.
    Note: You cannot change the Prevent public access setting if this setting is enforced at an organization policy.
  5. In the Choose how to protect object data section, do the following:
    - Select any of the options under Data protection that you want to set for your bucket.
      - To enable soft delete, click the Soft delete policy (For data recovery) checkbox, and specify the number of days you want to retain objects after deletion.
      - To set Object Versioning, click the Object versioning (For version control) checkbox, and specify the maximum number of versions per object and the number of days after which the noncurrent versions expire.
      - To enable the retention policy on objects and buckets, click the Retention (For compliance) checkbox, and then do the following:
        
        To enable Object Retention Lock, click the Enable object retention checkbox.
        
        To enable Bucket Lock, click the Set bucket retention policy checkbox, and choose a unit of time and a length of time for your retention period.
    - To choose how your object data will be encrypted, expand the Data encryption section (), and select a Data encryption method.
4. Click Create.

Prepara l'ambiente di lavoro

Scarica i file di base, quindi crea il repository Artifact Registry.

Scaricare i file iniziali

Scarica i file iniziali e poi cambia directory.

Clona il repository python-docs-samples.

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

Vai alla directory del codice di esempio.

cd python-docs-samples/dataflow/gpu-examples/tensorflow-landsat

Configura Artifact Registry

Crea un repository Artifact Registry per poter caricare gli artefatti. Ogni repository può contenere artefatti per un singolo formato supportato.

Tutti i contenuti del repository sono criptati utilizzando Google-owned and Google-managed encryption keys o chiavi di crittografia gestite dal cliente. Artifact Registry utilizza Google-owned and Google-managed encryption keys per impostazione predefinita e non è richiesta alcuna configurazione per questa opzione.

Devi disporre almeno dell'accesso di scrittura di Artifact Registry al repository.

Esegui questo comando per creare un nuovo repository. Il comando utilizza il flag --async e viene restituito immediatamente, senza attendere il completamento dell'operazione in corso.

gcloud artifacts repositories create REPOSITORY \
    --repository-format=docker \
    --location=LOCATION \
    --async

Sostituisci REPOSITORY con un nome per il repository. Per ogni posizione del repository in un progetto, i nomi dei repository devono essere univoci.

Prima di eseguire il push o il pull delle immagini, configura Docker per autenticare le richieste per Artifact Registry. Per configurare l'autenticazione nei repository Docker, esegui questo comando:

gcloud auth configure-docker LOCATION-docker.pkg.dev

Il comando aggiorna la configurazione Docker. Ora puoi connetterti con Artifact Registry nel tuo progetto Google Cloud per eseguire il push delle immagini.

Crea l'immagine Docker

Cloud Build ti consente di creare un'immagine Docker utilizzando un Dockerfile e salvarla in Artifact Registry, dove l'immagine è accessibile ad altri prodottiGoogle Cloud .

Crea l'immagine container utilizzando il file di configurazione build.yaml.

gcloud builds submit --config build.yaml

Esegui il job Dataflow con le GPU

Il seguente blocco di codice mostra come avviare questa pipeline Dataflow con le GPU.

Eseguiamo la pipeline Dataflow utilizzando il file di configurazione run.yaml.

export PROJECT=PROJECT_NAME
export BUCKET=BUCKET_NAME

export JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
export OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
export REGION="us-central1"
export GPU_TYPE="nvidia-tesla-t4"

gcloud builds submit \
    --config run.yaml \
    --substitutions _JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE \
    --no-source

Sostituisci quanto segue:

PROJECT_NAME: il Google Cloud nome del progetto
BUCKET_NAME: il nome del bucket Cloud Storage (senza il prefisso gs://)

Dopo aver eseguito questa pipeline, attendi il completamento del comando. Se esci dalla shell, potresti perdere le variabili di ambiente che hai impostato.

Per evitare di condividere la GPU tra più processi worker, questo esempio utilizza un tipo di macchina con 1 vCPU. I requisiti di memoria della pipeline vengono soddisfatti utilizzando 13 GB di memoria estesa. Per saperne di più, leggi GPU e parallelismo dei worker.

Visualizza i tuoi risultati

La pipeline in tensorflow-landsat/main.py elabora le immagini satellitari Landsat 8 e le esegue il rendering come file JPEG. Per visualizzare questi file, segui questi passaggi.

Elenca i file JPEG di output con i dettagli utilizzando Google Cloud CLI.

gcloud storage ls "gs://$BUCKET/samples/dataflow/landsat/" --long --readable-sizes

Copia i file nella directory locale.

mkdir outputs
gcloud storage cp "gs://$BUCKET/samples/dataflow/landsat/*" outputs/

Apri questi file immagine con il visualizzatore di immagini che preferisci.

Esegui la pulizia

Per evitare che al tuo account Google Cloud vengano addebitati costi relativi alle risorse utilizzate in questo tutorial, elimina il progetto che contiene le risorse oppure mantieni il progetto ed elimina le singole risorse.

Elimina il progetto

Il modo più semplice per eliminare la fatturazione è quello di eliminare il progetto creato per il tutorial.

Per eliminare il progetto:

Attenzione: l'eliminazione di un progetto ha i seguenti effetti:

L'intero contenuto del progetto viene eliminato. Se hai utilizzato un progetto esistente per le attività descritte in questo documento, quando lo elimini, elimini anche tutto il lavoro che hai svolto nel progetto.
Gli ID progetto personalizzati non sono più disponibili. Quando hai creato questo progetto, potresti aver creato un ID progetto personalizzato che vuoi utilizzare in futuro. Per conservare gli URL che utilizzano l'ID progetto, ad esempio un URL appspot.com, elimina le risorse selezionate all'interno del progetto anziché eliminare l'intero progetto.

Se intendi esplorare più architetture, tutorial o guide rapide, puoi riutilizzare i progetti ed evitare così di superare i limiti di quota.

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Passaggi successivi

Consulta un esempio di TensorFlow con GPU abilitata minimale
Guarda un esempio PyTorch minimo abilitato per la GPU
Scopri di più sul supporto delle GPU su Dataflow.
Esamina le attività per Utilizzo delle GPU.
Esplora architetture di riferimento, diagrammi e best practice su Google Cloud. Consulta il nostro Cloud Architecture Center.