GPU を使用した Landsat 衛星画像の処理

このチュートリアルでは、Dataflow で GPU を使用して Landsat 8 の衛星画像を処理し、それを JPEG ファイルとしてレンダリングする方法を説明します。このチュートリアルは、GPU を使用した Landsat 衛星画像の処理の例に基づいています。

目標

GPU サポートがある TensorFlow を含む Dataflow の Docker イメージをビルドする。
GPU を使用して Dataflow ジョブを実行する。

費用

このチュートリアルでは、以下を含む、 Google Cloudの課金対象となるコンポーネントを使用します。

Cloud Storage
Dataflow
Artifact Registry

料金計算ツールを使うと、予想使用量に基づいて費用の見積もりを出すことができます。

始める前に

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the Dataflow, Cloud Build, and Artifact Registry APIs:

gcloud services enable dataflow cloudbuild.googleapis.com artifactregistry.googleapis.com

If you're using a local shell, then create local authentication credentials for your user account:

gcloud auth application-default login

You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace PROJECT_ID with your project ID.
Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
Replace ROLE with each individual role.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

Create or select a Google Cloud project.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Make sure that billing is enabled for your Google Cloud project.

Enable the Dataflow, Cloud Build, and Artifact Registry APIs:

gcloud services enable dataflow cloudbuild.googleapis.com artifactregistry.googleapis.com

If you're using a local shell, then create local authentication credentials for your user account:

gcloud auth application-default login

You don't need to do this if you're using Cloud Shell.

If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace PROJECT_ID with your project ID.
Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
Replace ROLE with each individual role.

Compute Engine のデフォルトサービスアカウントにロールを付与します。IAM ロール roles/dataflow.admin、roles/dataflow.worker、roles/bigquery.dataEditor、roles/pubsub.editor、roles/storage.objectAdmin、roles/artifactregistry.reader ごとに次のコマンドを 1 回実行します。
```
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role=SERVICE_ACCOUNT_ROLE
```
- PROJECT_ID は、実際のプロジェクト ID に置き換えます。
- PROJECT_NUMBER は、使用するプロジェクト番号に置き換えます。プロジェクト番号の確認方法については、プロジェクトを特定するをご覧ください。
- SERVICE_ACCOUNT_ROLE は、個々のロールに置き換えます。
To store the output JPEG image files from this tutorial, create a Cloud Storage bucket:
1. In the Google Cloud console, go to the Cloud Storage Buckets page.
  Go to Buckets
2. Click Create.
3. On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
  1. For Name your bucket, enter a unique bucket name. Don't include sensitive information in the bucket name, because the bucket namespace is global and publicly visible.
  2. In the Choose where to store your data section, do the following:
    1. Select a Location type.
    2. Choose a location where your bucket's data is permanently stored from the Location type drop-down menu.
      - If you select the dual-region location type, you can also choose to enable turbo replication by using the relevant checkbox.
    3. To set up cross-bucket replication, select Add cross-bucket replication via Storage Transfer Service and follow these steps:
      Set up cross-bucket replication
      
      In the Bucket menu, select a bucket.
      
      In the Replication settings section, click Configure to configure settings for the replication job.
      
      The Configure cross-bucket replication pane appears.
      
      To filter objects to replicate by object name prefix, enter a prefix that you want to include or exclude objects from, then click Add a prefix.
      
      To set a storage class for the replicated objects, select a storage class from the Storage class menu. If you skip this step, the replicated objects will use the destination bucket's storage class by default.
      
      Click Done.
  3. In the Choose how to store your data section, do the following:
    1. In the Set a default class section, select the following: Standard.
    2. To enable hierarchical namespace, in the Optimize storage for data-intensive workloads section, select Enable hierarchical namespace on this bucket.
      Note: You cannot enable hierarchical namespace in existing buckets.
  4. In the Choose how to control access to objects section, select whether or not your bucket enforces public access prevention, and select an access control method for your bucket's objects.
    Note: You cannot change the Prevent public access setting if this setting is enforced at an organization policy.
  5. In the Choose how to protect object data section, do the following:
    - Select any of the options under Data protection that you want to set for your bucket.
      - To enable soft delete, click the Soft delete policy (For data recovery) checkbox, and specify the number of days you want to retain objects after deletion.
      - To set Object Versioning, click the Object versioning (For version control) checkbox, and specify the maximum number of versions per object and the number of days after which the noncurrent versions expire.
      - To enable the retention policy on objects and buckets, click the Retention (For compliance) checkbox, and then do the following:
        
        To enable Object Retention Lock, click the Enable object retention checkbox.
        
        To enable Bucket Lock, click the Set bucket retention policy checkbox, and choose a unit of time and a length of time for your retention period.
    - To choose how your object data will be encrypted, expand the Data encryption section (), and select a Data encryption method.
4. Click Create.

作業環境を準備する

スターターファイルをダウンロードして、Artifact Registry リポジトリを作成します。

スターターファイルをダウンロードする

スターターファイルをダウンロードして、ディレクトリを変更します。

python-docs-samples リポジトリのクローンを作成します。

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

サンプルコードのディレクトリに移動します。

cd python-docs-samples/dataflow/gpu-examples/tensorflow-landsat

Artifact Registry を構成する

アーティファクトをアップロードできるように、Artifact Registry リポジトリを作成します。各リポジトリには、サポートされている単一の形式のアーティファクトを含めることができます。

すべてのリポジトリのコンテンツは、 Google-owned and Google-managed encryption keys または顧客管理の暗号鍵を使用して暗号化されます。Artifact Registry ではGoogle-owned and Google-managed encryption keys がデフォルトで使用されるため、このオプションの構成は不要です。

少なくとも、リポジトリに対する Artifact Registry 書き込みアクセス権が必要です。

次のコマンドを実行して新しいリポジトリを作成します。このコマンドは --async フラグを使用します。実行中のオペレーションの完了を待たずにすぐに戻ります。

gcloud artifacts repositories create REPOSITORY \
    --repository-format=docker \
    --location=LOCATION \
    --async

REPOSITORY は、リポジトリの名前に置き換えます。プロジェクト内のリポジトリのロケーションごとに、リポジトリ名は一意でなければなりません。

イメージを push または pull する前に、Artifact Registry に対するリクエストを認証するように Docker を構成します。Docker リポジトリの認証を設定するには、次のコマンドを実行します。

gcloud auth configure-docker LOCATION-docker.pkg.dev

このコマンドにより、Docker 構成が更新されます。これで、 Google Cloud プロジェクトの Artifact Registry に接続して、イメージを push できるようになりました。

Docker イメージをビルドする

Cloud Build を使用すると、Dockerfile を使用して Docker イメージをビルドし、そのイメージを Artifact Registry に保存できるので、これでそのイメージが他のGoogle Cloud プロダクトからアクセス可能になります。

build.yaml 構成ファイルを使用してコンテナイメージをビルドします。

gcloud builds submit --config build.yaml

GPU を使用して Dataflow ジョブを実行する

次のコードブロックは、GPU を使用してこの Dataflow パイプラインを起動する方法を示しています。

この Dataflow パイプラインの実行には run.yaml 構成ファイルを使用します。

export PROJECT=PROJECT_NAME
export BUCKET=BUCKET_NAME

export JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
export OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
export REGION="us-central1"
export GPU_TYPE="nvidia-tesla-t4"

gcloud builds submit \
    --config run.yaml \
    --substitutions _JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE \
    --no-source

次のように置き換えます。

PROJECT_NAME: Google Cloud プロジェクト名
BUCKET_NAME: Cloud Storage バケット名（gs:// 接頭辞なし）。

このパイプラインを実行したら、コマンドが完了するまで待ちます。シェルを終了すると、設定した環境変数が失われる可能性があります。

複数のワーカープロセス間で GPU を共有しないようにするため、このサンプルでは 1 つの vCPU を持つマシンタイプを使用します。パイプラインのメモリ要件は、13 GB の拡張メモリを使用することで対応できます。詳細については、GPU とワーカーの並列処理をご覧ください。

結果を表示する

tensorflow-landsat/main.py のパイプラインは、Landsat 8 の衛星画像を処理し、JPEG ファイルとしてレンダリングします。これらのファイルを表示する手順は次のとおりです。

Google Cloud CLI を使用して、出力された JPEG ファイルの詳細を一覧表示します。
```
gcloud storage ls "gs://$BUCKET/samples/dataflow/landsat/" --long --readable-sizes
```

ファイルをローカルディレクトリにコピーします。

mkdir outputs
gcloud storage cp "gs://$BUCKET/samples/dataflow/landsat/*" outputs/

選択した画像ビューアでこれらの画像ファイルを開きます。

クリーンアップ

このチュートリアルで使用したリソースについて、Google Cloud アカウントに課金されないようにするには、リソースを含むプロジェクトを削除するか、プロジェクトを維持して個々のリソースを削除します。

プロジェクトの削除

課金されないようにする最も簡単な方法は、チュートリアル用に作成したプロジェクトを削除することです。

プロジェクトを削除するには:

注意: プロジェクトを削除すると、次のような影響があります。

プロジェクト内のすべてのものが削除されます。このドキュメントのタスクで既存のプロジェクトを使用した場合、それを削除すると、そのプロジェクトで行った他の作業もすべて削除されます。
カスタムプロジェクト ID が失われます。このプロジェクトを作成したときに、将来使用するカスタムプロジェクト ID を作成した可能性があります。そのプロジェクト ID を使用した URL（たとえば、appspot.com）を保持するには、プロジェクト全体ではなくプロジェクト内の選択したリソースだけを削除します。

複数のアーキテクチャ、チュートリアル、クイックスタートを実施する予定がある場合は、プロジェクトを再利用することで、プロジェクトの割り当て上限の超過を避けられます。

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

次のステップ

最小限の GPU 対応 TensorFlow サンプルを参照する。
最小限の GPU 対応の PyTorch サンプルを参照する。
Dataflow での GPU サポートについて確認する。
GPU の使用のタスクを確認する。
Google Cloud に関するリファレンスアーキテクチャ、図、ベストプラクティスを確認する。Cloud アーキテクチャセンターをご覧ください。