Dataproc personal cluster authentication

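When you create a Dataproc cluster, you can enable Dataproc personal cluster authentication so that interactive workloads on the cluster run securely as your user identity. Interactions with other Google Cloud resources, such as Cloud Storage, are then authenticated as you rather than as the cluster service account.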

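Considerations

When you enable personal cluster authentication, only you can use the cluster. Other users cannot run jobs on the cluster or access Component Gateway endpoints on the cluster.
Clusters with personal cluster authentication enabled block SSH access and Compute Engine features such as startup scripts on all VMs in the cluster.
Clusters with personal cluster authentication enabled automatically enable and configure Kerberos on the cluster to secure intra-cluster communication. However, all Kerberos identities on the cluster interact with Google Cloud resources as the same user.
Clusters with personal cluster authentication enabled do not support custom images.
Dataproc personal cluster authentication does not support Dataproc workflows.
Dataproc personal cluster authentication is intended only for interactive jobs run by an individual (human) user. Long-running jobs and operations should configure and use an appropriate service account identity.
The propagated credentials are downscoped with a Credential Access Boundary. The default access boundary is limited to reading and writing Cloud Storage objects in Cloud Storage buckets owned by the project that contains the cluster. You can define a non-default access boundary when you enable an interactive session.
Dataproc personal cluster authentication uses Compute Engine guest attributes. If the guest attributes feature is disabled, personal cluster authentication fails.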

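Objectives

Create a Dataproc cluster with Dataproc personal cluster authentication enabled.
Start credential propagation to the cluster.
Use a Jupyter notebook on the cluster to run Spark jobs that authenticate with your credentials.

Before you begin

Set up a project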

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Make sure that billing is enabled for your Google Cloud project.

Enable the Dataproc API.

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command:

gcloud init

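Configure the environment

Configure the environment from Cloud Shell or a local terminal:

Cloud Shell

Start a Cloud Shell session.

Local terminal

Run gcloud auth login to obtain valid user credentials.

Create a cluster and enable an interactive session

Find the email address of your active account in gcloud.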

gcloud auth list --filter=status=ACTIVE --format="value(account)"

Create a cluster.

gcloud dataproc clusters create CLUSTER_NAME \
    --properties=dataproc:dataproc.personal-auth.user=your-email-address \
    --enable-component-gateway \
    --optional-components=JUPYTER \
    --region=REGION
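
The two steps above feed into each other: the email address printed by gcloud auth list becomes the value of the dataproc:dataproc.personal-auth.user property. As a minimal sketch, the create command can be assembled in Python so that the email is substituted in one place. The helper name build_create_command and the sample values ("my-cluster", "us-central1", "alice@example.com") are illustrative assumptions, not part of the Dataproc tooling; the flags are the ones shown in the command above.

```python
import shlex


def build_create_command(cluster_name, region, user_email):
    """Assemble the cluster creation command shown above as an argument list.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        # The personal-auth user must be the active gcloud account's email.
        f"--properties=dataproc:dataproc.personal-auth.user={user_email}",
        "--enable-component-gateway",
        "--optional-components=JUPYTER",
        f"--region={region}",
    ]


# Placeholder values; replace them with your own cluster name, region, and email.
cmd = build_create_command("my-cluster", "us-central1", "alice@example.com")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run if you prefer to launch gcloud from a script.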

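Enable a credential propagation session for the cluster so that your personal credentials are used when interacting with Google Cloud resources.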

gcloud dataproc clusters enable-personal-auth-session \
    --region=REGION \
    CLUSTER_NAME

Sample output:

Injecting initial credentials into the cluster CLUSTER_NAME...done.
Periodically refreshing credentials for cluster CLUSTER_NAME. This will continue running until the command is interrupted...

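Downscoped access boundary example: The following example enables a personal authentication session that is more restrictive than the default downscoped credential access boundary. It restricts access to the Dataproc cluster's staging bucket (for more information, see "Downscope with Credential Access Boundaries").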

gcloud dataproc clusters enable-personal-auth-session \
    --project=PROJECT_ID \
    --region=REGION \
    --access-boundary=<(echo -n "{ \
 \"access_boundary\": { \
    \"accessBoundaryRules\": [{ \
       \"availableResource\": \"//storage.googleapis.com/projects/_/buckets/$(gcloud dataproc clusters describe --project=PROJECT_ID --region=REGION CLUSTER_NAME --format="value(config.configBucket)")\", \
       \"availablePermissions\": [ \
       \"inRole:roles/storage.objectViewer\", \
       \"inRole:roles/storage.objectCreator\", \
       \"inRole:roles/storage.objectAdmin\", \
       \"inRole:roles/storage.legacyBucketReader\" \
       ] \
    }] \
 } \
 }") \
    CLUSTER_NAME
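
The escaped JSON in the command above is easy to mistype. As a hedged alternative, the same document can be generated with Python and written to a file, whose path is then passed to --access-boundary in place of the process substitution. This is a sketch only: the function names and the bucket name "example-staging-bucket" are assumptions for illustration, and the JSON structure is copied from the command above rather than from a separate specification.

```python
import json
import tempfile

# Roles listed in the example command above.
ROLES = [
    "roles/storage.objectViewer",
    "roles/storage.objectCreator",
    "roles/storage.objectAdmin",
    "roles/storage.legacyBucketReader",
]


def build_access_boundary(bucket_name):
    """Return the access boundary document for a single bucket."""
    return {
        "access_boundary": {
            "accessBoundaryRules": [
                {
                    "availableResource": (
                        "//storage.googleapis.com/projects/_/buckets/"
                        + bucket_name
                    ),
                    "availablePermissions": [f"inRole:{role}" for role in ROLES],
                }
            ]
        }
    }


def write_access_boundary(bucket_name):
    """Write the document to a temporary JSON file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(build_access_boundary(bucket_name), f, indent=2)
        return f.name


# Hypothetical bucket name; use the value of config.configBucket for your cluster.
path = write_access_boundary("example-staging-bucket")
print(path)
```

The printed path can then be supplied as --access-boundary=PATH when you run gcloud dataproc clusters enable-personal-auth-session.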

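Keep the command running and switch to a new Cloud Shell tab or terminal session. The client refreshes the credentials for as long as the command is running.
Press Ctrl-C to end the session.

Access Jupyter on the cluster

gcloud

Get the cluster details.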

gcloud dataproc clusters describe CLUSTER_NAME --region=REGION

The URL of the Jupyter web interface is listed in the cluster details.

...
JupyterLab: https://UUID-dot-us-central1.dataproc.googleusercontent.com/jupyter/lab/
...
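
If you prefer not to scan the output by eye, the URL can be extracted from the JSON form of the same command (adding --format=json). The sketch below assumes the URLs appear under config.endpointConfig.httpPorts keyed by component name, as in the Dataproc cluster resource; the function name and the sample document are illustrative, and the sample reuses the placeholder URL from the output above.

```python
import json


def jupyterlab_url(cluster):
    """Return the JupyterLab URL from parsed cluster details, or None.

    Assumes the layout config.endpointConfig.httpPorts["JupyterLab"].
    """
    ports = (
        cluster.get("config", {})
        .get("endpointConfig", {})
        .get("httpPorts", {})
    )
    return ports.get("JupyterLab")


# Hand-written sample, not real command output.
sample = json.loads("""
{
  "clusterName": "CLUSTER_NAME",
  "config": {
    "endpointConfig": {
      "enableHttpPortAccess": true,
      "httpPorts": {
        "JupyterLab": "https://UUID-dot-us-central1.dataproc.googleusercontent.com/jupyter/lab/"
      }
    }
  }
}
""")

print(jupyterlab_url(sample))
```

The function returns None when the key is missing, for example when the cluster was created without the Component Gateway.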

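Copy the URL into your local browser to launch the Jupyter UI.
Check that personal cluster authentication succeeded.
1. Start a Jupyter terminal.
2. Run gcloud auth list.
3. Verify that your username is the only active account.
In the Jupyter terminal, enable Jupyter to authenticate with Kerberos and submit Spark jobs.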
```
kinit -kt /etc/security/keytab/dataproc.service.keytab dataproc/$(hostname -f)
```
1. Run klist to verify that Jupyter obtained a valid TGT.
In the Jupyter terminal, use the gcloud CLI to create a rose.txt file in a Cloud Storage bucket in your project.
```
echo "A rose by any other name would smell as sweet" > /tmp/rose.txt
```
```
gcloud storage cp /tmp/rose.txt gs://bucket-name/rose.txt
```
1. Mark the file as private so that only your user account can read from or write to it. Jupyter uses your personal credentials when interacting with Cloud Storage.
```
gcloud storage objects update gs://bucket-name/rose.txt --predefined-acl=private
```
2. Verify your private access.
```
gcloud storage objects describe gs://bucket-name/rose.txt
```
```
acl:
- email: $USER
  entity: user-$USER
  role: OWNER
```

Run a PySpark job from Jupyter

Navigate to a folder, then create a PySpark notebook.

Run a basic word count job against the rose.txt file you created in the previous steps.

text_file = sc.textFile("gs://bucket-name/rose.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
         .map(lambda word: (word, 1)) \
         .reduceByKey(lambda a, b: a + b)
print(counts.collect())

Spark can read the rose.txt file in Cloud Storage because it runs with your user credentials.
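
To see what the flatMap, map, and reduceByKey steps compute without a cluster, the same logic can be sketched in plain Python. This is an illustration of the counting only, not how Spark executes the job; the function name word_count is an assumption, and the input is the sentence written to rose.txt earlier.

```python
from collections import Counter


def word_count(lines):
    """Count words the way the PySpark pipeline above does."""
    counts = Counter()
    for line in lines:
        # flatMap: split each line into words.
        for word in line.split(" "):
            # map + reduceByKey: add 1 per occurrence of each word.
            counts[word] += 1
    return counts


# The single line stored in rose.txt, without the trailing newline.
lines = ["A rose by any other name would smell as sweet"]
counts = word_count(lines)
print(sorted(counts.items()))
```

Each of the ten words in the sentence appears once, so every count is 1. Note that "A" and "as" are distinct keys because the split is case-sensitive.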

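You can also check the Cloud Storage bucket audit logs to verify that the job accesses Cloud Storage with your identity (for more information, see "Cloud Audit Logs with Cloud Storage").

Clean up

Delete the Dataproc cluster.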

gcloud dataproc clusters delete CLUSTER_NAME --region=REGION

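Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-07-10 UTC.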