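Process genomic data using Cloud Life Sciences

This page explains how to run a genomics pipeline that uses the Cloud Life Sciences API to create an index file (BAI file) from a binary file containing DNA sequences (BAM file).

BAM files are often large, and reading one in a genome viewer can take a long time. A BAI file lets the viewer locate the parts of the BAM file that cover the genomic positions you're interested in.

Before you begin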

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage JSON APIs.

  5. Install the Google Cloud CLI.

  6. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

  7. To initialize the gcloud CLI, run the following command:

    gcloud init
  8. Alternatively, you can use Cloud Shell, which comes with the gcloud CLI preinstalled.

  9. Install Python 3.8.

    If you use Windows and left the relevant checkbox selected when you installed the Google Cloud CLI, this is done for you automatically.

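Run the pipeline

To run the pipeline, complete the following steps:

  1. Create a bucket to store the BAI file. A bucket is the basic container that holds data in Cloud Storage. To create a bucket named PROJECT_ID-life-sciences, run the gcloud storage buckets create command: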

    gcloud storage buckets create gs://PROJECT_ID-life-sciences

    Replace PROJECT_ID with your Google Cloud project ID. Bucket names must be globally unique.

    If successful, the command returns the following:

    Creating gs://PROJECT_ID-life-sciences
  2. To start the pipeline, run the gcloud beta lifesciences pipelines run command:

    gcloud beta lifesciences pipelines run \
        --regions us-east1 \
        --command-line 'samtools index ${BAM} ${BAI}' \
        --docker-image "gcr.io/cloud-lifesciences/samtools" \
        --inputs BAM=gs://genomics-public-data/NA12878.chr20.sample.bam \
        --outputs BAI=gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai

    If successful, the command returns the following:

    Running [projects/PROJECT_ID/operations/OPERATION_ID]

    Note the OPERATION_ID; you need it in the next step.

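  3. To track the status of the pipeline, run the gcloud beta lifesciences operations wait command. Replace OPERATION_ID with the value returned in the previous step. The pipeline takes a few minutes to complete.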

    gcloud beta lifesciences operations wait OPERATION_ID

    When the operation completes, the command returns the following message:

    Waiting for [projects/PROJECT_ID/operations/OPERATION_ID]...done.
  4. To verify that the BAI file was generated, run the gcloud storage ls command:

    gcloud storage ls gs://PROJECT_ID-life-sciences

    If successful, the command returns the following:

    gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
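Step 2 asks you to note the OPERATION_ID by hand. If you script these steps, a small helper along the following lines could extract it from the status line instead. This is only a sketch: `extract_operation_id` is a hypothetical function name, the sample project and operation values are made up, and how you capture the status line from gcloud (for example, which output stream it is written to) is left as an assumption.

```shell
# Hypothetical helper: print the OPERATION_ID found in a status line such as
#   Running [projects/PROJECT_ID/operations/OPERATION_ID]
# Returns non-zero if the line does not have that shape.
extract_operation_id() {
  case $1 in
    *"/operations/"*"]"*) ;;   # expected shape, continue
    *) return 1 ;;              # anything else is rejected
  esac
  id=${1##*/operations/}        # drop everything through "/operations/"
  id=${id%%"]"*}                # drop the closing bracket and what follows
  printf '%s\n' "$id"
}

# Example with made-up values (not real identifiers).
extract_operation_id 'Running [projects/my-project/operations/1234567890]'
```

The extracted value can then be passed to gcloud beta lifesciences operations wait in place of OPERATION_ID.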

You have run a pipeline with the Cloud Life Sciences API that creates a BAI file from a BAM file. You can now use a genome viewer to inspect the NA12878.chr20.sample.bam BAM file through the NA12878.chr20.sample.bam.bai index file.
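On this page, the index file name is the BAM file name with .bai appended, stored in the PROJECT_ID-life-sciences bucket. As a rough sketch of that naming pattern, the function below derives the output URI from a project ID and an input BAM URI. The function name `bai_output_uri` and the sample project ID `my-project` are illustrative assumptions, not part of any Google Cloud tooling.

```shell
# Hypothetical helper: build the BAI output URI used on this page from a
# project ID and the Cloud Storage URI of the input BAM file.
bai_output_uri() {
  project_id=$1
  bam_uri=$2
  bam_name=${bam_uri##*/}   # keep only the object name after the last "/"
  printf 'gs://%s-life-sciences/%s.bai\n' "$project_id" "$bam_name"
}

# "my-project" is a placeholder project ID.
bai_output_uri my-project gs://genomics-public-data/NA12878.chr20.sample.bam
```

The result has the same form as the value passed to --outputs BAI= in the pipeline command.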

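Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the BAI file

To delete the generated BAI file but keep the project and bucket that you created, run the gcloud storage rm command: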

gcloud storage rm gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai

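Delete the bucket

If you created a bucket just for this quickstart and no longer need it, but want to keep the project, use the gcloud storage rm command to delete the bucket. Deleting the bucket also deletes the generated BAI file.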

gcloud storage rm gs://PROJECT_ID-life-sciences --recursive
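The deletion commands operate on Cloud Storage URIs, which begin with gs://. If you wrap them in a script, a guard like the one sketched below can catch a missing prefix before anything is removed. `require_gs_uri` is a hypothetical helper and `my-project` is a placeholder; the sketch only prints the command it would run and does not call gcloud.

```shell
# Hypothetical guard: succeed only if the argument looks like a gs:// URI.
require_gs_uri() {
  case $1 in
    gs://?*) return 0 ;;
    *) printf 'not a Cloud Storage URI: %s\n' "$1" >&2
       return 1 ;;
  esac
}

# "my-project" is a placeholder project ID.
PROJECT_ID=my-project
TARGET="gs://${PROJECT_ID}-life-sciences/NA12878.chr20.sample.bam.bai"

# Dry run: show the command rather than executing it.
if require_gs_uri "$TARGET"; then
  echo "would run: gcloud storage rm $TARGET"
fi
```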

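Delete the project

If you created a project just for this quickstart and no longer need it, you can delete the project. Deleting the project also deletes the BAI file and the Cloud Storage bucket.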

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
