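Process genomic data with Cloud Life Sciences
This page explains how to run a genomics pipeline that uses the Cloud Life Sciences API to create an index file (BAI file) from a binary file containing DNA sequences (BAM file).
BAM files are often large, and reading one in a genome viewer can take a long time. With a BAI file, the viewer can locate the part of the BAM file that covers the genomic positions you are interested in.
Before you begin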
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Cloud Life Sciences, Compute Engine, and Cloud Storage JSON APIs.
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
gcloud init
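- Install Python 3.8.
If you use Windows and left the relevant checkbox selected when you installed the Google Cloud CLI, this step is done for you automatically.
Alternatively, you can use Cloud Shell, which has the gcloud CLI preinstalled.
Run the pipeline
To run the pipeline, complete the following steps: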
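Create a bucket to store the BAI file. A bucket is the basic container that holds your data in Cloud Storage. To create a bucket named PROJECT_ID-life-sciences, run the gcloud storage buckets create command:
gcloud storage buckets create gs://PROJECT_ID-life-sciences
Replace PROJECT_ID with your Google Cloud project ID. Bucket names must be globally unique.
If successful, the command returns the following: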
Creating gs://PROJECT_ID-life-sciences
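Because the bucket name on this page is derived from the project ID, it can be assembled in the shell instead of typed by hand. The following is a minimal sketch, not part of the gcloud CLI: `bucket_url` is a hypothetical helper, and its character check assumes the Cloud Storage rule that bucket names contain only lowercase letters, digits, dashes, underscores, and dots. A domain-scoped project ID that contains a colon, for example, would fail the check.

```shell
# bucket_url is a hypothetical helper (not a gcloud command).
# It prints the gs:// URL of the bucket used on this page for a given project ID.
bucket_url() {
  local name="$1-life-sciences"
  case "$name" in
    *[!a-z0-9._-]*)
      # Assumed rule: only lowercase letters, digits, '.', '_' and '-' are allowed.
      echo "invalid bucket name: $name" >&2
      return 1
      ;;
  esac
  printf 'gs://%s\n' "$name"
}
```

If a default project is configured in the gcloud CLI, one possible use is `gcloud storage buckets create "$(bucket_url "$(gcloud config get-value project)")"`.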
To start the pipeline, run the gcloud beta lifesciences pipelines run command:
gcloud beta lifesciences pipelines run \
    --regions us-east1 \
    --command-line 'samtools index ${BAM} ${BAI}' \
    --docker-image "gcr.io/cloud-lifesciences/samtools" \
    --inputs BAM=gs://genomics-public-data/NA12878.chr20.sample.bam \
    --outputs BAI=gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
If successful, the command returns the following:
Running [projects/PROJECT_ID/operations/OPERATION_ID]
Note the OPERATION_ID. You need it in the next step.
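If you script these steps, the operation ID can be pulled out of the status line rather than copied by hand. The sketch below assumes the line has the form shown above, ending in `/operations/OPERATION_ID]`. `extract_operation_id` is a hypothetical helper, not a gcloud command, and how you capture the line from the command's output is left to you.

```shell
# extract_operation_id is a hypothetical helper (not a gcloud command).
# Given a line such as
#   Running [projects/PROJECT_ID/operations/OPERATION_ID]
# it prints only OPERATION_ID.
extract_operation_id() {
  local line="$1"
  # Drop everything up to and including the last "/operations/".
  local id="${line##*/operations/}"
  if [ "$id" = "$line" ]; then
    echo "no operation name found in: $line" >&2
    return 1
  fi
  # Drop the closing bracket and anything after it.
  printf '%s\n' "${id%%\]*}"
}
```

For example, `extract_operation_id "Running [projects/my-project/operations/1234567890]"` prints `1234567890`.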
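To track the status of the pipeline, run the gcloud beta lifesciences operations wait command. Replace OPERATION_ID with the value from the output of the previous step. The pipeline takes a few minutes to complete.
gcloud beta lifesciences operations wait OPERATION_ID
When the operation completes, the command returns the following message: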
Waiting for [projects/PROJECT_ID/operations/OPERATION_ID]...done.
To verify that the BAI file was generated, run the gcloud storage ls command:
gcloud storage ls gs://PROJECT_ID-life-sciences
If successful, the command returns the following:
gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai
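In a script, the same check can be made without reading the listing by eye. The following is a minimal sketch: `has_object` is a hypothetical helper, not a gcloud command, and it assumes the listing prints one object URL per line, as in the output above.

```shell
# has_object is a hypothetical helper (not a gcloud command).
# It reads a listing on standard input, one URL per line, and succeeds only
# if some line is exactly the expected object URL passed as the first argument.
has_object() {
  # -F: fixed string, -x: match the whole line, -q: no output, only exit status
  grep -Fxq -- "$1"
}
```

One possible use is `gcloud storage ls gs://PROJECT_ID-life-sciences | has_object gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai && echo "BAI file found"`.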
You have used the Cloud Life Sciences API to run a pipeline that creates a BAI file from a BAM file. In a genome viewer, you can now use the NA12878.chr20.sample.bam.bai index file to inspect the NA12878.chr20.sample.bam BAM file.
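Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the BAI file
To delete the generated BAI file but keep the project and bucket that you created, run the gcloud storage rm command:
gcloud storage rm gs://PROJECT_ID-life-sciences/NA12878.chr20.sample.bam.bai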
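Delete the bucket
If you created a bucket specifically for this quickstart and no longer need it, but want to keep the project, use the gcloud storage rm command to delete the bucket. Deleting the bucket also deletes the generated BAI file.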
gcloud storage rm gs://PROJECT_ID-life-sciences --recursive
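A recursive delete removes everything in the bucket, so in a script it can be worth confirming that the target really is the quickstart bucket first. The following is a sketch only: `is_quickstart_bucket` is a hypothetical helper, not a gcloud command, and the naming pattern it checks is simply the one used on this page.

```shell
# is_quickstart_bucket is a hypothetical helper (not a gcloud command).
# It succeeds only for URLs of the form gs://<prefix>-life-sciences
# with no object path after the bucket name.
is_quickstart_bucket() {
  case "$1" in
    gs://*/*) return 1 ;;               # has a path after the bucket name
    gs://?*-life-sciences) return 0 ;;  # named like the bucket on this page
    *) return 1 ;;
  esac
}
```

One possible use, with `BUCKET` set to the bucket URL, is `is_quickstart_bucket "$BUCKET" && gcloud storage rm "$BUCKET" --recursive`.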
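Delete the project
If you created a project specifically for this quickstart and no longer need it, you can delete the project. Deleting the project also deletes the BAI file and the Cloud Storage bucket.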
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.