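This page describes how to use BigQuery to analyze variants. A variant is a region of a genome that has been identified as differing from the reference genome.
The following example shows how to compute, for each sample, the transition-to-transversion (Ti/Tv) ratio of single nucleotide polymorphisms (SNPs) on each chromosome.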
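As background for the ratio itself: a transition swaps a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), while a transversion swaps a purine for a pyrimidine or the reverse. The following is a minimal Python sketch of that classification, not part of the BigQuery workflow. The function names `classify_snp` and `titv_ratio` and the toy input are hypothetical and exist only for illustration.

```python
# Purines and pyrimidines; a transition stays within one class,
# a transversion crosses from one class to the other.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a SNP: {ref!r} -> {alt!r}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def titv_ratio(snps):
    """Compute transitions / transversions for an iterable of (ref, alt) pairs."""
    kinds = [classify_snp(ref, alt) for ref, alt in snps]
    transitions = kinds.count("transition")
    transversions = kinds.count("transversion")
    if transversions == 0:
        return None  # The ratio is undefined without any transversions.
    return transitions / transversions


# Hypothetical toy input: three transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C"), ("G", "C")]
print(titv_ratio(example))  # 1.5
```

The four transitions and eight transversions that this function distinguishes are the same twelve substitutions that the query on this page lists explicitly.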
Analyzing variants in the Illumina Platinum Genomes dataset
The following example uses data from the Illumina Platinum Genomes project. The data is in the BigQuery table platinum_genomes_deepvariant_variants_20180823.
To analyze the variants in the table, complete the following steps:
In the Google Cloud console, go to the BigQuery page.
Click Compose query.
Copy the following query and paste it into the New Query text area:
    #standardSQL
    --
    -- Compute the transition/transversion ratio per sample and reference name.
    --
    WITH filtered_snp_calls AS (
      SELECT
        reference_name,
        c.name,
        CONCAT(reference_bases, '->', alternate_bases[ORDINAL(1)].alt) AS mutation
      FROM
        `bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823` AS v,
        UNNEST(v.call) AS c
      WHERE
        # Only include biallelic SNPs.
        reference_bases IN ('A','C','G','T')
        AND alternate_bases[ORDINAL(1)].alt IN ('A','C','G','T')
        AND (ARRAY_LENGTH(alternate_bases) = 1
          OR (ARRAY_LENGTH(alternate_bases) = 2
              AND alternate_bases[ORDINAL(2)].alt = '<*>'))
        # Skip homozygous reference calls and no-calls.
        AND EXISTS (SELECT g FROM UNNEST(c.genotype) AS g WHERE g > 0)
        AND NOT EXISTS (SELECT g FROM UNNEST(c.genotype) AS g WHERE g < 0)
        # Include only high quality calls.
        AND NOT EXISTS (SELECT ft FROM UNNEST(c.filter) ft WHERE ft NOT IN ('PASS', '.'))
    ),

    mutation_type_counts AS (
      SELECT
        reference_name,
        name,
        SUM(CAST(mutation IN ('A->G', 'G->A', 'C->T', 'T->C') AS INT64)) AS transitions,
        SUM(CAST(mutation IN ('A->C', 'C->A', 'G->T', 'T->G',
                              'A->T', 'T->A', 'C->G', 'G->C') AS INT64)) AS transversions
      FROM filtered_snp_calls
      GROUP BY
        reference_name,
        name
    )

    SELECT
      reference_name,
      name,
      transitions,
      transversions,
      transitions/transversions AS titv
    FROM mutation_type_counts
    WHERE
      transversions > 0
    ORDER BY
      titv DESC,
      name
Click Run query. The query returns the following response:
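| Row | reference_name | name | transitions | transversions | titv |
|-----|----------------|------|-------------|---------------|------|
| 1 | chr22 | NA12892 | 35299 | 15017 | 2.3506026503296265 |
| 2 | chr22 | NA12889 | 34091 | 14624 | 2.331167943107221 |
| 3 | chr17 | NA12892 | 67297 | 28885 | 2.3298251687727194 |
| 4 | chr22 | NA12878 | 33627 | 14439 | 2.3289008934136715 |
| 5 | chr22 | NA12877 | 34751 | 14956 | 2.3235490772933938 |
| 6 | chr22 | NA12891 | 33534 | 14434 | 2.323264514341139 |
| 7 | chr17 | NA12877 | 70600 | 30404 | 2.3220628864623074 |
| 8 | chr17 | NA12878 | 66010 | 28475 | 2.3181738366988585 |
| 9 | chr17 | NA12890 | 67242 | 29057 | 2.314141170802216 |
| 10 | chr17 | NA12889 | 69767 | 30189 | 2.311007320547219 |
| ... | ... | ... | ... | ... | ... |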
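The titv column shows the transition-to-transversion ratio: the transitions count divided by the transversions count for each sample and reference name.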
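To make the query's logic easier to follow, here is a rough Python sketch that mirrors its two stages on in-memory records: the per-call filter from the WHERE clause, then the per-sample, per-chromosome counts and ratio. It is an illustration under stated assumptions, not a replacement for running the query in BigQuery. The function names `keep_call` and `titv_by_sample`, the flat tuple layout, the sample name `S1`, and the `LowQual` filter value are all hypothetical.

```python
from collections import defaultdict

BASES = {"A", "C", "G", "T"}
TRANSITIONS = {"A->G", "G->A", "C->T", "T->C"}
TRANSVERSIONS = {"A->C", "C->A", "G->T", "T->G", "A->T", "T->A", "C->G", "G->C"}


def keep_call(reference_bases, alts, genotype, filters):
    """Mirror the query's WHERE clause for one (variant, call) pair."""
    # Only biallelic SNPs; a second '<*>' alternate is tolerated.
    # (Assumption: an empty alts list is simply rejected here.)
    if reference_bases not in BASES or not alts or alts[0] not in BASES:
        return False
    if not (len(alts) == 1 or (len(alts) == 2 and alts[1] == "<*>")):
        return False
    # Skip homozygous reference calls (no allele > 0) and no-calls (any allele < 0).
    if not any(g > 0 for g in genotype) or any(g < 0 for g in genotype):
        return False
    # Keep only calls whose filter values are all 'PASS' or '.'.
    return all(ft in ("PASS", ".") for ft in filters)


def titv_by_sample(records):
    """Count mutations per (reference_name, name) and return the Ti/Tv ratios."""
    counts = defaultdict(lambda: [0, 0])  # key -> [transitions, transversions]
    for reference_name, name, ref, alts, genotype, filters in records:
        if not keep_call(ref, alts, genotype, filters):
            continue
        mutation = f"{ref}->{alts[0]}"
        if mutation in TRANSITIONS:
            counts[(reference_name, name)][0] += 1
        elif mutation in TRANSVERSIONS:
            counts[(reference_name, name)][1] += 1
    # Like the query, drop groups without transversions to avoid dividing by zero.
    return {key: ti / tv for key, (ti, tv) in counts.items() if tv > 0}


# Hypothetical toy records: (reference_name, name, ref, alts, genotype, filters).
records = [
    ("chr22", "S1", "A", ["G"], [0, 1], ["PASS"]),        # transition
    ("chr22", "S1", "C", ["T", "<*>"], [1, 1], ["."]),    # transition
    ("chr22", "S1", "G", ["T"], [0, 1], ["PASS"]),        # transversion
    ("chr22", "S1", "A", ["C"], [0, 0], ["PASS"]),        # dropped: homozygous reference
    ("chr22", "S1", "A", ["T"], [-1, -1], ["PASS"]),      # dropped: no-call
    ("chr22", "S1", "AT", ["A"], [0, 1], ["PASS"]),       # dropped: not a SNP
    ("chr22", "S1", "T", ["C"], [0, 1], ["LowQual"]),     # dropped: failed filter
]
print(titv_by_sample(records))  # {('chr22', 'S1'): 2.0}
```

The same division reproduces the reported results: for the first row, 35299 transitions divided by 15017 transversions gives approximately 2.3506.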
What's next
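- For more examples of using BigQuery to analyze variants, see the tutorials.
- Learn about the BigQuery variants table schema.
- Analyze variants in BigQuery using R, RMarkdown, or JavaScript.
- Learn how to run queries and visualize the results using R.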