Use the bq tool

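In this tutorial, you learn how to use bq, the Python-based command-line interface (CLI) tool for BigQuery, to create a dataset, load sample data, and query tables. After you complete this tutorial, you'll be familiar with bq and with how to work with BigQuery from a CLI.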

For a complete reference of all bq commands and flags, see the bq command-line tool reference.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Verify that billing is enabled for your Google Cloud project.

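If you haven't enabled billing for the Google Cloud project that you use in this tutorial, you need to load and query data in the BigQuery sandbox. The BigQuery sandbox lets you learn BigQuery with a limited set of BigQuery features at no charge.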

Make sure that the BigQuery API is enabled.

If you created a new project, the BigQuery API is enabled automatically.

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

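Download the file that contains the source data

The file that you download contains approximately 7 MB of data about popular baby names. It is provided by the US Social Security Administration.

For more information about the data, see the Social Security Administration's background information for popular names.

Open the following URL in a new browser tab to download the US Social Security Administration data: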
```
https://www.ssa.gov/OACT/babynames/names.zip
```
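Extract the file.

For more information about the dataset schema, see the NationalReadMe.pdf file that you extracted.
To view the data, open the yob2024.txt file. The file contains comma-separated values for name, assigned sex at birth, and the number of children with that name. The file has no header row.
Move the file to your working directory.
- If you're using Cloud Shell, click More, click Upload, click Choose Files, select the yob2024.txt file, and then click Upload.
- If you're using a local shell, copy or move the yob2024.txt file into the directory where you run the bq tool.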
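
Before loading, it can help to confirm that every row has the three comma-separated fields described above. The following is a minimal sketch that runs against a tiny made-up sample: the file name `sample_yob.txt` and its rows are hypothetical stand-ins for yob2024.txt, not real SSA figures.

```shell
# Create a tiny, hypothetical sample in the same layout as yob2024.txt:
# name,assigned_sex_at_birth,count with no header row.
printf '%s\n' 'Alpha,F,30' 'Beta,F,20' 'Gamma,M,10' > sample_yob.txt

# Count the rows that do NOT have exactly three fields with an integer count.
bad_rows=$(awk -F',' 'NF != 3 || $3 !~ /^[0-9]+$/ { n++ } END { print n + 0 }' sample_yob.txt)

# Count all rows (awk avoids the padding that some wc implementations add).
total_rows=$(awk 'END { print NR }' sample_yob.txt)

echo "rows: $total_rows, malformed: $bad_rows"
```

To check the real download, point both awk commands at yob2024.txt instead of the sample file.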

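Create a dataset

If you launched Cloud Shell from the documentation, enter the following command to set your project ID. This saves you from having to specify the project ID in each CLI command.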
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your project ID.

Enter the following command to create a dataset named babynames:
```
bq mk --dataset babynames
```
The output is similar to the following:
```
Dataset 'babynames' successfully created.
```
Confirm that the babynames dataset now appears in your project:
```
bq ls --datasets=true
```
The output is similar to the following:
```
  datasetId
-------------
  babynames
```

Load data into a table

In the babynames dataset, load the source file yob2024.txt into a new table named names2024:

```
bq load babynames.names2024 yob2024.txt name:string,assigned_sex_at_birth:string,count:integer
```
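
The load command takes three positional arguments: the destination table, the source file, and an inline schema. The following sketch spells those parts out as variables and states the file format with the `--source_format=CSV` flag. The variable names are illustrative, the flag is added here on the assumption that CSV is also what bq uses when the flag is omitted, and the guard that skips the load when bq or the file is missing is a convenience, not part of the tutorial.

```shell
# Illustrative variable names; the values come from the tutorial.
DATASET="babynames"
TABLE="names2024"
SOURCE_FILE="yob2024.txt"
SCHEMA="name:string,assigned_sex_at_birth:string,count:integer"

# Run the load only when the bq tool and the source file are both available.
if command -v bq >/dev/null 2>&1 && [ -f "$SOURCE_FILE" ]; then
  # --source_format=CSV names the format explicitly (assumed to match the default).
  bq load --source_format=CSV "$DATASET.$TABLE" "$SOURCE_FILE" "$SCHEMA"
else
  echo "Skipping load: bq or $SOURCE_FILE not found"
fi
```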

The output is similar to the following:

```
Upload complete.
Waiting on bqjob_r3c045d7cbe5ca6d2_0000018292f0815f_1 ... (1s) Current status: DONE
```

Confirm that the names2024 table now appears in the babynames dataset:

```
bq ls --format=pretty babynames
```

The output is similar to the following. Some columns are omitted to simplify the output.

```
+-----------+-------+
|  tableId  | Type  |
+-----------+-------+
| names2024 | TABLE |
+-----------+-------+
```

Confirm that the schema of the new names2024 table is name: string, assigned_sex_at_birth: string, and count: integer:

```
bq show babynames.names2024
```

The output is similar to the following. Some columns are omitted to simplify the output.

```
  Last modified        Schema                      Total Rows   Total Bytes
----------------- ------------------------------- ------------ ------------
14 Mar 17:16:45   |- name: string                    31904       607494
                  |- assigned_sex_at_birth: string
                  |- count: integer
```

Query table data

Determine the most popular girls' names in the data:

```
bq query \
    'SELECT
      name,
      count
    FROM
      babynames.names2024
    WHERE
      assigned_sex_at_birth = "F"
    ORDER BY
      count DESC
    LIMIT 5'
```

The output is similar to the following:

```
+-----------+-------+
|   name    | count |
+-----------+-------+
| Olivia    | 14718 |
| Emma      | 13485 |
| Amelia    | 12740 |
| Charlotte | 12552 |
| Mia       | 12113 |
+-----------+-------+
```
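
As a rough local cross-check, the same filter, sort, and limit can be reproduced on the source file with standard shell tools. This is a sketch under assumptions, not how BigQuery runs the query: `sample_yob.txt` and its rows are hypothetical, and the limit is 2 instead of 5 to suit the small sample.

```shell
# Hypothetical sample rows in the yob2024.txt layout (not real SSA data).
printf '%s\n' 'Alpha,F,30' 'Gamma,M,10' 'Beta,F,20' 'Delta,F,25' > sample_yob.txt

# WHERE assigned_sex_at_birth = "F"  -> awk keeps rows whose field 2 is F
# ORDER BY count DESC                -> numeric reverse sort on field 3
# LIMIT 2                            -> head -n 2
top_f=$(awk -F',' '$2 == "F"' sample_yob.txt | sort -t',' -k3,3nr | head -n 2)

echo "$top_f"
```

Running the same pipeline against yob2024.txt with `head -n 5` should list the same names that the query returns.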

Determine the least popular boys' names in the data:

```
bq query \
    'SELECT
      name,
      count
    FROM
      babynames.names2024
    WHERE
      assigned_sex_at_birth = "M"
    ORDER BY
      count ASC
    LIMIT 5'
```

The output is similar to the following:

```
+---------+-------+
|  name   | count |
+---------+-------+
| Aaran   |     5 |
| Aadiv   |     5 |
| Aadarsh |     5 |
| Aarash  |     5 |
| Aadrik  |     5 |
+---------+-------+
```

The minimum count is 5 because the source data omits names with fewer than 5 occurrences.
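
Because many names share the minimum count, sorting on count alone leaves the order among tied rows unspecified, so the five names returned can differ between runs. The following sketch shows the effect of a secondary sort key on a hypothetical sample; `sample_ties.txt` and its rows are made up. In the query itself, the corresponding change would presumably be to add the name column as a second ORDER BY key.

```shell
# Hypothetical sample rows (not real SSA data); three boys' names tie at 5.
printf '%s\n' 'Zed,M,5' 'Abe,M,5' 'Max,M,9' 'Kai,M,5' > sample_ties.txt

# Sort by count ascending, then by name, so tied rows come back in a stable order.
# LC_ALL=C keeps the name comparison byte-wise and independent of the locale.
rarest_m=$(awk -F',' '$2 == "M"' sample_ties.txt | LC_ALL=C sort -t',' -k3,3n -k1,1 | head -n 3)

echo "$rarest_m"
```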

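Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the Google Cloud project that contains the resources.

Delete the project

If you used the BigQuery sandbox to query the public dataset, billing isn't enabled for your project, so you don't need to delete the project.

The easiest way to avoid charges is to delete the project that you created for this tutorial.

To delete the project: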

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Delete the resources

If you used an existing project, delete the resources that you created:

Delete the babynames dataset:
```
bq rm --recursive=true babynames
```
The --recursive flag deletes all tables in the dataset, including the names2024 table.

The output is similar to the following:
```
rm: remove dataset 'myproject:babynames'? (y/N)
```
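To confirm the delete command, enter y.

What's next

Learn more about using the bq tool.
Learn about the BigQuery sandbox.
Learn more about loading data into BigQuery.
Learn more about querying data in BigQuery.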