Managing table data

This document describes how to manage table data in BigQuery. You can work with BigQuery table data in the following ways:

  • Load data into a table
  • Append to or overwrite table data
  • Browse (or preview) table data
  • Query table data
  • Modify table data using data manipulation language (DML)
  • Copy table data
  • Export table data

For information on managing table schemas, see Modifying table schemas.

Before you begin

Grant roles that give the necessary permissions to users who need to perform each task in this document. Permissions required (if any) to perform a task are listed in the "Required permissions" section of the task.

Loading data into a table

You can load data when you create a table, or you can create an empty table and load the data later. When you load data, you can use schema auto-detect for supported data formats, or you can specify the schema.

For more information on loading data, see the documentation for your source data's format and location:

Appending to and overwriting table data

You can overwrite table data using a load or query operation. You can append additional data to an existing table by performing a load-append operation or by appending query results to the table.

For more information on appending to or overwriting a table when loading data, see the documentation for your source data format:

To append to or overwrite a table using query results, specify a destination table and set the write disposition to either:

  • Append to table — Appends the query results to an existing table.
  • Overwrite table — Overwrites an existing table with the same name using the query results.

You can use the following query to append records from one table to another:

  INSERT INTO <projectID>.<datasetID>.<table1> (
    <column2>,
    <column3>) (SELECT * FROM <projectID>.<datasetID>.<table2>)

For more information on using query results to append to or overwrite data, see Writing query results.

Browsing table data

You can browse or read table data by:

  • Using the Google Cloud console
  • Using the bq command-line tool's bq head command
  • Calling the tabledata.list API method
  • Using the client libraries

Required permissions

To read table and partition data, you need the bigquery.tables.getData Identity and Access Management (IAM) permission.

Each of the following predefined IAM roles includes the permissions that you need in order to browse table and partition data:

  • roles/bigquery.dataViewer
  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin

If you have the bigquery.datasets.create permission, you can browse data in the tables and partitions of the datasets you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Browsing table data

To browse table data:

Console

  1. In the Google Cloud console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select a dataset.

  3. Click a table in the list.

  4. Click Details and note the value in Number of rows. You may need this value to control the starting point for your results using the bq command-line tool or API.

  5. Click Preview. A sample set of data is displayed.

    Table preview

Command-line

Issue the bq head command with the --max_rows flag to list all columns in a particular number of table rows. If --max_rows is not specified, the default is 100.

To browse a subset of columns in the table (including nested and repeated columns), use the --selected_fields flag and enter the columns as a comma- separated list.

To specify the number of rows to skip before displaying table data, use the --start_row=integer flag (or the -s shortcut). The default value is 0. You can retrieve the number of rows in a table by using the bq show command to retrieve table information.

If the table you're browsing is in a project other than your default project, add the project ID to the command in the following format: project_id:dataset.table.

bq head \
--max_rows integer1 \
--start_row integer2 \
--selected_fields "columns" \
project_id:dataset.table

Where:

  • integer1 is the number of rows to display.
  • integer2 is the number of rows to skip before displaying data.
  • columns is a comma-separated list of columns.
  • project_id is your project ID.
  • dataset is the name of the dataset containing the table.
  • table is the name of the table to browse.

Examples:

Enter the following command to list all columns in the first 10 rows in mydataset.mytable. mydataset is in your default project.

bq head --max_rows=10 mydataset.mytable

Enter the following command to list all columns in the first 100 rows in mydataset.mytable. mydataset is in myotherproject, not your default project.

bq head myotherproject:mydataset.mytable

Enter the following command to display only field1 and field2 in mydataset.mytable. The command uses the --start_row flag to skip to row 100. mydataset.mytable is in your default project.

bq head --start_row 100 --selected_fields "field1,field2" mydataset.mytable

Because the bq head command does not create a query job, bq head commands do not appear in your query history, and you are not charged for them.

API

Browse through a table's data by calling tabledata.list. Specify the name of the table in the tableId parameter.

Configure these optional parameters to control the output:

  • maxResults — Maximum number of results to return
  • selectedFields — Comma-separated list of columns to return; If unspecified, all columns are returned
  • startIndex — Zero-based index of the starting row to read

Values are returned wrapped in a JSON object that you must parse, as described in the tabledata.list reference documentation.

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.


using Google.Api.Gax;
using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;
using System.Linq;

public class BigQueryBrowseTable
{
    public void BrowseTable(
        string projectId = "your-project-id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        TableReference tableReference = new TableReference()
        {
            TableId = "shakespeare",
            DatasetId = "samples",
            ProjectId = "bigquery-public-data"
        };
        // Load all rows from a table
        PagedEnumerable<TableDataList, BigQueryRow> result = client.ListRows(
            tableReference: tableReference,
            schema: null
        );
        // Print the first 10 rows
        foreach (BigQueryRow row in result.Take(10))
        {
            Console.WriteLine($"{row["corpus"]}: {row["word_count"]}");
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The Cloud Client Libraries for Go automatically paginates by default, so you do not need to implement pagination yourself, for example:

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// browseTable demonstrates reading data from a BigQuery table directly without the use of a query.
// For large tables, we also recommend the BigQuery Storage API.
func browseTable(w io.Writer, projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	table := client.Dataset(datasetID).Table(tableID)
	it := table.Read(ctx)
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		fmt.Fprintln(w, row)
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQuery.TableDataListOption;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableResult;

// Sample to directly browse a table with optional paging
public class BrowseTable {

  public static void runBrowseTable() {
    // TODO(developer): Replace these variables before running the sample.
    String table = "MY_TABLE_NAME";
    String dataset = "MY_DATASET_NAME";
    browseTable(dataset, table);
  }

  public static void browseTable(String dataset, String table) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Identify the table itself
      TableId tableId = TableId.of(dataset, table);

      // Page over 100 records. If you don't need pagination, remove the pageSize parameter.
      TableResult result = bigquery.listTableData(tableId, TableDataListOption.pageSize(100));

      // Print the records
      result
          .iterateAll()
          .forEach(
              row -> {
                row.forEach(fieldValue -> System.out.print(fieldValue.toString() + ", "));
                System.out.println();
              });

      System.out.println("Query ran successfully");
    } catch (BigQueryException e) {
      System.out.println("Query failed to run \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The Cloud Client Libraries for Node.js automatically paginates by default, so you do not need to implement pagination yourself, for example:


// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function browseRows() {
  // Displays rows from "my_table" in "my_dataset".

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  // List rows in the table
  const [rows] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .getRows();

  console.log('Rows:');
  rows.forEach(row => console.log(row));
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Pagination happens automatically in the Cloud Client Libraries for PHP using the generator function rows, which fetches the next page of results during iteration.

use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId   = 'The BigQuery table ID';
// $maxResults = 10;

$maxResults = 10;
$startIndex = 0;

$options = [
    'maxResults' => $maxResults,
    'startIndex' => $startIndex
];
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
$numRows = 0;
foreach ($table->rows($options) as $row) {
    print('---');
    foreach ($row as $column => $value) {
        printf('%s: %s' . PHP_EOL, $column, $value);
    }
    $numRows++;
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.


from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to browse data rows.
# table_id = "your-project.your_dataset.your_table_name"

# Download all rows from a table.
rows_iter = client.list_rows(table_id)  # Make an API request.

# Iterate over rows to make the API requests to fetch row data.
rows = list(rows_iter)
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Download at most 10 rows.
rows_iter = client.list_rows(table_id, max_results=10)
rows = list(rows_iter)
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Specify selected fields to limit the results to certain columns.
table = client.get_table(table_id)  # Make an API request.
fields = table.schema[:2]  # First two columns.
rows_iter = client.list_rows(table_id, selected_fields=fields, max_results=10)
rows = list(rows_iter)
print("Selected {} columns from table {}.".format(len(rows_iter.schema), table_id))
print("Downloaded {} rows from table {}".format(len(rows), table_id))

# Print row data in tabular format.
rows = client.list_rows(table, max_results=10)
format_string = "{!s:<16} " * len(rows.schema)
field_names = [field.name for field in rows.schema]
print(format_string.format(*field_names))  # Prints column headers.
for row in rows:
    print(format_string.format(*row))  # Prints row data.

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Pagination happens automatically in the Cloud Client Libraries for Ruby using Table#data and Data#next.

require "google/cloud/bigquery"

def browse_table
  bigquery = Google::Cloud::Bigquery.new project_id: "bigquery-public-data"
  dataset  = bigquery.dataset "samples"
  table    = dataset.table "shakespeare"

  # Load all rows from a table
  rows = table.data

  # Load the first 10 rows
  rows = table.data max: 10

  # Print row data
  rows.each { |row| puts row }
end

Querying table data

You can query BigQuery data by using one of the following query job types:

  • Interactive query jobs. By default, BigQuery runs interactive (on-demand) query jobs as soon as possible.

  • Batch query jobs. With these jobs, BigQuery queues each batch query on your behalf and then starts the query when idle resources are available, usually within a few minutes.

  • Continuous query jobs (Preview). With these jobs, the query runs continuously, letting you analyze incoming data in BigQuery in real time and then write the results to a BigQuery table, or export the results to Bigtable or Pub/Sub. You can use this capability to perform time sensitive tasks, such as creating and immediately acting on insights, applying real time machine learning (ML) inference, and building event-driven data pipelines.

You can run query jobs by using the following methods:

For more information on querying BigQuery tables, see Introduction to querying BigQuery data.

In addition to querying data stored in BigQuery tables, you can query data stored externally. For more information, see Introduction to external data sources.

Modifying table data

You can modify data in a table using data manipulation language (DML) statements in SQL. DML statements let you update, merge, insert, and delete rows in tables. For syntax reference and examples of each type of DML statement, see Data manipulation language statements in GoogleSQL.

The legacy SQL dialect does not support DML statements. To update or delete data using legacy SQL, you must delete the table and then recreate it with new data. Alternatively, you can write a query that modifies the data and write the query results to a new, destination table.

Copying table data

You can copy a table by:

  • Using the Google Cloud console
  • Using the bq command-line tool's bq cp command
  • Calling the jobs.insert API method and configuring a copy job
  • Using the client libraries

For more information on copying tables, see Copying a table.

Exporting table data

You can export table data to a Cloud Storage bucket in CSV, JSON, Avro, or Parquet (Preview) format. Exporting to your local machine is not supported; however, you can download and save query results using the Google Cloud console.

For more information, see Exporting table data.

Table security

To control access to tables in BigQuery, see Introduction to table access controls.

What's next