Understand slots

A BigQuery slot is a virtual compute unit used by BigQuery to execute SQL queries or other job types. During the execution of a query, BigQuery automatically determines how many slots are used by the query. The number of slots used depends on the amount of data being processed, the complexity of the query, and the number of slots available. In general, access to more slots lets you run more concurrent queries, and your complex queries can run faster.

While all queries use slots, you have two options for how you are charged for usage: the on-demand pricing model or the capacity-based pricing model.

By default, you are charged using the on-demand model. With this model, you are charged for the amount of data processed (measured in TiB) by each query. Projects using the on-demand model are subject to per-project and per-organization slot limits with transient burst capability. Most users on the on-demand model find the slot capacity limits more than sufficient. However, depending on your workload, access to more slots may improve query performance. To check how many slots your account uses, see BigQuery monitoring.

With the capacity-based model, you pay for the slot capacity allocated for your queries over time. This model gives you explicit control over total slot capacity, whereas the on-demand model does not. You explicitly choose the number of slots to use through a reservation. You can specify the number of slots in a reservation as a baseline amount, which is always allocated, or as an autoscaled amount, which is allocated when needed.
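The difference between baseline and autoscaled slots can be illustrated with a minimal sketch. The function name `allocated_slots` and the `max_autoscale` parameter are hypothetical names chosen for this example, not BigQuery API names, and the model deliberately ignores any scaling granularity or delay that the real autoscaler might have.

```python
def allocated_slots(baseline: int, max_autoscale: int, demand: int) -> int:
    """Slots allocated to a reservation under a simplified model (an assumption).

    Baseline slots are always allocated. Autoscaled slots are added only
    when demand exceeds the baseline, up to the configured maximum.
    """
    extra_needed = max(demand - baseline, 0)
    return baseline + min(extra_needed, max_autoscale)


# Hypothetical reservation: 100 baseline slots, up to 200 autoscaled slots.
low = allocated_slots(baseline=100, max_autoscale=200, demand=50)
medium = allocated_slots(baseline=100, max_autoscale=200, demand=250)
high = allocated_slots(baseline=100, max_autoscale=200, demand=1000)
print(low, medium, high)
```

In this sketch the baseline is allocated even when demand is lower, and allocation never exceeds the baseline plus the autoscale maximum.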

Query execution using slots

When BigQuery executes a query job, it converts the SQL statement into an execution plan, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps. BigQuery uses a heavily distributed parallel architecture to run these queries, and the stages model the units of work that many potential workers may execute in parallel. Data is passed between stages by using a fast distributed shuffle architecture, which is discussed in more detail on the Google Cloud blog.

BigQuery query execution is dynamic, which means that the query plan can be modified while a query is in flight. Stages that are introduced while a query is running are often used to improve data distribution throughout query workers. In addition, query execution might be impacted by the changing amount of available capacity as other queries complete or begin execution, or slots are added to the reservation by the autoscaler.

BigQuery can run multiple stages concurrently, can use speculative execution to accelerate a query, and can dynamically repartition a stage to achieve optimal parallelization.

BigQuery slots execute individual units of work at each stage of the query. For example, if BigQuery determines that a stage's optimal parallelization factor is 10, it requests 10 slots to process that stage.

Figure: Query execution graph of stages and steps

Slot resource economy

If a query requests more slots than are available, BigQuery queues up individual units of work and waits for slots to become available. As progress on query execution is made, and as slots free up, these queued up units of work get dynamically picked up for execution.

BigQuery can request any number of slots for a particular stage of a query. The number of slots requested is not related to the amount of capacity you purchase; rather, it indicates the optimal parallelization factor that BigQuery chose for that stage. Units of work queue up and get executed as slots become available.

When query demands exceed the slots you committed to, you are not charged for additional slots, and you are not charged on-demand rates for the excess. Your individual units of work queue up.

For example:

A query stage requests 2,000 slots, but only 1,000 are available.
BigQuery consumes all 1,000 slots and queues up the other 1,000 units of work.
Thereafter, if 100 slots finish their work, they dynamically pick up 100 units of work from the 1,000 queued up units of work. 900 units of queued up work remain.
Thereafter, if 500 slots finish their work, they dynamically pick up 500 units of work from the 900 queued up units of work. 400 units of queued up work remain.
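The arithmetic in this example can be reproduced with a short simulation. This is a simplified sketch, not BigQuery's scheduler: the function name `run_stage` is hypothetical, and it assumes each unit of work occupies exactly one slot and that a freed slot immediately picks up one queued unit.

```python
def run_stage(requested_units: int, available_slots: int,
              completions: list[int]) -> list[tuple[int, int]]:
    """Return (running, queued) unit counts at the start and after each completion event."""
    running = min(requested_units, available_slots)
    queued = requested_units - running
    history = [(running, queued)]
    for finished in completions:
        finished = min(finished, running)   # cannot finish more than is running
        running -= finished                 # these slots are now free
        picked_up = min(finished, queued)   # freed slots pick up queued work
        queued -= picked_up
        running += picked_up
        history.append((running, queued))
    return history


# A stage requests 2,000 slots with 1,000 available; 100 slots finish, then 500.
history = run_stage(requested_units=2000, available_slots=1000,
                    completions=[100, 500])
for running, queued in history:
    print(f"running={running} queued={queued}")
```

The queued counts in `history` are 1,000, then 900, then 400, matching the steps above.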

Figure: BigQuery slots queued up if demand exceeds availability

Fair scheduling in BigQuery

BigQuery allocates slot capacity within a single reservation using an algorithm called fair scheduling.

The BigQuery scheduler enforces the equal sharing of slots among projects with running queries within a reservation, and then within jobs of a given project. The scheduler provides eventual fairness. During short periods, some jobs might get a disproportionate share of slots, but the scheduler eventually corrects this. The goal of the scheduler is to find a balance between aggressively evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).

Fair scheduling ensures that every query has access to all available slots at any time, and capacity is dynamically and automatically re-allocated among active queries as each query's capacity demands change. Capacity is re-allocated as queries complete and new queries are submitted, under the following conditions:

Whenever a new query is submitted, capacity is automatically re-allocated across executing queries. Individual units of work can be gracefully paused, resumed, and queued up as more capacity becomes available to each query.
Whenever a query completes, capacity consumed by that query automatically becomes immediately available for all other queries to use.
Whenever a query's capacity demands change due to changes in the query's dynamic DAG, BigQuery automatically re-evaluates capacity availability for this and all other queries, re-allocating and pausing slots as necessary.

Figure: Fair scheduling in BigQuery

Depending on complexity and size, a query might not require all the slots it has the right to, or it may require more. BigQuery dynamically ensures that, given fair scheduling, all slots can be fully used at any point in time.

If an important job consistently needs more slots than it receives from the scheduler, consider creating an additional reservation with the required number of slots and assigning the job to that reservation.

As an example of fair scheduling, suppose you have the following reservation configuration:

Reservation A, which has 1,000 baseline slots with no autoscaling
Project A and project B, which are assigned to your reservation

Scenario 1: In project A, you run query A (one concurrent query) that requires high slot usage, and in project B you run 20 concurrent queries. Even though there are a total of 21 queries that are using reservation A, the slot distribution is the following:

Project A receives 500 slots, and query A runs with 500 slots.
Project B receives 500 slots that are shared among its 20 queries.

Scenario 2: In project A, you run query A (one concurrent query) that requires 100 slots to run, and in project B you run 20 concurrent queries. Because query A doesn't require 50% of the reservation, the slot distribution is the following:

Project A receives 100 slots, and query A runs with 100 slots.
Project B receives 900 slots that are shared among its 20 queries.

Inversely, consider the following reservation configuration:

Reservation B, which has 1,000 baseline slots with no autoscaling.
10 projects, which are all assigned to reservation B.

If all 10 projects are running queries that have sufficient slot demand, then each project receives 1/10 of the total reservation slots (or 100 slots), regardless of how many queries are running on each project.
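The scenarios above are consistent with a max-min fair split: capacity is divided equally, and any share a project cannot use is redistributed to the others. The following sketch models that idea only; it is an assumption used for illustration, not BigQuery's actual scheduling algorithm. The function name `fair_share` and the demand figures are hypothetical.

```python
def fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair split of `capacity` among consumers with the given demands."""
    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    active = [name for name, demand in demands.items() if demand > 0]
    while active and remaining > 0:
        share = remaining / len(active)
        # Consumers that need no more than an equal share get exactly their demand.
        capped = [name for name in active if demands[name] <= share]
        if not capped:
            # Everyone left wants more than an equal share: split evenly.
            for name in active:
                allocation[name] = share
            break
        for name in capped:
            allocation[name] = float(demands[name])
            remaining -= demands[name]
        active = [name for name in active if name not in capped]
    return allocation


# Reservation A, 1,000 slots. Demands are hypothetical numbers.
scenario_1 = fair_share(1000, {"project_a": 2000, "project_b": 2000})
scenario_2 = fair_share(1000, {"project_a": 100, "project_b": 2000})

# Within project B in scenario 1, its 500 slots are shared among 20 queries,
# assuming each query can use at least an equal share.
queries_b = fair_share(scenario_1["project_b"],
                       {f"query_{i}": 100 for i in range(20)})

# Reservation B: 10 projects with sufficient demand each.
reservation_b = fair_share(1000, {f"project_{i}": 500 for i in range(10)})
print(scenario_1, scenario_2, queries_b["query_0"], reservation_b["project_0"])
```

Applying the same function first across projects and then across the jobs inside a project mirrors the two levels of sharing described in this section.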

Slot quotas and limits

Slot quotas and limits provide a safeguard for BigQuery. Different pricing models use different slot quota types, as follows:

On-demand pricing model: You are subject to per-project and per-organization slot limits with transient burst capability. Depending on your workloads, access to more slots can improve query performance.
Capacity-based pricing model: Reservation quotas and limits define the maximum number of slots you can allocate across all reservations in a location. You are only billed for your reservations and commitments, not for the quotas. For information about increasing your slot quota, see Requesting a quota increase.

To check how many slots you are using, see BigQuery monitoring.

Idle slots

At any given time, some slots might be idle. This can include:

Slot commitments that are not allocated to any reservation baseline.
Slots that are allocated to a reservation baseline but aren't in use.

Idle slots are not applicable when using the on-demand pricing model.

By default, queries running in a reservation automatically use idle slots from other reservations within the same administration project. BigQuery immediately allocates slots to an assigned reservation when they are needed. Idle slots that were in use by another reservation are quickly preempted. There might be a short time when you see total slot consumption exceed the maximum you specified across all reservations, but you aren't charged for this additional slot usage.

For example, suppose you have the following reservation setup:

project_a is assigned to reservation_a, which has 500 baseline slots with no autoscaling.
project_b is assigned to reservation_b, which has 100 baseline slots with no autoscaling.
Both reservations are in the same administration project and there are no other projects assigned to these reservations.

You run query_b in project_b. If no query is running in project_a, then query_b has access to the 500 idle slots from reservation_a. While query_b is still running, it may use up to 600 slots: 100 baseline slots plus 500 idle slots.

While query_b is running, suppose you run query_a in project_a that can use 500 slots.

Since you have 500 baseline slots reserved for project_a, query_a immediately starts and is allocated 500 slots.
The number of slots allocated to query_b quickly decreases to 100 baseline slots.
Additional queries run in project_b share those 100 slots. If subsequent queries don't have enough slots to start, then they queue up until running queries complete and slots become available.

In this example, if project_b were assigned to a reservation with no baseline slots or autoscaling, then query_b would have no slots after query_a starts running. BigQuery would pause query_b until idle slots are available or the query times out. Additional queries in project_b would queue up until idle slots are available.

To ensure a reservation only uses its provisioned slots, set ignore_idle_slots to true. Reservations with ignore_idle_slots set to true can, however, share their idle slots with other reservations.
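The example above, together with the effect of ignore_idle_slots, can be sketched as a small calculation. This is an illustrative model under stated assumptions: all reservations share one administration project and edition, there is no autoscaling, and preemption is instantaneous. The `Reservation` class, its `demand` field, and the `max_usable_slots` function are hypothetical names, not part of any BigQuery API.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    baseline_slots: int
    demand: int = 0                  # slots wanted by the reservation's own jobs
    ignore_idle_slots: bool = False


def max_usable_slots(target: Reservation, others: list[Reservation]) -> int:
    """Upper bound on slots that jobs in `target` can use in this simplified model."""
    if target.ignore_idle_slots:
        # The reservation uses only its own provisioned slots.
        return target.baseline_slots
    # Other reservations share idle baseline slots even if they set ignore_idle_slots.
    idle = sum(max(r.baseline_slots - r.demand, 0) for r in others)
    return target.baseline_slots + idle


reservation_a = Reservation("reservation_a", baseline_slots=500)
reservation_b = Reservation("reservation_b", baseline_slots=100)

# No query is running in project_a, so query_b can use up to 600 slots.
before = max_usable_slots(reservation_b, [reservation_a])

# query_a starts and can use 500 slots, so query_b drops to its baseline.
reservation_a.demand = 500
after = max_usable_slots(reservation_b, [reservation_a])

# A reservation with no baseline slots is left with nothing once query_a runs.
no_baseline = max_usable_slots(Reservation("reservation_c", 0), [reservation_a])

# With ignore_idle_slots set to true, idle slots elsewhere are never used.
reservation_a.demand = 0
isolated = max_usable_slots(
    Reservation("reservation_b", 100, ignore_idle_slots=True), [reservation_a])
print(before, after, no_baseline, isolated)
```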

You cannot share idle slots between reservations of different editions. You can share only the baseline slots or committed slots. Autoscaled slots might be temporarily available but are not shareable as idle slots for other reservations because they might scale down.

As long as ignore_idle_slots is false, a reservation can have a slot count of 0 and still have access to unused slots. If you use only the default reservation, toggle off ignore_idle_slots as a best practice. You can then assign a project or folder to that reservation and it will only use idle slots.

Assignments of type ML_EXTERNAL are an exception in that slots used by BigQuery ML external model creation jobs are not preemptible. The slots in a reservation with both ML_EXTERNAL and QUERY assignment types are only available for other query jobs when the slots are not occupied by the ML_EXTERNAL jobs. Moreover, these jobs cannot use idle slots from other reservations.

Reservation-based fairness

With reservation-based fairness, BigQuery prioritizes and allocates idle slots equally across all reservations within the same administration project, regardless of the number of projects running jobs in each reservation. Each reservation receives a similar share of available capacity in the idle slot pool, and then its slots are distributed fairly within its projects. This feature is only supported with the Enterprise or Enterprise Plus editions.

The following chart shows how idle slots are distributed without reservation-based fairness enabled:

Figure: Idle slots are shared equally across projects.

Without reservation-based fairness enabled, the available idle slots are distributed evenly across the projects within the reservations.

The following chart shows how idle slots are distributed with reservation-based fairness enabled:

Figure: Idle slots are shared equally across reservations, not projects.

With reservation-based fairness enabled, the available idle slots are equally distributed across the reservations.
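The difference between the two modes can be shown with a small calculation. This sketch assumes every project has enough demand to use its full share; the function name `split_idle_slots`, the reservation names, and the numbers are hypothetical.

```python
def split_idle_slots(idle_slots: float,
                     projects_per_reservation: dict[str, int],
                     reservation_based_fairness: bool) -> dict[str, float]:
    """Return the idle slots each reservation receives under a simplified model."""
    if reservation_based_fairness:
        # Equal share per reservation, regardless of its number of projects.
        per_reservation = idle_slots / len(projects_per_reservation)
        return {name: per_reservation for name in projects_per_reservation}
    # Equal share per project, so reservations with more projects receive more.
    total_projects = sum(projects_per_reservation.values())
    per_project = idle_slots / total_projects
    return {name: per_project * count
            for name, count in projects_per_reservation.items()}


# Hypothetical setup: 600 idle slots, one reservation with 1 active project
# and another with 3 active projects.
projects = {"reservation_x": 1, "reservation_y": 3}
by_project = split_idle_slots(600, projects, reservation_based_fairness=False)
by_reservation = split_idle_slots(600, projects, reservation_based_fairness=True)
print(by_project)
print(by_reservation)
```

In this example, reservation_x receives 150 idle slots without reservation-based fairness and 300 with it, because the split no longer depends on how many projects each reservation contains.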

When you enable reservation-based fairness, review your resource consumption to manage your slot availability and query performance.

Avoid relying solely on idle slots for production workloads with strict time requirements; these jobs should use baseline or autoscaled slots. We recommend using idle slots for lower-priority jobs because the slots can be preempted at any time.

Excess slot usage

When a job holds onto slots for too long, it can receive an unfair share of slots. To prevent delays, BigQuery allows other jobs to borrow additional slots, resulting in periods of total slot use above your specified slot capacity. Any excess slot usage is attributed only to the jobs that receive more than their fair share.

The excess slots are not billed directly to you. Instead, jobs continue to run and accrue slot usage at their fair share until all of their excess usage is covered by your allocated capacity. Excess slots are excluded from reported slot usage with the exception of certain detailed execution statistics.

Note that some preemptive borrowing of slots can occur to reduce future delays and to provide other benefits such as reduced slot cost variability and reduced tail latency. Slot borrowing is limited to a small fraction of your total slot capacity.