Understand slots
A BigQuery slot is a virtual compute unit used by BigQuery to execute SQL queries or other job types. During the execution of a query, BigQuery automatically determines how many slots are used by the query. The number of slots used depends on the amount of data being processed, the complexity of the query, and the number of slots available. In general, access to more slots lets you run more concurrent queries, and your complex queries can run faster.
While all queries use slots, you have two options for how you are charged for usage: the on-demand pricing model or the capacity-based pricing model.
By default, you are charged using the on-demand model. With this model, you are charged for the amount of data processed (measured in TiB) by each query. Projects using the on-demand model are subject to per-project and per-organization slot limits with transient burst capability. Most users on the on-demand model find the slot capacity limits more than sufficient. However, depending on your workload, access to more slots may improve query performance. To check how many slots your account uses, see BigQuery monitoring.
With the capacity-based model, you pay for the slot capacity allocated for your queries over time. This model gives you explicit control over total slot capacity, whereas the on-demand model does not. You explicitly choose the number of slots to use through a reservation. You can specify the number of slots in a reservation as a baseline amount, which is always allocated, or as an autoscaled amount, which is allocated when needed.
Query execution using slots
When BigQuery executes a query job, it converts the SQL statement into an execution plan, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps. BigQuery uses a heavily distributed parallel architecture to run these queries, and the stages model the units of work that many potential workers may execute in parallel. Data is passed between stages by using a fast distributed shuffle architecture, which is discussed in more detail on the Google Cloud blog.
BigQuery query execution is dynamic, which means that the query plan can be modified while a query is in flight. Stages that are introduced while a query is running are often used to improve data distribution throughout query workers. In addition, query execution might be impacted by the changing amount of available capacity as other queries complete or begin execution, or slots are added to the reservation by the autoscaler.
BigQuery can run multiple stages concurrently, can use speculative execution to accelerate a query, and can dynamically repartition a stage to achieve optimal parallelization.
BigQuery slots execute individual units of work at each stage of the query. For example, if BigQuery determines that a stage's optimal parallelization factor is 10, it requests 10 slots to process that stage.
Slot resource economy
If a query requests more slots than are available, BigQuery queues up individual units of work and waits for slots to become available. As progress on query execution is made, and as slots free up, these queued up units of work get dynamically picked up for execution.
BigQuery can request any number of slots for a particular stage of a query. The number of slots requested is not related to the amount of capacity you purchase; rather, it indicates the optimal parallelization factor chosen by BigQuery for that stage. Units of work queue up and get executed as slots become available.
When query demands exceed the slots you committed to, you are not charged for additional slots, and you are not charged at on-demand rates. Instead, the individual units of work queue up.
For example:
- A query stage requests 2,000 slots, but only 1,000 are available.
- BigQuery consumes all 1,000 slots and queues up the remaining 1,000 units of work.
- Thereafter, if 100 slots finish their work, they dynamically pick up 100 units of work from the 1,000 queued up units of work. 900 units of queued up work remain.
- Thereafter, if 500 slots finish their work, they dynamically pick up 500 units of work from the 900 queued up units of work. 400 units of queued up work remain.
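The queueing behavior in this example can be sketched in a few lines of Python. This is only an illustrative model, not BigQuery's scheduler: the function name `simulate_stage` and its parameters are hypothetical, and it assumes one unit of work occupies exactly one slot.

```python
def simulate_stage(requested_units, available_slots, completions):
    """Track running and queued units of work as slots free up.

    completions lists how many running units finish at each step.
    Returns a list of (running, queued) tuples, starting with the
    initial state. Assumption: one unit of work occupies one slot.
    """
    running = min(requested_units, available_slots)
    queued = requested_units - running
    history = [(running, queued)]
    for finished in completions:
        finished = min(finished, running)
        running -= finished
        # Freed slots immediately pick up queued units of work.
        picked_up = min(finished, queued)
        queued -= picked_up
        running += picked_up
        history.append((running, queued))
    return history


# The example above: 2,000 units requested, 1,000 slots available,
# then 100 slots finish, then 500 slots finish.
for step, (running, queued) in enumerate(simulate_stage(2000, 1000, [100, 500])):
    print(f"step {step}: running={running} queued={queued}")
```

Under these assumptions the queue shrinks from 1,000 to 900 and then to 400 units, matching the numbers in the list, while all 1,000 slots stay busy.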
Fair scheduling in BigQuery
BigQuery allocates slot capacity within a single reservation using an algorithm called fair scheduling.
The BigQuery scheduler enforces the equal sharing of slots among projects with running queries within a reservation, and then within jobs of a given project. The scheduler provides eventual fairness. During short periods, some jobs might get a disproportionate share of slots, but the scheduler eventually corrects this. The goal of the scheduler is to find a balance between aggressively evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).
Fair scheduling ensures that every query has access to all available slots at any time, and capacity is dynamically and automatically re-allocated among active queries as each query's capacity demands change. Capacity is re-allocated under the following conditions:
- Whenever a new query is submitted, capacity is automatically re-allocated across executing queries. Individual units of work can be gracefully paused, resumed, and queued up as more capacity becomes available to each query.
- Whenever a query completes, capacity consumed by that query automatically becomes immediately available for all other queries to use.
- Whenever a query's capacity demands change due to changes in the query's dynamic DAG, BigQuery automatically re-evaluates capacity availability for this and all other queries, re-allocating and pausing slots as necessary.
Depending on complexity and size, a query might not require all the slots it has the right to, or it may require more. BigQuery dynamically ensures that, given fair scheduling, all slots can be fully used at any point in time.
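As a rough illustration of this idea, the following Python sketch splits a reservation's slots equally among projects and then among the jobs in each project, handing any share a job cannot use to the others. It uses a generic max-min fairness calculation over a single snapshot in time; it is not BigQuery's actual algorithm, which works on units of work and only provides eventual fairness. The names `max_min_share` and `fair_schedule` and the demand figures are hypothetical.

```python
def max_min_share(capacity, demands):
    """Split capacity across claimants using max-min fairness.

    Every claimant is offered an equal share. Any part of a share that a
    claimant does not need is redistributed among those that want more.
    """
    allocation = {name: 0.0 for name in demands}
    unsatisfied = {name for name, want in demands.items() if want > 0}
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for name in sorted(unsatisfied):
            grant = min(share, demands[name] - allocation[name])
            allocation[name] += grant
            remaining -= grant
        unsatisfied = {
            n for n in unsatisfied if demands[n] - allocation[n] > 1e-9
        }
    return allocation


def fair_schedule(reservation_slots, project_jobs):
    """Share slots among projects first, then among jobs in each project.

    project_jobs maps project -> {job: slots the job could use right now}.
    """
    project_demand = {p: sum(jobs.values()) for p, jobs in project_jobs.items()}
    project_share = max_min_share(reservation_slots, project_demand)
    return {
        p: max_min_share(project_share[p], jobs)
        for p, jobs in project_jobs.items()
    }


shares = fair_schedule(1000, {
    "project_a": {"job_1": 800, "job_2": 100},
    "project_b": {"job_3": 200},
})
print(shares)
```

In this made-up scenario, project_b can use only 200 of its 500-slot share, so project_a receives the remaining 800 slots, which its two jobs split as 700 and 100. No slot is left unused while any job still has demand.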
If an important job consistently needs more slots than it receives from the scheduler, consider creating an additional reservation with the required number of slots and assigning the job to that reservation.
Slot quotas and limits
Slot quotas and limits provide a safeguard for BigQuery. Different pricing models use different slot quota types, as follows:
On-demand pricing model: You are subject to per-project and per-organization slot limits with transient burst capability. Depending on your workloads, access to more slots can improve query performance.
Capacity-based pricing model: Reservations quotas and limits define the maximum number of slots you can allocate across all reservations in a location. You are only billed for your reservations and commitments, not for the quotas. For information about increasing your slot quota, see Requesting a quota increase.
To check how many slots you are using, see BigQuery monitoring.
Idle slots
At any given time, some slots might be idle. This can include:
- Slot commitments that are not allocated to any reservation baseline.
- Slots that are allocated to a reservation baseline but aren't in use.
Idle slots are not applicable when using the on-demand pricing model.
By default, queries running in a reservation automatically use idle slots from other reservations within the same administration project. BigQuery immediately allocates slots to an assigned reservation when they are needed. Idle slots that were in use by another reservation are quickly preempted. There might be a short time when you see total slot consumption exceed the maximum you specified across all reservations, but you aren't charged for this additional slot usage.
For example, suppose you have the following reservation setup:
- project_a is assigned to reservation_a, which has 500 baseline slots with no autoscaling.
- project_b is assigned to reservation_b, which has 100 baseline slots with no autoscaling.
- Both reservations are in the same administrative project and there are no other projects assigned to these reservations.
You run query_b in project_b. If no query is running in project_a, then query_b has access to the 500 idle slots from reservation_a. While query_b is still running, it may use up to 600 slots: 100 baseline slots plus 500 idle slots.
While query_b is running, suppose you run query_a in project_a that can use 500 slots.
- Since you have 500 baseline slots reserved for project_a, query_a immediately starts and is allocated 500 slots.
- The number of slots allocated to query_b quickly decreases to 100 baseline slots.
- Additional queries run in project_b share those 100 slots. If subsequent queries don't have enough slots to start, then they queue up until running queries complete and slots become available.
In this example, if project_b was assigned to a reservation with no baseline slots or autoscaling, then query_b would have no slots after query_a starts running. BigQuery would pause query_b until idle slots are available or the query times out. Additional queries in project_b would queue up until idle slots are available.
To ensure a reservation only uses its provisioned slots, set ignore_idle_slots to true. Reservations with ignore_idle_slots set to true can, however, share their idle slots with other reservations.
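The reservation example and the effect of ignore_idle_slots can be modeled with a short Python sketch. It is a simplified snapshot model, not the BigQuery API: the `Reservation` class and `allocate` function are hypothetical, autoscaling and preemption timing are ignored, and when several reservations want idle slots they are served in list order, which is an assumption made only to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    baseline: int                    # baseline slots, always allocated
    demand: int = 0                  # slots its running queries could use now
    ignore_idle_slots: bool = False  # True: never borrow idle slots


def allocate(reservations):
    """Return {reservation name: slots in use} for one snapshot in time.

    Each reservation is first served from its own baseline. Unused baseline
    slots form an idle pool that other reservations may borrow from, unless
    the borrower sets ignore_idle_slots. A reservation with
    ignore_idle_slots=True still contributes its idle slots to the pool.
    """
    allocation = {r.name: min(r.demand, r.baseline) for r in reservations}
    idle_pool = sum(r.baseline - allocation[r.name] for r in reservations)
    for r in reservations:
        if r.ignore_idle_slots:
            continue
        extra = min(r.demand - allocation[r.name], idle_pool)
        allocation[r.name] += extra
        idle_pool -= extra
    return allocation


reservation_a = Reservation("reservation_a", baseline=500)
reservation_b = Reservation("reservation_b", baseline=100, demand=600)

# Only query_b is running: it borrows the 500 idle slots of reservation_a.
print(allocate([reservation_a, reservation_b]))

# query_a starts and can use 500 slots: reservation_b falls back to baseline.
reservation_a.demand = 500
print(allocate([reservation_a, reservation_b]))

# With ignore_idle_slots set, reservation_b never exceeds its baseline.
reservation_a.demand = 0
reservation_b.ignore_idle_slots = True
print(allocate([reservation_a, reservation_b]))
```

The three snapshots give reservation_b 600, 100, and 100 slots respectively, which mirrors the walkthrough above: idle slots are borrowed only while their owner does not need them and only when the borrower has not opted out.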
You cannot share idle slots between reservations of different editions. You can share only the baseline slots or committed slots. Autoscaled slots might be temporarily available but are not shareable as idle slots for other reservations because they might scale down.
As long as ignore_idle_slots is false, a reservation can have a slot count of 0 and still have access to unused slots. If you use only the default reservation, toggle off ignore_idle_slots as a best practice. You can then assign a project or folder to that reservation and it will only use idle slots.
Assignments of type ML_EXTERNAL are an exception in that slots used by BigQuery ML external model creation jobs are not preemptible. The slots in a reservation with both ML_EXTERNAL and QUERY assignment types are only available for other query jobs when the slots are not occupied by the ML_EXTERNAL jobs. Moreover, these jobs cannot use idle slots from other reservations.
Avoid relying solely on idle slots for production workloads with strict time requirements. These jobs should use baseline or autoscaled slots. We recommend using idle slots for lower priority jobs because the slots can be preempted at any time.
Excess slot usage
When a job holds onto slots for too long, it can receive an unfair share of slots. To prevent delays, BigQuery allows other jobs to borrow additional slots, resulting in periods of total slot use above your specified slot capacity. Any excess slot usage is attributed only to the jobs that receive more than their fair share.
The excess slots are not billed directly to you. Instead, jobs continue to run and accrue slot usage at their fair share until all of their excess usage is covered by your allocated capacity. Excess slots are excluded from reported slot usage with the exception of certain detailed execution statistics.
Note that some preemptive borrowing of slots can occur to reduce future delays and to provide other benefits such as reduced slot cost variability and reduced tail latency. Slot borrowing is limited to a small fraction of your total slot capacity.