To help avoid incurring Google Cloud charges for an inactive cluster, use Dataproc's Cluster Scheduled Deletion feature when you create a cluster. This feature provides options to delete a cluster when one of the following events occurs:
- after a specified cluster idle period
- at a specified future time
- after a specified period that starts from the time of submission of the cluster creation request
Actions that disable scheduled deletion
While a cluster is running, the following actions disable scheduled deletion until the disabling action is reversed:
- Removing the Dataproc Service Agent IAM role from the Dataproc Service Agent service account
- Disabling the Dataproc API in the cluster project
- Enabling Compute Engine VM deletion protection on a scheduled deletion cluster VM
- Enabling VPC Service Controls if the Dataproc Service Agent service account (control plane identity) isn't within the perimeter boundary
Calculate cluster idle time
You can use scheduled deletion to delete a cluster after a specified cluster idle time. Idle time is calculated after the cluster is created and cluster provisioning is complete. The idle time calculation starts when a cluster has no running jobs.
The dataproc:dataproc.cluster-ttl.consider-yarn-activity cluster property affects the calculation of cluster idle time, as follows:
- This property is enabled (set to true) by default.
- When this property is enabled, both YARN and Dataproc Jobs API activity must be idle to start and continue incrementing the cluster idle time calculation.
  - YARN activity includes pending and running YARN applications.
  - Dataproc Jobs API activity includes pending and running jobs submitted to the Dataproc Jobs API.
- When this property is set to false, the cluster idle time calculation starts and continues only when Dataproc Jobs API activity is idle.
The dataproc:dataproc.cluster-ttl.consider-yarn-activity property applies to clusters created with image versions 1.4.64, 1.5.39, 2.0.13, and later image versions. For clusters created with earlier image versions, only Dataproc Jobs API activity is considered in calculating cluster idle time.
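The rule described above can be summarized in a few lines of Python. This is only a minimal sketch of the documented behavior, not Dataproc's implementation; the function name and its boolean inputs are hypothetical and chosen for illustration.

```python
def idle_clock_runs(yarn_active: bool,
                    jobs_api_active: bool,
                    consider_yarn_activity: bool = True) -> bool:
    """Return True when the cluster idle time calculation would be running.

    yarn_active: pending or running YARN applications exist.
    jobs_api_active: pending or running Dataproc Jobs API jobs exist.
    consider_yarn_activity: models the value of the
        dataproc:dataproc.cluster-ttl.consider-yarn-activity property
        (true by default).
    """
    if consider_yarn_activity:
        # Both activity sources must be idle.
        return not yarn_active and not jobs_api_active
    # Only Dataproc Jobs API activity is considered.
    return not jobs_api_active


# A YARN application submitted outside the Jobs API keeps the cluster
# "busy" only while the property is enabled.
print(idle_clock_runs(yarn_active=True, jobs_api_active=False))
print(idle_clock_runs(yarn_active=True, jobs_api_active=False,
                      consider_yarn_activity=False))
```

On image versions earlier than those listed above, the second call reflects the only available behavior, since YARN activity is not considered there.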
Use cluster scheduled deletion
You can set scheduled deletion values when you create a cluster using the Google Cloud CLI, Dataproc API, or Google Cloud console. After you create the cluster, you can update the cluster to change or delete scheduled deletion values previously set on the cluster.
gcloud CLI
You can create or update scheduled deletion values on a cluster by passing the flags and values listed in the following table to the gcloud dataproc clusters create or gcloud dataproc clusters update commands.
gcloud CLI flag | Description | Value granularity | Min value | Max value |
---|---|---|---|---|
--delete-max-idle [1] | Applies to cluster create and cluster update commands. The duration from the time when the cluster becomes idle, after the cluster is created or updated and is in a ready-to-use state, to the moment when the cluster starts to delete. Provide the duration in IntegerUnit format, where the unit can be "s", "m", "h", or "d" (seconds, minutes, hours, days). Example: "30m" is 30 minutes from the moment when the cluster becomes idle. | 1 second | 5 minutes | 14 days |
--no-delete-max-idle | Applies to the cluster update command only. Cancels the cluster deletion scheduled by a previous --delete-max-idle flag setting. | Not applicable | Not applicable | Not applicable |
--delete-expiration-time [2] | Applies to cluster create and cluster update commands. The time to start deleting the cluster in ISO 8601 datetime format. To generate the datetime in correct format, you can use the Timestamp Generator. For example, "2017-08-22T13:31:48-08:00" specifies an expiration time of 13:31:48 in the UTC-08:00 time zone. | 1 second | 10 minutes from the current time | 14 days from the current time |
--delete-max-age [2] | Applies to cluster create and cluster update commands. The duration from the moment of submitting the cluster create request to the moment when the cluster starts to delete. Provide the duration in IntegerUnit format, where the unit can be "s", "m", "h", or "d" (seconds, minutes, hours, days). Examples: "30m" is 30 minutes from now; "1d" is 1 day from now. | 1 second | 10 minutes | 14 days |
--no-delete-max-age | Applies to the cluster update command only. Cancels the cluster auto-deletion scheduled by a previous --delete-max-age or --delete-expiration-time flag setting. | Not applicable | Not applicable | Not applicable |
- [1] You can pass the --delete-max-idle flag with either the --delete-expiration-time or --delete-max-age flag in your cluster create or update request. The first condition to become true takes effect to delete the cluster.
- [2] You can pass either the --delete-expiration-time flag or the --delete-max-age flag to the cluster create or update command, but not both.
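Before running a command, it can help to check flag values against the limits in the table. The following Python sketch parses the IntegerUnit duration format and applies the documented minimum, maximum, and "not both" rules. The helper names (parse_duration, check_flags) are hypothetical, and the check is a local convenience under those assumptions, not something the gcloud CLI provides.

```python
import re
from datetime import timedelta
from typing import List, Optional

# Unit letters accepted by the IntegerUnit format.
_UNIT_KEYWORDS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

# Limits taken from the flag table.
MAX_DURATION = timedelta(days=14)
MIN_IDLE = timedelta(minutes=5)
MIN_AGE = timedelta(minutes=10)


def parse_duration(text: str) -> timedelta:
    """Parse an IntegerUnit string such as "30m" or "1d"."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"not in IntegerUnit format: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNIT_KEYWORDS[unit]: int(amount)})


def check_flags(max_idle: Optional[str] = None,
                max_age: Optional[str] = None,
                expiration_time: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if max_age is not None and expiration_time is not None:
        problems.append(
            "use --delete-max-age or --delete-expiration-time, not both")
    if max_idle is not None and not MIN_IDLE <= parse_duration(max_idle) <= MAX_DURATION:
        problems.append("--delete-max-idle must be between 5 minutes and 14 days")
    if max_age is not None and not MIN_AGE <= parse_duration(max_age) <= MAX_DURATION:
        problems.append("--delete-max-age must be between 10 minutes and 14 days")
    return problems


print(check_flags(max_idle="30m", max_age="1d"))  # no problems expected
print(check_flags(max_idle="1m"))                 # below the 5 minute minimum
```

The sketch does not validate the --delete-expiration-time value itself, because its limits depend on the current time.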
Cluster creation example:
    gcloud dataproc clusters create CLUSTER_NAME \
        --region=REGION \
        --delete-max-idle=DURATION \
        --delete-expiration-time=TIME \
        ... other flags ...
Cluster update example:
    gcloud dataproc clusters update CLUSTER_NAME \
        --region=REGION \
        --delete-max-idle=DURATION \
        --no-delete-max-age \
        ... other flags ...
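If you prefer to compute the TIME value for --delete-expiration-time yourself, a timezone-aware datetime from the Python standard library can produce an ISO 8601 string with a UTC offset. This is a minimal sketch; the function name expiration_timestamp is hypothetical, and the start time is hard-coded so that the result is reproducible.

```python
from datetime import datetime, timedelta, timezone


def expiration_timestamp(now: datetime, delay: timedelta) -> str:
    """Return now + delay as an ISO 8601 string with whole seconds.

    `now` must be timezone-aware so that the UTC offset appears in the output.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return (now + delay).replace(microsecond=0).isoformat()


# Fixed example: two hours after 11:31:48 in the UTC-08:00 time zone.
utc_minus_8 = timezone(timedelta(hours=-8))
start = datetime(2017, 8, 22, 11, 31, 48, tzinfo=utc_minus_8)
print(expiration_timestamp(start, timedelta(hours=2)))
# prints 2017-08-22T13:31:48-08:00
```

In practice you would pass datetime.now(timezone.utc) as the start time and keep the delay between 10 minutes and 14 days, as listed in the flag table.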
REST API
You can create or update scheduled deletion values on a cluster by setting the Dataproc API LifecycleConfig fields and values listed in the following table as part of a Dataproc clusters.create or clusters.patch API request.
API field | Description | Value granularity | Min value | Max value |
---|---|---|---|---|
idleDeleteTtl [1] | Applies to cluster create and cluster update requests. The duration from the time when the cluster becomes idle, after the cluster is created or updated and is in a ready-to-use state, to the moment when the cluster starts to delete. When updating a cluster with a new value, the new value must be greater than the previously set value. Provide a duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". Submit an empty duration to cancel a previously set idleDeleteTtl value. | 1 second | 5 minutes | 14 days |
autoDeleteTime [2] | Applies to cluster create and cluster update requests. The time to start deleting the cluster. When updating a cluster with a new time, the new time must be later than the previously set time. When updating, an empty autoDeleteTime value cancels the existing auto delete. Provide a timestamp in RFC 3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z". | 1 second | 10 minutes from the current time | 14 days from the current time |
autoDeleteTtl [2] | The duration from the moment of submitting the cluster create or update request to the moment when the cluster starts to delete. When updating a cluster, the new scheduled deletion time (time of the update request plus the new duration) must be later than the previously set cluster deletion time. Submit an empty value to cancel a previously set autoDeleteTtl value. Provide a duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". | 1 second | 10 minutes | 14 days |
- [1] You can set or update both idleDeleteTtl and either autoDeleteTime or autoDeleteTtl in your cluster create or update request. The first condition to become true takes effect to delete the cluster.
- [2] You can set or update either autoDeleteTime or autoDeleteTtl in your request, but not both.
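As an illustration of the field formats described above, the following Python sketch builds the lifecycleConfig portion of a request body. The helper names are hypothetical, and the sketch only constructs JSON; it does not send a request. It assumes that lifecycleConfig sits under the cluster's config object, which is how it appears in the gcloud describe output.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_api_duration(delta: timedelta) -> str:
    """Format a duration as seconds terminated by 's', for example "1800s"."""
    text = f"{delta.total_seconds():.9f}".rstrip("0").rstrip(".")
    return f"{text}s"


def to_api_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime in RFC 3339 UTC "Zulu" format."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def lifecycle_config(idle_ttl: Optional[timedelta] = None,
                     auto_delete_time: Optional[datetime] = None,
                     auto_delete_ttl: Optional[timedelta] = None) -> dict:
    """Build a lifecycleConfig dict, rejecting the disallowed combination."""
    if auto_delete_time is not None and auto_delete_ttl is not None:
        raise ValueError("set autoDeleteTime or autoDeleteTtl, not both")
    config = {}
    if idle_ttl is not None:
        config["idleDeleteTtl"] = to_api_duration(idle_ttl)
    if auto_delete_time is not None:
        config["autoDeleteTime"] = to_api_timestamp(auto_delete_time)
    if auto_delete_ttl is not None:
        config["autoDeleteTtl"] = to_api_duration(auto_delete_ttl)
    return config


# Hypothetical cluster name; only the lifecycleConfig part is the point here.
body = {
    "clusterName": "example-cluster",
    "config": {
        "lifecycleConfig": lifecycle_config(
            idle_ttl=timedelta(minutes=30),
            auto_delete_ttl=timedelta(days=1),
        )
    },
}
print(json.dumps(body, indent=2))
```

Python datetimes carry microsecond precision, so the timestamp has six fractional digits rather than the nine that the API accepts.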
Console
- Open the Dataproc Create a cluster page.
- Select the Customize cluster panel.
- In the Scheduled deletion section, select the options to apply to your cluster.
View Scheduled Deletion cluster settings
gcloud CLI
You can use the gcloud dataproc clusters list command to confirm that a cluster has scheduled deletion enabled.
    gcloud dataproc clusters list \
        --region=REGION
    ...
    NAME         WORKER_COUNT ... SCHEDULED_DELETE
    CLUSTER_ID   NUMBER       ... enabled
    ...
You can use the gcloud dataproc clusters describe command to check the cluster LifecycleConfig scheduled deletion settings.
    gcloud dataproc clusters describe CLUSTER_NAME \
        --region=REGION
    ...
    lifecycleConfig:
      autoDeleteTime: '2018-11-28T19:33:48.146Z'
      idleDeleteTtl: 1800s
      idleStartTime: '2018-11-28T18:33:48.146Z'
    ...
The autoDeleteTime and idleDeleteTtl are the scheduled deletion configuration values set on the cluster. Dataproc generates the idleStartTime value, which is the latest cluster idle start time. Dataproc deletes the cluster if the cluster remains idle at idleStartTime + idleDeleteTtl.
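The idleStartTime + idleDeleteTtl arithmetic can be reproduced with the values from the describe output. This Python sketch is illustrative only; the function names are hypothetical, and it assumes the timestamp and duration formats shown in that output.

```python
from datetime import datetime, timedelta


def parse_zulu(text: str) -> datetime:
    """Parse a UTC timestamp such as '2018-11-28T18:33:48.146Z'."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def idle_deletion_time(lifecycle: dict) -> datetime:
    """Return idleStartTime + idleDeleteTtl for a lifecycleConfig mapping."""
    start = parse_zulu(lifecycle["idleStartTime"])
    ttl = timedelta(seconds=float(lifecycle["idleDeleteTtl"].rstrip("s")))
    return start + ttl


# Values copied from the describe output.
lifecycle = {
    "autoDeleteTime": "2018-11-28T19:33:48.146Z",
    "idleDeleteTtl": "1800s",
    "idleStartTime": "2018-11-28T18:33:48.146Z",
}

idle_deadline = idle_deletion_time(lifecycle)
auto_deadline = parse_zulu(lifecycle["autoDeleteTime"])
print(idle_deadline.isoformat())
# prints 2018-11-28T19:03:48.146000+00:00

# If the cluster stays idle, the earlier of the two deadlines applies.
print(min(idle_deadline, auto_deadline) == idle_deadline)
# prints True
```

If the cluster becomes active again before the idle deadline, Dataproc sets a new idleStartTime the next time the cluster goes idle, so the idle deadline moves while autoDeleteTime stays fixed.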
REST API
You can make a clusters.list request to confirm that a cluster has scheduled deletion enabled.
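Once you have the JSON response from a clusters.list request, you can pick out the clusters that carry scheduled deletion settings. The following Python sketch works on an already-parsed response; the sample data and the function name are hypothetical, and the sketch assumes each cluster exposes its settings under config.lifecycleConfig, as in the describe output.

```python
def scheduled_deletion_settings(list_response: dict) -> dict:
    """Map cluster name to lifecycleConfig for clusters that have one set."""
    settings = {}
    for cluster in list_response.get("clusters", []):
        lifecycle = cluster.get("config", {}).get("lifecycleConfig")
        if lifecycle:
            settings[cluster["clusterName"]] = lifecycle
    return settings


# Hand-written sample standing in for a parsed clusters.list response.
sample_response = {
    "clusters": [
        {
            "clusterName": "cluster-with-ttl",
            "config": {"lifecycleConfig": {"idleDeleteTtl": "1800s"}},
        },
        {
            "clusterName": "cluster-without-ttl",
            "config": {},
        },
    ]
}

for name, lifecycle in scheduled_deletion_settings(sample_response).items():
    print(name, lifecycle)
```

A real response can span several pages, so a complete client would also need to handle pagination, which this sketch leaves out.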
Console
- You can view cluster scheduled deletion settings by selecting the cluster name from the Dataproc Clusters page in the Google Cloud console.
- On the Cluster details page, select the Configuration tab, then find the scheduled deletion settings in the cluster configuration list.