Configure container health checks for worker pools

You can configure HTTP, TCP, and gRPC startup probes, along with HTTP and gRPC liveness probes for new and existing Cloud Run worker pools. The configuration varies depending on the type of probe.

Use cases

You can configure two types of health check probes:

  • Liveness probes determine whether to restart a container.

    • Restarting a container in this case can increase worker pool availability in the event of bugs.
    • Liveness probes are intended to restart individual instances that can't be recovered in any other way. They should be used primarily for unrecoverable instance failures, for example, to catch a deadlock where a worker pool is running, but unable to make progress. You can require a liveness probe for every container by using custom organization policies.
  • Startup probes determine whether the container has started.

    • When you configure a startup probe, liveness checks are disabled until the startup probe determines that the container is started, to prevent interference with the worker pool startup.
    • Startup probes are especially useful if you use liveness checks on slow-starting containers, because they prevent those containers from being shut down prematurely before they are up and running. A combined sketch follows this list.
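
For illustration, a container that declares both probe types might include a fragment like the following sketch. The /ready and /health paths are placeholder values, not requirements, and liveness checking begins only after the startup probe reports success:

startupProbe:        # runs first and gates the liveness probe
  httpGet:
    path: /ready     # placeholder path
livenessProbe:       # starts only after the startup probe succeeds
  httpGet:
    path: /health    # placeholder path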

Note that when a worker pool experiences repeated startup or liveness probe failures, Cloud Run limits the rate of instance restarts to prevent uncontrolled crash loops.

CPU allocation

  • CPU is always allocated when probes run.
  • All probes are billed for their CPU and memory consumption.

Probe requirements and behavior

TCP startup

Requirements: None.

Behavior: If specified, Cloud Run opens a TCP connection to the TCP socket on the specified port. If Cloud Run is unable to establish a connection, the probe fails.

If a startup probe does not succeed within the specified time, Cloud Run shuts down the container. The time is a maximum of 240 seconds, calculated as failureThreshold * periodSeconds, which you set when configuring the startup probe for the worker pool.

HTTP startup

Requirements: Create an HTTP health check endpoint and use HTTP/1.

Behavior: After probe configuration, Cloud Run makes an HTTP GET request to the worker pool health check endpoint (for example, /ready). Any response between 200 and 400 is a success; everything else indicates failure.

If a startup probe does not succeed within the specified time (failureThreshold * periodSeconds), which cannot exceed 240 seconds, Cloud Run shuts down the container.

If the HTTP startup probe succeeds within the specified time and you have configured an HTTP liveness probe, Cloud Run starts the HTTP liveness probe.

HTTP liveness

Requirements: Create an HTTP health check endpoint and use HTTP/1.

Behavior: The liveness probe starts only after the startup probe is successful. After probe configuration, and after any startup probe is successful, Cloud Run makes an HTTP GET request to the health check endpoint (for example, /health). Any response between 200 and 400 is a success; everything else indicates failure.

If a liveness probe does not succeed within the specified time (failureThreshold * periodSeconds), Cloud Run shuts down the container using a SIGKILL signal. Any remaining requests that were still being served by the container are terminated with the HTTP status code 503. After Cloud Run shuts down the container, Cloud Run autoscaling starts up a new container instance.

gRPC startup

Requirements: Implement the gRPC Health Checking protocol in your Cloud Run worker pool.

Behavior: If a startup probe does not succeed within the specified time (failureThreshold * periodSeconds), which cannot exceed 240 seconds, Cloud Run shuts down the container.

gRPC liveness

Requirements: Implement the gRPC Health Checking protocol in your Cloud Run worker pool.

Behavior: If you configure a gRPC startup probe, the liveness probe starts only after the startup probe is successful.

After the liveness probe is configured, and after any startup probe is successful, Cloud Run makes a health check request to the worker pool.

If a liveness probe does not succeed within the specified time (failureThreshold * periodSeconds), Cloud Run shuts down the container using a SIGKILL signal. After Cloud Run shuts down the container, Cloud Run autoscaling starts up a new container instance.
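
To make the startup time budget concrete, the window is failureThreshold * periodSeconds. As a sketch, assuming a container that listens on port 8080 (an illustrative value), the following TCP startup probe uses the full 240-second maximum, because 24 * 10 = 240:

startupProbe:
  tcpSocket:
    port: 8080         # assumed port; use the port your container listens on
  failureThreshold: 24 # up to 24 failed attempts allowed
  periodSeconds: 10    # one attempt every 10 seconds: 24 * 10 = 240 seconds maximum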

Configure probes

Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.

You can configure HTTP, TCP, and gRPC probes using the Cloud Run REST API:

REST API

Important: If you are configuring your Cloud Run worker pool for HTTP probes, you must also add an HTTP health check endpoint in your worker pool code to respond to the probe. If you are configuring a gRPC probe, you must also implement the gRPC Health Checking protocol in your Cloud Run worker pool.

HTTP startup

Set the startupProbe field with an httpGet action on the container in your worker pool resource.
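
A minimal sketch follows; the /ready path, port 8080, and timing values are illustrative assumptions, not required values:

startupProbe:
  httpGet:
    path: /ready        # must match the health check endpoint in your worker pool code
    port: 8080          # assumed container port
  initialDelaySeconds: 0
  timeoutSeconds: 1
  failureThreshold: 3   # with periodSeconds, the startup window is 3 * 10 = 30 seconds
  periodSeconds: 10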

HTTP liveness

Set the livenessProbe field with an httpGet action on the container in your worker pool resource.
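
A minimal sketch follows, with the same caveat that the /health path, port 8080, and timing values are assumptions:

livenessProbe:
  httpGet:
    path: /health       # must match the health check endpoint in your worker pool code
    port: 8080          # assumed container port
  initialDelaySeconds: 0
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10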

gRPC startup

Set the startupProbe field with a grpc action on the container in your worker pool resource.
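
A minimal sketch follows, assuming your gRPC server listens on port 8080 and implements the gRPC Health Checking protocol; the port, timing values, and optional service name are illustrative assumptions:

startupProbe:
  grpc:
    port: 8080          # assumed gRPC port
    service: ""         # optional service name for the health check; empty checks overall server health
  initialDelaySeconds: 0
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10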

gRPC liveness

Set the livenessProbe field with a grpc action on the container in your worker pool resource.
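
A minimal sketch follows, again assuming an illustrative gRPC port of 8080 and placeholder timing values:

livenessProbe:
  grpc:
    port: 8080          # assumed gRPC port
  initialDelaySeconds: 0
  timeoutSeconds: 1
  failureThreshold: 3
  periodSeconds: 10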

Create HTTP health check endpoints

If you configure your Cloud Run worker pool for an HTTP startup probe or liveness probe, you must add an endpoint in your worker pool code to respond to the probe. The endpoint can have any name you want, for example, /startup or /ready, but it must match the value you specify for path in the probe configuration. For example, if the endpoint for an HTTP startup probe is /ready, set path to /ready as shown:

startupProbe:
  httpGet:
    path: /ready

HTTP health check endpoints are externally accessible and follow the same principles as any other HTTP endpoints that are exposed externally.