This document describes how to enable retrying for CloudEvents functions, also known as event-driven functions. Automatic retrying is not available for HTTP functions.
Why event-driven functions fail to complete
On rare occasions, a function might exit prematurely due to an internal error, and by default the function might or might not be automatically retried.
More typically, an event-driven function might fail to successfully complete due to errors thrown in the function code itself. The reasons this might happen include:
- The function contains a bug and the runtime throws an exception.
- The function cannot reach a service endpoint, or times out while trying to do so.
- The function intentionally throws an exception (for example, when a parameter fails validation).
- A Node.js function returns a rejected promise, or passes a non-null value to a callback.
In any of the above cases, the function will stop executing and return an error. Event triggers producing the messages have retry policies that you can customize to meet the needs of your function.
Semantics of retry
Cloud Run functions provides at-least-once execution of an event-driven function for each event emitted by an event source. The way you configure retries depends on how you created your function:
- Functions created in the Google Cloud console or with the Cloud Run Admin API require you to separately create and manage the event triggers. Triggers have default retry behaviors that you can customize to suit the needs of your function.
- Functions created with the Cloud Functions v2 API implicitly create the necessary event triggers, for example Pub/Sub topics or Eventarc triggers. By default, retries are disabled for these triggers, and you can enable them using the Cloud Functions v2 API.
Event-driven functions created with Cloud Run
Functions created in the Google Cloud console or with the Cloud Run Admin API require you to separately create and manage the event triggers. We strongly recommend that you review the default behavior of each trigger type:
- The Eventarc retry policy has a default message retention of 24 hours with an exponential backoff delay. Refer to the Eventarc documentation on retry events.
- Pub/Sub defaults to using the immediate redelivery policy for all subscriptions. Refer to the Pub/Sub documentation on handling message failures and retry requests.
Event-driven functions created with the Cloud Functions v2 API
Functions created using the Cloud Functions v2 API (for example, using the Cloud Functions gcloud CLI, the REST API, or Terraform) create and manage event triggers on your behalf. By default, if a function invocation terminates with an error, the function is not invoked again and the event is dropped. When you enable retries on an event-driven function, Cloud Run functions retries a failed function invocation until it completes successfully or the retry window expires.
When retries are not enabled for a function, which is the default, the function always reports that it executed successfully, and 200 OK response codes might appear in its logs. This occurs even if the function encountered an error. To make it clear when your function encounters an error, be sure to report errors appropriately.
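A function without retries can log 200 OK even when it fails, so one way to surface failures is to write them as structured log entries. The following Python sketch makes two assumptions. First, it assumes that single-line JSON written to stdout or stderr is parsed by Cloud Logging, with the severity key setting the log level. Second, handle_order_event and its order_id field are hypothetical and used only for illustration.

```python
import json
import sys


def log_error(message: str, **fields) -> str:
    """Write one JSON log line with severity ERROR and return it.

    Assumption: Cloud Logging parses single-line JSON written to stdout or
    stderr and uses the "severity" key as the entry's log level.
    """
    line = json.dumps({"severity": "ERROR", "message": message, **fields})
    print(line, file=sys.stderr, flush=True)
    return line


def handle_order_event(payload: dict) -> bool:
    """Hypothetical handler; returns True if the payload was processed."""
    if "order_id" not in payload:
        # Retries are disabled, so the platform may still record success.
        # Log the failure explicitly so that it is visible in the logs.
        log_error("event is missing order_id", received_keys=sorted(payload))
        return False
    print(f"processing order {payload['order_id']}")
    return True
```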
Enable or disable retries
To enable or disable retries, you can use either the gcloud command-line tool or the Google Cloud console. By default, retries are disabled.
Configure retries from the gcloud command-line tool
To enable retries using the gcloud command-line tool, include the --retry flag when deploying your function:
gcloud functions deploy FUNCTION_NAME --retry FLAGS...
To disable retries, redeploy the function without the --retry flag:
gcloud functions deploy FUNCTION_NAME FLAGS...
Configure retries from the console
If you're creating a new function:
- From the Create Function screen, under Trigger, choose the type of event to act as a trigger for your function.
- Select the Retry on failure checkbox to enable retries.
If you're updating an existing function:
- From the Cloud Run functions Overview page, click the name of the function you're updating to open its Function details screen, then choose Edit from the menu bar to display the Trigger pane.
- Select or clear the Retry on failure checkbox to enable or disable retries.
Retry window
The retry window expires after 24 hours. Cloud Run functions retries newly created event-driven functions using an exponential backoff strategy, with an increasing backoff of between 10 and 600 seconds.
Best practices
This section describes best practices for using retries.
Use retry to handle transient errors
Because your function is retried continuously until successful execution, permanent errors like bugs should be eliminated from your code through testing before enabling retries. Retries are best used to handle intermittent or transient failures that have a high likelihood of resolution upon retrying, such as a flaky service endpoint or timeout.
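A permanent error can be costly under retries. The following Python sketch estimates how often a single event that always fails would be retried inside the 24-hour retry window. It assumes the backoff starts at 10 seconds and doubles after each failure, capped at 600 seconds. The doubling factor is an assumption: the service may use a different growth rate or add jitter.

```python
from datetime import timedelta
from typing import List

MIN_BACKOFF = timedelta(seconds=10)
MAX_BACKOFF = timedelta(seconds=600)
RETRY_WINDOW = timedelta(hours=24)


def backoff_schedule(multiplier: float = 2.0) -> List[timedelta]:
    """Return the delay before each retry that fits inside RETRY_WINDOW.

    Assumption: the first delay is MIN_BACKOFF and each later delay is the
    previous one times `multiplier`, capped at MAX_BACKOFF (no jitter).
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    delays: List[timedelta] = []
    delay = MIN_BACKOFF
    elapsed = timedelta(0)
    while elapsed + delay <= RETRY_WINDOW:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * multiplier, MAX_BACKOFF)
    return delays
```

Under these assumptions, backoff_schedule() returns 148 delays. A single event that hits a bug would therefore cause about 148 failed invocations before the retry window expires.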
Set an end condition to avoid infinite retry loops
It is best practice to protect your function against continuous looping when using retries. You can do this by checking a well-defined end condition before the function begins processing. Note that this technique only works if your function starts successfully and is able to evaluate the end condition.
A simple yet effective approach is to discard events with timestamps older than a certain time. This helps to avoid excessive executions when failures are either persistent or longer-lived than expected.
For example, a function can discard all events older than 10 seconds before doing any work.
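The following Python sketch shows such an age check. It makes the following assumptions:
- The event carries the CloudEvents id and time attributes.
- The time attribute is an RFC 3339 UTC timestamp with at most microsecond precision.
- handle_with_age_check is a hypothetical name.

In a deployed function, the check would be called from the CloudEvent handler with the received event.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

MAX_EVENT_AGE = timedelta(seconds=10)


def parse_event_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as "2024-05-01T12:00:00.123Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_too_old(event_time: str, now: Optional[datetime] = None) -> bool:
    """Return True if the event is older than MAX_EVENT_AGE."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_event_time(event_time) > MAX_EVENT_AGE


def handle_with_age_check(event: Mapping[str, str]) -> str:
    # End condition, evaluated before any work: returning normally for a
    # stale event lets the invocation succeed, which ends the retry loop.
    if is_too_old(event["time"]):
        print(f"Dropping event {event['id']}: older than {MAX_EVENT_AGE}")
        return "dropped"
    # ... process the event here; an exception raised here causes a retry ...
    return "processed"
```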
Distinguish between retryable and fatal errors
If your function has retries enabled, any unhandled error will trigger a retry. Make sure that your code captures any errors that shouldn't result in a retry.
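The following minimal Python sketch illustrates this pattern. RetryableError, FatalError, and the validation rule are hypothetical. Fatal errors are caught and logged, so the invocation succeeds and is not retried. Transient errors propagate, so the invocation fails and, with retries enabled, the event is delivered again.

```python
import logging

logger = logging.getLogger(__name__)


class RetryableError(Exception):
    """Hypothetical marker for transient failures, such as a timeout."""


class FatalError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def process(payload: dict) -> None:
    # Hypothetical business logic used only for illustration.
    if "amount" not in payload:
        raise FatalError("payload has no 'amount' field")
    if payload.get("simulate_timeout"):
        raise RetryableError("upstream service timed out")


def handle_event(payload: dict) -> str:
    try:
        process(payload)
    except FatalError as err:
        # Catch errors that a retry cannot fix: log them and return
        # normally so that the invocation succeeds and is not retried.
        logger.error("Not retrying event: %s", err)
        return "skipped"
    # Other exceptions, including RetryableError, propagate; the invocation
    # fails and, with retries enabled, the event is delivered again.
    return "ok"
```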
Make retryable event-driven functions idempotent
Event-driven functions that can be retried must be idempotent. Here are some general guidelines for making such a function idempotent:
- Many external APIs (such as Stripe) let you supply an idempotency key as a parameter. If you are using such an API, you should use the event ID as the idempotency key.
- Idempotency works well with at-least-once delivery, because it makes it safe to retry. So a general best practice for writing reliable code is to combine idempotency with retries.
- Make sure that your code is internally idempotent. For example:
  - Make sure that mutations can happen more than once without changing the outcome.
  - Query database state in a transaction before mutating the state.
  - Make sure that all side effects are themselves idempotent.
- Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.
- Deal with duplicate function calls out-of-band. For example, have a separate cleanup process that cleans up after duplicate function calls.
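The following Python sketch shows how to record processed event IDs, using the standard sqlite3 module. It makes two assumptions. First, the local SQLite database stands in for a store shared by all function instances. Second, the processed_events and balances tables and the apply_credit function are hypothetical.

The sketch records the event ID and applies the mutation in one transaction. If the mutation fails, the whole transaction rolls back, so a retry can still run. A duplicate delivery of an event that already completed is skipped.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) store and create its tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS balances "
        "(account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def apply_credit(conn: sqlite3.Connection, event_id: str, account: str, amount: int) -> bool:
    """Credit `amount` to `account` at most once per event ID.

    Returns True if the credit was applied, False for a duplicate delivery.
    """
    with conn:  # commits on normal exit, rolls back if an exception escapes
        try:
            # The primary key rejects an event ID that was already recorded.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
        except sqlite3.IntegrityError:
            return False
        conn.execute(
            "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
            (account,),
        )
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (amount, account),
        )
    return True
```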
Configure the retry policy
Depending on the needs of your Cloud Run function, you might want to configure the retry policy directly. Doing so lets you set up any combination of the following:
- Shorten the retry window from 7 days to as little as 10 minutes.
- Change the minimum and maximum backoff time for the exponential backoff retry strategy.
- Change the retry strategy to retry immediately.
- Configure a dead-letter topic.
- Set a maximum and minimum number of delivery attempts.
To configure the retry policy:
- Write an HTTP function.
- Use the Pub/Sub API to create a Pub/Sub subscription, specifying the URL of the function as the target.
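One way to create the subscription is the gcloud pubsub subscriptions create command, which calls the Pub/Sub API. The following Python sketch assembles that command for the options listed above. The subscription, topic, and endpoint values are placeholders. The range checks reflect Pub/Sub limits as understood here; confirm them against the Pub/Sub documentation:
- retry delays up to 600 seconds
- retention between 10 minutes and 7 days
- 5 to 100 delivery attempts

```python
from datetime import timedelta
from typing import List, Optional

MAX_RETRY_DELAY = timedelta(seconds=600)
MIN_RETENTION = timedelta(minutes=10)
MAX_RETENTION = timedelta(days=7)


def _as_seconds(d: timedelta) -> str:
    # gcloud accepts durations written as a number of seconds, e.g. "600s".
    return f"{int(d.total_seconds())}s"


def build_create_subscription_cmd(
    subscription: str,
    topic: str,
    push_endpoint: str,
    retention: timedelta = MAX_RETENTION,
    min_retry_delay: Optional[timedelta] = None,
    max_retry_delay: Optional[timedelta] = None,
    dead_letter_topic: Optional[str] = None,
    max_delivery_attempts: int = 5,
) -> List[str]:
    """Return the argv for a push subscription that targets a function URL."""
    if not MIN_RETENTION <= retention <= MAX_RETENTION:
        raise ValueError("retention must be between 10 minutes and 7 days")
    cmd = [
        "gcloud", "pubsub", "subscriptions", "create", subscription,
        f"--topic={topic}",
        f"--push-endpoint={push_endpoint}",
        f"--message-retention-duration={_as_seconds(retention)}",
    ]
    # Leaving both delays unset keeps Pub/Sub's immediate redelivery policy.
    if min_retry_delay is not None or max_retry_delay is not None:
        # Assumed defaults for whichever bound is not given: 10s and 600s.
        low = min_retry_delay if min_retry_delay is not None else timedelta(seconds=10)
        high = max_retry_delay if max_retry_delay is not None else MAX_RETRY_DELAY
        if not timedelta(0) <= low <= high <= MAX_RETRY_DELAY:
            raise ValueError("need 0s <= min retry delay <= max retry delay <= 600s")
        cmd += [f"--min-retry-delay={_as_seconds(low)}",
                f"--max-retry-delay={_as_seconds(high)}"]
    if dead_letter_topic is not None:
        if not 5 <= max_delivery_attempts <= 100:
            raise ValueError("max delivery attempts must be between 5 and 100")
        cmd += [f"--dead-letter-topic={dead_letter_topic}",
                f"--max-delivery-attempts={max_delivery_attempts}"]
    return cmd
```

To run the command, pass the returned list to subprocess.run(cmd, check=True). A dead-letter topic also needs additional IAM permissions for the Pub/Sub service agent, as described in the Pub/Sub documentation.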
See the Pub/Sub documentation on handling failures for more information on configuring Pub/Sub directly.
Next steps
- Deploy Cloud Run functions
- Call Pub/Sub Trigger Functions
- Call Cloud Storage Trigger Functions
- Cloud Run functions with Pub/Sub Tutorial
- Cloud Run functions with Cloud Storage Tutorial