This guide describes how to troubleshoot configuration issues for external Application Load Balancers. Before investigating issues, familiarize yourself with the following pages:
- External Application Load Balancer overview
- Global and Classic Application Load Balancer logging and monitoring
- Regional external Application Load Balancer logging and monitoring
Troubleshoot common issues with Network Analyzer
Network Analyzer automatically monitors your VPC network configuration and detects both suboptimal configurations and misconfigurations. It identifies network failures, provides root cause information, and suggests possible resolutions. To learn about the different misconfiguration scenarios that are automatically detected by Network Analyzer, see Load balancer insights in the Network Analyzer documentation.
Network Analyzer is available in the Google Cloud console as a part of Network Intelligence Center.
Go to Network Analyzer
Backends have incompatible balancing modes
When creating a load balancer, you might see the error:
Validation failed for instance group INSTANCE_GROUP: backend services 1 and 2 point to the same instance group but the backends have incompatible balancing_mode. Values should be the same.
This happens when you try to use the same backend in two different load balancers, and the backends don't have compatible balancing modes.
For more information, see the following:
Troubleshoot general connectivity issues
Unexplained 5XX errors
For error conditions caused by a communications issue between the load balancer proxy and its backends, the load balancer generates an HTTP error response code (5XX) and returns that error response code to the client. Not all HTTP 5XX errors are generated by the load balancer. For example, if a backend sends an HTTP 5XX response to the load balancer, the load balancer relays that response to its client. To determine if an HTTP 5XX response was relayed from a backend or if it was generated by the load balancer proxy, refer to the statusDetails field of the load balancer logs.
If statusDetails returns the log string response_sent_by_backend, the load balancer is merely relaying whatever response code the backend sent to it, in which case, you need to troubleshoot HTTP error responses on your backends.
For HTTP error responses with statusDetails not matching the log string response_sent_by_backend:
- The global external Application Load Balancer and the regional external Application Load Balancer generate meaningful HTTP response error codes like 503 (Service Unavailable) and 504 (Gateway Timeout).
- The classic Application Load Balancer always uses the HTTP response error code 502.
Configuration changes to the global external Application Load Balancer, such as addition or removal of a backend service, can result in a brief period of time where users see the HTTP response error code 502. While these configuration changes propagate to GFEs globally, you'll see log entries where the statusDetails field matches the failed_to_pick_backend log string.
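The distinction between relayed and load-balancer-generated 5XX responses can be sketched as a small check over exported log entries. This is a minimal sketch, assuming each entry is a dictionary with the status code under httpRequest.status and the status string under jsonPayload.statusDetails; the function name classify_5xx and its return labels are hypothetical.

```python
from typing import Any, Mapping

# Log string that marks a response relayed from the backend.
BACKEND_MARKER = "response_sent_by_backend"


def classify_5xx(entry: Mapping[str, Any]) -> str:
    """Return 'not_5xx', 'backend', or 'load_balancer' for one log entry.

    Assumes the entry shape described in the lead-in; adjust the keys if
    your exported logs are structured differently.
    """
    status = int(entry.get("httpRequest", {}).get("status", 0))
    if not 500 <= status <= 599:
        return "not_5xx"
    details = entry.get("jsonPayload", {}).get("statusDetails", "")
    # Any other statusDetails value means the proxy generated the error.
    return "backend" if details == BACKEND_MARKER else "load_balancer"


if __name__ == "__main__":
    sample = {
        "httpRequest": {"status": 502},
        "jsonPayload": {"statusDetails": "failed_to_pick_backend"},
    }
    print(classify_5xx(sample))  # load_balancer
```

Entries classified as backend point to the backend's own logs; entries classified as load_balancer point to the load balancer configuration and health checks.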
If HTTP 5XX errors persist longer than a few minutes after you complete the load balancer configuration, take the following steps to troubleshoot HTTP 5XX responses:
- Verify that there is a firewall rule configured to allow health checks. In the absence of one, load balancer logs typically have a statusDetails matching failed_to_pick_backend, which indicates that the load balancer failed to pick a healthy backend to handle the request.
- Verify that health check traffic reaches your backend VMs. To do this, enable health check logging and search for successful log entries.
  For new load balancers, the lack of successful health check log entries does not mean that health check traffic is not reaching your backends. It might mean that the backend's initial health state has not yet changed from UNHEALTHY to a different state. You see successful health check log entries only after the health check prober receives an HTTP 200 OK response from the backend.
- Verify that the software on the backends is running. To do this, check whether the 5xx response is being served by the load balancer or if it is generated from the backends. Perform the following steps:
  - Use Cloud Logging to view logs for the load balancer. You can create a query to look for 5xx response codes only.
  - Check the value of the statusDetails field:
    - If statusDetails returns a success message, such as response_sent_by_backend, then it is the backend that is serving HTTP 502 responses. Check logs on the backend and troubleshoot further depending on the service running on the backend.
    - If statusDetails returns a failure message, refer to the following list of solutions for some common failures related to 5xx responses:

  statusDetails failure message: potential causes and solutions

  failed_to_connect_to_backend
  The load balancer failed to establish a connection with the backend. This could mean that the service running on the backend is not listening on the port defined in the backend service.
  Recommendations:
  - Set the health check's port to use the serving port. This means that the backend will be found unhealthy before it is eligible to serve real traffic.
  - Use the following command to make sure that there is a service running on the backend service's named port:
    $ netstat -tnl | grep PORT

  failed_to_pick_backend
  The load balancer could not pick a backend. This could mean that all backends are unhealthy. Make sure that you configured the correct firewall rules for the health checks.

  backend_connection_closed_before_data_sent_to_client
  The backend unexpectedly closed its connection to the load balancer before the response was proxied to the client. This can happen if the load balancer is sending traffic to another entity. The other entity might be a third-party load balancer that has a TCP timeout that is shorter than the load balancer's timeout. For more details, see Timeouts and retries.

  backend_timeout
  The backend took too long to respond. The backend service timeout might be set too low for the given service to respond. Consider increasing the backend service timeout or look into why your service is taking so long to respond.

- Verify that the keepalive configuration parameter for the HTTP server software running on the backend instance is not less than the keepalive timeout of the load balancer, whose value is fixed at 10 minutes (600 seconds) and is not configurable.
  The load balancer generates an HTTP 5XX response code when the connection to the backend has unexpectedly closed while sending the HTTP request or before the complete HTTP response has been received. This can happen because the keepalive configuration parameter for the web server software running on the backend instance is less than the fixed keepalive timeout of the load balancer. Ensure that the keepalive timeout configuration for HTTP server software on each backend is set to slightly greater than 10 minutes (the recommended value is 620 seconds).
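The failure messages and the keepalive rule above can be condensed into a small lookup for use while reading logs. This is an illustrative sketch only: the names HINTS, triage, and keepalive_is_sufficient are hypothetical, and the hint texts paraphrase the guidance above rather than quoting any API.

```python
from typing import Dict

# Hints paraphrased from the failure-message list above.
HINTS: Dict[str, str] = {
    "failed_to_connect_to_backend":
        "Check that a service is listening on the backend service's named port.",
    "failed_to_pick_backend":
        "Check backend health and the firewall rules for health checks.",
    "backend_connection_closed_before_data_sent_to_client":
        "Check for an intermediary with a shorter TCP timeout.",
    "backend_timeout":
        "Consider increasing the backend service timeout or investigate slow responses.",
}

# Fixed load balancer keepalive timeout and the recommended backend value.
LB_KEEPALIVE_SECONDS = 600
RECOMMENDED_BACKEND_KEEPALIVE_SECONDS = 620


def triage(status_details: str) -> str:
    """Map a statusDetails string to a short troubleshooting hint."""
    if status_details == "response_sent_by_backend":
        return "The backend sent this response; check the backend's logs."
    return HINTS.get(status_details, "No hint recorded for this statusDetails value.")


def keepalive_is_sufficient(backend_keepalive_seconds: int) -> bool:
    """True if the backend keepalive is not less than the load balancer's."""
    return backend_keepalive_seconds >= LB_KEEPALIVE_SECONDS


if __name__ == "__main__":
    print(triage("backend_timeout"))
    print(keepalive_is_sufficient(RECOMMENDED_BACKEND_KEEPALIVE_SECONDS))  # True
```

A backend keepalive of, say, 75 seconds fails the check, which matches the situation where the backend closes idle connections that the load balancer still considers reusable.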
Resolving HTTP 408 errors
With HTTP traffic, the maximum amount of time for the client to complete sending its request is equal to the backend service timeout. If you see HTTP 408 responses with the jsonPayload.statusDetails value client_timed_out, this means that there was insufficient progress while the request from the client was proxied or the response from the backend was proxied. If the problem is because of clients that are experiencing performance issues, you can resolve this issue by increasing the backend service timeout.
Load balanced traffic does not have the source address of the original client
The source IP address for packets, as seen by the backends, is not the external IP address of the load balancer. Proxy-based load balancers such as the external Application Load Balancers use two TCP connections to transmit traffic from the client to the backends:
- Connection 1, from original client to the load balancer (GFE or proxy-only subnet)
- Connection 2, from the load balancer (GFE or proxy-only subnet) to the backend VM or endpoint
The source and destination IP addresses for each connection differ based on the type of external Application Load Balancer you're using. For details, see Source IP addresses for client packets.
Getting a permission error when trying to view an object in my Cloud Storage bucket
In order to serve objects through load balancing, the Cloud Storage objects must be publicly accessible. Make sure to update the permissions of the objects being served so they are publicly readable.
URL doesn't serve expected Cloud Storage object
The Cloud Storage object to serve is determined based on your URL map and the URL that you request. If the request path maps to a backend bucket in your URL map, the Cloud Storage object is determined by appending the full request path onto the Cloud Storage bucket that the URL map specifies.
For example, if you map /static/* to gs://[EXAMPLE_BUCKET], the request to https://<GCLB IP or Host>/static/path/to/content.jpg will try to serve gs://[EXAMPLE_BUCKET]/static/path/to/content.jpg. If that object doesn't exist, you will get the following error message instead of the object:
NoSuchKey
The specified key does not exist.
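The path-appending rule can be sketched in a few lines, which helps when checking which object name a given URL resolves to. This is a minimal sketch, assuming no URL rewrite is configured in the URL map; the function name object_for_request, the host lb.example.com, and the bucket name example-bucket are placeholders.

```python
from urllib.parse import urlsplit


def object_for_request(url: str, bucket: str) -> str:
    """Return the gs:// object that a request URL maps to.

    The full request path (including the matched /static/ prefix) is
    appended to the bucket; the query string is ignored.
    """
    path = urlsplit(url).path
    return f"gs://{bucket}{path}"


if __name__ == "__main__":
    print(object_for_request(
        "https://lb.example.com/static/path/to/content.jpg",
        "example-bucket",
    ))
    # gs://example-bucket/static/path/to/content.jpg
```

If the printed object name does not exist in the bucket, a request for that URL produces the NoSuchKey error shown above.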
Compression isn't working
An external Application Load Balancer does not compress or decompress responses itself, but it can serve responses generated by your backend service that are compressed by using tools such as gzip or DEFLATE.
If responses served by the load balancer are not compressed but should be, check to be sure that the web server software running on your instances is configured to compress responses. By default, some web server software automatically disables compression for requests that include a Via header, which indicates that the request was forwarded by a proxy. Because it is a proxy, the external Application Load Balancer adds a Via header to each request as required by the HTTP specification.
To enable compression, you may have to override your web server's default configuration to tell it to compress responses even if the request had a Via header.
To configure nginx backends to serve compressed responses proxied through an external Application Load Balancer:
- Set the gzip_proxied directive appropriately (for example, to any), and
- Set the gzip_vary directive to on.
To configure Apache backends to serve compressed responses proxied through an external Application Load Balancer:
- Use the DEFLATE filter, and
- Add Vary Accept-Encoding to the response header using the mod_headers module.
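After changing the web server configuration, one way to confirm the result is to inspect the response headers of a request sent through the load balancer. The following is a rough sketch, assuming you have already captured the response headers as a dictionary; the function name compression_report is hypothetical, and only the gzip and DEFLATE encodings mentioned above are checked.

```python
from typing import Dict, Mapping

# Encodings discussed in this section; extend if your backend uses others.
COMPRESSED_ENCODINGS = {"gzip", "deflate"}


def compression_report(headers: Mapping[str, str]) -> Dict[str, bool]:
    """Report whether a response looks compressed and varies on Accept-Encoding."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "").strip().lower()
    vary_tokens = [t.strip().lower() for t in lowered.get("vary", "").split(",")]
    return {
        "compressed": encoding in COMPRESSED_ENCODINGS,
        "varies_on_accept_encoding": "accept-encoding" in vary_tokens,
    }


if __name__ == "__main__":
    print(compression_report({"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}))
    # {'compressed': True, 'varies_on_accept_encoding': True}
```

The request that produced the headers must itself advertise compression support (for example, Accept-Encoding: gzip), otherwise an uncompressed response is expected.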
Troubleshoot unhealthy backends
Troubleshoot issues with HTTP/2 to the backends
Make sure that your backend instance is healthy and supports HTTP/2 protocol. You can verify this by testing connectivity to the backend instance using HTTP/2. Ensure that the VM uses HTTP/2 spec-compliant cipher suites. For example, certain TLS 1.2 cipher suites are disallowed by HTTP/2. Refer to the TLS 1.2 Cipher Suite Black List.
After you verify that the VM uses the HTTP/2 protocol, make sure your firewall setup allows the health checker and load balancer to pass through.
If there are no problems with the firewall setup, ensure that the load balancer is configured to talk to the correct port on the VM.
Troubleshoot external backend and internet NEG issues
Before investigating issues, familiarize yourself with the following pages:
- Internet NEGs overview
- Set up a global external Application Load Balancer with an external backend (internet NEG)
- Set up a regional external Application Load Balancer with an external backend (internet NEG)
Traffic does not reach the endpoints
After you configure a service, the new endpoint becomes reachable through the external Application Load Balancer when:
- The endpoint is attached to the internet NEG.
- The associated FQDN can be DNS resolved successfully (if you are using FQDN endpoint type).
- The endpoint is accessible over the internet.
If traffic cannot reach the endpoint, which results in a 502 error code, query the _cloud-eoips.googleusercontent.com DNS TXT record using a tool like dig or nslookup. Note the CIDRs (following ip4:) and ensure these ranges are allowed by your firewall or cloud access control list (ACL).
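Once you have the TXT record text from dig or nslookup, the ip4: ranges can be pulled out programmatically so they can be compared against a firewall allowlist. This is a minimal sketch; the function name allowed_ip4_ranges is hypothetical, and the sample record uses documentation-only address ranges, not the ranges that the real record returns.

```python
import ipaddress
from typing import List


def allowed_ip4_ranges(txt_record: str) -> List[str]:
    """Extract the CIDR ranges that follow 'ip4:' in a TXT record string."""
    ranges = []
    # dig prints TXT data in double quotes; treat quotes as whitespace.
    for token in txt_record.replace('"', " ").split():
        if token.startswith("ip4:"):
            # ip_network raises ValueError if the value is not a valid CIDR.
            network = ipaddress.ip_network(token[len("ip4:"):], strict=False)
            ranges.append(str(network))
    return ranges


if __name__ == "__main__":
    # Placeholder record text; substitute the actual output of your DNS query.
    sample = '"v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 ~all"'
    print(allowed_ip4_ranges(sample))
    # ['192.0.2.0/24', '198.51.100.0/24']
```

Because the published ranges can change, it is safer to re-run the DNS query when updating the allowlist than to hard-code the values.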
After configuring an external backend, requests to the external backend fail with a 5xx error
- Check Logging.
- Verify that the network endpoint group is configured with the correct IP:Port or FQDN:Port for your external backend.
- If you are using FQDN, make sure that it is resolvable through Google Public DNS. You can verify that the FQDN is resolvable through Google Public DNS using these steps or the web interface directly.
- If you are accessing the load balancer on its external IP only, and your origin web-server is expecting a hostname, ensure that you are sending a valid HTTP Host header to your backend by configuring a custom request header.
- If you're communicating with a backend over HTTPS or HTTP2 (as set in the protocol field of the backend service) configured as an INTERNET_FQDN_PORT external backend endpoint, ensure that your origin is presenting a valid TLS (SSL) certificate and the configured FQDN matches a SAN (Subject Alternative Name) in the certificate's list of SANs. A valid certificate is defined as one signed by a public Certificate Authority and that has not expired.
- When using INTERNET_FQDN_PORT external backend endpoints, self-signed certificates are rejected by the load balancer.
- When using HTTPS or HTTP/2 with INTERNET_IP_PORT type endpoints, no SSL certificate validation or SAN check is performed. This means that you can use self-signed certificates. When using SSL, our recommendation is to use INTERNET_FQDN_PORT endpoints to make sure server certificates and SANs can be validated.
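The certificate rules above reduce to a small decision: validation applies only when the backend protocol uses TLS and the endpoint type is INTERNET_FQDN_PORT. The sketch below encodes that reading of the list; the function name certificate_is_validated is hypothetical, and the protocol values are assumed to be the strings used in the backend service's protocol field.

```python
# Backend service protocol values that use TLS to the backend.
TLS_PROTOCOLS = {"HTTPS", "HTTP2"}


def certificate_is_validated(endpoint_type: str, protocol: str) -> bool:
    """Return True if the load balancer validates the origin certificate and SAN.

    Per the list above: validated for INTERNET_FQDN_PORT over HTTPS or
    HTTP2, not validated for INTERNET_IP_PORT, and not applicable to HTTP.
    """
    return protocol in TLS_PROTOCOLS and endpoint_type == "INTERNET_FQDN_PORT"


if __name__ == "__main__":
    print(certificate_is_validated("INTERNET_FQDN_PORT", "HTTPS"))  # True
    print(certificate_is_validated("INTERNET_IP_PORT", "HTTPS"))    # False
```

When the function returns True, a self-signed or expired origin certificate is a likely cause of 5xx responses.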
Responses from my external backend are not cached by Cloud CDN
Ensure that:
- You have enabled Cloud CDN on the backend service containing the NEG that points to your external backend by setting enableCDN to true.
- Responses served by your external backend meet Cloud CDN caching requirements. For example, you are sending Cache-Control: public, max-age=3600 response headers from the origin.
Troubleshoot serverless NEG issues
Before investigating issues, familiarize yourself with the following pages:
Requests fail with a 404 error
Ensure that the underlying serverless resource (such as an App Engine, Cloud Run functions, or Cloud Run service) is still running. If the serverless resource is deleted but the serverless NEG still exists, the external Application Load Balancer continues to attempt to route requests to the nonexistent service. This results in a 404 response.
In general, an external Application Load Balancer cannot detect if the underlying serverless resource is working as expected. This means that if your service in one region is returning errors but the overall Cloud Run, Cloud Run functions, or App Engine infrastructure in that region is operating normally, your external Application Load Balancer will not automatically direct traffic away to other regions. Make sure you thoroughly test new versions of your services before routing user traffic to them.
Handling URL mask mismatches
If applying the configured URL mask to a user request URL doesn't result in a service name, or if it results in a service name that does not exist, the load balancer might handle these mismatches differently depending on the serverless compute platform in use.
- Cloud Run: In case of a URL mask mismatch, the load balancer returns an HTTP error 404 (Not Found).
- Cloud Run functions: In case of a URL mask mismatch, the load balancer returns an HTTP error 404 (Not Found).
- App Engine: In case of a URL mask mismatch, App Engine uses dispatch.yaml and App Engine's default routing logic to determine which service to send the request to.