Cassandra Java heap space issues

Symptom

Cassandra heap issues can slow down API proxy execution in Apigee hybrid or even cause Datastore errors. Warnings in the Cassandra logs are often an early indicator and can appear before any symptoms become noticeable.

Error Message

In the Cassandra pod logs (available in Cloud Logging), you might see entries similar to the following:

WARN  [Service Thread] 2023-01-01 01:14:51,121 GCInspector.java:283 - G1 Young Generation GC in 2510ms...
...
WARN  [Service Thread] 2023-01-01 01:14:51,121 GCInspector.java:283 - G1 Old Generation GC in 3100ms...
2023-01-01 01:14:51,123 FailureDetector.java:278 - Not marking nodes down due to local pause of 45261214670 > 5000000000
java.lang.OutOfMemoryError: Java heap space
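
To spot long garbage collection pauses like the ones above without reading the whole log, the GCInspector entries can be filtered by duration. The following is a minimal sketch, not an official tool: the function name long_gc_pauses and the default threshold of 1000 ms are assumptions chosen for illustration, and the pod name in the usage comment is a placeholder.

```shell
# Hypothetical helper: read Cassandra log lines on stdin and print the
# GCInspector entries whose reported pause is at least min_ms milliseconds.
long_gc_pauses() {
  min_ms="${1:-1000}"   # assumed default threshold, adjust as needed
  awk -v min="$min_ms" '
    /GCInspector/ && match($0, /GC in [0-9]+ms/) {
      # "GC in " is 6 characters and "ms" is 2, so the digits sit in between.
      ms = substr($0, RSTART + 6, RLENGTH - 8)
      if (ms + 0 >= min) print
    }'
}

# Example usage (replace the placeholder with one of your Cassandra pod names):
# kubectl -n apigee logs <cassandra-pod-name> | long_gc_pauses 2000
```

Frequent entries above the chosen threshold, especially for the Old Generation, suggest that the heap is under pressure.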

Cause: Insufficient Java heap size

Diagnosis

In the overrides.yaml file for the Apigee hybrid installation, maxHeapSize is not set to a sufficient value. This can happen when the production settings have not been applied, or when the current throughput requires a higher setting than usual.
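
One way to see how close a node is to its configured limit is to compare used and maximum heap as reported by nodetool info. The sketch below assumes the usual "Heap Memory (MB) : used / max" line in that output; the function name heap_usage_pct is hypothetical and the pod name in the usage comment is a placeholder.

```shell
# Hypothetical helper: read `nodetool info` output on stdin and print the
# heap usage as a whole-number percentage of the maximum heap.
heap_usage_pct() {
  awk -F: '/^Heap Memory [(]MB[)]/ {
    # The value part looks like " 6144.00 / 8192.00" (used / max).
    split($2, parts, "/")
    used = parts[1] + 0
    max = parts[2] + 0
    if (max > 0) printf "%.0f\n", used * 100 / max
  }'
}

# Example usage (replace the placeholder with one of your Cassandra pod names):
# kubectl -n apigee exec <cassandra-pod-name> -- bash -c \
#   'nodetool -u $APIGEE_JMX_USER -pw $APIGEE_JMX_PASSWORD info' | heap_usage_pct
```

A single reading is only a snapshot, because heap usage rises and falls with each garbage collection cycle; it is most useful when sampled several times under load.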

Resolution

To resolve the issue, adjust maxHeapSize and memory accordingly, and apply the changes:

cassandra:
  resources:
    requests:
      cpu: 7
      memory: 15Gi
  maxHeapSize: 8192M
  heapNewSize: 1200M

If the issue persists with the default production configuration, increase the values further. Also make sure that node capacity, disk throughput, and network bandwidth are sufficient.

cassandra:
  resources:
    requests:
      cpu: 7
      memory: 30Gi
  maxHeapSize: 16384M
  heapNewSize: 2400M

If the 16 GB heap setting is still not sufficient for the traffic volume, scale Cassandra horizontally.

Must gather diagnostic information

If the problem persists even after following the above instructions, gather the following diagnostic information and then contact Google Cloud Customer Care.

In addition to the usual data you might be asked to provide, collect diagnostic data from all Cassandra pods with the following command:

# Run a set of nodetool commands on every Cassandra pod and save each output under /tmp.
for p in $(kubectl -n apigee get pods -l app=apigee-cassandra --no-headers -o custom-columns=":metadata.name") ; do
  for com in info describecluster failuredetector version status ring gossipinfo compactionstats tpstats netstats cfstats proxyhistograms gcstats ; do
    kubectl -n apigee exec "${p}" -- bash -c 'nodetool -u $APIGEE_JMX_USER -pw $APIGEE_JMX_PASSWORD '"$com"' 2>&1' \
      | tee "/tmp/k_cassandra_nodetool_${com}_${p}_$(date +%Y.%m.%d_%H.%M.%S).txt" | head -n 40 ; echo '...'
  done
done

Compress the collected files and attach the archive to the support case:

tar -cvzf /tmp/cassandra_data_$(date +%Y.%m.%d_%H.%M.%S).tar.gz /tmp/k_cassandra_nodetool*
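
Before attaching the archive, it can be worth checking that it contains the expected nodetool output files. The following is a minimal sketch; the function name archive_file_count is hypothetical, and the archive path in the usage comment is a placeholder for the file created by the tar command.

```shell
# Hypothetical helper: count the nodetool output files stored in a
# gzip-compressed tar archive given as the first argument.
archive_file_count() {
  tar -tzf "$1" | grep -c 'k_cassandra_nodetool_'
}

# Example usage (substitute the archive name that tar created):
# archive_file_count /tmp/cassandra_data_<timestamp>.tar.gz
```

The count should equal the number of Cassandra pods multiplied by the number of nodetool commands in the collection loop; a lower number indicates that some commands failed to write output.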