This document describes how to reset Apigee hybrid components when they are stuck in a
  creating or releasing state.
Run the following command to list the Apigee hybrid installation main components:
kubectl get crd | grep apigee
apigeeorganization (apigeeorganizations.apigee.cloud.google.com)
apigeeenvironment (apigeeenvironments.apigee.cloud.google.com)
apigeedatastore (apigeedatastores.apigee.cloud.google.com)
apigeetelemetries (apigeetelemetries.apigee.cloud.google.com)
apigeeredis (apigeeredis.apigee.cloud.google.com)
Run the following command to display the current state:
kubectl get apigeedatastore -n NAMESPACE
  When fully functional, each of these components will be in a running state.
  For example:
NAME      STATE     AGE
default   running   5d6h
  If the installation is not successful, components may be stuck in a creating (or
  releasing) state. For example:
NAME      STATE      AGE
default   creating   5d6h
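When several components need checking, it can help to filter the listing down to the ones that are not running. The following is a minimal sketch, not part of the Apigee tooling: `not_running` is a hypothetical helper name, and it assumes the output has the NAME, STATE, and AGE columns shown above.

```shell
# Print the NAME of every row whose STATE column is not "running".
# Reads `kubectl get <kind> -n NAMESPACE` table output on stdin.
# Assumes the column order NAME STATE AGE shown in the examples above.
not_running() {
  awk 'NR > 1 && $2 != "running" { print $1 }'
}

# Hypothetical usage against a live cluster (NAMESPACE is a placeholder):
#   kubectl get apigeedatastore -n NAMESPACE | not_running

# Demonstration using the sample output from this document:
printf 'NAME STATE AGE\ndefault creating 5d6h\n' | not_running
```

An empty result means every listed component reports the running state.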
Identify the problem
To identify the cause of the issue, begin by describing each component. The components are structured as follows:
Each ApigeeOrganization custom resource is represented by the following hierarchy:
ApigeeOrganization/HASHED_VALUE
├─ApigeeDeployment/apigee-connect-agent-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-connect-agent-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-connect-agent-HASHED_VALUE
│ ├─ReplicaSet/apigee-connect-agent-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-connect-agent-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-mart-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-mart-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-mart-HASHED_VALUE
│ ├─ReplicaSet/apigee-mart-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-mart-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-watcher-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-watcher-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-watcher-HASHED_VALUE
│ ├─ReplicaSet/apigee-watcher-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-watcher-HASHED_VALUE-VER-xxxx
Each ApigeeEnvironment custom resource is represented by the following hierarchy:
ApigeeEnvironment/HASHED_VALUE
├─ApigeeDeployment/apigee-runtime-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-runtime-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-runtime-HASHED_VALUE
│ ├─ReplicaSet/apigee-runtime-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-runtime-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-synchronizer-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-synchronizer-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-synchronizer-HASHED_VALUE
│ ├─ReplicaSet/apigee-synchronizer-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-synchronizer-HASHED_VALUE-VER-xxxx
├─ApigeeDeployment/apigee-udca-HASHED_VALUE
│ ├─HorizontalPodAutoscaler/apigee-udca-HASHED_VALUE-VER-xxxx
│ ├─PodDisruptionBudget/apigee-udca-HASHED_VALUE
│ ├─ReplicaSet/apigee-udca-HASHED_VALUE-VER-xxxx
│ │ └─Pod/apigee-udca-HASHED_VALUE-VER-xxxx
Begin problem identification by describing the root component. For example:
kubectl describe apigeeorganization -n NAMESPACE COMPONENT_NAME
Check whether the State of the component is running:
      Replicas:
        Available:  1
        Ready:      1
        Total:      1
        Updated:    1
      State:        running
  State:            running
Events:             <none>
  If there are no events logged at this level, repeat the process with
  apigeedeployments followed by ReplicaSet. For example:
kubectl get apigeedeployment -n NAMESPACE AD_NAME
  If apigeedeployments and ReplicaSet do not show any errors, focus
  on the pods that are not ready:
kubectl get pods -n NAMESPACE
NAME                                                     READY   STATUS
apigee-cassandra-default-0                               1/1     Running
apigee-connect-agent-apigee-b56a362-150rc2-42gax-dbrrn   1/1     Running
apigee-logger-apigee-telemetry-s48kb                     1/1     Running
apigee-mart-apigee-b56a362-150rc2-bcizm-7jv6w            0/2     Running
apigee-runtime-apigee-test-0d59273-150rc2-a5mov-dfb29    0/1     Running
  In this example, mart and runtime are not ready. Inspect the pod logs
  to determine errors:
kubectl logs -n NAMESPACE POD_NAME
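In a namespace with many pods, the ones that are not ready can be picked out of the pod listing mechanically. The sketch below is an illustration only: `not_ready_pods` is a hypothetical helper name, and it assumes READY is the second column and has the form ready/total, as in the listing above.

```shell
# Print the name of every pod whose READY column (ready/total) shows
# fewer ready containers than the total.
# Reads `kubectl get pods -n NAMESPACE` table output on stdin.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Hypothetical usage against a live cluster (NAMESPACE is a placeholder):
#   kubectl get pods -n NAMESPACE | not_ready_pods

# Demonstration using rows from the sample listing in this document:
not_ready_pods <<'EOF'
NAME READY STATUS
apigee-cassandra-default-0 1/1 Running
apigee-mart-apigee-b56a362-150rc2-bcizm-7jv6w 0/2 Running
apigee-runtime-apigee-test-0d59273-150rc2-a5mov-dfb29 0/1 Running
EOF
```

Each name printed can then be passed to kubectl logs as POD_NAME.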
Deleting components
If you've made a mistake with any of these components, delete the component and recreate the environment using Helm:
kubectl delete -n apigee apigeeenv HASHED_ENV_NAME
After making the necessary corrections, recreate the environment. Start with a dry run:
helm upgrade ENV_NAME apigee-env/ \
  --install \
  --namespace APIGEE_NAMESPACE \
  --set env=ENV_NAME \
  --atomic \
  -f OVERRIDES_FILE \
  --dry-run=server
Make sure to include all of the settings shown, including --atomic
  so that the action rolls back on failure.
Install the chart:
helm upgrade ENV_NAME apigee-env/ \
  --install \
  --namespace APIGEE_NAMESPACE \
  --set env=ENV_NAME \
  --atomic \
  -f OVERRIDES_FILE
Inspect the controller
  If there are no obvious error messages in the pod, but the component has not transitioned to the
  running state, inspect the apigee-controller for error messages.
kubectl logs -n NAMESPACE $(kubectl get pods -n NAMESPACE | sed -n '2p' | awk '{print $1}') | grep -i error
  The controller logs show why the controller was unable to process the request
  (create, delete, update, and so on).
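The command above takes whichever pod appears first in the listing, which is only the controller if the namespace contains nothing else ahead of it. A slightly more defensive sketch selects pods by name instead. The helper name `controller_pods` is hypothetical, and the assumption that the controller pod name contains "apigee-controller" should be checked against your own pod listing.

```shell
# Print the names of pods whose name contains "apigee-controller".
# Reads `kubectl get pods -n NAMESPACE` table output on stdin.
# Assumption: the controller pod name contains "apigee-controller".
controller_pods() {
  awk 'NR > 1 && $1 ~ /apigee-controller/ { print $1 }'
}

# Hypothetical usage against a live cluster (NAMESPACE is a placeholder):
#   for pod in $(kubectl get pods -n NAMESPACE | controller_pods); do
#     kubectl logs -n NAMESPACE "$pod" | grep -i error
#   done

# Demonstration with made-up pod names:
printf 'NAME READY STATUS\nexample-other-pod 1/1 Running\napigee-controller-example 1/1 Running\n' | controller_pods
```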
Apigee datastore
Apache Cassandra is implemented as a StatefulSet. Each Cassandra instance contains:
ApigeeDatastore/default
├─Certificate/apigee-cassandra-default
│ └─CertificateRequest/apigee-cassandra-default-wnd7s
├─Secret/config-cassandra-default
├─Service/apigee-cassandra-default
│ ├─EndpointSlice/apigee-cassandra-default-7m9kx
│ └─EndpointSlice/apigee-cassandra-default-gzqpr
└─StatefulSet/apigee-cassandra-default
  ├─ControllerRevision/apigee-cassandra-default-6976b77bd
  ├─ControllerRevision/apigee-cassandra-default-7fc76588cb
  └─Pod/apigee-cassandra-default-0
This example shows one pod; however, typical production installs contain three or more pods.
  If the state for Cassandra is creating or releasing, the state MUST be
  reset. Certain problems (like Cassandra password changes) and problems not related to networking
  may require that you delete components. It is quite possible that in such cases, you cannot delete
  the instance (i.e., kubectl delete apigeedatastore -n NAMESPACE default). Using
  --force or --grace-period=0 also does not help.
  The objective of reset is to change the state of the component
  (apigeedatastore) from creating or releasing back to
  running. Changing the state in this way typically will not solve the
  underlying problem. In most cases, the component should be deleted after a reset.
- Attempt a delete (this won't be successful):
  kubectl delete -n NAMESPACE apigeedatastore default
  It is common for this command to not complete. Use Ctrl+C and terminate the call.
- Reset the state:
  On Window 1:
  kubectl proxy
  On Window 2:
  curl -X PATCH -H "Accept: application/json" -H "Content-Type: application/json-patch+json" --data '[{"op": "replace", "path": "/status/nestedState", "value": ""},{"op": "replace", "path": "/status/state", "value": "running"}]' 'http://127.0.0.1:8001/apis/apigee.cloud.google.com/v1alpha1/namespaces/apigee/apigeedatastores/default/status'
- Remove the finalizer (Window 2):
  kubectl edit -n NAMESPACE apigeedatastore default
  Look for the following two lines and delete them:
  finalizers:
  - apigeedatastore.apigee.cloud.google.com
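The PATCH URL in the reset step has the namespace (apigee) and the datastore name (default) written into it, so it needs adjusting if your installation uses different values. The sketch below builds the same URL from arguments. It is an illustration under assumptions: `datastore_status_url` and `RESET_PATCH` are hypothetical names, and the host and port are the kubectl proxy address used in the step above.

```shell
# Build the status URL used by the reset PATCH call.
# Arguments: namespace, then datastore name (defaults to "default").
# Assumes kubectl proxy is listening on 127.0.0.1:8001 as in the step above.
datastore_status_url() {
  ns="$1"
  name="${2:-default}"
  printf 'http://127.0.0.1:8001/apis/apigee.cloud.google.com/v1alpha1/namespaces/%s/apigeedatastores/%s/status\n' "$ns" "$name"
}

# The JSON Patch body from the reset step, kept in one variable.
RESET_PATCH='[{"op": "replace", "path": "/status/nestedState", "value": ""},{"op": "replace", "path": "/status/state", "value": "running"}]'

# With kubectl proxy running in another window (NAMESPACE is a placeholder):
#   curl -X PATCH -H "Accept: application/json" \
#     -H "Content-Type: application/json-patch+json" \
#     --data "$RESET_PATCH" "$(datastore_status_url NAMESPACE)"

# Demonstration: reproduces the URL shown in the reset step.
datastore_status_url apigee
```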
Common error scenarios
Proxy configuration not available with runtime
This error can manifest in one of two ways:
- The runtime is not in the ready state.
- The runtime has not received the latest version of the API.
- Start with the synchronizer pods.
  Inspect the logs for the synchronizer. Common errors are as follows:
  - Lack of network connectivity (to *.googleapis.com)
  - Incorrect IAM access (service account not available or not granted the Synchronizer Manager permission)
  - The setSyncAuthorization API was not invoked
- Inspect the runtime pods.
  Inspecting the logs from the runtime pods will show why the runtime did not load the configuration. The control plane attempts to prevent most configuration mistakes from reaching the data plane. In cases where validation is either impossible or not correctly implemented, the runtime will fail to load the configuration.
"No runtime pods" in the control plane
- Start with the synchronizer pods.
  Inspect the logs for the synchronizer. Common errors are as follows:
  - Lack of network connectivity (to *.googleapis.com)
  - Incorrect IAM access (service account not available or not granted the Synchronizer Manager permission)
  - The setSyncAuthorization API was not invoked. Perhaps the configuration never made it to the data plane.
- Inspect the runtime pods.
  Inspecting the logs from the runtime pods will show why the runtime did not load the configuration.
- Inspect the watcher pods.
  The watcher component configures the ingress (routing) and reports proxy and ingress deployment status to the control plane. Inspect these logs to find out why the watcher is not reporting the status. A common reason is a mismatch between the overrides.yaml file and the control plane in the environment name, the environment group name, or both.
Debug session is not appearing in the control plane
- Start with the synchronizer pods.
  Inspect the logs for the synchronizer. Common errors are as follows:
  - Lack of network connectivity (to *.googleapis.com)
  - Incorrect IAM access (service account not available or not granted the Synchronizer Manager permission)
  - The setSyncAuthorization API was not invoked.
- Inspect the runtime pods.
  Inspecting the logs from the runtime pods will show why the runtime is not sending debug logs to UDCA.
- Inspect the UDCA pods.
  Inspecting the logs from the UDCA pods will show why UDCA is not sending debug session information to the control plane.
Cassandra returning large cache responses
The following warning message indicates that Cassandra is receiving read or write requests with a large payload. It can be safely ignored: the warning threshold is deliberately set to a low value so that large payload sizes are reported.
Batch for [cache_ahg_gap_prod_hybrid.cache_map_keys_descriptor, cache_ahg_gap_prod_hybrid.cache_map_entry] is of size 79.465KiB, exceeding specified threshold of 50.000KiB by 29.465KiB
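To see how far such batches exceed the threshold across a large log, the two sizes can be pulled out of each warning line. This is a minimal sketch that assumes the message format matches the sample above exactly; `batch_sizes` is a hypothetical helper name.

```shell
# For each batch-size warning on stdin, print "<size> <threshold>" in KiB.
# Lines that do not match the warning format are skipped.
batch_sizes() {
  sed -n 's/.* is of size \([0-9.]*\)KiB, exceeding specified threshold of \([0-9.]*\)KiB.*/\1 \2/p'
}

# Hypothetical usage against a live cluster (placeholders as elsewhere):
#   kubectl logs -n NAMESPACE POD_NAME | batch_sizes

# Demonstration using the sample warning from this document:
batch_sizes <<'EOF'
Batch for [cache_ahg_gap_prod_hybrid.cache_map_keys_descriptor, cache_ahg_gap_prod_hybrid.cache_map_entry] is of size 79.465KiB, exceeding specified threshold of 50.000KiB by 29.465KiB
EOF
```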