Configuring dedicated node pools

About node pools

A node pool is a group of nodes within a cluster that all have the same configuration. Typically, you define separate node pools when you have pods with differing resource requirements. For example, the apigee-cassandra pods require persistent storage, while the other Apigee hybrid pods do not.

This topic discusses how to configure dedicated node pools for a hybrid installation.

Using the default nodeSelectors

The best practice is to set up two dedicated node pools: one for the Cassandra pods and one for all the other runtime pods. Using default nodeSelector configurations, the installer will assign the Cassandra pods to a stateful node pool named apigee-data and all the other pods to a stateless node pool named apigee-runtime. All you have to do is create node pools with these names, and Apigee hybrid handles the pod scheduling details for you:

Default node pool name   Description
apigee-data              A stateful node pool.
apigee-runtime           A stateless node pool.

Following is the default nodeSelector configuration. The apigeeData property specifies the node pool for the Cassandra pods, and the apigeeRuntime property specifies the node pool for all the other pods. You can override these default settings in your overrides file, as explained later in this topic:

nodeSelector:
  requiredForScheduling: true
  apigeeRuntime:
    key: "cloud.google.com/gke-nodepool"
    value: "apigee-runtime"
  apigeeData:
    key: "cloud.google.com/gke-nodepool"
    value: "apigee-data"

To ensure your pods are scheduled on the correct nodes, create two node pools with the names apigee-data and apigee-runtime.
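For example, on GKE you can create the two pools with gcloud. The following is a minimal sketch; CLUSTER_NAME, REGION, the machine type, and the node counts are placeholders you should adapt to your own sizing requirements:

# Stateful pool for the Cassandra pods
gcloud container node-pools create apigee-data \
  --cluster=CLUSTER_NAME \
  --region=REGION \
  --machine-type=e2-standard-4 \
  --num-nodes=3

# Stateless pool for all other hybrid runtime pods
gcloud container node-pools create apigee-runtime \
  --cluster=CLUSTER_NAME \
  --region=REGION \
  --machine-type=e2-standard-4 \
  --num-nodes=3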

The requiredForScheduling property

The nodeSelector config section has a property called requiredForScheduling:

nodeSelector:
  requiredForScheduling: false
  apigeeRuntime:
    key: "cloud.google.com/gke-nodepool"
    value: "apigee-runtime"
  apigeeData:
    key: "cloud.google.com/gke-nodepool"
    value: "apigee-data"

If set to false, pods are scheduled whether or not node pools with the required names exist. This means that if you forget to create node pools, or if you accidentally give a node pool a name other than apigee-runtime or apigee-data, the hybrid runtime installation still succeeds and Kubernetes decides where to run your pods.

If you set requiredForScheduling to true (the default), the installation will fail unless there are node pools that match the configured nodeSelector keys and values.
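Before you set requiredForScheduling to true, you can confirm that your worker nodes carry the expected label. A quick check, assuming GKE's automatic cloud.google.com/gke-nodepool label:

kubectl get nodes -L cloud.google.com/gke-nodepool

Each node should show apigee-data or apigee-runtime in the added label column.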

Using custom node pool names

If you don't want to use node pools with the default names, you can create node pools with custom names and specify those names in the nodeSelector stanza. For example, the following configuration assigns the Cassandra pods to the pool named my-cassandra-pool and all other pods to the pool named my-runtime-pool:

nodeSelector:
  requiredForScheduling: false
  apigeeRuntime:
    key: "cloud.google.com/gke-nodepool"
    value: "my-runtime-pool"
  apigeeData:
    key: "cloud.google.com/gke-nodepool"
    value: "my-cassandra-pool"

Overriding the node pool for specific components on GKE

You can also override the node pool configuration at the individual component level. For example, the following configuration assigns the runtime component to the node pool labeled apigee-custom:

runtime:
  nodeSelector:
    key: cloud.google.com/gke-nodepool
    value: apigee-custom

You can specify a custom node pool on any of these components:

  • istio
  • mart
  • synchronizer
  • runtime
  • cassandra
  • udca
  • logger
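After applying a component-level override, one way to confirm the result is to check which nodes the pods were scheduled on. A sketch, assuming your hybrid runtime is installed in the apigee namespace:

kubectl get pods -n apigee -o wide

The NODE column in the output shows the node, and therefore the node pool, where each pod landed.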

GKE and Google Distributed Cloud node pool configuration

On GKE and Google Distributed Cloud (GDC) platforms, node pools must have a unique name that you provide when you create the pools, and GKE/GDC automatically labels each node with the following:

cloud.google.com/gke-nodepool=THE_NODE_POOL_NAME

As long as you create node pools named apigee-data and apigee-runtime, no further configuration is required. If you want to use custom node pool names, see Using custom node pool names.

Node pool configuration on other Kubernetes platforms

See your Kubernetes platform documentation for information about labeling and managing node pools.

While some Kubernetes platforms label their worker nodes automatically, you can also label the worker nodes manually with the following steps:

  1. Run the following command to get a list of the worker nodes in your cluster:
    kubectl get nodes

  2. Label each node with the key and value from your nodeSelector configuration:
    kubectl label node NODE_NAME KEY=VALUE

If you are using custom node pool labels, make sure each key-value pair is unique. For example:

nodeSelector:
  requiredForScheduling: true
  apigeeRuntime:
    key: "pool1-key"
    value: "pool1-label"
  apigeeData:
    key: "pool2-key"
    value: "pool2-label"
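With a configuration like this, you would label your runtime and data nodes with the matching key-value pairs. A sketch using the example keys above; RUNTIME_NODE_NAME and DATA_NODE_NAME are placeholders for your actual node names:

kubectl label node RUNTIME_NODE_NAME pool1-key=pool1-label
kubectl label node DATA_NODE_NAME pool2-key=pool2-label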

Overriding the node pool for specific components on other Kubernetes platforms

You can also override the node pool configuration at the individual component level. For example, the following configuration assigns the runtime component to the node pool labeled apigee-custom:

runtime:
  nodeSelector:
    key: apigee.com/apigee-nodepool
    value: apigee-custom

You can specify a custom node pool on any of these components:

  • apigeeingressgateway
  • cassandra
  • logger
  • mart
  • metrics
  • runtime
  • synchronizer
  • udca
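For example, to back the apigee-custom override above on a non-GKE platform, you could label the target nodes and then verify the label. NODE_NAME is a placeholder for an actual node name:

kubectl label node NODE_NAME apigee.com/apigee-nodepool=apigee-custom
kubectl get nodes -L apigee.com/apigee-nodepool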