Troubleshoot controller fights

This page shows you how to resolve controller fights, which are also known as resource contention. Controller fights consume a large amount of resources and can degrade the performance of your cluster.

Config Sync watches the objects that it applies to the cluster and reverts any changes to the field values declared in the source of truth. If another controller makes those changes, the object can switch back and forth between the states that the competing controllers want. One symptom of this behavior is that the object's metadata.generation and metadata.resourceVersion fields increase rapidly. Config Sync uses this update rate to detect fights: if a managed object is updated more than five times per minute, Config Sync detects the fight, logs the drift, and reports the error in the status of the RootSync or RepoSync object.
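To make the threshold concrete, the following sketch checks whether more than five update timestamps fall inside any 60-second window. It only illustrates the rule described above and is not Config Sync's detection code. The function name exceeds_update_rate and the input format (sorted Unix timestamps in seconds, one per line) are assumptions.

```shell
# Hypothetical helper: illustrates the "more than five updates per minute"
# rule. This is NOT how Config Sync implements fight detection internally.
# Input: Unix timestamps (seconds), sorted ascending, one per line on stdin.
# Exit status: 0 if any 60-second window holds more than 5 updates, else 1.
exceeds_update_rate() {
  awk -v limit=5 -v window=60 '
    { ts[NR] = $1 }
    END {
      start = 1
      for (i = 1; i <= NR; i++) {
        # Move the window start forward until the window spans < 60 seconds.
        while (ts[i] - ts[start] >= window) start++
        # Number of updates inside the current window.
        if (i - start + 1 > limit) exit 0
      }
      exit 1
    }'
}
```

For example, printf '0\n10\n20\n30\n40\n50\n' | exceeds_update_rate succeeds, because six updates land within 50 seconds. Five updates in the same minute, or six updates spread 20 seconds apart, do not cross the threshold.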

Config Sync has special logic to detect fights between multiple RootSync and RepoSync objects. A RepoSync reconciler skips further updates to an object if it sees that another reconciler already manages that object. A RootSync reconciler attempts to adopt any object that it's configured to manage, unless another RootSync object already manages it. This logic prevents Config Sync reconcilers from fighting among themselves. Instead, Config Sync reports errors in the status of every RootSync and RepoSync object involved.
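The ownership rules above can be summarized as a small decision table. The following sketch is a hypothetical model of that behavior, not Config Sync source code. The function name reconcile_decision and its two arguments (the kind of the reconciling object, and the kind of a different object that already manages the resource, or an empty string if there is none) are assumptions made for illustration.

```shell
# Hypothetical model of how a reconciler treats an object that another
# RootSync or RepoSync might already manage. Not Config Sync source code.
# $1: kind of the reconciling object (RootSync or RepoSync)
# $2: kind of a *different* object that already manages the resource,
#     or an empty string if no other reconciler manages it
reconcile_decision() {
  local self_kind="$1" other_kind="$2"
  if [ -z "$other_kind" ]; then
    echo "apply"   # No other manager: apply the declared state normally.
  elif [ "$self_kind" = "RepoSync" ]; then
    echo "skip"    # RepoSync skips objects another reconciler manages.
  elif [ "$other_kind" = "RootSync" ]; then
    echo "skip"    # RootSync doesn't adopt from another RootSync.
  else
    echo "adopt"   # RootSync adopts objects that a RepoSync manages.
  fi
}
```

In the "skip" cases, the reconcilers don't fight over the object; the conflict surfaces as errors in the status of the RootSync and RepoSync objects involved.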

Identify controller fights

You can review the fight errors by using the nomos status command or by checking the status field in the RootSync or RepoSync object.

If you don't have the nomos command-line tool installed, you can review the logs for the RootSync reconciler by running the following command:

kubectl logs -n config-management-system \
    --selector "app=reconciler," \
    --container reconciler

To filter for specific RepoSync reconcilers, run the following command:

kubectl logs -n config-management-system \
    --selector "app=reconciler," \
    --container reconciler

Replace NAMESPACE with the namespace in which you created your namespace-scoped source of truth.

If the results contain the error code KNV2005, Config Sync has detected a controller fight.

The following error message is an example of the type of error you might see in your logs:

KNV2005: detected excessive object updates, approximately 6 times per
minute. This may indicate Config Sync is fighting with another controller over
the object.
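If the log output is long, you can filter it for the error code. The following sketch defines a hypothetical find_fights helper that keeps only the lines containing KNV2005. You would pipe the output of one of the preceding kubectl logs commands into it.

```shell
# Hypothetical helper: print only log lines that report a controller fight
# (error code KNV2005). Like grep, it exits 0 if at least one line matched
# and 1 otherwise.
find_fights() {
  grep -F 'KNV2005'
}

# Example usage (requires access to the cluster):
#   kubectl logs -n config-management-system \
#       --selector "app=reconciler" \
#       --container reconciler | find_fights
```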

Investigate controller fights

To find out more about a controller fight, watch the updates to the object in YAML format by running the following command:

kubectl get RESOURCE OBJECT_NAME \
     --namespace NAMESPACE \
     --watch -o yaml

Replace the following:

  • RESOURCE: the kind of resource that's being fought over.
  • OBJECT_NAME: the name of the object that's being fought over.
  • NAMESPACE: the namespace that the resource being fought over is in.

The log results identify the resource, object name, and namespace to use in this command.

This command streams the state of the object each time the API server applies an update. To see which fields change between updates, compare successive versions in the output with a file comparison tool.
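One way to do this comparison is to save the stream to a file and split it into one file per update. The sketch below assumes that the YAML output of a kubectl watch separates successive objects with a line containing only ---. The function names split_updates and diff_updates and the update-N.yaml file names are hypothetical.

```shell
# Hypothetical helper: split a saved watch stream (stdin) into update-1.yaml,
# update-2.yaml, ... in the current directory. A line containing only "---"
# starts a new file; a leading "---" is skipped.
split_updates() {
  awk 'BEGIN { n = 1; lines = 0 }
       /^---$/ { if (lines > 0) { close("update-" n ".yaml"); n++; lines = 0 }; next }
       { print > ("update-" n ".yaml"); lines++ }'
}

# Hypothetical helper: diff each update against the one before it.
diff_updates() {
  local i=2
  while [ -f "update-$i.yaml" ]; do
    echo "=== update-$((i - 1)).yaml -> update-$i.yaml"
    # diff exits 1 when the files differ, which is expected here.
    diff "update-$((i - 1)).yaml" "update-$i.yaml" || true
    i=$((i + 1))
  done
}

# Example usage (press Ctrl+C to stop the watch after a few updates):
#   kubectl get RESOURCE OBJECT_NAME --namespace NAMESPACE \
#       --watch -o yaml > watch.yaml
#   split_updates < watch.yaml && diff_updates
```

Fields that flip between two values from one diff to the next are the ones that the competing controllers are fighting over.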

Resolve controller fights

There are multiple ways to resolve controller fights. Choose the option that works best for your Config Sync setup:

  • Update the resource manifest in the source to match the value that the other controller wants.
  • Remove the field in question from the source so that the other controller can manage it.
  • Disable or uninstall the other controller.
  • Remove the resource from the source and manage it manually or with a custom controller that tolerates specific changes or co-management.
  • If you own the controller that's causing the resource contention, and the field that it changes isn't declared in the source of truth, update your controller to patch the object instead of updating it. That way, Config Sync allows the change instead of reverting it.

Some resources should belong to other controllers; for example, some operators install or maintain CRDs. These controllers can automatically remove any Config Sync-specific metadata. If another component in your Kubernetes cluster removes Config Sync metadata, stop managing the resource with Config Sync. For information about how to do this, see Stop managing a managed object.
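To check whether another component has stripped Config Sync metadata from a live object, you can look for the configmanagement.gke.io/managed: enabled annotation that Config Sync sets on the objects it manages. The helper below is a hypothetical sketch that reads the object's YAML from stdin; checking only this one annotation is a simplifying assumption.

```shell
# Hypothetical helper: succeed (exit 0) if the object YAML on stdin still
# carries the annotation that Config Sync adds to managed objects.
# Only this one annotation is checked, which is a simplification.
has_config_sync_metadata() {
  grep -Eq '^[[:space:]]*configmanagement\.gke\.io/managed:[[:space:]]*"?enabled"?[[:space:]]*$'
}

# Example usage (requires access to the cluster):
#   kubectl get RESOURCE OBJECT_NAME --namespace NAMESPACE -o yaml \
#     | has_config_sync_metadata || echo "Config Sync metadata is missing"
```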

Alternatively, if you don't want Config Sync to revert changes to a managed object in the cluster, add the ignore annotation to that object. For information about how to do this, see Ignore object mutations.
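The ignore annotation goes in the object's manifest in your source of truth. The following sketch writes an example ConfigMap manifest that carries the client.lifecycle.config.k8s.io/mutation: ignore annotation. The file name, object name, namespace, and data are hypothetical placeholders.

```shell
# Write an example manifest into the source of truth. The file name, object
# name, namespace, and data are hypothetical placeholders.
cat > example-configmap.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
  namespace: example-namespace
  annotations:
    # Ask Config Sync not to revert changes made to this object in the cluster.
    client.lifecycle.config.k8s.io/mutation: ignore
data:
  example-key: example-value
EOF
```

Commit and push the manifest as you would any other change to your source of truth.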

What's next

  • If you're still experiencing issues, check to see if your problem is a known issue.