Prevent config drift

Config Sync reduces the risk of "shadow ops" (changes made directly to the cluster, outside the source of truth) through automatic self-healing, periodic re-sync, and optional drift prevention. A change that would cause drift between the cluster and the source of truth can either be allowed and then quickly reverted, or rejected outright.

Self-healing watches managed resources, detects drift from the source of truth, and reverts that drift. Self-healing is always enabled.

Periodic re-sync automatically syncs an hour after the last successful sync, even if no change has been made to the source of truth. Periodic re-sync is always enabled.

While self-healing and periodic re-sync remediate drift after it happens, drift prevention intercepts requests to change managed objects and validates whether each change should be allowed. If a change doesn't match the source of truth, it is rejected. Drift prevention is disabled by default. When enabled, it protects objects synced by RootSync objects by default, and can also be configured to protect objects synced by RepoSync objects.

To use drift prevention, you must enable the RootSync and RepoSync APIs.

Enable drift prevention

  1. Set the preventDrift field in the config file to true and apply the config file:

    gcloud

    Enable drift prevention using the gcloud CLI if you installed Config Sync using the Google Cloud console or the gcloud CLI. Make sure to update your gcloud CLI to the latest version. Set the spec.configSync.preventDrift field of the gcloud config file to true, and then apply the gcloud config file.

    kubectl (1.19.2 or earlier)

    Enable drift prevention using kubectl if you installed Config Sync manually using kubectl. Set the spec.preventDrift field of the ConfigManagement object to true, and then apply the ConfigManagement object.

  2. Wait until the ConfigManagement Operator creates the Config Sync ValidatingWebhookConfiguration object:

    kubectl get validatingwebhookconfiguration admission-webhook.configsync.gke.io
    

    You should see output similar to the following example:

    NAME                                  WEBHOOKS   AGE
    admission-webhook.configsync.gke.io   0          2m15s
    
  3. Commit a new change to the source of truth so that the root-reconciler Deployment adds webhooks to the Config Sync ValidatingWebhookConfiguration object. Alternatively, delete the root-reconciler Deployment to trigger a reconciliation. The recreated root-reconciler Deployment then updates the Config Sync ValidatingWebhookConfiguration object.

  4. Wait until the webhook server is ready, which can take several minutes. When it is ready, the Config Sync admission webhook Deployment log includes the message "serving webhook server":

    kubectl logs -n config-management-system -l app=admission-webhook --tail=-1 | grep "serving webhook server"
    

    You should see output similar to the following example:

    I1201 18:05:41.805531       1 deleg.go:130] controller-runtime/webhook "level"=0 "msg"="serving webhook server"  "host"="" "port"=10250
    I1201 18:07:04.626199       1 deleg.go:130] controller-runtime/webhook "level"=0 "msg"="serving webhook server"  "host"="" "port"=10250
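
If you installed Config Sync manually, the step 1 change might look like the following minimal ConfigManagement object. This is a sketch: the file name is hypothetical, and any other spec fields you already set should stay as they are. In a manual installation, setting spec.enableMultiRepo to true is what enables the RootSync and RepoSync APIs that drift prevention requires.

    # config-management.yaml (hypothetical file name)
    apiVersion: configmanagement.gke.io/v1
    kind: ConfigManagement
    metadata:
      name: config-management
    spec:
      # Enables the RootSync and RepoSync APIs, which drift prevention requires.
      enableMultiRepo: true
      # Turns on the admission webhook that rejects changes to managed
      # objects that don't match the source of truth.
      preventDrift: true

You would then apply the object with kubectl apply -f config-management.yaml.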
    

Disable drift prevention

gcloud

Disable drift prevention using the gcloud CLI if you installed Config Sync using the Google Cloud console or the gcloud CLI. Make sure to update your gcloud CLI to the latest version. Set the spec.configSync.preventDrift field of the gcloud config file to false or remove the field, and then apply the gcloud config file.
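The following is a sketch of a gcloud config file with drift prevention turned off. It assumes the apply-spec layout that the gcloud CLI reads (applySpecVersion and spec.configSync). The file name is hypothetical, and the commented fields stand in for the values already in your file.

    # apply-spec.yaml (hypothetical file name)
    applySpecVersion: 1
    spec:
      configSync:
        enabled: true
        # false (or removing this line) disables drift prevention.
        preventDrift: false
        # Keep your existing source settings, for example sourceFormat,
        # syncRepo, and syncBranch, unchanged.

With a recent gcloud CLI, you would likely apply the file with gcloud beta container fleet config-management apply --membership=MEMBERSHIP_NAME --config=apply-spec.yaml, where MEMBERSHIP_NAME is your fleet membership.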

kubectl (1.19.2 or earlier)

Disable drift prevention using kubectl if you installed Config Sync manually using kubectl. Set the spec.preventDrift field of the ConfigManagement object to false or remove the field, and then apply the ConfigManagement object.

This deletes all the Config Sync admission webhook resources. Since the Config Sync ValidatingWebhookConfiguration object no longer exists, the Config Sync reconcilers no longer generate the webhook configs for managed resources.

Enable the admission webhook in namespace-scoped sources

Namespace-scoped sources of truth are not fully protected by the webhook. The Config Sync reconciler for each namespace source does not have permission to read or update the ValidatingWebhookConfiguration objects at the cluster level.

This lack of permission produces an error in the namespace reconciler logs similar to the following example:

Failed to update admission webhook: KNV2013: applying changes to
admission webhook: Insufficient permission. To fix, make sure the reconciler has
sufficient permissions.:
validatingwebhookconfigurations.admissionregistration.k8s.io
"admission-webhook.configsync.gke.io" is forbidden: User
"system:serviceaccount:config-management-system:ns-reconciler-NAMESPACE" cannot
update resource "validatingwebhookconfigurations" in API group
"admissionregistration.k8s.io" at the cluster scope

You can ignore this error if you don't want webhook protection for your namespace-scoped source of truth. If you do want it, first configure syncing from more than one source of truth, and then grant permission to the reconciler of each namespace-scoped source. You might not need these steps if a ClusterRoleBinding already binds ns-reconciler-NAMESPACE to the cluster-admin ClusterRole. A RoleBinding is not enough, because it only grants permissions inside a single namespace, and the ValidatingWebhookConfiguration object is cluster-scoped.

  1. In the root source of truth, declare a new ClusterRole that grants permission to get and update the Config Sync ValidatingWebhookConfiguration object. This ClusterRole only needs to be defined once per cluster:

    # ROOT_SOURCE/cluster-roles/webhook-role.yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: admission-webhook-role
    rules:
    - apiGroups: ["admissionregistration.k8s.io"]
      resources: ["validatingwebhookconfigurations"]
      resourceNames: ["admission-webhook.configsync.gke.io"]
      verbs: ["get", "update"]
    
  2. For each namespace-scoped source where the admission webhook permission needs to be granted, declare a ClusterRoleBinding configuration to grant access to the admission webhook:

    # ROOT_SOURCE/NAMESPACE/sync-webhook-rolebinding.yaml
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: syncs-webhook
    subjects:
    - kind: ServiceAccount
      name: ns-reconciler-NAMESPACE
      namespace: config-management-system
    roleRef:
      kind: ClusterRole
      name: admission-webhook-role
      apiGroup: rbac.authorization.k8s.io
    

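    Replace NAMESPACE with the namespace that you created your namespace-scoped source in. If you grant access for more than one namespace, give each ClusterRoleBinding a unique metadata.name, such as syncs-webhook-NAMESPACE, because two ClusterRoleBinding objects can't share a name.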

  3. Commit the changes to the root source of truth, for example, if syncing from a Git repository:

    git add .
    git commit -m 'Providing namespace repository the permission to update the admission webhook.'
    git push
    
    
  4. To verify, use kubectl get to make sure the ClusterRole and ClusterRoleBinding have been created:

    kubectl get clusterrole admission-webhook-role
    kubectl get clusterrolebindings syncs-webhook
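
If several namespace-scoped sources need this permission, you can use a single ClusterRoleBinding that lists every reconciler as a subject instead of one binding per namespace. This is a sketch. The binding name, file path, and the team-a and team-b namespaces are hypothetical, and the reconciler names assume the ns-reconciler-NAMESPACE pattern used above.

    # ROOT_SOURCE/cluster-roles/webhook-rolebinding.yaml (hypothetical path)
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: syncs-webhook-all   # hypothetical name
    subjects:
    # One entry per namespace reconciler that needs webhook access.
    - kind: ServiceAccount
      name: ns-reconciler-team-a
      namespace: config-management-system
    - kind: ServiceAccount
      name: ns-reconciler-team-b
      namespace: config-management-system
    roleRef:
      kind: ClusterRole
      name: admission-webhook-role
      apiGroup: rbac.authorization.k8s.io

Keeping all subjects in one object avoids name collisions between per-namespace bindings. The tradeoff is that you edit a shared file whenever you add a namespace.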
    

Disable drift prevention for abandoned resources

When you delete a RootSync or RepoSync object without deletion propagation (the default), Config Sync doesn't modify the resources that object previously managed. Those resources can keep the labels and annotations that Config Sync uses to track them. If drift prevention is enabled, the webhook can then reject any later change to these previously managed resources.

If you want to keep these managed resources, unmanage these resources before deleting the RootSync or RepoSync objects by setting the configmanagement.gke.io/managed annotation to disabled on every managed resource declared in the source of truth. This tells Config Sync to remove its labels and annotations from the managed resources, without deleting these resources. After the sync is complete, you can remove the RootSync or RepoSync object.
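For example, a resource in the source of truth that you have marked as unmanaged might look like the following. The ConfigMap and its name, namespace, and data are hypothetical. Only the configmanagement.gke.io/managed: disabled annotation comes from the procedure above.

    # A hypothetical managed resource declared in the source of truth.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example-config
      namespace: example-ns
      annotations:
        # Tells Config Sync to stop managing this object and remove its
        # tracking labels and annotations, without deleting the object.
        configmanagement.gke.io/managed: disabled
    data:
      key: value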

If you want to delete these managed resources, you have two options:

  • Delete the managed resources from the source of truth. Then, Config Sync will delete the managed objects from the cluster. After the sync is complete, you can remove the RootSync or RepoSync object.
  • Enable deletion propagation on the RootSync or RepoSync object before deleting it. Then, Config Sync will delete the managed objects from the cluster.
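The following is a minimal sketch of the second option. It assumes your Config Sync version supports deletion propagation through the configsync.gke.io/deletion-propagation-policy annotation with the value Foreground, and that the RootSync is named root-sync. Only the metadata is shown; keep your existing spec.

    apiVersion: configsync.gke.io/v1beta1
    kind: RootSync
    metadata:
      name: root-sync   # assumed name
      namespace: config-management-system
      annotations:
        # Assumed annotation: with Foreground, deleting this RootSync also
        # deletes the objects it manages. Without it, they are left in place.
        configsync.gke.io/deletion-propagation-policy: Foreground
    # spec: unchanged from your existing RootSync (omitted here).

Apply the annotated object before you delete it, so that the policy is in place at deletion time.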

If the RootSync or RepoSync object is deleted before unmanaging or deleting its managed resources, you can recreate the RootSync or RepoSync object, and it adopts the resources on the cluster that match the source of truth. Then you can unmanage or delete the resources, wait for the changes to sync, and delete the RootSync or RepoSync object again.

What's next