Canary-based migration to Cloud Service Mesh certificate authority
Migrating from Istio CA (also known as Citadel) to Cloud Service Mesh certificate authority requires migrating the root of trust. Prior to Cloud Service Mesh 1.10, if you wanted to migrate from Istio on Google Kubernetes Engine (GKE) to Cloud Service Mesh with Cloud Service Mesh certificate authority, you needed to schedule downtime because Cloud Service Mesh was not able to load multiple root certificates. As a result, during such a migration, newly deployed workloads trust the new root certificate while the other workloads trust the old root certificate. Workloads using certificates signed by different root certificates can't authenticate with each other, so mutual TLS (mTLS) traffic is interrupted during the migration. The cluster fully recovers only when the control plane and all workloads in all namespaces are redeployed with certificates from Cloud Service Mesh certificate authority. If your mesh has multiple clusters with workloads that send requests to workloads on another cluster, all workloads on those clusters need to be updated as well.
Use the steps in this guide for the following use cases:
- Migrate from Istio on GKE to the Cloud Service Mesh 1.19.10-asm.9 in-cluster control plane with Cloud Service Mesh certificate authority.
- Upgrade from Cloud Service Mesh 1.15 or a 1.16 patch release with Istio CA to the Cloud Service Mesh 1.19.10-asm.9 in-cluster control plane with Cloud Service Mesh certificate authority.
Limitations
- All GKE clusters must be in the same Google Cloud project.
Prerequisites
Follow the steps in Install dependent tools and validate cluster to:
- Install required tools
- Download asmcli
- Grant cluster admin permissions
- Validate your project and cluster
Required tools
During the migration, you run a Google-provided tool, migrate_ca, to validate the following for each Pod in the cluster:
- The root certificate for Istio CA and Cloud Service Mesh certificate authority.
- The workload mTLS certificate issued by Istio CA and by Cloud Service Mesh certificate authority.
- The trust domains configured by Istio CA and Cloud Service Mesh certificate authority.
This tool has the following dependencies:
- awk
- grep
- istioctl. Running asmcli install downloads the version of istioctl that matches the version of Cloud Service Mesh that you are installing.
- jq
- kubectl
- openssl
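Before you run migrate_ca, it can help to confirm that each dependency is on your PATH. The following is a minimal sketch, not part of migrate_ca itself; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Sketch: print every tool from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper name; it is not shipped with migrate_ca.
missing_tools() {
  for tool in "$@"; do
    # command -v returns non-zero when the tool cannot be found.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools awk grep istioctl jq kubectl openssl` prints nothing when all of the dependencies are installed, and otherwise prints one missing tool per line.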
Overview of the migration
To migrate to Cloud Service Mesh certificate authority, you follow the revision-based migration process (also referred to as a "canary upgrade"). With a revision-based migration, a new control plane revision is deployed alongside the existing control plane. You then gradually move your workloads to the new revision, which lets you monitor the effect of the migration throughout the process. During the migration, authentication and authorization remain fully functional between workloads that use Cloud Service Mesh certificate authority and workloads that use Istio CA.
The following is an outline of the migration to Cloud Service Mesh certificate authority:
- Distribute the Cloud Service Mesh certificate authority root of trust.
  - Install a new control plane revision that uses the Istio CA with an option that distributes the Cloud Service Mesh certificate authority root of trust.
  - Migrate workloads to the new control plane, namespace by namespace, and test your application. When all workloads are successfully migrated to the new control plane, remove the old control plane.
- Migrate to Cloud Service Mesh certificate authority. Now that all sidecar proxies are configured with the old root of trust and the Cloud Service Mesh certificate authority root of trust, you can migrate to Cloud Service Mesh certificate authority without downtime. Again, you follow the revision-based migration process:
  - Install a control plane revision with Cloud Service Mesh certificate authority enabled.
  - Migrate workloads to the new control plane revision, namespace by namespace, and test your application. When all workloads are successfully migrated to the new control plane, remove the old control plane.
- Remove CA secrets in the cluster that are associated with the old CA and restart the new control plane.
Distribute the Cloud Service Mesh certificate authority root of trust
Before you can migrate to Cloud Service Mesh certificate authority, all GKE clusters in the mesh must have Cloud Service Mesh 1.10 or later, and all clusters must be configured with a control plane option that triggers the root of trust for Cloud Service Mesh certificate authority to be distributed to the proxies of all workloads on the cluster. When the process is finished, each proxy is configured with both the old and the new roots of trust. With this scheme, when you migrate to Cloud Service Mesh certificate authority, workloads using Cloud Service Mesh certificate authority are able to authenticate with workloads using the old CA.
Install a new control plane revision
Install a control plane revision with an option that distributes the Cloud Service Mesh certificate authority root of trust.
Follow the steps in Install dependent tools and validate cluster to get ready to use a Google-provided tool, asmcli, to install the new control plane revision.
Make sure you have the version of asmcli that installs Cloud Service Mesh 1.11 or higher:
./asmcli --version
Run asmcli install. In the following command, replace the placeholders with your values.
./asmcli install \
  --fleet_id FLEET_PROJECT_ID \
  --kubeconfig KUBECONFIG_FILE \
  --enable_all \
  --ca citadel \
  --ca_cert CA_CERT_FILE_PATH \
  --ca_key CA_KEY_FILE_PATH \
  --root_cert ROOT_CERT_FILE_PATH \
  --cert_chain CERT_CHAIN_FILE_PATH \
  --option ca-migration-citadel \
  --revision_name REVISION_1 \
  --output_dir DIR_PATH
- --fleet_id: The project ID of the fleet host project.
- --kubeconfig: The path to the kubeconfig file. You can specify either a relative path or a full path. The environment variable $PWD doesn't work here.
- --output_dir: Include this option to specify a directory where asmcli downloads the anthos-service-mesh package and extracts the installation file, which contains istioctl, samples, and manifests. Otherwise asmcli downloads the files to a tmp directory. You can specify either a relative path or a full path. The environment variable $PWD doesn't work here.
- --enable_all: Allows the tool to:
  - Grant required IAM permissions.
  - Enable the required Google APIs.
  - Set a label on the cluster that identifies the mesh.
  - Register the cluster to the fleet if it isn't already registered.
- --ca citadel: To avoid downtime, specify Istio CA (the `citadel` option corresponds to Istio CA). Don't switch to Cloud Service Mesh certificate authority at this point.
- --ca_cert: The intermediate certificate.
- --ca_key: The key for the intermediate certificate.
- --root_cert: The root certificate.
- --cert_chain: The certificate chain.
- --option ca-migration-citadel: When you redeploy your workloads, this option triggers the new root of trust to be distributed to the sidecar proxies of the workloads.
- --revision_name REVISION_1: Recommended. A [revision label](/service-mesh/docs/revisions-overview) is a key-value pair that is set on the control plane. The revision label key is always istio.io/rev. By default, the tool sets the value for the revision label based on the Cloud Service Mesh version, for example: asm-11910-9. We recommend that you include this option and replace REVISION_1 with a name that describes the installation, such as asm-11910-9-distribute-root. The name must be a DNS-1035 label: it must consist of lowercase alphanumeric characters or -, start with an alphabetic character, and end with an alphanumeric character (such as my-name or abc-123).
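The naming rule for the revision name can be checked locally before you run asmcli install. The following sketch assumes the general Kubernetes definition of a DNS-1035 label, which also limits the label to 63 characters (a limit that this guide doesn't mention). The helper name is_dns1035_label is hypothetical.

```shell
#!/bin/sh
# Sketch: return 0 when the argument is a valid DNS-1035 label.
# is_dns1035_label is a hypothetical helper, not part of asmcli.
is_dns1035_label() {
  name=$1
  # Assumption: the general Kubernetes limit of 63 characters applies.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 63 ] || return 1
  # Lowercase alphanumerics or '-', starting with a letter and
  # ending with a letter or digit. LC_ALL=C keeps [a-z] ASCII-only.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z]([-a-z0-9]*[a-z0-9])?$'
}
```

For example, `is_dns1035_label asm-11910-9-distribute-root && echo ok` prints ok, while a name such as 1abc or abc- is rejected.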
Migrate workloads to the new control plane
To finish distributing the new root of trust, you need to label your namespaces with the revision label istio.io/rev=REVISION_1 (for example, istio.io/rev=asm-11910-9-distribute-root) and restart your workloads. When you test your workloads after restarting them, you run a tool to validate that the sidecar proxy is configured with both the old root of trust and the new root of trust for Cloud Service Mesh certificate authority.
Set the current context for kubectl. In the following command, change --region to --zone if you have a single-zone cluster.
gcloud container clusters get-credentials CLUSTER_NAME \
  --project=PROJECT_ID \
  --region=CLUSTER_LOCATION
Download the validation tool:
curl https://raw.githubusercontent.com/GoogleCloudPlatform/anthos-service-mesh-packages/master/scripts/ca-migration/migrate_ca > migrate_ca
Set the executable bit on the tool:
chmod +x migrate_ca
The migrate_ca tool calls istioctl, which is version dependent. The asmcli tool adds a symlink to istioctl in the directory you specified for --output_dir. Make sure that directory is at the beginning of your path. In the following commands, replace ISTIOCTL_PATH with the directory that contains the istioctl that the tool downloaded.
export PATH=ISTIOCTL_PATH:$PATH
which istioctl
echo $PATH
Get the revision label that is on istiod and the istio-ingressgateway.
kubectl get pod -n istio-system -L istio.io/rev
The output is similar to the following:
NAME                                                             READY   STATUS    RESTARTS   AGE   REV
istio-ingressgateway-5fd454f8ff-t7w9x                            1/1     Running   0          36m   default
istio-ingressgateway-asm-195-2-distribute-root-c6ccfbdbd-z2s9p   1/1     Running   0          18m   asm-195-2-distribute-root
istio-ingressgateway-asm-195-2-distribute-root-c6ccfbdbd-zr2cs   1/1     Running   0          18m   asm-195-2-distribute-root
istiod-68bc495f77-shl2h                                          1/1     Running   0          36m   default
istiod-asm-195-2-distribute-root-6f764dbb7c-g9f8c                1/1     Running   0          18m   asm-195-2-distribute-root
istiod-asm-195-2-distribute-root-6f764dbb7c-z7z9s                1/1     Running   0          18m   asm-195-2-distribute-root
In the output, under the REV column, note the value of the revision label for the new revision, which matches the revision label that you specified when you ran asmcli install. In this example, the value is asm-195-2-distribute-root.
You need to delete the old revision of istiod when you finish moving workloads to the new revision. Note the value in the revision label for the old istiod revision. The example output shows a migration from Istio, which is using the default revision.
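If the Pod list is long, you can reduce it to the distinct revision labels. The following sketch reads the table printed by the kubectl get pod command shown earlier and assumes that every Pod has a value in the REV column, as in the example output. The helper name list_revisions is hypothetical.

```shell
#!/bin/sh
# Sketch: read the table from "kubectl get pod -n istio-system -L istio.io/rev"
# on stdin and print each distinct value of the REV column once.
# list_revisions is a hypothetical helper name.
list_revisions() {
  # Skip the header row. REV is the last column because -L appends it.
  # Assumption: no Pod has an empty REV value.
  awk 'NR > 1 && NF > 0 { print $NF }' | sort -u
}
```

For example, `kubectl get pod -n istio-system -L istio.io/rev | list_revisions` prints one line per revision, which makes it easier to tell the old and new revisions apart.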
Add the revision label to a namespace and remove the istio-injection label (if it exists). In the following command, replace NAMESPACE with the namespace to label.
kubectl label namespace NAMESPACE istio.io/rev=REVISION_1 istio-injection- --overwrite
If you see "istio-injection not found" in the output, you can ignore it. That means that the namespace didn't previously have the istio-injection label. Because auto-injection behavior is undefined when a namespace has both the istio-injection and the revision label, all kubectl label commands in the Cloud Service Mesh documentation explicitly ensure that only one is set.
Restart the Pods to trigger re-injection.
kubectl rollout restart deployment -n NAMESPACE
Test your application to verify that the workloads are working correctly.
If you have workloads in other namespaces, repeat the steps to label the namespace and restart Pods.
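When you repeat the label and restart steps for several namespaces, a small wrapper can keep the two commands together. The following is a sketch only: the function name move_namespace and the KUBECTL override variable are hypothetical, and the kubectl arguments are the ones shown in the preceding steps.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the
# commands without running them. This variable is an assumption of the sketch.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper. Usage: move_namespace REVISION NAMESPACE
move_namespace() {
  revision=$1
  ns=$2
  # Set the revision label and remove istio-injection so only one is set.
  "$KUBECTL" label namespace "$ns" "istio.io/rev=$revision" istio-injection- --overwrite || return 1
  # Restart the Deployments so that the sidecar proxies are re-injected.
  "$KUBECTL" rollout restart deployment -n "$ns"
}
```

Call the helper for one namespace at a time, for example `move_namespace REVISION_1 NAMESPACE`, and test your application before you move on to the next namespace.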
Validate that the sidecar proxies for all workloads on the cluster are configured with both the old and new root certificates:
./migrate_ca check-root-cert
Expected output:
Namespace: foo
httpbin-66cdbdb6c5-pmzps.foo trusts [CITADEL MESHCA]
sleep-64d7d56698-6tmjm.foo trusts [CITADEL MESHCA]
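On a large cluster, it can be easier to list only the workloads that don't yet trust both roots. The following sketch filters the check-root-cert output and assumes that each workload is reported on its own line in the form shown in the expected output. The helper name untrusted_workloads is hypothetical.

```shell
#!/bin/sh
# Sketch: read "./migrate_ca check-root-cert" output on stdin and print every
# workload line that does not list both CITADEL and MESHCA.
# Returns non-zero when at least one such line is found.
# untrusted_workloads is a hypothetical helper name.
untrusted_workloads() {
  awk '/ trusts \[/ && !(/CITADEL/ && /MESHCA/) { print; bad = 1 }
       END { exit bad }'
}
```

For example, `./migrate_ca check-root-cert | untrusted_workloads` prints nothing and returns zero when every workload trusts both roots. Any workload that it prints likely still needs to be restarted.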
If you need to migrate gateways, follow the steps in Canary Upgrades (advanced) to install new gateway deployments. Keep the following points in mind:
- Use REVISION_1 as the revision label.
- Deploy the gateway resources in the same namespace as the gateway from the older installation to perform a zero-downtime migration. Make sure that the Service resources that point to the older gateway now also include the newer deployments.
- Don't delete the older gateway deployments until you are sure your application is working properly (after the next step).
If you are satisfied that your application is working as expected, continue with the steps to transition to the new version of istiod. If there's an issue with your application, follow the steps to roll back.
Complete the transition
If you are satisfied that your application is working as expected, remove the old control plane to complete the transition to the new version.
Change to the directory where the files from the anthos-service-mesh GitHub repository are located.
Configure the validating webhook to use the new control plane.
kubectl apply -f asm/istio/istiod-service.yaml
Delete the old istio-ingressgateway Deployment. The command that you run depends on whether you are migrating from Istio or upgrading from a previous version of Cloud Service Mesh:
Migrate
If you migrated from Istio, the old istio-ingressgateway doesn't have a revision label.
kubectl delete deploy/istio-ingressgateway -n istio-system
Upgrade
If you upgraded from a previous Cloud Service Mesh version, in the following command, replace OLD_REVISION with the revision label for the previous version of the istio-ingressgateway.
kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=OLD_REVISION -n istio-system --ignore-not-found=true
Delete the old revision of istiod. The command that you use depends on whether you are migrating from Istio or upgrading from a previous version of Cloud Service Mesh.
Migrate
If you migrated from Istio, the old istiod doesn't have a revision label.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod -n istio-system --ignore-not-found=true
Upgrade
If you upgraded from a previous Cloud Service Mesh version, in the following command, make sure that OLD_REVISION matches the revision label for the previous version of istiod.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-OLD_REVISION -n istio-system --ignore-not-found=true
Remove the earlier version of the IstioOperator configuration.
kubectl delete IstioOperator installed-state-OLD_REVISION -n istio-system
The expected output is similar to the following:
istiooperator.install.istio.io "installed-state-OLD_REVISION" deleted
Rollback
If you encountered an issue when testing your application with the new version of istiod, follow these steps to roll back to the previous version:
Delete the new gateway deployments that you installed in the gateway migration step (step 11 of the previous procedure).
Relabel your namespace to enable auto-injection with the previous version of istiod. The command that you use depends on whether you used a revision label or istio-injection=enabled with the previous version.
If you used a revision label for auto-injection:
kubectl label namespace NAMESPACE istio.io/rev=OLD_REVISION --overwrite
If you used istio-injection=enabled:
kubectl label namespace NAMESPACE istio.io/rev- istio-injection=enabled --overwrite
Expected output:
namespace/NAMESPACE labeled
Confirm that the revision label on the namespace matches the revision label on the previous version of istiod:
kubectl get ns NAMESPACE --show-labels
Restart the Pods to trigger re-injection so the proxies have the previous version:
kubectl rollout restart deployment -n NAMESPACE
Remove the new istio-ingressgateway Deployment.
kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=REVISION_1 -n istio-system --ignore-not-found=true
Remove the new revision of istiod.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-REVISION_1 -n istio-system --ignore-not-found=true
Remove the new IstioOperator configuration.
kubectl delete IstioOperator installed-state-asm-11910-9-distribute-root -n istio-system
Expected output is similar to the following:
istiooperator.install.istio.io "installed-state-asm-11910-9-distribute-root" deleted
Migrate to Cloud Service Mesh certificate authority
Now that the sidecar proxies for all workloads are configured with both the old root of trust and the new root of trust for Cloud Service Mesh certificate authority, the steps to migrate to Cloud Service Mesh certificate authority are similar to those that you did to distribute the Cloud Service Mesh certificate authority root of trust:
Install a new control plane with Cloud Service Mesh certificate authority enabled
You use asmcli install to install a new control plane revision that has Cloud Service Mesh certificate authority enabled.
If you customized the previous installation, you need to specify the same overlay files when you run asmcli install.
Run asmcli install. In the following command, replace the placeholders with your values.
./asmcli install \
  --fleet_id FLEET_PROJECT_ID \
  --kubeconfig KUBECONFIG_FILE \
  --output_dir DIR_PATH \
  --enable_all \
  --ca mesh_ca \
  --option ca-migration-meshca \
  --revision_name REVISION_2 \
  OVERLAYS
- --fleet_id: The project ID of the fleet host project.
- --kubeconfig: The path to the kubeconfig file. You can specify either a relative path or a full path. The environment variable $PWD doesn't work here.
- --output_dir: Include this option to specify a directory where asmcli downloads the anthos-service-mesh package and extracts the installation file, which contains istioctl, samples, and manifests. Otherwise asmcli downloads the files to a tmp directory. You can specify either a relative path or a full path. The environment variable $PWD doesn't work here.
- --enable_all: Allows the tool to:
  - Grant required IAM permissions.
  - Enable the required Google APIs.
  - Set a label on the cluster that identifies the mesh.
  - Register the cluster to the fleet if it isn't already registered.
- --ca mesh_ca: You can now switch to Cloud Service Mesh certificate authority because the Cloud Service Mesh certificate authority root of trust has been distributed.
- --revision_name REVISION_2: Recommended. Replace REVISION_2 with a name that describes the installation, such as asm-11910-9-meshca-ca-migration. The name must be a DNS-1035 label: it must consist of lowercase alphanumeric characters or -, start with an alphabetic character, and end with an alphanumeric character (such as my-name or abc-123).
- --option ca-migration-meshca: When you [redeploy your workloads](/service-mesh/docs/unified-install/install-anthos-service-mesh#deploy_and_redeploy_workloads), this option configures the proxies to use the Cloud Service Mesh certificate authority root of trust.
Migrate workloads to the new control plane
To finish the installation, you need to label your namespaces with the new revision label and restart your workloads.
Get the revision label that is on istiod and the istio-ingressgateway.
kubectl get pod -n istio-system -L istio.io/rev
The output is similar to the following:
NAME                                                                   READY   STATUS    RESTARTS   AGE   REV
istio-ingressgateway-asm-11910-9-distribute-root-65d884685d-6hrdk      1/1     Running   0          67m   asm-11910-9-distribute-root
istio-ingressgateway-asm-11910-9-distribute-root-65d884685d-94wgz      1/1     Running   0          67m   asm-11910-9-distribute-root
istio-ingressgateway-asm-11910-9-meshca-ca-migration-8b5fc8767-gk6hb   1/1     Running   0          5s    asm-11910-9-meshca-ca-migration
istio-ingressgateway-asm-11910-9-meshca-ca-migration-8b5fc8767-hn4w2   1/1     Running   0          20s   asm-11910-9-meshca-ca-migration
istiod-asm-11910-9-distribute-root-67998f4b55-lrzpz                    1/1     Running   0          68m   asm-11910-9-distribute-root
istiod-asm-11910-9-distribute-root-67998f4b55-r76kr                    1/1     Running   0          68m   asm-11910-9-distribute-root
istiod-asm-11910-9-meshca-ca-migration-5cd96f88f6-n7tj9                1/1     Running   0          27s   asm-11910-9-meshca-ca-migration
istiod-asm-11910-9-meshca-ca-migration-5cd96f88f6-wm68b                1/1     Running   0          27s   asm-11910-9-meshca-ca-migration
In the output, under the REV column, note the value of the revision label for the new version. In this example, the value is asm-11910-9-meshca-ca-migration.
Also note the value in the revision label for the earlier istiod version. You need this to delete the earlier version of istiod when you finish moving workloads to the new version. In the example, the value of the revision label for the previous revision is asm-11910-9-distribute-root.
Add the new revision label to a namespace. In the following command, replace NAMESPACE with the namespace to label.
kubectl label namespace NAMESPACE istio.io/rev=REVISION_2 --overwrite
Restart the Pods to trigger re-injection.
kubectl rollout restart deployment -n NAMESPACE
Test your application to verify that the workloads are working correctly. Make sure that mTLS communication works between workloads in the older namespace and workloads in the newer namespace.
If you have workloads in other namespaces, repeat the steps to label the namespace and restart Pods.
Follow the steps outlined in In-place upgrades to upgrade the older gateway deployments, which you installed in the gateway migration step of the previous section (step 11), to the latest revision REVISION_2.
If you are satisfied that your application is working as expected, continue with the steps to transition to the new control plane. If there's an issue with your application, follow the steps to roll back.
Complete the transition
If you are satisfied that your application is working as expected, remove the old control plane to complete the transition to the new version.
Change to the directory where the files from the anthos-service-mesh GitHub repository are located.
Configure the validating webhook to use the new control plane.
kubectl apply -f asm/istio/istiod-service.yaml
Delete the old istio-ingressgateway Deployment. In the following command, replace OLD_REVISION with the revision label for the previous version of the istio-ingressgateway.
kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=OLD_REVISION -n istio-system --ignore-not-found=true
Delete the old istiod revision. In the following command, replace OLD_REVISION with the revision label for the previous version of istiod.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-OLD_REVISION -n istio-system --ignore-not-found=true
Remove the old IstioOperator configuration.
kubectl delete IstioOperator installed-state-OLD_REVISION -n istio-system
The expected output is similar to the following:
istiooperator.install.istio.io "installed-state-OLD_REVISION" deleted
Rollback
If you encountered an issue when testing your application with the new istiod revision, follow these steps to roll back to the previous revision:
Follow the steps in In-place upgrades to downgrade the gateway deployments, which you upgraded in the gateway upgrade step of this section (step 6), to the older revision REVISION_1.
Relabel your namespace to enable auto-injection with the previous istiod revision.
kubectl label namespace NAMESPACE istio.io/rev=OLD_REVISION --overwrite
Expected output:
namespace/NAMESPACE labeled
Confirm that the revision label on the namespace matches the revision label on the previous version of istiod:
kubectl get ns NAMESPACE --show-labels
Restart the Pods to trigger re-injection so the proxies have the previous version:
kubectl rollout restart deployment -n NAMESPACE
Remove the new istio-ingressgateway Deployment.
kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=REVISION_2 -n istio-system --ignore-not-found=true
Remove the new version of istiod. Make sure that the revision label in the following command matches your revision.
kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-REVISION_2 -n istio-system --ignore-not-found=true
Remove the new version of the IstioOperator configuration.
kubectl delete IstioOperator installed-state-REVISION_2 -n istio-system
Expected output is similar to the following:
istiooperator.install.istio.io "installed-state-REVISION_2" deleted
Remove the CA secrets and restart the new control plane
Preserve secrets just in case you need them:
kubectl get secret/cacerts -n istio-system -o yaml > save_file_1
Remove the CA secrets in the cluster associated with the old CA:
kubectl delete secret cacerts -n istio-system --ignore-not-found
Restart the newly installed control plane. This makes sure the old root of trust is cleaned up from all workloads running in the mesh.
kubectl rollout restart deployment -n istio-system
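Because deleting the cacerts secret is hard to undo without the saved copy, you might want to tie the backup and the deletion together. The following sketch refuses to delete the secret when the backup file is empty. The function name backup_and_delete_cacerts and the KUBECTL override variable are hypothetical; the kubectl arguments are the ones shown in the preceding steps.

```shell
#!/bin/sh
# KUBECTL can be overridden for testing. This variable is an assumption
# of the sketch.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper. Usage: backup_and_delete_cacerts BACKUP_FILE
backup_and_delete_cacerts() {
  backup=$1
  # Save the secret first and stop if the export fails.
  "$KUBECTL" get secret/cacerts -n istio-system -o yaml > "$backup" || return 1
  # Refuse to delete when nothing was written to the backup file.
  if [ ! -s "$backup" ]; then
    echo "backup file $backup is empty; not deleting cacerts" >&2
    return 1
  fi
  "$KUBECTL" delete secret cacerts -n istio-system --ignore-not-found
}
```

For example, `backup_and_delete_cacerts save_file_1` saves the secret to save_file_1 and then deletes it. You still restart the control plane afterward, as described in the last step.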