Migrate containers to Google Cloud: Migrate Kubernetes to GKE

Last reviewed 2023-05-22 UTC

This document helps you plan, design, and implement your migration from a self-managed Kubernetes environment to Google Kubernetes Engine (GKE). Moving apps from one environment to another is a challenging task, so you need to plan and execute your migration carefully.

This document is part of a multi-part series about migrating to Google Cloud. If you're interested in an overview of the series, see Migrate to Google Cloud: Choose your migration path.

This document is part of a series that discusses migrating containers to Google Cloud.

This document is useful if you're planning to migrate from a self-managed Kubernetes environment to GKE. Your environment might be running in an on-premises environment, in a private hosting environment, or in another cloud provider. This document is also useful if you're evaluating the opportunity to migrate and want to explore what it might look like.

By using GKE, you get the benefits of a managed Kubernetes service, such as a Google-managed control plane, automated cluster upgrades, and integration with other Google Cloud services.

The following diagram illustrates the path of your migration journey.

Diagram: Migration path with four phases.

During each migration step, you follow the phases defined in Migrate to Google Cloud: Get started:

  1. Assessing and discovering your workloads.
  2. Planning and building a foundation.
  3. Deploying your workloads.
  4. Optimizing your environment.

Assess your environment

In the assessment phase, you determine the requirements and dependencies to migrate your self-managed Kubernetes environment to GKE:

  1. Build a comprehensive inventory of your apps.
  2. Catalog your apps according to their properties and dependencies.
  3. Train and educate your teams on Google Cloud.
  4. Build an experiment and proof of concept on Google Cloud.
  5. Calculate the total cost of ownership (TCO) of the target environment.
  6. Choose the workloads that you want to migrate first.

The following sections rely on Migrate to Google Cloud: Assess and discover your workloads.

Build your inventories

To scope your migration, you must understand your current Kubernetes environment. You first gather information about your clusters, and then you focus on your workloads deployed in those clusters and the workloads' dependencies. At the end of the assessment phase, you have two inventories: one for your clusters, and one for the workloads deployed in those clusters.

To build the inventory of your clusters, consider the following for each cluster (a sketch of inventory commands follows the list):

  • Number and type of nodes. When you know how many nodes you have and the characteristics of each node in your current environment, you can size your clusters when you move to GKE. The nodes in your new environment might run on a different hardware architecture generation than the ones in your current environment, and because the performance of each architecture generation differs, the number of nodes you need in the new environment might differ too. Evaluate any specialized hardware that you're using in your nodes, such as high-performance storage devices, GPUs, and TPUs.
  • Internal or external cluster. Evaluate which actors, either internal or external to your environment, each cluster is exposed to. To support your use cases, this evaluation includes the workloads running in the cluster and the interfaces that interact with your clusters.
  • Multi-tenancy. If you're managing multi-tenant clusters in your environment, assess whether your multi-tenancy setup works in your new Google Cloud environment. Now is a good time to evaluate how to improve your multi-tenant clusters, because your multi-tenancy strategy influences how you build your foundation on Google Cloud.
  • Kubernetes version. Gather information about the Kubernetes version of your clusters to assess whether there is a mismatch between those versions and the ones available in GKE. If you're running an older or a recently released version, you might be using features that are unavailable in GKE, either because they're deprecated or because the Kubernetes version that ships them isn't available in GKE yet.
  • Kubernetes upgrade cycle. To maintain a reliable environment, understand how you're handling Kubernetes upgrades and how your upgrade cycle relates to GKE upgrades.
  • Node pools. If you're using any form of node grouping, you might want to consider how these groupings map to the concept of node pools in GKE because your grouping criteria might not be suitable for GKE.
  • Node initialization. Assess how you initialize each node before marking it as available to run your workloads so you can port those initialization procedures over to GKE.
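
For example, if you have kubectl access to your clusters, you can gather much of this node and version information with commands like the following. This is a minimal sketch that assumes kubectl is already configured with a context for each cluster:

```bash
# List every node with its IP addresses, OS image, and kernel version.
kubectl get nodes -o wide

# Show CPU architecture and instance type labels to compare hardware
# generations with the machine types you plan to use on GKE.
kubectl get nodes -L kubernetes.io/arch -L node.kubernetes.io/instance-type

# Record the Kubernetes version of the control plane and of each node.
kubectl version
kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion
```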

The following inventory items focus on the security of your infrastructure and Kubernetes clusters (a sketch of commands to enumerate them follows the list):

  • Namespaces. If you use Kubernetes Namespaces in your clusters to logically separate resources, assess which resources are in each Namespace, and understand why you created this separation. For example, you might be using Namespaces as part of your multi-tenancy strategy. You might also have workloads deployed in Namespaces that are reserved for Kubernetes system components, over which you might not have as much control in GKE.
  • Role-based access control (RBAC). If you use RBAC authorization in your clusters, list and describe all ClusterRoles and ClusterRoleBindings that you configured in your clusters.
  • Network policies. List all network policies that you configured in your clusters, and understand how network policies work in GKE.
  • Pod security contexts. Capture information about the Pod security contexts that you configured in your clusters and learn how they work in GKE.
  • Service accounts. If any process in your cluster is interacting with the Kubernetes API server, capture information about the service accounts that they're using.
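
For example, you can enumerate most of these security objects with kubectl. This sketch assumes kubectl access to each cluster:

```bash
# Catalog cluster-wide RBAC objects.
kubectl get clusterroles,clusterrolebindings

# Catalog namespaced security objects across all Namespaces.
kubectl get namespaces
kubectl get rolebindings,networkpolicies,serviceaccounts --all-namespaces
```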

After you complete the Kubernetes clusters inventory and assess the security of your environment, build the inventory of the workloads deployed in those clusters. When evaluating your workloads, gather information about the following aspects (sample inventory commands follow the list):

  • Pods and controllers. To size the clusters in your new environment, assess how many instances of each workload you have deployed, and whether you're using resource quotas and compute resource consumption limits. Gather information about the workloads that are running on the control plane nodes of each cluster, and about the controllers that each workload uses. For example, how many Deployments are you using? How many DaemonSets are you using?
  • Jobs and CronJobs. Your clusters and workloads might need to run Jobs or CronJobs as part of their initialization or operation procedures. Assess how many instances of Jobs and CronJobs you have deployed, and the responsibilities and completion criteria for each instance.
  • Kubernetes Autoscalers. To migrate your autoscaling policies in the new environment, learn how the Horizontal Pod Autoscaler, the Vertical Pod Autoscaler, and the Multidimensional Pod Autoscaler work on GKE.
  • Stateless and stateful workloads. Stateless workloads don't store data or state in the cluster or in persistent storage. Stateful applications save data for later use. For each workload, assess which components are stateless and which are stateful, because migrating stateful workloads is typically harder than migrating stateless ones.
  • Kubernetes features. From the cluster inventory, you know which Kubernetes version each cluster runs. Review the release notes of each Kubernetes version to know which features it ships and which features it deprecates. Then assess your workloads against the Kubernetes features that you need. The goal of this task is to know whether you're using deprecated features or features that are not yet available in GKE. If you find any unavailable features, migrate away from deprecated features and adopt the new ones when they're available in GKE.
  • Storage. For stateful workloads, assess whether they use PersistentVolumeClaims. List any storage requirements, such as size and access mode, and how these PersistentVolumeClaims map to PersistentVolumes. To account for future growth, assess whether you need to expand any PersistentVolumeClaim.
  • Configuration and secret injection. To avoid rebuilding your deployable artifacts every time there is a change in the configuration of your environment, inject configuration and secrets into Pods using ConfigMaps and Secrets. For each workload, assess which ConfigMaps and Secrets that workload is using, and how you're populating those objects.
  • Dependencies. Your workloads probably don't work in isolation. They might have dependencies, either internal to the cluster or on external systems. For each workload, capture its dependencies, and note whether your workloads have any tolerance for the dependencies being unavailable. For example, common dependencies include distributed file systems, databases, secret distribution platforms, identity and access management systems, service discovery mechanisms, and any other external systems.
  • Kubernetes Services. To expose your workloads to internal and external clients, use Services. For each Service, you need to know its type. For each externally exposed Service, assess how it interacts with the rest of your infrastructure. For example, how is your infrastructure supporting LoadBalancer Services and Ingress objects? Which Ingress controllers did you deploy in your clusters?
  • Service mesh. If you're using a service mesh in your environment, assess how it's configured. You also need to know how many clusters it spans, which services are part of the mesh, and how you modify the topology of the mesh. For example, are you using automatic sidecar injection to add sidecars to Kubernetes Pods?
  • Taints and tolerations and affinity and anti-affinity. For each Pod and Node, assess if you configured any Node taints, Pod tolerations, or affinities to customize the scheduling of Pods in your Kubernetes clusters. These properties might also give you insights about possible non-homogeneous Node or Pod configurations, and might mean that either the Pods, the Nodes, or both need to be assessed with special focus and care. For example, if you configured a particular set of Pods to be scheduled only on certain Nodes in your Kubernetes cluster, it might mean that the Pods need specialized resources that are available only on those Nodes.
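
For example, the following commands sketch how to enumerate controllers, Jobs, autoscalers, and Services across all Namespaces, again assuming kubectl access to each cluster:

```bash
# Count the controllers that manage your workloads.
kubectl get deployments,daemonsets,statefulsets --all-namespaces

# Capture Jobs, CronJobs, and autoscaling policies.
kubectl get jobs,cronjobs,horizontalpodautoscalers --all-namespaces

# List Services and Ingress objects to map how workloads are exposed.
kubectl get services,ingresses --all-namespaces

# Map Pods to the Nodes they're scheduled on, to spot affinity- and
# taint-driven placement patterns.
kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,POD:.metadata.name,NODE:.spec.nodeName
```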

After you assess your clusters and their workloads, evaluate the rest of the supporting services and aspects in your infrastructure, such as the following (sample commands follow the list):

  • StorageClasses and PersistentVolumes. Assess how your infrastructure is backing PersistentVolumeClaims by listing StorageClasses for dynamic provisioning, and statically provisioned PersistentVolumes. For each PersistentVolume, consider the following: capacity, volume mode, access mode, class, reclaim policy, mount options, and node affinity.
  • VolumeSnapshots and VolumeSnapshotContents. For each PersistentVolume, assess if you configured any VolumeSnapshot, and if you need to migrate any existing VolumeSnapshotContents.
  • Container Storage Interface (CSI) drivers. If you deployed CSI drivers in your clusters, assess whether they're compatible with GKE, and whether you need to adapt the configuration of your volumes to work with CSI drivers that are compatible with GKE.
  • Data storage. If you depend on external systems to provision PersistentVolumes, provide a way for the workloads in your GKE environment to use those systems. Data locality has an impact on the performance of stateful workloads, because the latency between your external systems and your GKE environment is proportional to the distance between them. For each external data storage system, consider its type, such as block volumes, file storage, or object storage, and any performance and availability requirements that it needs to satisfy.
  • Logging, monitoring, and tracing. Capture information about your monitoring, logging, and tracing systems. You can integrate your systems with Google Cloud Observability, or you can use Google Cloud Observability as your only monitoring, logging, and tracing tool. For example, you can integrate Google Cloud Observability with other services, set up logging interfaces for your preferred programming languages, and use the Cloud Logging agent on your VMs. GKE integrates with Google Cloud Observability and Cloud Audit Logs. You can also customize Cloud Logging logs for GKE with Fluentd and then process logs at scale using Dataflow.
  • Custom resources and Kubernetes add-ons. Collect information about any custom Kubernetes resources and any Kubernetes add-ons that you might have deployed in your clusters, because they might not work in GKE, or you might need to modify them. For example, if a custom resource interacts with an external system, you assess if that's applicable to your Google Cloud environment.
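
For example, you can list the storage and custom resources in a cluster with commands like the following. This sketch assumes that the snapshot APIs are installed; if they aren't, the corresponding command reports that the resource type doesn't exist:

```bash
# List StorageClasses and the provisioners that back them.
kubectl get storageclasses

# Inspect PersistentVolumes and PersistentVolumeClaims.
kubectl get persistentvolumes
kubectl get persistentvolumeclaims --all-namespaces

# List snapshot objects and installed CSI drivers.
kubectl get volumesnapshots --all-namespaces
kubectl get csidrivers

# List custom resource definitions deployed in the cluster.
kubectl get customresourcedefinitions
```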

Complete the assessment

After building the inventories related to your Kubernetes clusters and workloads, complete the rest of the activities of the assessment phase in Migrate to Google Cloud: Assess and discover your workloads.

Plan and build your foundation

In the planning and building phase, you provision and configure the cloud infrastructure and services that support your workloads on Google Cloud:

  1. Build a resource hierarchy.
  2. Configure identity and access management.
  3. Set up billing.
  4. Set up network connectivity.
  5. Harden your security.
  6. Set up monitoring and alerting.

If you've already adopted infrastructure-as-code practices to manage the workloads in your Kubernetes environment, you can apply the same process to your Google Cloud environment. Analyze your Kubernetes descriptors, because some Google Cloud resources that GKE automatically provisions for you are configurable by using Kubernetes labels and annotations. For example, you can provision an internal load balancer instead of an external one by adding an annotation to a LoadBalancer Service.
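
The following sketch shows what that annotation might look like; the Service name, selector, and ports are hypothetical placeholders:

```bash
# The annotation tells GKE to provision an internal load balancer
# instead of an external one for this LoadBalancer Service.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-service                  # hypothetical name
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: my-app                     # hypothetical Pod label
  ports:
  - port: 80
    targetPort: 8080
EOF
```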

The following sections rely on Migrate to Google Cloud: Build your foundation.

Build a resource hierarchy

To design an efficient resource hierarchy, consider how your business and organizational structures map to Google Cloud as detailed in Migrate to Google Cloud: Build your foundation.

For example, if you need a multi-tenant environment on GKE, you can choose between the following options:

  • Creating one Google Cloud project for each tenant.
  • Sharing one project among different tenants, and provisioning multiple GKE clusters.
  • Using Kubernetes Namespaces.

Your choice depends on your isolation, complexity, and scalability needs. For example, having one project per tenant isolates the tenants from one another, but the resource hierarchy becomes more complex to manage due to the high number of projects. Although managing Kubernetes Namespaces is easier than managing a complex resource hierarchy, this option doesn't guarantee as much isolation. For example, the control plane might be shared between tenants.
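
To illustrate the Namespace option, the following sketch creates a tenant Namespace with a ResourceQuota that bounds the tenant's resource consumption; the names and limits are hypothetical:

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a                    # hypothetical tenant
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:                             # hypothetical limits
    requests.cpu: "10"
    requests.memory: 20Gi
    pods: "50"
EOF
```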

Configure identity and access management

Identity and Access Management provides the tools to centrally configure fine-grained access control to cloud resources. For more information, see Identity and Access Management.

Review how Kubernetes RBAC interacts with identity and access management in Google Cloud, and configure RBAC according to the requirements that you gathered in the assessment phase.
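
For example, on GKE, Kubernetes RBAC subjects can reference Google identities directly. The following sketch grants the built-in view ClusterRole to a hypothetical Google Account in a single Namespace:

```bash
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-view              # hypothetical name
  namespace: tenant-a               # hypothetical Namespace
subjects:
- kind: User
  name: developer@example.com       # a Google Account email
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                        # built-in read-only role
  apiGroup: rbac.authorization.k8s.io
EOF
```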

Set up billing

Before provisioning any Google Cloud resources, configure Cloud Billing and understand the GKE pricing model. For more information, see billing.

Set up network connectivity

Network configuration is a fundamental aspect of your environment. Assess the GKE network model and the connectivity requirements of your workloads. Then, you can start planning your network configuration. For more information, see connectivity and networking.
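
For example, a VPC-native GKE cluster attached to an existing VPC network might be created as follows; the cluster, network, subnet, and region names are hypothetical:

```bash
# A minimal sketch: create a VPC-native (alias IP) cluster on an
# existing network and subnet.
gcloud container clusters create my-cluster \
    --region=us-central1 \
    --network=my-vpc \
    --subnetwork=my-subnet \
    --enable-ip-alias
```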

Harden your security

Understanding the differences between your environment's security model and Google Cloud's model, and knowing how to harden the security of your GKE clusters, is crucial to protecting your critical assets. For more information, see security.

Set up monitoring and alerting

Having a clear picture of how your infrastructure and workloads are performing is key to finding areas of improvement. GKE has deep integrations with Google Cloud Observability, so you get logging and monitoring information about your GKE clusters and the workloads inside those clusters. For more information, see monitoring and alerting.