Create a cluster to run container workloads

Create a Kubernetes cluster so that you can deploy container workloads to it.

Before you begin

To get the permissions needed to create a Kubernetes cluster, ask your Organization IAM Admin to grant you the User Cluster Admin role (user-cluster-admin). This role is not bound to a namespace.
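
If your Organization IAM Admin grants the role with kubectl, the binding might look like the following sketch. This assumes the user-cluster-admin role is exposed as a ClusterRole on the org admin cluster, consistent with the role not being namespace-bound; USERNAME is a placeholder:

# Hypothetical sketch: bind the user-cluster-admin ClusterRole to a user.
# The exact resource names in your GDC instance may differ.
kubectl create clusterrolebinding USERNAME-user-cluster-admin \
    --clusterrole=user-cluster-admin \
    --user=USERNAME \
    --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG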

Google Distributed Cloud (GDC) air-gapped has the following limits for Kubernetes clusters:

  • 16 clusters per organization
  • 42 worker nodes per cluster, and a minimum of three worker nodes
  • 4620 pods per cluster
  • 110 pods per node

Configure the pod CIDR block

The cluster follows this logic when allocating IP addresses:

  • Kubernetes assigns a /24 CIDR block of 256 addresses to each node. This size accommodates the default maximum of 110 pods per node for user clusters.
  • The size of the CIDR block assigned to a node depends on the maximum pods per node value.
  • The block always contains at least twice as many addresses as the maximum number of pods per node.

See the following example to understand how the default value of Per node mask size = /24 was calculated to accommodate 110 pods:

Maximum pods per node = 110
Total number of IP addresses required = 2 * 110 = 220

Per node mask size = /24
Number of IP addresses in a /24 = 2^(32 - 24) = 256

Determine the required pod CIDR mask to be configured for the user cluster based on the required number of nodes. Plan for future node additions to the cluster while configuring the CIDR range:

  Total number of nodes supported = 2^(Per node mask size - pod CIDR mask)

Given the default Per node mask size of /24, refer to the following table, which maps the pod CIDR mask to the number of nodes supported.

Pod CIDR mask    Calculation: 2^(Per node mask size - CIDR mask)    Maximum number of nodes supported, including control plane nodes
/21              2^(24 - 21)                                        8
/20              2^(24 - 20)                                        16
/19              2^(24 - 19)                                        32
/18              2^(24 - 18)                                        64
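
To sanity-check a planned pod CIDR, you can evaluate the same formulas with shell arithmetic. This is plain arithmetic, not a GDC command:

# Nodes supported by a /21 pod CIDR with the default /24 per node mask:
echo $(( 2 ** (24 - 21) ))   # 8
# Addresses available per node behind a /24 per node mask (>= 2 * 110 pods):
echo $(( 2 ** (32 - 24) ))   # 256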

Create a Kubernetes cluster

Complete the following steps to create a Kubernetes cluster:

Console

  1. In the navigation menu, select Kubernetes Engine > Clusters.

  2. Click Create Cluster.

  3. In the Name field, specify a name for the cluster.

  4. Select the Kubernetes version for the cluster.

  5. Click Attach Project and select an existing project to attach to your cluster. Then click Save. After you create the cluster, you can attach or detach projects from the project details page. You must attach a project to your cluster before deploying container workloads to it.

  6. Click Next.

  7. Configure the network settings for your cluster. You can't change these network settings after you create the cluster. Kubernetes clusters support only Internet Protocol version 4 (IPv4).

    1. If you want to create dedicated load balancer nodes, enter the number of nodes to create. By default, no dedicated load balancer nodes are created, and load balancer traffic runs through the control plane nodes.

    2. Select the Service CIDR (Classless Inter-Domain Routing) to use. Your deployed services, such as load balancers, are allocated IP addresses from this range.

    3. Select the Pod CIDR to use. The cluster allocates IP addresses from this range to your pods and VMs.

    4. Click Next.

  8. Review the details of the auto-generated default node pool for the cluster. Click Edit to modify the default node pool.

  9. To create additional node pools, select Add node pool. When editing the default node pool or adding a new node pool, you can customize it with the following options:

    1. Assign a name for the node pool. You cannot modify the name after you create the node pool.
    2. Specify the number of worker nodes to create in the node pool.
    3. Select the machine class that best suits your workload requirements. The list shows the following settings for each machine class:

      • Machine type
      • CPU
      • Memory
    4. Click Save.

  10. Click Create to create the cluster.

API

To create a new cluster using the API directly, apply a custom resource to your GDC instance:

  1. Create a Cluster custom resource and save it as a YAML file, such as cluster.yaml:

    apiVersion: cluster.gdc.goog/v1
    kind: Cluster
    metadata:
      name: CLUSTER_NAME
      namespace: platform
    spec:
      clusterNetwork:
        podCIDRSize: POD_CIDR
        serviceCIDRSize: SERVICE_CIDR
      initialVersion:
        kubernetesVersion: KUBERNETES_VERSION
      loadBalancer:
        ingressServiceIPSize: LOAD_BALANCER_POOL_SIZE
      nodePools:
      - machineTypeName: MACHINE_TYPE
        name: NODE_POOL_NAME
        nodeCount: NUMBER_OF_WORKER_NODES
        taints: TAINTS
        labels: LABELS
        acceleratorOptions:
          gpuPartitionScheme: GPU_PARTITION_SCHEME
      releaseChannel:
        channel: UNSPECIFIED
    

    Replace the following:

    • CLUSTER_NAME: The name of the cluster. The cluster name must not end with -system. The -system suffix is reserved for clusters created by GDC.
    • POD_CIDR: The size of the network ranges from which pod virtual IP addresses are allocated. If unset, a default value of 21 is used.
    • SERVICE_CIDR: The size of the network ranges from which service virtual IP addresses are allocated. If unset, a default value of 23 is used.
    • KUBERNETES_VERSION: The Kubernetes version of the cluster, such as 1.26.5-gke.2100. To list the available Kubernetes versions to configure, see List available Kubernetes versions for a cluster.
    • LOAD_BALANCER_POOL_SIZE: The size of the non-overlapping IP address pools used by load balancer services. If unset, a default value of 20 is used.
    • MACHINE_TYPE: The machine type for the worker nodes of the node pool. See the available machine types for what you can configure.
    • NODE_POOL_NAME: The name of the node pool.
    • NUMBER_OF_WORKER_NODES: The number of worker nodes to provision in the node pool.
    • TAINTS: The taints to apply to the nodes of this node pool. This is an optional field.
    • LABELS: The labels to apply to the nodes of this node pool. It contains a list of key-value pairs. This is an optional field.
    • GPU_PARTITION_SCHEME: The GPU partitioning scheme, if you're running GPU workloads. For example, mixed-2. The GPU is not partitioned if this field is not set. For available Multi-Instance GPU (MIG) profiles, see Supported MIG profiles.
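
    The following is a hypothetical filled-in manifest that uses the documented defaults (pod CIDR size 21, service CIDR size 23, load balancer pool size 20) and the minimum of three worker nodes. The cluster name, node pool name, machine type, label, and taint values are illustrative only, and acceleratorOptions is omitted because this example provisions no GPUs. The taint entries are assumed to follow the standard Kubernetes taint schema:

    apiVersion: cluster.gdc.goog/v1
    kind: Cluster
    metadata:
      name: sandbox                # illustrative name; must not end with -system
      namespace: platform
    spec:
      clusterNetwork:
        podCIDRSize: 21            # /21 supports up to 8 nodes with the default /24 per node mask
        serviceCIDRSize: 23
      initialVersion:
        kubernetesVersion: 1.26.5-gke.2100
      loadBalancer:
        ingressServiceIPSize: 20
      nodePools:
      - machineTypeName: n2-standard-4-gdc   # hypothetical; list the available machine types first
        name: worker-pool
        nodeCount: 3
        labels:                    # optional key-value pairs applied to each node
          env: sandbox
        taints:                    # optional; assumed to mirror the Kubernetes taint schema
        - key: dedicated
          value: sandbox
          effect: NoSchedule
      releaseChannel:
        channel: UNSPECIFIED
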
  2. Apply the custom resource to your GDC instance:

    kubectl apply -f cluster.yaml --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG
    

    Replace ORG_ADMIN_CLUSTER_KUBECONFIG with the org admin cluster's kubeconfig path.
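
    To verify that the cluster resource was created, you can query the same custom resource. The clusters.cluster.gdc.goog resource name is inferred from the manifest's apiVersion and kind:

    kubectl get clusters.cluster.gdc.goog -n platform \
        --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG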

Terraform

  1. In a Terraform configuration file, insert the following code snippet:

    provider "kubernetes" {
      config_path = "ORG_ADMIN_CLUSTER_KUBECONFIG"
    }
    
    resource "kubernetes_manifest" "cluster-create" {
      manifest = {
        "apiVersion" = "cluster.gdc.goog/v1"
        "kind" = "Cluster"
        "metadata" = {
          "name" = "CLUSTER_NAME"
          "namespace" = "platform"
        }
        "spec" = {
          "clusterNetwork" = {
            "podCIDRSize" = "POD_CIDR"
            "serviceCIDRSize" = "SERVICE_CIDR"
          }
          "initialVersion" = {
            "kubernetesVersion" = "KUBERNETES_VERSION"
          }
          "loadBalancer" = {
            "ingressServiceIPSize" = "LOAD_BALANCER_POOL_SIZE"
          }
          "nodePools" = [{
            "machineTypeName" = "MACHINE_TYPE"
            "name" = "NODE_POOL_NAME"
            "nodeCount" = "NUMBER_OF_WORKER_NODES"
            "taints" = "TAINTS"
            "labels" = "LABELS"
            "acceleratorOptions" = {
              "gpuPartitionScheme" = "GPU_PARTITION_SCHEME"
            }
          }]
          "releaseChannel" = {
            "channel" = "UNSPECIFIED"
          }
        }
      }
    }
    

    Replace the following:

    • ORG_ADMIN_CLUSTER_KUBECONFIG: The org admin cluster's kubeconfig path.
    • CLUSTER_NAME: The name of the cluster. The cluster name must not end with -system. The -system suffix is reserved for clusters created by GDC.
    • POD_CIDR: The size of the network ranges from which pod virtual IP addresses are allocated. If unset, a default value of 21 is used.
    • SERVICE_CIDR: The size of the network ranges from which service virtual IP addresses are allocated. If unset, a default value of 23 is used.
    • KUBERNETES_VERSION: The Kubernetes version of the cluster, such as 1.26.5-gke.2100. To list the available Kubernetes versions to configure, see List available Kubernetes versions for a cluster.
    • LOAD_BALANCER_POOL_SIZE: The size of the non-overlapping IP address pools used by load balancer services. If unset, a default value of 20 is used.
    • MACHINE_TYPE: The machine type for the worker nodes of the node pool. See the available machine types for what you can configure.
    • NODE_POOL_NAME: The name of the node pool.
    • NUMBER_OF_WORKER_NODES: The number of worker nodes to provision in the node pool.
    • TAINTS: The taints to apply to the nodes of this node pool. This is an optional field.
    • LABELS: The labels to apply to the nodes of this node pool. It contains a list of key-value pairs. This is an optional field.
    • GPU_PARTITION_SCHEME: The GPU partitioning scheme, if you're running GPU workloads. For example, mixed-2. The GPU is not partitioned if this field is not set. For available Multi-Instance GPU (MIG) profiles, see Supported MIG profiles.
  2. Apply the configuration to create the new Kubernetes cluster:

    terraform apply
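
    If the working directory is new, run terraform init first so Terraform installs the kubernetes provider, and use terraform plan to preview the Cluster manifest before applying. Both are standard Terraform commands, not GDC-specific:

    terraform init    # installs the kubernetes provider on first use
    terraform plan    # previews the Cluster manifest to be created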
    

List available Kubernetes versions for a cluster

You can list the available Kubernetes versions in your GDC instance using the kubectl CLI:

kubectl get userclustermetadata.upgrade.private.gdc.goog \
    -o=custom-columns=K8S-VERSION:.spec.kubernetesVersion \
    --kubeconfig ORG_ADMIN_CLUSTER_KUBECONFIG

Replace ORG_ADMIN_CLUSTER_KUBECONFIG with the org admin cluster's kubeconfig path.

The output looks similar to the following:

K8S-VERSION
1.25.10-gke.2100
1.26.5-gke.2100
1.27.4-gke.500

Support GPU workloads in a cluster

Distributed Cloud provides NVIDIA GPU support for Kubernetes clusters, letting you run GPU devices as user workloads. For example, you might prefer to run artificial intelligence (AI) and machine learning (ML) notebooks in a GPU environment. Ensure that your cluster supports GPU devices before using AI and ML notebooks. GPU support is enabled by default for clusters that have GPU machines provisioned.

You can create clusters by using the GDC console or the API directly. Ensure that you provision GPU machines for your cluster to support GPU workloads on its associated containers. For more information, see Create a Kubernetes cluster.

GPUs are statically allocated. The first four GPUs are always dedicated to workloads such as pretrained AI and ML APIs. These GPUs don't run on a Kubernetes cluster. The remaining GPUs are available to Kubernetes clusters. AI and ML notebooks run on Kubernetes clusters.

Be sure to allocate GPU machines to the correct cluster types so that components such as AI and ML APIs and notebooks can be used.
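
The following is a minimal sketch of a pod that consumes one GPU, assuming the cluster exposes GPUs through the standard NVIDIA device plugin resource name (nvidia.com/gpu). The pod name and container image are illustrative; in an air-gapped environment, use an image mirrored into your local registry:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test             # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative; mirror into your air-gapped registry
    command: ["nvidia-smi"]        # lists visible GPUs to verify GPU scheduling works
    resources:
      limits:
        nvidia.com/gpu: 1          # assumes the standard NVIDIA device plugin resource name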