# Manage GPU container workloads

You can enable and manage graphics processing unit (GPU) resources on your
containers. For example, you might prefer running artificial intelligence (AI)
and machine learning (ML) notebooks in a GPU environment. To run GPU container
workloads, you must have a Kubernetes cluster that supports GPU devices. GPU
support is enabled by default for Kubernetes clusters that have GPU machines
provisioned for them.
Before you begin
----------------

To deploy GPUs to your containers, you must have the following:

- A Kubernetes cluster with a GPU machine class. Check the
  supported GPU cards section for options on what you can configure for your
  cluster machines.

- The User Cluster Node Viewer role (`user-cluster-node-viewer`) to check GPUs,
  and the Namespace Admin role (`namespace-admin`) to deploy GPU workloads in
  your project namespace.

- The kubeconfig path for the zonal management API server that hosts your
  Kubernetes cluster. Sign in and generate the kubeconfig file if you don't
  have one.

- The kubeconfig path for the org infrastructure cluster in the zone intended
  to host your GPUs. Sign in and generate the kubeconfig file if you don't
  have one.

- The Kubernetes cluster name. Ask your Platform Administrator for this
  information if you don't have it.

- The Kubernetes cluster kubeconfig path. Sign in and generate the kubeconfig
  file if you don't have one. One way to keep these paths handy is shown in
  the sketch after this list.
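The commands in this guide take the kubeconfig paths as flags. As a minimal
convenience sketch, you can export them as shell variables first; the variable
names and file paths below are illustrative assumptions, not fixed names from
the product:

    # Hypothetical paths; substitute the kubeconfig files you generated.
    export ORG_INFRASTRUCTURE_CLUSTER=~/kubeconfigs/org-infra-cluster.yaml
    export KUBERNETES_CLUSTER_KUBECONFIG=~/kubeconfigs/my-gpu-cluster.yaml

    # Later commands can then reference the exported variables, for example:
    kubectl get nodes --kubeconfig "${KUBERNETES_CLUSTER_KUBECONFIG}"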
Configure a container to use GPU resources
------------------------------------------

To use these GPUs in a container, complete the following steps:
1. Verify your Kubernetes cluster has node pools that support GPUs:
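       kubectl describe nodepoolclaims -n KUBERNETES_CLUSTER_NAME \
           --kubeconfig ORG_INFRASTRUCTURE_CLUSTER

   Replace KUBERNETES_CLUSTER_NAME with your Kubernetes cluster name and
   ORG_INFRASTRUCTURE_CLUSTER with the kubeconfig path for the org
   infrastructure cluster.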
   The relevant output is similar to the following snippet:
       Spec:
         Machine Class Name: a2-ultragpu-1g-gdc
         Node Count:         2
   For a full list of supported GPU machine types and Multi-Instance GPU (MIG)
   profiles, see Cluster node machine types.
2. Add the `.containers.resources.requests` and `.containers.resources.limits`
   fields to your container spec. Each resource name is different depending on
   your machine class. Check your GPU resource allocation to find your GPU
   resource names.
   For example, the following container spec requests three partitions of a
   GPU from an `a2-ultragpu-1g-gdc` node:
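       ...
       containers:
       - name: my-container
         image: "my-image"
         resources:
           requests:
             nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
           limits:
             nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
       ...

   Note: You can request a maximum of seven GPU partitions per pod.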
3. Containers also require additional permissions to access GPUs. For each
   container that requests GPUs, add the following permissions to your
   container spec:
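       ...
       securityContext:
         seLinuxOptions:
           type: unconfined_t
       ...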
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eGPU support is enabled by default for Kubernetes clusters that have GPU machines provisioned, making it suitable for running workloads like AI and ML notebooks.\u003c/p\u003e\n"],["\u003cp\u003eDeploying GPUs to containers requires a Kubernetes cluster with a GPU machine class, along with specific roles such as User Cluster Node Viewer and Namespace Admin.\u003c/p\u003e\n"],["\u003cp\u003eTo configure a container for GPU use, users must verify that their Kubernetes cluster supports GPUs and add requests and limits fields to the container specification.\u003c/p\u003e\n"],["\u003cp\u003eEach container requiring GPU access must also include specific security permissions in their specification, ensuring they can properly interact with the GPU resources.\u003c/p\u003e\n"],["\u003cp\u003eUsers can check their GPU resource allocation by running a command, which will output information on GPU capacity and the resource names needed for configuration.\u003c/p\u003e\n"]]],[],null,["# Manage GPU container workloads\n\nYou can enable and manage graphics processing unit (GPU) resources on your\ncontainers. For example, you might prefer running artificial intelligence (AI)\nand machine learning (ML) notebooks in a GPU environment. To run GPU container\nworkloads, you must have a Kubernetes cluster that supports GPU devices. GPU support\nis enabled by default for Kubernetes clusters that have GPU machines provisioned for\nthem.\n\nBefore you begin\n----------------\n\nTo deploy GPUs to your containers, you must have the following:\n\n- A Kubernetes cluster with a GPU machine class. Check the\n [supported GPU cards](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/create-user-cluster#supported-gpu-cards)\n section for options on what you can configure for your cluster machines.\n\n- The User Cluster Node Viewer role (`user-cluster-node-viewer`) to check GPUs,\n and the Namespace Admin role (`namespace-admin`) to deploy GPU workloads in\n your project namespace.\n\n- The kubeconfig path for the zonal management API server that hosts your\n Kubernetes cluster.\n [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in) the\n kubeconfig file if you don't have one.\n\n- The kubeconfig path for the org infrastructure cluster in the zone intended to\n host your GPUs.\n [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in) the\n kubeconfig file if you don't have one.\n\n- The Kubernetes cluster name. Ask your Platform Administrator for this information if\n you don't have it.\n\n- The Kubernetes cluster kubeconfig path.\n [Sign in and generate](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/iam/sign-in) the\n kubeconfig file if you don't have one.\n\nConfigure a container to use GPU resources\n------------------------------------------\n\nTo use these GPUs in a container, complete the following steps:\n\n1. 
Verify your Kubernetes cluster has node pools that support GPUs:\n\n kubectl describe nodepoolclaims -n \u003cvar translate=\"no\"\u003eKUBERNETES_CLUSTER_NAME\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eORG_INFRASTRUCTURE_CLUSTER\u003c/var\u003e\n\n The relevant output is similar to the following snippet: \n\n Spec:\n Machine Class Name: a2-ultragpu-1g-gdc\n Node Count: 2\n\n For a full list of supported GPU machine types and Multi-Instance GPU (MIG)\n profiles, see\n [Cluster node machine types](/distributed-cloud/hosted/docs/latest/gdch/platform/pa-user/cluster-node-machines).\n2. Add the `.containers.resources.requests` and `.containers.resources.limits`\n fields to your container spec. Each resource name is different depending on\n your machine class.\n [Check your GPU resource allocation](#check-gpu-resource-allocation) to find\n your GPU resource names.\n\n For example, the following container spec requests three partitions of a GPU\n from an `a2-ultragpu-1g-gdc` node: \n\n ...\n containers:\n - name: my-container\n image: \"my-image\"\n resources:\n requests:\n nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3\n limits:\n nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3\n ...\n\n | **Note:** You can request a maximum of seven GPU partitions per pod.\n3. Containers also require additional permissions to access GPUs. For each\n container that requests GPUs, add the following permissions to your\n container spec:\n\n ...\n securityContext:\n seLinuxOptions:\n type: unconfined_t\n ...\n\n4. Apply your container manifest file:\n\n kubectl apply -f \u003cvar translate=\"no\"\u003eCONTAINER_MANIFEST_FILE\u003c/var\u003e \\\n -n \u003cvar translate=\"no\"\u003eNAMESPACE\u003c/var\u003e \\\n --kubeconfig \u003cvar translate=\"no\"\u003eKUBERNETES_CLUSTER_KUBECONFIG\u003c/var\u003e\n\nCheck GPU resource allocation\n-----------------------------\n\n- To check your GPU resource allocation, use the following command:\n\n kubectl describe nodes \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e\n\n Replace \u003cvar translate=\"no\"\u003eNODE_NAME\u003c/var\u003e with the node managing the GPUs\n you want to inspect.\n\n The relevant output is similar to the following snippet: \n\n Capacity:\n nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7\n Allocatable:\n nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 7\n\nNote the resource names for your GPUs; you must specify them when configuring\na container to use GPU resources."]]
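Putting the pieces together, the following is a minimal end-to-end sketch of a
pod manifest that combines the resource fields and security context from the
steps above. The pod name, namespace, and image are illustrative assumptions;
the resource name assumes the `a2-ultragpu-1g-gdc` machine class shown earlier:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-demo            # hypothetical pod name
      namespace: my-project     # hypothetical project namespace
    spec:
      containers:
      - name: my-container
        image: "my-image"       # hypothetical image
        resources:
          requests:
            nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
          limits:
            nvidia.com/mig-1g.10gb-NVIDIA_A100_80GB_PCIE: 3
        securityContext:
          seLinuxOptions:
            type: unconfined_t

You could then deploy a manifest like this with the `kubectl apply` command
from step 4.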