Meningkatkan performa pada GPU bersama menggunakan NVIDIA MPS
Tetap teratur dengan koleksi
Simpan dan kategorikan konten berdasarkan preferensi Anda.
Jika menjalankan beberapa proses SDK di GPU Dataflow bersama, Anda
dapat meningkatkan efisiensi dan pemanfaatan GPU dengan mengaktifkan Layanan Multi-Proses (MPS) NVIDIA. MPS mendukung pemrosesan serentak di GPU dengan memungkinkan proses
untuk berbagi konteks CUDA dan sumber daya penjadwalan. MPS dapat mengurangi
biaya pengalihan konteks, meningkatkan paralelisme, dan mengurangi persyaratan penyimpanan.
Alur kerja target adalah pipeline Python yang berjalan di pekerja dengan lebih dari satu vCPU.
MPS adalah teknologi NVIDIA yang mengimplementasikan CUDA API, platform NVIDIA
yang mendukung komputasi GPU tujuan umum. Untuk informasi selengkapnya, lihat
panduan pengguna Layanan Multiproses NVIDIA.
Manfaat
Meningkatkan pemrosesan paralel dan throughput secara keseluruhan untuk pipeline GPU,
terutama untuk workload dengan penggunaan resource GPU yang rendah.
Meningkatkan penggunaan GPU, yang dapat mengurangi biaya Anda.
Dukungan dan batasan
MPS hanya didukung pada pekerja Dataflow yang menggunakan satu
GPU.
Pipeline tidak dapat menggunakan opsi pipeline yang membatasi paralelisme.
Hindari melebihi memori GPU yang tersedia, terutama untuk kasus penggunaan yang
melibatkan pemuatan model machine learning yang besar. Seimbangkan jumlah vCPU dan proses SDK dengan memori GPU yang tersedia yang diperlukan proses ini.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-08-18 UTC."],[[["\u003cp\u003eNVIDIA Multi-Process Service (MPS) improves GPU efficiency and utilization when running multiple SDK processes on a shared Dataflow GPU by enabling concurrent processing and resource sharing.\u003c/p\u003e\n"],["\u003cp\u003eEnabling MPS enhances parallel processing and throughput for GPU pipelines, particularly for workloads with low GPU resource usage, potentially reducing overall costs.\u003c/p\u003e\n"],["\u003cp\u003eMPS is supported on Dataflow workers with a single GPU and requires specific pipeline configurations, including appending \u003ccode\u003euse_nvidia_mps\u003c/code\u003e to the \u003ccode\u003eworker_accelerator\u003c/code\u003e parameter with a count of 1 and avoiding the \u003ccode\u003e--experiments=no_use_multiple_sdk_containers\u003c/code\u003e option.\u003c/p\u003e\n"],["\u003cp\u003eWhen using TensorFlow with MPS, you must enable dynamic memory allocation on the GPU and use logical devices with memory limits to optimize performance.\u003c/p\u003e\n"],["\u003cp\u003eMPS is not compatible with Dataflow Prime.\u003c/p\u003e\n"]]],[],null,["# Improve performance on a shared GPU by using NVIDIA MPS\n\nIf you run multiple SDK processes on a shared Dataflow GPU, you\ncan improve GPU efficiency and utilization by enabling the NVIDIA Multi-Process\nService (MPS). MPS supports concurrent processing on a GPU by enabling processes\nto share CUDA contexts and scheduling resources. MPS can reduce\ncontext-switching costs, increase parallelism, and reduce storage requirements.\n\nTarget workflows are Python pipelines that run on workers with more than one\nvCPU.\n\nMPS is an NVIDIA technology that implements the CUDA API, an NVIDIA platform\nthat supports general-purpose GPU computing. For more information, see the\n[NVIDIA Multi-Process Service user guide](https://docs.nvidia.com/deploy/mps/index.html).\n\nBenefits\n--------\n\n- Improves parallel processing and overall throughput for GPU pipelines, especially for workloads with low GPU resource usage.\n- Improves GPU utilization, which might reduce your costs.\n\nSupport and limitations\n-----------------------\n\n- MPS is supported only on Dataflow workers that use a single GPU.\n- The pipeline can't use pipeline options that restrict parallelism.\n- Avoid exceeding the available GPU memory, especially for use cases that involve loading large machine learning models. Balance the number of vCPUs and SDK processes with the available GPU memory that these processes need.\n- MPS doesn't affect the concurrency of non-GPU operations.\n- Dataflow Prime doesn't support MPS.\n\nEnable MPS\n----------\n\nWhen you [run a pipeline with GPUs](/dataflow/docs/gpu/use-gpus), enable MPS by\ndoing the following:\n\n- In the pipeline option `--dataflow_service_options`, append `use_nvidia_mps` to the `worker_accelerator` parameter.\n- Set the `count` to 1.\n- Don't use the pipeline option `--experiments=no_use_multiple_sdk_containers`.\n\nThe pipeline option `--dataflow_service_options` looks like the following: \n\n --dataflow_service_options=\"worker_accelerator=type:\u003cvar translate=\"no\"\u003eGPU_TYPE\u003c/var\u003e;count:1;install-nvidia-driver;use_nvidia_mps\"\n\nIf you use TensorFlow and enable MPS, do the following:\n\n1. [Enable dynamic memory allocation](https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth) on the GPU. Use either of the following TensorFlow options:\n - Turn on memory growth by calling `tf.config.experimental.set_memory_growth(gpu, True)`.\n - Set the environmental variable `TF_FORCE_GPU_ALLOW_GROWTH` to true.\n2. Use logical devices with appropriate memory limits.\n3. For optimal performance, enforce the use of the GPU when possible by using [soft device placement](https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement) or [manual placement](https://www.tensorflow.org/guide/gpu#manual_device_placement).\n\nWhat's next\n-----------\n\n- To review more best practices, see [GPUs and worker parallelism](/dataflow/docs/gpu/develop-with-gpus#parallelism)."]]