This document explains the behavior, billing, and limitations of the performance monitoring unit (PMU) in Compute Engine. To learn how to enable the PMU in a
C4A or C4 virtual machine (VM) instance, see
Enable the PMU in VMs.
The PMU is a hardware component within the CPU core that monitors how the
processor runs code. By enabling the PMU in a C4A or C4 VM, you can access the performance counters in the PMU using performance-monitoring software. This
approach lets you optimize performance-sensitive workloads, such as
high performance computing (HPC) or machine learning (ML) workloads, by helping you
identify and address performance bottlenecks in your applications.
How the PMU works
The PMU is composed of a set of hardware counters called performance monitoring
counters (PMCs). These counters are
model-specific registers
that increment each time a low-level processor event, such as a branch misprediction
or a cache miss, occurs within the CPU. You can read and configure the PMCs in the PMU
by using performance-monitoring software such as
Intel VTune Profiler.
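As a rough illustration of what "performance-monitoring software" does with these counters, the sketch below builds a command line for the Linux perf tool that counts the two example events mentioned above. Using perf at all is an assumption (the source names only Intel VTune Profiler), the helper names build_perf_stat_command and run_perf_stat are hypothetical, and whether a given event can actually be counted depends on the PMU type enabled on the VM.

```python
import shlex
import shutil
import subprocess

# Generic Linux perf event names for the two examples in the text.
# Assumption: perf is the monitoring tool; the source does not mention it.
DEFAULT_EVENTS = ("branch-misses", "cache-misses")


def build_perf_stat_command(workload, events=DEFAULT_EVENTS):
    """Return the argv list that runs `workload` under `perf stat`."""
    if not workload:
        raise ValueError("workload must contain at least one argument")
    if not events:
        raise ValueError("at least one event is required")
    # -e takes a comma-separated event list; "--" separates perf's own
    # options from the command being measured.
    return ["perf", "stat", "-e", ",".join(events), "--", *workload]


def run_perf_stat(workload, events=DEFAULT_EVENTS):
    """Run the workload under perf stat and return perf's text report."""
    if shutil.which("perf") is None:
        raise RuntimeError("perf is not installed on this machine")
    result = subprocess.run(
        build_perf_stat_command(workload, events),
        capture_output=True,
        text=True,
        check=False,
    )
    # perf stat writes its counter summary to stderr, not stdout.
    return result.stderr


if __name__ == "__main__":
    # Only print the command; running it requires perf and an enabled PMU.
    print(shlex.join(build_perf_stat_command(["sleep", "1"])))
```

Separating command construction from execution keeps the first function usable on machines where perf is not installed.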
By default, the PMU is disabled in VMs. To enable it, specify the types of
low-level CPU events to track by enabling one of the following PMU types:
Architectural (ARCHITECTURAL): You can measure the following architectural performance events:
Branch instructions retired: The number of branch instructions
retired. Use this event to measure your code's execution and identify
potential performance bottlenecks.
Branch misses retired: The number of branch instructions that were
mispredicted, causing the processor to stall and discard the fetched
instructions. A high count for this event suggests that there is room to
optimize CPU performance.
Instructions retired: The number of instructions that the CPU
successfully processes. Use this event to measure the CPU's instruction
throughput.
Top down slots: The number of available slots within a processor's
pipeline that are used to execute instructions simultaneously. Use this
event to understand how efficiently your code uses the processor's
resources.
Unhalted core cycles: The number of core cycles during which the thread
is not halted, for example by power management or while waiting for an interrupt. Use
this event to evaluate the overall usage of the processor.
Unhalted reference cycles: The number of reference cycles during which
the core is not halted, for example, when fetching data or
instructions. The core is halted when it runs the
HLT or MWAIT instructions.
Reference cycles operate at a fixed frequency, which provides a stable time
reference even when the processor speed changes to save
energy. Use this event to measure the time spent on a task and identify
performance bottlenecks in your code.
Standard (STANDARD): You can measure all events from the Architectural
PMU type and any local events inside the CPU core, including level 2 (L2)
cache events.
Enhanced (ENHANCED): You can measure all events from the Standard PMU type and any local events outside the CPU core, including level 3 (L3) cache events.
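The architectural events are usually most informative as ratios rather than as raw counts. The following is a minimal sketch, assuming the raw counts have already been collected by some monitoring tool. The ArchitecturalCounts container, the function names, and the 3 GHz reference frequency in the example are hypothetical and not part of any Compute Engine interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitecturalCounts:
    """Raw counter values for one measurement interval (hypothetical container)."""
    branch_instructions: int
    branch_misses: int
    instructions: int
    core_cycles: int
    ref_cycles: int


def instructions_per_cycle(counts):
    """Instructions retired per unhalted core cycle (instruction throughput)."""
    if counts.core_cycles == 0:
        return 0.0
    return counts.instructions / counts.core_cycles


def branch_miss_rate(counts):
    """Fraction of retired branch instructions that were mispredicted."""
    if counts.branch_instructions == 0:
        return 0.0
    return counts.branch_misses / counts.branch_instructions


def unhalted_seconds(counts, reference_hz):
    """Unhalted time, using the fixed-frequency reference cycles as a clock.

    reference_hz must be supplied by the caller; it is platform specific.
    """
    if reference_hz <= 0:
        raise ValueError("reference_hz must be positive")
    return counts.ref_cycles / reference_hz


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = ArchitecturalCounts(
        branch_instructions=2_000_000,
        branch_misses=50_000,
        instructions=12_000_000,
        core_cycles=8_000_000,
        ref_cycles=6_000_000,
    )
    print("IPC:", instructions_per_cycle(sample))
    print("Branch miss rate:", branch_miss_rate(sample))
    print("Unhalted seconds:", unhalted_seconds(sample, reference_hz=3e9))
```

Reference cycles are used for the time estimate because, as described above, they tick at a fixed frequency even when the core clock changes.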
After you enable the PMU in a VM, the PMU runs in the background and continuously
monitors performance events using PMCs. Optionally, you can configure
thresholds for specific PMCs using your preferred performance-monitoring
software. If a PMC exceeds its configured threshold, the PMU notifies the
software.
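The threshold mechanism can be pictured with a small software simulation. This is only a sketch of the idea, not a real PMU interface: the SimulatedCounter class and its callback are hypothetical, and the choice to re-arm the counter after each notification is an assumption, made so that notifications repeat at regular intervals.

```python
class SimulatedCounter:
    """Software stand-in for one PMC with a threshold (not a real PMU API)."""

    def __init__(self, event, threshold, on_threshold):
        if threshold <= 0:
            raise ValueError("threshold must be positive")
        self.event = event
        self.threshold = threshold
        self.on_threshold = on_threshold  # called as on_threshold(event, threshold)
        self.value = 0

    def record(self, occurrences=1):
        """Count occurrences and notify each time the value exceeds the threshold."""
        self.value += occurrences
        while self.value > self.threshold:
            self.on_threshold(self.event, self.threshold)
            # Assumption: re-arm by subtracting the threshold so that the
            # next notification fires after another full threshold of events.
            self.value -= self.threshold


if __name__ == "__main__":
    notifications = []
    counter = SimulatedCounter(
        "branch-misses", 1000, lambda event, limit: notifications.append((event, limit))
    )
    for _ in range(3):
        counter.record(600)
    print(len(notifications), "notification(s); counter now at", counter.value)
```

In this model the monitoring software does not need to poll the counter; it is told when the configured threshold has been crossed.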
Limitations
The PMU has the following limitations:
You can enable the PMU only on the following CPU platforms:
Google Axion (C4A VMs)
Intel Xeon Scalable Processor (Emerald Rapids) 5th generation (C4 VMs)
You can enable the Enhanced PMU type only in VMs that use a C4 machine type with 96 or 192 vCPUs.
Pricing
There are no costs associated with enabling or disabling the PMU in a VM.
What's next
Enable the PMU in VMs
Enable the PMU in Google Kubernetes Engine clusters
Last updated 2025-08-19 UTC.