Host maintenance events typically occur once every two weeks, but might occasionally run more frequently.
This document discusses how you can minimize disruptions to your workloads during a maintenance event.
Receive advance notice before maintenance events
You can monitor the maintenance schedule for your virtual machine (VM) instance and prepare your workloads to transition through the system restart.
To receive advance notice of host events, monitor the /computeMetadata/v1/instance/maintenance-event metadata value.
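If the request to the metadata server returns NONE, the VM isn't scheduled to stop. For example, run the following command from within a VM; on a VM with no scheduled stop, it prints NONE:

curl http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event -H "Metadata-Flavor: Google"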
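
To check the value repeatedly rather than by hand, an application could wrap the same request in a small polling loop. The following is a minimal sketch, not an official implementation: the function names and the `POLL_SECONDS` variable are hypothetical, and only the URL, the `Metadata-Flavor` header, and the values `NONE` and `TERMINATE_ON_HOST_MAINTENANCE` come from this page.

```shell
#!/usr/bin/env bash
# Sketch only: function names and POLL_SECONDS are illustrative assumptions.
# The URL, header, and metadata values are the ones described on this page.

METADATA_URL="http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"

# Print the current maintenance-event value (only reachable from inside a VM).
get_maintenance_event() {
  curl -s -H "Metadata-Flavor: Google" "$METADATA_URL"
}

# Succeed (exit status 0) only when the value means the VM is scheduled to stop.
stop_is_scheduled() {
  [ "$1" = "TERMINATE_ON_HOST_MAINTENANCE" ]
}

# Poll until a stop is scheduled, then run the command passed as arguments.
# POLL_SECONDS is a hypothetical tuning knob that defaults to 30 seconds.
watch_for_maintenance() {
  while true; do
    event="$(get_maintenance_event)" || event=""
    if stop_is_scheduled "$event"; then
      "$@"
      return
    fi
    sleep "${POLL_SECONDS:-30}"
  done
}
```

A caller might run `watch_for_maintenance ./save_state.sh`, where `save_state.sh` is a placeholder for whatever the application uses to save its work. The polling interval should be much shorter than the notice period the VM receives.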
If the metadata server returns TERMINATE_ON_HOST_MAINTENANCE, your VM is scheduled to stop. Compute Engine gives GPU VMs a 1-hour stopping notice, while normal VMs receive only a 60-second notice. Configure your application to transition through the maintenance event. For example, you might use one of the following techniques:
Configure your application to temporarily move work in progress to a Cloud Storage bucket, then retrieve that data after the VM restarts.
Write data to a secondary Persistent Disk. When the VM automatically restarts, the Persistent Disk can be reattached and your application can resume work.
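
Both techniques amount to copying in-progress state somewhere that survives the stop and copying it back after the restart. The sketch below shows one way that might look. It assumes the gcloud CLI is installed on the VM, that the VM's service account can write to the bucket, and that the secondary Persistent Disk is already mounted; the function names, the `checkpoints/` prefix, and all paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names, the checkpoints/ prefix, and paths are hypothetical.

# Copy a work-in-progress file to a Cloud Storage bucket.
# Assumes the gcloud CLI is installed and the VM can write to the bucket.
save_to_bucket() {
  local src="$1" bucket="$2"
  gcloud storage cp "$src" "gs://${bucket}/checkpoints/$(basename "$src")"
}

# Copy a work-in-progress file to a directory on a mounted secondary Persistent Disk.
# Writes under a temporary name first, then renames, so an interrupted copy
# never replaces an earlier complete checkpoint.
save_to_disk() {
  local src="$1" mount_dir="$2"
  local dest="${mount_dir}/$(basename "$src")"
  cp "$src" "${dest}.tmp" && mv "${dest}.tmp" "$dest"
}

# After the VM restarts, copy the checkpoint back into the working directory
# if one exists; do nothing otherwise.
restore_from_disk() {
  local name="$1" mount_dir="$2" workdir="$3"
  if [ -f "${mount_dir}/${name}" ]; then
    cp "${mount_dir}/${name}" "${workdir}/${name}"
  fi
}
```

For example, an application might call `save_to_disk /var/app/state.bin /mnt/disks/checkpoints` when a stop is scheduled and `restore_from_disk state.bin /mnt/disks/checkpoints /var/app` at startup; both paths are placeholders.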
Last updated 2025-08-18 UTC.

When Compute Engine performs maintenance on a virtual machine (VM) with attached graphics processing units (GPUs), the VM must be stopped, because VMs with attached GPUs can't be live migrated. You must set these VMs to stop for host maintenance events. You can set your stopped VMs to automatically restart after the maintenance event completes.

Warning: For VMs with GPUs, data on any Local SSD disks attached to the VM is unrecoverable if Compute Engine restarts the VM for host maintenance events.

Note: VMs with attached GPUs can take up to one hour to terminate after failures or host errors.

What's next?

Learn more about GPU platforms (/compute/docs/gpus).
To learn more about managing and scaling groups of VMs, see Set the group's target size (/compute/docs/instance-groups/add-remove-vms-in-mig#set_the_groups_target_size).
To monitor GPU performance, see Monitoring GPU performance (/compute/docs/gpus/monitor-gpus).
To improve network performance, see Use higher network bandwidth (/compute/docs/gpus/optimize-gpus).
Learn how to troubleshoot VM shutdowns and reboots (/compute/docs/troubleshooting/troubleshooting-reboots).