# Debugging Cloud TPU VMs

Last updated 2025-08-18 UTC.

This document describes how to use the [cloud-tpu-diagnostics](https://pypi.org/project/cloud-tpu-diagnostics/) PyPI package to generate stack traces for processes running on TPU VMs. The package dumps Python stack traces when a fault occurs, such as a segmentation fault, a floating-point exception, or an illegal operation exception. It also collects stack traces periodically, which helps you debug situations where the program becomes unresponsive.

To use the [cloud-tpu-diagnostics](https://pypi.org/project/cloud-tpu-diagnostics/) package, install it on all TPU VMs by running `pip install cloud-tpu-diagnostics`. You can do this with a single `gcloud compute tpus tpu-vm ssh` command. For example:

```bash
gcloud compute tpus tpu-vm ssh your-tpu-name \
  --zone=your-zone \
  --project=your-project-name \
  --worker=all \
  --command="pip install cloud-tpu-diagnostics"
```

You must also add the following code to your scripts running on all TPU VMs:
```python
from cloud_tpu_diagnostics import diagnostic
from cloud_tpu_diagnostics.configuration import debug_configuration
from cloud_tpu_diagnostics.configuration import diagnostic_configuration
from cloud_tpu_diagnostics.configuration import stack_trace_configuration

# Enable stack trace collection and send the traces to Cloud Logging.
stack_trace_config = stack_trace_configuration.StackTraceConfig(
    collect_stack_trace=True,
    stack_trace_to_cloud=True)
debug_config = debug_configuration.DebugConfig(
    stack_trace_config=stack_trace_config)
diagnostic_config = diagnostic_configuration.DiagnosticConfig(
    debug_config=debug_config)
```

By default, stack traces are collected every 10 minutes. To change the interval between two collections, set `stack_trace_interval_seconds`. For example, to collect stack traces every 5 minutes:

```python
stack_trace_config = stack_trace_configuration.StackTraceConfig(
    collect_stack_trace=True,
    stack_trace_to_cloud=True,
    stack_trace_interval_seconds=300)  # 5 minutes
```

Wrap your main function in `diagnostic.diagnose()` so that stack traces are collected periodically while it runs:

```python
with diagnostic.diagnose(diagnostic_config):
    run_main()  # your program's entry point
```

With this configuration, stack traces are written to the `/tmp/debugging` directory on each TPU VM. An agent running on every TPU VM uploads the traces from this temporary directory to Cloud Logging.
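This page does not describe how cloud-tpu-diagnostics is implemented. However, Python's standard-library `faulthandler` module provides the same two behaviors described above: it dumps tracebacks when a fatal signal arrives, and it dumps them on a timer. That makes it a convenient way to see locally what a periodic stack dump looks like. The following is a minimal stdlib sketch under that assumption, not the package's implementation. The function `run_with_trace_dumps`, its parameters, the `/tmp/debugging_sketch` directory, and the `stack_trace_<pid>.txt` file name are all hypothetical, and nothing in it uploads to Cloud Logging.

```python
import faulthandler
import os
import time

# Hypothetical defaults for this sketch; cloud-tpu-diagnostics itself uses /tmp/debugging.
DEFAULT_TRACE_DIR = "/tmp/debugging_sketch"
DEFAULT_INTERVAL_SECONDS = 300  # 5 minutes, analogous to stack_trace_interval_seconds=300


def run_main(work_seconds: float) -> None:
    """Hypothetical placeholder for a long-running training loop."""
    time.sleep(work_seconds)


def run_with_trace_dumps(
    trace_dir: str = DEFAULT_TRACE_DIR,
    interval_seconds: float = DEFAULT_INTERVAL_SECONDS,
    work_seconds: float = 1.0,
) -> str:
    """Runs run_main() while faulthandler writes stack traces to a file.

    Returns the path of the trace file.
    """
    os.makedirs(trace_dir, exist_ok=True)
    trace_path = os.path.join(trace_dir, f"stack_trace_{os.getpid()}.txt")
    # faulthandler writes to the file descriptor directly, so the file must
    # stay open for as long as the handlers are active.
    with open(trace_path, "w") as trace_file:
        # Dump all threads' tracebacks on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
        faulthandler.enable(file=trace_file, all_threads=True)
        # Dump all threads' tracebacks every interval_seconds until cancelled.
        faulthandler.dump_traceback_later(
            interval_seconds, repeat=True, file=trace_file)
        try:
            run_main(work_seconds)
        finally:
            # Stop both handlers before the file is closed.
            faulthandler.cancel_dump_traceback_later()
            faulthandler.disable()
    return trace_path


if __name__ == "__main__":
    # A short interval so that a few periodic dumps appear within one second.
    path = run_with_trace_dumps(interval_seconds=0.2, work_seconds=1.0)
    with open(path) as f:
        print(f.read())
```

Each periodic dump lists the frames of every thread, most recent call first. In the output, the `run_main` frame shows where the program was when the timer fired, which is the kind of information that helps diagnose an unresponsive job. In a real TPU workload, you would still use the package's `diagnose()` wrapper, which also handles the upload to Cloud Logging.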