Troubleshooting PyTorch - TPU
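This guide provides troubleshooting information to
help you identify and resolve problems you might encounter while training
PyTorch models on Cloud TPU. For a more general guide to getting started with Cloud TPU, see the PyTorch quickstart (/tpu/docs/run-calculation-pytorch).
Note: If you aren't able to resolve your issue using this guide, see Getting Support (/tpu/docs/getting-support) for further assistance.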
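Troubleshooting slow training performance
If your model trains slowly, generate and review a metrics report (https://pytorch.org/xla/release/r2.6/learn/troubleshoot.html#get-a-metrics-report).
To automatically analyze the metrics report and get a summary, run your workload with PT_XLA_DEBUG=1.
For more information about issues that might cause your model to train slowly,
see Known performance caveats (https://pytorch.org/xla/release/r2.6/learn/troubleshoot.html#known-performance-caveats).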
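The two steps above can be sketched in Python. This is a minimal sketch, not an official tool: the helper names `run_script_with_xla_debug` and `metrics_report_or_none` are hypothetical and introduced only for illustration. The first helper runs a Python script in a child process with PT_XLA_DEBUG=1 added to the environment. The second returns the text from `torch_xla.debug.metrics.metrics_report()` and assumes it is called inside the training process after some steps have run; it returns None when torch_xla is not installed.

```python
import os
import subprocess
import sys


def run_script_with_xla_debug(script_args, **run_kwargs):
    """Run `python <script_args>` with PT_XLA_DEBUG=1 in the child environment.

    Hypothetical helper: extra keyword arguments are forwarded unchanged to
    subprocess.run (for example capture_output=True, text=True).
    """
    env = dict(os.environ)       # copy, so the parent environment is untouched
    env["PT_XLA_DEBUG"] = "1"    # the setting named in this guide
    return subprocess.run([sys.executable, *script_args], env=env, **run_kwargs)


def metrics_report_or_none():
    """Return the PyTorch/XLA metrics report text, or None without torch_xla."""
    try:
        import torch_xla.debug.metrics as met
    except ImportError:
        return None
    return met.metrics_report()
```

For example, `run_script_with_xla_debug(["train.py"])` would launch a training script named `train.py` (a placeholder name) with the debug setting enabled, and calling `print(metrics_report_or_none())` at the end of that script would print the report.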
Performance profiling
To profile your workload in depth and find bottlenecks, review these resources:
- PyTorch/XLA performance profiling: https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm
- Sample MNIST training script with profiling: https://github.com/pytorch/xla/blob/master/test/test_profile_mp_mnist.py
More debugging tools
You can specify environment variables (https://pytorch.org/xla/release/r2.6/learn/troubleshoot.html#environment-variables) to control the behavior of the PyTorch/XLA software stack.
If you encounter an unexpected bug and need help, file a GitHub issue (https://github.com/pytorch/xla).
Managing XLA tensors
XLA tensor quirks (https://pytorch.org/xla/release/r2.6/learn/troubleshoot.html#xla-tensor-quirks) describes what you should and shouldn't do when working with XLA tensors and shared weights.
Last updated 2025-08-18 UTC.