Troubleshoot stragglers in streaming jobs
For streaming pipelines, a straggler is defined as a work item with the following characteristics:
- It prevents the watermark from advancing for a significant length of time (on the order of minutes).
- It processes for a long time relative to other work items in the same stage.
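The watermark's role here is easiest to see in a minimal sketch (plain Python, not Beam or the actual Dataflow implementation; the event times are made up): a stage's watermark cannot pass the event time of its oldest unfinished work item, so a single slow item holds it back for the whole stage.

```python
# Minimal illustration: the watermark tracks the oldest pending work
# item, so one straggler stalls it even after fast items finish.
def watermark(pending_event_times):
    """Event time that the stage's watermark cannot advance past."""
    return min(pending_event_times) if pending_event_times else None

# Three work items are in flight; the one at t=100 is a straggler.
pending = [100, 205, 210]
print(watermark(pending))   # 100 -- held back by the straggler

pending.remove(205)
pending.remove(210)
print(watermark(pending))   # still 100: fast items done, watermark stuck

pending.remove(100)         # straggler finally completes
print(watermark(pending))   # None -- nothing pending; watermark can advance
```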
Stragglers hold back the watermark and add latency to the job. If the lag is acceptable for your use case, you don't need to take any action. If you want to reduce the job's latency, start by addressing any stragglers.
View streaming stragglers in the Google Cloud console
After you start a Dataflow job, you can use the Google Cloud console to view any detected stragglers.
1. On the Job details page, click the Execution details tab.
2. In the Graph view list, select Stage progress. The progress graph shows aggregated counts of all stragglers detected within each stage.
3. To see details for a stage, hold the pointer over the bar for that stage. The details pane includes a link to the worker logs. Clicking this link opens Cloud Logging, scoped to the worker and the time range when the straggler was detected.
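If you prefer to build the query by hand instead of following the link, a filter along the following lines scopes the Logs Explorer to a job's worker logs within a time range (a sketch: the job ID and timestamps are placeholders, and field names should be checked against your project's logs):

```
resource.type="dataflow_step"
resource.labels.job_id="JOB_ID"
logName:"dataflow.googleapis.com%2Fworker"
timestamp>="2025-08-21T10:00:00Z"
timestamp<="2025-08-21T10:15:00Z"
```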
View stragglers by stage workflow
To view stragglers by stage workflow:
1. In the Google Cloud console, go to the Dataflow Jobs page.
2. On the job details page, click the Execution details tab.
3. In the Graph view list, select Stage workflow. The stage workflow shows the execution stages of the job, represented as a workflow graph.
Troubleshoot streaming stragglers
If a straggler is detected, it means that an operation in your pipeline has been running for an unusually long time.
To troubleshoot the issue, first check whether Dataflow insights pinpoints any issues.
If you still can't determine the cause, check the worker logs for the stage that reported the straggler. To see the relevant worker logs, view the straggler details in the stage progress, and then click the link for the worker. This link opens Cloud Logging, scoped to the worker and the time range when the straggler was detected. Look for problems that might be slowing down the stage, such as:
- Bugs in DoFn code, or stuck DoFns. Look for stack traces in the logs, near the timestamp when the straggler was detected.
- Calls to external services that take a long time to complete. To mitigate this issue, batch the calls to external services and set timeouts on RPCs.
- Quota limits in sinks. If your pipeline outputs to a Google Cloud service, you might be able to raise the quota. For more information, see the Cloud Quotas documentation. Also consult the documentation for the particular service for optimization strategies, as well as the documentation for its I/O connector.
- DoFns that perform large read or write operations on persistent state. Consider refactoring your code to perform smaller reads or writes on persistent state.
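The external-service mitigations above, batching the calls and bounding each RPC with a timeout, can be sketched in plain Python (no Beam dependency; `call_service`, the batch size, and the timeout are hypothetical stand-ins for your real RPC and its tuning):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

BATCH_SIZE = 50        # tune to what the external service accepts
RPC_TIMEOUT_S = 5.0    # an unbounded RPC is how stragglers are born

def call_service(batch):
    # Hypothetical stand-in for one RPC that handles a whole batch.
    return [x * 2 for x in batch]

def process(elements):
    """Send elements in batches, bounding each call with a timeout."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for i in range(0, len(elements), BATCH_SIZE):
            batch = elements[i:i + BATCH_SIZE]
            future = pool.submit(call_service, batch)
            try:
                # result(timeout=...) raises TimeoutError instead of
                # letting one slow call stall the whole work item.
                results.extend(future.result(timeout=RPC_TIMEOUT_S))
            except TimeoutError:
                future.cancel()
                # In a real pipeline: log, retry with backoff, or dead-letter.
                raise
    return results

print(process([1, 2, 3]))   # [2, 4, 6]
```

In a Beam pipeline the same pattern usually lives in a DoFn that buffers elements and flushes them in `finish_bundle`; the timeout-per-call discipline is what keeps a slow dependency from turning into a straggler.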
You can also use the Side info panel to find the slowest steps in the stage. One of these steps might be causing the straggler. Click the step name to view the worker logs for that step.
After you determine the cause, update your pipeline with the new code and monitor the result.
Last updated 2025-08-21 UTC.