# Apache Beam I/O connector best practices

Data processing in Dataflow can be highly parallelized. Much of this
parallelism is handled automatically by Dataflow. I/O connectors
sit at the boundary between your pipeline and other parts of your architecture,
such as file storage, databases, and messaging systems. As such, I/O connectors
often have specific considerations for achieving parallelism.
General best practices
----------------------
The following list describes general best practices for using I/O connectors in
Dataflow.

- Read the Javadoc, Pydoc, or Go documentation for the connectors in your
  pipeline. For more information, see
  [I/O connectors](https://beam.apache.org/documentation/io/connectors/)
  in the Apache Beam documentation.
- Use the latest version of the Apache Beam SDK. I/O connectors are
  continually being improved, adding features and fixing known issues.
- When developing a pipeline, it's important to balance the parallelism of the
  job. If a job has too little parallelism, it can be slow, and data can build
  up in the source. However, too much parallelism can overwhelm a sink with too
  many requests.
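One common way to keep a sink from being overwhelmed is to batch elements before writing, so the number of requests scales with the number of batches rather than the number of elements. The following plain-Python sketch is illustrative only, not Beam code; in an actual Beam pipeline a transform such as `GroupIntoBatches` plays this role:

```python
# Illustrative sketch (plain Python, not the Beam API): bound the number of
# requests sent to a sink by grouping elements into batches before each write.

def batched(elements, batch_size):
    """Yield successive batches of at most batch_size elements."""
    batch = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

# 10 elements written one at a time would mean 10 sink requests;
# batching by 4 reduces that to 3 requests.
requests = list(batched(range(10), batch_size=4))
```

Choosing the batch size is itself a balancing act: larger batches mean fewer requests but more latency and memory per worker.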
- Don't rely on the ordering of elements. In general, Dataflow
  does not guarantee the order of elements in a collection.
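If downstream logic needs a deterministic order, one approach is to attach an explicit sequence key to each element and sort on that key where the order matters, rather than depending on arrival order. A minimal plain-Python sketch of the idea (not Beam code):

```python
# Illustrative sketch (plain Python): the runner may deliver elements in any
# order, so carry an explicit sequence number with each element and recover
# the intended order from it downstream.
import random

events = [(i, f"event-{i}") for i in range(5)]  # (sequence_number, payload)

shuffled = events[:]
random.shuffle(shuffled)  # simulate the runner reordering elements

# Downstream, sort on the attached key, never on the order in which
# elements happened to arrive.
ordered = sorted(shuffled, key=lambda kv: kv[0])
```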
- If an I/O connector isn't available in your SDK of choice, consider using the
  [cross-language framework](https://beam.apache.org/documentation/programming-guide/#use-x-lang-transforms)
  to use an I/O connector from another SDK. In addition, connectors don't always
  have feature parity between SDKs. If a connector from another SDK provides a
  feature that you need, you can use it as a cross-language transform.
- In general, writing custom I/O connectors is challenging. Use an existing
  connector whenever possible. If you need to implement a custom I/O connector,
  read
  [Developing a new I/O connector](https://beam.apache.org/documentation/io/developing-io-overview/).

- If a pipeline fails, check for errors logged by I/O connectors. See
  [Troubleshoot Dataflow errors](/dataflow/docs/guides/common-errors).
- When performing writes from Dataflow to a connector, consider using
  an [ErrorHandler](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/errorhandling/ErrorHandler.html)
  to handle any failed writes or malformed reads. This type of error handling is
  supported for the following Java I/Os in Apache Beam versions 2.55.0 and later: BigQueryIO,
  BigtableIO, PubSubIO, KafkaIO, FileIO, TextIO, and AvroIO.
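Conceptually, an ErrorHandler implements the dead-letter pattern: records that fail to write are routed to a side collection for later inspection instead of failing the whole job. The following plain-Python sketch illustrates that pattern only; the function names are hypothetical and this is not the Beam ErrorHandler API:

```python
# Illustrative sketch (plain Python, hypothetical names): the dead-letter
# pattern. Failed writes are captured with their error instead of crashing.

def write_to_sink(record):
    """Stand-in for a connector write; rejects malformed records."""
    if not isinstance(record, dict) or "id" not in record:
        raise ValueError(f"malformed record: {record!r}")

def write_with_dead_letter(records):
    written, dead_letters = [], []
    for record in records:
        try:
            write_to_sink(record)
            written.append(record)
        except ValueError as err:
            # Instead of failing the pipeline, capture the bad record
            # together with the reason it failed.
            dead_letters.append({"record": record, "error": str(err)})
    return written, dead_letters

written, dead = write_with_dead_letter([{"id": 1}, "not-a-dict", {"id": 2}])
```

In a real pipeline, the dead-letter collection would typically be written to durable storage (for example, a separate table or topic) for reprocessing.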
Best practices for individual I/O connectors
--------------------------------------------

The following topics list best practices for individual I/O connectors:
Google-supported I/O connectors
-------------------------------

The following table lists the Apache Beam I/O connectors supported by
Dataflow. For a full list of Apache Beam I/O connectors,
including those developed by the Apache Beam community and supported
by other runners, see
[I/O connectors](https://beam.apache.org/documentation/io/connectors/)
in the Apache Beam documentation.
What's next
-----------

- Read the Apache Beam documentation for [I/O connectors](https://beam.apache.org/documentation/io/connectors/).

Last updated 2025-08-18 UTC.