Run a Spark job with the DataprocFileOutputCommitter
The DataprocFileOutputCommitter feature is an enhanced version of the open source FileOutputCommitter. It enables concurrent writes by Apache Spark jobs to an output location.
Limitations
The DataprocFileOutputCommitter feature supports Spark jobs run on Dataproc Compute Engine clusters created with the following image versions:
2.1 image versions 2.1.10 and higher
2.0 image versions 2.0.62 and higher
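Set spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory and spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false as job properties when you submit a Spark job to the cluster. Setting spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs to false is required when you use the Dataproc file output committer, in order to prevent conflicts with the success marker files that are created.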
Google Cloud CLI example:
gcloud dataproc jobs submit spark \
--properties=spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory,spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false \
--region=REGION \
other args ...
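
Because the --properties value is one long comma-separated string, it can be easier to read and review when assembled from named parts. The following is a minimal sketch, not an official recipe: the property names and values come from the text above, while the shell variable names (FACTORY_KEY, FACTORY_VALUE, MARKER_KEY, COMMITTER_PROPS) are illustrative assumptions.

```shell
# Property names and values are from the documentation above.
# The variable names are hypothetical and chosen only for readability.
FACTORY_KEY="spark.hadoop.mapreduce.outputcommitter.factory.class"
FACTORY_VALUE="org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory"
MARKER_KEY="spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"

# The --properties flag takes a comma-separated list of key=value pairs.
COMMITTER_PROPS="${FACTORY_KEY}=${FACTORY_VALUE},${MARKER_KEY}=false"

# Print the assembled flag so it can be inspected before submitting a job.
echo "--properties=${COMMITTER_PROPS}"
```

The printed flag can then be passed to gcloud dataproc jobs submit spark together with --region and the remaining job arguments, as in the example above.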
Last updated 2025-07-10 UTC.