This page explains pipeline orchestration with Cloud Composer and
triggers. Cloud Data Fusion recommends using Cloud Composer to
orchestrate pipelines. If you require a simpler way to manage orchestration, use
triggers.
Composer
Orchestrate pipelines with Cloud Composer
Orchestrating pipeline execution in Cloud Data Fusion with
Cloud Composer provides the following benefits:
Centralized workflow management: uniformly manage the execution of multiple Cloud Data Fusion pipelines.
Dependency management: to ensure the proper execution order, define
dependencies between pipelines.
Monitoring and alerting: Cloud Composer provides
monitoring capabilities and alerts for failures.
Integration with other services: Cloud Composer lets you
orchestrate workflows that span Cloud Data Fusion and other
Google Cloud services.
To orchestrate Cloud Data Fusion pipelines using
Cloud Composer, follow this process:
Set up the Cloud Composer environment.
Create a Cloud Composer environment. If you don't have one, provision the environment in your Google Cloud project.
This environment is your orchestration workspace.
Grant permissions. Ensure that the Cloud Composer service account has the necessary permissions to access Cloud Data Fusion (such as permission to start, stop, and list pipelines).
Define Directed Acyclic Graphs (DAGs) for orchestration.
Create a DAG: In Cloud Composer, create a DAG that defines the orchestration workflow for your Cloud Data Fusion pipelines (a minimal skeleton follows this list).
Cloud Data Fusion operators: Use Cloud Composer's Cloud Data Fusion operators within your DAG. These operators
let you interact programmatically with Cloud Data Fusion.
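For orientation, here is a minimal sketch of what such a DAG file can look like. The DAG ID and schedule are hypothetical, and the sketch assumes an Airflow 2.4 or later Cloud Composer environment; the Cloud Data Fusion operator tasks described in the next section go inside the with block.

```python
from datetime import datetime

from airflow import DAG

# Minimal DAG shell for a Cloud Composer (Airflow 2) environment.
with DAG(
    dag_id="datafusion_orchestration",  # hypothetical DAG ID
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                  # or None to trigger manually
    catchup=False,
) as dag:
    # Add Cloud Data Fusion operator tasks here (see the next section).
    pass
```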
Cloud Data Fusion operators
Cloud Data Fusion pipeline orchestration has the following operators (a combined usage sketch follows this list):
CloudDataFusionStartPipelineOperator
Triggers the execution of a Cloud Data Fusion pipeline by its ID. It
has the following parameters:
Pipeline ID
Location (Google Cloud region)
Pipeline namespace
Runtime arguments (optional)
Wait for completion (optional)
Timeout (optional)
CloudDataFusionStopPipelineOperator
Lets you stop a running Cloud Data Fusion pipeline.
CloudDataFusionDeletePipelineOperator
Deletes a Cloud Data Fusion pipeline.
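The following is a hedged sketch of how these operators might appear in a DAG, assuming the apache-airflow-providers-google package is installed in the Cloud Composer environment. The instance, pipeline, region, bucket, and argument names are hypothetical, and parameter names can differ slightly between provider versions, so verify them against the provider reference for your environment. The three tasks are shown side by side for illustration; in practice you would usually use them in separate workflows.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionDeletePipelineOperator,
    CloudDataFusionStartPipelineOperator,
    CloudDataFusionStopPipelineOperator,
)

with DAG(
    dag_id="datafusion_pipeline_lifecycle",  # hypothetical DAG ID
    start_date=datetime(2025, 1, 1),
    schedule=None,                           # trigger this DAG manually
    catchup=False,
) as dag:
    # Start a pipeline and wait for it to finish.
    start_pipeline = CloudDataFusionStartPipelineOperator(
        task_id="start_pipeline",
        pipeline_name="shipments-data-cleaning",  # hypothetical pipeline
        instance_name="my-datafusion-instance",   # hypothetical instance
        location="us-central1",                   # Google Cloud region
        namespace="default",                      # pipeline namespace
        runtime_args={"input.path": "gs://example-bucket/raw/"},  # optional
        asynchronous=False,     # wait for completion (optional)
        pipeline_timeout=3600,  # seconds to wait before failing (optional)
    )

    # Stop a running pipeline, for example as a cleanup task.
    stop_pipeline = CloudDataFusionStopPipelineOperator(
        task_id="stop_pipeline",
        pipeline_name="shipments-data-cleaning",
        instance_name="my-datafusion-instance",
        location="us-central1",
    )

    # Delete a deployed pipeline once it is no longer needed.
    delete_pipeline = CloudDataFusionDeletePipelineOperator(
        task_id="delete_pipeline",
        pipeline_name="shipments-data-cleaning",
        instance_name="my-datafusion-instance",
        location="us-central1",
    )
```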
Build the DAG workflow
When you build the DAG workflow, consider the following:
Defining dependencies: Use the DAG structure to define dependencies
between tasks. For example, you might have a task that waits for a
pipeline in one namespace to complete successfully before triggering
another pipeline in a different namespace, as shown in the sketch after this list.
Scheduling: Schedule the DAG to run at specific intervals, such as daily or hourly, or set it to be triggered manually.
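To make the dependency and scheduling points concrete, here is a hedged sketch of a daily DAG in which a pipeline in one namespace must complete successfully before a pipeline in a different namespace starts. The DAG ID, instance, pipelines, namespaces, region, and runtime argument are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionStartPipelineOperator,
)

with DAG(
    dag_id="chained_datafusion_pipelines",  # hypothetical DAG ID
    start_date=datetime(2025, 1, 1),
    schedule="@daily",                      # run once per day
    catchup=False,
) as dag:
    # Upstream pipeline in the "raw" namespace.
    clean_shipments = CloudDataFusionStartPipelineOperator(
        task_id="clean_shipments",
        pipeline_name="shipments-data-cleaning",  # hypothetical pipeline
        instance_name="my-datafusion-instance",   # hypothetical instance
        location="us-central1",
        namespace="raw",                          # hypothetical namespace
        asynchronous=False,  # wait, so the ordering below is meaningful
    )

    # Downstream pipeline in the "analytics" namespace.
    delayed_shipments = CloudDataFusionStartPipelineOperator(
        task_id="delayed_shipments_usa",
        pipeline_name="delayed-shipments-usa",    # hypothetical pipeline
        instance_name="my-datafusion-instance",
        location="us-central1",
        namespace="analytics",                    # hypothetical namespace
        runtime_args={"input.directory": "gs://example-bucket/cleansed/"},
    )

    # Run the downstream task only after the upstream task succeeds.
    clean_shipments >> delayed_shipments
```

Keeping asynchronous=False on the upstream task means the Airflow dependency reflects the actual completion of the Cloud Data Fusion run, which is what makes the ordering meaningful.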
Triggers
Orchestrate pipelines with triggers
Cloud Data Fusion triggers let you automatically execute a downstream pipeline upon the completion (success, failure, or any specified condition) of one or more upstream pipelines.
Triggers are useful for the following tasks:
Cleaning your data once, and then making it available to multiple
downstream pipelines for consumption.
Sharing information, such as runtime arguments and plugin
configurations, between pipelines. This task is called payload
configuration.
Having a set of dynamic pipelines that run using the data from the hour,
day, week, or month, instead of a static pipeline that must be updated
for every run.
For example, you have a dataset that contains all the information about your
company's shipments. Based on this data, you want to answer several business
questions. To do this, you create one pipeline that cleanses the raw data
about shipments, called Shipments Data Cleaning. Then you create a second pipeline, Delayed Shipments USA, which reads the cleansed data and finds the shipments within the USA that were delayed by more than a specified threshold. The Delayed Shipments USA pipeline can be triggered as soon as
the upstream Shipments Data Cleaning pipeline successfully completes.
Additionally, because the downstream pipeline consumes the output of the
upstream pipeline, you must specify that when the downstream pipeline runs
using this trigger, it also receives the input directory to read from (which
is the directory where the upstream pipeline generated its output). This
process is called passing payload configuration, which you define with
runtime arguments. It lets you have a set of dynamic pipelines that
run using the data of the hour, day, week, or month (not a static pipeline,
which must be updated for every run).
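As a conceptual sketch only (this is plain Python used for illustration, not a Cloud Data Fusion API, and the argument name and path are hypothetical), the payload configuration behaves like copying a runtime argument from the upstream run into the downstream run, where a macro such as ${directory} in the downstream pipeline's source plugin resolves to it:

```python
# Runtime arguments of the upstream Shipments Data Cleaning run
# (hypothetical argument name and Cloud Storage path).
upstream_runtime_args = {
    "directory": "gs://example-bucket/shipments/cleansed/2025-01-15",
}

# The trigger's payload configuration carries the selected argument into
# the downstream Delayed Shipments USA run, so a source path configured
# as "${directory}" reads the folder the upstream run just wrote.
downstream_runtime_args = {
    "directory": upstream_runtime_args["directory"],
}
```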
To orchestrate pipelines with triggers, follow this process:
Create upstream and downstream pipelines.
In the Cloud Data Fusion Studio, design and deploy the
pipelines that form your orchestration chain.
Consider which pipeline's completion will activate the next (downstream) pipeline in your workflow.
Optional: pass runtime arguments for upstream pipelines.
If you need to pass payload configuration as runtime arguments between pipelines, configure the runtime arguments. These arguments can be passed to the downstream pipeline during execution.
Create an inbound trigger on the downstream pipeline.
In the Cloud Data Fusion Studio, go to the List page. In the Deployed tab, click the name of the downstream pipeline. The Deploy view
for that pipeline appears.
On the middle left side of the page, click Inbound triggers.
A list of available pipelines appears.
Click the upstream pipeline. Select one or more of the upstream pipeline completion states (Succeeds, Fails, or Stops) as the condition for when the downstream pipeline should run.
If you want the upstream pipeline to share information (called
payload configuration) with the downstream pipeline, click
Trigger config, and then follow the steps to
pass payload configuration as runtime arguments.
Otherwise, click Enable trigger.
Test the trigger.
Initiate a run of the upstream pipeline.
If the trigger is configured correctly, the downstream pipeline automatically runs upon completion of the upstream pipeline, based on the condition you configured.
Pass payload configuration as runtime arguments
Payload configuration allows sharing of information from the upstream pipeline to the downstream pipeline. This information can be, for example,
the output directory, the data format, or the day the pipeline was run. The
downstream pipeline then uses this information for decisions such as
determining the right dataset to read from.
To pass information from the upstream pipeline to the downstream pipeline,
you set the runtime arguments of the downstream pipeline with the values of
either the runtime arguments or the configuration of any plugin in the
upstream pipeline.
Whenever the downstream pipeline triggers and runs, its payload configuration
is set using the runtime arguments of the particular run
of the upstream pipeline that triggered the downstream pipeline.
To pass payload configuration as runtime arguments, follow these steps:
Picking up where you left off in Creating an inbound trigger,
after you click Trigger config, any runtime arguments you
previously set for your upstream pipeline appear. Choose the runtime arguments to pass from the upstream pipeline to the downstream pipeline when this trigger executes.
Click the Plugin config tab to see a list of what will be passed
from your upstream pipeline to your downstream pipeline when it is
triggered.
Click Configure and Enable Trigger.
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Sulit dipahami","hardToUnderstand","thumb-down"],["Informasi atau kode contoh salah","incorrectInformationOrSampleCode","thumb-down"],["Informasi/contoh yang saya butuhkan tidak ada","missingTheInformationSamplesINeed","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-04 UTC."],[[["\u003cp\u003eCloud Composer can orchestrate multiple Cloud Data Fusion pipelines, offering centralized workflow and dependency management, monitoring, alerting, and integration with other Google Cloud services.\u003c/p\u003e\n"],["\u003cp\u003eCloud Composer uses Directed Acyclic Graphs (DAGs) and Cloud Data Fusion Operators to define and manage pipeline orchestration, including starting, stopping, and deleting pipelines.\u003c/p\u003e\n"],["\u003cp\u003eTriggers in Cloud Data Fusion allow automatic execution of downstream pipelines upon completion of upstream pipelines, based on success, failure, or other conditions.\u003c/p\u003e\n"],["\u003cp\u003eTriggers facilitate dynamic pipelines by enabling the sharing of runtime arguments and plugin configurations (payload configuration) between upstream and downstream pipelines.\u003c/p\u003e\n"],["\u003cp\u003eUsing payload configuration with triggers, the downstream pipeline can receive information, such as output directory and data format, from the upstream pipeline.\u003c/p\u003e\n"]]],[],null,["# Orchestrate pipelines\n\nThis page explains pipeline orchestration with Cloud Composer and\ntriggers. Cloud Data Fusion recommends using Cloud Composer to\norchestrate pipelines. If you require a simpler way to manage orchestration, use\ntriggers. \n\n### Composer\n\nOrchestrate pipelines with Cloud Composer\n-----------------------------------------\n\nOrchestrating pipeline execution in Cloud Data Fusion with\nCloud Composer provides following benefits:\n\n- **Centralized workflow management:** uniformly manage the execution of multiple Cloud Data Fusion pipelines.\n- **Dependency management:** to ensure proper execution order, define dependencies between pipelines.\n- **Monitoring and alerting:** Cloud Composer provides monitoring capabilities and alerts for failures.\n- **Integration with other services:** Cloud Composer lets you orchestrate workflows that span across Cloud Data Fusion and other Google Cloud services.\n\nTo orchestrate Cloud Data Fusion pipelines using\nCloud Composer, follow this process:\n\n1. **Set up the Cloud Composer environment.**\n\n - **Create a Cloud Composer environment.** If you don't have one, provision the environment in your Google Cloud project. This environment is your orchestration workspace.\n - **Give permissions.** Ensure the Cloud Composer service account has the necessary permissions to access Cloud Data Fusion (such as permission to start, stop, and list pipelines).\n2. **Define Directed Acyclic Graphs (DAG) for orchestration.**\n\n - **Create a DAG:** In Cloud Composer, create a DAG that defines the orchestration workflow for your Cloud Data Fusion pipelines.\n - **Cloud Data Fusion Operators:** Use Cloud Composer's Cloud Data Fusion Operators within your DAG. 
These operators let you interact programmatically with Cloud Data Fusion.\n\n### Cloud Data Fusion operators\n\nCloud Data Fusion pipeline orchestration has the following operators:\n\n`CloudDataFusionStartPipelineOperator`\n\n: Triggers the execution of a Cloud Data Fusion pipeline by its ID. It\n has the following parameters:\n\n - Pipeline ID\n - Location (Google Cloud region)\n - Pipeline namespace\n - Runtime arguments (optional)\n - Wait for completion (optional)\n - Timeout (optional)\n\n`CloudDataFusionStopPipelineOperator`\n\n: Lets you stop a running Cloud Data Fusion pipeline.\n\n`CloudDataFusionDeletePipelineOperator`\n\n: Deletes a Cloud Data Fusion pipeline.\n\n### Build the DAG workflow\n\nWhen you build the DAG workflow, consider the following:\n\n- **Defining dependencies:** Use the DAG structure to define dependencies between tasks. For example, you might have a task that waits for a pipeline in one namespace to complete successfully before triggering another pipeline in a different namespace.\n- **Scheduling:** Schedule the DAG to run at specific intervals, such as daily or hourly, or set it to be triggered manually.\n\nFor more information, see the\n[Cloud Composer overview](/composer/docs/concepts/overview).\n\n### Triggers\n\nOrchestrate pipelines with triggers\n-----------------------------------\n\nCloud Data Fusion triggers let you automatically execute a downstream\npipeline upon the completion (success, failure, or any specified condition)\nof one or more upstream pipelines.\n\nTriggers are useful for the following tasks:\n\n- Cleaning your data once, and then making it available to multiple downstream pipelines for consumption.\n- Sharing information, such as runtime arguments and plugin configurations, between pipelines. This task is called *payload\n configuration*.\n- Having a set of dynamic pipelines that run using the data from the hour, day, week, or month, instead of a static pipeline that must be updated for every run.\n\nFor example, you have a dataset that contains all information about your\ncompany's shipments. Based on this data, you want to answer several business\nquestions. To do this, you create one pipeline that cleanses the raw data\nabout shipments, called *Shipments Data Cleaning* . Then you create a second\npipeline, *Delayed Shipments USA* , which reads the cleansed data and finds\nthe shipments within the USA that were delayed by more than a specified\nthreshold. The *Delayed Shipments USA* pipeline can be triggered as soon as\nthe upstream *Shipments Data Cleaning* pipeline successfully completes.\n\nAdditionally, since the downstream pipeline consumes the output of the\nupstream pipeline, you must specify that when the downstream pipeline runs\nusing this trigger, it also receives the input directory to read from (which\nis the directory where the upstream pipeline generated its output). This\nprocess is called *passing payload configuration*, which you define with\nruntime arguments. It lets you have a set of dynamic pipelines that\nrun using the data of the hour, day, week, or month (not a static pipeline,\nwhich must be updated for every run).\n| **Note:** Don't trigger upgrades with Terraform. For more information, see the [limitations for Cloud Data Fusion upgrades](/data-fusion/docs/how-to/upgrading#limitations).\n\nTo orchestrate pipelines with triggers, follow this process:\n\n1. 
**Create upstream and downstream pipelines.**\n\n - In the Cloud Data Fusion Studio, design and deploy the pipelines that form your orchestration chain.\n - Consider which pipeline's completion will activate the next pipeline (downstream) in your workflow.\n2. **Optional: pass runtime arguments for upstream pipelines.**\n\n - If you need to [pass payload configuration as runtime arguments](#pass-payload-configs) between pipelines, configure runtime arguments. These arguments can be passed to the downstream pipeline during execution.\n3. **Create an inbound trigger on the downstream pipeline.**\n\n - In the Cloud Data Fusion Studio, go to the **List** page. In the **Deployed** tab, click the name of the downstream pipeline. The Deploy view for that pipeline appears.\n - On the middle left side of the page, click **Inbound triggers**. A list of available pipelines appears.\n - Click the upstream pipeline. Select one or more of the upstream pipeline completion states (**Succeeds** , **Fails** , or **Stops**) as the condition for when the downstream pipeline should run.\n - If you want the upstream pipeline to share information (called *payload configuration* ) with the downstream pipeline, click **Trigger config** and then follow the steps to [pass payload configuration as runtime arguments](#pass-payload-configs). Otherwise, click **Enable trigger**.\n4. **Test the trigger.**\n\n - Initiate a run of the upstream pipeline.\n - If the trigger is configured correctly, the downstream pipeline automatically executes upon completion of the upstream pipelines, based on your configured condition.\n\n### Pass payload configuration as runtime arguments\n\nPayload configuration allows sharing of information from the upstream\npipeline to the downstream pipeline. This information can be, for example,\nthe output directory, the data format, or the day the pipeline was run. This\ninformation is then used by the downstream pipeline for decisions such as\ndetermining the right dataset to read from.\n\nTo pass information from the upstream pipeline to the downstream pipeline,\nyou set the runtime arguments of the downstream pipeline with the values of\neither the runtime arguments or the configuration of any plugin in the\nupstream pipeline.\n\nWhenever the downstream pipeline triggers and runs, its payload\nconfiguration is set using the runtime arguments of the particular run of\nthe upstream pipeline that triggered the downstream pipeline.\n\nTo pass payload configuration as runtime arguments, follow these steps:\n\n1. Picking up where you left off in the [Creating an inbound trigger](/data-fusion/docs/how-to/using-triggers#create_inbound_trigger), after clicking **Trigger config** , any runtime arguments you [previously set](/data-fusion/docs/how-to/using-triggers#before_you_begin) for your upstream pipeline will appear. Choose the runtime arguments to pass from the upstream pipeline to the downstream pipeline when this trigger executes.\n2. Click the **Plugin config** tab to see a list of what will be passed from your upstream pipeline to your downstream pipeline when it is triggered.\n3. Click **Configure and Enable Trigger**."]]