View Cloud Data Fusion logs

This document describes how to access and view pipeline logs and service logs for Cloud Data Fusion.

Starting with Cloud Data Fusion version 6.11, pipeline logs and service logs are available in Cloud Logging.

About log types

Cloud Data Fusion generates several types of logs to help monitor and troubleshoot data integration processes:

Pipeline logs
These logs provide detailed information about individual pipeline runs, including preview runs. They cover all stages of a pipeline run, such as pre-run setup, job execution, and post-run activities.

Service logs
These logs provide detailed information about Cloud Data Fusion system services, such as AppFabric and AppFabric processor.

Audit logs
Cloud Data Fusion generates audit logs, such as Admin Activity audit logs. For more information, see Cloud Data Fusion audit logging.

Advanced pipeline logs
These logs provide detailed information about individual pipeline runs. These logs are particularly useful for troubleshooting pipeline failures, as they capture information even before the Dataproc cluster finishes provisioning, which can help identify cluster issues or delays. For more information, see View advanced pipeline logs in Cloud Logging.

Pricing

Cloud Logging and Cloud Monitoring usage incurs charges. For more information, see Google Cloud Observability pricing.

Optional: Import the Cloud Data Fusion Logging dashboard

To view pipeline logs and service logs using the Cloud Data Fusion Logging dashboard, import the dashboard:

  1. In the Google Cloud console, go to the Cloud Monitoring Dashboards page.

    Go to Dashboards

  2. Click View dashboard templates.

  3. Search for Cloud Data Fusion Logging and select the dashboard.

  4. Click Add Cloud Data Fusion Logging dashboard to your list.

View pipeline logs

You can view pipeline logs using the Cloud Data Fusion Logging dashboard or directly in the Logs Explorer.

View pipeline logs using the dashboard

  1. If you haven't already done so, import the Cloud Data Fusion Logging dashboard.

  2. In the My dashboards section, click Cloud Data Fusion Logging.

  3. In the Pipeline logs section, view the list of pipeline logs. You can filter the logs by severity, field names, and values.

    To refine your search using queries, use Logs Explorer.

View pipeline logs in Logs Explorer

  1. In the Google Cloud console, go to the Logs Explorer page.

    Go to Logs Explorer

  2. Enter the following query:

    resource.type="datafusion.googleapis.com/PipelineV2"
    

    This displays the list of pipeline logs. You can use filters to refine the results.
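
    If you prefer the command line, you can run an equivalent query with the gcloud CLI. The following is a minimal sketch that uses the gcloud logging read command; the PROJECT_ID placeholder and the limit and output format are illustrative:

    # Read recent pipeline log entries for the project (illustrative settings).
    gcloud logging read \
        'resource.type="datafusion.googleapis.com/PipelineV2"' \
        --project=PROJECT_ID \
        --limit=20 \
        --format=json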

Filter pipeline logs

You can filter pipeline logs by run ID, instance ID, pipeline ID, location, namespace, or custom labels.

Every Cloud Data Fusion pipeline run is assigned a unique RunID. After you deploy and run your pipeline, you can find the RunID of your pipeline and view the corresponding pipeline logs.

To filter pipeline logs by RunID, follow these steps:

  1. Get the pipeline's RunID.

  2. In the Google Cloud console, go to the Logs Explorer page.

  3. Enter the following query:

    resource.type="datafusion.googleapis.com/PipelineV2"
    resource.labels.run_id=RUN_ID
    

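    You can also combine the RunID filter with standard Logging query fields to narrow the results further. For example, the following sketch (the severity threshold is illustrative) shows only warning and error entries for a single run:

    resource.type="datafusion.googleapis.com/PipelineV2"
    resource.labels.run_id=RUN_ID
    severity>=WARNING
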
View service logs

You can view service logs using the Cloud Data Fusion Logging dashboard or in the Logs Explorer.

View service logs using the dashboard

  1. If you haven't already done so, import the Cloud Data Fusion Logging dashboard.

  2. In the My dashboards section, click Cloud Data Fusion Logging.

  3. In the Service logs section, view the list of service logs. You can filter the logs by severity, field names, and values.

    To refine your search using queries, use Logs Explorer.

View service logs in Logs Explorer

  1. In the Google Cloud console, go to the Logs Explorer page.

    Go to Logs Explorer

  2. Find the service logs by entering the query for the service that you want to inspect:

    • AppFabric

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="appfabric"

    • AppFabric processor

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="appfabric.processor"

    • Dataset executor

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="dataset.executor"

    • Log saver

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="log.saver"

    • Metadata service

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="metadata.service"

    • Metrics

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="metrics"

    • Pipeline Studio

      resource.type="datafusion.googleapis.com/InstanceV2"
      resource.labels.namespace="system"
      labels.".userserviceid"="studio"

    • Runtime

      resource.type="datafusion.googleapis.com/InstanceV2"
      labels.".serviceId"="runtime"

    • Wrangler service

      resource.type="datafusion.googleapis.com/InstanceV2"
      resource.labels.namespace="system"
      labels.".applicationId"="dataprep"
      labels.".userserviceid"="service"
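    These queries also work outside the console. As a sketch, the following gcloud command reads recent error-level AppFabric service logs; the service label comes from the list above, while PROJECT_ID, the severity threshold, and the freshness window are illustrative:

    # Read AppFabric service errors from the last day (illustrative settings).
    gcloud logging read \
        'resource.type="datafusion.googleapis.com/InstanceV2" AND labels.".serviceId"="appfabric" AND severity>=ERROR' \
        --project=PROJECT_ID \
        --freshness=1d \
        --limit=20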

Configurable logging in Cloud Data Fusion

Cloud Data Fusion versions 6.11.0 and later offer configurable logging, with Cloud Logging enabled by default. Although you can disable Cloud Logging, we strongly recommend keeping it enabled so that you retain access to critical pipeline and instance logs.

To disable Cloud Logging, run the following command:

echo '{ "loggingConfig": {"instance_cloud_logging_disabled": true}}' | curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    --data @- \
    "https://datafusion.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID?updateMask=logging_config"

Replace the following:

  • PROJECT_ID: the Google Cloud project ID
  • LOCATION: the location of your instance
  • INSTANCE_ID: the ID of your Cloud Data Fusion instance
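
To confirm the change, you can fetch the instance and inspect the loggingConfig field in the response. The following is a minimal sketch that calls the standard instances get endpoint with the same placeholders as the update command:

# Retrieve the instance and check its logging configuration.
curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://datafusion.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID"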

What's next