Create a pipeline monitoring dashboard using Cloud Monitoring

Learn how to use Cloud Monitoring to create a dashboard to monitor pipelines.


Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Cloud Data Fusion, BigQuery, Cloud Storage, and Dataproc APIs.

    Enable the APIs

  5. To create custom dashboards, you must be granted the Monitoring Editor (roles/monitoring.editor) IAM role on the service account.

    For more information about granting roles, see Manage access.
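The API and role steps above can also be scripted. The following is a minimal sketch that only builds the gcloud command lines and doesn't run them. The service names are the standard ones for the four APIs. The project ID and principal are hypothetical placeholders, and granting the role at the project level is an assumption you should check against your own access model.

```python
import shlex

# Service names for the four APIs listed in the step above.
REQUIRED_SERVICES = [
    "datafusion.googleapis.com",
    "bigquery.googleapis.com",
    "storage.googleapis.com",
    "dataproc.googleapis.com",
]


def enable_apis_command(project_id: str) -> list[str]:
    """Build the argv for enabling the required APIs in one call."""
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES,
            "--project", project_id]


def grant_monitoring_editor_command(project_id: str, member: str) -> list[str]:
    """Build the argv that binds roles/monitoring.editor to a principal.

    Assumption: the binding is made at the project level. `member` uses the
    usual IAM form, for example "user:..." or "serviceAccount:...".
    """
    return ["gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", "--role=roles/monitoring.editor"]


if __name__ == "__main__":
    # "my-project" and the member below are placeholders, not real values.
    for argv in (
        enable_apis_command("my-project"),
        grant_monitoring_editor_command("my-project", "user:you@example.com"),
    ):
        print(shlex.join(argv))
```

Printing the commands instead of executing them lets you review each one before you run it in Cloud Shell or a local terminal.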

Create a Cloud Data Fusion instance with Cloud Logging enabled

To use Cloud Logging with your Cloud Data Fusion pipeline, create a Cloud Data Fusion instance with Cloud Logging enabled:

  1. Go to the Cloud Data Fusion Instances page and click Create instance.

    Create an instance

  2. In the Instance name field, enter a name for your new instance.

  3. From the Region drop-down, select the Google Cloud region in which you want to create the instance.

  4. From the Version drop-down, select a Cloud Data Fusion version.

  5. Select an Edition.

  6. Expand Advanced options.

  7. In the Logging and monitoring section, select Enable Stackdriver logging service.

  8. Click Create.

After you create an instance, you can't enable Cloud Logging in the Google Cloud console.
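Because logging can't be turned on from the console afterward, it can help to capture the setting in a repeatable request. The sketch below assembles what a create call to the Cloud Data Fusion REST API might look like. It assumes the v1 `instances` endpoint and the `enableStackdriverLogging` and `type` fields of the Instance resource, so verify both against the API reference before use. The project, region, and instance names are placeholders, and the sketch builds the request without sending it.

```python
import json
from urllib.parse import urlencode


def create_instance_request(project_id: str, region: str, instance_id: str,
                            edition: str = "BASIC") -> dict:
    """Describe a POST request that creates an instance with logging enabled.

    Assumption: field names follow the v1 Instance resource
    (`type`, `enableStackdriverLogging`).
    """
    url = (
        f"https://datafusion.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/instances?"
        + urlencode({"instanceId": instance_id})
    )
    body = {
        "type": edition,                   # edition selected in step 5
        "enableStackdriverLogging": True,  # equivalent of the checkbox in step 7
    }
    return {"method": "POST", "url": url, "body": body}


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    request = create_instance_request("my-project", "us-central1", "my-instance")
    print(request["method"], request["url"])
    print(json.dumps(request["body"], indent=2))
```

Sending the request requires an authenticated HTTP client, which is left out of this sketch.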

Create a log-based metric

  1. Go to the Cloud Logging Log-based metrics page:

    Go to Log-based metrics

  2. Click Create metric.

  3. On the Create a metric page, do the following:

    1. For Metric type, select Counter.
    2. In the Log-based metric name field, enter pipeline_logs.
    3. In the Units field, enter 1.
    4. In the Build filter field, enter the following:

      resource.type="cloud_dataproc_cluster"
      log_name=~"projects/.*/logs/datafusion-pipeline-logs"
      
    5. In the Labels section, click Add label and create the following labels. After entering each label, click Done, and click Add label again to create the next label.

      Label name    Label type    Field name
      Project       STRING        resource.labels.project_id
      Message       STRING        jsonPayload.message
      LoggerName    STRING        labels.loggerName
      ClusterName   STRING        resource.labels.cluster_name
      SparkPhase    STRING        labels.".workflowSparkId"
      Region        STRING        resource.labels.region
      Pipeline      STRING        labels.".applicationId"
      RunId         STRING        labels.".runId"
      Namespace     STRING        labels.".namespaceId"
      LogLevel      STRING        labels.levelName
    6. Click Create metric.

      The newly created metric appears in the user-defined metrics table. If the metric isn't immediately visible, refresh the page.

      The dashboard that you install in the next section uses this metric and contains the following charts:

      • All pipelines
      • Completed pipelines
      • Failed pipelines
      • All pipeline runs
      • Completed pipeline runs
      • Failed pipeline runs
      • Dataproc clusters for runs

      After a metric is created, it might take up to 24 hours to start displaying the time series data.
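The metric defined in the steps above can also be written down as data, which makes the ten labels easier to review than clicking through the form. The following sketch builds a dictionary in the shape of the Cloud Logging `LogMetric` resource, using `EXTRACT(...)` label extractors. It assumes a counter metric maps to `DELTA` and `INT64`, and that the quoted field names are accepted as written in the table. Treat it as a starting point to check against the Logging API reference, not as the console's exact output.

```python
import json

# Filter from the Build filter step; a newline between clauses acts as AND.
LOG_FILTER = (
    'resource.type="cloud_dataproc_cluster"\n'
    'log_name=~"projects/.*/logs/datafusion-pipeline-logs"'
)

# Label name -> log entry field, copied from the labels table above.
LABEL_FIELDS = {
    "Project": "resource.labels.project_id",
    "Message": "jsonPayload.message",
    "LoggerName": "labels.loggerName",
    "ClusterName": "resource.labels.cluster_name",
    "SparkPhase": 'labels.".workflowSparkId"',
    "Region": "resource.labels.region",
    "Pipeline": 'labels.".applicationId"',
    "RunId": 'labels.".runId"',
    "Namespace": 'labels.".namespaceId"',
    "LogLevel": "labels.levelName",
}


def build_log_metric(name: str = "pipeline_logs") -> dict:
    """Return a LogMetric-shaped dict for a counter metric with STRING labels."""
    return {
        "name": name,
        "filter": LOG_FILTER,
        "metricDescriptor": {
            "metricKind": "DELTA",  # counter metrics count matching entries
            "valueType": "INT64",
            "unit": "1",
            "labels": [
                {"key": key, "valueType": "STRING"} for key in LABEL_FIELDS
            ],
        },
        "labelExtractors": {
            key: f"EXTRACT({field})" for key, field in LABEL_FIELDS.items()
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_metric(), indent=2))
```

Keeping the label table in one mapping means the descriptor labels and the extractors can't drift apart, which is the most common mistake when entering them by hand.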

Install the dashboard

  1. Download the JSON file to your local machine.

  2. Go to the Cloud Monitoring Dashboards page:

    Go to Monitoring dashboards

  3. Click Create dashboard.

  4. Click Dashboard settings > JSON > JSON editor.

  5. In a text editor, open the JSON file that you downloaded.

  6. Copy the content of the downloaded JSON file and paste it into the JSON editor, replacing the content that the JSON editor contains by default.

  7. Click Apply changes.

    This refreshes the dashboard. Cloud Data Fusion pipelines that ran after the metric was created appear in the dashboard. If no pipelines ran after the metric was created, the dashboard is empty.

  8. Autosave is enabled by default. If autosave is disabled, click Save to save the dashboard.

  9. Click Close editor.

    Your new dashboard appears in the list of dashboards on the Monitoring overview page.
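Before pasting the downloaded file into the JSON editor, a quick local check can catch a truncated download or a metric name that doesn't match the one created earlier. The sketch below is a hypothetical helper, not part of any Google tool. It assumes the dashboard references the metric as `logging.googleapis.com/user/pipeline_logs`, which is how user-defined log-based metrics are typed, and that the file uses one of the standard layout keys.

```python
import json

# Metric type for a user-defined log-based metric named pipeline_logs.
METRIC_TYPE = "logging.googleapis.com/user/pipeline_logs"
LAYOUT_KEYS = ("gridLayout", "mosaicLayout", "rowLayout", "columnLayout")


def check_dashboard(text: str) -> list[str]:
    """Return a list of problems found in dashboard JSON; empty means none."""
    try:
        dashboard = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if not dashboard.get("displayName"):
        problems.append("missing displayName")
    if not any(key in dashboard for key in LAYOUT_KEYS):
        problems.append("no layout key found")
    # Re-serialize so the search doesn't depend on how the file escapes "/".
    if METRIC_TYPE not in json.dumps(dashboard):
        problems.append(f"no reference to {METRIC_TYPE}")
    return problems


# Hypothetical, heavily trimmed dashboard used only to exercise the check.
SAMPLE = {
    "displayName": "Pipeline monitoring",
    "mosaicLayout": {
        "tiles": [{
            "widget": {
                "title": "All pipeline runs",
                "xyChart": {"dataSets": [{"timeSeriesQuery": {
                    "timeSeriesFilter": {
                        "filter": f'metric.type="{METRIC_TYPE}"'
                    }}}]},
            }
        }]
    },
}

if __name__ == "__main__":
    print(check_dashboard(json.dumps(SAMPLE)) or "no problems found")
```

To check the real file, read it with `open(path).read()` and pass the text to `check_dashboard`. If the metric reference is reported missing, the charts would stay empty even after pipelines run.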

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

Delete the Cloud Data Fusion instance

Follow these instructions to delete your Cloud Data Fusion instance.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next