Deploy the OpenTelemetry Collector on Container-Optimized OS

This document describes how to run the Google-Built OpenTelemetry Collector on Container-Optimized OS to collect OTLP logs, metrics, and traces from instrumented applications and then export that data to Google Cloud.

Before you begin

Running the OpenTelemetry Collector requires the following resources:

  • A Google Cloud project with the Cloud Monitoring API, Cloud Trace API, and Cloud Logging API enabled.

    • If you don't have a Google Cloud project, then do the following:

      1. In the Google Cloud console, go to New Project:

        Create a New Project

      2. In the Project Name field, enter a name for your project and then click Create.

      3. Go to Billing:

        Go to Billing

      4. Select the project you just created if it isn't already selected at the top of the page.

      5. You are prompted to choose an existing payment profile or to create a new one.

      The Monitoring API, Trace API, and Logging API are enabled by default for new projects.

    • If you already have a Google Cloud project, then ensure that the Monitoring API, Trace API, and Logging API are enabled:

      Enable the APIs

  • A Container-Optimized OS virtual machine (VM). If you don't have a Container-Optimized OS VM, then follow the instructions at Creating and configuring instances.

  • An installation of gcloud. For information about installing gcloud, see Install the Google Cloud CLI.
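You can also handle the API and VM prerequisites from the command line with gcloud. The following is a minimal sketch that assumes you supply your own project ID, VM name, and zone. The function names are hypothetical helpers. The VM is created from the cos-stable image family in the cos-cloud image project, with all other settings left at their defaults.

```shell
# Sketch of the prerequisites with gcloud. Nothing runs until you call a
# function; PROJECT_ID, VM_NAME, and ZONE are values you supply.

# Enable the Cloud Logging, Cloud Monitoring, and Cloud Trace APIs.
enable_observability_apis() {
  project="${1:?usage: enable_observability_apis PROJECT_ID}"
  gcloud services enable \
    logging.googleapis.com \
    monitoring.googleapis.com \
    cloudtrace.googleapis.com \
    --project="$project"
}

# Create a VM from the latest image in the cos-stable family.
create_cos_vm() {
  project="${1:?usage: create_cos_vm PROJECT_ID VM_NAME ZONE}"
  vm="${2:?missing VM_NAME}"
  zone="${3:?missing ZONE}"
  gcloud compute instances create "$vm" \
    --project="$project" \
    --zone="$zone" \
    --image-family=cos-stable \
    --image-project=cos-cloud
}
```

For example, run enable_observability_apis my-project and then create_cos_vm my-project otel-vm us-central1-a.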

Configure permissions for the Collector

By default, Container-Optimized OS VMs use the Compute Engine default service account, PROJECT_NUMBER-compute@developer.gserviceaccount.com. This service account usually has the Identity and Access Management (IAM) roles necessary to write the metrics, logs, and traces described in this document.

If you are configuring a custom service account for your instance, see Manage access to service accounts.
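To check or grant the roles from the command line, the following sketch binds three roles to the VM's service account: Logs Writer (roles/logging.logWriter), Monitoring Metric Writer (roles/monitoring.metricWriter), and Cloud Trace Agent (roles/cloudtrace.agent). It assumes that these three roles cover the writes your pipelines perform and that you have permission to change the project's IAM policy. grant_collector_roles is a hypothetical helper name. If you omit the service account, the sketch derives the Compute Engine default service account from the project number.

```shell
# Sketch: grant write roles for Logging, Monitoring, and Trace to the
# service account the Collector's VM runs as.
grant_collector_roles() {
  project="${1:?usage: grant_collector_roles PROJECT_ID [SERVICE_ACCOUNT_EMAIL]}"
  sa="${2:-}"
  if [ -z "$sa" ]; then
    # Build the Compute Engine default service account from the project number.
    number="$(gcloud projects describe "$project" --format='value(projectNumber)')"
    sa="${number}-compute@developer.gserviceaccount.com"
  fi
  for role in roles/logging.logWriter roles/monitoring.metricWriter roles/cloudtrace.agent; do
    gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa}" \
      --role="$role"
  done
}
```

For example, grant_collector_roles my-project targets the default service account, and grant_collector_roles my-project SA_EMAIL targets a custom one.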

Deploy the Collector

To run the Google-Built OpenTelemetry Collector, you need to provide a configuration file on your Container-Optimized OS VM. You can use the cloud-init tool to write the configuration file to the VM. The following is a recommended cloud-init file for the Google-Built Collector:

#cloud-config
write_files:
- path: /etc/config/config.yaml
  permissions: 0644
  owner: root
  content: |
    receivers:
      # Open two OTLP servers:
      # - On port 4317, open an OTLP GRPC server
      # - On port 4318, open an OTLP HTTP server
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
      otlp:
        protocols:
          grpc:
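            # Listen on all interfaces inside the container so that other
            # containers on the otel bridge network can connect.
            endpoint: 0.0.0.0:4317
          http:
            cors:
              # This effectively allows any origin
              # to make requests to the HTTP server.
              allowed_origins:
              - http://*
              - https://*
            # Same reasoning as the gRPC endpoint.
            endpoint: 0.0.0.0:4318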

      # Using the prometheus scraper, scrape the Collector's self metrics.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver
      # https://opentelemetry.io/docs/collector/internal-telemetry/
      prometheus/self-metrics:
        config:
          scrape_configs:
          - job_name: otel-self-metrics
            scrape_interval: 1m
            static_configs:
            - targets:
              - localhost:8888

    processors:
      # The batch processor is in place to regulate both the number of requests
      # being made and the size of those requests.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
      batch:
        send_batch_max_size: 200
        send_batch_size: 200
        timeout: 5s

      # The memorylimiter will check the memory usage of the collector process.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor
      memory_limiter:
        check_interval: 1s
        limit_percentage: 65
        spike_limit_percentage: 20

      # The resourcedetection processor is configured to detect GCP resources.
      # Resource attributes that represent the GCP resource the collector is
      # running on will be attached to all telemetry that goes through this
      # processor.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#gcp-metadata
      resourcedetection:
        detectors: [gcp]
        timeout: 10s

      transform/collision:
        metric_statements:
        - context: datapoint
          statements:
          - set(attributes["exported_location"], attributes["location"])
          - delete_key(attributes, "location")
          - set(attributes["exported_cluster"], attributes["cluster"])
          - delete_key(attributes, "cluster")
          - set(attributes["exported_namespace"], attributes["namespace"])
          - delete_key(attributes, "namespace")
          - set(attributes["exported_job"], attributes["job"])
          - delete_key(attributes, "job")
          - set(attributes["exported_instance"], attributes["instance"])
          - delete_key(attributes, "instance")
          - set(attributes["exported_project_id"], attributes["project_id"])
          - delete_key(attributes, "project_id")

    exporters:
      # The googlecloud exporter will export telemetry to different
      # Google Cloud services:
      # Logs -> Cloud Logging
      # Metrics -> Cloud Monitoring
      # Traces -> Cloud Trace
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter
      googlecloud:
        log:
          default_log_name: opentelemetry-collector

      # The googlemanagedprometheus exporter will send metrics to
      # Google Managed Service for Prometheus.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter
      googlemanagedprometheus:

    service:
      pipelines:
        logs:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - memory_limiter
          - batch
          exporters:
          - googlecloud
        metrics/otlp:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - transform/collision
          - memory_limiter
          - batch
          exporters:
          - googlemanagedprometheus
        metrics/self-metrics:
          receivers:
          - prometheus/self-metrics
          processors:
          - resourcedetection
          - transform/collision
          - memory_limiter
          - batch
          exporters:
          - googlemanagedprometheus
        traces:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - memory_limiter
          - batch
          exporters:
          - googlecloud
      telemetry:
        metrics:
          address: localhost:8888
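One way to deliver the cloud-init file is as the VM's user-data metadata. The following sketch assumes that you saved the file locally as cloud-init.yaml (a hypothetical file name). It also assumes that the file's first line is #cloud-config, which cloud-init requires before it treats user-data as a cloud-config document. apply_cloud_init is a hypothetical helper.

```shell
# Sketch: attach a cloud-init file to an existing VM as user-data metadata.
apply_cloud_init() {
  vm="${1:?usage: apply_cloud_init VM_NAME ZONE [FILE]}"
  zone="${2:?missing ZONE}"
  file="${3:-cloud-init.yaml}"
  # cloud-init ignores user-data that doesn't start with #cloud-config.
  if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
    echo "error: $file must start with #cloud-config" >&2
    return 1
  fi
  gcloud compute instances add-metadata "$vm" \
    --zone="$zone" \
    --metadata-from-file=user-data="$file"
}
```

cloud-init reads user-data when the VM boots, so restart the VM after you attach the file. Alternatively, pass the same --metadata-from-file flag to gcloud compute instances create when you create the VM.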

We recommend that you create a Docker bridge network to facilitate communication between the Collector container and any other container on the system that will be sending telemetry. To create the network, run the following command:

docker network create -d bridge otel

Run the Collector container with the following command:

docker run -d \
    --network otel \
    --name opentelemetry-collector \
    -v /etc/config:/etc/config \
    us-docker.pkg.dev/cloud-ops-agents-artifacts/google-cloud-opentelemetry-collector/otelcol-google:0.121.0 \
    --config=/etc/config/config.yaml

The previous command does the following:

  • Runs the Collector container in the background.
  • Attaches the Collector container to the previously created otel bridge network. Other containers can be connected to the bridge to send telemetry.
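  • Mounts the host's /etc/config directory, which contains your configuration file, into the container at the same path, and passes the file to the Collector with the --config flag.
  • Names the container opentelemetry-collector. Other containers on the otel network can use this name as a hostname.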
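Before you send telemetry, you might want to confirm that the container started and that the Collector accepted its configuration. The following sketch is meant to be run on the VM and uses standard docker commands. check_collector is a hypothetical helper, and the container name defaults to opentelemetry-collector.

```shell
# Sketch: show the Collector container's status and recent log output.
check_collector() {
  name="${1:-opentelemetry-collector}"
  # -a includes stopped containers, so a crashed Collector shows "Exited (...)".
  docker ps -a --filter "name=${name}" --format '{{.Status}}'
  # Print the last 50 lines that the Collector wrote to stdout and stderr.
  docker logs --tail 50 "$name"
}
```

If the status shows that the container exited, the log output usually contains the configuration error that stopped the Collector.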

Configure the Collector

We provide an OpenTelemetry Collector configuration for you to use with the Google-Built Collector. This configuration is designed to deliver high volumes of OTLP metrics, logs, and traces, and to prevent common ingestion issues. You can add to the configuration, but we strongly recommend that you don't remove elements.

This section describes the provided configuration, its key components (exporters, processors, and receivers), and the other available components.
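Because the configuration is meant to be extended, it can help to validate an edited file before you restart the Collector. The following sketch relies on two assumptions. First, the image's entrypoint is the Collector binary, because the docker run command in Deploy the Collector passes --config directly to it. Second, the binary includes the standard OpenTelemetry Collector validate subcommand. Both function names are hypothetical.

```shell
# Sketch: check an edited config, then restart the running Collector.
COLLECTOR_IMAGE="us-docker.pkg.dev/cloud-ops-agents-artifacts/google-cloud-opentelemetry-collector/otelcol-google:0.121.0"

validate_collector_config() {
  # Run a throwaway container that only parses and validates the file.
  docker run --rm \
    -v /etc/config:/etc/config \
    "$COLLECTOR_IMAGE" \
    validate --config=/etc/config/config.yaml
}

restart_collector() {
  # The Collector reads its configuration at startup.
  docker restart "${1:-opentelemetry-collector}"
}
```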

Provided Collector configuration

You can find the Collector configuration in the google-built-opentelemetry-collector directory of the opentelemetry-operations-collector repository:

receivers:
  # Open two OTLP servers:
  # - On port 4317, open an OTLP GRPC server
  # - On port 4318, open an OTLP HTTP server
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        cors:
          # This effectively allows any origin
          # to make requests to the HTTP server.
          allowed_origins:
          - http://*
          - https://*
        endpoint: localhost:4318

  # Using the prometheus scraper, scrape the Collector's self metrics.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver
  # https://opentelemetry.io/docs/collector/internal-telemetry/
  prometheus/self-metrics:
    config:
      scrape_configs:
      - job_name: otel-self-metrics
        scrape_interval: 1m
        static_configs:
        - targets:
          - localhost:8888

processors:
  # The batch processor is in place to regulate both the number of requests
  # being made and the size of those requests.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  # The memorylimiter will check the memory usage of the collector process.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  # The resourcedetection processor is configured to detect GCP resources.
  # Resource attributes that represent the GCP resource the collector is
  # running on will be attached to all telemetry that goes through this
  # processor.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#gcp-metadata
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  # The transform/collision processor ensures that any attributes that may
  # collide with the googlemanagedprometheus exporter's monitored resource
  # construction are moved to a similar name that is not reserved.
  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

exporters:
  # The googlecloud exporter will export telemetry to different
  # Google Cloud services:
  # Logs -> Cloud Logging
  # Metrics -> Cloud Monitoring
  # Traces -> Cloud Trace
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter
  googlecloud:
    log:
      default_log_name: opentelemetry-collector

  # The googlemanagedprometheus exporter will send metrics to
  # Google Managed Service for Prometheus.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter
  googlemanagedprometheus:

service:
  pipelines:
    logs:
      receivers:
      - otlp
      processors:
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlecloud
    metrics/otlp:
      receivers:
      - otlp
      processors:
      - transform/collision
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlemanagedprometheus
    metrics/self-metrics:
      receivers:
      - prometheus/self-metrics
      processors:
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlemanagedprometheus
    traces:
      receivers:
      - otlp
      processors:
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlecloud
  telemetry:
    metrics:
      address: localhost:8888

Exporters

The Collector configuration includes the following exporters:

  • googlecloud exporter, for logs and traces. This exporter is configured with a default log name.

  • googlemanagedprometheus exporter, for metrics. This exporter doesn't require any configuration, but it supports optional settings. For information about configuration options for the googlemanagedprometheus exporter, see Get started with the OpenTelemetry Collector in the Google Cloud Managed Service for Prometheus documentation.

Processors

The Collector configuration includes the following processors:

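  • batch: Configured to send a batch when it reaches 200 entries (the Google Cloud maximum number of entries per request) or after 5 seconds (the Google Cloud minimum interval), whichever comes first.

  • memory_limiter: Caps the Collector's memory usage to prevent out-of-memory crashes. The limit is 65% of the memory available to the Collector, with a 20% allowance for spikes. When usage exceeds the limit minus the spike allowance, the processor refuses incoming data.

  • resourcedetection: Automatically detects Google Cloud resource labels such as project_id.

  • transform/collision: Moves metric attributes that would collide with the labels that the googlemanagedprometheus exporter uses for its monitored resource (location, cluster, namespace, job, instance, and project_id) to names with the exported_ prefix.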
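To make the processor settings concrete, the following sketch computes two sets of values. The first is the thresholds implied by limit_percentage: 65 and spike_limit_percentage: 20. The second is the flush interval implied by send_batch_size: 200 and timeout: 5s. The sketch assumes a steady input rate and that you supply the total memory available to the Collector in MiB. The function names are hypothetical, and the arithmetic mirrors the documented behavior rather than the processors' code.

```shell
# Sketch: memory_limiter thresholds for a given amount of memory (MiB).
memory_limiter_thresholds() {
  total="${1:?usage: memory_limiter_thresholds TOTAL_MIB}"
  hard=$(( total * 65 / 100 ))    # limit_percentage: 65
  spike=$(( total * 20 / 100 ))   # spike_limit_percentage: 20
  soft=$(( hard - spike ))        # incoming data is refused above this level
  echo "hard=${hard}MiB soft=${soft}MiB"
}

# Sketch: seconds between batch flushes for a steady rate of items/second,
# given send_batch_size: 200 and timeout: 5s (whichever is reached first).
batch_flush_seconds() {
  rate="${1:?usage: batch_flush_seconds ITEMS_PER_SECOND}"
  awk -v r="$rate" 'BEGIN { t = 200 / r; if (t > 5) t = 5; printf "%.1f\n", t }'
}
```

For example, with 4096 MiB available, memory_limiter_thresholds 4096 prints hard=2662MiB soft=1843MiB. Now consider the telemetrygen log rate of 3 records per second that this document uses later. At that rate, 200 records would take about 67 seconds to accumulate, so the 5-second timeout fires first: batch_flush_seconds 3 prints 5.0, and each batch holds about 15 records.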

Receivers

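The Collector configuration includes two receivers. The otlp receiver accepts OTLP logs, metrics, and traces from your applications. The prometheus/self-metrics receiver scrapes the Collector's own metrics. For information about instrumenting your applications to send OTLP telemetry to the Collector's OTLP endpoint, see Choose an instrumentation approach.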
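On a Container-Optimized OS VM, an instrumented application typically runs as another container on the otel bridge network. The following sketch points the application's OpenTelemetry SDK at the Collector by using the standard OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_PROTOCOL, and OTEL_SERVICE_NAME environment variables. APP_IMAGE and the default service name my-service are hypothetical placeholders. The sketch also assumes that the Collector's OTLP receiver listens on an address reachable from other containers. An endpoint of localhost:4318 inside the Collector container accepts connections only from within that container.

```shell
# Sketch: run an application container that exports OTLP over HTTP to the
# Collector, which Docker's DNS resolves by container name on the network.
run_instrumented_app() {
  app_image="${1:?usage: run_instrumented_app APP_IMAGE [SERVICE_NAME]}"
  service_name="${2:-my-service}"
  docker run -d \
    --network otel \
    -e OTEL_EXPORTER_OTLP_ENDPOINT=http://opentelemetry-collector:4318 \
    -e OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
    -e OTEL_SERVICE_NAME="$service_name" \
    "$app_image"
}
```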

Available components

The Google-Built OpenTelemetry Collector contains the components that most users will need to enable a rich experience within Google Cloud Observability. For a complete list of available components, see Components in the opentelemetry-operations-collector repository.

To request changes or additions to the available components, open a feature request in the opentelemetry-operations-collector repository.
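To check which components a given image includes, the following sketch runs the image with the components subcommand. It assumes that the binary includes this standard OpenTelemetry Collector subcommand, which prints the built-in components grouped by type. list_collector_components is a hypothetical helper, and the tag defaults to the version used in Deploy the Collector.

```shell
# Sketch: print the components compiled into the Collector image.
list_collector_components() {
  tag="${1:-0.121.0}"
  docker run --rm \
    "us-docker.pkg.dev/cloud-ops-agents-artifacts/google-cloud-opentelemetry-collector/otelcol-google:${tag}" \
    components
}
```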

Generate telemetry

You can test your configuration by using the open-source telemetrygen tool. The OpenTelemetry project provides telemetrygen as a container image in the GitHub Container Registry.

If you changed the defaults used in the Docker commands in Deploy the Collector, then replace the following values in the commands before you run them:

  • otel: The name you specified when you created the Docker bridge network.
  • opentelemetry-collector: The name you specified when running the container.

Generate logs

To generate logs by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  logs --otlp-insecure --rate=3 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317

Generate metrics

To generate metrics by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  metrics --otlp-insecure --rate=0.1 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317

Generate traces

To generate traces by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  traces --otlp-insecure --rate=3 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317
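The three telemetrygen commands differ only in the signal and rate, so you can wrap them in one helper. The following sketch is a hypothetical convenience function and isn't part of telemetrygen. NETWORK and COLLECTOR are hypothetical environment variables that default to otel and opentelemetry-collector. The sketch also adds --rm so that finished generator containers are removed.

```shell
# Sketch: run telemetrygen for one signal against the Collector.
run_telemetrygen() {
  signal="${1:?usage: run_telemetrygen logs|metrics|traces [RATE] [DURATION]}"
  rate="${2:-3}"
  duration="${3:-5m}"
  case "$signal" in
    logs|metrics|traces) ;;
    *) echo "unknown signal: $signal" >&2; return 1 ;;
  esac
  docker run --rm \
    --net="${NETWORK:-otel}" \
    ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
    "$signal" --otlp-insecure --rate="$rate" --duration="$duration" \
    --otlp-endpoint="${COLLECTOR:-opentelemetry-collector}:4317"
}
```

For example, run_telemetrygen metrics 0.1 matches the metrics command in this section, and NETWORK=mynet COLLECTOR=mycollector run_telemetrygen traces targets renamed resources.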

After a few minutes, the telemetry generated by telemetrygen begins flowing through the Collector to Google Cloud, where you can view each signal in the Google Cloud console.

View telemetry

The Google-Built OpenTelemetry Collector sends metrics, logs, and traces from your instrumented applications to Google Cloud Observability. The Collector also sends self-observability metrics. The following sections describe how to view this telemetry.

View your metrics

The Google-Built OpenTelemetry Collector collects Prometheus metrics that you can view by using the Metrics Explorer. The metrics collected depend on the instrumentation of the app, although the Google-built Collector also writes some self-metrics.

To view the metrics collected by the Google-Built OpenTelemetry Collector, do the following:

  1. In the Google Cloud console, go to the Metrics explorer page:

    Go to Metrics explorer

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. In the toolbar of the Google Cloud console, select your Google Cloud project.
  3. In the Metric element, expand the Select a metric menu, enter Prometheus Target in the filter bar, and then use the submenus to select a specific resource type and metric:
    1. In the Active resources menu, select Prometheus Target.
    2. To select a metric, use the Active metric categories and Active metrics menus. Metrics collected by the Google-Built OpenTelemetry Collector have the prefix prometheus.googleapis.com.
    3. Click Apply.
  4. Configure how the data is viewed.

    When the measurements for a metric are cumulative, Metrics Explorer automatically normalizes the measured data by the alignment period, which results in the chart displaying a rate. For more information, see Kinds, types, and conversions.

    When integer or double values are measured, such as with counter metrics, Metrics Explorer automatically sums all time series. To change this behavior, set the first menu of the Aggregation entry to None.

    For more information about configuring a chart, see Select metrics when using Metrics Explorer.
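To check metrics without the console, the following sketch sends a PromQL instant query to Managed Service for Prometheus. It assumes three things: the Prometheus-compatible query endpoint is https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/query, gcloud is authenticated as a principal that can read monitoring data, and the query you pass names a metric that exists in your project. query_prometheus is a hypothetical helper.

```shell
# Sketch: run a PromQL instant query with an access token from gcloud.
query_prometheus() {
  project="${1:?usage: query_prometheus PROJECT_ID PROMQL}"
  promql="${2:?missing PROMQL}"
  # --get with --data-urlencode appends ?query=... to the URL, URL-encoded.
  curl -sS --get \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode "query=${promql}" \
    "https://monitoring.googleapis.com/v1/projects/${project}/location/global/prometheus/api/v1/query"
}
```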

View your traces

To view your trace data, do the following:

  1. In the Google Cloud console, go to the Trace explorer page:

    Go to Trace explorer

    You can also find this page by using the search bar.

  2. In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project.
  3. In the scatter plot, select a trace.
  4. In the Gantt chart on the Trace details panel, select a span.

    A panel opens that displays information about the traced request. These details include the method, status code, number of bytes, and the user agent of the caller.

  5. To view the logs associated with this trace, select the Logs & Events tab.

    The tab shows individual logs. To view the details of the log entry, expand the log entry. You can also click View Logs and view the log by using the Logs Explorer.

For more information about using the Cloud Trace explorer, see Find and explore traces.
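To list traces from the command line, the following sketch calls the Cloud Trace API v1 traces.list method with an access token from gcloud. It assumes that the API is enabled and that your gcloud principal can read traces. The view=ROOTSPAN parameter asks for the root span of each trace in addition to its ID. list_recent_traces is a hypothetical helper.

```shell
# Sketch: list up to five recent traces with their root spans.
list_recent_traces() {
  project="${1:?usage: list_recent_traces PROJECT_ID}"
  curl -sS \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://cloudtrace.googleapis.com/v1/projects/${project}/traces?pageSize=5&view=ROOTSPAN"
}
```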

View your logs

From the Logs Explorer, you can inspect your logs, and you can also view associated traces, when they exist.

  1. In the Google Cloud console, go to the Logs Explorer page:

    Go to Logs Explorer

    If you use the search bar to find this page, then select the result whose subheading is Logging.

  2. Locate a log entry from your instrumented app. To view the details, expand the log entry.

  3. Click Traces on a log entry with a trace message, and then select View trace details.

    A Trace details panel opens and displays the selected trace.

For more information about using the Logs Explorer, see View logs by using the Logs Explorer.
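You can also read log entries from the command line. With the provided configuration, the googlecloud exporter writes logs that don't set their own log name under the default log name opentelemetry-collector, which likely includes the telemetrygen logs. The following sketch assumes that default and reads recent entries. read_collector_logs is a hypothetical helper.

```shell
# Sketch: read the 10 most recent entries from the last hour that were
# written under the exporter's default_log_name.
read_collector_logs() {
  project="${1:?usage: read_collector_logs PROJECT_ID}"
  gcloud logging read \
    "logName=\"projects/${project}/logs/opentelemetry-collector\"" \
    --project="$project" \
    --freshness=1h \
    --limit=10
}
```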

Observe and debug the Collector

The Google-Built OpenTelemetry Collector automatically provides self-observability metrics to help you monitor its performance and ensure continued uptime of the OTLP ingestion pipeline.

To monitor the Collector, install the sample dashboard for the Collector. This dashboard offers at-a-glance insights into several metrics from the Collector, including uptime, memory usage, and API calls to Google Cloud Observability.

To install the dashboard, do the following:

  1. In the Google Cloud console, go to the Dashboards page:

    Go to Dashboards

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Click Dashboard Templates.
  3. Search for the OpenTelemetry Collector dashboard.
  4. Optional: To preview the dashboard, select it.
  5. Click Add dashboard to your list and then complete the dialog.

    The dialog lets you select the name of the dashboard, and add labels to the dashboard.

For more information about installing dashboards, see Install a dashboard template.