Deploy the OpenTelemetry Collector on Container-Optimized OS

This document describes how to run the Google-Built OpenTelemetry Collector on Container-Optimized OS to collect OTLP logs, metrics, and traces from instrumented applications and then export that data to Google Cloud.

Before you begin

Running the OpenTelemetry Collector requires the following resources:

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Cloud Logging, Cloud Monitoring, and Cloud Trace APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Go to project selector

Verify that billing is enabled for your Google Cloud project.

Enable the Cloud Logging, Cloud Monitoring, and Cloud Trace APIs.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the APIs

A Container-Optimized OS virtual machine (VM). If you don't have a Container-Optimized OS VM, then follow the instructions at Creating and configuring instances.
An installation of gcloud. For information about installing gcloud, see Install the Google Cloud CLI.

Configure permissions for the Collector

By default, Container-Optimized OS VMs use the Compute Engine default service account, PROJECT_NUMBER-compute@developer.gserviceaccount.com. This service account usually has the Identity and Access Management (IAM) roles necessary to write the metrics and logs described in this document:

If you are configuring a custom service account for your instance, see Manage access to service accounts.

Deploy the Collector

To run Google-Built OpenTelemetry Collector, you need to provide a configuration file for your Container-Optimized OS VM. You can use the cloud-init tool to write a configuration file. The following is a recommended cloud-init file for using the Google-built Collector:

write_files:
- path: /etc/config/config.yaml
  permissions: 0644
  owner: root
  content: |
    receivers:
      # Open two OTLP servers:
      # - On port 4317, open an OTLP GRPC server
      # - On port 4318, open an OTLP HTTP server
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317
          http:
            cors:
              # This effectively allows any origin
              # to make requests to the HTTP server.
              allowed_origins:
              - http://*
              - https://*
            endpoint: localhost:4318

    processors:
      # The batch processor is in place to regulate both the number of requests
      # being made and the size of those requests.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
      batch:
        send_batch_max_size: 200
        send_batch_size: 200
        timeout: 5s

      # The memorylimiter will check the memory usage of the collector process.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor
      memory_limiter:
        check_interval: 1s
        limit_percentage: 65
        spike_limit_percentage: 20

      # The resourcedetection processor is configured to detect GCP resources.
      # Resource attributes that represent the GCP resource the collector is
      # running on will be attached to all telemetry that goes through this
      # processor.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#gcp-metadata
      resourcedetection:
        detectors: [gcp]
        timeout: 10s

      transform/collision:
        metric_statements:
        - context: datapoint
          statements:
          - set(attributes["exported_location"], attributes["location"])
          - delete_key(attributes, "location")
          - set(attributes["exported_cluster"], attributes["cluster"])
          - delete_key(attributes, "cluster")
          - set(attributes["exported_namespace"], attributes["namespace"])
          - delete_key(attributes, "namespace")
          - set(attributes["exported_job"], attributes["job"])
          - delete_key(attributes, "job")
          - set(attributes["exported_instance"], attributes["instance"])
          - delete_key(attributes, "instance")
          - set(attributes["exported_project_id"], attributes["project_id"])
          - delete_key(attributes, "project_id")

    exporters:
      # The googlecloud exporter will export telemetry to different
      # Google Cloud services:
      # Logs -> Cloud Logging
      # Metrics -> Cloud Monitoring
      # Traces -> Cloud Trace
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter
      googlecloud:
        log:
          default_log_name: opentelemetry-collector

      # The googlemanagedprometheus exporter will send metrics to
      # Google Managed Service for Prometheus.
      #
      # Docs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter
      googlemanagedprometheus:

    service:
      pipelines:
        logs:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - memory_limiter
          - batch
          exporters:
          - googlecloud
        metrics/otlp:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - transform/collision
          - memory_limiter
          - batch
          exporters:
          - googlemanagedprometheus
        traces:
          receivers:
          - otlp
          processors:
          - resourcedetection
          - memory_limiter
          - batch
          exporters:
          - googlecloud
      # Internal telemetry for the collector supports both push and pull-based telemetry data transmission.
      # Leveraging the pre-configured OTLP receiver eliminates the need for an additional port.
      #
      # Docs:
      # https://opentelemetry.io/docs/collector/internal-telemetry/
      telemetry:
        metrics:
          readers:
            - periodic:
                exporter:
                  otlp:
                    protocol: grpc
                    endpoint: localhost:4317

We recommend that you create a Docker bridge network to facilitate communication between the Collector container and any other container on the system that will be sending telemetry. To create the network, run the following command:

docker network create -d bridge otel

Run the Collector container with the following command:

docker run -d \
    --network otel \
    --name opentelemetry-collector \
    -v /etc/config:/etc/config \
    us-docker.pkg.dev/cloud-ops-agents-artifacts/google-cloud-opentelemetry-collector/otelcol-google:0.137.0 \
    --config=/etc/config/config.yaml

The previous command does the following:

Runs the Collector container in the background.
Attaches the Collector container to the previously created otel bridge network. Other containers can be connected to the bridge to send telemetry.
Mounts your configuration file on the container so the file can be accessed to configure the Collector.

Configure the Collector

We provide an OpenTelemetry Collector configuration for you to use with the Google-built Collector. This configuration is designed to deliver high volumes of OTLP metrics, logs, and traces. This configuration is also designed to prevent common ingestion issues. You can add to the configuration, but we strongly recommend that you don't remove elements.

This section describes the provided configuration, the key components like exporters, processors, receivers, and other available components.

Provided Collector configuration

You can find the Collector configuration in the google-built-opentelemetry-collector directory on the opentelemetry-operations-collector repository:

receivers:
  # Open two OTLP servers:
  # - On port 4317, open an OTLP GRPC server
  # - On port 4318, open an OTLP HTTP server
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        cors:
          # This effectively allows any origin
          # to make requests to the HTTP server.
          allowed_origins:
          - http://*
          - https://*
        endpoint: localhost:4318

processors:
  # The batch processor is in place to regulate both the number of requests
  # being made and the size of those requests.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  # The memorylimiter will check the memory usage of the collector process.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/memorylimiterprocessor
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  # The resourcedetection processor is configured to detect GCP resources.
  # Resource attributes that represent the GCP resource the collector is
  # running on will be attached to all telemetry that goes through this
  # processor.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#gcp-metadata
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

exporters:
  # The googlecloud exporter will export telemetry to different
  # Google Cloud services:
  # Logs -> Cloud Logging
  # Metrics -> Cloud Monitoring
  # Traces -> Cloud Trace
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlecloudexporter
  googlecloud:
    log:
      default_log_name: opentelemetry-collector

  # The googlemanagedprometheus exporter will send metrics to
  # Google Managed Service for Prometheus.
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/googlemanagedprometheusexporter
  googlemanagedprometheus:

extensions:
  # Opens an endpoint on 13133 that can be used to check the
  # status of the collector. Since this does not configure the
  # `path` config value, the endpoint will default to `/`.
  #
  # When running on Cloud Run, this extension is required and not optional.
  # In other environments it is recommended but may not be required for operation
  # (i.e. in Container-Optimized OS or other GCE environments).
  #
  # Docs:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/healthcheckextension
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions:
  - health_check
  pipelines:
    logs:
      receivers:
      - otlp
      processors:
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlecloud
    metrics/otlp:
      receivers:
      - otlp
      processors:
      - resourcedetection
      - transform/collision
      - memory_limiter
      - batch
      exporters:
      - googlemanagedprometheus
    traces:
      receivers:
      - otlp
      processors:
      - resourcedetection
      - memory_limiter
      - batch
      exporters:
      - googlecloud
  # Internal telemetry for the collector supports both push and pull-based telemetry data transmission.
  # Leveraging the pre-configured OTLP receiver eliminates the need for an additional port.
  #
  # Docs:
  # https://opentelemetry.io/docs/collector/internal-telemetry/
  telemetry:
    metrics:
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: grpc
                endpoint: localhost:4317

Exporters

The Collector configuration includes the following exporters:

googlecloud exporter, for logs and traces. This exporter is configured with a default log name.
googlemanagedprometheus exporter, for metrics. This exporter does not require any configuration, but there are configuration options. For information about configuration options for the googlemanagedprometheus exporter, see Get started with the OpenTelemetry Collector in the Google Cloud Managed Service for Prometheus documentation.

Processors

The Collector configuration includes the following processors:

batch: Configured to batch telemetry requests at the Google Cloud maximum number of entries per request, or at the Google Cloud minimum interval of every 5 seconds (whichever comes first).
memory_limiter: Caps the Collector's memory usage to prevent out-of-memory crashes by dropping data points when the limit is exceeded.
resourcedetection: Automatically detects Google Cloud resource labels such as project_id.

Receivers

The Collector configuration includes only the otlp receiver. For information about instrumenting your applications to send OTLP traces and metrics to the Collector's OTLP endpoint, see Choose an instrumentation approach.

Available components

The Google-Built OpenTelemetry Collector contains the components that most users will need to enable a rich experience within Google Cloud Observability. For a complete list of available components, see Components in the opentelemetry-operations-collector repository.

To request any changes or additions to the available components, open a feature request. in the opentelemetry-operations-collector repository.

Generate telemetry

You can test your configuration by using the open-source telemetrygen tool. The OpenTelemetry project provides a container in the GitHub Container registry.

Before running the following commands, replace the following placeholders, if you changed the defaults used in the Docker commands in Deploy the Collector:

otel: The name you specified when you created the Docker bridge network.
opentelemetry-collector: The name you specified when running the container.

Generate logs

To generate logs by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  logs --otlp-insecure --rate=3 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317

Generate metrics

To generate metrics by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  metrics --otlp-insecure --rate=0.1 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317

Generate metrics

To generate traces by using the telemetrygen tool, run the following command:

docker run \
  --net=otel \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:v0.105.0 \
  traces --otlp-insecure --rate=3 --duration=5m \
  --otlp-endpoint=opentelemetry-collector:4317

After a few minutes, telemetry generated by the application begins flowing through the Collector to the Google Cloud console for each signal.

View telemetry

The Google-Built OpenTelemetry Collector sends metrics, logs, and traces from your instrumented applications to Google Cloud Observability. The Collector also sends self-observability metrics. The following sections describe how to view this telemetry.

View your metrics

The Google-Built OpenTelemetry Collector collects Prometheus metrics that you can view by using the Metrics Explorer. The metrics collected depend on the instrumentation of the app, although the Google-built Collector also writes some self-metrics.

To view the metrics collected by the Google-Built OpenTelemetry Collector, do the following:

In the Google Cloud console, go to the Metrics explorer page:
Go to Metrics explorer

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project or the app-enabled folder's management project.
In the Metric element, expand the Select a metric menu, enter Prometheus Target in the filter bar, and then use the submenus to select a specific resource type and metric:
1. In the Active resources menu, select Prometheus Target.
2. To select a metric, use the Active metric categories and Active metrics menus. Metrics collected by the Google-Built OpenTelemetry Collector have the prefix prometheus.googleapis.com.
3. Click Apply.
To add filters, which remove time series from the query results, use the Filter element.
Configure how the data is viewed.
When the measurements for a metric are cumulative, Metrics Explorer automatically normalizes the measured data by the alignment period, which results in the chart displaying a rate. For more information, see Kinds, types, and conversions.

When integer or double values are measured, such as with counter metrics, Metrics Explorer automatically sums all time series. To change this behavior, set the first menu of the Aggregation entry to None.

For more information about configuring a chart, see Select metrics when using Metrics Explorer.

View your traces

To view your trace data, do the following:

In the Google Cloud console, go to the Trace explorer page:
Go to Trace explorer

You can also find this page by using the search bar.
In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project or management project.
In the table section of the page, select a row.
In the Gantt chart on the Trace details panel, select a span.

A panel opens that displays information about the traced request. These details include the method, status code, number of bytes, and the user agent of the caller.
To view the logs associated with this trace, select the Logs & Events tab.

The tab shows individual logs. To view the details of the log entry, expand the log entry. You can also click View Logs and view the log by using the Logs Explorer.

For more information about using the Cloud Trace explorer, see Find and explore traces.

View your logs

From the Logs Explorer, you can inspect your logs, and you can also view associated traces, when they exist.

In the Google Cloud console, go to the Logs Explorer page:
Go to Logs Explorer

If you use the search bar to find this page, then select the result whose subheading is Logging.
Locate a log entry from your instrumented app. To view the details, expand the log entry.
Click Traces on a log entry with a trace message, and then select View trace details.

A Trace details panel opens and displays the selected trace.

For more information about using the Logs Explorer, see View logs by using the Logs Explorer.

Observe and debug the Collector

The Google-Built OpenTelemetry Collector automatically provides self-observability metrics to help you monitor its performance and ensure continued uptime of the OTLP ingestion pipeline.

To monitor the Collector, install the sample dashboard for the Collector. This dashboard offers at-a-glance insights into several metrics from the Collector, including uptime, memory usage, and API calls to Google Cloud Observability.

To install the dashboard, do the following:

In the Google Cloud console, go to the Dashboards page:
Go to Dashboards

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
Click Dashboard Templates.
Search for the OpenTelemetry Collector dashboard.
Optional: To preview the dashboard, select it.
Click Add dashboard to your list and then complete the dialog.

The dialog lets you select the name of the dashboard, and add labels to the dashboard.

For more information about installing dashboards, see Install a dashboard template.