Deploy Google-Built OpenTelemetry Collector on Google Kubernetes Engine

This document describes how to run the Google-Built OpenTelemetry Collector on Google Kubernetes Engine to collect OTLP logs, metrics, and traces from instrumented applications and then export that data to Google Cloud.

Before you begin

Running the Google-Built OpenTelemetry Collector requires the following resources:

  • A Google Cloud project with the Cloud Monitoring API, Cloud Trace API, and Cloud Logging API enabled.

    • If you don't have a Google Cloud project, then do the following:

      1. In the Google Cloud console, go to New Project:

        Create a New Project

      2. In the Project Name field, enter a name for your project and then click Create.

      3. Go to Billing:

        Go to Billing

      4. Select the project you just created if it isn't already selected at the top of the page.

      5. When prompted, choose an existing payment profile or create a new one.

      The Monitoring API, Trace API, and Logging API are enabled by default for new projects.

    • If you already have a Google Cloud project, then ensure that the Monitoring API, Trace API, and Logging API are enabled:

      Enable the APIs

  • A Kubernetes cluster. If you don't have a Kubernetes cluster, then follow the instructions in the Quickstart for GKE.

  • The following command-line tools:

    • gcloud
    • kubectl

    The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command:

    gcloud components list
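
If you prefer the command line to the console, you can also enable the three APIs with gcloud services enable. The following is a minimal sketch: the helper name enable_cmd is hypothetical, and it only prints the command so that you can review it before you run it.

```shell
# Service names for the Cloud Monitoring, Cloud Trace, and Cloud Logging APIs.
APIS="monitoring.googleapis.com cloudtrace.googleapis.com logging.googleapis.com"

# Print (rather than run) the command that enables the APIs for a project.
# enable_cmd is a hypothetical helper name used only for illustration.
enable_cmd() {
  echo "gcloud services enable $APIS --project=$1"
}

# Replace PROJECT_ID with the identifier of your project.
enable_cmd PROJECT_ID
```

Copy the printed command into your shell to enable the APIs.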
    

Configure permissions for the Collector

If you have disabled GKE workload identity, then you can skip this section.

To ensure that the OpenTelemetry Collector's Kubernetes service account has the necessary permissions to export telemetry, ask your administrator to grant the service account the following IAM roles on your project:

  • Logs Writer (roles/logging.logWriter)
  • Monitoring Metric Writer (roles/monitoring.metricWriter)
  • Cloud Trace Agent (roles/cloudtrace.agent)

For more information about granting roles, see Manage access to projects, folders, and organizations.

Your administrator might also be able to give the OpenTelemetry Collector's Kubernetes service account the required permissions through custom roles or other predefined roles.

To configure the permissions, use the following add-iam-policy-binding commands:

gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/logging.logWriter \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector
gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/monitoring.metricWriter \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector
gcloud projects add-iam-policy-binding projects/PROJECT_ID \
    --role=roles/cloudtrace.agent \
    --member=principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector

Before running the commands, replace the following variables:

  • PROJECT_ID: The identifier of the project.
  • PROJECT_NUMBER: The Google Cloud project number.
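
The three bindings differ only in the role, so you can also generate them in a loop. The sketch below assumes the same namespace (opentelemetry) and Kubernetes service account (opentelemetry-collector) as the preceding commands. The helper names collector_member and binding_cmds are hypothetical, and the commands are printed rather than executed so that you can review them first. If you don't know the project number, gcloud projects describe PROJECT_ID --format='value(projectNumber)' prints it.

```shell
# Build the principal for the Collector's Kubernetes service account.
# Arguments: PROJECT_ID PROJECT_NUMBER.
# collector_member is a hypothetical helper name.
collector_member() {
  echo "principal://iam.googleapis.com/projects/$2/locations/global/workloadIdentityPools/$1.svc.id.goog/subject/ns/opentelemetry/sa/opentelemetry-collector"
}

# Print one add-iam-policy-binding command per required role.
# binding_cmds is a hypothetical helper name.
binding_cmds() {
  local role
  for role in roles/logging.logWriter roles/monitoring.metricWriter roles/cloudtrace.agent; do
    echo "gcloud projects add-iam-policy-binding projects/$1 --role=$role --member=$(collector_member "$1" "$2")"
  done
}

# Replace the placeholders before you use the printed commands.
binding_cmds PROJECT_ID PROJECT_NUMBER
```

Keeping the roles in one list makes it harder to grant two of the roles and forget the third.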

Deploy the Collector

You can deploy the Collector pipeline directly from the vetted examples in the Self-Managed OTLP Kubernetes Ingestion repository on GitHub. To deploy, run the following commands:

export GCLOUD_PROJECT=PROJECT_ID
kubectl kustomize https://github.com/GoogleCloudPlatform/otlp-k8s-ingest.git/k8s/base | envsubst | kubectl apply -f -

Before running the commands, replace the following variable:

  • PROJECT_ID: The identifier of the project.
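
After you apply the manifests, you might want to confirm that the Collector is running. The following sketch assumes that the Collector runs in the opentelemetry namespace behind a Service named opentelemetry-collector, which is what the endpoint used later in this document implies. The helper name collector_status is hypothetical.

```shell
# Namespace and Service name are taken from the Collector endpoint
# opentelemetry-collector.opentelemetry.svc.cluster.local used in this document.
COLLECTOR_NAMESPACE="opentelemetry"
COLLECTOR_SERVICE="opentelemetry-collector"

# collector_status is a hypothetical helper name. It lists the Collector's
# pods and Service so that you can confirm that the deployment succeeded.
collector_status() {
  kubectl get pods --namespace "$COLLECTOR_NAMESPACE" &&
    kubectl get service "$COLLECTOR_SERVICE" --namespace "$COLLECTOR_NAMESPACE"
}
```

Run collector_status after the apply command finishes. Pods in the Running state indicate that the Collector started.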

Configure the Collector

We provide an OpenTelemetry Collector configuration for you to use with the Google-Built OpenTelemetry Collector. This configuration is designed to deliver high volumes of OTLP metrics, logs, and traces with consistent GKE and Kubernetes metadata attached, and to prevent common ingestion issues. You can add to the configuration, but we strongly recommend that you don't remove elements.

This section describes the provided configuration, its key components (exporters, processors, and receivers), and the other components that are available.

Provided Collector configuration

You can find the Collector configuration for Kubernetes environments in the otlp-k8s-ingest repository:

# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exporters:
  googlecloud:
    log:
      default_log_name: opentelemetry-collector
    user_agent: Google-Cloud-OTLP manifests:0.2.0 OpenTelemetry Collector Built By Google/0.121.0 (linux/amd64)
  googlemanagedprometheus:
    user_agent: Google-Cloud-OTLP manifests:0.2.0 OpenTelemetry Collector Built By Google/0.121.0 (linux/amd64)

extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  filter/self-metrics:
    metrics:
      include:
        match_type: strict
        metric_names:
        - otelcol_process_uptime
        - otelcol_process_memory_rss
        - otelcol_grpc_io_client_completed_rpcs
        - otelcol_googlecloudmonitoring_point_count
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.replicaset.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.pod.start_time
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  memory_limiter:
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

  metricstransform/self-metrics:
    transforms:
    - action: update
      include: otelcol_process_uptime
      operations:
      - action: add_label
        new_label: version
        new_value: Google-Cloud-OTLP manifests:0.2.0 OpenTelemetry Collector Built By Google/0.121.0 (linux/amd64)

  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform/collision:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  # The relative ordering of statements between ReplicaSet & Deployment and Job & CronJob is important.
  # The ordering of these controllers is decided based on the k8s controller documentation available at
  # https://kubernetes.io/docs/concepts/workloads/controllers.
  # The relative ordering of the other controllers in this list is inconsequential since they directly
  # create pods.
  transform/aco-gke:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["top_level_controller_type"], "ReplicaSet") where resource.attributes["k8s.replicaset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.replicaset.name"]) where resource.attributes["k8s.replicaset.name"] != nil
      - set(attributes["top_level_controller_type"], "Deployment") where resource.attributes["k8s.deployment.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.deployment.name"]) where resource.attributes["k8s.deployment.name"] != nil
      - set(attributes["top_level_controller_type"], "DaemonSet") where resource.attributes["k8s.daemonset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.daemonset.name"]) where resource.attributes["k8s.daemonset.name"] != nil
      - set(attributes["top_level_controller_type"], "StatefulSet") where resource.attributes["k8s.statefulset.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.statefulset.name"]) where resource.attributes["k8s.statefulset.name"] != nil
      - set(attributes["top_level_controller_type"], "Job") where resource.attributes["k8s.job.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.job.name"]) where resource.attributes["k8s.job.name"] != nil
      - set(attributes["top_level_controller_type"], "CronJob") where resource.attributes["k8s.cronjob.name"] != nil
      - set(attributes["top_level_controller_name"], resource.attributes["k8s.cronjob.name"]) where resource.attributes["k8s.cronjob.name"] != nil

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317
      http:
        cors:
          allowed_origins:
          - http://*
          - https://*
        endpoint: ${env:MY_POD_IP}:4318
  otlp/self-metrics:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:14317

service:
  extensions:
  - health_check
  pipelines:
    logs:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - resourcedetection
      - memory_limiter
      - batch
      receivers:
      - otlp
    metrics/otlp:
      exporters:
      - googlemanagedprometheus
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - transform/collision
      - transform/aco-gke
      - batch
      receivers:
      - otlp
    metrics/self-metrics:
      exporters:
      - googlemanagedprometheus
      processors:
      - filter/self-metrics
      - metricstransform/self-metrics
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - otlp/self-metrics
    traces:
      exporters:
      - googlecloud
      processors:
      - k8sattributes
      - memory_limiter
      - resourcedetection
      - batch
      receivers:
      - otlp
  telemetry:
    logs:
      encoding: json
    metrics:
      readers:
      - periodic:
          exporter:
            otlp:
              protocol: grpc
              endpoint: ${env:MY_POD_IP}:14317
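
The ordering comment on the transform/aco-gke processor means that later statements overwrite earlier ones. A pod owned by a ReplicaSet that is itself owned by a Deployment therefore ends up labeled with the Deployment. The following shell sketch imitates that last-match-wins behavior outside the Collector. It is an illustration only, not how the Collector evaluates statements internally, and the function name top_level_controller is hypothetical.

```shell
# Print the controller type and name that the last matching statement would
# set. Arguments are KEY=VALUE pairs that stand in for resource attributes.
# top_level_controller is a hypothetical helper name.
top_level_controller() {
  local type="" name="" kind key arg
  # Same order as the statements in transform/aco-gke.
  for kind in ReplicaSet Deployment DaemonSet StatefulSet Job CronJob; do
    key="k8s.$(printf '%s' "$kind" | tr '[:upper:]' '[:lower:]').name"
    for arg in "$@"; do
      # A later kind overwrites the values set by an earlier kind.
      if [ "${arg%%=*}" = "$key" ] && [ -n "${arg#*=}" ]; then
        type="$kind"
        name="${arg#*=}"
      fi
    done
  done
  printf '%s %s\n' "$type" "$name"
}

# The example attribute values are made up. Prints: Deployment web
top_level_controller k8s.replicaset.name=web-5d9c k8s.deployment.name=web
```

If the Deployment statements came before the ReplicaSet statements, the same pod would be labeled with the ReplicaSet instead, which is why the order matters for those pairs.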

Exporters

The Collector configuration includes the following exporters:

  • googlecloud exporter, for logs and traces. This exporter is configured with a default log name.

  • googlemanagedprometheus exporter, for metrics. This exporter does not require any configuration, but there are configuration options. For information about configuration options for the googlemanagedprometheus exporter, see Get started with the OpenTelemetry Collector in the Google Cloud Managed Service for Prometheus documentation.

Processors

The Collector configuration includes the following processors:

  • batch: Sends a batch when it reaches 200 entries, which the configuration sets to match the Google Cloud maximum number of entries per request, or when 5 seconds have elapsed, which matches the Google Cloud minimum interval, whichever comes first.

  • memory_limiter: Caps the Collector's memory usage to prevent out-of-memory crashes by dropping data points when the limit is exceeded.

  • resourcedetection: Automatically detects Google Cloud resource labels such as project_id and cluster_name.

  • k8sattributes: Automatically maps Kubernetes resource attributes to telemetry labels.

  • transform: Renames metric labels that collide with labels on Google Cloud monitored resources.
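
To make the memory_limiter percentages concrete, the following sketch converts them to absolute values. It assumes the behavior described in the memory_limiter processor's documentation: limit_percentage is the hard limit, and the soft limit, at which the processor starts refusing data, is the hard limit minus spike_limit_percentage. The function name memory_limits and the 2000 MiB figure are hypothetical.

```shell
# Print the memory_limiter thresholds, in MiB, for a given amount of total
# memory. Arguments: TOTAL_MIB LIMIT_PERCENTAGE SPIKE_LIMIT_PERCENTAGE.
# memory_limits is a hypothetical helper name.
memory_limits() {
  local total="$1" limit_pct="$2" spike_pct="$3"
  local hard=$(( total * limit_pct / 100 ))
  local soft=$(( total * (limit_pct - spike_pct) / 100 ))
  echo "hard=${hard}MiB soft=${soft}MiB"
}

# Percentages from the provided configuration, applied to a hypothetical
# 2000 MiB of memory. Prints: hard=1300MiB soft=900MiB
memory_limits 2000 65 20
```

With these settings, the Collector would begin refusing data at about 45% of the memory that is available to it, well before the 65% hard limit.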

Receivers

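The Collector configuration includes only the otlp receiver, in two instances: otlp receives telemetry from your applications over gRPC (port 4317) and HTTP (port 4318), and otlp/self-metrics receives the Collector's own metrics over gRPC (port 14317). For information about instrumenting your applications to send OTLP traces and metrics to the Collector's OTLP endpoint, see Choose an instrumentation approach.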

Available components

The Google-Built OpenTelemetry Collector contains the components that most users will need to enable a rich experience within Google Cloud Observability. For a complete list of available components, see Components in the opentelemetry-operations-collector repository.

To request any changes or additions to the available components, open a feature request in the opentelemetry-operations-collector repository.

Generate telemetry

This section describes how to deploy a sample application, point that application to the Collector's OTLP endpoint, and view the telemetry in Google Cloud. The sample application is a small generator that exports traces, logs, and metrics to the Collector.

If you already have an application instrumented with an OpenTelemetry SDK, then you can point your application to the Collector's endpoint instead.

To deploy the sample application, run the following command:

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/otlp-k8s-ingest/main/sample/app.yaml

To point existing applications that use the OpenTelemetry SDK at the Collector's endpoint, set the OTEL_EXPORTER_OTLP_ENDPOINT environment variable to http://opentelemetry-collector.opentelemetry.svc.cluster.local:4317.
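
As a sketch, the endpoint can be composed from the Service name, the namespace, and the gRPC port that this document uses. OTEL_EXPORTER_OTLP_PROTOCOL is a standard OpenTelemetry SDK variable that isn't mentioned elsewhere on this page. Setting it to grpc is an assumption based on port 4317 being the Collector's gRPC listener, and some SDKs already default to gRPC.

```shell
# Compose the in-cluster OTLP endpoint from the Service name, namespace,
# and gRPC port that this document uses.
COLLECTOR_SERVICE="opentelemetry-collector"
COLLECTOR_NAMESPACE="opentelemetry"
OTLP_GRPC_PORT=4317

export OTEL_EXPORTER_OTLP_ENDPOINT="http://${COLLECTOR_SERVICE}.${COLLECTOR_NAMESPACE}.svc.cluster.local:${OTLP_GRPC_PORT}"
# Assumption: select gRPC explicitly because port 4317 is the gRPC listener.
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"

echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```

For a workload that already runs in the cluster, one option is kubectl set env deployment/MY_APP OTEL_EXPORTER_OTLP_ENDPOINT=ENDPOINT, where MY_APP is a hypothetical Deployment name and ENDPOINT is the value printed by the preceding commands.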

After a few minutes, telemetry generated by the application begins flowing through the Collector to the Google Cloud console for each signal.

View telemetry

The Google-Built OpenTelemetry Collector sends metrics, logs, and traces from your instrumented applications to Google Cloud Observability. The Collector also sends self-observability metrics. The following sections describe how to view this telemetry.

View your metrics

The Google-Built OpenTelemetry Collector collects Prometheus metrics that you can view by using the Metrics Explorer. The metrics collected depend on the instrumentation of the app, although the Collector also writes some self-metrics.

To view the metrics collected by the Google-Built OpenTelemetry Collector, do the following:
  1. In the Google Cloud console, go to the Metrics explorer page:

    Go to Metrics explorer

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. In the toolbar of the Google Cloud console, select your Google Cloud project.
  3. In the Metric element, expand the Select a metric menu, enter Prometheus Target in the filter bar, and then use the submenus to select a specific resource type and metric:
    1. In the Active resources menu, select Prometheus Target.
    2. To select a metric, use the Active metric categories and Active metrics menus. Metrics collected by the Google-Built OpenTelemetry Collector have the prefix prometheus.googleapis.com.
    3. Click Apply.
  4. Configure how the data is viewed.

    When the measurements for a metric are cumulative, Metrics Explorer automatically normalizes the measured data by the alignment period, which results in the chart displaying a rate. For more information, see Kinds, types, and conversions.

    When integer or double values are measured, such as with counter metrics, Metrics Explorer automatically sums all time series. To change this behavior, set the first menu of the Aggregation entry to None.

    For more information about configuring a chart, see Select metrics when using Metrics Explorer.

View your traces

To view your trace data, do the following:

  1. In the Google Cloud console, go to the Trace explorer page:

    Go to Trace explorer

    You can also find this page by using the search bar.

  2. In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project.
  3. In the scatter plot, select a trace.
  4. In the Gantt chart on the Trace details panel, select a span.

    A panel opens that displays information about the traced request. These details include the method, status code, number of bytes, and the user agent of the caller.

  5. To view the logs associated with this trace, select the Logs & Events tab.

    The tab shows individual logs. To view the details of the log entry, expand the log entry. You can also click View Logs and view the log by using the Logs Explorer.

For more information about using the Cloud Trace explorer, see Find and explore traces.

View your logs

From the Logs Explorer, you can inspect your logs, and you can also view associated traces, when they exist.

  1. In the Google Cloud console, go to the Logs Explorer page:

    Go to Logs Explorer

    If you use the search bar to find this page, then select the result whose subheading is Logging.

  2. Locate a log entry from your instrumented app. To view the details, expand the log entry.

  3. Click Traces on a log entry with a trace message, and then select View trace details.

    A Trace details panel opens and displays the selected trace.

For more information about using the Logs Explorer, see View logs by using the Logs Explorer.
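
You can also query the same logs from the command line. The provided configuration sets default_log_name to opentelemetry-collector, so log entries that don't carry their own log name are expected under that log. The following sketch builds a matching filter. The helper name log_filter is hypothetical, and your application's logs might use a different log name.

```shell
# Log name from default_log_name in the provided Collector configuration.
DEFAULT_LOG_NAME="opentelemetry-collector"

# Print a Logging query filter that matches the default log in a project.
# log_filter is a hypothetical helper name used only for illustration.
log_filter() {
  echo "logName=\"projects/$1/logs/${DEFAULT_LOG_NAME}\""
}

# Replace PROJECT_ID with the identifier of your project.
log_filter PROJECT_ID
```

To read recent entries, pass the filter to gcloud, for example gcloud logging read "$(log_filter PROJECT_ID)" --project=PROJECT_ID --limit=10. You can paste the same filter into the query field of the Logs Explorer.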

Observe and debug the Collector

The Google-Built OpenTelemetry Collector automatically provides self-observability metrics to help you monitor its performance and ensure continued uptime of the OTLP ingestion pipeline.

To monitor the Collector, install the sample dashboard for the Collector. This dashboard offers at-a-glance insights into several metrics from the Collector, including uptime, memory usage, and API calls to Google Cloud Observability.

To install the dashboard, do the following:

  1. In the Google Cloud console, go to the Dashboards page:

    Go to Dashboards

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Click Dashboard Templates.
  3. Search for the OpenTelemetry Collector dashboard.
  4. Optional: To preview the dashboard, select it.
  5. Click Add dashboard to your list and then complete the dialog.

    The dialog lets you select the name of the dashboard, and add labels to the dashboard.

For more information about installing dashboards, see Install a dashboard template.