Observability for proxyless gRPC

Cloud Service Mesh Observability for proxyless gRPC builds on the existing gRPC OpenTelemetry plugin. It records metrics (such as latency and message sizes) for all gRPC channels and servers that are Cloud Service Mesh-enabled, and it provides additional attributes that show topological mesh information for Cloud Service Mesh traffic. A gRPC channel is considered Cloud Service Mesh-enabled if it gets its configuration from the Cloud Service Mesh control plane. All gRPC servers are considered Cloud Service Mesh-enabled.

Mesh Attributes

The following mesh attributes are available on metrics.

Local Environment Labels:

  • csm.mesh_id
    • The mesh ID.
  • Other local environment attributes are obtained from the OpenTelemetry Resource.
    • Managed Service for Prometheus (GMP) can be set up to use Google infrastructure to store metrics. If you use it, resource attributes that describe the application's local environment are automatically added as a MonitoredResource.
    • If you use non-Google infrastructure for exporting and storing metrics, the collection pipeline should add attributes to the metrics that describe the environment that the application runs in.

Remote Environment Labels:

  • csm.remote_workload_type
    • The type of the remote peer ("gcp_kubernetes_engine" for GKE).
  • Additional attributes are present depending on the type of the peer.
    • For a peer running on GKE:
    • csm.remote_workload_project_id
      • The identifier of the project associated with this resource, such as "my-project".
    • csm.remote_workload_location
      • The physical location of the cluster that contains the container.
    • csm.remote_workload_cluster_name
      • The cluster where the container is running.
    • csm.remote_workload_namespace_name
      • The namespace where the container is running.
    • csm.remote_workload_name
      • The name of the remote workload. This should be the name of the object that contains the Pod definition (for example, a Deployment, ReplicaSet, or just the Pod name for a bare Pod).

Service Labels: Information about the backend service (xDS cluster) that the RPC is being routed to. These labels are only available if the backend service has been configured through the Gateway API.

  • csm.service_name
    • The service name.
  • csm.service_namespace_name
    • The service namespace name.

The term remote_workload refers to the peer. For clients, the remote workload is the server Pod that is the target of an RPC. For servers, the remote workload is the client Pod that initiated the RPC.

Note that these attributes aren't available on grpc.client.attempt.started and grpc.server.call.started because not all topological mesh information is available at the collection point of these metrics.
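
As a rough illustration of the rules above, the following sketch lists which remote and service attribute keys one might expect on a given metric. The helper `expected_peer_attributes` and its parameters are hypothetical and not part of any gRPC API. Only the attribute and metric names come from this section.

```python
# Hypothetical helper for illustration only; not part of gRPC.
# Metrics recorded before the peer's mesh information is known.
STARTED_METRICS = {"grpc.client.attempt.started", "grpc.server.call.started"}

GKE_REMOTE_ATTRIBUTES = [
    "csm.remote_workload_project_id",
    "csm.remote_workload_location",
    "csm.remote_workload_cluster_name",
    "csm.remote_workload_namespace_name",
    "csm.remote_workload_name",
]

SERVICE_ATTRIBUTES = ["csm.service_name", "csm.service_namespace_name"]


def expected_peer_attributes(metric_name, remote_workload_type=None,
                             via_gateway_api=False):
    """Return the remote and service attribute keys expected on a metric."""
    if metric_name in STARTED_METRICS:
        return []  # topology is not yet available at this collection point
    attributes = []
    if remote_workload_type is not None:
        attributes.append("csm.remote_workload_type")
    if remote_workload_type == "gcp_kubernetes_engine":
        attributes.extend(GKE_REMOTE_ATTRIBUTES)
    if via_gateway_api:
        attributes.extend(SERVICE_ATTRIBUTES)
    return attributes
```

For example, calling the helper with "grpc.client.attempt.started" returns an empty list regardless of the peer type, which mirrors the note above.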

Observability setup instructions

This section explains how to enable Cloud Service Mesh Observability for proxyless gRPC on a service mesh setup.

C++

Observability support is only available through the Bazel build system. Add the target grpcpp_csm_observability as a dependency.

Required code changes

The following code needs to be added to your gRPC clients and servers in order to make use of Cloud Service Mesh observability.

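#include <cassert>
#include <utility>

#include <grpcpp/ext/csm_observability.h>

int main() {
  // …
  // meter_provider is a MeterProvider that you configure with exporters
  // before this point.
  auto observability = grpc::CsmObservabilityBuilder()
                          .SetMeterProvider(std::move(meter_provider))
                          .BuildAndRegister();
  // Stop early if the plugin could not be built and registered.
  assert(observability.ok());
  // …
}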

Before any gRPC operations, including creating a channel, server, or credentials, use the CsmObservabilityBuilder API to register a plugin. The following sample shows how to set up Cloud Service Mesh Observability with a Prometheus exporter.

  opentelemetry::exporter::metrics::PrometheusExporterOptions opts;
  opts.url = "0.0.0.0:9464";
  auto prometheus_exporter =
      opentelemetry::exporter::metrics::PrometheusExporterFactory::Create(opts);
  auto meter_provider =
      std::make_shared<opentelemetry::sdk::metrics::MeterProvider>();
  meter_provider->AddMetricReader(std::move(prometheus_exporter));
  auto observability = grpc::CsmObservabilityBuilder()
                          .SetMeterProvider(std::move(meter_provider))
                          .BuildAndRegister();

The SetMeterProvider() method on CsmObservabilityBuilder lets you set a MeterProvider object that can be configured with exporters.

Java

To enable Cloud Service Mesh Observability for Java gRPC applications, perform the following steps:

  1. Ensure that your project includes the grpc-gcp-csm-observability artifact. Use gRPC version 1.65.0 or later.

  2. Within the main() method, initialize Cloud Service Mesh Observability by providing a configured OpenTelemetry SDK instance with a MeterProvider to collect and export metrics.

    Before you perform any gRPC operations, such as setting up a channel or server, use the CsmObservability.Builder() API to register the OpenTelemetry SDK.

    Once the CsmObservability instance is created, invoking registerGlobal() on the instance enables Cloud Service Mesh Observability for all Cloud Service Mesh channels and servers.

    The following example demonstrates how to set up Cloud Service Mesh Observability using a Prometheus exporter.

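import io.grpc.gcp.csm.observability.CsmObservability;
import io.opentelemetry.exporter.prometheus.PrometheusHttpServer;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
...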

public static void main(String[] args) {
    ...

    int prometheusPort = 9464;

    SdkMeterProvider sdkMeterProvider = SdkMeterProvider.builder()
        .registerMetricReader(
            PrometheusHttpServer.builder().setPort(prometheusPort).build())
        .build();

    OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
        .setMeterProvider(sdkMeterProvider)
        .build();

    CsmObservability observability = new CsmObservability.Builder()
        .sdk(openTelemetrySdk)
        .build();
    observability.registerGlobal();

    // ... (continue with channel and server configuration)
}

Go

Before any gRPC operations, including creating a ClientConn, Server, or credentials, configure Cloud Service Mesh Observability globally with a MeterProvider. The following sample shows how to set up Cloud Service Mesh Observability. After you set it up, any Cloud Service Mesh channels and all servers pick up an OpenTelemetry stats plugin that is configured with the provided options and with additional Cloud Service Mesh labels. Channels that are not Cloud Service Mesh-enabled get an OpenTelemetry stats plugin without Cloud Service Mesh labels.

package main

import (
	"context"

	"google.golang.org/grpc/stats/opentelemetry"
	"google.golang.org/grpc/stats/opentelemetry/csm"

	"go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// A manual reader keeps the sample short. To publish metrics, use a
	// reader that is backed by an exporter instead.
	reader := metric.NewManualReader()
	provider := metric.NewMeterProvider(metric.WithReader(reader))
	opts := opentelemetry.Options{
		MetricsOptions: opentelemetry.MetricsOptions{
			MeterProvider: provider,
		},
	}
	// EnableObservability returns a function that undoes the global setup.
	cleanup := csm.EnableObservability(context.Background(), opts)
	defer cleanup()
	// Any created ClientConns and servers will be configured with an
	// OpenTelemetry stats plugin configured with provided options.
}

Python

The following gRPC dependencies are required for Cloud Service Mesh Observability:

grpcio>=1.65.0
grpcio-observability>=1.65.0
grpcio-csm-observability>=1.65.0

Before any gRPC operations, including creating a channel, server, or credentials, use the CsmOpenTelemetryPlugin API to create and register a plugin:

import grpc_csm_observability

# ...
csm_plugin = grpc_csm_observability.CsmOpenTelemetryPlugin(
    meter_provider=your_meter_provider,  # replace with your MeterProvider
)
csm_plugin.register_global()

# Create server or client 

After all gRPC operations, use the following code to deregister and clean up resources:

csm_plugin.deregister_global()
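
One way to make sure that deregistration always runs, even when an error occurs, is to wrap the two calls in a context manager. The following is a minimal sketch under that assumption. The helper `registered_globally` and the `FakePlugin` stand-in are hypothetical and exist only so that the example runs without gRPC installed. In real code you would pass your CsmOpenTelemetryPlugin instance instead.

```python
import contextlib


@contextlib.contextmanager
def registered_globally(plugin):
    """Register `plugin` globally and always deregister it on exit."""
    plugin.register_global()
    try:
        yield plugin
    finally:
        plugin.deregister_global()


class FakePlugin:
    """Hypothetical stand-in that records calls instead of touching gRPC."""

    def __init__(self):
        self.events = []

    def register_global(self):
        self.events.append("register")

    def deregister_global(self):
        self.events.append("deregister")


fake = FakePlugin()
with registered_globally(fake):
    # Create channels or servers and run gRPC operations here.
    fake.events.append("serve")
```

The `finally` clause means the plugin is deregistered whether the body of the `with` block finishes normally or raises.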

The following sample shows how to set up Cloud Service Mesh Observability with a Prometheus exporter:

from grpc_csm_observability import CsmOpenTelemetryPlugin
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

start_http_server(port=9464, addr="0.0.0.0")
reader = PrometheusMetricReader()
meter_provider = MeterProvider(metric_readers=[reader])
csm_plugin = CsmOpenTelemetryPlugin(
    meter_provider=meter_provider,
)
csm_plugin.register_global()

# Clean up after use

csm_plugin.deregister_global()

In the previous sample, you can scrape localhost:9464/metrics to get the metrics reported by Cloud Service Mesh Observability.
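
To check that mesh attributes show up in a scrape, you could filter the Prometheus text output for labels that start with csm_. The following sketch assumes that the exporter turns the dots in attribute names into underscores (for example, csm.mesh_id becomes csm_mesh_id). The helper `csm_labels`, the metric name, and the label values in the sample text are illustrative assumptions, not output captured from a real deployment.

```python
import re

# One labeled sample line: name{labels} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][\w:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)(?:\s+\d+)?$'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def csm_labels(exposition_text):
    """Yield (metric_name, {csm label: value}) for each labeled sample."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # sample without labels
        labels = dict(_LABEL.findall(match.group("labels")))
        mesh = {k: v for k, v in labels.items() if k.startswith("csm_")}
        if mesh:
            yield match.group("name"), mesh


# Illustrative input; names and values are assumptions.
EXAMPLE = '''\
# TYPE grpc_client_attempt_duration_seconds histogram
grpc_client_attempt_duration_seconds_count{csm_mesh_id="mesh-1",csm_remote_workload_name="echo-server",grpc_method="echo.Echo/Say"} 12
process_cpu_seconds_total 3.5
'''

results = list(csm_labels(EXAMPLE))
```

In practice, the input text could come from reading http://localhost:9464/metrics with `urllib.request.urlopen`.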

Note that for the mesh attributes on the gRPC metrics to work, both the client and server binaries must be set up with Cloud Service Mesh Observability.

If you use non-Google infrastructure for exporting and storing metrics, the collection pipeline should add attributes to the metrics that describe the environment that the application runs in. Together with the mesh attributes described previously, these attributes give you a view of the traffic running on the mesh.
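
As a sketch of how environment attributes and mesh attributes might be combined into such a view, the following example sums sample values per pair of local and remote workload. The helper `traffic_edges` is hypothetical, and the label names k8s_pod_name and csm_remote_workload_name are assumptions about what a collection pipeline would emit.

```python
from collections import Counter


def traffic_edges(samples, environment):
    """Sum sample values per (local workload, remote workload) pair.

    `samples` is an iterable of (labels, value) pairs for one metric.
    `environment` holds the labels that the collection pipeline adds to
    describe where the application runs.
    """
    edges = Counter()
    for labels, value in samples:
        # Labels on the sample take precedence over environment labels.
        enriched = {**environment, **labels}
        local = enriched.get("k8s_pod_name", "unknown")
        remote = enriched.get("csm_remote_workload_name", "unknown")
        edges[(local, remote)] += value
    return edges


# Illustrative data; names and values are assumptions.
environment = {"k8s_pod_name": "echo-client-abc"}
samples = [
    ({"csm_remote_workload_name": "echo-server"}, 12),
    ({"csm_remote_workload_name": "echo-server"}, 3),
    ({}, 1),
]
edges = traffic_edges(samples, environment)
```

Each key in the result identifies one edge of the mesh, so the counter can be read as a simple traffic graph.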

Spec changes

Cloud Service Mesh Observability determines the mesh topological information through environment variables that need to be added to the container's env, both for clients and servers. This information is made available to peers for metrics reporting through Cloud Service Mesh Observability.

spec:
  containers:
  - image: IMAGE_NAME
    name: CONTAINER_NAME
    env:
    - name: GRPC_XDS_BOOTSTRAP
      value: "/tmp/grpc-xds/td-grpc-bootstrap.json" #created by td-grpc-bootstrap
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: NAMESPACE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: CSM_WORKLOAD_NAME
      value: CSM_WORKLOAD_NAME
    - name: CONTAINER_NAME
      value: CONTAINER_NAME
    - name: OTEL_RESOURCE_ATTRIBUTES
      value: k8s.pod.name=$(POD_NAME),k8s.namespace.name=$(NAMESPACE_NAME),k8s.container.name=CONTAINER_NAME

Replace the following:

  • IMAGE_NAME with the name of the image.
  • CONTAINER_NAME with the name of the container.
  • CSM_WORKLOAD_NAME with the workload name, for example the deployment name.
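
The OTEL_RESOURCE_ATTRIBUTES value in the spec relies on Kubernetes replacing $(POD_NAME) and $(NAMESPACE_NAME) with the values of the variables defined earlier in the same env list. The following sketch imitates that substitution and then splits the result into key-value pairs, to show what the application ends up with. Both helpers and the sample values are hypothetical. The sketch handles only simple $(NAME) references and ignores escaping.

```python
import re


def expand_k8s_refs(value, env):
    """Replace $(NAME) with env[NAME]; leave unknown references untouched."""
    return re.sub(
        r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)",
        lambda m: env.get(m.group(1), m.group(0)),
        value,
    )


def parse_resource_attributes(value):
    """Split 'k1=v1,k2=v2' into a dict of attribute names to values."""
    return dict(item.split("=", 1) for item in value.split(",") if item)


# Illustrative values; in a cluster these come from the Pod's metadata.
env = {"POD_NAME": "echo-server-7d9c", "NAMESPACE_NAME": "default"}
raw = (
    "k8s.pod.name=$(POD_NAME),"
    "k8s.namespace.name=$(NAMESPACE_NAME),"
    "k8s.container.name=echo"
)
attributes = parse_resource_attributes(expand_k8s_refs(raw, env))
```

With these inputs, `attributes` maps k8s.pod.name, k8s.namespace.name, and k8s.container.name to the Pod name, namespace, and container name.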