The OpenTelemetry Collector

The OpenTelemetry Collector is an application written in Go. The project’s GitHub README describes it well:

The OpenTelemetry Collector offers a vendor-agnostic implementation on how to receive, process and export telemetry data. In addition, it removes the need to run, operate and maintain multiple agents/collectors in order to support open-source telemetry data formats (e.g. Jaeger, Prometheus, etc.) sending to multiple open-source or commercial back-ends.

Objectives:

  • Usable: Reasonable default configuration, supports popular protocols, runs and collects out of the box.
  • Performant: Highly stable and performant under varying loads and configurations.
  • Observable: An exemplar of an observable service.
  • Extensible: Customizable without touching the core code.
  • Unified: Single codebase, deployable as an agent or collector with support for traces, metrics and logs.

So the OpenTelemetry Collector is a Go binary that does exactly what its name implies: it collects data and sends it to a backend. But there’s a lot of functionality that lies in between.
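From the application’s point of view, the collector is just a nearby OTLP endpoint. Here is a minimal sketch of that hand-off using the OpenTelemetry Go SDK; it assumes a collector listening on localhost at the default OTLP/gRPC port (4317) and an insecure local connection, so adjust both for your own setup.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Point the OTLP exporter at a collector running beside the app.
	// Assumption: the collector listens on the default OTLP/gRPC port 4317.
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(), // plain gRPC for a local hop; use TLS for anything remote
	)
	if err != nil {
		log.Fatalf("creating OTLP exporter: %v", err)
	}

	// Batch spans in the SDK and hand them to the exporter.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Instrument as usual; the span ends up at the local collector,
	// which handles the final hop to your backend.
	_, span := otel.Tracer("example").Start(ctx, "do-work")
	span.End()
}
```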

What a neat service! A local destination for data that handles the final sending of OpenTelemetry information to your backend. But let’s cover some of the reasons you might not install the OTel Collector.

Cons of Installing the OTel Collector

  • Additional setup time – If you’ve got automatic instrumentation working, or even just a few manually instrumented code paths, it’s another step to set up and configure the collector to receive that data. If the service you’re monitoring is a web service, it’s probably already very good at sending data over the network, so why not have it send that data directly to Prometheus or your backend’s OTLP endpoint (see the sketch after this list)?
  • The collector isn’t a monolith – At the last OpenTelemetry Community Days, it became clear to me that the OpenTelemetry Collector isn’t a single ‘thing’ so much as a framework for assembling the components you want. As the CNCF documentation explains, a collector consists of receivers, processors, and exporters, each with its own options, wired together into pipelines. That means ‘installing the OTel Collector’ isn’t a one-step process.
  • Resource consumption – (I’ll note, dear reader, that this concern is almost totally erroneous, and I’ll cover why in the ‘pros’ section.) Running an extra process, whether in its own container or alongside your running service, will consume some resources. Surely, if we want the most performant service possible, it’s better not to do this collection on our stack and instead export the data directly?
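On the first concern above: at the code level, ‘just send it directly’ really is mostly a question of where the exporter points. A hedged sketch with the OpenTelemetry Go SDK follows; the endpoint and header names are placeholders, not any real vendor’s values.

```go
package telemetry

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// newDirectExporter skips a local collector and sends traces straight to a
// backend over OTLP. The endpoint and header are illustrative placeholders;
// leaving out WithInsecure() means the exporter uses TLS, which a remote
// backend will expect.
func newDirectExporter(ctx context.Context) (*otlptrace.Exporter, error) {
	return otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otlp.example-backend.com:4317"),
		otlptracegrpc.WithHeaders(map[string]string{"api-key": "YOUR_KEY"}),
	)
}
```

Everything else in the earlier sketch stays the same; the trade-off is what you give up by cutting the collector out of the path.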

These seem like reasonable concerns, but the standard OpenTelemetry documentation recommends that the collector almost always be used. The reasons you should use an OTel Collector with your service are as follows.

Pros of Installing the OTel Collector

  • Setup isn’t as hard as you fear – The setup docs are fairly complete in this area, and there are builds for most architectures (the collector runs best in a Linux environment, but it should be no trouble to run it in a container if necessary). Repointing an already-instrumented service at a collector can even be just a configuration change, as the sketch after this list shows.
  • Configurable, yes; complex, not necessarily – While there is great flexibility in how you can build your own collector, you can simply use a pre-built distro to start your project today.
  • Performance – The concern that a collector process would be a resource hog is something of a bugaboo. In reality, if you have multiple sources of data on your service, running a single collector process should be significantly more efficient than having each one send separately to your backend data collection service.
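On the setup point: in the Go SDK (and most other OpenTelemetry SDKs), the OTLP exporters read the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable when you don’t hard-code an endpoint. So a sketch like the one below, shown purely as an illustration, lets you introduce a collector later as a deploy-time configuration change rather than a code change.

```go
package telemetry

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// newExporterFromEnv leaves the destination to the environment. Setting
// OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 at deploy time points this
// service at a local collector; pointing it somewhere else needs no code change.
func newExporterFromEnv(ctx context.Context) (*otlptrace.Exporter, error) {
	return otlptracegrpc.New(ctx)
}
```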

Reliability of telemetry

  • Once you start instrumenting things, it’s hard to stop. And when you add instrumentation to things like your delayed jobs or special-purpose code, those services may not have particularly robust ways of sending telemetry data. They may fail to retry sends or time out too readily, making their telemetry quite brittle (there’s a sketch of this kind of per-service tuning after this list). Rather than debugging this on every single service, it’s a lot more reliable to have everything report to the collector.
  • Security and PII – The processor component is a key part of an OpenTelemetry Collector’s strength. Being able to sanitize the data you’re collecting before sending it on can matter enormously. In my many years working in observability, I’ve seen data reported to a backend service that was so revealing it would make your hair stand on end. When working with real users, it is much better to have a processing step that filters out secrets, personally identifiable information (PII), and other sensitive information you don’t want to spread around.
  • Performance (again) – Finally, there’s the issue of resource consumption on your host. The collector is built to send telemetry data efficiently, generally by batching it. Having individual services fire off a network request every time a single metric updates puts you in danger of overusing network resources and either adding latency to your critical services or running up a surprise bill for all that bandwidth (again, see the sketch after this list).
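To make the reliability and performance bullets concrete, here is a hedged sketch of the per-service export tuning, retry on the exporter and batching on the span processor, that you would otherwise end up repeating (and debugging) in every instrumented service. It uses the OpenTelemetry Go SDK, assumes a local collector on the default OTLP/gRPC port, and the specific intervals and batch sizes are illustrative rather than recommendations.

```go
package telemetry

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// NewTracerProvider shows the per-service knobs a collector lets you stop
// copy-pasting: retry behavior on the exporter and batching on the processor.
func NewTracerProvider(ctx context.Context) (*sdktrace.TracerProvider, error) {
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"), // assumption: local collector on the default port
		otlptracegrpc.WithInsecure(),
		// Retry failed exports instead of silently dropping data.
		otlptracegrpc.WithRetry(otlptracegrpc.RetryConfig{
			Enabled:         true,
			InitialInterval: 1 * time.Second,
			MaxInterval:     30 * time.Second,
			MaxElapsedTime:  2 * time.Minute,
		}),
	)
	if err != nil {
		return nil, err
	}

	// Batch spans rather than making a network call per span.
	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp,
			sdktrace.WithMaxExportBatchSize(512),
			sdktrace.WithBatchTimeout(5*time.Second),
		),
	), nil
}
```

With a collector in the path, most of this policy, batching, queueing, and retries toward the backend, can live in one collector configuration instead of in every service.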

Improved visibility into your system

Should you run an OpenTelemetry Collector? Yes, probably. The short time you’ll spend configuring your collector can save you countless headaches down the road. And if you share responsibility for its configuration, you’ll spread awareness of your stack’s observability setup.

Finally, by establishing a collector process, you’re preparing yourself for future expansion. When you need more security steps, more processing of observability data, or reporting to more than one destination, you’ll be ready.