To fix or improve an application, you first need to know how it behaves. Everything revolves around the need to uncover the unknown, and in the age of information that need is immense. With the rapid growth of technology and our growing reliance on apps, developers need to understand how their apps and their hosting environments are working.

The need of the hour is observability, which lets developers ensure that things are working the way they should. This matters because many applications are now developed and released in phases, communicate through multiple microservices, and rely on cloud-based technologies. Nobody wants such systems to become too difficult to fix. Everyone wants a simplified solution.

Observability, for those who need an introduction or a refresher, is the ability to infer and monitor a system’s internal state from its external outputs, such as its performance data. As designs and requirements become more complex, system failures and bugs have multiplied. To deal with this situation, many tools have been created to keep these issues out of the released system so that end-users never have to face them.

As more applications are developed following a microservice architecture, the requirements at each step of development are becoming more complex and diverse. Each microservice is a distinct, single-function module, and developers need data on how these modules behave in order to keep working on them. In this structure, more applications communicate with each other than before, and the margin for error grows with the level of complexity. Observability saves software companies a lot of time and money because bugs and issues can be dealt with at every step of the process, and because it provides information on how the system actually works.

Observability is often treated as synonymous with monitoring, and in this day and age, cloud-native telemetry is becoming the approach that developers favor. It helps them pick up a lead and locate a bug by tracing the system’s behavior and getting to the bottom of it.

To have observability, developers need tools and libraries that collect and analyze data. This is done chiefly through distributed tracing, which, alongside metrics and logs, lets you look into individual requests within the program so that you can find and solve problems faster when they arise. A good distributed tracing tool helps businesses save time and energy and lets developers focus on improving the systems they are building. Projects like OpenTelemetry, OpenTracing, and OpenCensus were created for just that purpose. In this article, we will delve into the two-decade-long journey to OpenTelemetry, why it is needed, and how it evolved from its predecessors, OpenTracing and OpenCensus. Along the way, you will see how helpful these systems can be, especially for DevOps.

OpenTelemetry: A Unified Open Specification

As the name suggests, OpenTelemetry collects telemetry data: traces, metrics, and logs. It is an ecosystem of instrumentation libraries and tools used to generate, collect, process, and export that data. It gathers data from across a distributed system so that teams can troubleshoot, debug, and manage applications and the host environments they run in. This is helpful because it allows IT and developer teams to instrument their code base for data collection and to adjust and adapt as they grow. It also lets IT professionals analyze data using any language or platform they are comfortable with, so they are not tied to specifics in the long run. At the time of writing, it supports only a handful of languages, with support for more planned.

It is a vendor-neutral, or vendor-agnostic, project hosted by the Cloud Native Computing Foundation (CNCF). This sandbox project merged two earlier projects, OpenCensus and OpenTracing (which we will explain later). OpenTelemetry has been part of the CNCF since May 2019 and is releasing its standards in parts, as described below. Because the project is open source, it takes in contributions from many developers. You can find it on GitHub, where developers can transparently follow what is going on and who is contributing.

OpenTelemetry consists of several tools that come in handy for a developer, mainly for observability. The project is meant to be flexible and extensible so that it can support a broad range of open-source, commercial, and end-user solutions. It brings together two open-source projects that served almost the same purpose, tracing and metrics, so that the process could be standardized. The unification also fostered collaboration within the developer community, instead of leaving the market with overlapping products, some of them controlled by vendors. It can provide you with the following:

  • A single, vendor-neutral instrumentation library per language, with support for both automatic and manual instrumentation.
  • A single collector binary that can be deployed in many ways, including but not limited to as an agent or as a gateway.
  • An end-to-end implementation that can help you generate, emit, collect, process, and export telemetry data.
  • Complete control of data, including being able to send data to many destinations simultaneously through configuration.
  • Open-standard semantic conventions to make sure that data collection is always vendor-neutral.
  • Support for multiple context propagation formats, which also eases future migration as standards evolve.
  • The ability to add on more technology protocols and formats as the technology evolves and new ways of observing data arise.

It must be mentioned that OpenTelemetry is not a back end like Prometheus or Jaeger. Instead, it exports data to open-source and commercial back ends, where developers can analyze it as they require.

Why Was OpenTelemetry Created?

The goal of creating OpenTelemetry was to bring together a myriad of technologies to form a vendor-neutral observability platform. As mentioned earlier, it is a part of the Cloud Native Computing Foundation and came from a merger of the OpenTracing and OpenCensus projects.

With growing complexity due to rapid changes in technology, new challenges have pushed for further solutions. The sole purpose was to manage the complexity and diversity of data that would become available. OpenTelemetry then came forward to consolidate and unify the environment for developers.

The purpose of the unified set of libraries and specifications was to create a complete telemetry system. It was designed to suit the monitoring of microservices and other modern, distributed systems, and to be compatible with most OSS and commercial backends.

Previously, there were no accurate, standardized methods of describing what a system was doing. This was mainly because different developers used different methods, languages, and machines in different combinations. The burden of maintaining instrumentation also fell heavily on the shoulders of the user. Data was not portable either, so observability tools struggled to stay compatible across various environments and requirements. Thus, OpenTelemetry was needed to standardize how distributed systems describe what they are doing, while keeping the flexibility to use different languages, hardware systems, and APIs.

OpenTelemetry resulted from two decades’ worth of effort put into creating standards for observability and vendor-neutral APIs. Projects that helped software developers with observability already existed, but developers had to look at the different options on the market, study their pros and cons, and settle on the one that suited them best.

In 2016, OpenTracing joined the CNCF with a focus on vendor-neutral APIs for tracing. In 2018, Google created OpenCensus, which captured traces and metrics. The two approaches (explained below) were complementary rather than contradictory. In the same timeframe, the World Wide Web Consortium (W3C) worked on the Trace Context and Correlation Context header specifications, which were designed for the efficient communication of trace IDs over HTTP. And these weren’t the only projects available at that time.

In March 2019, Ben Sigelman announced that the two projects, OpenTracing and OpenCensus, would be merging. Both shared the goal of open standards focused on interoperability and a vendor-neutral observability ecosystem. The vendor-neutral approach empowers developers because they are not contractually bound to a vendor in order to use its tools, and they keep the flexibility to switch.

By bringing together the two distributed tracing libraries, CNCF and Google essentially ended the competition between them. While competition is usually good for a market because it fosters innovation, the same could not be said for OpenTracing and OpenCensus. Both were open source and stood to benefit more from collaboration than from competing with each other. While OpenTracing took care of tracing and logs, OpenCensus did the same while providing additional context, as you’ll see below. OpenTelemetry thus became a unified open specification that would support, empower, and strengthen developers and their creations.

Benefits of OpenTelemetry

OpenTelemetry, while still in its early stages, is becoming a hot topic for developers to follow. The benefits that OpenTelemetry provides to developers are as follows:

  • Flexibility: Because of its open, vendor-neutral nature, developers can change backends without changing instrumentation. Apart from that, developers can work with more vendors, platforms, and projects quickly because of the single set of standards. It also frees you from being contractually bound to a vendor. Gone are the days of being locked into a vendor’s roadmap priorities and configuration. Additionally, when new technologies emerge in the market, you no longer have to wait for vendors to update their instrumentation.
  • Simplification: OpenTelemetry saves you from choosing between projects and standards, including the choice between OpenTracing and OpenCensus. You no longer have to compare, try out, and read reviews of different platforms. This saves time and effort so that you can focus on building reliable and fantastic software. It also simplifies telemetry data management, since you can export the data in a form that you can analyze quickly.
  • Streamlined observability: Since there is a single standard that will be followed by many developers, many vendors will be moving towards OpenTelemetry as well because of its flexibility. Since OpenTelemetry’s focus is on high-quality telemetry data that is streamlined, it will become effortless to accommodate and even test a single standard.
  • Cross-platform and cross-language: OpenTelemetry already supports various languages and backends and is building up to accommodate more in the future. Supporting multiple platforms and languages makes it easy for developers to capture and transmit telemetry to backends without changing existing instrumentation. Even more remarkable, installing and integrating OpenTelemetry can be as simple as adding a few lines of code to the system.
  • More control over your data: OpenTelemetry helps ease the burden of data collection from various sources and technologies. This aspect provides you the observability and the monitoring capabilities to focus more on analyzing the data and to better understand your applications. You no longer have to go through tedious methods of collecting data as everything will be streamlined according to your customizations. This way, you will ensure that the program you deliver can enhance user experiences and improve business outcomes.
  • Backward compatibility: This is a bonus benefit if you have already been using OpenCensus or OpenTracing. OpenTelemetry supports the use of its predecessors so that you can seamlessly migrate the systems.

When Did OpenTelemetry Become Available?

OpenTelemetry is in the beta stage across several languages. It has been in the Cloud Native Sandbox since May 2019. At the moment, it has broad language support for:

  • Java
  • C#
  • Go
  • C++
  • JavaScript
  • Python
  • Rust
  • Erlang/Elixir

It can integrate with libraries, frameworks, and data stores such as:

  • MySQL
  • Jetty
  • Django
  • Redis
  • Kafka
  • Akka
  • Spring
  • Flask
  • RabbitMQ
  • net/http
  • Gorilla/mux
  • JDBC
  • WSGI
  • PostgreSQL

OpenTelemetry reached the beta stage in March 2020. Currently, the OpenTelemetry Tracing Specification has reached 1.0. The metrics specification is expected to reach the same status in the second half of 2021, and the logs specification is expected by 2022.

Components of OpenTelemetry

OpenTelemetry consists of many components, some of which include:

  • APIs: The Application Programming Interface (API) is the core component of OpenTelemetry and one of the sources of telemetry data. It is language-specific and is used to instrument code so that it creates traces, either through code changes or through auto-instrumentation agents.
  • SDKs: The Software Development Kit (SDK) is the implementation of the API. It helps process and export data, supports configuration, and handles transaction sampling and request filtering. You can think of it as a bridge that carries the data gathered through the API to the exporter.
  • Exporters: Exporters let developers configure where telemetry data is sent. An exporter translates the data into the format a backend requires, including customized formats, and sends it there.
  • Collector: This is an optional part of OpenTelemetry that helps you build a seamless telemetry solution. It can filter, batch, and aggregate data and communicate with the backends. It can run either as an agent alongside the host application or as a standalone process. The collector comes in two distributions: Core, which is foundational, and Contrib, which adds all the optional and experimental components.
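The division of labor between these components can be sketched with nothing but the Python standard library. This is not the real OpenTelemetry API: the names `SpanRecord`, `MiniTracer`, `BatchingProcessor`, and `InMemoryExporter` are hypothetical stand-ins, chosen only to show how application code talks to an API, how an SDK-like layer batches data, and how the same batch can be sent to more than one destination.

```python
import time
from dataclasses import dataclass


@dataclass
class SpanRecord:
    """One finished operation, as the hypothetical API layer records it."""
    name: str
    start: float
    end: float


class InMemoryExporter:
    """Stand-in for an exporter: 'sends' spans by appending them to a list."""

    def __init__(self, destination):
        self.destination = destination
        self.received = []

    def export(self, batch):
        self.received.extend(batch)


class BatchingProcessor:
    """Stand-in for the SDK: buffers spans and flushes them to every exporter."""

    def __init__(self, exporters, max_batch=2):
        self.exporters = exporters
        self.max_batch = max_batch
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for exporter in self.exporters:
            exporter.export(list(batch))


class MiniTracer:
    """Stand-in for the API: the only part that application code calls."""

    def __init__(self, processor):
        self.processor = processor

    def record(self, name, work):
        start = time.time()
        result = work()
        self.processor.on_end(SpanRecord(name, start, time.time()))
        return result


# Two destinations receive the same data, chosen purely by configuration.
backend_a = InMemoryExporter("backend-a")
backend_b = InMemoryExporter("backend-b")
processor = BatchingProcessor([backend_a, backend_b], max_batch=2)
tracer = MiniTracer(processor)

tracer.record("load-user", lambda: 1 + 1)
tracer.record("render-page", lambda: "ok")      # second span triggers a flush
tracer.record("write-audit-log", lambda: None)
processor.flush()                               # flush the remainder on shutdown

print([span.name for span in backend_a.received])
```

Because the application only ever calls `MiniTracer`, swapping or adding a destination means changing the list passed to `BatchingProcessor`, not the instrumented code. That separation is the design idea the component list describes.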

Key Terms for OpenTelemetry

Telemetry data is the output that is required to understand the system. Observability requires the telemetry data for developers to study how their programs are working. The following are terms that are commonly used in OpenTelemetry, which will also explain the data types available.

  • Metrics: A metric is a piece of quantifiable data that describes a component’s behavior over time. Metrics have attributes that can give you information about Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs). You can use metrics to get a holistic view of the health of the system and its performance. Metrics are usually raw measurements about the service and are captured while the application is running. At the time of writing, OpenTelemetry has three metric instruments: observer, counter, and measure.
  • Traces: A trace represents the end-to-end journey that a request makes through a system. It follows the request from the moment it is made to the moment it is completed, so you can identify at which stage of the operation the request ran into an issue. Traces provide the context that you need for troubleshooting.
  • Logs: A log is a record of what is happening within the application. It helps you understand what your application is doing. Logs are lines of text that may be structured, unstructured, or plain text. Logging provides details about when a piece of code had an issue, which makes the issue easier to find and fix.
  • Spans: Spans are the named, timed single operations found within a trace. They are nested to form a trace tree, and each trace has a root span. You can use the root span to describe end-to-end latency and its sub-spans to describe the operations within it.
  • Context: Each span contains a context, which carries the unique identifiers that tie the span to the request it is a part of. The context is the data that travels with the request throughout the environment. It can also support correlation context, which carries user-defined properties if required.
  • Context Propagation: Through context propagation, context is bundled and communicated between services. This is typically done through HTTP headers, though it is not limited to them. It is an instrumental part of OpenTelemetry and can be used in cases other than tracing. OpenTelemetry supports multiple propagation protocols to avoid compatibility issues.
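To make spans, context, and context propagation more concrete, here is a minimal standard-library sketch. It is an illustration under stated assumptions rather than the OpenTelemetry API: `Span`, `start_span`, `inject`, and `extract` are hypothetical helpers, and the header layout follows the `traceparent` format from the W3C Trace Context specification (version, 32-character trace ID, 16-character parent span ID, flags). The flags value is hard-coded to `01` (sampled) for simplicity.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str                      # shared by every span in one trace
    span_id: str                       # unique to this span
    parent_id: Optional[str] = None    # None marks the root span
    children: list = field(default_factory=list)


def start_span(name, parent=None, trace_id=None, parent_id=None):
    """Create a span that inherits its trace ID from a local or remote parent."""
    if parent is not None:
        trace_id, parent_id = parent.trace_id, parent.span_id
    span = Span(
        name=name,
        trace_id=trace_id or secrets.token_hex(16),   # 32 hex characters
        span_id=secrets.token_hex(8),                 # 16 hex characters
        parent_id=parent_id,
    )
    if parent is not None:
        parent.children.append(span)
    return span


def inject(span, headers):
    """Write the span's context into outgoing HTTP headers."""
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"
    return headers


def extract(headers):
    """Return (trace_id, parent_span_id) from incoming headers, or None."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return parts[1], parts[2]


# Service A: a root span with two nested child spans (a small trace tree).
root = start_span("GET /checkout")
db_call = start_span("SELECT cart", parent=root)
http_call = start_span("POST /pay", parent=root)

# Service A calls service B and sends the context along in the headers.
headers = inject(http_call, {})

# Service B: continue the same trace from the incoming headers.
trace_id, parent_id = extract(headers)
remote = start_span("charge card", trace_id=trace_id, parent_id=parent_id)

print(remote.trace_id == root.trace_id, remote.parent_id == http_call.span_id)
```

The span created in service B shares the trace ID of the root span in service A, which is what lets a backend stitch the two services into a single end-to-end trace.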

What is OpenTracing?

OpenTracing is a vendor-agnostic API developed to help developers instrument tracing in their code. It became a CNCF project in 2016, backed by the goal of having a vendor-neutral specification for distributed tracing and giving developers the ability to trace a request from start to finish while instrumenting their code.

It is a set of standard APIs that consistently model and explain the behavior of your distributed systems. OpenTracing relies on three constituencies:

  • Tracing Tool Maintainers
  • Software developers who are responsible for building and deploying applications
  • Software developers who are contributing to widely used software.

It supports developers by creating a standard, vendor-neutral framework for instrumentation through its API. Developers can try out various distributed tracing systems without the tedium of repeating the entire instrumentation process from scratch for each new one. The purpose of the API is to incorporate distributed tracing at the service level and the application level so that developers can track requests across the services that make up the application.
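The idea of instrumenting once and swapping tracers freely can be sketched as follows. This is not the OpenTracing API itself: the `Tracer` interface and the `ListTracer` and `NoopTracer` implementations are hypothetical, and stand in for a vendor-neutral API with interchangeable tracing systems behind it.

```python
from abc import ABC, abstractmethod


class Tracer(ABC):
    """Hypothetical vendor-neutral interface that application code depends on."""

    @abstractmethod
    def start_span(self, operation_name: str) -> None:
        ...

    @abstractmethod
    def finish_span(self, operation_name: str) -> None:
        ...


class ListTracer(Tracer):
    """One 'tracing system': keeps span events in memory."""

    def __init__(self):
        self.events = []

    def start_span(self, operation_name):
        self.events.append(("start", operation_name))

    def finish_span(self, operation_name):
        self.events.append(("finish", operation_name))


class NoopTracer(Tracer):
    """Another 'tracing system': discards everything."""

    def start_span(self, operation_name):
        pass

    def finish_span(self, operation_name):
        pass


def handle_request(tracer: Tracer, user_id: int) -> str:
    """Instrumented once against the interface, never against a vendor."""
    tracer.start_span("handle_request")
    try:
        return f"hello user {user_id}"
    finally:
        tracer.finish_span("handle_request")


recording = ListTracer()
print(handle_request(recording, 7))      # same code, tracer A
print(handle_request(NoopTracer(), 7))   # same code, tracer B
print(recording.events)
```

`handle_request` never changes when the tracer does, which is the property that spares developers from re-instrumenting their code for every new tracing system.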

OpenTracing’s specifications for span management can be used for any of the supported platforms:

  • Go
  • Java
  • JavaScript
  • Python
  • PHP
  • Objective-C
  • Ruby
  • C++
  • C#

OpenTracing is also compatible with the following tracers:

  • CNCF Jaeger
  • Instana
  • Elastic APM
  • Apache SkyWalking
  • LightStep
  • inspectIT
  • Datadog
  • Stagemonitor
  • Wavefront by VMWare

The OpenTracing API is pretty straightforward. It is a standardization layer that sits between application and library code on one side and the myriad of systems that consume tracing and causality data on the other. Users of OpenTracing benefit from a consistent, unified, and tracer-neutral instrumentation API that supports a wide range of frameworks, programming languages, and platforms. It was able to:

  • Provide an out-of-the-box overview of the infrastructure, showing how different services interact and what they depend on.
  • Provide information on efficiency and help detect latency issues.
  • Provide smart error reporting, since spans transport error messages and stack traces. This is valuable for finding the root cause of issues and system failures.
  • Provide trace data that can be sent to other log processing platforms and analyzed for useful information.

OpenTracing also takes advantage of distributed context propagation, which follows the causal chain of a transaction and breaks it down from its starting point to its end. This way, you can tell what happened from the time the request began to the time it finished or an error occurred.

What is OpenCensus?

OpenCensus is an open-source project created by Google back in 2018. It grew out of Google’s internal Census tool and became an open standard with API implementations for metrics and traces. Once integrated with application code, OpenCensus can emit traces and metrics that give a better understanding of the program and how it is behaving, allowing you to debug more easily.

It is a set of libraries that can be used in various languages. It helps you collect metrics and distributed traces. This allows developers to capture, export, and manipulate metrics and distributed traces to their choice of backend. The core function of OpenCensus is to provide an ability to collect traces and metrics from applications, display them and then send them to a tool that can be used to analyze data, which is often known as backend.

Once their code is instrumented with OpenCensus, developers are armed with tools to optimize the speed of their services, learn exactly how a request travels through those services, and gather metrics about the entire architecture. It uses context propagation, distributed trace collection, time-series metrics collection, APIs, and a myriad of integrations, along with broad backend support, to help developers with their software.
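Metrics collection of this kind boils down to recording raw measurements while the application runs and aggregating them before export. The sketch below shows that idea with the standard library only. The `Counter` and `Measure` classes are hypothetical and are not the OpenCensus or OpenTelemetry API. They only borrow the instrument names to illustrate the behavior.

```python
from collections import defaultdict


class Counter:
    """Hypothetical counter: a sum that only goes up, kept per label set."""

    def __init__(self, name):
        self.name = name
        self.totals = defaultdict(int)

    def add(self, amount, **labels):
        if amount < 0:
            raise ValueError("a counter can only increase")
        # Sort the labels so that keyword order does not create new series.
        self.totals[tuple(sorted(labels.items()))] += amount


class Measure:
    """Hypothetical measure: keeps raw values and aggregates them on demand."""

    def __init__(self, name):
        self.name = name
        self.values = []

    def record(self, value):
        self.values.append(value)

    def summary(self):
        if not self.values:
            return {"count": 0}
        return {
            "count": len(self.values),
            "min": min(self.values),
            "max": max(self.values),
            "mean": sum(self.values) / len(self.values),
        }


requests = Counter("http_requests")
latency_ms = Measure("http_latency_ms")

# Three simulated requests: (status code, latency in milliseconds).
for status, elapsed in [(200, 12.0), (200, 18.0), (500, 90.0)]:
    requests.add(1, status=status)
    latency_ms.record(elapsed)

print(dict(requests.totals))
print(latency_ms.summary())
```

Keeping the aggregation separate from the recording call is one reasonable design choice: the instrumented code stays cheap, and the summary can be computed whenever an exporter asks for it.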

There are many benefits of OpenCensus for the ecosystem:

  • It aims to make application metrics and distributed traces more accessible to developers than before, and it provides a standard for good automatic instrumentation that helps developers know how well their code is performing.
  • APM vendors have fewer issues caused by setup friction. Customers can switch when needed without compatibility problems or forced upgrades, and broader language support means easier integration.
  • It provides local debugging capabilities, so developers can look at metrics and requests on their own and customize sampling rates for traces.
  • It aims to increase collaboration and support between vendors and open-source providers, giving developers more power to be flexible with their software environment design.
  • It helps service providers and developers solve customer issues better and faster.
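A customizable sampling rate for traces can be implemented in several ways. One common approach, sketched here under the assumption that trace IDs are random 32-character hex strings, is to derive the decision from the trace ID itself so that every service makes the same choice for the same trace. The function `should_sample` is a hypothetical helper, not part of OpenCensus.

```python
import secrets


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    threshold = int(rate * (1 << 64))
    return int(trace_id[-16:], 16) < threshold


# Simulate 10,000 traces and keep roughly a quarter of them.
trace_ids = [secrets.token_hex(16) for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.25) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")

# The decision is deterministic: the same ID always gets the same answer.
first = trace_ids[0]
print(should_sample(first, 0.25) == should_sample(first, 0.25))
```

Because the decision depends only on the trace ID and the configured rate, two services that see the same trace agree without coordinating, which avoids traces with missing spans.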

OpenCensus can support the following languages:

  • C#
  • Go
  • Node.js
  • Java
  • C++
  • Erlang/Elixir
  • Ruby
  • PHP
  • Python

OpenCensus provides observability capabilities for the following:

  • Google Cloud
  • Redis
  • Dropwizard
  • Memcached
  • Caddy
  • SQL
  • Go kit
  • MongoDB
  • GroupCache

OpenCensus has the following backend support as well:

  • Azure Monitor
  • Datadog
  • AWS X-Ray
  • Google Cloud
  • Elasticsearch
  • Instana
  • New Relic
  • Jaeger
  • Honeycomb
  • SignalFx
  • Wavefront
  • Prometheus
  • Zipkin

Over to You

As you can see, OpenTelemetry took OpenTracing’s tracing and distributed context propagation and OpenCensus’ time-series metrics and brought the two ambitious projects together in a collaborative, open-source library for working with telemetry data. It shows that the leadership of OpenCensus and OpenTracing is dedicated to bringing their communities together in a single, unified initiative that benefits the developer community.

One aim of the merger is to provide straightforward backward compatibility with the legacy projects through software bridges so that the transition is easy and smooth. While OpenTracing and OpenCensus will soon be in read-only mode, OpenTelemetry will take over as its specifications become more readily available. Remember, as of writing this article, it is available in some languages, the tracing specification has reached 1.0, and logs and metrics will follow suit. The primary benefit to the community and the ecosystem is the consolidation of specifications and their standardization.

By now, you understand how important it is to know how your application works. Several tools are available to help you, some through a vendor and some through open-source options. In a world where technological advances demand more complex and diverse solutions, it is only fair to assume that there is a need for a standard that makes things easier. In the world of microservice architecture and the move towards cloud-based technologies in software development, OpenTelemetry provides that solution.

Rising from the merger of Google’s OpenCensus and CNCF’s OpenTracing, two APIs with the same purpose of increased observability and distributed tracing, OpenTelemetry provides a way to collect and analyze data so that you are always one step ahead in your software development journey.

As a developer, you can also contribute to OpenTelemetry because it is open source. CNCF urges and welcomes support and suggestions from the community, and you can follow the project on GitHub if you are interested.