Performance Monitoring isn’t for everyone.

When I first started working with APM tools, I thought everyone’s needs were similar. The goal of these tools was to tell you how fast your system was. In general, this meant how long the system took to respond to each web request. And perhaps at the time, it was a universal need. After all, you remember when most websites loaded slowly, right? A big part of that slowness was extensive monolithic backend web services that took more than 10 seconds to respond to requests.

Currently, though, fast code is not the business differentiator it used to be. For many of us, a page loading too slowly means we need to modify our public cloud hosting settings, not that we need to rewrite our codebase.

APM is specific; Observability is general

APM is an approach focused on monitoring system performance metrics in order to detect problems quickly. The goal is to identify anomalies or issues before they impact the user experience or cause significant downtime. APM allows companies to respond quickly when there’s an issue, but it doesn’t provide much insight into the root causes of problems or how systems interact. APM monitors a few key performance points, e.g., page load time on the home screen, and uses them as an indicator of the whole system’s performance.

For me, the classic measurement for APM was the total page load time on a key page for the business. This number was a good stand-in for most of the user’s experience of how ‘fast’ the site was loading and feeling. Changes made to back-end or front-end systems were lauded if they improved this loading time and added to ‘tech debt’ if the time went up. This was Application Performance Monitoring at its most pure: a single number that offered some insight into how the whole system was working.
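To make the ‘single number’ idea concrete, here is a minimal Python sketch of how such a measurement could be taken and compared before and after a change. Everything in it is an assumption for illustration: the function names, the choice of median and 95th percentile as the headline numbers, and the 5% tolerance are not taken from any particular APM product.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_load_times(load_page: Callable[[], None], runs: int = 20) -> List[float]:
    """Call load_page repeatedly and return each duration in seconds.

    load_page is a hypothetical stand-in for whatever fetches the key page.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        load_page()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce raw timings to two headline numbers: median and p95."""
    ordered = sorted(durations)
    # Nearest-rank 95th percentile, computed with integer arithmetic
    # so the index does not depend on floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "median": statistics.median(ordered),
        "p95": ordered[max(0, rank - 1)],
    }


def verdict(before: Dict[str, float], after: Dict[str, float],
            tolerance: float = 0.05) -> str:
    """Classify a change by how far it moved the p95 load time.

    The 5% tolerance is an arbitrary assumption, not an industry rule.
    """
    change = (after["p95"] - before["p95"]) / before["p95"]
    if change < -tolerance:
        return "improved"
    if change > tolerance:
        return "regressed"
    return "unchanged"
```

A team working this way would run `summarize(measure_load_times(...))` before and after a deploy and pass both summaries to `verdict`. The point is not the arithmetic but the shape of the approach: one number stands in for the whole system.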

Observability takes a different approach by providing visibility into how applications work internally, as well as their relationships with other systems within an environment. It provides deep insight into which components are not performing optimally and why, giving teams more context around issues as they happen in real time or as they develop over longer timeframes than APM can detect. Observability can also serve as an early warning system, supporting proactive maintenance and flagging potential threats from external sources, such as attackers trying to breach your network from outside.

In my personal experience, Observability came to the fore when I started to use serverless and containerized compute instances. While I still cared about numbers like ‘total load time,’ I found that my everyday questions had become deeper and more complex. Questions like ‘When a user purchases something on our e-commerce site, what microservices are involved in telling him his credit card is approved?’ or ‘When an administrator logs on, if X microservice is down, can she still load account details?’

These questions were less about high-level statistics about performance and more about understanding the system overall. This is, to me, the heart of Observability. 
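A question like ‘which microservices are involved?’ is usually answered with traces: each service records a span for its part of a request, and the spans are linked together. The sketch below is a deliberately tiny, standard-library-only illustration of that idea. It is not the API of OpenTelemetry or any other tracing library, and the service names (`checkout`, `payments`, `notifications`) are hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One unit of work done by one service on behalf of a request."""
    service: str
    operation: str
    parent: Optional["Span"] = None
    duration: float = 0.0
    error: Optional[str] = None


class Trace:
    """Collects every span produced while handling a single request."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, service: str, operation: str) -> Iterator[Span]:
        # The innermost open span, if any, becomes the parent.
        parent = self._stack[-1] if self._stack else None
        current = Span(service, operation, parent)
        self.spans.append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        except Exception as exc:
            # Record the failure on the span, then let it propagate.
            current.error = repr(exc)
            raise
        finally:
            current.duration = time.perf_counter() - start
            self._stack.pop()

    def services_involved(self) -> List[str]:
        """Distinct services that took part, in order of first appearance."""
        seen: List[str] = []
        for s in self.spans:
            if s.service not in seen:
                seen.append(s.service)
        return seen


# Hypothetical purchase flow: checkout calls payments, then notifications.
trace = Trace()
with trace.span("checkout", "purchase"):
    with trace.span("payments", "authorize_card"):
        pass
    with trace.span("notifications", "send_approval"):
        pass

print(trace.services_involved())  # ['checkout', 'payments', 'notifications']
```

Because every span keeps its parent and any error it hit, the same data can also answer the ‘if X is down’ kind of question: look for the span whose `error` is set and see which parent operations depended on it.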

Which should you use?

For a CTO looking at long-term success, both Application Performance Monitoring (APM) and observability serve distinct but complementary roles within their IT infrastructure strategy: 

   – APM helps investigate current technical performance issues; whereas 

   – observability shows which components need improvement over time, so future incidents can be avoided proactively. That saves the cost of the repeated repairs and troubleshooting that come from relying only on the basic metrics of conventional monitoring.

They Work Together

So which should you use?

When we discuss performance monitoring vs. Observability, neither approach should replace the other; rather, they supplement each other when implemented together properly. APM tools surface performance problems as they happen, while observability tooling supplies the deeper understanding needed to forecast trouble and take countermeasures before it turns into unplanned downtime. Used together, they help keep critical operations stable and let the business make data-driven decisions instead of reacting to outages.

So it may make sense to put APM dashboards on the wall in the break room, where the whole team can get a sense of ‘how are we doing in general?’ Observability tools, by contrast, are the things we study with our dual monitors open, trying to dive deep into what’s going on inside the system.

Again, which should you use?

Do you need APM? You might! It’s going to depend on your team, your relationship with tech debt, and how well you can fix performance problems by expanding your cloud budget.

Do you need Observability? You definitely need it! It will help you save time and money on debugging, automate regular system health checks, and provide operational visibility and insights.

By taking advantage of both Performance Monitoring and Observability tools, organizations are able to create a comprehensive monitoring system that provides key metrics in order to make smarter decisions. With the insights that these two systems offer, teams can take informed action when it comes to making improvements or changes. This ultimately provides greater cost savings as well as an overall improved customer experience.

Evaluating solutions

When evaluating which APM or observability tools you’re going to use, there are a few questions you want to ask: 

  1. Does it work in my environment?
  2. What implementation/configuration options are available?
  3. How much customization can I do with the platform?
  4. Is it easy to use and maintain?
  5. How much does it cost?
  6. What types of data does it provide?
  7. Does it have the ability to log and alert on issues?

Finally, when selecting APM and observability tools, it’s important to understand your own needs and whether the tools on offer will meet them. Further, it’s helpful to consult with vendors to get a better understanding of their product, as well as to get in-depth technical support and advice.

Tool Suggestions

For Ruby environments, it’s worth considering a mature APM product like Scout APM, which installs quickly and gets you insights into your stack within hours.

When you’re working with Java, Go, or Python, it’s worth considering an open solution for Observability based around OpenTelemetry. With OpenTelemetry, the way telemetry is collected, formatted, and exported is standardized, so you can instrument multiple environments and keep the data format consistent between them. Large engineering teams like eBay have moved to OpenTelemetry in the past year.

If you do end up using OpenTelemetry, you’ll need an endpoint since it’s not generally recommended to run your own data storage for metrics. After all, if your system is down, you don’t want your observability and alerting tool to go down at the same time! For an efficient and affordable OpenTelemetry solution with built-in dashboards and alerting, check out TelemetryHub!