The USE Method Revisited

A refreshed perspective on Brendan Gregg's USE method

Many long-time software engineering and operations practitioners have developed diagnostic methodologies they use to understand why a system or an individual software component is failing. While some intuition builds over time, we still have to fall back on methodical inspection of failures. There are many published methodologies, but many of us who work on systems have coalesced around the USE method published by Brendan Gregg (of Intel, previously Netflix and others): an approach to diagnostics that focuses on Utilization, Saturation, and Errors (USE).

While I will try to summarize and quote effectively, I highly recommend reading the original article. It includes many details I’ve chosen to omit here and a real-world diagnostics example.

To build an observability product, we must thoroughly understand how practitioners diagnose issues. Without knowledge of diagnostic procedures, it is difficult to know how to interpret the metrics emitted by software. For example, a service may expose a monotonically increasing Counter metric for requests. While the raw metric may be useful for analytics and should be stored, it is far more likely that a user is interested in the request rate during diagnostics.
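
As a minimal sketch of that kind of transformation (the class and method names here are mine, not taken from any particular metrics library), turning two samples of a monotonically increasing counter into a per-second rate might look like this:

    // Hypothetical sketch: converting counter samples into a rate.
    // Real metrics backends do this server-side (e.g. a rate()-style function).
    record CounterSample(long value, long epochMillis) {}

    final class CounterRate {
        // Per-second rate between two samples of a monotonically increasing
        // counter. Returns 0 if the counter reset (process restart) or if no
        // time has elapsed between the samples.
        static double perSecond(CounterSample previous, CounterSample current) {
            long deltaValue = current.value() - previous.value();
            long deltaMillis = current.epochMillis() - previous.epochMillis();
            if (deltaValue < 0 || deltaMillis <= 0) {
                return 0.0;
            }
            return deltaValue / (deltaMillis / 1000.0);
        }
    }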

The primary use case for observability products is real-time diagnostics. During an incident, an engineer must be able to quickly see key performance indicators of a service and its dependencies (both hardware and software). The signal presented must be immediately useful. Transforming metrics in out-of-the-box dashboards and allowing users to arbitrarily transform their metrics in custom dashboards is table stakes.

Overview

As stated by Brendan Gregg in the original article, the USE method is summarized as

For every resource, check utilization, saturation, and errors.

While the original article focuses only on hardware resources, the same checks can also be applied to software dependencies.

Resources

Each software component has a set of resources, both hardware and software, that it depends on to do its work:

  • CPU - cores (physical or virtual), threads (hardware / hyperthreads, OS / software thread limits)
  • Memory - capacity of RAM; swap is a special case in that it is technically memory utilization, but it generally constrains a program through disk utilization
  • Network interfaces - NICs, sockets
  • Storage devices - I/O, capacity
  • Interconnects - buses [1], queueing or logging systems between services (Kafka, SQS, RabbitMQ)

[1] While hardware interconnects are rarely a resource we have to think about anymore, hardware features like NUMA can introduce additional latency for certain operations, causing inconsistent performance.

Utilization

  • resource - all dependent functional components
  • utilization - the average or percentage of time the resource was busy
  • saturation - the depth of the wait queue for a resource
  • errors - the count and/or rate of error events

Over a specific period of time, utilization is the percentage of time spent doing work. Let’s take a look at CPU and memory as examples.

CPU utilization can be shown as a percentage of cores or allocated CPU used (e.g. container CPU limits). At the operating system level, you can inspect the number of allocated OS threads vs. the total allocatable OS threads. At the software level, you can show the percentage of time execution threads are actively doing work (e.g., green threads, virtual threads, thread pools).

Memory utilization has a few different components. Fundamentally, we are talking about the memory that is allocatable to an application. That can be the total memory available on a server or virtual machine, and it can be constrained further by limits imposed by containers. The smallest unit of confinement is the runtime environment itself, which imposes limits such as maximum stack or heap size. At the hardware level, there are constraints on things like CPU cache (measured as cache hits vs. misses) and RAM bus bandwidth (harder, and less important, to measure).
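
As a rough sketch of where some of these numbers can be read from inside a JVM process, using the standard java.lang.management and com.sun.management MXBeans (container and cgroup limits would have to be read separately):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class UtilizationSnapshot {
        public static void main(String[] args) {
            // Process CPU load as a fraction of total CPU capacity.
            // May report a negative value until the JVM has a baseline sample.
            com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            System.out.printf("process CPU: %.1f%%%n", os.getProcessCpuLoad() * 100);

            // Heap utilization: used vs. the maximum the runtime will allow (-Xmx).
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap: %d / %d MiB (%.1f%%)%n",
                heap.getUsed() >> 20, heap.getMax() >> 20,
                100.0 * heap.getUsed() / heap.getMax());
        }
    }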

If we see saturation or errors, we can often attribute them to a highly contended resource.

Saturation

A system at 100% utilization is likely to become saturated, and a saturated system begins to queue requests. When it comes to saturation, what we’re ultimately talking about are queues at various stages of the process (a quick way to peek at one of them is sketched just after this list):

  • Packet processing queues in the NIC
  • Socket receive buffers
  • CPU run queues
  • Age of the oldest message in a queue
  • Kafka consumer offset lag
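
As mentioned above, the CPU run queue can be glanced at directly on Linux: the fourth field of /proc/loadavg reports the number of currently runnable scheduling entities over the total, a coarse but immediate saturation signal. A small, Linux-only sketch:

    import java.nio.file.Files;
    import java.nio.file.Path;

    public class RunQueuePeek {
        public static void main(String[] args) throws Exception {
            // /proc/loadavg looks like: "0.42 0.35 0.30 2/781 12345"
            // The fourth field is runnable entities / total entities.
            String[] fields = Files.readString(Path.of("/proc/loadavg")).trim().split("\\s+");
            String[] runnable = fields[3].split("/");
            System.out.printf("runnable: %s of %s scheduling entities%n", runnable[0], runnable[1]);
        }
    }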

We can’t talk about queueing without talking about Little’s Law and its application in this context.

As a reminder, Little’s Law states that the average number of requests in a system is the average arrival (request) rate multiplied by the average time a request spends in the system (its response time).
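
In symbols, with L as the average number of requests in the system, λ as the average arrival rate, and W as the average time a request spends in the system:

    L = λ × W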

Errors

Spikes in error rates (or even rates that are merely non-zero) can be a clear indicator that something is wrong, though it may not always be easy to determine what. Investigate and rule out errors quickly. Understanding the kinds of errors a system can encounter is important: some errors are benign, and some are not. As long as your observability tools allow you to distinguish between the two easily, you should be able to quickly ascertain which resource is problematic even if you cannot yet identify the actual problem.
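
One low-tech way to keep that distinction visible (the class and category names here are illustrative, not taken from any particular metrics library) is to count errors by category, so a dashboard can chart expected failures separately from genuinely problematic ones:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    // Illustrative only: bucket errors so benign ones (e.g. 4xx validation
    // failures) can be charted separately from problematic ones (5xx, timeouts).
    public class ErrorCounters {
        private final Map<String, LongAdder> countsByCategory = new ConcurrentHashMap<>();

        public void record(String category) {
            countsByCategory.computeIfAbsent(category, k -> new LongAdder()).increment();
        }

        public long count(String category) {
            LongAdder adder = countsByCategory.get(category);
            return adder == null ? 0 : adder.sum();
        }
    }

In practice these buckets come from whatever metrics library the service already uses; the point is only that the categories exist before the incident does.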

Diagnostics with the USE method

Let’s examine the impact of response time and arrival rate in a fairly common application architecture. Consider Java’s concurrency pattern of using ThreadPools and ThreadPoolExecutors.

A thread pool has a fixed size that represents the threads of execution available to an application for handling requests. Requests are submitted to a ThreadPoolExecutor; when a thread from the pool is available, it picks up the request for processing, and otherwise the request waits in the executor’s work queue. For diagnostic purposes, there are two observable response times: from the client’s perspective and from the application’s perspective. In the context of Little’s Law, what we’re considering is how long it takes a ThreadPoolExecutor to process a single request.

If requests arrive at a rate of 10/sec and we have a 1-second response time, then we need to be able to run 10 threads in parallel to prevent our queue length from increasing. In reality, this is much more complicated. Because we’re talking about average response time, some requests may take much longer and some may complete much faster than 1 second. When thinking about concurrent request handling, we also have to take into account that executors can be blocked on external services, I/O, etc.
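
To make that arithmetic concrete, here is a hedged sketch of the pattern, assuming the arrival rate of 10 requests/sec and the 1-second average response time from above; the class name, queue bound, and printed values are illustrative rather than prescriptive:

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class SizedPool {
        public static void main(String[] args) {
            double arrivalRatePerSec = 10.0;  // assumed arrival rate (λ)
            double avgResponseTimeSec = 1.0;  // assumed response time (W)
            // Little's Law: on average λ × W requests are in flight, so we need
            // at least that many threads to keep the queue from growing.
            int poolSize = (int) Math.ceil(arrivalRatePerSec * avgResponseTimeSec);

            ThreadPoolExecutor executor = new ThreadPoolExecutor(
                poolSize, poolSize,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1_000));  // bounded work queue

            // The two numbers worth exporting as metrics during an incident:
            System.out.println("active threads (utilization): " + executor.getActiveCount());
            System.out.println("queued requests (saturation): " + executor.getQueue().size());

            executor.shutdown();
        }
    }

The two values printed at the end map directly onto USE: active threads are the pool’s utilization, and the depth of the work queue is its saturation.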

Take, as an example, a distributed system that processes messages from a Kafka topic, where offset lag for one of the consuming services is monotonically increasing. We know that 10 requests can be handled concurrently, so either the arrival rate of messages has increased or the time to process a request has increased. Kafka exposes metrics that directly indicate the rate at which messages arrive on a topic (the producers’ record-send-rate), and we can easily examine this metric over time. If record-send-rate has indeed increased, we may be able to increase the number of consumers on the topic (up to its partition count) to decrease consumer lag. This can happen during the course of normal operations, and there are autoscaling patterns for dealing with situations like this as they arise.
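
As a sketch of how the lag side of that can be measured with Kafka’s AdminClient (the bootstrap servers and consumer group id are placeholders, and error handling is omitted):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class ConsumerLag {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

            try (AdminClient admin = AdminClient.create(props)) {
                // Where the consumer group currently is (the group id is a placeholder).
                Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-consumer-group")
                    .partitionsToOffsetAndMetadata().get();

                // Where each partition's log currently ends.
                Map<TopicPartition, OffsetSpec> latest = new HashMap<>();
                committed.keySet().forEach(tp -> latest.put(tp, OffsetSpec.latest()));
                Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
                    admin.listOffsets(latest).all().get();

                // Lag = log end offset minus committed offset, per partition.
                committed.forEach((tp, offset) ->
                    System.out.printf("%s lag=%d%n", tp, ends.get(tp).offset() - offset.offset()));
            }
        }
    }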

What happens when autoscaling makes the problem worse?

We rely on observability tools to help us understand where faults lie. When digging into performance issues, remember the USE method and systematically investigate each system’s components, starting locally, since that is often the easier place to begin.

I like to begin with low-level metrics and move up the stack; working in one direction imposes order on the investigation. How much of the (virtual) machine’s capacity am I using? What are processes waiting on (i.e., what are they blocked on the most)? Are there any obvious errors: OOMs, or rapidly restarting processes with abnormal exit codes?

Once system-level metrics are eliminated, jump up a level in the stack and look at the process’s runtime environment. The metrics for the runtime environment have, in many cases, counterparts among the system-level metrics. CPU can be measured in available threads of execution. The amount of memory allocatable to the heap and stack may be constrained, and garbage collection may be starving threads of execution time. The process may not be able to open any additional sockets to its dependencies.
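
A sketch of where some of those runtime-level numbers come from in a JVM, using the standard java.lang.management MXBeans:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class RuntimeCheck {
        public static void main(String[] args) {
            // Threads of execution: the runtime-level analogue of CPU utilization.
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            System.out.println("live threads: " + threads.getThreadCount()
                + " (peak " + threads.getPeakThreadCount() + ")");

            // GC pressure: time spent collecting instead of running application code.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
        }
    }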

Finally, move on to external dependencies. Look to see whether downstream dependencies have increasing response times, are returning errors, or are no longer accepting connections. Look to others only after examining the health of your own systems and services. Just because a database query is taking longer to return doesn’t immediately imply that the database has gotten slower; the complexity of the query may have changed, or unexpected data may have been added, causing naturally slower response times as the database does more work.

While Little’s Law can help us understand why queue depths increase, simply increasing the number of requests we can process in parallel is not always the answer. If your dependencies are overwhelmed, autoscaling your service could cause more problems. It’s very easy to get caught up in intuition or small details, attempting to solve for local maxima, without considering the system holistically, and having a methodical approach to diagnostics can help break out of that thought loop.

Parting thoughts

As builders of observability tools, we must facilitate system diagnostics as effectively as we can. Most of the time, our users encounter our tools during times of stress: production incidents. Looking at dashboards is not something they do every day. While dashboards may end up on a big screen in a NOC or next to a team’s cluster of desks for fun, those screens are rarely what anyone turns to during an actual investigation. When users do reach for our tools, it’s likely because their pager went off, sometimes waking them from what should have been a good night’s sleep.

While a certain amount of satisfaction can be extracted from solving a problem, users want a product that lets them easily investigate an issue and get back to doing work that really matters. If we are going to build products that help people solve these problems quickly, then we have to truly understand how that work is accomplished. Understanding diagnostic processes like the USE method and putting them into practice can help us make informed choices as we design the user experience.

The USE method is not only a diagnostic method; it’s also a great approach for thorny performance problems with no readily apparent cause. It can come in handy with confusing legacy systems that lack modern instrumentation, or in pre-DevOps organizations with traditional operations teams. When all else fails, look at the computer.