One of the core functions of the Linux operating system is to schedule processes across the available CPUs. When a service gets a request, Linux typically needs to schedule the process handling that request onto one of the CPUs. This can be very quick if an idle CPU is available, or it can take significant time if all CPUs are busy running other processes. In more complicated environments additional limits may apply: CPU usage limits in virtualized environments, or the cgroup CPU limits often used in Kubernetes environments.
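As an illustration, on a host using cgroup v2 the CPU limit applied to a container can be inspected directly; the path below is a placeholder and depends on how your distribution and Kubernetes lay out the cgroup hierarchy:
# cat /sys/fs/cgroup/kubepods.slice/<pod-cgroup>/cpu.max
50000 100000
Here a quota of 50000 us per 100000 us period corresponds to a Kubernetes limit of 0.5 CPU: once the container exhausts its quota within a period, its processes have to wait to be scheduled even if physical CPUs are idle.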
Whatever the reason for the CPU scheduling delay, it shows up as additional latency from the standpoint of the service or the user originating the request. In many cases CPU starvation first manifests as additional tail latency and may not be visible in overall CPU utilization graphs.
The goal: check whether CPU starvation is causing significant additional latency in my application, see which applications are affected, and determine whether this added latency reaches unacceptable levels.
Running runqlat on a relatively idle system, we see that most processes are scheduled in less than 10 microseconds, with very few cases of scheduling taking more than 1 ms.
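A histogram like the one below can be collected with an invocation along these lines (the 10-second interval and single count are illustrative):
# runqlat 10 1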
     usecs               : count     distribution
         0 -> 1          : 2672     **
         2 -> 3          : 14113    *************
         4 -> 7          : 37836    ****************************************
         8 -> 15         : 10423    ***********
        16 -> 31         : 542      *****
        32 -> 63         : 3642     ***
        64 -> 127        : 3563     ***
       128 -> 255        : 2409     **
       256 -> 511        : 1147     *
       512 -> 1023       : 363
      1024 -> 2047       : 161
      2048 -> 4095       : 61
      4096 -> 8191       : 6
If we put the same system under a CPU starvation condition (many more runnable processes than available CPUs), we see a different picture:
     usecs               : count     distribution
         0 -> 1          : 18415    **
         2 -> 3          : 20065    ***
         4 -> 7          : 237573   ****************************************
         8 -> 15         : 51526    ********
        16 -> 31         : 29037    ****
        32 -> 63         : 18729    ***
        64 -> 127        : 19147    ***
       128 -> 255        : 10897    *
       256 -> 511        : 7154     *
       512 -> 1023       : 7506     *
      1024 -> 2047       : 10168    *
      2048 -> 4095       : 11203    *
      4096 -> 8191       : 9991     *
      8192 -> 16383      : 10058    *
     16384 -> 32767      : 4019
     32768 -> 65535      : 57832    *********
     65536 -> 131071     : 29226    ****
    131072 -> 262143     : 363
    262144 -> 524287     : 8
Note that there are significantly more cases where a process had to wait 50 ms or longer to be scheduled.
In this case we are looking at the run queue (scheduling) latency for the system overall, but it is common to be interested only in the latency impact on a specific process or cgroup; the tool allows that too (see the Command Line Options section).
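For example, restricting the histogram to a single process or to a cgroup might look like this (the PID and cgroup path are placeholders):
# runqlat -p 185
# runqlat -c /sys/fs/cgroup/system.slice/myservice.service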
Want to easily see which processes are impacted by scheduling latency? The runqslower tool does exactly that. Let's see which processes are affected by waiting more than 100 ms to get placed on a CPU.
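Since runqslower takes its threshold in microseconds, a 100 ms cutoff corresponds to:
# runqslower 100000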
TIME     COMM     TID      LAT (us)
19:20:22 stress   1063596  105001
19:20:24 stress   1063599  101999
19:20:24 stress   1063599  100537
19:20:24 stress   1063616  101000
19:20:24 stress   1063605  100359
19:20:24 stress   1063575  102001
19:20:24 stress   1063570  102001
19:20:24 stress   1063591  102005
19:20:25 stress   1063562  112699
19:20:25 stress   1063576  131656
19:20:25 stress   1063574  102000
19:20:31 stress   1063565  100999
19:20:31 stress   1063612  100998
19:20:31 stress   1063593  101001
“stress” is the application I used to cause the high CPU load, and we can see Linux being pretty smart here: no “interactive” applications are showing high scheduling latency in this example.
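For reference, a load like the one in this example can be reproduced with something along these lines (the worker count is illustrative and should exceed the number of available CPUs):
# stress --cpu 64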
# runqlat --help
Usage: runqlat [OPTION...]
Summarize run queue (scheduler) latency as a histogram.
USAGE: runqlat [--help] [-T] [-m] [--pidnss] [-L] [-P] [-p PID] [interval]
[count] [-c CG]
EXAMPLES:
runqlat # summarize run queue latency as a histogram
runqlat 1 10 # print 1 second summaries, 10 times
runqlat -mT 1 # 1s summaries, milliseconds, and timestamps
runqlat -P # show each PID separately
runqlat -p 185 # trace PID 185 only
runqlat -c CG # Trace process under cgroupsPath CG
-c, --cgroup=/sys/fs/cgroup/unified
Trace process in cgroup path
-L, --tids Print a histogram per thread ID
-m, --milliseconds Millisecond histogram
--pidnss Print a histogram per PID namespace
-p, --pid=PID Trace this PID only
-P, --pids Print a histogram per process ID
-T, --timestamp Include timestamp on output
-v, --verbose Verbose debug output
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
# runqslower --help
Usage: runqslower [OPTION...]
Trace high run queue latency.
USAGE: runqslower [--help] [-p PID] [-t TID] [-P] [min_us]
EXAMPLES:
runqslower # trace latency higher than 10000 us (default)
runqslower 1000 # trace latency higher than 1000 us
runqslower -p 123 # trace pid 123
runqslower -t 123 # trace tid 123 (use for threads only)
runqslower -P # also show previous task name and TID
-p, --pid=PID Process PID to trace
-P, --previous also show previous task name and TID
-t, --tid=TID Thread TID to trace
-v, --verbose Verbose debug output
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
Coroot uses Linux Delay Accounting to help understand whether CPU starvation is a limiting factor for service performance. As of Coroot 1.4.1, it does not provide latency distribution information or detect specific outliers.
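If you want a quick look at per-process scheduling delay without eBPF tooling, the kernel also exposes related counters in procfs; this is a rough sketch using an illustrative process name, and not necessarily the interface Coroot itself relies on:
# cat /proc/$(pidof -s myservice)/schedstat
123456789 98765432 4567
The second field is the cumulative time, in nanoseconds, that the task has spent waiting on a run queue (the values above are made up for illustration).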
For more eBPF Linux command-line tools, check out this article.