It's been a long-time goal of mine to allow automatic generation of flame graphs inside sbt-jmh just by passing an option. It's a bit tricky, with a few moving parts that have to align to measure the right thing, invoke all the scripts, and so on. While I attempted to solve it a few times, I never had the time to really nail it.

Thankfully, with the Lightbend Scala team's recent focus on performance, and their adoption of sbt-jmh for all of Scala's performance and performance-regression testing, Jason Zaugg yesterday completed "Support async-profiler, improve JFR profiler" (#135), so we're now able to automatically generate flame graphs for any benchmark you're running with sbt-jmh, just by passing some options.

It works via perf on Linux, and via async-profiler on Linux or macOS.

In this post I'll give a quick tutorial on how to use the new async-profiler integration to generate some flame graphs.

Prepare async-profiler

In order to use this tool you have to compile it, which is simple enough really:

# clone async-profiler
$ git clone https://github.com/jvm-profiling-tools/async-profiler && cd async-profiler

# export the JAVA_HOME you want to build against
$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/

# build the profiler
$ make

# you should end up with these files:
$ ls build/
jattach*             libasyncProfiler.so*

Next you'll want to export paths to the tools we'll be using:

export ASYNC_PROFILER_DIR=$HOME/code/async-profiler  
export FLAME_GRAPH_DIR=$HOME/code/FlameGraph  

I recommend putting these into your .bash_profile, for example. You can also set the directories on every invocation of the benchmark, as we'll show below, but these environment variables are picked up by the extras profiler implementation automatically.
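Note that FLAME_GRAPH_DIR should point at a checkout of Brendan Gregg's FlameGraph scripts, which we haven't cloned yet. A minimal sketch of that step (the target path below is just my convention, matching the export above; put it wherever you like):

```shell
# FlameGraph is a separate project (Brendan Gregg's repo of perl scripts);
# clone it into the directory that FLAME_GRAPH_DIR points at, if you
# don't have a checkout already:
FLAME_GRAPH_DIR="$HOME/code/FlameGraph"
if [ ! -d "$FLAME_GRAPH_DIR" ]; then
  git clone https://github.com/brendangregg/FlameGraph "$FLAME_GRAPH_DIR"
fi
```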

Run benchmarks using the jmh-extras profiler

We're done setting things up! Let's run an example benchmark. I picked a random ByteString benchmark from Akka's codebase, which we can run like this:

$ sbt
> project akka-bench-jmh
> jmh:run -f1 -wi 10 -i 20 akka.util.ByteString_drop.* -prof jmh.extras.Async:flameGraphOpts=--minwidth,2;verbose=true;asyncProfilerDir=/Users/ktoso/code/async-profiler

Notice that we can either use the asyncProfilerDir option here, or rely on the environment variables we've just set (I'm showing the explicit option here only for illustration).
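In other words, with ASYNC_PROFILER_DIR and FLAME_GRAPH_DIR exported, the same run can be invoked without spelling out the directory (a sketch; the benchmark filter and iteration counts are just the ones from above):

```
> jmh:run -f1 -wi 10 -i 20 akka.util.ByteString_drop.* -prof jmh.extras.Async:flameGraphOpts=--minwidth,2;verbose=true
```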

And that's pretty much it: the benchmarks will run, and you'll notice additional output like this:

[info] # Benchmark: akka.util.ByteString_dropRight_Benchmark.bss_avg
[info] # Run progress: 0.00% complete, ETA 00:03:00
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 514233.790 ops/s
[info] # Warmup Iteration   2: 634550.494 ops/s
>>> [info] Iteration   1: /Users/ktoso/code/async-profiler/build/jattach 91981 load /Users/ktoso/code/async-profiler/build/libasyncProfiler.so true start,event=cpu,framebuf=8388608
>>> [info] Connected to remote JVM
>>> [info] Response code = Started [cpu] profiling
...

And once the run completes you'll see the secondary output: the generated flame graphs (disregard the results themselves though; this was run on a laptop dying on battery, while watching a movie, on an airplane...):

[info]   8874162.427 ±(99.9%) 5804032.164 ops/s [Average]
[info]   (min, avg, max) = (7915524.239, 8874162.427, 10007799.748), stdev = 898180.402
[info]   CI (99.9%): [3070130.263, 14678194.591] (assumes normal distribution)
[info] Secondary result "akka.util.ByteString_dropSliceTake_Benchmark.bss_large_dropRight_256:async-profiler":
[info] /var/folders/zx/qxkr88kn5rjb9lqzxdvh6m100000gn/T/akka.util.ByteString_dropSliceTake_Benchmark.bss_large_dropRight_256-Throughput6220345527746181021/collapsed-cpu.txt
[info] /var/folders/zx/qxkr88kn5rjb9lqzxdvh6m100000gn/T/akka.util.ByteString_dropSliceTake_Benchmark.bss_large_dropRight_256-Throughput6220345527746181021/summary.txt

# The flame graph:
[info] /var/folders/zx/qxkr88kn5rjb9lqzxdvh6m100000gn/T/akka.util.ByteString_dropSliceTake_Benchmark.bss_large_dropRight_256-Throughput6220345527746181021/flame-graph-cpu.svg

# The "reverse" flame graph:
[info] /var/folders/zx/qxkr88kn5rjb9lqzxdvh6m100000gn/T/akka.util.ByteString_dropSliceTake_Benchmark.bss_large_dropRight_256-Throughput6220345527746181021/flame-graph-cpu-reverse.svg

We're most interested in the SVG files for now, since those are the flame graphs we can inspect. Below is a screenshot for some "eye candy", but you know what to do from here: make sure the benchmark itself is sound, and keep digging into the results and flame graphs. That's a huge topic in itself though :-)

(flame graph screenshot)

So that's it: automatically generated flame graphs from your sbt-jmh benchmarks! This is available starting with version 0.3.0 of the plugin. The extras profilers which make this magic happen are available on Maven Central as pl.project13.scala.sbt-jmh-extras. Note that the implementation is pure Java, so you could use these tools in a plain Java project without depending on Scala or sbt at all.

Happy hakking!