QSym - A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing

This paper describes QSym, which aims to improve the speed of concolic execution engines by using "dynamic binary translation" rather than interpreting an IR. It also introduces some soundness compromises and other optimizations that the authors believe are sensible to maximize concolic execution utility.

The authors start by noting that concolic execution is often too slow to be used on real-world codebases, largely because of interpreter overhead. They note three main contributions: improved concolic execution speed, the elimination of snapshots, and new hybrid fuzzing heuristics.

Slow Symbolic Emulation

The authors demostrate that two cutting edge symbolic execution engines are much slower than native even for programs with no branching or complex path constraints. They claim that this is because both engines emulate an IR rather than native assembly. For evidence, they show that angr's VEX IR has on average 4.69x as many instructions as the equivalent machine code.

Additionally, they show that using IR rather than machine code prevents a certain interpreter optimization. A common optimization is to skip symbolically executing basic blocks that do not deal with symbolic variables. By performing this optimization with instruction-level granularity, they are able to skip many more instructions. However, using IR prevents this because the IR must be obtained from machine-code, and to avoid repeating this translation most engines cache it on the basic-block level. Caching each instruction individually would greatly impact cache performance.

QSym avoids this pitfall by executing machine code directly instead of using an IR. Then, they are able to skip instructions individually, allowing QSym to waste much less time. The authors claim that the implementation complexity is worth the speedup.

I wonder if it would be possible for the cache to split basic blocks into parts that use symbolic variables and parts that don't. My first instinct is that it would reduce the cache effectiveness too much, so QSym's approach seems reasonable.

Snapshot Overhead

The authors note that most state of the art concolic execution engines avoid re-executing code for every branch by taking a snapshot of the program state and backtracking. They give two reasons that snapshotting is not effective:

TODO: I don't understand the first point ("fuzzing input does not share a common branch").
Snapshots can't model external state

QSym remediates this by completely dodging it and avoiding snapshots altogether. The authors claim that this is reasonable because of QSym's improved interpretation speed.

Soundness Relaxation

Most concolic execution engines prioritize soundness in that any constraint set is guaranteed to execute the expected path. Thus, there are no false positives, reducing developer headache and making them more inclined to use the tool, since the worst-case is that it finds no bugs.

The authors note that completely constraining a path often results in extremely complex constraint sets, leading to really slow solving. Additionally, they note that a path can be over-constrained, leading to the engine to missing a path. They provide an example on page 749.

To solve this, QSym again just decides to relax the soundness requirement. Instead of solving the whole constraint set, it just solves an easy subset such that it is likely but not guaranteed to execute the expected path. I'm not sure how they address false positives.

xs

The rest of the paper explains the design and implementation in a little more detail, and provides experimental results. It's clear that QSym's contributions are important because it found many bugs in real-world programs, but they also seem to dodge the problem in a lot of ways.

Mikail Khan