CUTE - A Concolic Unit Testing Engine for C
This paper describes CUTE, a concolic execution engine that can be used to generate inputs that will test as many possible program execution paths as possible, especially in programs that make use of advanced dynamic data structures with pointer operations.
First, the paper introduces unit testing and symbolic execution. Unit testing is commonly understood, but it's not incredibly effective unless the author has a really good undersrtanding of possible failure cases, in which case it's still mostly useful for regression testing anyway since said author could just double check their code.
Symbolic Execution
Symbolic execution is much more interesting. By executing the program with symbolic values instead of concrete ones, it is possible for an interpreter to authomatically find every possible execution path of a program. By building and solving constraints for each path, it can also generate an ideal suite of unit tests. However, it is rarely possible to actually find every path. Thus, most symbolic execution research aims to in some way improve the speed at which unique paths are found so that we can maximize test coverage with a finite execution time. This is usually done by either optimizing interpretation itself, or by choosing better paths to explore.
Concolic Execution / DART
This paper builds upon previous work to optimize symbolic execution by merging concrete execution with symbolic execution, most directly DART (Godefroid et al.). DART starts by executing the program with concrete values. However, alongside the normal concrete execution, it also builds the symbolic constraints that result in the path it follows.
An example:
int some_func(int x) {
// Concrete input (chosen randomly) - {x: 10}
// Symbolic Constraints: []
if (x < 0) {
// block skipped
return 1 / x;
} else {
// Symbolic Constraints: [!(x < 0)]
if (x < 15) {
// Symbolic Constraints: [!(x < 0), x < 15]
return 1 / (x - 15);
} else {
// block skipped
return 1 / (x - 10);
}
}
}
The final symbolic constraints for the path are: [!(x < 0), x < 15]
. Executing a program
with an input fitting the same constraints would probably redundantly execute the same path.
However, Godefroid et al. claim that negating any of the constraints in the set will likely
result in following new execution path. Thus, we can create a constraint set [x < 0, x < 15]
,
and randomly generate an input fitting these constraints {x: -5}
. Repeating this with the four
combinations of this constraint set only results in one redundant input, since there are three
possible execution paths. This is evident without actually executing the program by noting that
(x < 15) -> (x < 0)
, so I wonder if it is worth pre-emptively skipping the constraint [x < 0, x < 15]
.
CUTE
The authors of CUTE note that DART falls apart in programs with dynamic data structures and pointer arithmetic because "alias analysis may only be approximate in the presence of pointer arithmetic, using symbolic values to precisely track such pointers may result in constraints whose satisfaction is undecidable."
CUTE remediates this by separating pointers constraints from other constraints, using an approximation
to make the constraints solvable. The example in the paper is that a constraint on the variable p->f
, where p is a pointer to a struct and f is a field on that struct, would be simplified to a constraint
on a new symbolic variable f
. From my understanding, this approximation fails in the following case:
struct some_struct {
int a;
int f;
};
int some_func(some_struct* p) {
// constraints: []
// random concrete inputs: {p: some_struct {a: 4, f: 12}}
*(p+4) = p->f - 12; // assume *(p+4) is the same as p->f
// concrete values: {p: some_struct {a: 4, f: 0}}
if (p->f > 0) {
// block skipped, since the concrete p->f == 0
return 1 / 0;
} else {
// constraints: [!(p->f > 0)]
return 10;
}
}
The final constraint set is [!(p->f) > 0]
. Without the approximation, it would be something like
[!(p->f - 12) > 0]
. Negating this constraint yields [p->f > 0]
, so a possible input is
p: some_struct {a: 4, f: 6}
, which follows the same execution path and misses the error.
xs
The authors claim that, despite this approximation, CUTE is effective enough in practice. The rest of the paper describes CUTE's implementation, provide evidence of its effectiveness including a few bugs found in SGLIB, and then introduces some related work and concludes.