Combining the top-down propagation and bottom-up enumeration for inductive program synthesis
This post is in progressThis paper describes an algorithm for programming-by-example (PBE). I've implemented it in this repo, and it's compiled to WASM accessible here: https://mkhan45.github.io/learning_synthesis/.
Because I've implemented the algorithm, this post is meant more to explain the algorithm than to summarize the paper. Some sections are collapsible, so you can understand the broad strokes of the algorithm without getting bogged down in the details. Some details may also be collapsed if I expect some readers to understand it already.
Prerequisites
This post will make a lot more sense if you understand
- ASTs
-
Basic Interpreters
- At risk of overconfidence, a lot of these are mine
- Choose a language: https://github.com/mkhan45/expr_eval
- I've been told this is helpful: https://blog.mikail-khan.com/post/json-interpreter,
- but I think this is better: https://blog.mikail-khan.com/post/expression-evaluator
What's PBE?
PBE is synthesis of programs from input-output examples. The user provides a set of inputs and their corresponding outputs, and the synthesizer tries to find a correct program. The premier example of this is FlashFill in Excel, which is what lets you autofill a whole column based on a few examples. FlashFill first synthesizes a program from the given output rows, and then runs that program on the remaining inputs to autofill the outputs.
You can try PBE on my implementation of Duet here: https://mkhan45.github.io/learning_synthesis/. If you have Excel, try out its FlashFill, which is much more powerful.
Bottom-up enumeration
One component of Duet is bottom-up enumeration. It's super simple; just construct all possible programs from the grammar, and then check if any of them are correct. It's obviously incredibly slow, but given enough time (maybe until past the heat-death of the universe), it will find a correct program eventually.
To construct all possible programs of an expression-based language, we use BFS on a graph where each node is an expression, and edges are the operators or functions.
Below is an example, but the code might be better to understand the details. Turns out code is a pretty good specification for algorithms 🤯. With that said, the algorithm I describe below is not exactly the same as this simple version: bottom_up.rs
Example
Language
For example, let's define a simple string manipulation language that has three primitive values:-
X
a reference to the input -
'_'
a string containing just the underscore character -
0
, a constant zero, which we can use to index into strings
The language will also have a few functions:
-
concat(a, b)
which concatenatesa
andb
-
substr(x, start, end)
which returns the substring ofx
fromstart
toend
-
find(outer, inner)
which returns the index of the first occurrence ofinner
inouter
Using this language, we can write a program that takes a string like
"hello_world"
and returns "hello"
:
substr(X, 0, find(X, '_'))
Enumeration
The graph starts with one root for each primitive: Then, we iterate through each node at depth 0 (the roots), and add edges for each function, to each possible argument. So we would start my looking at the rootX
, and adding
an edge for concat
:
Then, we look at expressions in our bank that match the needed type. In this case, we need
a string, so we can use X
or '_'
. We add a separate adjaceny for each of these:
The arrows above aren't actually in the graph, they just show the complete program that the node represents.
Next, we continue this process for each function and each root. When we complete this process for the first root
we get the following graph:
There are a few things to note here: -
substr
needs more arguments, right now, the program represented by the bottom node issubstr(X, 0, ??)
. We won't add the last argument yet because it would exceed our needed depth of 1. - The branching factor is really large, and increases over time. Even with a very simple language, we can't really go beyond depth 10 or so.
- There's a lot of redundancy. This is later addressed by a data structure called a VSA.
Top-down propagation
Instead of starting with the input and trying to combine transitions to get the output, top-down propagation starts with the output and asks what transitions could have possibly produced it. To do this, we use the inverse semantics of the language.
The semantics of a language define how to forward evaluate a program; given an input, language semantics define
the output. For example, in our language, the semantics of concat
would define that
concat("hello", "world")
is "helloworld"
.
Inverse semantics are the opposite of semantics. Given an output, inverse semantics define the input. However, while languages generally define a single output for a given input, there are many possible inputs for a given output. As such, the inverse semantics of a function is a whole set of possible inputs.
For example, the inverse semantics of concat
for "abc" is the set of all argument pairs that would concatenate
to "abc": {("abc", ""), ("ab", "c"), ("a", "bc"), ("", "abc")}
.
Top-down propagation attempts to apply inverse semantics until we are left with primitives. You can think of it as building a tree from the top-down instead of bottom-up, except it's a much smaller tree than the one we built in bottom-up enumeration. If our inverse semantics are complete, then we are sure to find a program if it exists. However, it is generally difficult/slow to write complete inverse semantics, so top-down propagation is also ineffective on its own.
TODO: Example
Version-Space Algebra (VSA)
The core difficulty of program synthesis is that the search space is huge. It would be great if we could
compress it somehow. VSAs do exactly that by observing that many programs have the same output on a given
input. For example, in our string language, given the input "hello_world"
, the following programs all
have the same output of "hello"
: substr(X, 0, find(X, '_'))
, substr(X, 0, 5)
, and hardcoded "hello"
.
For a given input, we can treat all of these programs as equivalent, combining them into a single node.
This is not just true for the program as a whole, but also for every subexpression. Thus, a VSA
is like an AST where every node is a set of equivalent programs.
Then, the basic VSA synthesis algorithm is as follows:
- For each input, construct a VSA of all possible programs that go from the given input to its corresponding output.
- Intersect every resulting VSA. The resulting VSA will result in a general VSA that works for all inputs.
- Choose a program from the VSA; all of them will work for the provided I/O examples. However, it is often a good idea to use a heuristic to choose the best program. The general idea is to choose the smallest program with the least hardcoding.
The default version of VSA synthesis uses top-down propagation for step 1. At each step, we take every possible input from the inverse semantics and unify them in a VSA. This exponentially reduces the space needed, since we are only storing one node for each set of equivalent programs. However, it's still ineffective when we can't make fast complete inverse semantics. Thus, Duet uses bottom-up enumeration to fill in the gaps; along with using the inverse semantics, we also build a bank of all possible programs of a given size using bottom-up enumeration. Then, we add programs from the bank into their corresponding VSAs.
Full Example
TODO