Formal Verification: From Programs to Formulas
Imagine being able to prove that, no matter what inputs your program receives, certain properties or invariants will always hold by the end of the execution. This is what formal verification allows us to do, and in this article, we'll explore how to convert a program into logical formulas that we can then solve to create proofs about it.
The general steps involved in this process are:
Implementing an interpreter for a simple imperative language (IMP).
Implementing a parser that turns IMP programs into ASTs.
Converting IMP ASTs into logical formulas (SMT formulas), also represented as an AST.
Embedding the properties we want to prove into the logical formulas.
Passing the logical formulas to an SMT solver (Z3), which will determine whether the properties we specified hold true for every possible input to the program.
In this article, we'll see how to implement all the previous steps intuitively. Moreover, you'll find a link to the code on GitHub at the end of the article. If there's enough interest, I'll create a follow-up article going through that code step by step. Let's begin!
Prerequisites
If you are reading this, I assume you already have a good understanding of general programming concepts. In addition, I suggest having a basic understanding of what a parser and an AST are before reading this article.
Converting the program into an AST
First, we need to define the structure of the IMP language that we are going to be working with. Here's an example program written in the IMP language:
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
This program receives a deposit
amount, and adds it to the balance
. If the deposit
is greater than 1000
, then it gets multiplied by a bonus
factor. We'll see how we can formally verify invariants such as balance >= 0
, for every possible deposit
amount.
To convert the program into logical formulas, we first need to convert it from text to a representation in which we can manipulate it better: an AST. Let's see a simple example on how a program is represented as an AST:
a := 1;
if (a == 1) {
a := 2
} else {
a := 3
}
We can see the AST is just a tree with different types of nodes. For instance:
Seq
represents a sequence of two statements. Thus, it has two child nodes.Set
represents a variable assignment and holds the variable name. It has one child node that represents the value of the variable.Lit
represents an integer literal. It doesn't have any child nodes, so we call it a leaf of the tree.If
represents anif-then-else
statement, and has three child nodes: the condition, the statement in theif
branch, and the statement in theelse
branch.
Transforming the IMP AST into a logical formula
Now that we have an IMP AST, we have to convert it to a logical formula, which can be also represented as an AST. We'll call the latter the Z3 AST since we'll use a tool called Z3 to solve the formulas (we'll see exactly what solving the formulas means in the next sections).
Let's see how the IMP AST we saw before translates to a Z3 AST:
Although this doesn’t look exactly as an AST, the structure is fairly similar to the one of the IMP AST. The main differences are:
We use node types provided by the Z3 API.
The
bv
prefix stands for bit-vectors, which are used to represent numbers as an array of bits. In this case, we are using arrays of32
bits for variables and literals.💡There are different types of SMT formulas, which are called theories. In this case, we are using the theory of fixed-width bit-vectors.
Each statement converts to a separate tree; that's why we see one tree for the initial variable assignment and one "tree" for the
if-then-else
(ite).- All trees, however, are part of the same Z3 computation, which represents the logical formula that will be solved at the end.
Variable assignments are represented with a combination of
Z3.eq(var, value)
andZ3.assert(condition)
.For example, the assignment
a := 1
translates toZ3.assert(Z3.eq("a", 1))
.💡Z3.assert
adds a constraint to the Z3 computation, which links the variable with its corresponding value.
Each time a variable is assigned to, we create a fresh variable with the same name, and Z3 assigns it an incremental
id
that differentiates it from the variable before the assignment.💡This is known as the Static Single-Assignment form (SSA).The variable
a!3
represents the value ofa
after theif-then-else
, and encodes all possible values ofa
considering both execution paths (the one where theif
branch is executed, and the one where theelse
branch is executed instead).💡In the example program we are considering, we already know what branch will be executed since there are no free variables (i.e. all variables have a fixed initial value). However, we'll then see an example in which the branch to be executed is unknown until runtime, so the solver has to consider all possible outcomes to determine whether a property holds true for every initial state of the program.
Adding constraints to the logical formula
Once we have the Z3 AST (i.e. the logical formula that encodes the IMP program), we can add aditional constraints to it. These constraints will represent the properties or invariants that we want to formally prove about our program.
a := 1; // a!0
if (a == 1) {
a := 2 // a!1
} else {
a := 3 // a!2
}
// a!3
In the example program we are considering, we could add a constraint on the final value of a
. For this, we would have to constrain the a!3
variable, as we saw in the Z3 AST before. For instance, we could add the constraint a!3 == 2
, which should hold true given the logic of the program.
Now we are finally ready to provide the Z3 solver with the Z3 AST plus the final constraints we added. The solver's job will be to find one possible assignment to all the variables in the program (namely a!0
, a!1
, a!2
, a!3
) that satisfies all the constraints (i.e. the ones in the Z3 AST, and the ones we added at the end).
If the solver finds an assignment for all the variables that satisfies all constraints, it will output those assignments.
If it can't find an assignment that satisfies all constraints, it will tell us that the logical formula is unsatisfiable.
Running the solver to prove properties
Let's go back to the first example program we saw to put everything together:
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
There are three variables in this program: balance
, bonus
and deposit
. Note that deposit
doesn't have an initial value (i.e. it's a free variable), so we'll let Z3 explore all possible initial values to prove some properties about this program.
Generating counterexamples
First, we'll want to prove that for every non-negative deposit
amount, after the program executes, the balance
is always non-negative as well. For this, we need to impose two constraints: one for the initial value of deposit
, and one for the final value of balance
. We can do so by using Z3.assert
as we saw before:
The box on the right is the Z3 AST representation of the program along with its corresponding constraints, which is generated with the process we discussed before.
We add two additional constraints (the two boxes on the left) that encode the property we want to prove about the program:
We constrain the initial value of
deposit
(i.e.deposit!0
) to be greater than or equal to 0, usingZ3.bvSge
(signed>=
for bit-vectors).We constrain the final value of
balance
(i.e.balance!k
, wherek
is just the internal id Z3 will assign to that final variable) to be less than or equal to-1
, usingZ3.bvSle
(signed<=
for bit-vectors).💡Note that we want to prove thatbalance >= 0
, but we are actually adding the constraintbalance <= -1
, which is the opposite. This is because if the solver finds an initial value fordeposit
that satisfiesbalance <= -1
at the end of the execution, we want it to output it as a counterexample. If the solver can't find such initial value, it will outputUnsatisfiable
and we'll know there's no initial value fordeposit
that makes thebalance
negative, which is what we want to prove.
If we finally run the Z3 solver passing it the final logical formula we just built, it will output the following:
deposit!0 -> #x7fffffff
deposit!8 -> #x7fffffff
bonus!7 -> #x0000000a
balance!6 -> #xfffffff6
balance!5 -> #x7fffffff
balance!4 -> #xfffffff6
bonus!3 -> #x0000000a
balance!2 -> #x00000000
It gave us a counterexample! This means the property balance >= 0
does not hold for every possible initial deposit
. Looking at the value the Z3 solver assigned to the deposit!0
variable (i.e. the initial value for deposit
), we can see it's the maximum signed integer value using 32 bits (i.e. 2147483647
). That value will create an overflow on the balance
variable on the following line:
balance := balance + deposit * bonus
This will make the balance
negative, satisfying the constraint we imposed.
Let's also understand what the rest of the variable assignments actually are:
balance!2
andbonus!3
are the variables that represent the initialbalance
andbonus
at the start of the program:balance := 0; bonus := 10;
bonus!3 -> #x0000000a balance!2 -> #x00000000
balance!4
andbalance!5
represent thebalance
within theif
andelse
branches, respectively.if (1000 <= deposit) { balance := balance + deposit * bonus } else { balance := balance + deposit }
balance!5 -> #x7fffffff balance!4 -> #xfffffff6
💡Note thatbalance!4
is where the overflow happens, since0xfffffff6 == -10
in signed two's complement.deposit:8
,bonus!7
andbalance:6
represent the final state of the program, after theif-then-else
statement.deposit!8 -> #x7fffffff bonus!7 -> #x0000000a balance!6 -> #xfffffff6
Both
deposit
andbonus
will keep their original value since they are constants, andbalance
will be assigned the value ofbalance!4
, since theif
branch will be taken in this particular case.
bonus!3
and bonus!7
are completely different variables; what ends up making them have the same value at the end are the constraints created during the Z3 AST construction.Proving a property for all possible inputs
Let's conclude with an example where the Z3 solver cannot find a variable assignment to satisfy all constraints in the Z3 computation.
balance := 0;
bonus := 10;
if (1000 <= deposit) {
balance := balance + deposit * bonus
} else {
balance := balance + deposit
}
We now want to prove that, no matter what the deposit
amount is, it's impossible for the balance
to be 1000
at the end of the program's execution.
Using the same technique as before, we now impose the constraint balance == 1000
.
Running the Z3 solver, we'll now get:
Unsat
This means that it's impossible to create an assignment for all the variables in the Z3 computation such that all constraints are satisfied at the same time.
Therefore, the property balance != 1000
holds true for every possible deposit
amount.
Conclusion
In this article, we saw how to turn an imperative program into a logical formula (encoded using Z3's API), and then prove properties about that program by encoding these properties as constraints to the Z3 solver.
Let me know if you'd be interested in a follow-up article explaining the actual code that implements all of this from scratch. Otherwise, the code is available on GitHub for you to see on your own. Although it's implemented in Haskell (just like HEVM), it shouldn't be hard to understand it at a high level if you understood the process we went through in this article.
Until the next research project!
References
Subscribe to my newsletter
Read articles from Ezequiel Perez directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by