I have covered the basics of a SAT solver and their working mechanism in detail in the first part, in this part I'll try to delve deep-inside SMT theory and how they function being built on-top of a SAT solver.

Basic Terminologies
Satisfiability of Quantifier-free formulas
Some other theories

Basic Terminologies

Quantifiers

They are basically logical symbols that are used to express "statements" over the variables of the particular domain. A domain is simply a set of values the variable is allowed to take.

∀ *for all : indicates that a statement holds true for all the variables in the domain.

∃ *there exists : indicates that a statement holds true for at least one element in the domain.

Example:

∀x (even(x) ∨ odd(x))

This denotes for all "x", they are either even or odd.

Quantifier-free logic

A formula is quantifier-free if it does not contain any quantifiers, instead, all variables are assumed to be ranged between their domain implicitly.

Say we have a statement, For all integers x, if x > 0, then x + 3 > x..

It could be expressed like:

∀x ∈ Int. (x > 0 → 2x + 3 ≥ x)

Here we are not specifying any assignments, the logic would have to reason of every possible value of x.

But in quantifier-free logic, it could be expressed as:

4 + 3 ≥ 4

whilst, we are just asking, "Does the above formula, holds true for our assignment?".

Why am i telling this? Well, most of the quantified formulas may become undecidable, and solving quantifier-free formulas is much more efficient for modern solvers.

Linear Arithmetic

There are 2 theories of linear arithmetic:

Linear Integer Arithmetic (LIA) and
Linear Real Arithmetic (LRA).

Their signature consists of types Int and Real respectively.

Their represented:

constants are 0 and 1,
binary functions are +, -,
the predicates < and =

Here, you cannot multiply with a variable, as that will be considered non-linear arithmetic, but multiplication with constants is possible as 2x can be represented as x + x. Both of these theories, satisfiability is decidable.

In general, they can be NP-complete or worse.

It could be written as

c1·x₁ + c2·x2 + ... + cn·xn ⋈ d

where

⋈ = {=, >=, <=, >, <}, "x" is variable and "c" and "d" are constants

Difference Logic is a further restricted subset of Linear Arithmetic where formulas only involve differences between two variables compared to a constant. There's also Integer Difference Logic and Real Difference Logic.

They are P-time solvable.

They could be shown as:

x - y ⋈ c.

Bit-Vectors

They are what they've been named of, a vector of bits. Each bit-vector variable belongs to the domain 2^n where n is the bit-width, e.g.: a bit-vector of length n is a string 'x1,x2...xn', containing the binary alphabets {0, 1}.

Their signature have many functionalities and operations like:

Arithmetic (bvadd, bvsub, etc.)
Bitwise (bvor, bvxor, etc.)
Shifts (bvshl, bvshr, etc)
Comparisons ... etc.

All these operations are mod-2^n, so the overflows or underflows wrap around in shifts or other operations.

A high-level formula might look like:

x + y = z, where x, y, z ∈ BV8.

Equality and Uninterpreted Functions (EUF)

This deals with equalities, built from uninterpreted functions, in-enhance- no fixed interpretation for the function is assumed, or their behavior is completely generic.

Their signature:

Functions: Symbols like - f(x), g(x, y) ...etc.
Predicates: Only equality = is interpreted, all others are uninterpreted.

An example might look like:

g(x, y) = g(f(x), y) ∧ f(x) != x

This is satisfiable if the functions behave abstractly, like

f(a) = b, g(a, b) = g(b, b)

The quantifier-free formulas are decidable, however formula including quantifiers are undecidable.

Arrays

Well, as the name suggests. They give a functional interface rather than imperative memory updates.

Their signatures are:

Sorts:

idx: Index sorts (e.g. integers, addresses),
elem: element sort (e.g. integers, booleans),
arr: the arrays

Functions:

read: arr * idx -> elem
write: arr * idx * elem -> arr (updated)

There are axioms like capturing array equality (through index) but iI am not getting more into this as you probably already have idea what's this about.

Satisfiability of Quantifier-free formulas

The Eager Approach

The eager translates high-level constraints or theories to pure SAT problems (boolean logic formula). This transformation allows the problem to be handed-off directly to SAT solver.

Though bit-vector logic is naturally suitable for eager approach, because all operations are defined bit-wise, so even if we have linear arithmetic logic, we may represent variables as bit-vectors.

The method used here is bit-blasting, where instead of dealing with the vector as a whole, we deal with individual bits.

For example, if we have a 4-bit bit-vector:

z = x + y

We can represent as:

x = x0, x1, x2, x3 ... and so on for other variables.

For the above example, we first want to encode it to CNF so our SAT solver can understand it.

First the addition is done using Full Adder Circuits.
Next, the transformation of complex logic circuits like XOR, to simple CNF that our SAT solver can understand is done through Tseitin Transformation. For example if we have
- ```
      s0 = x0 ⊕ y0 => ~x0 ∨ y0 s0.
```
After we get the constraints, we simply add the unit clause for our CNF.

The Lazy Approach (CDCL (T))

The eager approach is not very natural at solving theories, because it is not possible for some non-linear arithmetic equations to be transformed into boolean logics.

Here in short:

SAT solver takes care of the propositional logic, treating the theory clause(atoms to be precise) as boolean variables.
A dedicated theory solver extends the SAT solver handling the theory part.

Consider a linear arithmetic formula:

phi = (x + y <= 3) ∧ ( x − y >= 5)

Now the boolean abstraction would give us

x1 = ( x + y <= 3 )
x2 = ( x − y >= 5 )

// hence 
phi = x1 ∧ x2

If the SAT solver considers both the abstractions to be true; the corresponding theory assignment would be:

M = {x + y <= 3, x - y >= 5}

The theory is then solved by the dedicated solver, which is just plain maths so i am not going into that.

The rest of the process is similar to a SAT solver, which you can read in my previous post in-case you missed something. This can also be applied for bit-vector theory.

Congruence closure in uninterpreted functions

The congruence rule says that, consider 2 tuples

{x1...xn} and {v1...vn}

if they are equivalent, then,

f({x1...xn}) == f({v1...vn})

I am giving you an example:

Consider a relationship:

equiv_1 = {
    { x1, x2, x3 }, 
    { f(x1), f(x3) },
    { g(x2, f(x3)) },
    { g(x2, f(x1)) }
}

This however, is not under function congruence because

f(x1) == f(x3)
x2 == x2 # trivially true

This is fine, but

g(x2, f(x1)) == g(x2, f(x3))

This should be true, but the given relationship does not tells us that exclusively.

To fix this, we consider another relation that is a union of the previous relation and the new equality we need

equiv_2 = equiv_1 U g(x2, f(x1)) == g(x2, f(x3))

Now this is closed under function congruence.

You can also implement this equivalence class using union-find data structure, which basically keeps track of the equivalent elements in the same tree.

This is actually the core part of the solver, it needs to keep track of equivalence class of the terms, and sub-terms. When equality between two terms or elements is asserted, the logic:

merges the corresponding classes,
applies the congruence rule until the relation is closed under it,
checks if any of recorded disequality, from previous, is violated.

When disequality is asserted:

checks if the terms are equivalent,
records the disequality.

Difference logic

I've already explained what it is in the previous post. There's a lot of normalization rules like rewriting

x == y + c to (x <= y + c) ∧ (x >= y + c)

i won't be listing all of them as it will be a long post then.

Consider i have

x <= y + k

We can obtain such solver for this that can be a constraint graph meaning from the above:

edge from x to y with weight k

If i give an example, consider:

constraints = [
    (x1, x3, -6),
    (x1, x4, -3),
    (x2, x1, 3),
    (x3, x2, 2),
    (x3, x4, -1),
    (x4, x2, 5)
]

Where for example, the first one means edge from x1 to x3 with weight -6

Hence the cycle would be

x1 --> x3 --> x2 --> x1

It's total weight is -6 + 2 + 3 = -1

The negative means the equation is unsatisfiable, but a positive means satisfiable.

Some other theories

Non-linear arithmetic

The satisfiability of non-linear arithmetic over integers is undecidable as of Hilbert's tenth problem, though, it is decidable for reals. You can do it through cylindrical algebraic decomposition method.

Theory Combination (Nelson–Oppen)

If we're working with multiple theories, we need a way to combine theory solvers to decide satisfiability.

The Nelson-Oppen Combination works when:

The theories are stably infinite (if satisfiable, they have infinite models.)
The signatures are disjoint (no shared function symbols).

The solving might go like, suppose we have a formula that uses both arrays and integer arithmetic:

We split the formula into theory-specific parts:

  A: a[i] = v        ← array part  
  L: i < 5           ← arithmetic part

We have variables that link them, called interface variables.
Then after separately solving the models, we exchange equalities, this might lead to conflict and involves backtracking and other process.

Though i am not going deep into this, i just mentioned it so if you might be interested you can research on it.

Quantifier based formulas

To solve this, we have a lot of methods that are unique to each other, some of them are:

Quantifier Instantiation (E-Matching):
- Here we initialize quantified formulas with terms that appear in the formula like
```
  ∀x. P(x) --> Q(x) and P(a)
```
- instantiate with x = a --> P(a) -> Q(a)
- This is called equality-aware matching.
Model-Based Quantifier Instantiation:
- Here we build separate model for the quantifier-free part,
- Test if the quantified formula holds True in that model.
- If it doesn't, we generate a counterexample and instantiate the quantifier accordingly.
- This is used in Z3.

Though this is all for this blog, in the next part i will get the hands on for building a solver from scratch in code and i will try to explain accordingly.

An SMT solver from scratch - Part 2: SMT Theories and Logics

Table of Contents