Using SMT to check specifications for errors
SMT solvers are incredibly flexible tools for analyzing complex systems. In my previous post, I showed how you can: generate a symbolic execution trace from running an instrumented interpreter on some input values; turn the trace into an SMT circuit; and use an SMT solver to check that the SMT circuit matches your original trace. This post will explore how we can check assertions, array index bounds, etc. to find bugs in the specification and how we can enumerate all the paths through a specification.
Checking assertions
The “AddWithCarry” function that I am using as my running example does not contain any assertions — but it does contain this line
result = unsigned_sum[0 +: N]; // same value as signed_sum[0 +: N]
So, for the sake of continuing the example, let’s turn that comment into an assertion
assert unsigned_sum[0 +: N] == signed_sum[0 +: N];
In the execution trace, this will generate something like this:
%14 = getslice_bits(0, 32, %8)
%t15 = getslice_bits(0, 32, %13)
%t16 = eq_bits(%14, %t15)
assert %t17 "functions/AddWithCarry.asl" 8
Saying that line 8 of file “functions/AddWithCarry.asl” asserts that condition %t16
holds.
As we are translating the trace to SMT, we can generate a metadata file
that contains information about this assertion:
AWC.json:
{
"assertions" : [
("t17", "functions/AddWithCarry.asl", 8)
]
}
We can use this list of assertions to check that the assertions all hold for all possible inputs that follow the same path
; only consider inputs that lead to this path
(assert (not v16))
(assert v19)
; check for any input that causes the assertion to fail
(assert (not t17))
When we run z3 on this check, it can’t find any failing inputs
$ z3 t2.smt
unsat
This is good news: it means that the comment is actually true!
Implicit assertions
Of course, programs have lots of implicit assertions too:
- array indexes must be in bounds
- bitslice indexes must be in bounds
- bitslices must have a non-negative width
- division by zero is not allowed
For each of these, we can add checks to check for failures.
For example, given the expression x[N-1]
, we will generate
a check that 0 <= N-1 < N
.
This is going to result in a lot of checks — many of which will trivially
pass. But SMT solvers are very efficient at dealing with trivial checks like
this so let’s just go ahead and spray the code with checks like this, use Z3 to
check them all and then, once we know that they all pass we could generate
a fresh SMT file that omits the checks. (Or maybe we just leave them in?)
Finding other paths
So far, all we are doing is exploring one path at a time. So we have checked that an assertion does not fail on one path but that does not guarantee that it does not fail on any of the other paths that may exist.
This series of threads is focussed on concolic execution where we generate a symbolic execution trace by executing an instrumented interpreter on some concrete input values. On the running example from the previous post, the trace recorded that it hit two conditional branches
branch %16 = FALSE
branch %19 = TRUE
The trace is only valid for inputs that would take the same path so I added these assertions to restrict execution to the same path through the code.
(assert (not v16))
(assert v19)
More generally, I might have a long list of branches b1, b2, b3, … bm. And what I want to do is to explore paths that differ in at least one of these conditions. So I am going to use the SMT solver to try to find several different branch condition sequences:
!b1
b1, !b2
b1, b2, !b3
...
b1, b2, b3, ... !bm
On my running example, this gives me two sets of assertions to try:
(assert v16)
and
(assert (not v16))
(assert (not v19))
Running with these two sets of assertions generates two sets of input values
#xfffffffe #x00000001 #b0
#xb9d1fe83 #xc60f00ff #b1
Once we have these values, we can rerun the interpreter to generate new traces and then we can repeat the process of generating alternative branch sequences to generate traces for alternative paths through the function under test. It’s not long before this finds a fourth (final) path through the function and an input value that can trigger it.
#x7bfbff67 #x0403fe96 #b0
Going further
Checking assertions, indexes, etc. is good — but we can go further. An obvious thing to do is to try to write down any invariants that you expect to hold about the overall processor state. Another option is to scour the natural language part of the specification looking for more properties we can check. I wrote a couple of posts about doing that here and here and I wrote a paper about it.
Related posts and papers
- Paper: End-to-End Verification of ARM Processors with ISA-Formal, CAV 2016.
- Verifying against the official ARM specification
- Finding Bugs versus Proving Absence of Bugs
- Limitations of ISA-Formal
- Paper: Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture, FMCAD 2016.
- ARM’s ASL Specification Language
- ARM Releases Machine Readable Architecture Specification
- Dissecting the ARM Machine Readable Architecture files
- Code: MRA Tools
- ASL Lexical Syntax
- Arm v8.3 Machine Readable Specifications
- Paper: Who guards the guards? Formal Validation of the Arm v8-M Architecture Specification), OOPSLA 2017.
- Are Natural Language Specifications Useful?
- Formal validation of the Arm v8-M specification
- Bidirectional ARM Assembly Syntax Specifications
- Talk: [How can you trust formally verified software (pdf)], Chaos Communication Congress, 2017.
- This series (part 1): Generating SMT from traces
- This series (part 2): Using SMT to check specifications for errors
- This series (part 3): Generating multiple solutions with SMT
The opinions expressed are my own views and not my employer's.