Verification Competitions
Since joining Google in September, I have spent a lot of time reading about software verification tools and one of the things that keeps coming up is that there are competitions between different verification tools.
These competitions are interesting for several reasons
-
You can see which tools can solve the most problems in the least amount of time in graphs like this graph from the SV-COMP 2020 competition.
Each curve shows how many problems were solved by a given tool in a particular amount of time. So the winner is the tool that goes furthest to the right (solves more problems) and is lower (takes less time to solve them). Most curves end with a near vertical line where they are not able to solve any more problems no matter how much more time you give them.
-
The categories on which the tools are evaluated give you some idea of what tools entering this competition are good at.
For example, the categories in the SV-COMP competition are
- ReachSafety: is it possible to reach some part of the code? In particular, this is often used to show that it is impossible to fail an assertion or call some error function.
- MemSafety: is malloc/free used correctly and are there any space leaks?
- Overflow: do any signed integer arithmetic operations cause overflow? (In C, it is not an error for unsigned integers to overflow.)
- Termination: does the program terminate?
- SoftwareSystems: are some of the above properties but applied to real world software from busybox, Linux drivers, OpenBSD and SQLite.
So, very quickly, you get an idea on what this class of tools is able to help you with.
-
You can learn about new, actively developed tools by looking at the list of entrants. Before I looked at the above graph, I know about CBMC but, straight away, I know about ten more tools and I can quickly see the top few tools and how they stack up on different problem classes.
-
It encourages tools to adopt a standard interface so that they can take part in the competition.
-
It helps drive progress in the field because instead of just claiming that some technique lets you solve more problems or solve problems more quickly, you are now expected to show it by reporting your performance on these benchmarks.
-
To turn prototypes into mature tools. For example, to enter SV-COMP, you have to be able to cope with a large part of the C language, not just a subset.
Martin Nyx Brain has written an explanation of how these competitions are organized.
Here is a list of all the verification competitions that I have been able to find. Some of the competitions have particular input formats or output formats that are interesting so I listed them as well. For example, the SAT competition became concerned about bugs in verifiers skewing the result so they have adopted the DRAT output format that allows external tools to check any UNSAT results.
In addition, I found a paper that gives an overview of competitions: TOOLympics 2019: An Overview of Competitions in Formal Methods (pdf). And Martin Nyx Brain pointed me at the FLOC Olympic games and ETAPS TOOLympics
In addition to these tool competitions, there are also contests that pit teams of users against each other. So these competitions test both the skills of the participants and the power/flexibility of their tools.
Contests | Tool |
---|---|
VerifyThis | Any Auto-active theorem prover |
Proof Ground interactive proving contest | Any Interactive theorem prover |
Proving for fun | Isabelle |
LP/CP Programming Contest | Prolog, Potassco, Ergo-Lite, Picat, MiniZinc, IDP |
CPS ARCH comp |
And there are non-competitions that are maybe best described as “Rosetta Code but for proof”. That is, the goal is to have many people solve the same problem using as many different tools as possible.
Rosetta proof |
---|
Let’s prove leftpad |
Let’s prove a blocking queue |
If you know of any other competitions, please DM me on twitter or send me an email.
Acknowledgements for pointing me at some of the above: Amit Gurung, Claire Wolf, Igor Skochinsky, Jack Pappas, John Regehr, Hernan Ponce De Leon, @lemmstr, Martin Nyx Brain, SidiMohamed Beillahi, Taylor T Johnson.
The opinions expressed are my own views and not my employer's.