Verification Competitions

Since joining Google in September, I have spent a lot of time reading about software verification tools and one of the things that keeps coming up is that there are competitions between different verification tools.

These competitions are interesting for several reasons

You can see which tools can solve the most problems in the least amount of time in graphs like this graph from the SV-COMP 2020 competition.

Each curve shows how many problems were solved by a given tool in a particular amount of time. So the winner is the tool that goes furthest to the right (solves more problems) and is lower (takes less time to solve them). Most curves end with a near vertical line where they are not able to solve any more problems no matter how much more time you give them.
The categories on which the tools are evaluated give you some idea of what tools entering this competition are good at.

For example, the categories in the SV-COMP competition are
- ReachSafety: is it possible to reach some part of the code? In particular, this is often used to show that it is impossible to fail an assertion or call some error function.
- MemSafety: is malloc/free used correctly and are there any space leaks?
- Overflow: do any signed integer arithmetic operations cause overflow? (In C, it is not an error for unsigned integers to overflow.)
- Termination: does the program terminate?
- SoftwareSystems: are some of the above properties but applied to real world software from busybox, Linux drivers, OpenBSD and SQLite.
So, very quickly, you get an idea on what this class of tools is able to help you with.
You can learn about new, actively developed tools by looking at the list of entrants. Before I looked at the above graph, I know about CBMC but, straight away, I know about ten more tools and I can quickly see the top few tools and how they stack up on different problem classes.
It encourages tools to adopt a standard interface so that they can take part in the competition.
It helps drive progress in the field because instead of just claiming that some technique lets you solve more problems or solve problems more quickly, you are now expected to show it by reporting your performance on these benchmarks.
To turn prototypes into mature tools. For example, to enter SV-COMP, you have to be able to cope with a large part of the C language, not just a subset.

Martin Nyx Brain has written an explanation of how these competitions are organized.

Here is a list of all the verification competitions that I have been able to find. Some of the competitions have particular input formats or output formats that are interesting so I listed them as well. For example, the SAT competition became concerned about bugs in verifiers skewing the result so they have adopted the DRAT output format that allows external tools to check any UNSAT results.

In addition, I found a paper that gives an overview of competitions: TOOLympics 2019: An Overview of Competitions in Formal Methods (pdf). And Martin Nyx Brain pointed me at the FLOC Olympic games and ETAPS TOOLympics

Competition	Input format	Output format
SAT Competition	DIMACS	DRAT
SMT Competition	SMT-LIB
Hardware Model Checking competition	BTOR2, AIGER
Competition on Software Verification (SV-COMP)
Separation Logic Competition	SMTLIB-SL
MaxSAT
Constrained Horned Clause (HCVC)	HORN (smt-lib)
Confluence Competition (COCO)	TRS, CTRS, HRS, MSTRS
Rewriting Engines Competition (REC)	TRS
Sparkle SAT
Pseudo-Boolean Competition
Quantified boolean formula (QBF) evaluation
Constraint Satisfaction	XCSP
MiniZinc challenge	MiniZinc
Termination competition
Tableaux and non-classical system comparision
TPTP CASC	automated theorem provers
Answer Set Programming challenge
Reactive sythesis competition	LTL, TLSF, AIGER, HOA
Syntax Guided Synthesis (SyGuS)	SyGuS-IF
Runtime Verification (CRV)
Production Software Verification Tools Rodeo
Model Checking Contest (MCC)	LTL
Quantitative Formal Models (QComp)	Markov/Petri
Rigorous Examination of Reactive Systems
Software testing (Test-Comp)
WCET tool challenge	ARM, MPC5554
VNN-COMP	Neural networks

In addition to these tool competitions, there are also contests that pit teams of users against each other. So these competitions test both the skills of the participants and the power/flexibility of their tools.

Contests	Tool
VerifyThis	Any Auto-active theorem prover
Proof Ground interactive proving contest	Any Interactive theorem prover
Proving for fun	Isabelle
LP/CP Programming Contest	Prolog, Potassco, Ergo-Lite, Picat, MiniZinc, IDP
CPS ARCH comp

And there are non-competitions that are maybe best described as “Rosetta Code but for proof”. That is, the goal is to have many people solve the same problem using as many different tools as possible.

Rosetta proof
Let’s prove leftpad
Let’s prove a blocking queue

If you know of any other competitions, please DM me on twitter or send me an email.

Acknowledgements for pointing me at some of the above: Amit Gurung, Claire Wolf, Igor Skochinsky, Jack Pappas, John Regehr, Hernan Ponce De Leon, @lemmstr, Martin Nyx Brain, SidiMohamed Beillahi, Taylor T Johnson.

Written on April 19, 2020.
The opinions expressed are my own views and not my employer's.