Verification Competitions

Since joining Google in September, I have spent a lot of time reading about software verification tools and one of the things that keeps coming up is that there are competitions between different verification tools.

These competitions are interesting for several reasons

  1. You can see which tools can solve the most problems in the least amount of time in graphs like this graph from the SV-COMP 2020 competition.

    cactus plot for SV-COMP overflow verification results

    Each curve shows how many problems were solved by a given tool in a particular amount of time. So the winner is the tool that goes furthest to the right (solves more problems) and is lower (takes less time to solve them). Most curves end with a near vertical line where they are not able to solve any more problems no matter how much more time you give them.

  2. The categories on which the tools are evaluated give you some idea of what tools entering this competition are good at.

    For example, the categories in the SV-COMP competition are

    • ReachSafety: is it possible to reach some part of the code? In particular, this is often used to show that it is impossible to fail an assertion or call some error function.
    • MemSafety: is malloc/free used correctly and are there any space leaks?
    • Overflow: do any signed integer arithmetic operations cause overflow? (In C, it is not an error for unsigned integers to overflow.)
    • Termination: does the program terminate?
    • SoftwareSystems: are some of the above properties but applied to real world software from busybox, Linux drivers, OpenBSD and SQLite.

    So, very quickly, you get an idea on what this class of tools is able to help you with.

  3. You can learn about new, actively developed tools by looking at the list of entrants. Before I looked at the above graph, I know about CBMC but, straight away, I know about ten more tools and I can quickly see the top few tools and how they stack up on different problem classes.

  4. It encourages tools to adopt a standard interface so that they can take part in the competition.

  5. It helps drive progress in the field because instead of just claiming that some technique lets you solve more problems or solve problems more quickly, you are now expected to show it by reporting your performance on these benchmarks.

  6. To turn prototypes into mature tools. For example, to enter SV-COMP, you have to be able to cope with a large part of the C language, not just a subset.

Martin Nyx Brain has written an explanation of how these competitions are organized.

Here is a list of all the verification competitions that I have been able to find. Some of the competitions have particular input formats or output formats that are interesting so I listed them as well. For example, the SAT competition became concerned about bugs in verifiers skewing the result so they have adopted the DRAT output format that allows external tools to check any UNSAT results.

In addition, I found a paper that gives an overview of competitions: TOOLympics 2019: An Overview of Competitions in Formal Methods (pdf). And Martin Nyx Brain pointed me at the FLOC Olympic games and ETAPS TOOLympics

Competition Input format Output format
SMT Competition SMT-LIB  
Hardware Model Checking competition BTOR2, AIGER  
Competition on Software Verification (SV-COMP)    
Separation Logic Competition SMTLIB-SL  
Constrained Horned Clause (HCVC) HORN (smt-lib)  
Confluence Competition (COCO) TRS, CTRS, HRS, MSTRS  
Rewriting Engines Competition (REC) TRS  
Sparkle SAT    
Pseudo-Boolean Competition    
Quantified boolean formula (QBF) evaluation    
Constraint Satisfaction XCSP  
MiniZinc challenge MiniZinc  
Termination competition    
Tableaux and non-classical system comparision    
TPTP CASC automated theorem provers  
Answer Set Programming challenge    
Reactive sythesis competition LTL, TLSF, AIGER, HOA  
Syntax Guided Synthesis (SyGuS) SyGuS-IF  
Runtime Verification (CRV)    
Production Software Verification Tools Rodeo    
Model Checking Contest (MCC) LTL  
Quantitative Formal Models (QComp) Markov/Petri  
Rigorous Examination of Reactive Systems    
Software testing (Test-Comp)    
WCET tool challenge ARM, MPC5554  
VNN-COMP Neural networks  

In addition to these tool competitions, there are also contests that pit teams of users against each other. So these competitions test both the skills of the participants and the power/flexibility of their tools.

Contests Tool
VerifyThis Any Auto-active theorem prover
Proof Ground interactive proving contest Any Interactive theorem prover
Proving for fun Isabelle
LP/CP Programming Contest Prolog, Potassco, Ergo-Lite, Picat, MiniZinc, IDP
CPS ARCH comp  

And there are non-competitions that are maybe best described as “Rosetta Code but for proof”. That is, the goal is to have many people solve the same problem using as many different tools as possible.

Rosetta proof
Let’s prove leftpad
Let’s prove a blocking queue

If you know of any other competitions, please DM me on twitter or send me an email.

Acknowledgements for pointing me at some of the above: Amit Gurung, Claire Wolf, Igor Skochinsky, Jack Pappas, John Regehr, Hernan Ponce De Leon, @lemmstr, Martin Nyx Brain, SidiMohamed Beillahi, Taylor T Johnson.

Written on April 19, 2020.
The opinions expressed are my own views and not my employer's.