This frequently referred to article was published just after the end of the 2nd World War by Vannevar Bush (the Director of the Office of Scientific Research and Development). It is a swords-to-ploughshares best known for the Memex concept that influenced creators of hypertext, etc.
Modern code review: A case study at Google
Code review goes back to heavyweight practices in the ’70s but this paper looks at modern, lightweight review as practiced at Google since the early days. Modern code review is
Lessons from building static analysis tools at Google
This paper is an update on their earlier paper ([sadowski:icse:2015]) that brings the story up to date and fills in some of the background stories about previous failures that informed their later approach and some of the successes. Despite the overlap, it is worth reading both of them because the two articles have a different emphasis and the overlap is not too severe.
Large-scale automated refactoring using ClangMR
ClangMR is a refactoring tool that scales to systems of the size of Google’s monorepo ([potvin:cacm:2016]). It consists of three phases:
Tricorder: Building a program analysis ecosystem
This paper describes the experience of deploying a static analysis infrastructure at Google. This infrastructure (“Tricorder”) had to support multiple languages, tens of thousands of users, large code scale, etc. Most importantly though, it had to be accepted by developers if it was going to provide any benefit.
Why Google stores billions of lines of code in a single repository
This paper describes Google’s monorepo approach to managing source code in a single repository, the pros, the cons and what was needed to support this approach.
API usability at scale
How can you evaluate the usability of web APIs? They gather usage data from the Google API explorer for 26 APIs looking for explorations that produce 4xx (i.e., HTTP error codes). They find that three particular APIs are especially error prone.
Spanner: Google's globally distributed database
Spanner is a scalable, globally-distributed, temporal database that shards data across many [Paxos] instances. It features automatic failover, resharding and migration. Although not explicitly stated, it seems to be a replacement for Bigtable ([chang:tocs:2012]) and Megastore. One of the early clients was to support F1: an ads database.
Extended static checking: A ten-year perspective
Super-interesting background and explanation of the choices made in ESC/Modula-3 and ESC/Java. Both a form of [auto-active verifier][auto-active verification] (although that term was not used by the author for another 10 years.
QuickCheck: A lightweight tool for random testing of Haskell programs
QuickCheck takes a test-based approach to formal verification. You write universally-quantified properties and QuickCheck uses random testing ([fuzz testing]) to check whether the property holds. QuickCheck was originally written for Haskell but various people have since implemented it for around 40 different languages.
Parameterized unit tests
Parameterized unit tests are a form of [property-based testing] that generalizes traditional unit tests. The tests can be instantiated with concrete values to serve as traditional tests, they can be verified using [symbolic execution] or, to support modular verification, they can be used as axioms in the spirit of algebraic specification to use when verifying calls to a library.
Feedback-directed unit test generation for C/C++ using concolic execution
This paper describes a test generation tool based on a hybrid of [symbolic execution] and [fuzz testing]. The goal is to generate high coverage sequences of method calls.
Study of integrating random and symbolic testing for object-oriented software
Describes and evaluates JDOOP: a hybrid fuzzer/symbolic executor for Java that uses random search to quickly build coverage and symbolic execution to hit hard to hit branch conditions. JDOOP is a combination of RANDOOP and JDART.
FUDGE: Fuzz driver generation at scale
Addresses the problem of generating fuzzing test harnesses for libraries by automatically generating harnesses based on example uses in existing clients. The goal is to increase the productivity of users and it is expected that a human will examine the generated harnesses for legality, etc. before they are used to report bugs.
DeepState: Symbolic unit testing for C and C++
DeepState seeks to narrow the gap between techniques that programmers are familiar with (e.g., testing) and symbolic execution / binary analysis tools by enabling the use of parameterized unit tests with a variety of symbolic execution backends (angr, Manticore and Dr. Fuzz) and fuzzers and making it easy to switch between backends.
Locating defects is uncertain
This one-page, position paper talks about bug localization. Failures are seen as the symptom of a chain of infection that starts with some initial cause/infection.
Soundness and its role in bug detection systems
This one-page, position paper talks about three different, competing costs in a tool: soundness (no false negatives); cost; and usability (which relates to false positives). They argue:
The soundness of bugs is what matters (position statement)
This one-page, position paper is “written in a provocative style for entertainment purposes” but, nevertheless, makes some very good points. The main argument is that the goal of most verification research is defect detection and that they should therefore focus on finding “sound bugs” (bugs that can actually occur) and avoid “unsound bugs” (false alarms). This is why (despite its limitations) testing remains so important: testing finds sound bugs. The solution is to switch from “may analysis” (over-approximation) to “must analysis” ([under-approximation]).
Issues in deploying software defect detection tools
This one-page, position paper talks about what makes a bug-finding tool useful and usable in an industrial setting. The slides are worth reading since they say things not present in the paper.
False positives over time: A problem in deploying static analysis tools
This one-page, position paper by one of the cofounders of Coverity makes the interesting observation (that is obvious once it has been said)
The human in formal methods
This invited talk looks at “how to bring the other 90% into the fold” of formal methods with a focus on education, comfort with formal methods and effectiveness with formal methods. This focus leads to emphasizing model finding such as the [Alloy verifier], [SAT solver]s and [SMT solver]s over deductive methods.
Relational test tables: A practical specification language for evolution and security
Relational test tables describe sets of traces and constraints on those traces and (the relational part) between traces and support for stuttering (to allow for timing differences).
COASTAL: Combining concolic and fuzzing for Java (competition contribution)
This is a nice short, readable description of Coastal which
SUSHI: A test generator for programs with complex structured inputs
[Search based test generation] alone is unable to generate inputs that have complex structures such as dependencies between different data structures.
STARS: Rise and fall of minicomputers [scanning our past]
This describes the history of minicomputers from the late ’50s through to the ’80s. One of the things I found fascinating (because I had not heard it before) is that many minicomputers were sold to OEMs who incorporated the minis into products where they were used for industrial control and things like that. The paper argues that we should distinguish these “classic minicomputers” from minicomputers that are defined more by their price relative to mainframes.
Bell's law for the birth and death of computer classes
This paper argues that “A new computer class forms and approximately doubles each decade, establishing a new industry.” and names this “Bell’s Law”. This is a consequence of Moore’s Law decreasing the price of computer hardware and “intense, competitive entrepreneurial action.”
Software model checking
This 2009 [survey] of software [model checking] covers a lot of ground from the roots in hardware model checking through to its application to software. A number of the key algorithms are described in pseudocode.
Boolean satisfiability from theoretical hardness to practical success
This paper is a great intro/overview of SAT solving: what it is, why it is (NP) hard, how SAT solvers work, the role of benchmarks and competitions, the massive performance improvements in the 10 years prior to it being published and the industrial impact.
Test-case reduction via test-case generation: Insights from the Hypothesis reducer
This paper proposes a general purpose way of getting [test-case reduction] “for free”. The paper shows that it “provides adequate reduction for any generated test case, while avoiding the test-case validity problem.”
Model checking: Algorithmic verification and debugging
This [Turing Award] paper/talk is a great overview of [model checking: definition, [temporal logic], what it is good for, challenges (mostly state-space explosion), tricks (like [symbolic model checking], [bounded model checking], [partial order reduction] and [CEGAR]).
Enhancing symbolic execution with veritesting
This paper attacks the path explosion problem of [symbolic execution] by using a hybrid of dynamic symbolic execution (DSE) and static symbolic execution (SSE) that they call “veritesting”. Their tool “MergePoint” starts with concolic execution (a form of DSE) from some seed but opportunistically switches to SSE whenever the paths ahead contain straightforward code that SSE can cope with. They find that the additional cost due to having larger, more complex SMT queries is compensated for by the exponentially smaller number of paths
Code-level model checking in the software development workflow
This very readable paper describes the successful introduction of a formal verification flow into the development of the Amazon C Common library. Anyone trying to introduce formal verification into a traditional development team should read this paper!
An empirical study of the reliability of UNIX utilities
This is the paper that started the field of [fuzz testing], coined the name, and tells the origin story.
Selective symbolic execution
The [S2E verifier] can analyze application code, complex GUI libraries, kernel code, device drivers, and even hardware devices. It does this by alternating between concrete execution (using QEMU) and [symbolic execution] (using the [KLEE verifier]). Switching modes makes it more flexible but it also makes it more efficient because (slower) symbolic execution can be restricted to the code of most interest which might be a library, a recently changed code path, etc.
The art, science, and engineering of fuzzing: A survey
This large (21 pages, 240 references) survey of [fuzz testing] proposes a taxonomy and standardized terminology. They use a “model fuzzer” throughout the survey to motivate/explain fuzzer design choices.
Fuzzing: Hack, art, and science
Nice overview article that describes whitebox fuzzing such as the [SAGE verifier] in the context of blackbox fuzzing, blackbox fuzzing, greybox fuzzing, hybrid fuzzing, test generation, security analysis, etc.
Symbolic execution for software testing: Three decades later
This is a survey of symbolic execution with an emphasis on the [KLEE verifier]. It is probably a better introduction to the concepts of symbolic execution than the [baldoni:compsurv:2018] survey but this makes it less complete.
A survey of symbolic execution techniques
This [survey] reinforces my impression that [symbolic execution] is a beautifully simple idea that, like most beautifully simple ideas, requires a lot of optimizations to make it scale. As the structure of the paper shows, most of the survey is concerned with the optimizations.
Fuzzing: On the exponential cost of vulnerability discovery
Using a large empirical study of fuzzing (using AFL and libFuzzer), this paper explores the cost of finding vulnerabilities (or increasing coverage).
An introduction to test specification in FQL
Proposes a query language [FQL] for specifying coverage goals that can be used for
FShell: Systematic test case generation for dynamic analysis and measurement
FShell adapts the “program as database” metaphor previously used by the [BLAST verifier] from model checking to test generation. It generates families of testcases based on coverage criteria specified using the “FShell” query language ([FQL]) that (I think) was introduced in this paper.
Satisfiability modulo theories: Introduction and applications
This paper is a great intro/overview of [SMT solver]: what it is, how it builds on SAT, an example of how one particular theory (difference arithmetic) is implemented, and the many applications of SMT solvers in program analysis, program verification, etc.
Software verification: Testing vs. model checking
Compare test generators against model checkers. The test generators are AFL-Fuzz, CPATiger, CREST-PPC, FShell, [KLEE verifier] and PRTest. The model checkers are [CBMC verifier], CPA-seq, ESBMC-incr, ESBMC-kInd.
TestCov: Robust test-suite execution and coverage measurement
TestCov is a tool that measures the coverage of a test suites and generates a reduced test suite. It is used in the [Test competition] and can work with all the tools that compete in that competition.
Boosting fuzzer efficiency: An information theoretic perspective
Greybox fuzzers prioritise which seeds to explore in order to (hopefully) maximise some metric such as coverage. This paper proposes an information theoretic model (entropy) for evaluating the effectiveness of each seed. This forms the basis of an “entropy based power schedule” to assign more energy to the seeds that elicit the most information. The approach is inspired by “Active SLAM”: an approach for autonomous robot exploration of unknown terrain.
This is a good survey of model checking that lays out the main concepts and why they matter:
Verification of safety functions implemented in Rust: A symbolic execution based approach
This paper looks at how to use the [KLEE verifier] to verify code implementing a state machine written in Rust based on their earlier work ([lindner:indin:2018]). The challenge here is that KLEE is a [symbolic execution] tool and does not guarantee to explore all paths through the code to a sufficient depth to guarantee soundness. Their solution lies in constructing an appropriate verification harness.
No panic! Verification of Rust programs by symbolic execution
This paper describes the
cargo-Rust that allows the [KLEE verifier]
to be used to verify Rust programs.
KLEE is a [symbolic execution] tool so using it to verify code leads
to a path explosion problem.
The solution taken in this paper is to use a [contract driven development]
approach to enable [modular verification].
This reduces the path explosion problem to only involve the number of
paths per function instead of the number of paths through the entire
RustHorn: CHC-based verification for Rust programs
Verifies Rust programs by converting basic blocks in the [MIR] representation of the code into Constrained Horn Clauses (CHCs) that are then verified using Horn clause support in Z3 or HoIce.
Benchmarking solvers, SAT-style
Written as an encouragement to the computer algebra community to create a [verification competition]. It explains why [verification competition]s are designed the way they are. It also has an analysis of how improving time to solve affects scores where improvements might be shaving a second off the time, adding parallelism, random variations in time, etc.
Fractional permissions without the fractions
The standard way of distinguishing read and write permissions in separation logic is with [fractional permissions][bornat:popl:2005] in which you can write a resource if you have 100% of the resource read a resource if you have some fraction of the resource and you have no access if you have 0% of the resource. When you replicate references to a resource, you split the fraction into some smaller parts and when you are done with those references, you recombine (sum) all their fractions.
StkTokens: Enforcing well-bracketed control flow and stack encapsulation using linear capabilities
This paper follows on from [an earlier paper by the same authors][skorstengaard:esop:2018] that looked at how mutually distrusting pieces of code could safely call each other. In particular, it was concerned about whether the stack and return address were what the caller expected when they called the function.
Dafny: An automatic program verifier for functional correctness
Dafny is an [auto-active program verifier][auto-active verification] for an imperative language with functional and object oriented features that uses the [Boogie][leino:tacas:2010] intermediate verification language and the [Z3][demoura:tacas:2008] SMT solver for automated proof.
The Boogie verification debugger
A critical part of any verification tool is a debugger that helps you understand and localise any errors detected. For dynamic verification (i.e., testing), we typically use a debugger like gdb/lldb to halt the program at interesting points and inspect the state of the system. For static verification (i.e., formal verification), it initially seems quite different because the verification tool typically does not run the program from the beginning and does not generate concrete values for all variables. I think the message of this paper is that it is not as hard as it seems.
The ramifications of sharing in data structures
This paper tackles a generalized form of the [ramification problem][krishnaswami:tldi:2010]. That is, it tackles the problem that separation logic is good at reasoning about pointer-based structures such as trees when you can split the heap into disjoint parts but separation logic’s weakness had been reasoning about structures such as dags that rely on sharing. In particular, reasoning about something like a garbage collector has been hard.
Developing verified programs with Dafny
[Dafny][leino:lpair:2010] is both a language and a verification tool for creating verified programs. The language has features of object-oriented languages and functional languages. The verification support is based on contract-style verification. This short, easy read seems to be the accompaniment for a tutorial and discusses verification of six different functions that demonstrates contracts and the specification notation, loop invariants, immutable inductive datatypes, mutable datatypes, use of pure functions in specifications, classes, ghost-fields, invariants, and lemmas. Proofs of lemmas are especially interesting because the lemmas are just ghost methods and the body of those methods are the proofs of the lemmas. e.g., to write an inductive proof, one writes a recursive function using a case split to separate the base case from the inductive step.
Reasoning about a machine with local capabilities
This paper is concerned with reasoning about software running on a capability machine such as CHERI and, in particular, proving that the calling convention ensures proper nesting of function calls no matter how buggy or malicious the function you’re calling may be. It enforces that, if a function returns, it can only return to the return address it was given, with the stack pointer it was given and that it cannot use some other address or stack pointer or even re-use values from some earlier invocation of the function.
Program verification based on assertions is fundamentally wrong. Like Ptolemy’s planetary theory of cycles within cycles, they are overly complex and we need something as simple as Galileo’s theory of elliptical orbits. This paper forcefully argues that they should be replaced with “specified blocks” that form a specification of an entire block of code. That is, we should describe what a block of code is intended to do instead of what is true at strategic points in the code.
Local reasoning about while-loops
As [Hehner had previously discovered][hehner:vstte:2008], we are reasoning about loops in the wrong way. Instead of using loop invariants that describe what the loop has done so far, we should be using loop specifications that describe what the loop will do in the future. This change enables local reasoning (in the usual “separation logic” sense) about loops because it focusses on what the loop does and because it eliminates the need to describe partial data structures such as the initial segment of a list.
Verifying event-driven programs using ramified frame properties
Separation logic’s strength is that it let’s you reason locally about heap-manipulating programs by letting you split the heap into disjoint parts. But some data structures don’t easily split into parts and you have to maintain global consistency properties across the structure because they “use mutation and assignment as a way of globally broadcasting information to the rest of the program state.” This paper tackles an example of such a structure: an instance of the subject-observer pattern like that found in a spreadsheet where each node contains a list of the nodes it depends on and also of the nodes that depend on it so that, when a node changes value, nodes that depend on it can be invalidated/updated.
Specification and verification: The Spec# experience
[Spec#][barnett:cassis:2004] was a Microsoft Research project that added automated verification to C# through the addition of method contracts, class invariants and loop invariants. This was all integrated into the Visual Studio IDE to make for a very different user experience from normal verification: they wanted verification to be integrated into the development flow. This paper gives a retrospective on the project with a particular focus on efforts to get the techniques adopted.
Separation logic and abstraction
This paper tackles three problems:
Verification of concurrent programs with Chalice
This is a tutorial on the Chalice object-oriented language and verifier. Methods have contracts, loops have invariants, pointers have ownership annotations, there are threads, locks and fractional ownership, there are lock orderings to detect deadlock and there are ghost variables and fold/unfold ghost statements.
Compositional shape analysis by means of bi-abduction
One of the big challenges in automated verification is the annotation burden. In particular, to make verification compositional, we need contracts for all the functions but writing and maintaining contracts for a large codebase is a significant overhead and discovering the contracts for an existing codebase is a lot of work. This paper focusses on discovering contracts that describe what parts of a heap-allocated data structure a function depends on in order to support separation logic based verification of memory safety along the lines of [Smallfoot][berdine:fmco:2005].
Smallfoot: Modular automatic assertion checking with separation logic
Smallfoot is a tool for automatically verifying both sequential and concurrent heap-manipulating programs that is based on (concurrent) separation logic and symbolic execution. The proof theoretic foundations of the tool are described in a [companion paper][berdine:aplas:2005].
Symbolic execution with separation logic
This paper describes the proof-theoretic basis for the Smallfoot verification tool described in a [companion paper][berdine:fmco:2005]. See the other paper for a description.
Implicit dynamic frames: Combining dynamic frames and separation logic
Permission logics are Hoare-style logics for reasoning about heap allocated data structures and whether a piece of code has permission to access a given part of the structure. Their particular strength is the ability to reason about the lack of aliases – drawing on ideas from linear logic. The best known permission logic is [separation logic][ohearn:cacm:2019]; another permission logic is dynamic frames. This paper tackles the problem that dynamic frames have a high annotation overhead because of the need to define and manipulate “frame annotations” for each method. Their solution is to infer the frame information directly from the access assertions in the pre/post-conditions of functions.
Lightweight support for magic wands in an automatic verifier
Verifying a loop that walks over a heap data structure is tricky. At the start, we might have a pointer to the root of the structure; in some intermediate iteration of the loop, we have a pointer into the middle of the structure and the nodes we have visited so far have been changed in some way; and then, at the end, the entire structure from the root down has been fully changed. The difficulty is that, to verify the loop, we have to write an invariant that describes the changes to the nodes that we have visited so far. That is, we have to describe a partial data structure: from the root down to the current position in the list.
Software verification with VeriFast: Industrial case studies
This paper reports on four case studies using [VeriFast][jacobs:nfm:2011] to verify real code for an absence of safety violations such as illegal memory accesses or data races.
Viper: A verification infrastructure for permission-based reasoning
[Viper][Viper verifier] is like [Boogie][barnett:fmco:2005] in that it is an [intermediate verification language] (IVL) that can support multiple language frontends and also multiple verification backends. The most important difference is that Viper is based on a form of [permission logic] that is well suited to reasoning about heap based data structures.
Leveraging Rust types for modular specification and verification
This paper exploits the close similarity between Rust’s type system and permission based reasoning approaches such as [separation logic][ohearn:cacm:2019] and implicit dynamic frames logic. Specifically, they describe the embedding of Rust programs into the [Viper intermediate verification language][muller:vmcai:2016] in a way that exploits the properties of Rust’s borrow-based type system. This embedding is (partially?) implemented in the Prusti verification tool that translates Rust to Viper and translates any errors back from Viper to Rust. Their focus at the moment is on “safe” code: code that obeys the Rust type system.
VeriFast: A powerful, sound, predictable, fast verifier for C and Java
VeriFast is an auto-active verification tool for C and Java based on separation logic and SMT solvers. This paper is a nice overview of the state of the project in 2011 when they had
seL4: from general purpose to a proof of information flow enforcement
Most operating systems do two things: they isolate processes from each other; and they allow some limited communication between processes. There have been multiple attempts over more than 30 years to prove that an operating system actually achieves these goals; this is the first really convincing such proof. The proof is about the seL4 microkernel that had already been [formally verified][klein:sosp:2009] as implementing its formal specification. The paper takes us step by step through the proof as well as being very clear about the assumptions and gaps in the verification and the effort required for the verification.
Permission accounting in separation logic
Separation logic is about tracking ownership of objects and, usually, this means full ownership: you either own an object and can do anything you want with it or you have no access at all. This paper is about sharing ownership as in the reader-writer pattern where an object can be shared between several readers or ownership can be given to a single writer.
This survey paper updates [Reynolds 2002 survey of Separation Logic][reynolds:lics:2002] with an overview of the developments, applications and extensions in Separation Logic and Concurrent Separation Logic over the last 20 years.
Separation logic: a logic for shared mutable data structures
Separation logic is an extension of Hoare logic for reasoning about programs that use pointers. In particular, it copes well with aliasing (by providing a compact way of saying that a set of pointers do not alias) and a “frame rule” that lets you reason locally about a piece of code without getting bogged down in the surrounding context. This paper introduces separation logic, illustrates its use to verify a number of list, tree and dag algorithms [this is one of the real strengths of this paper] and discusses a lot of the work by the author and others on developing, defining, using and extending the different forms of separation logic that existed at the time (2002).
A solver for reachability modulo theories
This paper describes the Corral tool for finding bugs in device drivers using a variant of bounded model checking. Like many bounded model checkers, they inline all functions and unroll all loops up to some bound – increasing the bound until some limit or until a bug is found. What is interesting is that, instead of inlining all functions at once, they incrementally inline functions according to some heuristics. They also use a CEGAR-like mechanism to control variable abstraction.
Sound formal verification of Linux's USB BP keyboard driver
This paper reports on verification of a simple Linux USB keyboard driver using VeriFast. Except, of course, Linux drivers are not simple: they have asynchronous callbacks, dynamically allocated memory, synchronization and interact with complex Linux driver APIs including USB, input and spinlocks. This work found bugs in the driver and the fixes have been accepted by the maintainer.
VeriFast: Imperative programs as proofs
VeriFast is a symbolic evaluator based on [Separation Logic][reynolds:lics:2002] (wikipedia) for verifying C and Java code. Specifications can use inductive datatypes, primitive recursive functions and abstract predicates. As the name suggests, a key feature is performance though the abstract also mentions predictability.
Annotation inference for separation logic based verifiers
One of the challenges in automatic program verification tools such as the [VCC verifier] and the [VeriFast verifier] is the amount of annotation required to write the specification and guide the verification tool to construct a proof.
Stacked borrows: An aliasing model for Rust
As in [the Rustbelt paper][jung:popl:2017], the topic of this paper is the soundness of Rust code that is marked “unsafe” because it needs to do something that cannot be expressed within the restrictions of Rust’s type system but that the programmer (hopefully) considers to be ok. In particular, this paper defines what the “unsafe” code is and (crucially) is not allowed to do based on what they want Rust types to mean.
The case for writing a kernel in Rust
The Rust language is increasingly being proposed as a safe replacement for C in systems programming. This paper (by the authors of TockOS) explores some of the key challenges in doing so. In particular, they are interested in how they can write low-level code without having to weaken Rust’s safety story by using the “unsafe” feature of Rust.
Verifying Rust programs with SMACK
The Rust programming language promises a balance between safety and control that makes it an interesting target for formal verification work. Unfortunately, some of the earliest verification tools supporting Rust are not maintained. [SMACK][rakamaric:cav:2014] is an LLVM-based verification toolchain that translates LLVM IR into verification conditions. In principle, it should be able to cope with any language that can be translated to LLVM IR but every language has its quirks and this paper describes additional features and analyses required to support the Rust language well. An interesting aspect of basing this on LLVM is that it allows verification of programs that combine Rust and C (or any other language that LLVM supports).
Crust: A bounded verifier for Rust
One of the key strengths and weaknesses of Rust is the ability to extend its typesystem with libraries that do not follow Rust’s typesystem (i.e., they are “unsafe”) but that (we hope) do not break the Rust semantics that Rust’s typesystem is meant to enforce. This paper describes a bounded model checking approach to verifying these libraries for the kinds of error that violate Rust’s typesystem. The tool is demonstrated on three libraries and is able to detect bugs in old versions of the libraries.
A type system for expressive security policies
These days, many people define security properties based on some form of intransitive information flow specification. But in the ’90s, Fred Schneider and others were working on security automata that would monitor the execution of the program and terminate it if the program violated the security specification. For example, a specification might say that if a program has read a file containing a secret, it cannot then open a network connection.
RustBelt: Securing the foundations of the Rust programming language
Rust aims to archive the holy grail of a language that gives low-control over resource management and safe high-level abstractions. It does this using a type system that restricts programs to eliminate unsafe programming practices coupled with a practice of extending that type system with libraries that are observably safe but that internally use “unsafe” operations. This paper provides a framework for proving that these libraries do not break the safety guarantees of the standard type system. Along the way, the paper gives a nice insight into the way that the community as a whole has been developing.
One of the difficult things about reasoning about programs that modify pointer-based data structures is the possibility of aliases. If p and q alias each other then you need to recognize that writing to “*p” will modify “*q”. Around the late 90s, there was a lot of work on how best to reason about aliasing. Nowadays, a lot of academic interest seems to have settled on using separation logic and a lot of practical interest seems to have settled on Rust’s ownership types or C++11’s smart pointers, etc. But what makes this paper interesting to me is that it tackled one of the outstanding problems in the [TALx86 paper][morrisett:wcsss:1999]. In particular, because they were interested in reasoning about assembly code, they could not rely on simplifications made in high-level languages such as
TALx86: A realistic typed assembly language
Typed assembly language is normal assembly language (x86 in this case) where any branch targets are labelled with the type of all the registers. The main purpose of the types is to make the assembly code type/memory safe and so the focus is on distinguishing pointers from non-pointers and distinguishing the size and type of what each pointer points to. This paper is one of the later papers in a series on typed assembly language and also related to the Cyclone language: a type/memory-safe C substitute.
End-to-end verification of information flow security for C and assembly programs
Information-flow control tackles two of the classic Confidentiality-Integrity-Availability (CIA) aspects of security. This paper presents a framework that applies IFC to the mCertiKOS microkernel in order to prove confidentiality properties. The proof consists of Coq proofs about individual system calls and a general framework relating lower-level “unwinding conditions” to the overall security properties. The proof covers both C and x86 assembly code in mCertiKOS and includes reasoning about page tables.
A precise yet efficient memory model for C
A memory model (in this context) is basically “what is a pointer”? Is a pointer an identifier for an object or an integer index into an array of bytes? How do you handle fields of objects? How do you handle pointer arithmetic? etc. After some experimentation (described in a lovely historical note in section 7), the authors came up with a model that can handle many of the awkward features of C: pointers into the middle of objects, pointers past the end of objects, pointer casts, unions, bitfields, etc. This formed part of the [VCC tool][cohen:cav:2010] that was used to reason about real, complex OS-level code (that needed many of these features).
Frama-C: A software analysis perspective
Frama-C is a platform for verifying C programs. This foundational article describes the architecture of this platform, the analyses built into Frama-C and the additional plugins that can be used with it and provides references to those other plugins. The paper cites many related papers about individual plugins and case studies.
CertiKOS: An extensible architecture for building certified concurrent OS Kernels
This is one of a steady stream of papers from the FLINT group at Yale on creating a formally verified OS. Some distinguishing features of CertiKOS are the completeness of the verification (cf. seL4), the way that the C code and the proofs are split into more than 30 very small layers each adding one feature at a time and greatly simplifying the proof of each layer and their use of an extended version of the CompCert C compiler. The layered structure is strongly reminiscent of Bryan Ford’s work on Recursive Virtual Machines – unsurprising since Bryan was an early contributor to this project. The use of CompCert avoids the need to perform translation validation (cf. seL4 work). This paper adds concurrency with fine-grained locking and progress guarantees to CertiKOS.
Formal verification of a memory allocation module of Contiki with Frama-C: a case study
This paper describes the use of [Frama-C][cuoq:sefm:2012] to specify and verify the Contiki memory allocator. The description is very clear and contains enough of the specification that it should be easy to reproduce or adapt to other allocators. Use of the specification is demonstrated on a simple program that allocates two objects and then frees them. The annotation burden is about 154 lines to 139 lines of code (around 110%) – which seems to be about normal for Frama-C.
überSpark: Enforcing verifiable object abstractions for automated compositional security analysis of a hypervisor
Verifies properties of a concurrent micro-hypervisor uXMHF using the [Frama-C verifier]) with some custom plugins and using CompCert to compile. Assembly language and a hardware model are accomodated by invoking via function calls and providing a C equivalent function that accesses shadow C variables that model changes in hardware state. Interestingly, the hypervisor also allows unverified plugins - but all unverified plugins are run in a sandbox (and incur higher overhead).
Combining mechanized proofs and model-based testing in the formal analysis of a hypervisor
Describes formal verification in Coq of a model of the commercial NOVA micro-hypervisor. NOVA is a hypervisor in the L4 family that supports DMA access (protected by IOMMU). Very extensive testing is used to compare the model against the actual implementation using over 12 million conformance tests.
-Overify: Optimizing programs for fast verification
This paper explores the benefit of using different optimization options (or even custom transformations) when compiling a program for optimization. Their optimization options are over 5000x better than -O0 and 14x better than -O3 for verification using KLEE on their motivating example. Compared with -O3, their main goal is to reduce the amount of information lost during compilation. They mention but do little to address the problem of undefined behaviour in C causing different compilations to behave differently although they did look for this in their evaluation and found that they detected the same bugs with -O0, -O3 and -Overify.
A lightweight symbolic virtual machine for solver-aided host languages
Rosetta is a Racket DSL for implementing Solver-aided Domain Specific Languages (SDSLs): tools based on solvers that support angelic execution, debugging, verifying properties and synthesis. A key feature of Rosetta is “symbolic reflection” – that allows Racket code to be symbolically evaluated (in addition to being executable in the normal way).
An abstract interpretation framework for refactoring with application to extract methods with contracts
There is a lot to recommend Design by Contract where each function/method is equipped with contracts (pre/post-conditions). But the cost is that you have to create and maintain all those contracts. This paper addresses an interesting part of the problem that arises while refactoring code: creating a contract for any functions/methods extracted. The insight here is that this is really a matter of refactoring the proof that the original code meets its contracts. The goal they set is that the new contract must meet four criteria: validity, safety, completeness and generality.
Komodo: Using verification to disentangle secure-enclave hardware from software
Komodo implements functionality similar to SGX but is implemented in formally verified software on ARMv7 processors instead of being implemented in hardware and microcode. The introduction makes a very convincing argument that this is the right way to deploy security extensions because it can be updated much faster, and the verification is transparent (“it replaces folklore with formal verification”). They verify both functional properties and security properties (non-interference). (Komodo lacks the memory encryption features of SGX – that would benefit from having hardware support. Komodo is also limited to a single processor at the moment.)
Attacking, repairing, and verifying SecVisor: A retrospective on the security of a hypervisor
Uses the Murphi model checker to formalize the combined software and hardware security features in the [SecVisor][seshadri:sosp:2007] hypervisor developed by some of the authors the year before. Interestingly, the adversary is modelled as an explicit piece of code that tries to write to everywhere that it should not. They were able to discover two bugs involving mapping physical pages that contain unapproved code and creating aliases in the virtual address space.
SecVisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity OSes
SecVisor is a hypervisor that virtualizes physical RAM, the MMU and the IO-MMU in order to protect the kernel executable code from modification. It uses the AMD architectural support for hypervisors to provide most of the protection and switches memory protection on every entry/exit from kernel. And it requires small changes to parts of Linux that normally load/modify the kernel executable: boot and module load/unload.
Verifying security invariants in ExpressOS
ExpressOS is a replacement single-threaded Android kernel written in C# running on top of L4/Fiasco.
Boogie: A modular reusable verifier for object-oriented programs
Boogie factors verification of [Spec#][barnett:cassis:2004] programs into
- Generating the Intermediate Verification Language BoogiePL by encoding language semantics, using abstract interpretation and introducing ghost state.
- Verifying BoogiePL by generating Verification Conditions and proving them using “Simplify” (with plans of switch to “Zap”). This combination simplifies and separates the two tasks and allows use of abstract interpretation (good at calculating fixpoints) and theorem proving (good at handling quantification).
The existence of refinement mappings
Theoretical paper justifying the practice of adding auxiliary variables (ghost state) to an implementation to enable refinement proof by providing a completeness proof that a refinement mapping from an implementation S1 to a specification S2 exists if S1 refines S2 and three conditions hold
SpaceSearch: A library for building and verifying solver-aided tools
Coq embedding of [Rosette’s solver aided language][torlak:pldi:2014] inspired by smten.
Boost the impact of continuous formal verification in industry
Proposes a way to use model checking on large codebases when CI is used and lots of regression tests are available (at least one per C function). Like fb-infer, focusses on verifying changes. Approach is to use equivalence checking to find which functions may have changed, select regression tests relevant to those functions and then to generalise inputs of regression tests to increase their coverage.
Multi-prover verification of C programs
Describes “Caduceus”: a tool to translate C code to the WhyML specification language lowering imperative code to pure code and handling pointers. Introduced the specification notation that later became ACSL and a forerunner to [Frama-C][cuoq:sefm:2012].
Scaling symbolic evaluation for automated verification of systems code with Serval
This paper continues the theme of “push-button automation” from the UNSAT group’s earlier work on [Hyperkernel][nelson:sosp:2017], [Yggadrisil][sigurbjarnarson:osdi:2016] and [Nickel][sigurbjarnarson:osdi:2018]. Like the earlier work, they are using the [Z3 SMT solver][demoura:tacas:2008] to verify systems code automatically and they are avoiding adding annotations (“auto-active verification”) to help the verification by restricting the code they verify to “finite interfaces” so that all loops are bounded. The big difference is that the earlier works wrote specifications in Python using the Z3Py library whereas this paper uses [Rosette][torlak:pldi:2014]’s symbolic execution support.
Scalable translation validation of unverified legacy OS code
Describes a toolchain for translation validation of radere2 reverse engineering tool using the [ASL] specification of the [Arm architecture] based on the PVS7 theorem prover.
Noninterference, transitivity, and channel-control security policies
This very highly cited report provides (what I think is) the accepted extension of [Goguen and Meseguer’s notion of interference][goguen:secpriv:1982] to handle intransitive security policies.
Secure information flow by self composition
This paper is a counter to the popular type-based approaches to information flow control. They point out that type-based approaches are sound but not complete and fail on things like “output := secret; output := 0;”. Their solution is a more semantic approach that is based more closely on the basic definitions of non-interference and is sound and complete (but, unlike type-based approaches, is often undecidable and not obviously compositional).
SMACK: Decoupling source language details from verifier implementations
SMACK translates LLVM bitcode files to the [Boogie language][Boogie verifier] (an [intermediate verification language]) allowing it to support any language that LLVM supports (mostly C/C++ in this paper).
Local verification of global invariants in concurrent programs
Describes how VCC was used to verify Hyper-V Hypervisor and PikeOS RTOS. Key idea is “two-state invariants” which are basically specs of parts of the transition relation. Required to be reflexive to allow a stuttering-style verification. Overall name for the technique is “Locally Checked Invariants” (LCI). Unlike alternatives like Concurrent Separation Logic, this is built out of some simple primitives with a small amount of syntactic sugar so that it can be very explicit about object lifetime, object ownership, which fields are protected by each lock, etc. Annotations can introduce ghost state and ghost code to modify that state — this allows them to keep things first-order. Significant annotation burden: 1 line of annotation per line of code. This seems to be partly the cost of flexibility / building everything out of primitives and partly the complexity of the relationships being described. I suspect that more sugar and some annotation inference would make a big difference. They allude to some performance issues that are fixed by changing how the invariants, etc are written. With 1/3 of the 100kloc annotated, it takes around 16 CPU hours to verify the properties — but this is very parallel. Limitation: seems to assume SC or DRF.
A hardware design language for timing-sensitive information flow security
Extends Verilog with security labels and declassification. Labels can be static or dynamic and are tracked using a modified type system with some tricks to improve precision (related to implicit flows). Analysis is (of course) timing sensitive so timing channels are automatically handled. Constraints generated by type system are solved using [the Z3 SMT solver][demoura:tacas:2008]. Evaluated on a MIPS CPU with two additional instructions to set timing labels and PPA evaluated (little impact). Benchmarks include MiBench and OpenSSL. Not clear if any form of label polymorphism is supported.
Complete information flow tracking from the gates up
Builds on ideas also described in Theoretical analysis of gate-level information flow tracking of adding “shadow circuits” that calculate whether each wire/flop in a processor depends on some initial set of untrusted values.
The Flask security architecture: System support for diverse security policies
This paper by some of my former colleagues in the Flux group at the University of Utah describes the Flask microkernel-based operating system. Like its successor SE-Linux, Flask adds fine-grained protection checks into the operating system and separates mechanism from policy by making decisions in a separate policy module that is free to implement whatever policy you want. That is, you are not stuck with whatever Linux or Windows or … provides by default.
On the foundations of quantitative information flow
Argues that standard measurement of size of information leaks based on Shannon entropy are not useful because the number cannot be used to bound the resulting threat. Proposes alternative based on threat that secret can be guessed correctly in a single try that is equivalent to min-entropy. Uses examples to show that this results in different measures for leaks that intuitively seem different (and that Shannon entropy views as same). Tackles general form with non-uniform input distribution and derives simpler formulae for special cases. As it is arguing that prior art has wrong definition, it has good survey of the state of art when written. Highly cited (500+).
Theoretical analysis of gate level information flow tracking
GLIFT (Gate Level Information Flow Tracking) calculates a precise “taint” bit for every signal indicating whether the signal depends on some input signals. GLIFT is added by calculating taint for each primitive gate type (and, or, XOR, not) with an assumption of linearity. This paper looks at how the number of midterms grows with circuit complexity for 1-8 bit adder, multiplier, comparator, shifter and multiplexer. All costs are exponential and quite high.
Secure autonomous cyber-physical systems through verifiable information flow control
This paper ties together several different threads to create a CPS system (autonomous vehicle) that is resistant to many forms of attack. Builds on Jif (Java with security labels), SecVerilog (HDL with security labels), Hyperflow (processor with security labels) and statistical detection of attack based on sensor fusion (sensors plus map data). The Hyperflow processor is not used in the experiment at this stage because they need to port Java/Jif to that architecture. Statistics try to distinguish noise (Gaussian) from attack (uniform) — not clear to me that this is sufficiently robust.
Verifying the Microsoft Hyper-V hypervisor with VCC
This short (3.5 page) paper gives a brief overview of Hyper-V verification. Hyper-V is a high performance commercial hypervisor not written with verification in mind. It is 100kloc of C plus 5kloc x86-64 assembly, uses lock free data structures, simulates 2-stage page translation. Verification makes heavy use of “two-state invariants” and “claims”. Useful as an overview but need to read other papers to get any detail.
SecChisel: Language and tool for practical and scalable security verification of security-aware hardware architectures
Extends Chisel with security labels to track information flow. Uses [the Z3 SMT solver][demoura:tacas:2008] to check but check is based on syntactic structure, not on semantic analysis. That is, it just propagates labels. Suggests this is important for performance.
Verifying constant-time implementations
Tool for verifying constant time behaviour of C code based on self composition preprocessing then using [SMACK][rakamaric:cav:2014] and [Boogie][barnett:fmco:2005]. Parameterized by whether leakage is just PC divergence or also memory access divergence. Correctness of a reduced/simplified version of analysis is proved in Coq and full algorithm “should not present any additional difficulty”. Demonstrated on a very broad range of crypto code, fixed-time fixed-point code. Key idea is to reduce security property (2-safety) to 1-safety property using a product construction exploiting the requirement that, if the code is constant time then there should be no PC divergence between the two runs. A key strength is the ability to be output insensitive: it is ok to leak information that will be leaked anyway. E.g., if the size is part of the intended output, then it is ok for runtime to depend on size even though size may depend on the secret input. This distinguishes it from approaches that use some form of tainting of inputs.
Simple verification of Rust programs via functional purification
Transpiles Rust code to a functional program in Lean ([demoura:cade:2015]) to allow verification of Rust programs in Lean. Demonstrated on a binary search implementation.