Serberus is a mitigation against [speculative execution] attacks, implemented as an extension of LLVM. It relies on Intel CET (Control-flow Enforcement Technology) hardware features to constrain speculative branches.
Automating constraint-aware datapath optimization using E-graphs
Combines [egraphs], interval analysis and sub-domain equivalences to generate efficient floating point datapath circuits.
End-to-end formal verification of a RISC-V processor extended with capability pointers
Describes verification of [CHERI-Flute] implementation of the [CHERI architecture] for the [RISCV architecture] against the [SAIL language] [ISA specification] of the architecture.
Opening Pandora's box: A systematic study of new ways microarchitecture can leak private data
Looks at more recent microarchitectural optimizations to see whether any are as bad as speculative execution. Spoiler alert: yes.
The fallacy of spec-based design
Describes a process, similar to the later [reid:fmcad:2016], of using the official [ISA specification] of a processor directly as a reference model for processor validation as well as for documentation. This was developed during the design of [Intel]’s [Itanium architecture] and then maintained as the architecture evolved. The architecture specification was subject to a strict review policy.
The tensor algebra compiler
The Tensor Algebra Compiler (TACO) is a compiler for tensor index notation for sparse tensor code (with dense tensors as an easy special case). This early paper about the system introduces a storage format that generalizes existing sparse formats such as CSR, CSC and DCSR; introduces iteration graphs for capturing the structure of the loop nest that will be generated; and introduces merge lattices for capturing the constraints on iteration due to the use of addition or multiplication, etc.
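As a reference point for the formats TACO generalizes, here is a minimal sketch of compressed sparse row (CSR) storage and a sparse matrix-vector product over it, written in Rust (illustrative only, not TACO's generated code; names are invented):

```rust
/// Minimal compressed sparse row (CSR) matrix: one of the formats that
/// TACO's level-format abstraction generalizes.
struct CsrMatrix {
    num_rows: usize,
    row_ptr: Vec<usize>, // row_ptr[i]..row_ptr[i+1] indexes the entries of row i
    col_idx: Vec<usize>, // column index of each stored non-zero
    vals: Vec<f64>,      // the stored non-zero values
}

impl CsrMatrix {
    /// Sparse matrix-vector product y = A * x, iterating only over stored entries.
    fn spmv(&self, x: &[f64]) -> Vec<f64> {
        let mut y = vec![0.0; self.num_rows];
        for i in 0..self.num_rows {
            for k in self.row_ptr[i]..self.row_ptr[i + 1] {
                y[i] += self.vals[k] * x[self.col_idx[k]];
            }
        }
        y
    }
}

fn main() {
    // The 2x3 matrix [[1,0,2],[0,3,0]] in CSR form.
    let a = CsrMatrix {
        num_rows: 2,
        row_ptr: vec![0, 2, 3],
        col_idx: vec![0, 2, 1],
        vals: vec![1.0, 2.0, 3.0],
    };
    println!("{:?}", a.spmv(&[1.0, 1.0, 1.0])); // prints [3.0, 3.0]
}
```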
Sparse GPU kernels for deep learning
Scientific sparse matrices tend to be 99% sparse and have many short rows (e.g., 4-16 elements) while ML matrices tend to be 60-99% sparse and have many moderate-length rows (e.g., 64-256 elements) with much less variability in row lengths.
The state of sparsity in deep neural networks
About techniques for learning [sparse model]s: either by learning a model and then making it sparse, by modifying how the initial model is learned, or by using a sparse architecture for the model.
Fast sparse ConvNets
Describes the motivation and implementation of sparse convolution operations in XNNPACK (implemented in Arm NEON). Can be used with the [TensorFlow] Lite model pruning library that learns sparse representations.
TensorFlow: Large-scale machine learning on heterogeneous distributed systems
A language for expressing machine learning computations using stateful dataflow graphs, increasing portability by avoiding leaky abstractions. It replaces an earlier system called DistBelief.
In-datacenter performance analysis of a tensor processing unit
Describes the design and development of [Google]’s [TPU] in 15 months.
SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training
An accelerator for sparse matrix-matrix multiplies that achieves 10.8 TFLOPS (at multiple levels of sparsity) at 22.33W and 65.10mm^2 on a 28nm process.
ExTensor: An accelerator for sparse tensor algebra
The key idea in this paper is to exploit hierarchy based on the observation that “0 * x == 0” not just for scalar zeros but for tensor zeros. So they can exploit “hyper-sparsity” (sparsity at the level of entire rows, columns or tiles) and avoid fetching large amounts of data. And they can potentially avoid computing data that would only have been multiplied by zero.
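A minimal sketch of that hierarchical idea, assuming a hypothetical block-sparse format where only non-zero tiles are stored at all (an illustration of the principle, not ExTensor's dataflow): a tile that is absent is never fetched, and the whole tile-tile product is skipped.

```rust
use std::collections::HashMap;

/// Hypothetical block-sparse matrix: only non-zero tiles are stored, so
/// "0 * x == 0" holds at tile granularity and absent tiles are never fetched.
struct BlockSparse {
    tile: usize,                              // tile edge length
    tiles: HashMap<(usize, usize), Vec<f64>>, // (block-row, block-col) -> dense tile data
}

fn multiply(a: &BlockSparse, b: &BlockSparse, blocks: usize) -> HashMap<(usize, usize), Vec<f64>> {
    let t = a.tile;
    let mut c: HashMap<(usize, usize), Vec<f64>> = HashMap::new();
    for bi in 0..blocks {
        for bj in 0..blocks {
            for bk in 0..blocks {
                // Hyper-sparsity: skip the whole tile product if either operand tile is zero.
                if let (Some(ta), Some(tb)) = (a.tiles.get(&(bi, bk)), b.tiles.get(&(bk, bj))) {
                    let tc = c.entry((bi, bj)).or_insert_with(|| vec![0.0; t * t]);
                    for i in 0..t {
                        for j in 0..t {
                            for k in 0..t {
                                tc[i * t + j] += ta[i * t + k] * tb[k * t + j];
                            }
                        }
                    }
                }
            }
        }
    }
    c
}

fn main() {
    let mut tiles = HashMap::new();
    tiles.insert((0, 0), vec![1.0, 0.0, 0.0, 1.0]); // a 2x2 identity tile at block (0,0)
    let a = BlockSparse { tile: 2, tiles };
    let c = multiply(&a, &a, 2); // the other blocks are absent and are never touched
    println!("{} non-zero output tiles", c.len());
}
```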
Attention is all you need
Introduced the [transformer model], replacing the recurrent layers entirely with [attention][attention function] to improve efficiency.
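The core operation the paper introduces is scaled dot-product attention over query, key and value matrices (with $d_k$ the key dimension; multi-head attention runs several of these in parallel over learned projections):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$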
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
Achieves efficiency (and, hence, scaling) by using a [mixture of experts] approach with thousands of experts and exploiting sparsity in the [gating network] where at most k experts are selected.
GShard: Scaling giant models with conditional computation and automatic sharding
GShard enabled scaling a multilingual neural machine translation [Transformer model] with Sparsely-Gated [Mixture of Experts] beyond 600 billion parameters using automatic sharding. The authors demonstrate that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days, achieving far superior quality for translation from 100 languages to English compared to the prior art.
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity
Observes that the [mixture of experts] approach suffers from problems in complexity, communication costs and training instabilities. Simplifies it by replacing the “top-k” approach from the [mixture of experts] (where the results from k experts are combined) with a switch that selects just one expert. That is, it uses k=1. This preserves model quality, reduces routing computation and performs better.
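A minimal sketch of the k=1 routing decision (illustrative only; names are invented and the paper's load-balancing details are omitted): each token goes to the single expert with the highest gate score, and that expert's output is scaled by the score.

```rust
/// Pick the single highest-scoring expert for one token (the "switch", i.e. top-1 routing).
fn route_top1(gate_scores: &[f64]) -> (usize, f64) {
    let mut best = 0;
    for (i, &s) in gate_scores.iter().enumerate() {
        if s > gate_scores[best] {
            best = i;
        }
    }
    (best, gate_scores[best])
}

fn main() {
    // Hypothetical softmax-ed gate scores for one token over four experts.
    let gate = [0.05, 0.70, 0.15, 0.10];
    let (expert, score) = route_top1(&gate);
    // Only `expert` runs on this token; its output is scaled by `score`.
    println!("token -> expert {expert} (gate probability {score})");
}
```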
GeST: An automatic framework for generating CPU stress-tests
Digital electronics is a myth: the underlying analog nature of digital electronics is revealed by overclocking, voltage droops, resonance and a host of other effects. Designers (and users) find it useful to stress-test CPUs using [power viruses][power virus] that try to create high IPC load, thermal load, voltage noise, etc. at the resonant frequency of the chip’s power delivery network (PDN). (Matching the PDN’s first-order resonant frequency maximizes the CPU voltage droops and overshoots.)
Silent data corruptions at scale
This paper explores the same problem as [hochschild:hotos:2021]: CPUs that silently give incorrect results without any detection or correction mechanism (like, for example, ECC on SRAM/DRAM). Unlike radiation-induced failures that are transient, they are interested in more consistent, higher-frequency failures that they observe in datacenters due to timing errors, manufacturing variation, degradation and end-of-life wearout, and that are associated with increased transistor density and wider datapaths.
Cores that don't count
Silent Corrupt Execution Errors (CEEs) due to minor manufacturing defects in CPUs cause infrequent corruption of output while otherwise appearing to be fine. (The infrequent aspect is what makes them “silent”.) Google and Facebook are both seeing a few “mercurial” cores per several thousand machines (not clear if they mean CPUs).
CHERI concentrate: Practical compressed capabilities
The core of the [CHERI architecture] is a form of “fat pointer” ([capabilities])
that captures the value of a pointer, lower and upper bounds on what addresses
can be dereferenced at offsets from that pointer, permission flags and
type flags.
The original architecture design had 1+256-bit fat pointers; this paper slims
this down to 1+128-bits.
(The 1+
part is because an unforgeable tag bit is added to prevent spoofing
of capabilities.)
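A rough Rust sketch of the information a capability carries and the check a dereference must pass (field names and widths are illustrative, and this is the uncompressed form, not the paper's compressed encoding):

```rust
/// Illustrative uncompressed capability: the paper's contribution is compressing
/// the bounds/permissions metadata from 256 bits down to 128 bits (plus the tag).
struct Capability {
    tag: bool,    // unforgeable validity tag, stored out of band by the hardware
    base: u64,    // lower bound of the dereferenceable region
    top: u64,     // upper bound of the dereferenceable region
    perms: u32,   // permission flags (load, store, execute, ...)
    otype: u32,   // object type, used for sealed capabilities
    address: u64, // the pointer value itself
}

impl Capability {
    /// Check that an access of `len` bytes at `offset` from the pointer is allowed.
    fn check(&self, offset: i64, len: u64, required_perms: u32) -> bool {
        let addr = self.address.wrapping_add(offset as u64);
        self.tag
            && self.perms & required_perms == required_perms
            && addr >= self.base
            && addr.saturating_add(len) <= self.top
    }
}

fn main() {
    let cap = Capability { tag: true, base: 0x1000, top: 0x2000, perms: 0b011, otype: 0, address: 0x1800 };
    println!("in-bounds load allowed:      {}", cap.check(0, 8, 0b001));     // true
    println!("out-of-bounds store allowed: {}", cap.check(0x900, 8, 0b010)); // false
}
```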
The next 700 semantics: A research challenge
As a step towards making it easier to construct a semantics for all the languages found in the wild, this paper considers the common practice of having a small(ish) core language and a set of desugarings from the surface language.
snmalloc: A message passing allocator
snmalloc is an efficient multiprocessor memory allocator optimized for the asymmetric allocation / deallocation pattern found in pipelined programs.
CacheQuery: Learning replacement policies from hardware caches
Uses Angluin-style learning to synthesize models of hardware cache eviction policies; the models are readable. Evaluated on a simulator to test which policies it can learn, and on x86 cores to learn the L1, L2 and L3 policies, where it found two new eviction policies.
PAC: Practical accountability for CCF
Permissioned ledgers that use Byzantine fault tolerance can detect misbehaviour provided that there is enough diversity between machines but, if there are only a few cloud providers, you don’t get much diversity.
Toward confidential cloud computing: Extending hardware-enforced cryptographic protection to data while in use
Talks about the use of reduced TCB, encryption at rest and in storage, etc. to increase confidentiality in datacenters and (?) transparency. Relates to the Confidential Computing Consortium (10+ companies from H/W, cloud and OS vendors).
Spectector: Principled detection of speculative information flows
Proposes ‘speculative [non-interference]’ (SNI) as a semantic notion of security against [side-channel] attacks and the use of [symbolic execution] ([concolic execution]) to detect leaks due to [speculative execution] or prove their absence. (More precisely, SNI characterizes “disclosure gadgets”.)
Phantom types and subtyping
When creating a safe interface to code in another language, you want to encode the type system of the other language. This paper describes a general encoding of subtyping hierarchies in the Hindley-Milner type system using [phantom types] and generalizes [reppy:att:1996], [finne:icfp:1999] and [blume:babel:2001].
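The paper's encoding targets Hindley-Milner languages such as ML, but the flavour of the idea carries over to Rust. A minimal sketch (names are invented for illustration): a phantom type parameter records where a foreign value sits in an external class hierarchy, so only operations valid at that level type-check.

```rust
use std::marker::PhantomData;

// Marker types for an external (foreign-library) class hierarchy: Button <: Widget.
struct Widget;
struct Button;

// Encode the hierarchy: every marker "is a" Widget.
trait IsWidget {}
impl IsWidget for Widget {}
impl IsWidget for Button {}

/// A foreign handle whose phantom parameter records its static class.
struct Handle<T> {
    raw: usize, // stand-in for a raw foreign pointer
    _class: PhantomData<T>,
}

/// Widget operations accept any handle whose class is (a subtype of) Widget...
fn show<T: IsWidget>(_h: &Handle<T>) { /* call into the foreign library */ }

/// ...while button-only operations require exactly a Button handle.
fn click(_h: &Handle<Button>) { /* call into the foreign library */ }

fn main() {
    let b: Handle<Button> = Handle { raw: 0x1234, _class: PhantomData };
    show(&b);  // allowed: Button is a Widget
    click(&b); // allowed: exactly a Button
    // A Handle<Widget> would be rejected by `click` at compile time.
}
```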
GhostCell: Separating permissions from data in Rust
This (currently unpublished) draft paper shows how to use phantom types [fluet:jfp:2006] together with [Rust][Rust language]’s lifetime annotations to capture the idea that a data structure should be seen as a single data structure even though it is made up of multiple objects. This gives a zero-overhead, safe mechanism to create data structures with sharing and cycles like a doubly linked list.
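A simplified sketch of the branded-lifetime idea that GhostCell builds on (not the actual GhostCell API, and the real library is much more careful about soundness): an invariant lifetime ties a single token to many cells, so borrowing the one token governs access to the whole group of cells.

```rust
use std::cell::UnsafeCell;
use std::marker::PhantomData;

// The brand: an invariant lifetime 'id shared by a token and its cells.
struct Token<'id>(PhantomData<*mut &'id ()>);
struct BrandedCell<'id, T>(UnsafeCell<T>, PhantomData<*mut &'id ()>);

impl<'id, T> BrandedCell<'id, T> {
    fn new(value: T) -> Self { BrandedCell(UnsafeCell::new(value), PhantomData) }
    // Shared borrow of the token grants shared access to every cell with its brand.
    fn get<'a>(&'a self, _t: &'a Token<'id>) -> &'a T {
        unsafe { &*self.0.get() }
    }
    // Exclusive borrow of the token grants exclusive access.
    fn get_mut<'a>(&'a self, _t: &'a mut Token<'id>) -> &'a mut T {
        unsafe { &mut *self.0.get() }
    }
}

// The closure is quantified over the brand, so tokens from different scopes cannot mix.
fn with_token<R>(f: impl for<'id> FnOnce(Token<'id>) -> R) -> R {
    f(Token(PhantomData))
}

fn main() {
    with_token(|mut token| {
        let a = BrandedCell::new(1);
        let b = BrandedCell::new(2);
        let x = *b.get(&token);        // read via a shared token borrow
        *a.get_mut(&mut token) += x;   // write via an exclusive token borrow
    });
}
```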
Understanding memory and thread safety practices and issues in real-world Rust programs
A manual, empirical study of 850 uses of ‘unsafe’ in Rust and of 170 bugs (70 memory safety, 100 concurrency) based on studies of Servo, Tock, Parity Ethereum, TiKV, Redox and five libraries, with most (145) of the bugs being in code written after Rust 1.0 (2016). [I think that the choice has a bias towards system code so this may be a biased sample?] (Contrast with the much larger, automated studies in [astrauskas:oopsla:2020] and [evans:icse:2020].) Also describes a use-after-free checker and a double-lock checker for checking unsafe code.
How do programmers use unsafe Rust?
This is an empirical study of unsafe Rust code. It differs from [evans:icse:2020] by doing a more detailed measurement and analysis of the different reasons that unsafe code is used, and by considering direct uses of unsafe annotations rather than indirect uses via dependencies. It is also based on code that is 18 months younger.
Is Rust used safely by software developers?
Empirical study of unsafe annotations in open source Rust crates. Uses a tool to count different forms of unsafe annotation in the crates and the transitive dependencies of safe functions on other functions that contain unsafe annotations.
Checking type safety of foreign function calls
Checks [foreign function interface] calls between O’Caml and C. The important code (i.e., where the bugs happen) is the C code. They use a flow-sensitive type system based on the data representation to detect shape errors and they use an effect system to detect when calls from C to O’Caml (that could cause a GC) are made without appropriate precautions.
Dependent types for low-level programming
Deputy is a [dependent type] system for C that allows programmers to annotate pointers and tagged unions, infers annotations, and mostly performs compile-time checks that array/pointer offsets are in bounds but (to preserve decidability) falls back to run-time checks when necessary.
QSYM: A practical concolic execution engine tailored for hybrid fuzzing
QSYM significantly improves the performance of [concolic execution] of binaries to support [hybrid testing] (a form of [fuzz testing]). The key ideas (based on a detailed examination of all the usual design choices and their problems) are
Symbolic execution with SymCC: Don't interpret, compile!
SymCC is an [LLVM][LLVM compiler]-based [symbolic execution] tool that focuses on performance by compiling the LLVM bitcode instead of interpreting it (as is done by tools like [KLEE][KLEE verifier]). The result is an average speedup (relative to KLEE) of 12x, and up to three orders of magnitude in some cases, with a consequent improvement in coverage and bug discovery.
SAVIOR: Towards bug-driven hybrid testing
[SAVIOR] is a [hybrid testing] tool that prioritizes bug-finding over coverage by using a pre-testing analysis that identifies code of interest (i.e., potentially containing bugs) and prioritizes code paths leading to that code. It combines [KLEE][KLEE verifier] and [AFL fuzzer].
TracerX: Dynamic symbolic execution with interpolation
The big problem with dynamic [symbolic execution] is the exponential path explosion resulting from visiting each path independently. One approach is [state merging] (e.g., [kuznetsov:pldi:2012]), which merges similar states so that paths can be brought back together and the merged path only needs to be explored once.
Practical, low-effort equivalence verification of real code
Performs [differential testing] using the [KLEE][KLEE verifier] [symbolic execution] tool extended with [symbolic memory] and [lazy initialization] to enable under-constrained symbolic execution (see [ramos:sec:2015]).
A local shape analysis based on separation logic
This paper moves beyond [berdine:aplas:2005] to support reasoning about loops by augmenting [symbolic execution] with [abstract interpretation]. This allows them to reason about loops that reverse lists, deallocate entire lists, append lists, copy lists, insert/delete from the middle of a list, delete and filter circular lists, and deallocate ranges of values from sorted lists.
Beyond reachability: Shape abstraction in the presence of pointer arithmetic
This paper combines [separation logic], [abstract interpretation] and some custom abstract domains/transformations to show how separation logic can be used to automatically verify low-level memory manipulation code of the kind found in memory allocators.
Noninterference for free
This paper is so much more than its title suggests. From the title, I was expecting something about parametricity and non-interference, and the conclusion does say
Generalized symbolic execution for model checking and testing
Introduced the idea of [lazy initialization] to allow pointer-manipulating programs to be verified using [symbolic execution] without having to construct a data structure for the program to symbolically execute.
Scalable error detection using boolean satisfiability
[Saturn][Saturn verifier] is a SAT-based bug-finding tool for checking temporal logic formulae (expressed as state machines) on C programs and, in particular, for checking for locking errors in Linux kernel code. A key part of its design is the generation of function summaries using [lazy initialization] to infer the precondition of functions. It achieves a false positive rate of only 40%. A problem (that it shares with successors such as [engler:issta:2007] and [ramos:sec:2015]) is its handling of pointer aliases: the function summaries generated are always trees, not DAGs.
Under-constrained symbolic execution: Correctness checking for real code
Avoids the path explosion of application-level symbolic execution by analyzing functions in isolation, checking for crashes and also using three checkers for memory leaks, uninitialized data and unsanitized use of user input. Applying it to BIND, OpenSSL and the Linux kernel, they find 67 new bugs. They also apply this to patches from BIND and OpenSSL and find 12 bugs and verify (with caveats) 115 patches. (On the others, they do not exhaust all execution paths but achieve high coverage.)
Quantifying the performance of garbage collection vs. explicit memory management
This paper uses CPU performance simulators and oracles to measure the impact of replacing Java’s garbage collectors with explicit memory management. They conclude that you need around 5x more physical memory before GC performs better than explicit memory management; it is 17% worse at a mere 3x and 70% worse at 2x memory.
Under-constrained execution: Making automatic code destruction easy and scalable
Extends the [EXE verifier] with support for under-constrained variables. Any errors detected by symbolic execution are only reported if the path condition does not depend on the value of an under-constrained variable. For example, if a pointer is dereferenced, we add a constraint that the pointer is non-null and a new under-constrained variable is created for the value read. This idea apparently originates in Java PathFinder (JPF) ([khurshid:tacas:2003], [xie:popl:2005]).
Engineering the Servo web browser engine using Rust
This is a good intro to the Rust language for PL researchers. It gives the motivation, a description of the challenges of building a high performance web browser, an intro to the ownership and concurrency features of Rust, closures, algebraic data types, a sketch of compilation via monomorphization (including across package boundaries), FFI, macros and plugins.
Formal specification and testing of QUIC
The [QUIC protocol] is a Google-designed protocol providing a secure, authenticated replacement for TLS/TCP (i.e., to replace https). The goal of this paper is to improve on the use of interoperability testing to test the QUIC specification and implementations. They create a “substantial” specification of the QUIC standard in the [Ivy verifier] language from which they generate tests to use with implementations using a compositional approach (assume/guarantee reasoning) and constrained random generation. They found bugs in implementations and a potential DoS in the protocol. Some parts of the informally specified standard were hard to formalize effectively. They focus on safety but discuss liveness.
Executable counterexamples in software model checking
Argues that counterexamples from software model checking (SMC) should take the form of executable mock environments that can be linked with the code under analysis (CUA). These mock environments are a simple form of [test double] that implement each external function with a function that returns a sequence of values to steer the CUA towards failure.
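A minimal sketch of such a mock environment (the paper targets C, but the shape is the same in Rust; the function name and the replayed values here are hypothetical): each external function is replaced by a stub that replays a scripted sequence of values taken from the counterexample, steering the code under analysis towards the failure.

```rust
use std::cell::RefCell;

// Scripted input values, as they would be extracted from the model checker's counterexample.
thread_local! {
    static SENSOR_VALUES: RefCell<Vec<i32>> = RefCell::new(vec![1, i32::MAX]);
}

// Mock for the external function the code under analysis calls: replay the next scripted value.
fn read_sensor() -> i32 {
    SENSOR_VALUES.with(|v| v.borrow_mut().remove(0))
}

// The "code under analysis": the scripted values drive it onto the failing path.
fn code_under_analysis() {
    let a = read_sensor();
    let b = read_sensor();
    // Bug: this addition can overflow for adversarial inputs.
    assert!(a.checked_add(b).is_some(), "overflow reproduced by the mock environment");
}

fn main() {
    code_under_analysis(); // panics, reproducing the reported failure
}
```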
The meaning of memory safety
Aims to define [memory safety] in terms of reasoning principles it enables instead of the errors that memory unsafety manifests. The key idea is the [separation logic] notion of a function’s “footprint” – whether specified by a separation logic predicate or by reachability (from free variables / arguments).
Large teams have developed science and technology; small teams have disrupted it
This paper argues that papers with a small number of authors cause more disruption (introduce new ideas) than those by large teams while papers by large teams tend to develop existing (typically recent) ideas. Interestingly, authors’ behaviour changes depending on the team size. It uses an extensive data analysis across many different fields and across papers, patents and software to argue each point: the main argument is in the highly recommended first 6 pages but the remaining 47 pages have lots of graphs, explanation of the methodology, etc.
A segmented memory model for symbolic execution
This paper provides a different solution to the problem in [coppa:ase:2017]: how to handle symbolic addresses during symbolic execution. Forking (the default behaviour of [KLEE][KLEE verifier]) every time the pointer target could refer to multiple objects (“multi-resolution”) causes a path explosion while treating the entire memory as a single flat object is inefficient.
Rethinking pointer reasoning in symbolic execution
Symbolic addresses (and array indices) are a problem for symbolic execution that could lead to a large explosion in SMT terms due to the need to add an ‘ite’ expression for every potentially aliasing write. Many symbolic execution tools deal with this by selectively concretizing addresses used for reads, for writes or for both – leading to lower coverage of the program (i.e., unsound results).
Safe systems programming in Rust: The promise and the challenge
This is a good introduction for PL researchers to the [Rust language], its type system and its approach to safety. It reuses explanations from the authors’ earlier works ([jung:popl:2017], [jung:popl:2020]) but with more space dedicated to explanation, making this the best place to start in exploring their work as well as a good way for PL people to map Rust concepts onto their existing concepts.
Swarm verification techniques
Swarm verification uses parallelism and (most importantly) diversity to hunt for violations using model checkers. The goal is to find violations within some limited time budget, not to prove absence of violations using an exhaustive search.
Swarm testing
This paper adapts the ideas of [swarm verification] ([holzmann:ieeetse:2011]) in a testing context. Like [swarm verification], the idea is to use a diverse range of configurations to get better coverage than you would get with a single configuration (even if it is carefully chosen). Swarm testing makes better use of a fixed time budget and reduces the effort needed to tune testing configurations.
Chopped symbolic execution
This adds a form of lazy evaluation to symbolic execution: deferring execution of code until the analysis actually needs it.
Scaling symbolic execution using ranged analysis
Key insights: a path can be encoded by a concrete test input; a set of paths can be encoded by a pair of paths (all paths lexicographically between the two paths).
A synergistic approach for distributed symbolic execution using test ranges
Test ranges (from [siddiqui:oopsla:2012]) compactly represent sets of program paths, using a pair of paths to define a range (all paths lexicographically between the two), and can be used to divide the symbolic execution tree into a number of subtrees that can be explored in parallel.
Efficient state merging in symbolic execution
Classic [symbolic execution] suffers from path explosions due to forking the state on every conditional branch. This paper explores how to selectively merge states to push symbolic execution further along the [symbolic evaluation] spectrum while stopping short of merging states at every join point as in [bounded model checking]. By not merging at every join point, they aim to retain the benefits of high levels of concrete execution, simpler solver queries and flexibility of search strategies to explore the most interesting paths first.
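Schematically (not the paper's notation), merging the two states produced by `if (c) x = 1; else x = 2;` under path condition $pc$ replaces two paths with one state whose values are guarded by the branch condition:

$$(\,pc \land c,\ \{x \mapsto 1\}\,)\ \sqcup\ (\,pc \land \lnot c,\ \{x \mapsto 2\}\,)\ =\ (\,pc,\ \{x \mapsto \mathit{ite}(c, 1, 2)\}\,)$$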
Sanity checks in formal verification
Writing specifications is hard. It is especially hard to write specifications that don’t leave things out (underconstrain) or say too much (overconstrain). This paper deals with two particular problems and argues that they can be tackled with the same analysis
Evaluating manual intervention to address the challenges of bug finding with KLEE
This paper looks at how to find and fix verification performance bugs encountered with KLEE; builds a bug corpus for the evaluation; and categorizes different kinds of problem.
Finding code that explodes under symbolic evaluation
Like [galea:arxiv:2018] (which is also worth a read), this paper looks at how to find verification bottlenecks in symbolic evaluation. They propose a symbolic profiling approach that works for any symbolic evaluator (symbolic executor, BMC or hybrids) and, using a simple but effective metric to rank hotspots, is effective at highlighting the root cause of the problem.
As we may think
This frequently cited article was published just after the end of the Second World War by Vannevar Bush (the Director of the Office of Scientific Research and Development). It is a swords-to-ploughshares piece, best known for the Memex concept that influenced the creators of hypertext, etc.
Modern code review: A case study at Google
Code review goes back to heavyweight practices in the ’70s, but this paper looks at modern, lightweight review as practiced at Google since the early days. Modern code review is
Lessons from building static analysis tools at Google
This paper is an update on their earlier paper ([sadowski:icse:2015]) that brings the story up to date and fills in some of the background stories about previous failures that informed their later approach and some of the successes. Despite the overlap, it is worth reading both of them because the two articles have a different emphasis and the overlap is not too severe.
Large-scale automated refactoring using ClangMR
ClangMR is a refactoring tool that scales to systems of the size of Google’s monorepo ([potvin:cacm:2016]). It consists of three phases:
Tricorder: Building a program analysis ecosystem
This paper describes the experience of deploying a static analysis infrastructure at Google. This infrastructure (“Tricorder”) had to support multiple languages, tens of thousands of users, large code scale, etc. Most importantly though, it had to be accepted by developers if it was going to provide any benefit.
Why Google stores billions of lines of code in a single repository
This paper describes Google’s monorepo approach to managing source code in a single repository, the pros, the cons and what was needed to support this approach.
API usability at scale
How can you evaluate the usability of web APIs? They gather usage data from the Google API explorer for 26 APIs looking for explorations that produce 4xx (i.e., HTTP error codes). They find that three particular APIs are especially error prone.
Spanner: Google's globally distributed database
Spanner is a scalable, globally-distributed, temporal database that shards data across many [Paxos] instances. It features automatic failover, resharding and migration. Although not explicitly stated, it seems to be a replacement for Bigtable ([chang:tocs:2012]) and Megastore. One of the early clients was to support F1: an ads database.
Extended static checking: A ten-year perspective
Super-interesting background and explanation of the choices made in ESC/Modula-3 and ESC/Java. Both are a form of [auto-active verifier][auto-active verification] (although that term was not used by the author for another 10 years).
QuickCheck: A lightweight tool for random testing of Haskell programs
QuickCheck takes a test-based approach to formal verification. You write universally-quantified properties and QuickCheck uses random testing ([fuzz testing]) to check whether the property holds. QuickCheck was originally written for Haskell but various people have since implemented it for around 40 different languages.
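A minimal, hand-rolled sketch of the idea in Rust (dependency-free, not any particular QuickCheck port): state a universally-quantified property and check it on many random inputs.

```rust
// Tiny linear-congruential generator so the sketch needs no external crates.
fn next(seed: &mut u64) -> u64 {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    *seed
}

// The property under test: reversing a vector twice gives back the original.
fn prop_double_reverse(xs: &[u32]) -> bool {
    let mut twice = xs.to_vec();
    twice.reverse();
    twice.reverse();
    twice.as_slice() == xs
}

fn main() {
    let mut seed = 42;
    for case in 0..100 {
        // Generate a random vector of random length.
        let len = (next(&mut seed) % 16) as usize;
        let xs: Vec<u32> = (0..len).map(|_| (next(&mut seed) % 1000) as u32).collect();
        assert!(prop_double_reverse(&xs), "property failed on case {case}: {xs:?}");
    }
    println!("property held on 100 random cases");
}
```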
Parameterized unit tests
Parameterized unit tests are a form of [property-based testing] that generalizes traditional unit tests. The tests can be instantiated with concrete values to serve as traditional tests, they can be verified using [symbolic execution] or, to support modular verification, they can be used as axioms in the spirit of algebraic specification to use when verifying calls to a library.
Feedback-directed unit test generation for C/C++ using concolic execution
This paper describes a test generation tool based on a hybrid of [symbolic execution] and [fuzz testing]. The goal is to generate high coverage sequences of method calls.
Study of integrating random and symbolic testing for object-oriented software
Describes and evaluates JDOOP: a hybrid fuzzer/symbolic executor for Java that uses random search to quickly build coverage and symbolic execution to hit hard-to-hit branch conditions. JDOOP is a combination of RANDOOP and JDART.
FUDGE: Fuzz driver generation at scale
Addresses the problem of generating fuzzing test harnesses for libraries by automatically generating harnesses based on example uses in existing clients. The goal is to increase the productivity of users and it is expected that a human will examine the generated harnesses for legality, etc. before they are used to report bugs.
DeepState: Symbolic unit testing for C and C++
DeepState seeks to narrow the gap between techniques that programmers are familiar with (e.g., testing) and symbolic execution / binary analysis tools by enabling the use of parameterized unit tests with a variety of symbolic execution backends (angr, Manticore and Dr. Fuzz) and fuzzers and making it easy to switch between backends.
Locating defects is uncertain
This one-page, position paper talks about bug localization. Failures are seen as the symptom of a chain of infection that starts with some initial cause/infection.
Soundness and its role in bug detection systems
This one-page, position paper talks about three different, competing costs in a tool: soundness (no false negatives); cost; and usability (which relates to false positives). They argue:
The soundness of bugs is what matters (position statement)
This one-page, position paper is “written in a provocative style for entertainment purposes” but, nevertheless, makes some very good points. The main argument is that the goal of most verification research is defect detection and that they should therefore focus on finding “sound bugs” (bugs that can actually occur) and avoid “unsound bugs” (false alarms). This is why (despite its limitations) testing remains so important: testing finds sound bugs. The solution is to switch from “may analysis” (over-approximation) to “must analysis” ([under-approximation]).
Issues in deploying software defect detection tools
This one-page, position paper talks about what makes a bug-finding tool useful and usable in an industrial setting. The slides are worth reading since they say things not present in the paper.
False positives over time: A problem in deploying static analysis tools
This one-page, position paper by one of the cofounders of Coverity makes the interesting observation (that is obvious once it has been said)
The human in formal methods
This invited talk looks at “how to bring the other 90% into the fold” of formal methods with a focus on education, comfort with formal methods and effectiveness with formal methods. This focus leads to emphasizing model-finding tools such as the [Alloy verifier], [SAT solver]s and [SMT solver]s over deductive methods.
Relational test tables: A practical specification language for evolution and security
Relational test tables describe sets of traces, constraints on those traces and (the relational part) constraints between traces, with support for stuttering (to allow for timing differences).
COASTAL: Combining concolic and fuzzing for Java (competition contribution)
This is a nice short, readable description of Coastal which
SUSHI: A test generator for programs with complex structured inputs
[Search based test generation] alone is unable to generate inputs that have complex structures such as dependencies between different data structures.
STARS: Rise and fall of minicomputers [scanning our past]
This describes the history of minicomputers from the late ’50s through to the ’80s. One of the things I found fascinating (because I had not heard it before) is that many minicomputers were sold to OEMs who incorporated the minis into products where they were used for industrial control and things like that. The paper argues that we should distinguish these “classic minicomputers” from minicomputers that are defined more by their price relative to mainframes.
Bell's law for the birth and death of computer classes
This paper argues that “A new computer class forms and approximately doubles each decade, establishing a new industry.” and names this “Bell’s Law”. This is a consequence of Moore’s Law decreasing the price of computer hardware and “intense, competitive entrepreneurial action.”
Software model checking
This 2009 [survey] of software [model checking] covers a lot of ground from the roots in hardware model checking through to its application to software. A number of the key algorithms are described in pseudocode.
Boolean satisfiability from theoretical hardness to practical success
This paper is a great intro/overview of SAT solving: what it is, why it is (NP) hard, how SAT solvers work, the role of benchmarks and competitions, the massive performance improvements in the 10 years prior to it being published and the industrial impact.
Test-case reduction via test-case generation: Insights from the Hypothesis reducer
This paper proposes a general purpose way of getting [test-case reduction] “for free”. The paper shows that it “provides adequate reduction for any generated test case, while avoiding the test-case validity problem.”
Model checking: Algorithmic verification and debugging
This [Turing Award] paper/talk is a great overview of [model checking]: definition, [temporal logic], what it is good for, challenges (mostly state-space explosion), and tricks (like [symbolic model checking], [bounded model checking], [partial order reduction] and [CEGAR]).
Enhancing symbolic execution with veritesting
This paper attacks the path explosion problem of [symbolic execution] by using a hybrid of dynamic symbolic execution (DSE) and static symbolic execution (SSE) that they call “veritesting”. Their tool “MergePoint” starts with concolic execution (a form of DSE) from some seed but opportunistically switches to SSE whenever the paths ahead contain straightforward code that SSE can cope with. They find that the additional cost due to having larger, more complex SMT queries is compensated for by the exponentially smaller number of paths.
Code-level model checking in the software development workflow
This very readable paper describes the successful introduction of a formal verification flow into the development of the Amazon C Common library. Anyone trying to introduce formal verification into a traditional development team should read this paper!
An empirical study of the reliability of UNIX utilities
This is the paper that started the field of [fuzz testing], coined the name, and tells the origin story.
Selective symbolic execution
The [S2E verifier] can analyze application code, complex GUI libraries, kernel code, device drivers, and even hardware devices. It does this by alternating between concrete execution (using QEMU) and [symbolic execution] (using the [KLEE verifier]). Switching modes makes it more flexible but it also makes it more efficient because (slower) symbolic execution can be restricted to the code of most interest which might be a library, a recently changed code path, etc.
The art, science, and engineering of fuzzing: A survey
This large (21 pages, 240 references) survey of [fuzz testing] proposes a taxonomy and standardized terminology. They use a “model fuzzer” throughout the survey to motivate/explain fuzzer design choices.
Fuzzing: Hack, art, and science
Nice overview article that describes whitebox fuzzing such as the [SAGE verifier] in the context of blackbox fuzzing, greybox fuzzing, hybrid fuzzing, test generation, security analysis, etc.
Symbolic execution for software testing: Three decades later
This is a survey of symbolic execution with an emphasis on the [KLEE verifier]. It is probably a better introduction to the concepts of symbolic execution than the [baldoni:compsurv:2018] survey, but it is less complete.
A survey of symbolic execution techniques
This [survey] reinforces my impression that [symbolic execution] is a beautifully simple idea that, like most beautifully simple ideas, requires a lot of optimizations to make it scale. As the structure of the paper shows, most of the survey is concerned with the optimizations.
Fuzzing: On the exponential cost of vulnerability discovery
Using a large empirical study of fuzzing (using AFL and libFuzzer), this paper explores the cost of finding vulnerabilities (or increasing coverage).
An introduction to test specification in FQL
Proposes a query language [FQL] for specifying coverage goals that can be used for
FShell: Systematic test case generation for dynamic analysis and measurement
FShell adapts the “program as database” metaphor previously used by the [BLAST verifier] from model checking to test generation. It generates families of testcases based on coverage criteria specified using the “FShell” query language ([FQL]) that (I think) was introduced in this paper.
Satisfiability modulo theories: Introduction and applications
This paper is a great intro/overview of [SMT solver]s: what they are, how they build on SAT, an example of how one particular theory (difference arithmetic) is implemented, and the many applications of SMT solvers in program analysis, program verification, etc.
Software verification: Testing vs. model checking
Compares test generators against model checkers. The test generators are AFL-Fuzz, CPATiger, CREST-PPC, FShell, [KLEE verifier] and PRTest. The model checkers are [CBMC verifier], CPA-seq, ESBMC-incr and ESBMC-kInd.
TestCov: Robust test-suite execution and coverage measurement
TestCov is a tool that measures the coverage of a test suite and generates a reduced test suite. It is used in the [Test competition] and can work with all the tools that compete in that competition.
Boosting fuzzer efficiency: An information theoretic perspective
Greybox fuzzers prioritise which seeds to explore in order to (hopefully) maximise some metric such as coverage. This paper proposes an information theoretic model (entropy) for evaluating the effectiveness of each seed. This forms the basis of an “entropy based power schedule” to assign more energy to the seeds that elicit the most information. The approach is inspired by “Active SLAM”: an approach for autonomous robot exploration of unknown terrain.
Model checking
This is a good survey of model checking that lays out the main concepts and why they matter:
Verification of safety functions implemented in Rust: A symbolic execution based approach
This paper looks at how to use the [KLEE verifier] to verify code implementing a state machine written in Rust based on their earlier work ([lindner:indin:2018]). The challenge here is that KLEE is a [symbolic execution] tool and does not guarantee to explore all paths through the code to a sufficient depth to guarantee soundness. Their solution lies in constructing an appropriate verification harness.
No panic! Verification of Rust programs by symbolic execution
This paper describes the cargo-Rust
that allows the [KLEE verifier]
to be used to verify Rust programs.
KLEE is a [symbolic execution] tool so using it to verify code leads
to a path explosion problem.
The solution taken in this paper is to use a [contract driven development]
approach to enable [modular verification].
This reduces the path explosion problem to only involve the number of
paths per function instead of the number of paths through the entire
program.
Simple verification of Rust programs via functional purification
Transpiles Rust code to a functional program in Lean ([demoura:cade:2015]) to allow verification of Rust programs in Lean. Demonstrated on a binary search implementation.
RustHorn: CHC-based verification for Rust programs
Verifies Rust programs by converting basic blocks in the [MIR] representation of the code into Constrained Horn Clauses (CHCs) that are then verified using Horn clause support in Z3 or HoIce.
Benchmarking solvers, SAT-style
Written as an encouragement to the computer algebra community to create a [verification competition]. It explains why [verification competition]s are designed the way they are. It also has an analysis of how improving time-to-solve affects scores, where improvements might be shaving a second off the time, adding parallelism, random variations in time, etc.
Fractional permissions without the fractions
The standard way of distinguishing read and write permissions in separation logic is with [fractional permissions][bornat:popl:2005], in which you can write to a resource if you have 100% of it, read it if you have some fraction of it, and have no access at all if you have 0% of it. When you replicate references to a resource, you split the fraction into some smaller parts and, when you are done with those references, you recombine (sum) all their fractions.
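The splitting/recombining step is usually written as (for fractions $0 < \pi, \sigma$ with $\pi + \sigma \le 1$):

$$x \overset{\pi + \sigma}{\mapsto} v \;\iff\; x \overset{\pi}{\mapsto} v \,*\, x \overset{\sigma}{\mapsto} v$$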
StkTokens: Enforcing well-bracketed control flow and stack encapsulation using linear capabilities
This paper follows on from [an earlier paper by the same authors][skorstengaard:esop:2018] that looked at how mutually distrusting pieces of code could safely call each other. In particular, it was concerned about whether the stack and return address were what the caller expected when they called the function.
Dafny: An automatic program verifier for functional correctness
Dafny is an [auto-active program verifier][auto-active verification] for an imperative language with functional and object oriented features that uses the [Boogie][leino:tacas:2010] intermediate verification language and the [Z3][demoura:tacas:2008] SMT solver for automated proof.
The Boogie verification debugger
A critical part of any verification tool is a debugger that helps you understand and localise any errors detected. For dynamic verification (i.e., testing), we typically use a debugger like gdb/lldb to halt the program at interesting points and inspect the state of the system. For static verification (i.e., formal verification), it initially seems quite different because the verification tool typically does not run the program from the beginning and does not generate concrete values for all variables. I think the message of this paper is that it is not as hard as it seems.
The ramifications of sharing in data structures
This paper tackles a generalized form of the [ramification problem][krishnaswami:tldi:2010]. That is, it tackles the problem that separation logic is good at reasoning about pointer-based structures such as trees when you can split the heap into disjoint parts but separation logic’s weakness had been reasoning about structures such as dags that rely on sharing. In particular, reasoning about something like a garbage collector has been hard.
Developing verified programs with Dafny
[Dafny][leino:lpair:2010] is both a language and a verification tool for creating verified programs. The language has features of object-oriented languages and functional languages. The verification support is based on contract-style verification. This short, easy read seems to be the accompaniment for a tutorial and discusses verification of six different functions that demonstrate contracts and the specification notation, loop invariants, immutable inductive datatypes, mutable datatypes, use of pure functions in specifications, classes, ghost-fields, invariants, and lemmas. Proofs of lemmas are especially interesting because the lemmas are just ghost methods and the bodies of those methods are the proofs of the lemmas: e.g., to write an inductive proof, one writes a recursive function using a case split to separate the base case from the inductive step.
Reasoning about a machine with local capabilities
This paper is concerned with reasoning about software running on a capability machine such as CHERI and, in particular, proving that the calling convention ensures proper nesting of function calls no matter how buggy or malicious the function you’re calling may be. It enforces that, if a function returns, it can only return to the return address it was given, with the stack pointer it was given and that it cannot use some other address or stack pointer or even re-use values from some earlier invocation of the function.
Specified blocks
Program verification based on assertions is fundamentally wrong. Like Ptolemy’s planetary theory of cycles within cycles, they are overly complex and we need something as simple as Galileo’s theory of elliptical orbits. This paper forcefully argues that they should be replaced with “specified blocks” that form a specification of an entire block of code. That is, we should describe what a block of code is intended to do instead of what is true at strategic points in the code.
Local reasoning about while-loops
As [Hehner had previously discovered][hehner:vstte:2008], we are reasoning about loops in the wrong way. Instead of using loop invariants that describe what the loop has done so far, we should be using loop specifications that describe what the loop will do in the future. This change enables local reasoning (in the usual “separation logic” sense) about loops because it focusses on what the loop does and because it eliminates the need to describe partial data structures such as the initial segment of a list.
Verifying event-driven programs using ramified frame properties
Separation logic’s strength is that it lets you reason locally about heap-manipulating programs by letting you split the heap into disjoint parts. But some data structures don’t easily split into parts and you have to maintain global consistency properties across the structure because they “use mutation and assignment as a way of globally broadcasting information to the rest of the program state.” This paper tackles an example of such a structure: an instance of the subject-observer pattern like that found in a spreadsheet where each node contains a list of the nodes it depends on and also of the nodes that depend on it so that, when a node changes value, nodes that depend on it can be invalidated/updated.
Specification and verification: The Spec# experience
[Spec#][barnett:cassis:2004] was a Microsoft Research project that added automated verification to C# through the addition of method contracts, class invariants and loop invariants. This was all integrated into the Visual Studio IDE to make for a very different user experience from normal verification: they wanted verification to be integrated into the development flow. This paper gives a retrospective on the project with a particular focus on efforts to get the techniques adopted.
Separation logic and abstraction
This paper tackles three problems:
Verification of concurrent programs with Chalice
This is a tutorial on the Chalice object-oriented language and verifier. Methods have contracts, loops have invariants, pointers have ownership annotations, there are threads, locks and fractional ownership, there are lock orderings to detect deadlock and there are ghost variables and fold/unfold ghost statements.
Compositional shape analysis by means of bi-abduction
One of the big challenges in automated verification is the annotation burden. In particular, to make verification compositional, we need contracts for all the functions but writing and maintaining contracts for a large codebase is a significant overhead and discovering the contracts for an existing codebase is a lot of work. This paper focusses on discovering contracts that describe what parts of a heap-allocated data structure a function depends on in order to support separation logic based verification of memory safety along the lines of [Smallfoot][berdine:fmco:2005].
Smallfoot: Modular automatic assertion checking with separation logic
Smallfoot is a tool for automatically verifying both sequential and concurrent heap-manipulating programs that is based on (concurrent) separation logic and symbolic execution. The proof theoretic foundations of the tool are described in a [companion paper][berdine:aplas:2005].
Symbolic execution with separation logic
This paper describes the proof-theoretic basis for the Smallfoot verification tool described in a [companion paper][berdine:fmco:2005]. See the other paper for a description.
Implicit dynamic frames: Combining dynamic frames and separation logic
Permission logics are Hoare-style logics for reasoning about heap allocated data structures and whether a piece of code has permission to access a given part of the structure. Their particular strength is the ability to reason about the lack of aliases – drawing on ideas from linear logic. The best known permission logic is [separation logic][ohearn:cacm:2019]; another permission logic is dynamic frames. This paper tackles the problem that dynamic frames have a high annotation overhead because of the need to define and manipulate “frame annotations” for each method. Their solution is to infer the frame information directly from the access assertions in the pre/post-conditions of functions.
Lightweight support for magic wands in an automatic verifier
Verifying a loop that walks over a heap data structure is tricky. At the start, we might have a pointer to the root of the structure; in some intermediate iteration of the loop, we have a pointer into the middle of the structure and the nodes we have visited so far have been changed in some way; and then, at the end, the entire structure from the root down has been fully changed. The difficulty is that, to verify the loop, we have to write an invariant that describes the changes to the nodes that we have visited so far. That is, we have to describe a partial data structure: from the root down to the current position in the list.
Software verification with VeriFast: Industrial case studies
This paper reports on four case studies using [VeriFast][jacobs:nfm:2011] to verify real code for an absence of safety violations such as illegal memory accesses or data races.
Viper: A verification infrastructure for permission-based reasoning
[Viper][Viper verifier] is like [Boogie][barnett:fmco:2005] in that it is an [intermediate verification language] (IVL) that can support multiple language frontends and also multiple verification backends. The most important difference is that Viper is based on a form of [permission logic] that is well suited to reasoning about heap based data structures.
Leveraging Rust types for modular specification and verification
This paper exploits the close similarity between Rust’s type system and permission based reasoning approaches such as [separation logic][ohearn:cacm:2019] and implicit dynamic frames logic. Specifically, they describe the embedding of Rust programs into the [Viper intermediate verification language][muller:vmcai:2016] in a way that exploits the properties of Rust’s borrow-based type system. This embedding is (partially?) implemented in the Prusti verification tool that translates Rust to Viper and translates any errors back from Viper to Rust. Their focus at the moment is on “safe” code: code that obeys the Rust type system.
VeriFast: A powerful, sound, predictable, fast verifier for C and Java
VeriFast is an auto-active verification tool for C and Java based on separation logic and SMT solvers. This paper is a nice overview of the state of the project in 2011 when they had
seL4: from general purpose to a proof of information flow enforcement
Most operating systems do two things: they isolate processes from each other; and they allow some limited communication between processes. There have been multiple attempts over more than 30 years to prove that an operating system actually achieves these goals; this is the first really convincing such proof. The proof is about the seL4 microkernel that had already been [formally verified][klein:sosp:2009] as implementing its formal specification. The paper takes us step by step through the proof as well as being very clear about the assumptions and gaps in the verification and the effort required for the verification.
Permission accounting in separation logic
Separation logic is about tracking ownership of objects and, usually, this means full ownership: you either own an object and can do anything you want with it or you have no access at all. This paper is about sharing ownership as in the reader-writer pattern where an object can be shared between several readers or ownership can be given to a single writer.
Separation logic
This survey paper updates [Reynolds 2002 survey of Separation Logic][reynolds:lics:2002] with an overview of the developments, applications and extensions in Separation Logic and Concurrent Separation Logic over the last 20 years.
Separation logic: a logic for shared mutable data structures
Separation logic is an extension of Hoare logic for reasoning about programs that use pointers. In particular, it copes well with aliasing (by providing a compact way of saying that a set of pointers do not alias) and a “frame rule” that lets you reason locally about a piece of code without getting bogged down in the surrounding context. This paper introduces separation logic, illustrates its use to verify a number of list, tree and dag algorithms [this is one of the real strengths of this paper] and discusses a lot of the work by the author and others on developing, defining, using and extending the different forms of separation logic that existed at the time (2002).
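The frame rule referred to here, with the side condition that $C$ does not modify any variable free in $R$, is:

$$\frac{\{P\}\ C\ \{Q\}}{\{P * R\}\ C\ \{Q * R\}}$$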
A solver for reachability modulo theories
This paper describes the Corral tool for finding bugs in device drivers using a variant of bounded model checking. Like many bounded model checkers, they inline all functions and unroll all loops up to some bound – increasing the bound until some limit or until a bug is found. What is interesting is that, instead of inlining all functions at once, they incrementally inline functions according to some heuristics. They also use a CEGAR-like mechanism to control variable abstraction.
Sound formal verification of Linux's USB BP keyboard driver
This paper reports on verification of a simple Linux USB keyboard driver using VeriFast. Except, of course, Linux drivers are not simple: they have asynchronous callbacks, dynamically allocated memory, synchronization and interact with complex Linux driver APIs including USB, input and spinlocks. This work found bugs in the driver and the fixes have been accepted by the maintainer.
VeriFast: Imperative programs as proofs
VeriFast is a symbolic evaluator based on [Separation Logic][reynolds:lics:2002] (wikipedia) for verifying C and Java code. Specifications can use inductive datatypes, primitive recursive functions and abstract predicates. As the name suggests, a key feature is performance though the abstract also mentions predictability.
Annotation inference for separation logic based verifiers
One of the challenges in automatic program verification tools such as the [VCC verifier] and the [VeriFast verifier] is the amount of annotation required to write the specification and guide the verification tool to construct a proof.
Stacked borrows: An aliasing model for Rust
As in [the Rustbelt paper][jung:popl:2017], the topic of this paper is the soundness of Rust code that is marked “unsafe” because it needs to do something that cannot be expressed within the restrictions of Rust’s type system but that the programmer (hopefully) considers to be ok. In particular, this paper defines what the “unsafe” code is and (crucially) is not allowed to do based on what they want Rust types to mean.
The case for writing a kernel in Rust
The Rust language is increasingly being proposed as a safe replacement for C in systems programming. This paper (by the authors of TockOS) explores some of the key challenges in doing so. In particular, they are interested in how they can write low-level code without having to weaken Rust’s safety story by using the “unsafe” feature of Rust.
Verifying Rust programs with SMACK
The Rust programming language promises a balance between safety and control that makes it an interesting target for formal verification work. Unfortunately, some of the earliest verification tools supporting Rust are not maintained. [SMACK][rakamaric:cav:2014] is an LLVM-based verification toolchain that translates LLVM IR into verification conditions. In principle, it should be able to cope with any language that can be translated to LLVM IR but every language has its quirks and this paper describes additional features and analyses required to support the Rust language well. An interesting aspect of basing this on LLVM is that it allows verification of programs that combine Rust and C (or any other language that LLVM supports).
Crust: A bounded verifier for Rust
One of the key strengths and weaknesses of Rust is the ability to extend its type system with libraries that do not follow Rust’s type system (i.e., they are “unsafe”) but that (we hope) do not break the semantics that Rust’s type system is meant to enforce. This paper describes a bounded model checking approach to verifying these libraries for the kinds of error that violate Rust’s type system. The tool is demonstrated on three libraries and is able to detect bugs in old versions of the libraries.
A type system for expressive security policies
These days, many people define security properties based on some form of intransitive information flow specification. But in the ’90s, Fred Schneider and others were working on security automata that would monitor the execution of the program and terminate it if the program violated the security specification. For example, a specification might say that if a program has read a file containing a secret, it cannot then open a network connection.
RustBelt: Securing the foundations of the Rust programming language
Rust aims to achieve the holy grail of a language that gives low-level control over resource management and safe high-level abstractions. It does this using a type system that restricts programs to eliminate unsafe programming practices, coupled with a practice of extending that type system with libraries that are observably safe but that internally use “unsafe” operations. This paper provides a framework for proving that these libraries do not break the safety guarantees of the standard type system. Along the way, the paper gives a nice insight into the way that the community as a whole has been developing.
Alias types
One of the difficult things about reasoning about programs that modify pointer-based data structures is the possibility of aliases. If p and q alias each other then you need to recognize that writing to “*p” will modify “*q”. Around the late 90s, there was a lot of work on how best to reason about aliasing. Nowadays, a lot of academic interest seems to have settled on using separation logic and a lot of practical interest seems to have settled on Rust’s ownership types or C++11’s smart pointers, etc. But what makes this paper interesting to me is that it tackled one of the outstanding problems in the [TALx86 paper][morrisett:wcsss:1999]. In particular, because they were interested in reasoning about assembly code, they could not rely on simplifications made in high-level languages such as
TALx86: A realistic typed assembly language
Typed assembly language is normal assembly language (x86 in this case) where any branch targets are labelled with the type of all the registers. The main purpose of the types is to make the assembly code type/memory safe and so the focus is on distinguishing pointers from non-pointers and distinguishing the size and type of what each pointer points to. This paper is one of the later papers in a series on typed assembly language and also related to the [Cyclone language]: a type/memory-safe C substitute.
End-to-end verification of information flow security for C and assembly programs
Information-flow control tackles two of the classic Confidentiality-Integrity-Availability (CIA) aspects of security. This paper presents a framework that applies IFC to the mCertiKOS microkernel in order to prove confidentiality properties. The proof consists of Coq proofs about individual system calls and a general framework relating lower-level “unwinding conditions” to the overall security properties. The proof covers both C and x86 assembly code in mCertiKOS and includes reasoning about page tables.
A precise yet efficient memory model for C
A memory model (in this context) is basically “what is a pointer”? Is a pointer an identifier for an object or an integer index into an array of bytes? How do you handle fields of objects? How do you handle pointer arithmetic? etc. After some experimentation (described in a lovely historical note in section 7), the authors came up with a model that can handle many of the awkward features of C: pointers into the middle of objects, pointers past the end of objects, pointer casts, unions, bitfields, etc. This formed part of the [VCC tool][cohen:cav:2010] that was used to reason about real, complex OS-level code (that needed many of these features).
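Some of those awkward-but-legal idioms, for concreteness (my own illustration, not the paper's examples); a memory model has to give a meaning to every one of them:

```c
#include <stdio.h>

struct pair { int a; int b; };
union  view { unsigned int u; unsigned char bytes[sizeof(unsigned int)]; };

int main(void) {
    struct pair p = {1, 2};
    int *mid = &p.b;                  /* pointer into the middle of an object */
    *mid = 42;

    int arr[4] = {0};
    int *end = arr + 4;               /* one-past-the-end pointer: may be formed
                                         and compared, but not dereferenced */
    for (int *q = arr; q != end; q++) /* the usual idiom that relies on it */
        *q = 7;

    union view v;
    v.u = 0x01020304u;                /* type punning through a union */
    printf("%d %d %u %d\n", p.a, p.b, (unsigned)v.bytes[0], arr[3]);
    return 0;
}
```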
Frama-C: A software analysis perspective
Frama-C is a platform for verifying C programs. This foundational article describes the architecture of this platform, the analyses built into Frama-C and the additional plugins that can be used with it and provides references to those other plugins. The paper cites many related papers about individual plugins and case studies.
CertiKOS: An extensible architecture for building certified concurrent OS Kernels
This is one of a steady stream of papers from the FLINT group at Yale on creating a formally verified OS. Some distinguishing features of CertiKOS are the completeness of the verification (cf. seL4), the way that the C code and the proofs are split into more than 30 very small layers each adding one feature at a time and greatly simplifying the proof of each layer and their use of an extended version of the CompCert C compiler. The layered structure is strongly reminiscent of Bryan Ford’s work on Recursive Virtual Machines – unsurprising since Bryan was an early contributor to this project. The use of CompCert avoids the need to perform translation validation (cf. seL4 work). This paper adds concurrency with fine-grained locking and progress guarantees to CertiKOS.
Formal verification of a memory allocation module of Contiki with Frama-C: a case study
This paper describes the use of [Frama-C][cuoq:sefm:2012] to specify and verify the Contiki memory allocator. The description is very clear and contains enough of the specification that it should be easy to reproduce or adapt to other allocators. Use of the specification is demonstrated on a simple program that allocates two objects and then frees them. The annotation burden is about 154 lines to 139 lines of code (around 110%) – which seems to be about normal for Frama-C.
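For flavour, here is a generic ACSL-style contract for a toy fixed-pool allocator (my own sketch, not the paper's Contiki specification; the names pool_alloc, used and blocks are hypothetical):

```c
#include <stddef.h>

#define POOL_SIZE 8

/* Toy fixed-pool allocator; the annotations below are a sketch of the
 * ACSL style, not the actual Contiki memory allocator specification. */
static int used[POOL_SIZE];
static int blocks[POOL_SIZE];

/*@ requires \valid(used + (0 .. POOL_SIZE - 1));
    assigns  used[0 .. POOL_SIZE - 1];
    ensures  \result == \null ||
             (\exists integer i; 0 <= i < POOL_SIZE &&
                \result == &blocks[i] && used[i] == 1);
*/
int *pool_alloc(void) {
    /*@ loop invariant 0 <= i <= POOL_SIZE;
        loop assigns i, used[0 .. POOL_SIZE - 1];
        loop variant POOL_SIZE - i;
    */
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!used[i]) {
            used[i] = 1;
            return &blocks[i];
        }
    }
    return NULL;
}
```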
überSpark: Enforcing verifiable object abstractions for automated compositional security analysis of a hypervisor
Verifies properties of a concurrent micro-hypervisor, uXMHF, using the [Frama-C verifier] with some custom plugins and using CompCert to compile. Assembly language and a hardware model are accommodated by invoking them via function calls and providing a C equivalent function that accesses shadow C variables that model changes in hardware state. Interestingly, the hypervisor also allows unverified plugins - but all unverified plugins are run in a sandbox (and incur higher overhead).
Combining mechanized proofs and model-based testing in the formal analysis of a hypervisor
Describes formal verification in Coq of a model of the commercial NOVA micro-hypervisor. NOVA is a hypervisor in the L4 family that supports DMA access (protected by IOMMU). Very extensive testing is used to compare the model against the actual implementation using over 12 million conformance tests.
-Overify: Optimizing programs for fast verification
This paper explores the benefit of using different optimization options (or even custom transformations) when compiling a program for verification. Their optimization options are over 5000x better than -O0 and 14x better than -O3 for verification using KLEE on their motivating example. Compared with -O3, their main goal is to reduce the amount of information lost during compilation. They mention but do little to address the problem of undefined behaviour in C causing different compilations to behave differently, although they did look for this in their evaluation and found that they detected the same bugs with -O0, -O3 and -Overify.
A lightweight symbolic virtual machine for solver-aided host languages
Rosette is a Racket DSL for implementing Solver-aided Domain Specific Languages (SDSLs): tools based on solvers that support angelic execution, debugging, verifying properties and synthesis. A key feature of Rosette is “symbolic reflection” – that allows Racket code to be symbolically evaluated (in addition to being executable in the normal way).
An abstract interpretation framework for refactoring with application to extract methods with contracts
There is a lot to recommend Design by Contract where each function/method is equipped with contracts (pre/post-conditions). But the cost is that you have to create and maintain all those contracts. This paper addresses an interesting part of the problem that arises while refactoring code: creating a contract for any functions/methods extracted. The insight here is that this is really a matter of refactoring the proof that the original code meets its contracts. The goal they set is that the new contract must meet four criteria: validity, safety, completeness and generality.
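As a toy illustration of what is being asked for (my own hypothetical example in ACSL-flavoured C, not from the paper): extracting the magnitude computation below forces us to invent a contract for the new helper that is valid and safe, strong enough to re-prove the caller's existing contract (completeness), and not so tied to this one call site that it cannot be reused (generality).

```c
#include <limits.h>

/* Hypothetical helper extracted during refactoring: its contract has to be
 * invented so that the caller's original proof still goes through. */
/*@ requires x > INT_MIN;
    assigns  \nothing;
    ensures  \result >= 0;
    ensures  \result == x || \result == -x;
*/
static int magnitude(int x) {
    return x < 0 ? -x : x;
}

/* The original contract is unchanged by the refactoring; re-proving it may
 * only use magnitude()'s contract, so that contract must be complete enough
 * (here "\result >= 0" already suffices) while staying general. */
/*@ requires a > INT_MIN && b > INT_MIN;
    assigns  \nothing;
    ensures  \result >= 0;
*/
int max_magnitude(int a, int b) {
    int ma = magnitude(a);
    int mb = magnitude(b);
    return ma > mb ? ma : mb;
}
```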
Komodo: Using verification to disentangle secure-enclave hardware from software
Komodo implements functionality similar to SGX but is implemented in formally verified software on ARMv7 processors instead of being implemented in hardware and microcode. The introduction makes a very convincing argument that this is the right way to deploy security extensions because it can be updated much faster, and the verification is transparent (“it replaces folklore with formal verification”). They verify both functional properties and security properties ([non-interference]). (Komodo lacks the memory encryption features of SGX – that would benefit from having hardware support. Komodo is also limited to a single processor at the moment.)
Attacking, repairing, and verifying SecVisor: A retrospective on the security of a hypervisor
Uses the Murphi model checker to formalize the combined software and hardware security features in the [SecVisor][seshadri:sosp:2007] hypervisor developed by some of the authors the year before. Interestingly, the adversary is modelled as an explicit piece of code that tries to write to everywhere that it should not. They were able to discover two bugs involving mapping physical pages that contain unapproved code and creating aliases in the virtual address space.
SecVisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity OSes
SecVisor is a hypervisor that virtualizes physical RAM, the MMU and the IO-MMU in order to protect the kernel executable code from modification. It uses the AMD architectural support for hypervisors to provide most of the protection and switches memory protection on every entry/exit from kernel. And it requires small changes to parts of Linux that normally load/modify the kernel executable: boot and module load/unload.
Verifying security invariants in ExpressOS
ExpressOS is a replacement single-threaded Android kernel written in C# running on top of L4/Fiasco.
Boogie: A modular reusable verifier for object-oriented programs
Boogie factors verification of [Spec#][barnett:cassis:2004] programs into
- Generating the Intermediate Verification Language BoogiePL by encoding language semantics, using abstract interpretation and introducing ghost state.
- Verifying BoogiePL by generating Verification Conditions and proving them using “Simplify” (with plans to switch to “Zap”).
This combination simplifies and separates the two tasks and allows use of abstract interpretation (good at calculating fixpoints) and theorem proving (good at handling quantification).
The existence of refinement mappings
Theoretical paper justifying the practice of adding auxiliary variables (ghost state) to an implementation to enable refinement proofs, by providing a completeness proof that a refinement mapping from an implementation S1 to a specification S2 exists if S1 refines S2 and three conditions hold.
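For reference, the definition of a refinement mapping itself (paraphrased from memory, so treat the details with care; the three conditions in the paper are side conditions on S1 and S2 for the existence theorem, not the definition):

```latex
% A refinement mapping f from states of S_1 to states of S_2 must satisfy
% (paraphrased; see the paper for the precise statement):
\begin{align*}
  \text{(R1)}\quad & \mathit{ext}(f(s)) = \mathit{ext}(s) \quad \text{for every state } s \text{ of } S_1 \\
  \text{(R2)}\quad & s \in \mathit{Init}_1 \implies f(s) \in \mathit{Init}_2 \\
  \text{(R3)}\quad & (s, s') \in \mathit{Next}_1 \implies (f(s), f(s')) \in \mathit{Next}_2 \ \lor\ f(s) = f(s') \\
  \text{(R4)}\quad & f \text{ maps behaviours satisfying } S_1\text{'s liveness property to behaviours satisfying } S_2\text{'s}
\end{align*}
```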
SpaceSearch: A library for building and verifying solver-aided tools
Coq embedding of [Rosette’s solver-aided language][torlak:pldi:2014], inspired by Smten.
Boost the impact of continuous formal verification in industry
Proposes a way to use model checking on large codebases when CI is used and lots of regression tests are available (at least one per C function). Like fb-infer, focusses on verifying changes. Approach is to use equivalence checking to find which functions may have changed, select regression tests relevant to those functions and then to generalise inputs of regression tests to increase their coverage.
Multi-prover verification of C programs
Describes “Caduceus”: a tool to translate C code to the WhyML specification language, lowering imperative code to pure code and handling pointers. Introduced the specification notation that later became ACSL and was a forerunner to [Frama-C][cuoq:sefm:2012].
Scaling symbolic evaluation for automated verification of systems code with Serval
This paper continues the theme of “push-button automation” from the UNSAT group’s earlier work on [Hyperkernel][nelson:sosp:2017], [Yggdrasil][sigurbjarnarson:osdi:2016] and [Nickel][sigurbjarnarson:osdi:2018]. Like the earlier work, they use the [Z3 SMT solver][demoura:tacas:2008] to verify systems code automatically, and they avoid “auto-active verification” (adding annotations to guide the verifier) by restricting the code they verify to “finite interfaces” so that all loops are bounded. The big difference is that the earlier works wrote specifications in Python using the Z3Py library whereas this paper uses [Rosette][torlak:pldi:2014]’s symbolic execution support.
Scalable translation validation of unverified legacy OS code
Describes a toolchain for translation validation of the radare2 reverse engineering tool using the [ASL] specification of the [Arm architecture], based on the PVS7 theorem prover.
Noninterference, transitivity, and channel-control security policies
This very highly cited report provides (what I think is) the accepted extension of [Goguen and Meseguer’s notion of non-interference][goguen:secpriv:1982] to handle intransitive security policies.
Secure information flow by self composition
This paper is a counter to the popular type-based approaches to information flow control. They point out that type-based approaches are sound but not complete and fail on things like “output := secret; output := 0;”. Their solution is a more semantic approach that is based more closely on the basic definitions of non-interference and is sound and complete (but, unlike type-based approaches, is often undecidable and not obviously compositional).
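A minimal sketch of the self-composition idea (my own C illustration, with an assert standing in for the verification goal): compose the program with a copy of itself, fix the public input, vary the secrets, and require equal public outputs. Note that the “output := secret; output := 0” example above passes this semantic check even though type-based analyses reject it.

```c
#include <assert.h>

/* The program under analysis: a type-based IFC analysis rejects it because
 * `secret` flows into `output`, but semantically nothing leaks. */
static int program(int pub, int secret) {
    int output = secret;   /* flagged by taint/type-based analyses */
    output = 0;            /* ...but the flow is dead              */
    return output + pub;
}

/* Self-composition: one run per copy of the secret, same public input.
 * Non-interference <=> for all pub, s1, s2: program(pub, s1) == program(pub, s2). */
static void check(int pub, int secret1, int secret2) {
    int out1 = program(pub, secret1);
    int out2 = program(pub, secret2);
    assert(out1 == out2);   /* a verifier would prove this for all inputs */
}

int main(void) {
    check(3, 17, -42);      /* one concrete instance of the 2-safety check */
    return 0;
}
```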
SMACK: Decoupling source language details from verifier implementations
SMACK translates LLVM bitcode files to the [Boogie language][Boogie verifier] (an [intermediate verification language]) allowing it to support any language that LLVM supports (mostly C/C++ in this paper).
Local verification of global invariants in concurrent programs
Describes how VCC was used to verify Hyper-V Hypervisor and PikeOS RTOS. Key idea is “two-state invariants” which are basically specs of parts of the transition relation. Required to be reflexive to allow a stuttering-style verification. Overall name for the technique is “Locally Checked Invariants” (LCI). Unlike alternatives like Concurrent Separation Logic, this is built out of some simple primitives with a small amount of syntactic sugar so that it can be very explicit about object lifetime, object ownership, which fields are protected by each lock, etc. Annotations can introduce ghost state and ghost code to modify that state — this allows them to keep things first-order. Significant annotation burden: 1 line of annotation per line of code. This seems to be partly the cost of flexibility / building everything out of primitives and partly the complexity of the relationships being described. I suspect that more sugar and some annotation inference would make a big difference. They allude to some performance issues that are fixed by changing how the invariants, etc are written. With 1/3 of the 100kloc annotated, it takes around 16 CPU hours to verify the properties — but this is very parallel. Limitation: seems to assume SC or DRF.
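For flavour, a minimal sketch of a two-state invariant (my own C rendering of the idea, not VCC syntax): the invariant relates an object's state before and after each atomic update, and reflexivity means threads that leave the object alone preserve it trivially.

```c
#include <assert.h>
#include <stdbool.h>

/* A monotonic sequence counter, assumed protected by some lock (not shown). */
struct seq { unsigned value; };

/* Two-state invariant: relates the pre-state (old) and post-state (now) of
 * any update. It is reflexive (inv2(s, s) holds), so stuttering steps by
 * threads that do not touch the object preserve it for free. */
static bool inv2(const struct seq *old, const struct seq *now) {
    return now->value >= old->value;      /* the counter never goes backwards */
}

static void bump(struct seq *s) {
    struct seq old = *s;                  /* ghost copy of the pre-state */
    s->value += 1;
    assert(inv2(&old, s));                /* each update re-establishes inv2 */
}

int main(void) {
    struct seq s = {0};
    bump(&s);
    assert(inv2(&s, &s));                 /* reflexivity: stuttering allowed */
    return 0;
}
```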
A hardware design language for timing-sensitive information flow security
Extends Verilog with security labels and declassification. Labels can be static or dynamic and are tracked using a modified type system with some tricks to improve precision (related to implicit flows). The analysis is (of course) timing sensitive so timing channels are automatically handled. Constraints generated by the type system are solved using [the Z3 SMT solver][demoura:tacas:2008]. Evaluated on a MIPS CPU with two additional instructions to set timing labels; PPA was also evaluated (little impact). Benchmarks include MiBench and OpenSSL. Not clear if any form of label polymorphism is supported.
Complete information flow tracking from the gates up
Builds on the idea, also described in Theoretical analysis of gate-level information flow tracking, of adding “shadow circuits” that calculate whether each wire/flop in a processor depends on some initial set of untrusted values.
The Flask security architecture: System support for diverse security policies
This paper by some of my former colleagues in the Flux group at the University of Utah describes the Flask microkernel-based operating system. Like its successor SE-Linux, Flask adds fine-grained protection checks into the operating system and separates mechanism from policy by making decisions in a separate policy module that is free to implement whatever policy you want. That is, you are not stuck with whatever Linux or Windows or … provides by default.
On the foundations of quantitative information flow
Argues that the standard measurement of the size of information leaks based on Shannon entropy is not useful because the number cannot be used to bound the resulting threat. Proposes an alternative, based on the threat that the secret can be guessed correctly in a single try, which is equivalent to min-entropy. Uses examples to show that this results in different measures for leaks that intuitively seem different (and that Shannon entropy views as the same). Tackles the general form with a non-uniform input distribution and derives simpler formulae for special cases. As it is arguing that prior art has the wrong definition, it has a good survey of the state of the art when written. Highly cited (500+).
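The key definitions, as I remember them (hedged; the paper has the precise statements and notation): vulnerability is the probability of guessing the secret X in one try, min-entropy is its negative log, and leakage is the drop in min-entropy from observing the output Y.

```latex
\begin{align*}
  V(X)                 &= \max_{x} \Pr[X = x] \\
  H_{\infty}(X)        &= -\log_2 V(X) \\
  V(X \mid Y)          &= \sum_{y} \Pr[Y = y]\, \max_{x} \Pr[X = x \mid Y = y] \\
  H_{\infty}(X \mid Y) &= -\log_2 V(X \mid Y) \\
  \mathcal{L}          &= H_{\infty}(X) - H_{\infty}(X \mid Y)
\end{align*}
```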
Theoretical analysis of gate level information flow tracking
GLIFT (Gate Level Information Flow Tracking) calculates a precise “taint” bit for every signal indicating whether the signal depends on some input signals. GLIFT is added by calculating taint for each primitive gate type (AND, OR, XOR, NOT) with an assumption of linearity. This paper looks at how the number of minterms grows with circuit complexity for 1-8 bit adders, multipliers, comparators, shifters and multiplexers. All costs are exponential and quite high.
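The standard example is the shadow logic for a 2-input AND gate: a tainted input taints the output only if the other input does not already force the output to 0. A small C rendering of that truth table (my sketch, using bools for wire values and taint bits):

```c
#include <assert.h>
#include <stdbool.h>

/* GLIFT-style shadow logic for out = a & b.
 * The output is tainted only if a tainted input can actually affect it:
 * an untainted 0 on the other input forces out = 0 regardless of the
 * tainted value, so no information flows. */
static bool and_taint(bool a, bool a_t, bool b, bool b_t) {
    return (a_t && b_t) || (a_t && b) || (b_t && a);
}

int main(void) {
    /* a is tainted but b is an untainted 0: output is 0 whatever a is */
    assert(and_taint(true, true, false, false) == false);
    /* a is tainted and b is an untainted 1: a's value shows at the output */
    assert(and_taint(false, true, true, false) == true);
    return 0;
}
```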
Secure autonomous cyber-physical systems through verifiable information flow control
This paper ties together several different threads to create a CPS system (autonomous vehicle) that is resistant to many forms of attack. Builds on Jif (Java with security labels), SecVerilog (HDL with security labels), Hyperflow (processor with security labels) and statistical detection of attack based on sensor fusion (sensors plus map data). The Hyperflow processor is not used in the experiment at this stage because they need to port Java/Jif to that architecture. Statistics try to distinguish noise (Gaussian) from attack (uniform) — not clear to me that this is sufficiently robust.
Verifying the Microsoft Hyper-V hypervisor with VCC
This short (3.5 page) paper gives a brief overview of Hyper-V verification. Hyper-V is a high performance commercial hypervisor not written with verification in mind. It is 100kloc of C plus 5kloc x86-64 assembly, uses lock free data structures, simulates 2-stage page translation. Verification makes heavy use of “two-state invariants” and “claims”. Useful as an overview but need to read other papers to get any detail.
SecChisel: Language and tool for practical and scalable security verification of security-aware hardware architectures
Extends Chisel with security labels to track information flow. Uses [the Z3 SMT solver][demoura:tacas:2008] to check the labels, but the check is based on syntactic structure, not on semantic analysis. That is, it just propagates labels. Suggests this is important for performance.
Verifying constant-time implementations
Tool for verifying constant time behaviour of C code based on self composition preprocessing then using [SMACK][rakamaric:cav:2014] and [Boogie][barnett:fmco:2005]. Parameterized by whether leakage is just PC divergence or also memory access divergence. Correctness of a reduced/simplified version of analysis is proved in Coq and full algorithm “should not present any additional difficulty”. Demonstrated on a very broad range of crypto code, fixed-time fixed-point code. Key idea is to reduce security property (2-safety) to 1-safety property using a product construction exploiting the requirement that, if the code is constant time then there should be no PC divergence between the two runs. A key strength is the ability to be output insensitive: it is ok to leak information that will be leaked anyway. E.g., if the size is part of the intended output, then it is ok for runtime to depend on size even though size may depend on the secret input. This distinguishes it from approaches that use some form of tainting of inputs.
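To make the two leakage models concrete (my own illustration, not one of the paper's benchmarks): the first comparison below leaks through secret-dependent control flow (PC divergence), the second through secret-dependent memory addresses, and the third is constant-time under both models.

```c
#include <stddef.h>
#include <stdint.h>

/* Leaks via control flow: the early exit makes the runtime (and the PC
 * trace) depend on where the first mismatching secret byte is. */
int cmp_branchy(const uint8_t *secret, const uint8_t *guess, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (secret[i] != guess[i])
            return 0;
    return 1;
}

/* Leaks via memory addresses: no secret-dependent branch, but the table
 * index (and hence the cache line touched) depends on the secret. */
uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret_byte) {
    return table[secret_byte];
}

/* Constant-time under both leakage models: fixed trip count, no
 * secret-dependent branches or addresses. */
int cmp_ct(const uint8_t *secret, const uint8_t *guess, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(secret[i] ^ guess[i]);
    return diff == 0;
}
```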