Summarizing 12 months of reading papers (2020)

This is an overview of the papers I have read in the 12 months since I joined Google Research. In that time, I have read 122 papers and added 364 more to my paper backlog. You can read my summaries of all the papers, read notes on common themes in them, and download BibTeX for everything I have read. As the number of papers grew, I reorganized the site several times and it became a bit like a Zettelkasten.

How I organize my notes about papers

I started the year by trying to define a few topics that the papers I read would fit into and trying to maintain overviews of each topic that linked to each paper. This is how I used to organize papers in filing boxes, filing cabinets, stacks on my desk, etc., but it doesn’t work very well. The most obvious reason that it doesn’t work is that some papers span multiple research topics, so whichever topic I assign them to will be wrong. More seriously though, I am often reading because I don’t understand the field, so how can I possibly know how to organize papers when I first start reading? I also found that I was not updating the overviews because, each time I read a paper, I would need to restructure the overview to accommodate what I had just learned.

After a few months, the original system was starting to fail, and I learned about the Zettelkasten method of making notes about papers and concepts. I have not fully adopted this method, but I have adopted some of its ideas:

  • If a paper introduces an important-looking concept, I create a new page in my notes and link the paper to the note. This avoids duplicating the information and, because all the referring pages link to it, makes it easy to find every paper related to the concept (a small sketch of this idea appears after this list).

  • I vigorously add links between papers and notes, going back to previously read papers and adding links whenever I add a new concept.

  • I rely on the links to define the structure of the site. That is, I allow topics to emerge and evolve organically as I read papers instead of trying to impose structure on a topic before I even understand it.
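
To make that concrete: the following is not how the site is actually built, but a minimal sketch, in the spirit of the Python scripts mentioned under Software below, of how the reverse direction (from a concept note back to every note that mentions it) could be recovered from a directory of markdown notes. The directory layout and link format are illustrative assumptions.

```python
# Illustrative sketch only -- not the actual scripts behind the site.
# Scan a directory of markdown notes for links like [text](target.md) and
# build a reverse index, so a concept page can list every note that links to it.

import re
from collections import defaultdict
from pathlib import Path

LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")

def backlinks(notes_dir):
    index = defaultdict(set)
    for page in Path(notes_dir).glob("**/*.md"):
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            index[Path(target).name].add(page.name)
    return index

# Hypothetical usage: list every note that links to a (made-up) concept page.
# backlinks("notes/")["separation_logic.md"]
```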

The main way that my approach differs from Zettelkasten is that a true Zettelkasten would create notes for every concept, whereas I tend to create notes only if I think multiple papers will link to the concept. I should probably follow the Zettelkasten approach more closely and move a lot of the information I currently put in paper summaries into notes about concepts.

What I read

Over the last year, I have mostly been reading about software verification and security, including Information flow control, Operating Systems, Verification tools, Separation logic (and, more generally, permission logic), The Rust language, Fuzz testing, the 2005 Bugs workshop, Property-based testing, Test generation, and Google (in, more or less, chronological order).

Each paper links to my summary of the paper. A few caveats about my summaries:

  • Paper summaries are written fairly quickly, in a single pass, with little or no revision.

  • Notes about concepts are always in a state of flux and often contain ‘todo’ comments about how they should be restructured when I find the time.

  • Especially when I am reading about a new topic, the summaries will contain errors because I don’t understand that topic well enough to write a good summary.

  • Since the summaries are mostly intended to help my own understanding, they may skip explaining concepts that I already understand well. Reading other papers on the same topic may help, or it may not.

  • The summaries are not a substitute for reading the paper yourself and forming your own understanding and opinion. I encourage you to write your own paper summaries and, perhaps, use my list only as one source of ideas for what to read.

Information flow control

Tracking and controlling how information moves through a program, usually for security reasons.

  • Secure information flow by self composition [barthe:csfw:2004]
  • SecChisel: Language and tool for practical and scalable security verification of security-aware hardware architectures [deng:hasp:2019]
  • Noninterference, transitivity, and channel-control security policies [rushby:sri:1992]
  • A hardware design language for timing-sensitive information flow security [zhang:asplos:2015]
  • Complete information flow tracking from the gates up [tiwari:asplos:2009]
  • Theoretical analysis of gate level information flow tracking [oberg:dac:2010]
  • On the foundations of quantitative information flow [smith:fossacs:2009]
  • Secure autonomous cyber-physical systems through verifiable information flow control [liu:cpsspc:2018]
  • Verifying constant-time implementations [almeida:security:2016]

Operating Systems

Mostly concerned with verification of OSes and information flow control.

  • seL4: from general purpose to a proof of information flow enforcement [murray:secpriv:2013]
  • Scaling symbolic evaluation for automated verification of systems code with Serval [nelson:sosp:2019]
  • Combining mechanized proofs and model-based testing in the formal analysis of a hypervisor [becker:fm:2016]
  • überSpark: Enforcing verifiable object abstractions for automated compositional security analysis of a hypervisor [vasudevan:usenix:2016]
  • Komodo: Using verification to disentangle secure-enclave hardware from software [ferraiuolo:sosp:2017]
  • CertiKOS: An extensible architecture for building certified concurrent OS Kernels [gu:osdi:2016]
  • End-to-end verification of information flow security for C and assembly programs [costanzo:pldi:2016]
  • Attacking, repairing, and verifying SecVisor: A retrospective on the security of a hypervisor [franklin:cmu:2008]
  • SecVisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity OSes [seshadri:sosp:2007]
  • Verifying security invariants in ExpressOS [mai:asplos:2013]
  • Scalable translation validation of unverified legacy OS code [tahat:fmcad:2019]
  • Verifying the Microsoft Hyper-V hypervisor with VCC [leinenbach:fm:2009]
  • Sound formal verification of Linux’s USB BP keyboard driver [penninckx:nfm:2012]
  • Local verification of global invariants in concurrent programs [cohen:cav:2010]
  • The Flask security architecture: System support for diverse security policies [spencer:security:1999]

Verification tools

With a focus on verification tools.

Separation logic (and, more generally, permission logic)

For reasoning about heap data structures and aliasing.

The Rust language

Rust is designed to be safer and easier to reason about, and its type system has some interesting connections with separation logic.

Fuzz testing

Testing a program by throwing random input at it.

Papers from the 2005 Bugs workshop

Property-based testing

Test generation

  • Feedback-directed unit test generation for C/C++ using concolic execution [garg:icse:2013]
  • SUSHI: A test generator for programs with complex structured inputs [braione:icse:2018]
  • Study of integrating random and symbolic testing for object-oriented software [dimjasevic:ifm:2018]
  • TestCov: Robust test-suite execution and coverage measurement [beyer:ase:2019]
  • An introduction to test specification in FQL [holzer:hvc:2010]
  • FShell: Systematic test case generation for dynamic analysis and measurement [holzer:cav:2008]

Google

Miscellaneous

Reflections

Looking back over my research career, I wish that I had started writing paper summaries for myself decades earlier. Some of the benefits I have seen are:

  • Writing the summaries turns the passive act of reading papers into an active one, and it forces me to try harder to understand what I am writing about.

  • When I introduce a new concept, I try to go back to all the papers that relate to that concept and add links back to it. This strengthens my understanding of the papers and helps me see the different ways that papers and concepts connect to each other.

  • When I am struggling to remember a paper, I can search the repository. This is especially useful when the title of the paper doesn’t describe the content very well or when the part I remember is not the primary point of the paper.

  • A few weeks ago, I wrote a workshop paper with other members of my team. Thanks to the summaries and the way I organize them, I had all of the related work available to me: I had a clear memory of it, I could find the papers, and I could quickly remind myself what each paper was about to confirm that I was citing the correct one.

    Another crucial ingredient was that my project planning documents explained the reason for the project, the design decisions, alternatives, etc. (These documents are not public.)

    I found the experience of writing the paper to be very productive and satisfying. It seemed to match the claims made about Zettelkasten as a way to organize your research into separate stages of

    • reading related work,
    • forming ideas (through the notes on concepts, the project planning documents, etc.),
    • collecting ideas into a paper,
    • and then, finally, reorganizing and rewriting to make a coherent paper.

    Since I had already done most of the work before we even thought about writing the paper, I was mostly able to focus on how to present the ideas without being distracted by the earlier phases.

  • When I summarize a paper, I also capture enough metadata that I can construct BibTeX for all the papers. So, if I later cite a paper, I have already done the tedious work of finding the citation, the DOI, etc. Again, this lets me focus on the act of writing without being distracted by the mechanics of BibTeX files.

    To make citing the papers as easy as possible, I try to make sure that this BibTeX is usable as-is by fixing the capitalization of paper titles to overcome the BibTeX design bug (bibliography styles may lowercase title words unless they are protected); a small sketch follows this list.

    An unexpected side effect of using a machine-generated BibTeX file is that it is also easy to identify papers that my co-authors cite (so that I can read them too).
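
As a rough illustration of the BibTeX points above (this is not the actual script behind the site, and the record below is a made-up placeholder), a few lines of Python can turn captured metadata into a BibTeX entry with the title capitalization protected by braces:

```python
# Illustrative sketch only -- the real scripts and metadata format differ.
# Build a BibTeX entry from a captured paper record, wrapping capitalized
# words in the title with braces so BibTeX styles do not lowercase them.

def protect_caps(title):
    words = title.split()
    for i, word in enumerate(words):
        # Leave the first word alone; brace later words containing capitals.
        if i > 0 and any(c.isupper() for c in word):
            words[i] = "{" + word + "}"
    return " ".join(words)

def to_bibtex(key, record):
    fields = dict(record, title=protect_caps(record["title"]))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@inproceedings{{{key},\n{body}\n}}"

# Hypothetical record: the author field is a placeholder, not real metadata.
print(to_bibtex("holzer:cav:2008", {
    "title": "FShell: Systematic test case generation for dynamic analysis and measurement",
    "author": "Placeholder, A. and Placeholder, B.",
    "booktitle": "Computer Aided Verification (CAV)",
    "year": "2008",
}))
```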

After trying this for a year, I am planning to continue the practice and I would encourage others to do the same. Whether you choose to make your reading list public is up to you. On the plus side, you can share links with people when you mention a paper; on the minus side, I am slightly more guarded and diplomatic about papers that I have problems with.

Other paper summaries

The focus of RelatedWork is on understanding a field, so it is about a set of papers, typically read over an extended time period, their shared concepts, and how they interconnect. Another paper summary site is Adrian Colyer’s excellent “the morning paper.” Adrian reads a broad sweep of brand-new papers to keep abreast of a much larger subset of computer science.

Software

The tools I use are just Jekyll, some Python scripts, and an editor. But people have pointed out some other tools that might work better (most of which I have not tried).


Written on October 4, 2020.
The opinions expressed are my own views and not my employer's.