Summarizing 12 months of reading papers (2020)

This is an overview of the papers I have read in the 12 months since I joined Google Research. In that time, I have read 122 papers and added 364 more to my paper backlog. You can read my summaries of all the papers, read notes on common themes in them, and download BibTeX for everything I have read. As the number of papers grew, I reorganized the site several times and it became a bit like a Zettelkasten.

How I organize my notes about papers

I started the year by trying to define a few topics that the papers I read would fit into and trying to maintain overviews of each topic that linked to each paper. This is how I used to organize papers in filing boxes, filing cabinets, stacks on my desk, etc., but it doesn’t work very well. The most obvious reason that it doesn’t work is that some papers span multiple research topics, so whichever topic I assign them to will be wrong. More seriously though, I am often reading because I don’t understand the field, so how can I possibly know how to organize papers when I first start reading? I also found that I was not updating the overviews because, each time I read a paper, I would need to restructure the overview to accommodate what I had just learned.

After a few months, the original system was starting to fail, and I learned about the Zettelkasten method of making notes about papers and concepts. I have not fully adopted this method, but I have adopted some of its ideas:

  • If a paper introduces an important-looking concept, I create a new page in my notes and link the paper to the note. This avoids duplicating the information and, because all the referring pages link to it, makes it easy to find every paper related to the concept (a small sketch of this idea appears after this list).

  • I vigorously add links between papers and notes, going back to previously read papers and adding links whenever I add a new concept.

  • I rely on the links to define the structure of the site. That is, I allow topics to emerge and evolve organically as I read papers instead of trying to impose structure on a topic before I even understand it.
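
To make that concrete: the following is not how the site is actually built, but a minimal sketch, in the spirit of the Python scripts mentioned under Software below, of how the reverse direction (from a concept note back to every note that mentions it) could be recovered from a directory of markdown notes. The directory layout and link format are illustrative assumptions.

```python
# Illustrative sketch only -- not the actual scripts behind the site.
# Scan a directory of markdown notes for links like [text](target.md) and
# build a reverse index, so a concept page can list every note that links to it.

import re
from collections import defaultdict
from pathlib import Path

LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")

def backlinks(notes_dir):
    index = defaultdict(set)
    for page in Path(notes_dir).glob("**/*.md"):
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            index[Path(target).name].add(page.name)
    return index

# Hypothetical usage: list every note that links to a (made-up) concept page.
# backlinks("notes/")["separation_logic.md"]
```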

The main way that my approach differs from Zettelkasten is that a true Zettelkasten would create notes for every concept, whereas I tend to create notes only if I think multiple papers will link to the concept. I should probably follow the Zettelkasten approach more closely and move a lot of the information I currently put in paper summaries into notes about concepts.

What I read

Over the last year, I have mostly been reading about software verification and security, including Information flow control, Operating Systems, Verification tools, Separation logic (and, more generally, permission logic), The Rust language, Fuzz testing, the 2005 Bugs workshop, Property-based testing, Test generation, and Google (in, more or less, chronological order).

Each paper links to my summary of the paper. A few caveats about my summaries:

  • Paper summaries are written fairly quickly, in a single pass, with little or no revision.

  • Notes about concepts are always in a state of flux and often contain ‘todo’ comments about how they should be restructured when I find the time.

  • Especially when I am reading about a new topic, the summaries will contain errors because I don’t understand that topic well enough to write a good summary.

  • Since the summaries are mostly intended to help my own understanding, they may skip explaining concepts that I already understand well. Reading other papers on the same topic may help, or it may not.

  • The summaries are not a substitute for reading the paper yourself and forming your own understanding and opinion. I encourage you to write your own paper summaries and, perhaps, use my list only as one source of ideas for what to read.

Information flow control

Tracking and controlling how information moves through a program, usually for security reasons.

  • Secure information flow by self composition [barthe:csfw:2004]
  • SecChisel: Language and tool for practical and scalable security verification of security-aware hardware architectures [deng:hasp:2019]
  • Noninterference, transitivity, and channel-control security policies [rushby:sri:1992]
  • A hardware design language for timing-sensitive information flow security [zhang:asplos:2015]
  • Complete information flow tracking from the gates up [tiwari:asplos:2009]
  • Theoretical analysis of gate level information flow tracking [oberg:dac:2010]
  • On the foundations of quantitative information flow [smith:fossacs:2009]
  • Secure autonomous cyber-physical systems through verifiable information flow control [liu:cpsspc:2018]
  • Verifying constant-time implementations [almeida:security:2016]

Operating Systems

Mostly concerned with verification of OSes and information flow control.

  • seL4: from general purpose to a proof of information flow enforcement [murray:secpriv:2013]
  • Scaling symbolic evaluation for automated verification of systems code with Serval [nelson:sosp:2019]
  • Combining mechanized proofs and model-based testing in the formal analysis of a hypervisor [becker:fm:2016]
  • überSpark: Enforcing verifiable object abstractions for automated compositional security analysis of a hypervisor [vasudevan:usenix:2016]
  • Komodo: Using verification to disentangle secure-enclave hardware from software [ferraiuolo:sosp:2017]
  • CertiKOS: An extensible architecture for building certified concurrent OS Kernels [gu:osdi:2016]
  • End-to-end verification of information flow security for C and assembly programs [costanzo:pldi:2016]
  • Attacking, repairing, and verifying SecVisor: A retrospective on the security of a hypervisor [franklin:cmu:2008]
  • SecVisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity OSes [seshadri:sosp:2007]
  • Verifying security invariants in ExpressOS [mai:asplos:2013]
  • Scalable translation validation of unverified legacy OS code [tahat:fmcad:2019]
  • Verifying the Microsoft Hyper-V hypervisor with VCC [leinenbach:fm:2009]
  • Sound formal verification of Linux’s USB BP keyboard driver [penninckx:nfm:2012]
  • Local verification of global invariants in concurrent programs [cohen:cav:2010]
  • The Flask security architecture: System support for diverse security policies [spencer:security:1999]

Verification tools

With a focus on verification tools.

Separation logic (and, more generally, permission logic)

For reasoning about heap data structures and aliasing.

The Rust language

Rust is designed to be safer and easier to reason about, and its type system has some interesting connections with separation logic.

Fuzz testing

Testing a program by throwing random input at it.

Papers from the 2005 Bugs workshop

Property-based testing

Test generation

  • Feedback-directed unit test generation for C/C++ using concolic execution [garg:icse:2013]
  • SUSHI: A test generator for programs with complex structured inputs [braione:icse:2018]
  • Study of integrating random and symbolic testing for object-oriented software [dimjasevic:ifm:2018]
  • TestCov: Robust test-suite execution and coverage measurement [beyer:ase:2019]
  • An introduction to test specification in FQL [holzer:hvc:2010]
  • FShell: Systematic test case generation for dynamic analysis and measurement [holzer:cav:2008]

Google

Miscellaneous

Reflections

Looking back over my research career, I wish that I had started writing paper summaries for myself decades earlier. Some of the benefits I have seen are:

  • Writing the summaries turns the passive act of reading papers into an active one, and it forces me to try harder to understand what I am writing about.

  • When I introduce a new concept, I try to go back to all the papers that relate to that concept and add links back to it. This strengthens my understanding of the papers and helps me see the different ways that papers and concepts connect to each other.

  • When I am struggling to remember a paper, I can search the repository. This is especially useful when the title of the paper doesn’t describe the content very well or when the part I remember is not the primary point of the paper.

  • A few weeks ago, I wrote a workshop paper with other members of my team. Thanks to the summaries and the way I organize them, I had all of the related work available to me: I had a clear memory of it, I could find the papers, and I could quickly remind myself what each paper was about to confirm that I was citing the correct one.

    Another crucial ingredient was that my project planning documents explained the reason for the project, the design decisions, alternatives, etc. (These documents are not public.)

    I found the experience of writing the paper to be very productive and satisfying. It seemed to match the claims made about Zettelkasten as a way to organize your research into separate stages of

    • reading related work,
    • forming ideas (through the notes on concepts, the project planning documents, etc.),
    • collecting ideas into a paper,
    • and then, finally, reorganizing and rewriting to make a coherent paper.

    Since I had already done most of the work before we even thought about writing the paper, I was mostly able to focus on how to present the ideas without being distracted by the earlier phases.

  • When I summarize a paper, I also capture enough metadata that I can construct BibTeX for all the papers. So, if I later cite a paper, I have already done the tedious work of finding the citation, the DOI, etc. Again, this lets me focus on the act of writing without being distracted by the mechanics of BibTeX files.

    To make citing the papers as easy as possible, I try to make sure that this BibTeX is usable as-is by fixing the capitalization of paper titles to overcome the BibTeX design bug (bibliography styles may lowercase title words unless they are protected); a small sketch follows this list.

    An unexpected side effect of using a machine-generated BibTeX file is that it is also easy to identify papers that my co-authors cite (so that I can read them too).
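
As a rough illustration of the BibTeX points above (this is not the actual script behind the site, and the record below is a made-up placeholder), a few lines of Python can turn captured metadata into a BibTeX entry with the title capitalization protected by braces:

```python
# Illustrative sketch only -- the real scripts and metadata format differ.
# Build a BibTeX entry from a captured paper record, wrapping capitalized
# words in the title with braces so BibTeX styles do not lowercase them.

def protect_caps(title):
    words = title.split()
    for i, word in enumerate(words):
        # Leave the first word alone; brace later words containing capitals.
        if i > 0 and any(c.isupper() for c in word):
            words[i] = "{" + word + "}"
    return " ".join(words)

def to_bibtex(key, record):
    fields = dict(record, title=protect_caps(record["title"]))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@inproceedings{{{key},\n{body}\n}}"

# Hypothetical record: the author field is a placeholder, not real metadata.
print(to_bibtex("holzer:cav:2008", {
    "title": "FShell: Systematic test case generation for dynamic analysis and measurement",
    "author": "Placeholder, A. and Placeholder, B.",
    "booktitle": "Computer Aided Verification (CAV)",
    "year": "2008",
}))
```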

After trying this for a year, I am planning to continue the practice and I would encourage others to do the same. Whether you choose to make your reading list public is up to you. On the plus side, you can share links with people when you mention a paper; on the minus side, I am slightly more guarded and diplomatic about papers that I have problems with.

Other paper summaries

The focus of RelatedWork is on understanding a field, so it is about a set of papers, typically read over an extended time period, their shared concepts, and how they interconnect. Another paper summary site is Adrian Colyer’s excellent “the morning paper.” Adrian reads a broad sweep of brand-new papers to keep abreast of a much larger subset of computer science.

Software

The tools I use are just Jekyll, some Python scripts, and an editor. But people have pointed out some other tools that might work better (most of which I have not tried).


Written on October 4, 2020.
The opinions expressed are my own views and not my employer's.