Summarizing 12 months of reading papers (2021)
Last year, I wrote about the 122 papers that I read in my first year at Google and summarized on the RelatedWork site. Over the last 18 months or so, I have spent a lot less time doing the one-hour train commute between Cambridge and London, so I read only 59 papers in the last year and added 188 papers to the backlog of unread papers. You can read my summaries of all the papers, read notes on common themes in the papers, and download BibTeX for all the papers.
[Note that the main motivation for writing these paper summaries is to help me learn new research fields. So these summaries are usually not high-quality reviews by an expert in the field but, instead, me trying to make sense of a barrage of terminology, notation, and ideas as I try to figure out whether a paper is useful to my work. You should write your own summaries of any of the papers I list that sound interesting.]
Zettelkasten
I have found that a great way of organizing my thoughts about papers is as a Zettelkasten. The basic structuring idea is to create links between papers and concepts: every time I come across a new concept in a paper, I create a new page about it and link the paper to that page. Each paper, concept, or link added to the Zettelkasten refines my understanding of the research field, and that understanding is (partly) captured in the links between concepts, between papers, and between papers and concepts. And since every page has back-references to the pages that link to it, I can easily find related papers that have different views of a concept or that improve upon an idea.
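To make that structure concrete, here is a minimal sketch of the linking idea, written in Rust purely for illustration; it is not the tooling I actually use, and the page names in it are just placeholders. Pages (papers or concepts) are nodes, links are directed edges, and the back-references are simply the reverse index.

```rust
use std::collections::{HashMap, HashSet};

// A toy Zettelkasten: only the link structure, none of the page contents.
#[derive(Default)]
struct Zettelkasten {
    // page name -> pages it links to
    links: HashMap<String, HashSet<String>>,
    // page name -> pages that link to it (the back-references)
    backlinks: HashMap<String, HashSet<String>>,
}

impl Zettelkasten {
    // Record that `from` (e.g. a paper) links to `to` (e.g. a concept).
    fn link(&mut self, from: &str, to: &str) {
        self.links.entry(from.to_string()).or_default().insert(to.to_string());
        self.backlinks.entry(to.to_string()).or_default().insert(from.to_string());
    }

    // Every page that references `page`: the "related papers" view of a concept.
    fn back_references(&self, page: &str) -> Vec<&String> {
        self.backlinks
            .get(page)
            .map(|pages| pages.iter().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut zk = Zettelkasten::default();
    zk.link("yanovski:unknown:2021", "Phantom types");
    zk.link("fluet:jfp:2006", "Phantom types");

    // Following the concept's back-references finds both papers.
    println!("{:?}", zk.back_references("Phantom types"));
}
```

The only point of the sketch is that the reverse index is what turns a pile of summaries into a navigable graph: answering "which papers touch this concept?" is a single lookup.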
Last year, I had only just adopted the Zettelkasten concept, and I was often not adding new concept pages until after I came across a concept a second time. This year, I have tried to be more aggressive about adding new concepts. This has turned out to be much easier when I am reading papers in a completely new field because my ignorance makes it easier to spot new concepts. For example, when I started reading about machine learning, every page I read had a bunch of new acronyms like RNN, CNN, and ReLU or unfamiliar terms like Softmax, Activation, and Attention, and I created pages for each of these concepts, looked them up, linked to the Wikipedia page (or similar), and linked the current and later papers to the concept.
[I wrote a lot more about Zettelkasten and the tools that support it last year.]
Rust and verification
I spent most of the year working on the Rust verification project at Google, so, unsurprisingly, many of the papers are about the Rust language, with a particular emphasis on Rust unsafe code.
- Rust language
  - Engineering the Servo web browser engine using Rust [anderson:icse:2016]
- Rust unsafe code
  - Safe systems programming in Rust: The promise and the challenge [jung:cacm:2021]
  - Understanding memory and thread safety practices and issues in real-world Rust programs [qin:pldi:2020]
  - How do programmers use unsafe Rust? [astrauskas:oopsla:2020]
  - Is Rust used safely by software developers? [evans:icse:2020]
- Phantom types (see the sketch after this list)
  - GhostCell: Separating permissions from data in Rust [yanovski:unknown:2021]
  - Phantom types and subtyping [fluet:jfp:2006]
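Since GhostCell and the phantom-types paper come up above, here is a minimal sketch of the general phantom-type pattern, assuming a hypothetical `Arena`/`Index` pair; it is not the GhostCell API, which brands values with an invariant lifetime generated by a closure rather than with named marker types. The idea is the same: a type parameter that never exists at runtime ties an index to the collection that issued it, so mixing indices from different collections is a compile-time error rather than a runtime bounds bug.

```rust
use std::marker::PhantomData;

// `Brand` has no runtime representation; it exists only in the types.
struct Index<Brand> {
    i: usize,
    _brand: PhantomData<Brand>,
}

struct Arena<Brand> {
    items: Vec<u32>,
    _brand: PhantomData<Brand>,
}

impl<Brand> Arena<Brand> {
    fn new() -> Self {
        Arena { items: Vec::new(), _brand: PhantomData }
    }

    fn push(&mut self, v: u32) -> Index<Brand> {
        self.items.push(v);
        Index { i: self.items.len() - 1, _brand: PhantomData }
    }

    // Only an index carrying this arena's brand is accepted.
    fn get(&self, ix: Index<Brand>) -> u32 {
        self.items[ix.i]
    }
}

fn main() {
    // Hypothetical marker types standing in for GhostCell's invariant lifetimes.
    struct A;
    struct B;

    let mut xs: Arena<A> = Arena::new();
    let mut ys: Arena<B> = Arena::new();

    let ix = xs.push(1);
    ys.push(2);

    assert_eq!(xs.get(ix), 1);
    // ys.get(ix); // type error: an `Index<A>` is not an `Index<B>`
}
```

The `_brand` fields are zero-sized, so the extra checking is free at runtime, which is the appeal of the pattern.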
There are also some pre-Rust papers that the papers above build on:
- Dependent types for low-level programming [condit:esop:2007]
- Quantifying the performance of garbage collection vs. explicit memory management [hertz:oopsla:2005]
- The meaning of memory safety [azevedo:post:2018]
- Checking type safety of foreign function calls [furr:pldi:2005]
Symbolic execution, verification and testing
The Rust verification project focused on creating a continuum of verification techniques and tools, including fuzz testing, concolic execution, symbolic execution, bounded model checking, and abstract interpretation.
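To make the terminology concrete, here is a minimal, self-contained sketch of the core of symbolic execution over a toy expression language; nothing in it is taken from the tools in the papers below, and the names are my own. Inputs are symbolic variables, a branch forks the path instead of choosing a side, and each completed path is described by a path condition that would be handed to a solver.

```rust
// Symbolic values: an unknown input or an expression built from symbolic values.
#[derive(Clone, Debug)]
enum Sym {
    Var(&'static str),       // an unconstrained input
    Const(i64),              // a concrete constant
    Lt(Box<Sym>, Box<Sym>),  // e1 < e2 (boolean-valued)
    Not(Box<Sym>),           // logical negation
}

// A path is described by the conjunction of the branch conditions taken so far.
type PathCondition = Vec<Sym>;

// Symbolically execute `if cond { then } else { els }`: rather than evaluating
// `cond`, explore both successors, recording `cond` on one path and its
// negation on the other.
fn exec_branch(
    pc: &PathCondition,
    cond: Sym,
    then: impl Fn(PathCondition) -> Vec<PathCondition>,
    els: impl Fn(PathCondition) -> Vec<PathCondition>,
) -> Vec<PathCondition> {
    let mut true_pc = pc.clone();
    true_pc.push(cond.clone());

    let mut false_pc = pc.clone();
    false_pc.push(Sym::Not(Box::new(cond)));

    let mut paths = then(true_pc);
    paths.extend(els(false_pc));
    paths
}

fn main() {
    // A symbolic input, as an engine would model an untrusted argument.
    let x = Sym::Var("x");

    // Explore `if x < 0 { ... } else { ... }`; both closures just end the path.
    let paths = exec_branch(
        &Vec::new(),
        Sym::Lt(Box::new(x), Box::new(Sym::Const(0))),
        |pc| vec![pc],
        |pc| vec![pc],
    );

    // A real engine would hand each path condition to an SMT solver, either to
    // generate a concrete test input or to check the path's assertions.
    for (i, pc) in paths.iter().enumerate() {
        println!("path {}: {:?}", i, pc);
    }
}
```

Roughly speaking, the techniques in the list differ in how they enumerate these paths (concretely, symbolically, or both) and in how the resulting constraints are discharged.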
- Performance and efficiency
  - QSYM: A practical concolic execution engine tailored for hybrid fuzzing [yun:usenix:2018]
  - Symbolic execution with SymCC: Don’t interpret, compile! [poeplau:usenix:2020]
  - TracerX: Dynamic symbolic execution with interpolation [jaffar:arxiv:2020]
  - Efficient state merging in symbolic execution [kuznetsov:pldi:2012]
  - Evaluating manual intervention to address the challenges of bug finding with KLEE [galea:arxiv:2018]
  - Finding code that explodes under symbolic evaluation [bornholt:oopsla:2018]
  - Chopped symbolic execution [trabish:icse:2018]
- Hybrid fuzzing
  - SAVIOR: Towards bug-driven hybrid testing [chen:sp:2020]
- Symbolic memory
  - Rethinking pointer reasoning in symbolic execution [coppa:ase:2017]
  - A segmented memory model for symbolic execution [kapus:fse:2019]
- Lazy initialization
  - Generalized symbolic execution for model checking and testing [khurshid:tacas:2003]
  - Scalable error detection using boolean satisfiability [xie:popl:2005]
  - Under-constrained symbolic execution: Correctness checking for real code [ramos:sec:2015]
  - Under-constrained execution: Making automatic code destruction easy and scalable [engler:issta:2007]
  - Practical, low-effort equivalence verification of real code [ramos:cav:2011]
- Swarming
  - Swarm verification techniques [holzmann:ieeetse:2011]
  - Swarm testing [groce:issta:2012]
  - Scaling symbolic execution using ranged analysis [siddiqui:oopsla:2012]
  - A synergistic approach for distributed symbolic execution using test ranges [qiu:icse:2017]
- Vacuity checks
  - Sanity checks in formal verification [kupferman:concur:2006]
- Counterexamples and test generation
  - Executable counterexamples in software model checking [gennari:vstte:2018]
  - Formal specification and testing of QUIC [mcmillan:sigcomm:2019]
- Separation logic
  - A local shape analysis based on separation logic [distefano:tacas:2006]
  - Beyond reachability: Shape abstraction in the presence of pointer arithmetic [calcagno:sas:2006]
CPUs and security
- Microarchitecture
  - Hardware faults
    - Silent data corruptions at scale [dixit:arxiv:2021]
    - Cores that don’t count [hochschild:hotos:2021]
  - Power viruses
    - GeST: An automatic framework for generating CPU stress-tests [hadjilambrou:ispass:2019]
  - Side channels
    - Spectector: Principled detection of speculative information flows [guarnieri:sandp:2020]
    - CacheQuery: Learning replacement policies from hardware caches [vila:pldi:2020]
- Security
  - CHERI concentrate: Practical compressed capabilities [woodruff:tocs:2019]
  - snmalloc: A message passing allocator [lietar:ismm:2019]
  - PAC: Practical Accountability for CCF [shamis:arxiv:2021]
  - Toward confidential cloud computing: Extending hardware-enforced cryptographic protection to data while in use [russinovich:acmq:2021]
Neural networks / Machine learning
Since I work in a machine-learning part of Google, I have been reading about machine learning.
- Language and kernels
  - TensorFlow: Large-scale machine learning on heterogeneous distributed systems [abadi:arxiv:2016]
  - The tensor algebra compiler [kjolstad:oopsla:2017]
  - Sparse GPU kernels for deep learning [gale:arxiv:2020]
- Hardware
  - In-datacenter performance analysis of a tensor processing unit [jouppi:isca:2017]
  - SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training [qin:hpca:2020]
  - ExTensor: An accelerator for sparse tensor algebra [hedge:micro:2019]
- Sparsity
  - The state of sparsity in deep neural networks [gale:arxiv:2019]
  - Fast sparse ConvNets [elsen:arxiv:2019]
- Scaling
  - Outrageously large neural networks: The sparsely-gated mixture-of-experts layer [shazeer:arxiv:2017]
  - GShard: Scaling giant models with conditional computation and automatic sharding [lepikhin:arxiv:2020]
  - Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [fedus:arxiv:2021]
  - Attention is all you need [vaswani:arxiv:2017]
Information flow control
- Noninterference for free [bowman:icfp:2015]
Programming languages
- The next 700 semantics: A research challenge [krishnamurthi:snapl:2019]
Miscellaneous
- Large teams have developed science and technology; small teams have disrupted it [wu:arxiv:2017]
- As we may think [bush:atlantic:1945]
The opinions expressed are my own views and not my employer's.