<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="https://alastairreid.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://alastairreid.github.io/" rel="alternate" type="text/html" /><updated>2024-05-01T05:32:14+00:00</updated><id>https://alastairreid.github.io/feed.xml</id><title type="html">Alastair Reid</title><subtitle>Researcher at Intel</subtitle><entry><title type="html">How to improve the RISC-V specification</title><link href="https://alastairreid.github.io/riscv-spec-issues/" rel="alternate" type="text/html" title="How to improve the RISC-V specification" /><published>2024-04-27T00:00:00+00:00</published><updated>2024-04-27T00:00:00+00:00</updated><id>https://alastairreid.github.io/riscv-spec-issues</id><content type="html" xml:base="https://alastairreid.github.io/riscv-spec-issues/"><![CDATA[<p><img src="/images/riscv-logo.png" alt="RISC-V logo" style="float: left; width: 13%; padding: 1%" />
My main project is to create an executable spec of the Intel Architecture
but, every now and then, I get to take a broader look at ISA specifications
and think about the strengths and weaknesses of other ISA specs:
what makes them work well; and what techniques they could borrow
from other specifications.
Earlier this month, someone asked me for my thoughts on the RISC-V
specification and I thought that it would be useful to share
what I found out.</p>

<p>This post is about the RISC-V specification.
This specification effectively consists of several different parts,
each essential to defining the RISC-V architecture.
The most obvious parts of the specification are
<a href="https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf">RISC-V Instruction Set Manual, volume 1</a>
and <a href="https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf">RISC-V Instruction Set Manual, volume 2</a>: PDF documents that
describe the unprivileged and privileged parts of the architecture.
If you are developing a RISC-V processor, you will probably make heavy
use of the <a href="https://github.com/riscv-non-isa/riscv-arch-test/blob/main/riscv-test-suite/README.md">RISC-V Test Suites</a> to check that your processor correctly
implements the architecture.
And you will probably be using the <a href="https://github.com/riscv/sail-riscv">Sail RISC-V model</a> as a formal
specification of the architecture and using <a href="https://www.pypy.org/posts/2023/05/rpython-used-to-speed-up-risc-v-simulation-over-15x.html">Pydrofoil</a> to execute the Sail spec as
a “golden reference simulator” to compare the behavior of your processor against.
(And, if you are on the bleeding edge, you might also be using the Sail spec in a
formal verification flow similar to ISA-Formal [<a href="https://alastairreid.github.io/RelatedWork/papers/reid:cav:2016/">reid:cav:2016</a>].)
And you may still be using <a href="https://github.com/riscv-software-src/riscv-isa-sim">Spike RISC-V ISA Simulator</a> which was the original “golden
reference model” because some ISA extensions still aren’t specified in Sail.</p>

<p><em>[This post was edited on 28th April and 1 May to clarify what I said about the Sail spec
and the status of the Spike simulator.]</em></p>

<h2 id="whats-wrong-with-this-picture">What’s wrong with this picture?</h2>

<p>Overall, the specification is not in a very healthy state.</p>

<p>The two PDF documents rely heavily on natural language
text in the specification with the result that the description is
not always very precise and it is impossible to test the descriptions
to determine whether they match the other artifacts.</p>

<p>A common complaint about the test suites is that the RISC-V
architecture is very highly configurable with many subsets and
implementation choices allowed but the test suites are not
easily configured to check the full envelope of allowed behaviors.
Some of these issues stem from the challenge of creating a machine-readable
description of which particular configuration you intended to build.
(This also limits the ability to use Spike as a golden reference model.)
If you intend to build something outside the range of configurations
that Spike and the test suite support, you will end up with a large
number of “waivers” documenting the places where your processor is
expected to fail the tests.</p>

<p>Finally, there is the Sail model written in the
<a href="https://github.com/rems-project/sail?tab=readme-ov-file">Sail ISA specification language</a>.
This is a precise, executable, formal specification of the architecture
but it is not used as much as it should be.
Ideally, parts of the Sail specification would be included in the
official PDF documents so that the Sail code can add the precision
that the natural language spec lacks and the natural language
specification can make up for the almost total absence of comments
in the Sail specification.</p>

<h2 id="connecting-the-pieces-machine-readable-formats">Connecting the pieces: machine-readable formats</h2>

<p>The most important problem is that these four artifacts seem to be completely disconnected from
each other. They are maintained as four separate codebases in four separate repositories
and no part of one artifact is generated from part of any other artifact.
If you are writing an architecture extension, you have to write the same thing four times in four
separate formats and then you have to use testing and review to check that all four copies
are the same.</p>

<p>But worse than this, there is a whole ecosystem of people around RISC-V who will also be writing
code in response to your architecture extension. There are hardware designers,
hardware validation teams,
developers of software tools (such as assemblers, compilers and JITs), OS hackers and
debugger developers, etc. that are all going to have to write some code to support
or benefit from any architecture extension. They also have to write all this code by
hand and then review and test the code to make it consistent with the official spec.</p>

<p>It is worth noting that most architecture extensions evolve between the first design and the final,
ratified design: evaluating that design requires tool support; socializing and reviewing the
design requires documentation. So each extension is typically implemented many times
and each revision requires updates in many places.</p>

<p>The easiest way to improve this would be to capture as much of the architecture
as possible in formats that are easy to read and manipulate.
In particular, instruction encodings and control/status registers are
easily described by simple JSON/YAML/XML/… formats.
It doesn’t take long to figure out what data you need to include to
support generation of many different artifacts from such files:</p>

<ul>
  <li>The name and a short description (for generating documentation).</li>
  <li>The size, bitfields, fixed bits, etc.</li>
  <li>For registers, there will be a numeric id that is used to identify
the register.</li>
  <li>What constraints are there on using the instruction or register?
Is it usable in hypervisor mode? supervisor mode? usermode?</li>
  <li>For register fields, is the field readable? is it writable?
what is its reset value? what is the meaning of different values
that this field can have?</li>
  <li>For instructions, what is the assembly syntax? (See <a href="/bidirectional-assemblers/">Bidirectional ARM Assembly Syntax Specifications</a>)</li>
  <li>Which configurations support each entry, field or value?</li>
</ul>

<p>I’ve probably missed a few important pieces of information but you get the idea.</p>
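<p>As a concrete sketch of what such an entry might contain (the schema and field names below are my own invention, not an existing RISC-V format), here is a description of the base-ISA <code>addi</code> instruction together with a function that derives a decoder mask/match pair from it:</p>

```python
# Hypothetical machine-readable description of one instruction.
# The schema (field names, structure) is illustrative, not an
# existing RISC-V format.
addi = {
    "name": "addi",
    "description": "Add sign-extended 12-bit immediate to register rs1",
    "encoding": {
        "size": 32,
        "fields": {"imm": (31, 20), "rs1": (19, 15),
                   "funct3": (14, 12), "rd": (11, 7), "opcode": (6, 0)},
        "fixed": {"funct3": 0b000, "opcode": 0b0010011},
    },
    "assembly": "addi rd, rs1, imm",
    "configurations": ["RV32I", "RV64I"],
}

def encoding_mask(entry):
    """Compute a (mask, match) pair for a decoder from the fixed bits."""
    mask = match = 0
    fields = entry["encoding"]["fields"]
    for name, value in entry["encoding"]["fixed"].items():
        hi, lo = fields[name]
        width = hi - lo + 1
        mask |= ((1 << width) - 1) << lo
        match |= value << lo
    return mask, match

mask, match = encoding_mask(addi)
print(hex(mask), hex(match))   # 0x707f 0x13
```

<p>The point is that the mask/match pair is computed once from the declared fixed bits rather than transcribed by hand into each downstream tool.</p>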

<p>Of course, creating this easy-to-use, highly-reusable machine readable spec
is just the first part.
To make a difference, you have to modify the architecture reference manual,
the Spike simulator, the test suite and the Sail specification to use this data.
You have to find every part of these artifacts that currently contains
information about instructions or CSRs and refactor them around
code that is automatically generated from the JSON/YAML/XML files.
Doing this is critical to making it equally possible to auto-generate
code for any new architecture extensions.</p>
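<p>To make the "generate rather than transcribe" idea concrete, here is a minimal sketch (with an invented entry format) of producing two different artifacts, a documentation line and a decoder predicate, from one machine-readable record:</p>

```python
# Sketch: generating two different artifacts (a documentation line and
# a decoder test) from one machine-readable entry. The entry schema is
# illustrative, not an official RISC-V format.
entry = {
    "name": "addi",
    "description": "Add sign-extended 12-bit immediate to register rs1",
    "assembly": "addi rd, rs1, imm",
    "mask": 0x707F,
    "match": 0x13,
}

def gen_doc(e):
    # One artifact: a line for the reference manual.
    return f"{e['assembly']} -- {e['description']}"

def gen_decoder(e):
    # Another artifact: a predicate a simulator could use to
    # recognise the instruction.
    return lambda word: (word & e["mask"]) == e["match"]

print(gen_doc(entry))
is_addi = gen_decoder(entry)
print(is_addi(0x00150513))   # encoding of addi a0, a0, 1 -> True
```

<p>In the same way, assembler tables, test generators and simulator stubs for a new extension could all be emitted from one file instead of being written four or more times.</p>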

<p>And, as that is being done, people should be looking at the extended
ecosystem of build tools, libraries, OSes, etc. to autogenerate
as much of them as possible from the machine readable data files.</p>

<h2 id="making-sail-easier-for-non-pl-people-to-read">Making Sail easier for non-PL people to read</h2>

<p>A complaint that I often hear about the Sail specification is
that it is a bit inaccessible for normal users. The Sail language
design builds on a number of popular academic languages
and programming language research ideas
such as the <a href="https://ocaml.org/">OCaml Functional Programming Language</a>,
<a href="https://goto.ucsd.edu/~ucsdpl-blog/liquidtypes/2015/09/19/liquid-types/">Liquid types</a>, <a href="https://en.wikipedia.org/wiki/Monad_(functional_programming)">Monads</a>, and <a href="https://en.wikipedia.org/wiki/Effect_system">Effect Systems</a>.
The challenge with drawing on these different sources
is that the primary audience for the specification is
not programming language (PL) researchers but people who
work on hardware design, hardware validation, OS developers,
compiler developers, etc. — many of whom are not familiar with
the academic programming language literature and are confused and
put off by the syntax of Sail.
To illustrate this, consider the following Sail example
from early in the Sail reference manual:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>val my_replicate_bits : forall 'n 'm, 'm &gt;= 1 &amp; 'n &gt;= 1.  (int('n), bits('m)) -&gt; bits('n * 'm)
</code></pre></div></div>

<p>If you are not familiar with OCaml and liquid types, some questions that
you might ask are “What are the quote marks for in <code class="language-plaintext highlighter-rouge">'m</code> and <code class="language-plaintext highlighter-rouge">'n</code>?” and
“What does <code class="language-plaintext highlighter-rouge">forall 'n 'm</code> mean?”</p>
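<p>Unpacked, the signature says: for any type-level integers <code>'n</code> and <code>'m</code>, both at least 1, the function takes the integer n and a bit-vector of width m and returns a bit-vector of width n*m. A rough Python analogue, representing bit vectors as strings of '0'/'1' (a simplification of mine, with the width property checked dynamically rather than by the type system):</p>

```python
# Rough Python analogue of the Sail signature above. In Sail, 'n and 'm
# are type-level integer variables: the type statically guarantees that
# replicating an m-bit vector n times yields an (n*m)-bit vector.
# Here bit vectors are just strings of '0'/'1' (a simplification),
# and the length property is checked at run time instead.
def my_replicate_bits(n: int, bits: str) -> str:
    assert n >= 1 and len(bits) >= 1
    result = bits * n
    assert len(result) == n * len(bits)   # the property Sail's type encodes
    return result

print(my_replicate_bits(3, "10"))   # 101010
```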

<p>There is a range of improvements that can be made here, from
changing the syntax to applying sensible defaults to reduce
the amount of clutter from the type system.
(Ideally, this would be done by adopting some of the syntax
of the <a href="https://developer.arm.com/ASL1">ASL-1.0</a> ISA specification language which has many
of the same features but was designed with more focus on
use in documentation.)</p>

<p>Counter-intuitively, the success of Sail is
a barrier to improving this situation.  If radical changes are
made, people may be concerned that there has been a change in the meaning of
the specification, not just its appearance.</p>

<p><em>[For clarity, note that I am not suggesting that Liquid types, etc.
are removed from the language: only that the syntax is simplified
and decluttered in the common case.]</em></p>

<h2 id="including-sail-in-the-documentation">Including Sail in the documentation</h2>

<p>The other problem with the Sail specification is more fundamental but also easier to solve:
the Sail spec is not included in the architecture manual.
The easiest option would be
to include the entire spec as a giant appendix at the back of the document.
But much better would be to put each part of the Sail spec on the same page
as the natural language description of the same feature so that
the Sail spec can amplify your understanding of the natural language spec and
the natural language spec can help you understand the Sail spec.</p>

<p>The frustrating thing here is that this is actually a solved problem.
The tools to pick out snippets of LaTeX to use in docs have existed for a long time,
and there has been a <a href="https://github.com/rems-project/riscv-isa-manual/blob/sail/release/riscv-spec-sail-draft.pdf">version of the unpriv spec with instruction semantics properly interleaved</a> since 2019.</p>

<p>An even better integration of Sail, in my opinion, is the way that the Sail instruction specs are integrated into the
<a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf#page=176">CHERI ISAv9 spec (e.g., see page 176)</a>.
This reduces a lot of the clutter and looks a lot more like normal ISA specs.</p>

<p><em>[Note: With RISC-V switching from LaTeX to ASCIIdoc, the tool you would probably want to use
is the <a href="https://github.com/Alasdair/asciidoctor-sail/blob/master/doc/built/sail_to_asciidoc.pdf">Sail-to-ASCIIdoc</a> tool.]</em></p>

<h2 id="using-sail-in-tools">Using Sail in tools</h2>

<p>Why do we have two executable parts to the specification:
the Sail specification and the Spike golden reference simulator?
The historical reasons for this are clear but it would be
good if Spike support for future extensions was generated
from their Sail specification.</p>

<p>This will take a bit of work to do.  Mostly, this will consist of defining the
interface that code generated from the Sail specs uses to access parts of the
state that are represented in Spike.</p>
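<p>To illustrate the shape of such an interface (a sketch in Python rather than Spike's C++, with invented names throughout): generated instruction semantics would call a small state-access API, and the host simulator would supply an implementation backed by its own state:</p>

```python
# Sketch of the idea only: generated code targets an abstract
# state-access interface, and the host simulator implements it.
# All names and methods here are invented for illustration.
from abc import ABC, abstractmethod

class ArchState(ABC):
    @abstractmethod
    def read_reg(self, r: int) -> int: ...
    @abstractmethod
    def write_reg(self, r: int, value: int) -> None: ...

# What code generated from a Sail clause for `addi` might look like:
def exec_addi(state: ArchState, rd: int, rs1: int, imm: int) -> None:
    state.write_reg(rd, (state.read_reg(rs1) + imm) & 0xFFFFFFFF)

class DictState(ArchState):
    """Stand-in for the host simulator's register file."""
    def __init__(self):
        self.regs = [0] * 32
    def read_reg(self, r):
        return self.regs[r]
    def write_reg(self, r, value):
        if r != 0:               # x0 is hard-wired to zero in RISC-V
            self.regs[r] = value

s = DictState()
exec_addi(s, rd=1, rs1=0, imm=42)
print(s.read_reg(1))   # 42
```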

<h2 id="conclusion">Conclusion</h2>

<p>The RISC-V architecture was developed in classic startup/academic style:
innovating quickly and avoiding too much investment in long-term engineering.
This has resulted in the current situation where the architecture specification
is, in effect, scattered over four different artifacts and each
downstream tool/library/application has to transcribe information from
those sources instead of being able to use architect-provided machine-readable
formats to generate the code.</p>

<p>This was not too big a problem for the original base architecture of just
47 instructions and 32 registers but there have been many <a href="https://en.wikipedia.org/wiki/RISC-V#ISA_base_and_extensions">RISC-V ISA extensions</a>
added since then.</p>

<hr />

<p>This post was discussed on
<a href="https://news.ycombinator.com/item?id=40185065">Hacker News</a>.</p>

<h3 id="related-posts-and-papers">Related posts and papers</h3>

<ul>
  <li><a href="/talks/goals-of-modern-ISA-spec-PLARCH-2023-06-17.pdf">Goals of a modern ISA specification</a>
(presented at <a href="/plarch-2023/">PLARCH 2023</a>)</li>
  <li><a href="/uses-for-isa-specs/">What can you do with an ISA specification?</a></li>
  <li><a href="/mrs-at-scale/">Machine readable specifications at scale</a></li>
  <li><a href="/bidirectional-assemblers/">Bidirectional ARM Assembly Syntax Specifications</a></li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[My main project is to create an executable spec of the Intel Architecture but, every now and then, I get to take a broader look at ISA specifications and think about the strengths and weaknesses of other ISA specs: what makes them work well; and what techniques they could borrow from other specifications. Earlier this month, someone asked me for my thoughts on the RISC-V specification and I thought that it would be useful to share what I found out.]]></summary></entry><entry><title type="html">Dagstuhl: Formal Methods for Correct Persistent Programming</title><link href="https://alastairreid.github.io/dagstuhl/" rel="alternate" type="text/html" title="Dagstuhl: Formal Methods for Correct Persistent Programming" /><published>2023-10-15T00:00:00+00:00</published><updated>2023-10-15T00:00:00+00:00</updated><id>https://alastairreid.github.io/dagstuhl</id><content type="html" xml:base="https://alastairreid.github.io/dagstuhl/"><![CDATA[<p>Last week, I boarded the train to Wadern, Germany (through London, Paris
and Saarbrücken) to attend
<a href="https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23412">Dagstuhl seminar 23412</a>
on “Formal Methods for Correct Persistent Programming”.</p>

<h2 id="what-is-persistent-programming">What is persistent programming?</h2>

<p>Persistent programming is concerned with memory that retains its contents
after a power loss.
While magnetic disks and solid state disks satisfy that definition,
the main focus was on non-volatile memory that provides fast access
(not too much slower than DRAM) and fine granularity (notionally byte addressable
but, in practice, at the granularity of a cache line (64 bytes)).</p>

<p>Non-volatile memory can be built using DRAM and a battery (and some non-volatile
memories require the use of battery-backed caches for safety) but it is usually based
on some more exotic technology such as memristors, spin, ferro-electric FETs,
phase change, etc.
These technologies promise to provide much higher capacity at much lower
cost per gigabyte and much lower power consumption.
Sadly, for the field, the <a href="https://en.wikipedia.org/wiki/3D_XPoint">most prominent non-volatile memory product</a>
was shut down last summer but we did not feel that this was the end of
persistent systems.</p>

<h2 id="what-makes-persistent-programming-hard">What makes persistent programming hard?</h2>

<p>There seem to be two main issues in persistent programming.
The first is that your code has to be able to cope with a power failure at
any moment so every write to non-volatile memory must take care that</p>

<ul>
  <li>
    <p>Any data that it refers to is also in non-volatile memory because,
otherwise, that data would be lost if the power failed.</p>
  </li>
  <li>
    <p>It is possible to recover from a power failure after
every single write. Your data structure in non-volatile
memory must never be in an inconsistent state.</p>
  </li>
</ul>

<p>These properties are checked with type systems and/or formal verification
of the software.</p>

<p>The second, and more challenging, problem is that, for performance
reasons, memory systems reorder memory reads and writes
so even if you write to address ‘x’ then address ‘y’ and then print
“It is now safe to turn your computer off” on the screen, those
three steps could occur in the reverse order.
That is, all modern computer systems provide some form of “weak memory consistency.”
Obviously, this reordering makes it harder to ensure that
the data structure in the non-volatile memory is always in a recoverable
state.</p>

<p>To avoid problems with weak memory, you need to add memory fences and flush cache
contents out to the non-volatile memory but, if you do this too much,
performance will suffer.
So the trick is to add the bare minimum of fences and flushes to ensure
that your data is safe.
Unsurprisingly, this is what most of our discussion centered on with questions like</p>

<ol>
  <li>
    <p>What exactly do the fence and flush instructions guarantee?
The more precisely we understand what they do, the closer we can get to
the edge (without stepping over).</p>

    <p>Note that different ISAs have different properties so sometimes you
need different code for different systems to make it simultaneously
fast and sound.</p>
  </li>
  <li>
    <p>How can we confirm that our understanding of the memory system is correct?
Power cycling is too slow to test exhaustively by repeatedly
turning off the power and checking that the
memory got the values that we expect!</p>
  </li>
  <li>
    <p>How should we formalize our understanding of the memory
system’s guarantees? The most popular approach seemed
to use the “cat” declarative notation that is supported by the
<a href="http://diy.inria.fr/">“herd”</a> tools and others.</p>
  </li>
  <li>
    <p>Is one memory system stronger than another?
A program written for one memory system should also
work for any stronger memory system.</p>
  </li>
  <li>
    <p>Given an algorithm or a C program or a machine-code program,
can we formally verify that it contains enough
fences and flushes to ensure that we can always
recover from a power failure?</p>
  </li>
</ol>
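<p>The last question can be explored even in a toy model: treat the persist order as a sequence of writes, let a crash keep an arbitrary prefix of that sequence, and check a recovery invariant over every reachable crash state. A small sketch of this idea (my own illustration, not a real verification tool) for the classic pattern of writing data before setting a valid flag:</p>

```python
# Toy crash-consistency check (illustration only, not a real tool).
# Persistent memory is a dict; a "crash" keeps an arbitrary prefix of
# the persist order. Without a fence between the data write and the
# valid-flag write, the memory system may persist them in either order.
from itertools import permutations

def recoverable(mem):
    # Recovery invariant: if the flag says valid, the data must be there.
    return not mem.get("valid") or mem.get("data") == 42

def crash_states(writes, fenced):
    # `writes` persist in program order if fenced, in any order otherwise;
    # a crash can cut the persist sequence at any point.
    orders = [writes] if fenced else list(permutations(writes))
    for order in orders:
        for cut in range(len(order) + 1):
            mem = {}
            for addr, value in order[:cut]:
                mem[addr] = value
            yield mem

writes = [("data", 42), ("valid", True)]
print(all(recoverable(m) for m in crash_states(writes, fenced=True)))   # True
print(all(recoverable(m) for m in crash_states(writes, fenced=False)))  # False
```

<p>Real tools face the same question over a precise formal model of the ISA's persistency semantics rather than this two-write toy, but the shape of the check is the same.</p>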

<p>Lots of fun stuff to discuss!</p>

<h2 id="what-else-was-discussed">What else was discussed?</h2>

<p>A related, but subtly different, topic is intermittent computing.
This is concerned with systems that use energy harvesting such as
solar power, vibrations, RF power, etc. instead of having
mains power or a long-life battery.
The challenge here is that you may only have a few milliseconds
of power before it disappears with no warning.
To deal with this, your program must contain frequent ‘checkpoints’
where the state of the system is saved to non-volatile memory
and the code between checkpoints must be idempotent because 
that code will be re-executed when power is restored.
(There are usually no weak memory effects to worry about because
embedded systems are designed with a different set of performance
constraints.)</p>
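<p>A toy model of that execution style (illustrative only) shows why idempotence matters: state lives in a simulated non-volatile memory, each checkpoint records which step to resume from, and a power failure between checkpoints causes the current step to re-execute:</p>

```python
# Toy model of checkpointed intermittent execution (illustration only).
# `nvram` survives "power failures"; each step must be idempotent
# because it re-runs after a restart.
nvram = {"next_step": 0, "total": 0}

def step0():
    nvram["total"] = 10                   # idempotent: re-running is harmless

def step1():
    nvram["total"] = nvram["total"] + 5   # NOT idempotent on its own

steps = [step0, step1]

def run(fail_after=None):
    """Run from the last checkpoint; optionally 'lose power' mid-step."""
    while nvram["next_step"] < len(steps):
        i = nvram["next_step"]
        steps[i]()
        if fail_after == i:
            return                        # power lost before the checkpoint
        nvram["next_step"] = i + 1        # checkpoint: step i is done

run(fail_after=1)   # power fails after step1 ran but before its checkpoint
run()               # restart: step1 re-executes
print(nvram["total"])   # 20, not 15 -- step1 was not idempotent
```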

<p>And, of course, you don’t need special hardware to get persistence:
your filesystem provides a reliable way to save the state of your
system to disk.
So it’s possible to create interesting persistent applications and tools that
anybody can run.</p>

<h2 id="what-is-dagstuhl">What is Dagstuhl?</h2>

<p>To quote the <a href="https://www.dagstuhl.de/en/institute/organization">Dagstuhl website</a>,
“Schloss Dagstuhl, the Leibniz Center for Informatics was originally founded in 1990
to provide a retreat for world class research and training in computer science.”
In practice, this means that 30 or so researchers travel to a place out in the
Saarland countryside that provides the perfect environment for focusing on a topic and meeting
other researchers working in the field.
A few things that stand out are the excellent conference facilities; the many smaller meeting rooms
in case a group want to split off for a more focused discussion; the kitchen staff
randomly assigning seats at lunch and dinner; and the potential for walks in
the surrounding countryside or billiards in the games room.</p>

<p><a href="https://www.dagstuhl.de/en">Dagstuhl</a> is well known among researchers in
continental Europe but, even if you have never heard of Dagstuhl, you have
probably heard of <a href="https://dblp.org/">DBLP</a>, the CS publication database.</p>

<p>During the week, we shared Dagstuhl with a seminar on <a href="https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/23411">Accountable Software
Systems</a>.
In the mornings and evenings we would talk to those researchers about their research
and how their seminar was going.
There were some similarities to our group (some of them also work on formal methods) but
they were a much more diverse group with lawyers, control theorists, mathematicians, etc.
so they were working hard to build a shared understanding of what accountability is
while we largely understood the problems and were working on solutions.
A very different meeting!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Last week, I boarded the train to Wadern, Germany (through London, Paris and Saarbrücken) to attend Dagstuhl seminar 23412 on “Formal Methods for Correct Persistent Programming”.]]></summary></entry><entry><title type="html">PLDI 2023</title><link href="https://alastairreid.github.io/pldi-2023/" rel="alternate" type="text/html" title="PLDI 2023" /><published>2023-07-16T00:00:00+00:00</published><updated>2023-07-16T00:00:00+00:00</updated><id>https://alastairreid.github.io/pldi-2023</id><content type="html" xml:base="https://alastairreid.github.io/pldi-2023/"><![CDATA[<p><img src="https://pldi23.sigplan.org/getImage/orig/pldi-logo-alligator-02.png" alt="PLDI 2023 logo" style="float: left; width: 13%; padding: 1%" />
I was long overdue to attend a conference in person: the last conference I attended was <a href="https://popl20.sigplan.org/">POPL 2020</a>.
This year’s <a href="https://pldi23.sigplan.org/">PLDI</a> was in Orlando, Florida at the height of a hot, humid summer in the largest conference
center I have ever seen (we did a lot of walking within the conference center).
PLDI was co-located with other conferences including
the hardware conference ISCA as part of the <a href="https://fcrc.acm.org/">Federated Computing Research Conference (FCRC)</a>
and I dropped in on a few of the ISCA talks.</p>

<p>There were two tracks so I didn’t attend all the talks but some of the themes in the talks I did attend were
as follows (there were lots of other great papers in <a href="https://pldi23.sigplan.org/program/program-pldi-2023/">the program</a>).</p>

<ul>
  <li>
    <p>Sparse matrix multiply,
Tensors,
Transformers
and Machine learning.
In particular, ways of generating and combining high performance implementations.
<a href="/RelatedWork/papers/kovach:pldi:2023">bansal:pldi:2023</a></p>
  </li>
  <li>
    <p>Formally verified software,
and fast crypto implementations
<a href="/RelatedWork/papers/kuepper:pldi:2023">kuepper:pldi:2023</a></p>
  </li>
  <li>
    <p>Extending and using SMT solvers
<a href="/RelatedWork/papers/wang:pldi:2023">wang:pldi:2023</a></p>
  </li>
  <li>
    <p>Quantum computing
<a href="/RelatedWork/papers/chen:pldi:2023">chen:pldi:2023</a></p>
  </li>
  <li>
    <p>Undefined Behaviour in C
<a href="/RelatedWork/papers/isemann:pldi:2023">isemann:pldi:2023</a></p>
  </li>
  <li>
    <p>Fuzzing and testing
<a href="/RelatedWork/papers/livinskii:pldi:2023">livinskii:pldi:2023</a></p>
  </li>
  <li>
    <p>Information leakage,
non-interference
and Approximate model counting
<a href="/RelatedWork/papers/eilers:pldi:2023">saha:pldi:2023</a></p>
  </li>
  <li>
    <p>Memory management,
and parallelism
<a href="/RelatedWork/papers/arora:pldi:2023">arora:pldi:2023</a></p>
  </li>
  <li>
    <p>Defunctionalization
<a href="/RelatedWork/papers/brandon:pldi:2023">brandon:pldi:2023</a></p>
  </li>
  <li>
    <p>Synthesis
<a href="/RelatedWork/papers/guria:pldi:2023">guria:pldi:2023</a></p>
  </li>
  <li>
    <p>Separation logic
<a href="/RelatedWork/papers/liu:pldi:2023">liu:pldi:2023</a></p>
  </li>
  <li>
    <p>Type systems
<a href="/RelatedWork/papers/nigam:pldi:2023">nigam:pldi:2023</a></p>
  </li>
</ul>

<p>In other words, <a href="https://www.youtube.com/watch?v=hVMCl64Uhe8">this is still PLDI</a>.</p>

<p>One of the most interesting talks was about a type system inspired by incorrectness logic used to reason
about whether a random test generator was theoretically capable of generating all possible input values
<a href="/RelatedWork/papers/zhou:pldi:2023">zhou:pldi:2023</a>.
Just as incorrectness logic turns Hoare logic on its head, so their type system turns normal typing
rules on their head.</p>

<p>And I dropped in on my colleague Sam Coward’s E-graphs talk about his extensions to E-graphs to support
the use of egg to generate efficient floating point hardware. (See <a href="/RelatedWork/papers/coward:arxiv:2023/">coward:arxiv:2023</a> for
the related paper.)</p>

<p>I also enjoyed the Plenary Talks I attended</p>

<ul>
  <li>
    <p>“Computing in the foundation model era”
by Kunle Olukotun, Stanford University
<a href="https://www.youtube.com/watch?v=gADw3NtGDVE&amp;list=PLn0nrSd4xjjZ5DcBqu8xxoFQ3QjqAaQXl&amp;index=1">link</a></p>

    <p>This was about designing hardware for constructing large language models and how his students’
research has fed into a startup that is creating chips (and racks) for machine learning.</p>
  </li>
  <li>
    <p>“Constructing &amp; deconstructing trust: Employing cryptographic recipe in the ML domain”
by Shafi Goldwasser, University of California Berkeley
<a href="https://www.youtube.com/watch?v=eHinDYWpZDQ&amp;list=PLn0nrSd4xjjZ5DcBqu8xxoFQ3QjqAaQXl&amp;index=4">link</a></p>

    <p>This was about taking all the understanding of threat models, attacks, etc. developed
by the cryptographic community and using that
to think about the known attacks and potential vulnerabilities in the creation and use of
machine learning models.</p>
  </li>
</ul>

<p>But, really, I went to catch up with colleagues that I had not seen for several years;
to meet researchers whose work I knew but had never met;
and for the random meetings, dinners and introductions that happen in the “corridor track”.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I was long overdue to attend a conference in person: the last conference I attended was POPL 2020. This year’s PLDI was in Orlando, Florida at the height of a hot, humid summer in the largest conference center I have ever seen (we did a lot of walking within the conference center). PLDI was co-located with other conferences including the hardware conference ISCA as part of the Federated Computing Research Conference (FCRC) and I dropped in on a few of the ISCA talks.]]></summary></entry><entry><title type="html">PLARCH 2023</title><link href="https://alastairreid.github.io/plarch-2023/" rel="alternate" type="text/html" title="PLARCH 2023" /><published>2023-07-01T00:00:00+00:00</published><updated>2023-07-01T00:00:00+00:00</updated><id>https://alastairreid.github.io/plarch-2023</id><content type="html" xml:base="https://alastairreid.github.io/plarch-2023/"><![CDATA[<p><img src="https://pldi23.sigplan.org/getImage/orig/pldi-logo-alligator-02.png" alt="PLARCH 2023 logo" style="float: left; width: 13%; padding: 1%" />
A couple of weeks ago, I attended <a href="https://pldi23.sigplan.org/home/plarch-2023">PLARCH 2023</a>: a new workshop about
the intersection between Programming Languages and Computer Architecture.
There was a lot of interest in attending and speaking at the workshop so the
program consisted of a lot of short talks with group discussions in between.</p>

<p>So what topics are at the intersection of programming languages and hardware?
Looking through the <a href="https://pldi23.sigplan.org/home/plarch-2023#program">program</a>, you can see the following</p>

<ul>
  <li><a href="/RelatedWork/notes/isa-specification/">Hardware specifications</a> and things you can do with them</li>
  <li>Microarchitectural <a href="/RelatedWork/notes/side-channel/">side channels</a></li>
  <li><a href="/RelatedWork/notes/domain-specific-language/">Domain specific languages</a></li>
  <li><a href="/RelatedWork/notes/machine-learning/">Machine learning</a> - with a strong emphasis on <a href="/RelatedWork/notes/sparse-model/">sparse models</a></li>
  <li>Formally verified hardware</li>
  <li><a href="/RelatedWork/notes/weak-memory/">Weak memory models</a></li>
  <li><a href="/RelatedWork/notes/model-checking/">Model checking</a> hardware</li>
  <li>Using <a href="/RelatedWork/notes/egraphs/">E-graphs</a> to generate efficient hardware</li>
  <li>A better hardware design language than Verilog</li>
  <li>The <a href="/RelatedWork/notes/cheri-architecture/">CHERI capability architecture</a></li>
  <li>Coping with hardware wearout</li>
  <li>Software synthesis</li>
  <li>ChatGPT</li>
</ul>

<p>Overall, I would say that the topics, speakers and audience were more PL people than hardware
people - but there was a bit of a mix.</p>

<p>There were a lot of great talks but some of the highlights for me were:</p>

<ul>
  <li>The growth of formal verification of hardware and software at Sandia National Labs
(a very familiar story of overcoming preconceptions and technical challenges about formal verification
while building up the team and tool capability)</li>
  <li>“Non-Newtonian hardware design for longevity” (co-designing hardware-software to anticipate and respond to hardware wearout)</li>
  <li>“Fearless hardware design” is about a type system for reasoning about timing in hardware design
(see the <a href="/RelatedWork/papers/rachit:pldi:2023/">PLDI paper</a>)</li>
</ul>

<p>And, while I admit to a degree of personal bias in this selection, I also enjoyed</p>

<ul>
  <li>the Silver Oak hardware-software co-verification project (led by my friend Satnam Singh)</li>
  <li>the <a href="/RelatedWork/notes/egraphs/">E-graph</a> based tool for creating efficient floating point hardware (by colleagues at Intel)</li>
  <li><a href="/talks/goals-of-modern-ISA-spec-PLARCH-2023-06-17.pdf">my talk</a> about multi-use ISA specifications</li>
</ul>

<p>But, of course the real highlight is talking to people during the breaks.</p>

<p>This inaugural workshop was organized to take advantage of the co-location of PLDI and ISCA — but I
hope it becomes an annual event.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[A couple of weeks ago, I attended PLARCH 2023: a new workshop about the intersection between Programming Languages and Computer Architecture. There was a lot of interest in attending and speaking at the workshop so the program consisted of a lot of short talks with group discussions in between.]]></summary></entry><entry><title type="html">Modularizing ISA specifications</title><link href="https://alastairreid.github.io/modular-specs/" rel="alternate" type="text/html" title="Modularizing ISA specifications" /><published>2023-02-21T00:00:00+00:00</published><updated>2023-02-21T00:00:00+00:00</updated><id>https://alastairreid.github.io/modular-specs</id><content type="html" xml:base="https://alastairreid.github.io/modular-specs/"><![CDATA[<p><img src="/images/Intel_logo_2020.svg" alt="Intel logo" style="float: left; width: 13%; padding: 1%" />
Programming languages provide modules as a way of splitting large programs
up into small, separate pieces.
Modules enable <em>information hiding</em> that prevents one part of the program
from using and becoming dependent on some internal detail of how
other parts of the program are implemented.
Almost every major language designed in the last 50 years has some
form of module system.</p>

<p>Specifications of modern ISAs weigh in at around 50,000 – 100,000
lines of specification and yet, despite that, they are specified
in languages (ASL and SAIL) that do not have a module system.
Why not? When is that a problem? And, most importantly, what can we do about it?</p>

<h2 id="why-isa-specs-dont-use-modules">Why ISA specs don’t use modules</h2>

<p>Early in the process of turning Arm’s pseudocode language into a language that
can be parsed, typechecked and executed, I was keen to add a module system.  We
added qualified names like “AArch64.TakeException” but we never introduced
module boundaries or any other form of information hiding mechanisms.</p>

<p>The problem with information hiding is that it is a poor match for the primary
way that ISA specifications are used.
Their original use and still their most important use is as a PDF document.
ISA specifications weigh in at around 11,000 pages (for the Arm architecture)
or 5,000 pages (for the Intel architecture).
At this scale, the usual way to find information is by searching or using the
index. But, when you navigate the specification this way, you jump straight
from one page to another, with little sense of which chapter of the document
you are in and therefore little sense of which module you are looking at.
So any module structure that may be present in the specification does
nothing to help the reader understand the specification and may
even cause confusion if the meaning of part of the specification depends
on knowing what module a given part of the specification is in.</p>

<p>So, rather to my surprise, I concluded that modules would not help
us write better, more easily understood specifications.</p>

<h2 id="when-would-modules-be-useful">When would modules be useful?</h2>

<p>There is one other important aspect of modules that I did not mention above: reuse.
Module systems promote reuse by splitting a large, monolithic system into
pieces with clearly defined interfaces.</p>

<p>All module systems clearly define the <em>exports</em> of a module: making it clear
what parts of the module are available for use outside the module.
The better module systems also clearly define the <em>imports</em> of a module:
making it possible to use the code in radically different contexts.
A good example of this is Matthew Flatt’s “units” [<a href="/RelatedWork/papers/flatt:pldi:1998/">flatt:pldi:1998</a>]
where each “unit” defines a set of imports and a set of exports
and units can be connected to any other unit provided that it has
the right interfaces.</p>

<p>I first saw the power of this in operating system research.
Mike Jones [<a href="/RelatedWork/papers/jones:sosp:1993/">jones:sosp:1993</a>] extended operating systems by “interposing”
on OS interfaces; and
Edoardo Biagioni [<a href="/RelatedWork/papers/biagioni:sigcomm:1994/">biagioni:sigcomm:1994</a>] created flexible, layered network stacks,
using Standard ML’s module system to define each layer in the stack as
a separate module that can be stacked on top of any other module.
The ultimate in this was Bryan Ford’s “Microkernels meet recursive virtual machines”
[<a href="/RelatedWork/papers/ford:sosp:1996/">ford:sosp:1996</a>] that let you build a range of operating systems
supporting features like isolation, process recovery and migration, etc. out of a form
of module system.
(This same idea of building an OS out of many small modules
is also the foundation of CertiKOS [<a href="/RelatedWork/papers/gu:osdi:2016/">gu:osdi:2016</a>].)</p>

<p>In previous articles about <a href="/uses-for-isa-specs/">Uses for ISA specifications</a> and
<a href="/mrs-at-scale/">Machine readable specifications at scale</a>, I have emphasized the
importance of being able to use the specifications in many different ways:
as documentation, to verify hardware, to build simulators, etc.
A modular ISA specification would make it significantly easier to achieve
this high level of reuse in different applications.</p>

<p>For example, in the “ISA-Formal” method for formally verifying Arm processors
we needed specifications of instructions.
We didn’t want other parts of the architecture spec such as virtual memory,
taking exceptions, instruction fetch, IEEE floating point, etc.
(See “<a href="/using-armarm/">Verifying against the official ARM specification</a>” and my paper [<a href="/RelatedWork/papers/reid:cav:2016/">reid:cav:2016</a>] for details
of the ISA-Formal method.)
So what we really wanted was to split the ISA specification into separate modules
for each instruction, for the virtual memory system, for exceptions, for instruction fetch, etc.</p>

<p>Having a modular ISA specification would also make it easier to integrate
the specification into simulators.
For example, suppose you want to extend an existing simulator with
some new instructions. You don’t want to add all of the instructions
in your ISA spec (because you already have those in the simulator);
and you don’t want to add a new memory hierarchy (because you want
the new instructions to access the same memory as the old instructions).
If the ISA spec was modular (and the module boundaries were in roughly
the right place) then you could easily grab just the new instructions that
you want and ignore all the rest of the specification.</p>

<h2 id="how-to-modularize-an-operating-system">How to modularize an operating system</h2>

<p>Given a non-modular ISA specification, how can we split it into a number
of modules?
I tackled a similar problem when I was at the University of Utah.
The Flux OSKit [<a href="/RelatedWork/papers/ford:sosp:1997/">ford:sosp:1997</a>] was based on the idea that operating
systems like Linux and FreeBSD would be more useful to researchers if
they were composed of modules that were designed for reuse.
The original OSKit used COM to define module interfaces but this was a bit heavyweight and
awkward so programmers reacted by creating relatively few module interfaces
to avoid excessive performance and programming overhead.
My variation of the OSKit (a system called “Knit”) 
adapted Flatt’s “units” to the C programming language to create a much lighter weight
system that encouraged the creation of very small modules
because there was no performance overhead and very little programmer overhead.
One of the big demonstrations in the paper [<a href="/RelatedWork/papers/reid:osdi:2000/">reid:osdi:2000</a>] was
a network router where each component typically consisted of 5–10 lines
of code.</p>

<p>The key idea behind the Knit system was that we could re-modularize
a monolithic system by parsing the entire system, constructing the function call graph
and then discarding those parts of the system that did not belong in a particular module.
That is, I automated what programmers normally do when we ask them to break
a large system into modules.</p>

<p>To extract a module from a big system, the programmer defines an
initial set of exports and imports for that module.
The Knit tool uses the callgraph to find all the code that is reachable from
the exports without going through the imports.
The first run of the Knit tool usually reveals a dependency that the
programmer has forgotten about and the modularization tool pulls
in far too much code.
The programmer then refines the import list and checks whether
the resulting module is closer to what they want.
After a few iterations, they have something pretty close to what they want.</p>
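<p>The slicing step described above can be sketched in a few lines. This is a minimal illustration of the idea, not the actual Knit implementation, and all of the function names are invented: a breadth-first walk of the callgraph from the exports that stops whenever it reaches an import:</p>

```python
from collections import deque

def extract_module(callgraph, exports, imports):
    """Keep every definition reachable from the exports without
    passing through an import (imports stay abstract)."""
    keep = set()
    todo = deque(exports)
    while todo:
        fn = todo.popleft()
        if fn in keep or fn in imports:
            continue  # imports are the boundary: keep the name, not its body
        keep.add(fn)
        todo.extend(callgraph.get(fn, ()))
    return keep

# Toy callgraph with a "forgotten" dependency on the exception machinery.
callgraph = {
    "execute_ADD": ["read_reg", "alu_add", "write_reg"],
    "alu_add": ["set_flags"],
    "write_reg": ["check_sp_alignment"],
    "check_sp_alignment": ["take_exception"],
    "take_exception": ["virtual_memory_walk"],
}

# First attempt pulls in far too much...
print(sorted(extract_module(callgraph, ["execute_ADD"], set())))
# ...so refine the import list to cut the module boundary there.
print(sorted(extract_module(callgraph, ["execute_ADD"], {"take_exception"})))
```

<p>Each refinement of the import list is just a rerun of this walk, which is what makes the iterative process cheap.</p>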

<p>Once we had split our OS into modules, 
Eric Eide [<a href="/RelatedWork/papers/eide:icse:2002/">eide:icse:2002</a>]
used a variation on Mike Jones’s interposition trick to achieve something like
<a href="/RelatedWork/notes/aspect-oriented-programming">Aspect Oriented Programming</a>.</p>

<h2 id="how-to-modularize-an-isa-specification">How to modularize an ISA specification</h2>

<p>Given that background, it should be no surprise that I am using the same
basic idea to slice up the monolithic ASL specifications into modules.
That is, I define module boundaries by listing the imports and
exports of each module and then create the module by discarding
everything that is outside of those boundaries.</p>

<p>Since ISA specifications are large and the various implementations and uses are
also large (processors, verification IP, simulators, etc.), it is important
that the whole process <a href="/mrs-at-scale/">works at scale</a>. To automate the process of defining module
boundaries as far as possible, I define the lists of imports and
exports in JSON files so that the module interfaces can be generated automatically.</p>
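<p>For illustration, one of these JSON boundary files might look something like this (the shape and all of the names here are invented for the example, not the actual format):</p>

```json
{
  "module": "instructions",
  "exports": ["decode", "execute_ADD", "execute_SUB"],
  "imports": {
    "functions": ["take_exception", "translate_address"],
    "types": ["Exception"],
    "constants": ["XLEN"],
    "variables": ["PC", "GPR"]
  }
}
```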

<p>The module boundaries themselves consist of several kinds of object.
The main ones are functions, types, constants and variables.
These need slightly different handling.</p>

<ul>
  <li>
    <p>Functions are the easiest: they are usually just exported exactly
as they are.</p>

    <p>However, when transforming ASL/SAIL specifications to C, Verilog or
SMT, it is useful to <em>monomorphize</em> ASL: turning a single polymorphic
function into a family of monomorphic instances.
For example, we might transform a polymorphic floating point function
into separate instances for half, single and double precision
by creating a 16-, 32- and 64-bit instance of the function.
Depending on the application, it might be useful to export the
original polymorphic function or some of the monomorphic instances.
(See <a href="/validating-specs/">Formal validation of the Arm v8-M specification</a> for detail on
monomorphization.)</p>
  </li>
  <li>
    <p>Types are also fairly straightforward.
But sometimes a function mentions a type but does not really
depend on the definition of the type.
In this case, we could potentially import the type abstractly
which would allow the module to be used with other types that
provide the same interface.</p>
  </li>
  <li>
    <p>Constants can be imported/exported exactly as they are.
But, as with types, it can be useful to abstract them so that
modules that import them can be instantiated with different
values for the constants.</p>
  </li>
  <li>
    <p>The most tricky are variables (which typically represent registers).
Instead of importing / exporting the variable directly, it is
helpful to introduce a pair of functions: one to read the variable
and one to write to the variable.
Changing the interface to involve these access functions makes
it easier to adapt how the specification is used.
For example, in verification applications it is often useful to record
whether a variable was written to by an execution.
Or, if extending a simulator with some extra instructions,
if a new instruction accesses the processor state, we want it to
call a function in the original simulator when it performs that access.</p>

    <p>So, when importing/exporting a variable, it can be useful to create
access functions to read/write the variable and transform all
references to the variable into a call to the appropriate access function.</p>

    <p>Fortunately, ASL specs tend to already use access functions to access
variables because most registers need some special access code to implement
banking, masking, access checks, etc. So, for ASL specs, there is often not
much that has to be done.</p>
  </li>
</ul>
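<p>As a sketch of the variable case (invented names, with Python standing in for generated simulator code): once every reference goes through read/write access functions, a verification harness can interpose on writes without touching the instruction semantics at all:</p>

```python
class Register:
    """A register reachable only through its access functions."""
    def __init__(self, name, width):
        self.name, self.width, self._value = name, width, 0

    def read(self):
        return self._value

    def write(self, value):
        # Masking (and, in a real spec, banking and access checks) lives here.
        self._value = value & ((1 << self.width) - 1)

class TracedRegister(Register):
    """A verification wrapper that records every write."""
    def __init__(self, name, width):
        super().__init__(name, width)
        self.writes = []

    def write(self, value):
        self.writes.append(value)
        super().write(value)

r = TracedRegister("R0", 8)
r.write(0x1FF)        # instruction semantics call write(), never "R0 = ..."
print(hex(r.read()))  # -> 0xff (masked to 8 bits)
print(r.writes)       # -> [511] (the harness saw the raw write)
```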

<p>Each use of the specification turns out to need a slightly different module
interface.</p>

<h2 id="summary">Summary</h2>

<p>Module systems are a poor match for the way that ISA specifications
are published (embedded in a document that is 1000s of pages long).
But they are great for enabling reuse of different parts of the specification.</p>

<p>Many of the different applications of ISA specifications are based on
extracting the parts of the specification that are needed by that application.
That is, by dividing the large, monolithic ISA spec up into a number of smaller
modules in a way that is better suited to the application on hand.</p>

<p><em>[We often don’t really know what we are doing until after we have finished
doing it and try to explain it to others. I have been slicing
specifications up in the way I describe above for almost a decade now
and I have usually called it “callgraph surgery.”
It was only when I tried explaining the technique to my colleagues that
I realized that what I was really doing was introducing module boundaries
in the middle of the specification, made the connection to the work
that I did on modular operating systems back in the late ’90s,
and wrote this article.]</em></p>
There are lots of <a href="/uses-for-isa-specs/">potential uses for machine readable specifications</a>
so you would think that every major real world artifact like long-lived
hardware and software systems, protocols, languages, etc. would have
a formal specification that is used by all teams extending the design,
creating new implementations, testing/verifying the system, verifying
code that uses the system, doing security analyses or any of the other
potential uses.
But, in practice, this is usually not true: most real world systems do not
have a well tested, up to date, machine readable specification.</p>

<p>This article is about why you might want to change this and some things to
consider as you go about it.  In particular, it is about the use and creation
of machine readable specifications at scale: when the number of engineers
affected is counted in the thousands.  This sort of scale leads to different
problems and solutions than you would see in a 5–10 person project and both
the challenges and the potential benefits are significantly larger.</p>

<p>The main things I want to explain are</p>

<ul>
  <li>Why your specification language should be weak and inexpressive</li>
  <li>Don’t expect the specification to meet 100% of the needs of any
individual team but aim to meet almost all the needs of almost all of the
teams.</li>
  <li>That specifications of long lived real world systems need clear, precise mechanisms
to be explicitly vague and imprecise.</li>
  <li>Specifications must be great documentation: even as we enable the use of
specifications in tools, the primary purpose of a specification is still
communication between human beings.</li>
  <li>We create a specification by <em>distilling</em> multiple existing “sources of truth”
into a single specification and by validating the result against all of the
sources.</li>
  <li>Try to create a virtuous cycle where every additional user increases the
quality of the specification and this, in turn, attracts more users.</li>
  <li>The downside of machine readable specifications.</li>
</ul>

<h2 id="the-benefits-of-machine-readable-specifications">The benefits of machine readable specifications</h2>

<ul>
  <li>
    <p>Given all the potential uses of specifications, the most obvious benefit that
we hope for is higher quality implementations of the specification.  This is
especially important when developing systems where bugs have a major impact on
users and fixing them is hard or expensive.
Depending on when the problem is found, it can delay release (affecting time to market),
require “respins” (having to manufacture the product again), or affect reputation.
Areas with these issues include chip design and building secure systems.</p>
  </li>
  <li>
    <p>Important real world systems have lots of implementations: earlier versions
going back decades in time, different teams in a single company creating independent
implementations for different market niches, and teams at different companies creating
competing implementations.  All of these implementations have to be consistent
with each other: users depend on backwards compatibility, on compatibility across
implementations and on compatibility between the products of different companies.</p>
  </li>
  <li>
    <p>If designers start with a low quality or ambiguous specification, problems will
be found late during development by the validation team. The later the
problems are found, the more impact it has on the release date which reduces
how much of the market that product can capture and also increases the engineering costs
for that product and delays when the team can start building the next product.</p>
  </li>
  <li>
    <p>Most importantly, machine readable specifications improve communication within
the company, with any surrounding ecosystem (e.g., compiler writers, OS teams, or malware
analysis tools), and with users.
Real world systems are regularly extended with new features,
the specifications often change in the process of developing the first implementation
and those changes have to be communicated to all the other
teams adding support for the new features.
While “diffing” a paper document will catch some of those changes,
automated uses of machine readable specifications let teams
automatically update everything that they generate from the specifications.</p>
  </li>
</ul>

<h2 id="what-makes-a-good-machine-readable-specification">What makes a good machine readable specification?</h2>

<p>The benefits of having machine readable specifications come from having
many different users. Those users all use the specifications in different ways:
architects test the spec as they are building it;
RTL designers read the specification to check what they have to build;
verification teams use the spec as a <em>golden reference</em> or <em>test oracle</em>;
security analysts verify security properties of the specification;
simulation engineers transform the specifications into simulators
or feed them to JIT engines for really high performance simulators;
anti-malware teams turn the specification into tools for symbolically
simulating the specification;
technical communication teams turn the specification into accessible,
readable, well-structured documentation for users of the system;
etc.</p>

<p>And those different users all need to use the specification
in a different way:
transforming it into specifications for their favourite formal verification tool;
transforming the specification into C;
transforming the specification into Verilog;
generating tests from the specification;
measuring coverage of their tests;
performing forwards, backwards or bidirectional static analyses;
performing information flow analyses to find “surprising” information flows;
and there are many, many readers of the documentation from a variety of
backgrounds: hardware engineers, compiler engineers, OS engineers, security engineers,
etc.</p>

<h3 id="should-specification-languages-be-rich-powerful-and-concise">Should specification languages be rich, powerful and concise?</h3>

<p>This diversity of uses has a surprising impact on the choice of
specification notation:</p>

<blockquote>
  <p>Weak, inexpressive languages are significantly better for
writing specifications than rich, powerful languages.</p>
</blockquote>

<p>The reason for this counterintuitive claim is that we need to be able to
convert the specification into many different languages to meet the needs of
all the different users.  The more features the specification language has, the
more likely we are to hit problems during this conversion process and the more
likely it is that the result will be buggy, slow or inconvenient to work with.</p>

<p>Moreover, the more powerful a language is, the more likely that it will confuse
readers of the specification. Higher order functions, monads, polymorphism,
overloaded operators, and the like make the specification more concise and
readable to those familiar with those features but they cause confusion and
risk misinterpretation for everyone else.  If readers of the specification need
to read more than a few pages of documentation for the specification language
before they can read the specification
then we are in trouble.
We need all readers to be able to easily understand the specification
and to arrive at the same interpretation of the specification as every other reader
of the specification.
And if readers of the
specification ever need to consult a “language lawyer” to distinguish between
two interpretations, then we have failed.  (Note that I am specifically talking
about <em>readers</em> of the specification.  There are many more <em>readers</em> than there
are <em>writers</em> and it is reasonable to demand a bit more of the architects who
are specifying the system than of all the people who have to implement, verify,
analyze and use the system.)</p>

<p>I often say that the best specification language is a table.
It is usually obvious how to read a table;
how to check whether the table is complete (are all the boxes filled in);
how to invert a table (reading it as a mapping from outputs to inputs);
and how to convert the table to a range of different languages (e.g., as
an array, as an if-statement or as a case-statement.)
After that, restricted languages like finite state machines (ideally
specified using a table!) are great.
And there are various special purpose notations to describe things like
instruction encodings, register fields, packet formats, etc. that capture
the intended meaning in a high level way that is easy to explain,
easy to understand and easy to use in a variety of different ways.
You should only (reluctantly!) fall back on something like a programming language if
you need to specify something that is quite irregular.</p>
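<p>A toy example of why tables work so well as specifications (the table itself is invented): the same table can be checked for completeness, inverted, and mechanically converted into several code styles:</p>

```python
# A tiny specification table: condition-code encoding -> meaning.
COND = {
    0b00: "EQ",
    0b01: "NE",
    0b10: "LT",
    0b11: "GE",
}

# Completeness check: are all the boxes filled in?
assert set(COND) == set(range(4))

# Inversion: read the table as a mapping from outputs to inputs.
ENCODE = {name: code for code, name in COND.items()}

# One of several mechanical conversions: emit a C-style switch.
def to_switch(table, var):
    cases = "\n".join(f'  case {k}: return "{v}";' for k, v in table.items())
    return f"switch ({var}) {{\n{cases}\n}}"

print(ENCODE["LT"])  # -> 2
print(to_switch(COND, "cc"))
```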

<h3 id="which-use-case-should-be-prioritized">Which use case should be prioritized?</h3>

<p>Another important consequence of having many users using the specification
in different ways is that</p>

<blockquote>
  <p>Specifications (and the languages that they are written in)
are, necessarily, a <em>compromise</em> between the needs of all the
different users.
Every individual use would be better served by a different
specification in a different specification language.</p>
</blockquote>

<p>For example, if I’m verifying hardware with the Coq theorem prover, a commercial bounded model checker for Verilog,
or the <a href="https://www.cs.ox.ac.uk/tom.melham/res/forte.html">Forte symbolic simulation system</a>, it would be better to write the 
specification in Gallina, in SystemVerilog or in reFLect, respectively.
Or if I’m writing a simulator, it would be better to write the specification in
C or C++.
And so on for every other use.
The only problem is that while this would be the perfect specification for one
team, it would be hard or impossible to use by most other teams.
Instead, we aim for a good engineering compromise: one that delivers
<em>almost</em> all of the benefit to <em>almost</em> every team.</p>

<p>A consequence of this is that although we aim to use 100% of the specification to
automate some task that was previously done manually, we should accept that we may
only manage to automate 97–99% of the task.
For example, it is hard to write a specification of floating point operations that is
simultaneously clear and easy to read and also a good reference for automatic
formal verification of a high performance floating point unit or that will
give adequate performance in a simulator.
In situations like these, where the most readable possible specification is not the best for other uses, we have two
choices. One choice is that we might write two versions of the specification: one that
we want to publish in documentation and one that we want to execute, verify against or whatever.
The second choice is to hand-write code for the 1–3% of the task that we cannot automate.
In both cases, we will want to thoroughly test or formally verify that the two versions
are equivalent to each other. And we will hope that this awkward 1–3% does not take too
much effort!</p>

<p>Although it is not strictly necessary, I think that it is much easier to find
the right notation if the specification is created by a team with a central
role in the company such as a research team.
The specification could reasonably be built by the team designing new architecture,
by the team writing documentation, or by the team verifying implementations but
each team will be tempted to write a specification that perfectly suits their
needs instead of a specification that is sufficiently good for all the potential users
of the specification.
(Once the initial specification has been created and a diverse set of users
are using the specification and asking for (potentially inconsistent) improvements,
you will probably want to set up a committee to refine and extend the compromise
between their different needs.)</p>

<h3 id="explicit-underspecification">Explicit underspecification</h3>

<p>Different implementations of real world systems have slightly different behaviour
from each other.
Later implementations may use opcodes or register fields that were reserved in
early versions of the specifications;
implementations targeting different markets may have different sets of features,
different cache sizes, etc.;
and implementations may have accidental variations for a variety of reasons.
If a specification is to be useful to users of the system, it needs to describe
most of the implementations that the users are likely to come across: those implemented
10 years ago; those that will be implemented 10 years in the future; perhaps even those implemented
by rival companies.</p>

<p>It is therefore important to have ways to <em>underspecify</em> the system: leaving room
for some of this variation between implementations.
At the same time though, it is unfortunately extremely easy to accidentally write a specification
that is much looser than intended and therefore fails to give users the information that
they need to use the system correctly, securely and efficiently and that gives
implementations far more room for variety than is helpful.</p>

<blockquote>
  <p>We need mechanisms for <em>explicit underspecification</em> that allow us to be
extremely clear about which parts of the specification
are left deliberately vague.</p>
</blockquote>

<p>Being <em>explicit</em> about underspecification makes it possible to 
write tools that warn when the specification is being accidentally vague.</p>

<p>This explicit underspecification is another example of the necessary
compromise between different uses of the specification.
For programmers using the system, the underspecification makes it easier
to write portable code.
But, validation teams that are checking a particular implementation
will want to check that the implementers built what they intend to build.
Validation teams will therefore want to be able to dial in the particular
choices made by the design team.</p>

<h3 id="multiple-views-of-documentation">Multiple views of documentation</h3>

<p>Specifications must be great documentation since the overall goal is to
aid communication between many different teams of engineers.
To be able to build and validate tools that use the specification formally, engineers first
need to be able to understand the specification informally.
And, of course, many engineers will just want to read the specification
and do not want to automate their flow.</p>

<p>It is important to support multiple views of the specification such as</p>

<ul>
  <li>
    <p>Only show features that are supported by versions before or after some version number
or by some particular product.</p>
  </li>
  <li>
    <p>Show a simplified version corresponding to
how the system behaves in some specific context such as user-level
execution.</p>
  </li>
  <li>
    <p>Where there are two alternative (allegedly equivalent) versions of a piece
of the specification, select which one to show.</p>
  </li>
  <li>
    <p>Only show features that have been publicly announced or are allowed under
a specific non-disclosure agreement (NDA).</p>
  </li>
  <li>
    <p>To comply with <a href="https://en.wikipedia.org/wiki/Export_of_cryptography_from_the_United_States">export control regulations</a>
it may be necessary to omit parts of the specification.</p>
  </li>
</ul>

<p>These different views might be generated simply by omitting instructions
that are not relevant or, if the specification uses “feature tests”
like “HaveFeatureX() -&gt; bool” to test for a feature, then setting the
feature to FALSE and running a dead code elimination pass can produce
a simplified view.</p>
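<p>The feature-test approach can be sketched as follows (a toy model with invented names, not how any real spec tool works): represent guarded statements explicitly, pin the known feature tests to constants, and discard the dead branches:</p>

```python
def simplify(stmts, features):
    """Produce a simplified view of a spec: pin feature tests to
    constants and discard the statements in dead branches."""
    out = []
    for s in stmts:
        if isinstance(s, str):  # plain statement
            out.append(s)
            continue
        cond, then, alt = s     # e.g. ("HaveFeatureX()", [...], [...])
        if cond in features:    # feature pinned: keep only the live branch
            out.extend(simplify(then if features[cond] else alt, features))
        else:                   # unknown feature: keep the test, both branches
            out.append((cond, simplify(then, features), simplify(alt, features)))
    return out

spec = [
    "decode(instr)",
    ("HaveCryptoExt()", ["execute_AES(instr)"], ["raise_undefined()"]),
    "retire(instr)",
]

# View for a product without the crypto extension:
print(simplify(spec, {"HaveCryptoExt()": False}))
# -> ['decode(instr)', 'raise_undefined()', 'retire(instr)']
```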

<p>However, where the purpose of the modified view is to keep information private,
we might want to be doubly sure that the information cannot accidentally
leak out.
For example, we might use separate git branches or repositories for released and unreleased
architecture specs with a “gatekeeper” carefully checking all merges from the unreleased
branch.
There will probably be multiple gatekeepers: senior architects deciding which
extensions are ready to be implemented, product managers deciding which
extensions to put in the product they manage, and
senior documentation engineers deciding when extensions should be incorporated into the master documentation.</p>

<h2 id="how-do-we-create-specifications">How do we create specifications?</h2>

<p>When working on real world systems, the problem is not that there
is no source of truth but that there are so many of them.
For example, there are the actual products, simulators,
test suites, documentation, etc. In a large organization there might
be 10 or more such artifacts with each one capturing some important aspects
of the system but none meeting all users’ needs.</p>

<blockquote>
  <p>The task of creating a specification is to <em>distill</em> all the existing
sources of truth down into a single authoritative specification that
is higher quality than any other source of truth.</p>
</blockquote>

<p>There are two ways to take advantage of an existing source of truth.</p>

<ol>
  <li>
    <p>Every significant product (e.g., processors, simulators, etc.)
will have a large, thorough testsuite and may have a formal
verification story.
This is incredibly valuable because a good testsuite lets you
check your specification.
The “easiest” way to use this is to write a tool to convert
the specification into code that can be inserted into a
simulator.</p>
  </li>
  <li>
    <p>In a few cases, we can semi-automatically translate existing simulators
or specifications to our chosen specification language.
The goal is probably not to get a fully working specification but
just to reduce the manual effort of transcribing the specification
into the specification language.</p>

    <p>If we can also generate a simulator from the result, then we can
“round trip” the spec to check whether either conversion process
broke anything.</p>
  </li>
</ol>
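<p>The “round trip” check above can be sketched as differential testing: run the same inputs through the existing reference model and the simulator generated from the spec, and flag any divergence. Both models below are toy stand-ins (an invented 32-bit add that sets N and Z flags), not real simulator code.</p>

```python
# Hypothetical round-trip check: compare a hand-written reference model
# against a model standing in for code generated from the executable spec.
# The ADDS semantics here (32-bit add setting N and Z flags) are invented.

import random

def reference_adds(a, b):
    """Reference model (standing in for an existing trusted simulator)."""
    result = (a + b) & 0xFFFFFFFF
    return result, result >> 31, int(result == 0)

def generated_adds(a, b):
    """Model standing in for code generated from the executable spec."""
    result = (a + b) % (1 << 32)
    return result, (result >> 31) & 1, 1 if result == 0 else 0

random.seed(0)
for _ in range(10_000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    assert reference_adds(a, b) == generated_adds(a, b), (a, b)
print("no divergence on 10,000 random inputs")
```

<p>In practice the interesting divergences come from the corners (carries, flags, traps), so an existing hand-written testsuite is usually a better input source than pure random generation.</p>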

<p>It is inevitable that the initial specification and tools will not be very
good: the specification will be a bit buggy and incomplete, and the tools will generate
relatively slow, buggy simulators.  However, once it becomes good enough that it
is useful to one (relatively forgiving) team, that team will start to find issues and
will start to report and fix the bugs.
As problems are fixed, the quality of the specification and tools will improve to the point
where a second wave of users are able to use the specification. These teams are
typically more sensitive to quality problems and they will find new bugs that
were unlikely to affect the early adopters. Fixing this second wave of bugs
will further improve the specification which will enable a third wave of users
with even higher quality requirements.</p>

<blockquote>
  <p>Aim to create a “virtuous cycle” of users by looking for users with a broad
variety of needs.</p>
</blockquote>

<p>Initially, our goal is simply to catch up: just creating high quality
specifications of what exists (or is due to appear in the next generation
of products) and trying to automatically generate as much as possible
of the existing tools and verification collateral.
Once we have caught up though, it is important to sustain the effort.
New architecture extensions should be developed using the specification
as a design tool: taking advantage of the ability to easily add new extensions
into simulators, to generate high quality documentation at an early stage, etc.</p>

<h2 id="the-downside-of-machine-readable-specifications">The downside of machine readable specifications</h2>

<p>In many ways, having a high quality, authoritative, human readable,
machine readable specification that is used by all major users of existing
documentation is a no-brainer.
It takes work to set it up and to meet everybody’s requirements and it takes
some coordination effort to maintain it in a usable state for all users but,
if you can do that, it saves a lot of effort, improves coordination and communication
both within the company and outside, and it leads to higher quality products.</p>

<p>However, there is one important disadvantage of removing all the redundant work
that goes on at the moment: that redundancy sometimes catches mistakes in the
specification. This is especially true because the engineers currently
transcribing the documentation into Verilog, C++, test vectors, compilers,
OSes, etc. are experts with many years of experience.  As we automate the more
repetitive parts of the task, we reduce the number of expert eyeballs looking
at the specification and the number of different perspectives from which it is
being viewed.</p>

<blockquote>
  <p>The cost of success is that you end up with “all your eggs in one basket.”</p>
</blockquote>

<p>To overcome this, we need to re-introduce some redundancy (i.e., add another “basket”)
but we want to do this in a way that is most likely to find problems.
So, as the number of automated uses of the specification increases, it becomes
important to start formally validating the specification itself: writing down
properties that the current design and any extensions are expected to satisfy
and creating tools to verify that they do, indeed, hold.</p>
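<p>As a small (hypothetical) example of such a property, consider checking that instruction decode is unambiguous: no two encoding patterns overlap, so every encoding maps to at most one instruction (anything unmatched is explicitly UNDEFINED). The 8-bit toy encodings below are invented for illustration.</p>

```python
# Hypothetical property check over a toy decode table: assert that no
# encoding matches more than one pattern. The 8-bit (mask, value) pairs
# below are invented; a real tool would extract them from the spec.

PATTERNS = {                 # (mask, value) -> mnemonic
    (0b11110000, 0b00000000): "ADD",
    (0b11110000, 0b00010000): "SUB",
    (0b11000000, 0b01000000): "LOAD",
    (0b11000000, 0b10000000): "STORE",
}

def decode(enc):
    """Return every mnemonic whose pattern matches this encoding."""
    return [name for (mask, val), name in PATTERNS.items() if enc & mask == val]

# Exhaustively check the property over the whole (tiny) encoding space.
for enc in range(256):
    matches = decode(enc)
    assert len(matches) <= 1, f"ambiguous encoding {enc:#04x}: {matches}"
print("decode is unambiguous over all 256 encodings")
```

<p>For a real ISA the encoding space is too large to enumerate, so the same property would be discharged with a SAT/SMT solver rather than a loop, but the statement of the property is unchanged.</p>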

<h3 id="related-posts-and-papers">Related posts and papers</h3>

<ul>
  <li>Code: <a href="https://github.com/alastairreid/asl-interpreter">ASLi</a></li>
  <li>Code: <a href="https://github.com/alastairreid/mra_tools">MRA Tools</a></li>
  <li>Paper: <a href="https://alastairreid.github.io/papers/CAV_16/">End-to-End Verification of ARM Processors with ISA-Formal</a>, CAV 2016.</li>
  <li>Paper: <a href="https://alastairreid.github.io/papers/FMCAD_16/">Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture</a>, FMCAD 2016.</li>
  <li>Paper: <a href="https://alastairreid.github.io/papers/OOPSLA_17/">Who guards the guards?  Formal Validation of the Arm v8-M Architecture Specification</a>, OOPSLA 2017.</li>
  <li><a href="/using-armarm/">Verifying against the official ARM specification</a></li>
  <li><a href="/finding-bugs/">Finding Bugs versus Proving Absence of Bugs</a></li>
  <li><a href="/isa-formal-limitations/">Limitations of ISA-Formal</a></li>
  <li><a href="/specification_languages/">ARM’s ASL Specification Language</a></li>
  <li><a href="/ARM-v8a-xml-release/">ARM Releases Machine Readable Architecture Specification</a></li>
  <li><a href="/dissecting-ARM-MRA/">Dissecting the ARM Machine Readable Architecture files</a></li>
  <li><a href="/asl-lexical-syntax/">ASL Lexical Syntax</a></li>
  <li><a href="/arm-v8_3/">Arm v8.3 Machine Readable Specifications</a></li>
  <li><a href="/natural-specs/">Are Natural Language Specifications Useful?</a></li>
  <li><a href="/validating-specs/">Formal validation of the Arm v8-M specification</a></li>
  <li><a href="/bidirectional-assemblers/">Bidirectional ARM Assembly Syntax Specifications</a></li>
  <li>Talk: <a href="/talks/using-arm-specs-34C3-2017-12-27.pdf">How can you trust formally verified software (pdf)</a>, Chaos Communication Congress, 2017.</li>
  <li><a href="/mrs-at-scale/">Using ASLi with Arm’s v8.6-A ISA specification</a></li>
  <li><a href="/uses-for-isa-specs/">What can you do with an ISA specification?</a></li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[There are lots of potential uses for machine readable specifications so you would think that every major real world artifact like long-lived hardware and software systems, protocols, languages, etc. would have a formal specification that is used by all teams extending the design, creating new implementations, testing/verifying the system, verifying code that uses the system, doing security analyses or any of the other potential uses. But, in practice, this is usually not true: most real world systems do not have a well tested, up to date, machine readable specification.]]></summary></entry><entry><title type="html">Joining Intel</title><link href="https://alastairreid.github.io/joining-intel/" rel="alternate" type="text/html" title="Joining Intel" /><published>2021-12-01T00:00:00+00:00</published><updated>2021-12-01T00:00:00+00:00</updated><id>https://alastairreid.github.io/joining-intel</id><content type="html" xml:base="https://alastairreid.github.io/joining-intel/"><![CDATA[<p><img src="/images/Intel_logo_2020.svg" alt="Intel logo" style="float: left; width: 13%; padding: 1%" />
Today I am joining Intel Strategic CAD Labs to work on formal specifications.
There’s not much to say so far because I’ve only just started but
I think it’s going to be a lot of fun (and a lot of work).
I figure that a lot of the first month or two will be spent figuring out what
people want to do with formal specifications so, if you think that this will be
relevant to you, please get in touch.
I am equally interested in uses that are on my list of <a href="/uses-for-isa-specs/">uses for ISA specifications</a>
and uses that I haven’t even thought of.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Today I am joining Intel Strategic CAD Labs to work on formal specifications. There’s not much to say so far because I’ve only just started but I think it’s going to be a lot of fun (and a lot of work). I figure that a lot of the first month or two will be spent figuring out what people want to do with formal specifications so, if you think that this will be relevant to you, please get in touch. I am equally interested in uses that are on my list of uses for ISA specifications and uses that I haven’t even thought of.]]></summary></entry><entry><title type="html">Farewell to Google</title><link href="https://alastairreid.github.io/farewell-to-google/" rel="alternate" type="text/html" title="Farewell to Google" /><published>2021-11-30T00:00:00+00:00</published><updated>2021-11-30T00:00:00+00:00</updated><id>https://alastairreid.github.io/farewell-to-google</id><content type="html" xml:base="https://alastairreid.github.io/farewell-to-google/"><![CDATA[<p><img src="/images/Google__G__Logo.svg.png" alt="Google logo" style="float: left; width: 10%; padding: 1%" /> I’ve spent the
last couple of years working at Google Research.  For the first 5–6 months, I
was working in London in an office nestled between  King’s Cross station and
the iconic St Pancras station.  This location was ideal for me because I was
still living in Cambridge at the time and it was a very easy journey to
either of these stations from Cambridge.</p>

<p>I started off working on privacy and security in operating systems. This led to
fun experiments with
UW’s impressive <a href="https://unsat.cs.washington.edu/projects/serval/">Serval</a>
that automatically verifies machine code
and Bart Jacobs’ amazing <a href="https://github.com/verifast/verifast">VeriFast</a> tool
whose user interface transformed my thinking about <a href="https://alastairreid.github.io/RelatedWork/notes/auto-active-verification/">auto-active verification</a>.</p>

<p>After that, I led the <a href="https://github.com/project-oak/rust-verification-tools/">Rust verification project</a> at Google: a project focused on usability
of formal verification.
I wrote surveys of the available Rust verification tools in
<a href="/rust-verification-tools/">2020</a>
and
<a href="/automatic-rust-verification-tools-2021/">2021</a>;
I used a <a href="/research-risks/">risk based approach to research planning</a>;
I collaborated with some usability researchers on <a href="/papers/HATRA_20/">a paper about “making formal normal”</a>;
and I used <a href="https://klee.github.io">KLEE</a> to understand some of the infrastructure barriers
to using verification tools on real Rust code.
For example, I found and filled gaps in language feature support, library support, runtime/linker support, etc.
(See the workshop presentation for details
<a href="/talks/using-KLEE-with-Rust-2021-07-11.pdf">[pdf]</a>
<a href="https://youtu.be/zR7oDg7zix0">[video]</a>.)
With these extensions in hand, I was able to try to run verification tools on
large, complex, real-world Rust code such as
“Rust for Linux” <a href="/talks/Kangrejos-2021-09-13.pdf">[pdf]</a>
and <a href="https://github.com/project-oak/rust-verification-tools/2021/07/14/coreutils.html">the Rust rewrite of CoreUtils</a>
to understand the remaining issues.</p>

<p>For the last 2–3 months, I have been working on a hardware-software codesign project.
These are always a lot of fun because you start with a huge design space and then gradually narrow
down what parts of the task are best solved by the programmer, the compiler, the runtime and libraries, or the hardware.
A critical part of these projects is about enabling and encouraging the right kind
of collaboration between subteams with radically different expertise: hardware, software,
tools, systems, etc.
So I built a performance modeling tool to enable software and hardware engineers to
find the right design compromise: discovering the performance bottlenecks in the code
and exploring different microarchitecture and vectorization choices to overcome them.</p>

<p>I’ve learned a lot of new things over the last two years but
I’ve decided that it is time to move on.
So, today I bid my colleagues farewell and leave Google Research.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I’ve spent the last couple of years working at Google Research. For the first 5–6 months, I was working in London in an office nestled between King’s Cross station and the iconic St Pancras station. This location was ideal for me because I was still living in Cambridge at the time and it was a very easy journey to either of these stations from Cambridge.]]></summary></entry><entry><title type="html">What can you do with an ISA specification?</title><link href="https://alastairreid.github.io/uses-for-isa-specs/" rel="alternate" type="text/html" title="What can you do with an ISA specification?" /><published>2021-11-24T00:00:00+00:00</published><updated>2021-11-24T00:00:00+00:00</updated><id>https://alastairreid.github.io/uses-for-isa-specs</id><content type="html" xml:base="https://alastairreid.github.io/uses-for-isa-specs/"><![CDATA[<p>ISA specifications describe the behaviour of a processor: the instructions,
memory protection, the privilege mechanisms, debug mechanisms, etc.
The traditional form of an ISA specification is as a paper document but,
as ISAs have grown, this has become unwieldy.
More importantly though, there are more and more
potential uses for machine readable, mechanized, executable ISA specifications.</p>

<ul id="markdown-toc">
  <li><a href="#documentation" id="markdown-toc-documentation">Documentation</a></li>
  <li><a href="#architecture-design" id="markdown-toc-architecture-design">Architecture design</a>    <ul>
      <li><a href="#generating-simulators" id="markdown-toc-generating-simulators">Generating simulators</a></li>
      <li><a href="#testing-architecture-design" id="markdown-toc-testing-architecture-design">Testing architecture design</a></li>
      <li><a href="#automatic-generation-of-test-cases" id="markdown-toc-automatic-generation-of-test-cases">Automatic generation of test cases</a></li>
      <li><a href="#verifying-architecture-design" id="markdown-toc-verifying-architecture-design">Verifying architecture design</a></li>
    </ul>
  </li>
  <li><a href="#verifying-processor-pipelines" id="markdown-toc-verifying-processor-pipelines">Verifying processor pipelines</a></li>
  <li><a href="#compilers" id="markdown-toc-compilers">Compilers</a>    <ul>
      <li><a href="#compiler-generators" id="markdown-toc-compiler-generators">Compiler generators</a></li>
      <li><a href="#discovery-verification-and-synthesis-of-peephole-optimisations" id="markdown-toc-discovery-verification-and-synthesis-of-peephole-optimisations">Discovery, verification and synthesis of peephole optimisations</a></li>
      <li><a href="#verifying-compilers" id="markdown-toc-verifying-compilers">Verifying compilers</a></li>
    </ul>
  </li>
  <li><a href="#software-security" id="markdown-toc-software-security">Software security</a></li>
  <li><a href="#verifying-software" id="markdown-toc-verifying-software">Verifying software</a></li>
</ul>

<h2 id="documentation">Documentation</h2>

<p>The most obvious purpose of an ISA specification is as documentation.
An early formal notation is 
Bell and Newell’s “Instruction Set Processor” (ISP) notation [<a href="https://alastairreid.github.io/RelatedWork/papers/bell:afips:1970/">bell:afips:1970</a>]
that was used to write specifications for 14 systems including the PDP-8, PDP-11 and CDC 6600.
ISP followed in the Algol language tradition and is similar
to the less formal pseudocode notations typically used in ISA definitions in
the present day.
ISP was used during design of the PDP-11 and
included in the manufacturer’s processor handbook [<a href="https://alastairreid.github.io/RelatedWork/papers/pdp11:book:1973/">pdp11:book:1973</a>].</p>

<p>Here is a fragment of a specification of the CDC 6600
[<a href="https://alastairreid.github.io/RelatedWork/papers/bell:afips:1970/">bell:afips:1970</a>]
that shows how it compactly
describes assembly syntax, instruction encoding and semantics.</p>

<p><img src="/images/CDC_6600_ISP.png" alt="CDC 6600 specification" title="CDC 6600 specification in ISP" /></p>

<p>An even earlier ISA specification was Falkoff et al.’s
use of APL to describe the IBM System/360 [<a href="https://alastairreid.github.io/RelatedWork/papers/falkoff:ibm:1964/">falkoff:ibm:1964</a>].
However, given the novelty and unfamiliarity of APL at the time, it is not clear that the
primary goal was documentation.</p>

<p><img src="/images/IBM_360_APL.jpg" alt="System/360 specification" title="System/360 specification in APL" /></p>

<h2 id="architecture-design">Architecture design</h2>

<p>Although it is common for ISA specifications to be written or updated <em>after</em>
the architecture has been designed or extended, a complete executable
specification can be a useful aid to architects as they are developing or
extending an architecture both by providing a clearer language (than natural
language) for expressing their thoughts and by allowing architects to test that
the changes behave as intended.</p>

<p>In addition,
having the architects themselves write and test the changes to
the specification
simplifies the process of developing and maintaining ISA
specifications and
avoids the effort and errors associated with transcribing
natural language documents and / or C++ simulators into some specification
language.</p>

<h3 id="generating-simulators">Generating simulators</h3>

<p>Shi’s Simlight simulator [<a href="https://alastairreid.github.io/RelatedWork/papers/shi:phd:2013/">shi:phd:2013</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/joloboff:dsetta:2015/">joloboff:dsetta:2015</a>]
was based on parsing Arm’s reference
manuals; Fox’s MIPS specification [<a href="https://alastairreid.github.io/RelatedWork/papers/fox:itps:2015/">fox:itps:2015</a>], written in L3, has been
shown to boot FreeBSD; and my own work within Arm [<a href="https://alastairreid.github.io/RelatedWork/papers/reid:fmcad:2016/">reid:fmcad:2016</a>] was later
shown to be able to boot Linux.
An especially interesting approach is the  automatic generation of binary translations between architectures [<a href="https://alastairreid.github.io/RelatedWork/papers/bansal:osdi:2008/">bansal:osdi:2008</a>].</p>

<p>Simulators vary significantly in performance: from around 10kHz (for an unoptimized
interpreter modelling full address translation on every instruction fetch) to
500MHz for metatracing simulators like Pydgin [<a href="https://alastairreid.github.io/RelatedWork/papers/lockhart:ispass:2015/">lockhart:ispass:2015</a>].</p>

<h3 id="testing-architecture-design">Testing architecture design</h3>

<p>ISA specifications are as prone to bugs as any other software of similar size
and complexity.  Fonseca et al.’s empirical study of the correctness of
formally verified systems found bugs in specifications [<a href="https://alastairreid.github.io/RelatedWork/papers/fonseca:esc:2017/">fonseca:esc:2017</a>].
So, before an ISA specification can be considered trustworthy, it must be
tested or verified against
an accurate oracle [<a href="https://alastairreid.github.io/RelatedWork/papers/barr:tse:2015/">barr:tse:2015</a>] (usually hardware).</p>

<p>Different specification development efforts vary significantly in how much
testing they perform: from a few 10s of tests per instruction to executing billions of instructions and booting OSes
[<a href="https://alastairreid.github.io/RelatedWork/papers/fox:itp:2010/">fox:itp:2010</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/goel:fmcad:2014/">goel:fmcad:2014</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/flur:popl:2016/">flur:popl:2016</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/shi:phd:2013/">shi:phd:2013</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/reid:fmcad:2016/">reid:fmcad:2016</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/armstrong:popl19:2019/">armstrong:popl:2019</a>].</p>

<p>Formal verification of a processor against
a specification
[<a href="https://alastairreid.github.io/RelatedWork/papers/hunt:jar:1989/">hunt:jar:1989</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/fox:ucam:2002/">fox:ucam:2002</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/reid:cav:2016/">reid:cav:2016</a>]
has the desirable
side-effect of detecting bugs in the specification and ensuring compatibility.</p>

<h3 id="automatic-generation-of-test-cases">Automatic generation of test cases</h3>

<p>Building a good testsuite is very laborious and error-prone but some of
the effort can be avoided by automatically generating test cases
[<a href="https://alastairreid.github.io/RelatedWork/papers/martignoni:asplos:2012/">martignoni:asplos:2012</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/godefroid:acmq:2012/">godefroid:acmq:2012</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/campbell:fmics:2014/">campbell:fmics:2014</a>].</p>

<p>In my experience though, most commonly used test generation techniques focus on
achieving consistent levels of control-coverage (i.e., they focus on control
flow graphs) and they are relatively weak at achieving consistent value
coverage.  In my adaptation of the concolic testcase generation technique
described in [<a href="https://alastairreid.github.io/RelatedWork/papers/martignoni:asplos:2012/">martignoni:asplos:2012</a>], I was happy with the tests generated
for instructions that set condition flags (e.g., ADD).  Unfortunately, for
instructions with just one control path such as signed multiply (SMUL), just
one test would be generated when even the weakest hand-written testsuite would
test for all combinations of positive, zero, and negative operands and would
test for various overflow conditions.  I feel that we still have more to learn
here.</p>
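<p>The value-coverage style of hand-written testsuite described above can be sketched mechanically: enumerate sign combinations and boundary values rather than relying on path coverage. The 32-bit SMUL semantics and the choice of “interesting” values below are assumptions for illustration.</p>

```python
# Hedged sketch of value-coverage test generation for a single-path
# instruction: cross all combinations of negative/zero/positive operands
# plus the two's-complement extremes. The SMUL semantics are assumed.

def smul32(a, b):
    """Signed 32x32 multiply, keeping the low 32 bits (two's complement)."""
    product = (a * b) & 0xFFFFFFFF
    return product - (1 << 32) if product >= (1 << 31) else product

INTERESTING = [-(1 << 31), -1, 0, 1, (1 << 31) - 1]   # signs + extremes

tests = [(a, b) for a in INTERESTING for b in INTERESTING]
for a, b in tests:
    got = smul32(a, b)
    # Reference semantics: wrap the exact product into signed 32-bit range.
    want = ((a * b + (1 << 31)) % (1 << 32)) - (1 << 31)
    assert got == want, (a, b, got, want)
print(f"{len(tests)} value-coverage tests passed")
```

<p>Even this tiny cross product exercises the overflow and sign cases that a purely control-flow-driven generator would collapse into a single test.</p>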

<h3 id="verifying-architecture-design">Verifying architecture design</h3>

<p>We can use testing and verification to check that a specification matches
existing implementations of an ISA.  But we hit a chicken-and-egg situation
when we want to check extensions to the specification: testsuites and
processors are created <em>after</em> the specification is written so they cannot be
used to test the specification as it is being written.</p>

<p>The solution used in [<a href="https://alastairreid.github.io/RelatedWork/papers/reid:oopsla:2017/">reid:oopsla:2017</a>] and [<a href="https://alastairreid.github.io/RelatedWork/papers/bauereiss:ucam:2021/">bauereiss:ucam:2021</a>] is to
identify and formally verify important properties that the architecture must
satisfy if it is to achieve its purpose and is not to break existing properties
of the architecture. Often, the most important things to verify are security properties.</p>
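<p>A toy illustration of such a security property, stated against a spec-style register update function: “no write performed in user mode can set the privileged bit of STATUS”. The register layout (bit 0 = privileged) and the writable mask below are invented, not taken from any real architecture.</p>

```python
# Hedged sketch: exhaustively check a security property of a toy
# spec-style update function for an 8-bit STATUS register. The layout
# (bit 0 = privileged, bits 1-4 user-writable flags) is an assumption.

PRIV_BIT = 1 << 0
USER_WRITABLE_MASK = 0b11110   # flags etc.; excludes the privileged bit

def write_status(old, value, privileged):
    """Spec-style update: privileged code may write all bits, user code few."""
    mask = 0xFF if privileged else USER_WRITABLE_MASK
    return (old & ~mask) | (value & mask)

# Property: user-mode writes never raise the privileged bit.
for old in range(256):
    for value in range(256):
        new = write_status(old, value, privileged=False)
        assert (new & PRIV_BIT) <= (old & PRIV_BIT), \
            f"user-mode write escalated privilege: old={old:#x} value={value:#x}"
print("property holds for all 65536 user-mode writes")
```

<p>The value of writing the property down is that it must keep holding as extensions add new STATUS bits, which is exactly when such invariants tend to break.</p>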

<h2 id="verifying-processor-pipelines">Verifying processor pipelines</h2>

<p>With processor complexity rising (an inevitable result of both commercial pressures and the end of Moore’s law), formal verification of processors is increasingly important.
Some processors that have been formally verified against their ISA specification include
FM8502  [<a href="https://alastairreid.github.io/RelatedWork/papers/hunt:jar:1989/">hunt:jar:1989</a>],
ARM6  [<a href="https://alastairreid.github.io/RelatedWork/papers/fox:ucam:2002/">fox:ucam:2002</a>],
DLX  [<a href="https://alastairreid.github.io/RelatedWork/papers/beyer:ijsttt:2006/">beyer:ijsttt:2006</a>],
five Arm processors [<a href="https://alastairreid.github.io/RelatedWork/papers/reid:cav:2016/">reid:cav:2016</a>],
Y86-64 [<a href="https://alastairreid.github.io/RelatedWork/papers/bryant:cmu:2018/">bryant:cmu:2018</a>],
Silver [<a href="https://alastairreid.github.io/RelatedWork/papers/loow:pldi:2019/">loow:pldi:2019</a>],
and
x86 [<a href="https://alastairreid.github.io/RelatedWork/papers/goel:spisa:2019/">goel:spisa:2019</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/goel:cpp:2020/">goel:cpp:2020</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/goel:cav:2021/">goel:cav:2021</a>].</p>

<h2 id="compilers">Compilers</h2>

<h3 id="compiler-generators">Compiler generators</h3>

<p>Barbacci developed Bell and Newell’s ISP notation into a machine readable
notation “Instruction Set Processor Semantics”  (ISPS)
[<a href="https://alastairreid.github.io/RelatedWork/papers/barbacci2:computer:1973/">barbacci2:computer:1973</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/barbacci:ieee:1981/">barbacci:ieee:1981</a>] that targets compiler-related
uses such as the automatic derivation of compiler code
generators:
Fraser [<a href="https://alastairreid.github.io/RelatedWork/papers/fraser:sigart:1977/">fraser:sigart:1977</a>]
used ISP specifications of the IBM-360 and PDP-10,
and Cattell [<a href="https://alastairreid.github.io/RelatedWork/papers/cattell:toplas:1980/">cattell:toplas:1980</a>]
used ISPS specifications
of the IBM-360, PDP-8, PDP-10, PDP-11, Intel 8080, and Motorola 6800.</p>

<p>This topic seems to have largely died off until instruction selection in SLED
[<a href="https://alastairreid.github.io/RelatedWork/papers/dias:popl:2010/">dias:popl:2010</a>].</p>

<h3 id="discovery-verification-and-synthesis-of-peephole-optimisations">Discovery, verification and synthesis of peephole optimisations</h3>

<p>One part of compilation that is especially well suited to
automation is the discovery / generation of peephole optimizations
using “superoptimization” [<a href="https://alastairreid.github.io/RelatedWork/papers/massalin:asplos:1987/">massalin:asplos:1987</a>] (an exhaustive search).
For example, 
Bansal’s superoptimizer [<a href="https://alastairreid.github.io/RelatedWork/papers/bansal:asplos:2006/">bansal:asplos:2006</a>],
Denali [<a href="https://alastairreid.github.io/RelatedWork/papers/joshi:pldi:2002/">joshi:pldi:2002</a>], and
Souper [<a href="https://alastairreid.github.io/RelatedWork/papers/mukherjee:oopsla:2020/">mukherjee:oopsla:2020</a>].
Where peephole optimizations are discovered and implemented manually,
tools like Alive [<a href="https://alastairreid.github.io/RelatedWork/papers/lopes:pldi:2015/">lopes:pldi:2015</a>] can be used to verify
that the optimizations are correct.</p>
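<p>The exhaustive-search idea behind superoptimization can be shown in a few lines: search for the shortest straight-line sequence over a tiny ISA that computes a target function, checking equivalence on all inputs. The three-instruction 8-bit ISA below is invented for illustration.</p>

```python
# Minimal superoptimizer sketch in the spirit of exhaustive search:
# find the shortest instruction sequence over a toy 8-bit ISA that
# matches a target function on all 256 inputs. The ISA is invented.

from itertools import product

OPS = {
    "NOT": lambda x: (~x) & 0xFF,     # bitwise complement
    "INC": lambda x: (x + 1) & 0xFF,  # increment
    "SHL": lambda x: (x << 1) & 0xFF, # shift left
}

def run(seq, x):
    for op in seq:
        x = OPS[op](x)
    return x

def superoptimize(target, max_len=3):
    """Return the shortest sequence equivalent to target, or None."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            if all(run(seq, x) == target(x) for x in range(256)):
                return seq
    return None

# Classic identity: -x == ~x + 1, so negation should be found as NOT;INC.
print(superoptimize(lambda x: (-x) & 0xFF))
```

<p>Real superoptimizers replace the brute-force equivalence check with an SMT query, which is where a formal ISA specification of the instruction semantics comes in.</p>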

<h3 id="verifying-compilers">Verifying compilers</h3>

<p>Some of the earliest uses of formal semantics were for automatic reasoning
about programs such as Samet’s development of Translation
Validation [<a href="https://alastairreid.github.io/RelatedWork/papers/samet:ieeetse:1977/">samet:ieeetse:1977</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/samet:phd:1975/">samet:phd:1975</a>]
(later reinvented and refined by Pnuelli [<a href="https://alastairreid.github.io/RelatedWork/papers/pnueli:tacas:1998/">pnueli:tacas:1998</a>]
and Necula [<a href="https://alastairreid.github.io/RelatedWork/papers/necula:popl:1997/">necula:popl:1997</a>]).</p>

<p>Verifying a simple “compiler” is now often part of masters- or doctoral-level courses
on using interactive theorem provers;
some more complete compiler verifications include:</p>

<ul>
  <li>CompCert C compiler [<a href="https://alastairreid.github.io/RelatedWork/papers/leroy:cacm:2009">leroy:cacm:2009</a>],</li>
  <li>LISP compiler [<a href="https://alastairreid.github.io/RelatedWork/papers/myreen:tphols:2009/">myreen:tphols:2009</a>],</li>
  <li>JIT compiler for LISP [<a href="https://alastairreid.github.io/RelatedWork/papers/myreen:popl:2010/">myreen:popl:2010</a>],</li>
  <li>Milawa theorem prover [<a href="https://alastairreid.github.io/RelatedWork/papers/myreen:itp:2011/">myreen:itp:2011</a>],</li>
  <li>CakeML compiler
[<a href="https://alastairreid.github.io/RelatedWork/papers/kumar:popl:2014/">kumar:popl:2014</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/fox:cpp:2017/">fox:cpp:2017</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/tan:icfp:2016/">tan:icfp:2016</a>], and</li>
  <li>LLVM C compiler [<a href="https://alastairreid.github.io/RelatedWork/papers/lopes:pldi:2021/">lopes:pldi:2021</a>].</li>
</ul>

<h2 id="software-security">Software security</h2>

<p>Both “white-hat” and “black-hat” security engineers analyze binaries
to find vulnerabilities and to construct signatures for detecting malware.</p>

<p>Some of the <a href="https://alastairreid.github.io/RelatedWork/notes/binary-analysis/">binary analysis</a> tools used are</p>

<ul>
  <li>
    <p>The <a href="https://alastairreid.github.io/RelatedWork/notes/mayhem/">Mayhem</a> <a href="https://alastairreid.github.io/RelatedWork/notes/automatic-exploit-generation/">automatic exploit generation</a> tool [<a href="https://alastairreid.github.io/RelatedWork/papers/cha:sandp:2012/">cha:sandp:2012</a>]
based on CMU’s <a href="https://alastairreid.github.io/RelatedWork/notes/bap-tool/">BAP binary analysis platform</a> [<a href="https://alastairreid.github.io/RelatedWork/papers/brumley:cav:2011/">brumley:cav:2011</a>].</p>
  </li>
  <li>
    <p>The <a href="https://alastairreid.github.io/RelatedWork/notes/angr-verifier/">angr</a> tool [<a href="https://alastairreid.github.io/RelatedWork/papers/shoshitaishvili:sp:2016/">shoshitaishvili:sp:2016</a>] that uses <a href="https://alastairreid.github.io/RelatedWork/notes/valgrind/">valgrind</a> to
convert binary code to the VEX intermediate representation that is then
symbolically executed.</p>
  </li>
  <li>
    <p>The <a href="https://alastairreid.github.io/RelatedWork/notes/mcsema-tool/">McSema</a> <a href="https://alastairreid.github.io/RelatedWork/notes/binary-lifter/">binary lifter</a> that uses the <a href="https://alastairreid.github.io/RelatedWork/notes/remill-library/">Remill library</a> and IDA Pro to convert
(“lift”) binary machine code into <a href="https://alastairreid.github.io/RelatedWork/notes/LLVM-compiler/">LLVM</a> code.</p>
  </li>
  <li>
    <p>The Binsec/Rel relational symbolic execution that checks that binary
code is constant time [<a href="https://alastairreid.github.io/RelatedWork/papers/daniel:sandp:2020/">daniel:sandp:2020</a>].</p>
  </li>
  <li>
    <p>[<a href="https://alastairreid.github.io/RelatedWork/papers/mycroft:esop:1999/">mycroft:esop:1999</a>] and [<a href="https://alastairreid.github.io/RelatedWork/papers/noonan:pldi:2016/">noonan:pldi:2016</a>] describe how to decompile machine
code to higher-level languages.</p>
  </li>
  <li>
    <p>The <a href="https://alastairreid.github.io/RelatedWork/notes/serval-tool/">Serval</a> <a href="https://alastairreid.github.io/RelatedWork/notes/symbolic-evaluation/">symbolic evaluation</a> tool is used to verify machine code microkernels,
LLVM code and Linux BPF code [<a href="https://alastairreid.github.io/RelatedWork/papers/nelson:sosp:2019/">nelson:sosp:2019</a>].</p>
  </li>
  <li>
    <p>[<a href="https://alastairreid.github.io/RelatedWork/papers/dasgupta:pldi:2020/">dasgupta:pldi:2020</a>] and [<a href="https://alastairreid.github.io/RelatedWork/papers/hendrix:itp:2019/">hendrix:itp:2019</a>] describe the generation
and checking of binary lifters.</p>
  </li>
  <li>
    <p>[<a href="https://alastairreid.github.io/RelatedWork/papers/regehr:emsoft:2003/">regehr:emsoft:2003</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/regehr:asplos:2004/">regehr:asplos:2004</a>,
<a href="https://alastairreid.github.io/RelatedWork/papers/regehr:lctes:2006/">regehr:lctes:2006</a>]
describe the automatic generation and use of abstract transfer
functions to analyze binary programs.</p>
  </li>
  <li>
    <p>[<a href="https://alastairreid.github.io/RelatedWork/papers/shoshitaishvili:sp:2016/">shoshitaishvili:sp:2016</a>] is a recent(ish) survey of offensive techniques.</p>
  </li>
</ul>

<p>With the exception of [<a href="https://alastairreid.github.io/RelatedWork/papers/dasgupta:pldi:2020/">dasgupta:pldi:2020</a>], none of these currently use
formal ISA specifications.  However, as the arms race between attackers and
defenders hots up, there is an increasing need for the completeness and
trustworthiness that a full formal ISA spec provides.</p>

<h2 id="verifying-software">Verifying software</h2>

<p>Last but not least, ISA specifications are used to verify software.</p>

<ul>
  <li>Windows hypervisor code [<a href="https://alastairreid.github.io/RelatedWork/papers/maus:amast:2008/">maus:amast:2008</a>].</li>
  <li>Hoare logic for machine code [<a href="https://alastairreid.github.io/RelatedWork/papers/myreen:fse:2007/">myreen:fse:2007</a>].</li>
  <li>Separation logic for machine code [<a href="https://alastairreid.github.io/RelatedWork/papers/jensen:popl:2013/">jensen:popl:2013</a>].</li>
  <li>Generating abstract interpreters for machine code [<a href="https://alastairreid.github.io/RelatedWork/papers/lim:toplas:2013/">lim:toplas:2013</a>].</li>
  <li>Parts of an operating system [<a href="https://alastairreid.github.io/RelatedWork/papers/goel:fmcad:2014/">goel:fmcad:2014</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/goel:phd:2016/">goel:phd:2016</a>].</li>
  <li>The seL4 capability-based microkernel [<a href="https://alastairreid.github.io/RelatedWork/papers/klein:sosp:2009/">klein:sosp:2009</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/sewell:pldi:2013/">sewell:pldi:2013</a>].</li>
  <li>A tiny Arm hypervisor [<a href="https://alastairreid.github.io/RelatedWork/papers/dam:trusted:2013/">dam:trusted:2013</a>, <a href="https://alastairreid.github.io/RelatedWork/papers/baumann:eucnc:2016/">baumann:eucnc:2016</a>].</li>
  <li>The Vale tool for writing verified assembly language [<a href="https://alastairreid.github.io/RelatedWork/papers/bond:usenix:2017/">bond:usenix:2017</a>].</li>
  <li>Verification in the presence of TLBs [<a href="https://alastairreid.github.io/RelatedWork/papers/syeda:itp:2018/">syeda:itp:2018</a>] (see also [<a href="https://alastairreid.github.io/RelatedWork/papers/simner:pls:2020/">simner:pls:2020</a>]).</li>
  <li>A symbolically executable hypervisor [<a href="https://alastairreid.github.io/RelatedWork/papers/nordholz:eurosys:2020/">nordholz:eurosys:2020</a>].</li>
  <li>The Isla symbolic execution tool based on Sail [<a href="https://alastairreid.github.io/RelatedWork/papers/armstrong:cav:2021/">armstrong:cav:2021</a>].</li>
  <li>The SeKVM hypervisor [<a href="https://alastairreid.github.io/RelatedWork/papers/li:usenix:2021/">li:usenix:2021</a>].</li>
  <li>An Arm hypervisor [<a href="https://alastairreid.github.io/RelatedWork/papers/tao:sosp:2021/">tao:sosp:2021</a>].</li>
  <li>Proving LTL properties of binaries [<a href="https://alastairreid.github.io/RelatedWork/papers/liu:arxiv:2021/">liu:arxiv:2021</a>].</li>
  <li>Generating instruction encoders/decoders [<a href="https://alastairreid.github.io/RelatedWork/papers/xu:cav:2021/">xu:cav:2021</a>].</li>
</ul>

<hr />]]></content><author><name></name></author><summary type="html"><![CDATA[ISA specifications describe the behaviour of a processor: the instructions, memory protection, the privilege mechanisms, debug mechanisms, etc. The traditional form of an ISA specification is as a paper document but, as ISAs have grown, this has become unwieldy. More importantly though, there are more and more potential uses for machine readable, mechanized, executable ISA specifications.]]></summary></entry><entry><title type="html">Summarizing 12 months of reading papers (2021)</title><link href="https://alastairreid.github.io/a-year-of-papers-2021/" rel="alternate" type="text/html" title="Summarizing 12 months of reading papers (2021)" /><published>2021-10-06T00:00:00+00:00</published><updated>2021-10-06T00:00:00+00:00</updated><id>https://alastairreid.github.io/a-year-of-papers-2021</id><content type="html" xml:base="https://alastairreid.github.io/a-year-of-papers-2021/"><![CDATA[<p>Last year, I wrote <a href="/a-year-of-papers/">about the 122 papers that I read in my first year at Google</a>
and that I summarize on <a href="https://alastairreid.github.io/RelatedWork">the RelatedWork site</a>.
Over the last 18 months or so, I’ve spent a lot less time doing the one-hour train commute between Cambridge and London
so I only read <a href="https://alastairreid.github.io/RelatedWork/papers">59 papers</a> in the last year and added 188 papers to the <a href="https://alastairreid.github.io/RelatedWork/papers/#unsummarized">backlog of unread papers</a>.
You can read <a href="https://alastairreid.github.io/RelatedWork">my summaries of all the papers</a>,
read <a href="https://alastairreid.github.io/RelatedWork/notes">notes on common themes in the papers</a>,
and download <a href="https://alastairreid.github.io/RelatedWork/RelatedWork.bib">BibTeX</a> for all the papers.</p>

<p><em>[Note that the main motivation for writing these paper summaries
is to help me learn new research fields.
So these summaries are usually not high quality reviews by an expert in the
field but, instead, me trying to make sense of a barrage of terminology,
notation and ideas as I try to figure out whether the paper is useful to my
work.
You should write your own summaries of any of the papers I list that sound interesting.]</em></p>

<h2 id="zettelkasten">Zettelkasten</h2>

<p>I have found that a great way of organizing my thoughts about papers is as a
<a href="https://zettelkasten.de/posts/zettelkasten-improves-thinking-writing">Zettelkasten</a>.  The basic structuring idea is to create links between
papers and concepts.  Every time I come across a new concept in a paper, I
create a new page about it and link the paper to that page.  Each paper,
concept or link added to the Zettelkasten refines my understanding of the
research field and that understanding is (partly) captured in the links between concepts,
between papers and between papers and concepts.
And since every page has back-references to the pages that link to it, I can
easily find related papers that have different views of a concept or that improve
upon an idea.</p>
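<p>The back-reference mechanism above can be sketched in a few lines of Python. This is a hypothetical toy model for illustration, not the actual tooling behind the RelatedWork site: every forward link from a paper to a concept automatically records the reverse direction, so a concept page can list all papers that reference it.</p>

```python
from collections import defaultdict

class Zettelkasten:
    """Toy model: pages (papers or concepts) connected by directed links,
    with back-references derived automatically from the forward links."""

    def __init__(self):
        self.links = defaultdict(set)      # page -> pages it links to
        self.backlinks = defaultdict(set)  # page -> pages that link to it

    def link(self, src, dst):
        # Adding a forward link also records the reverse direction,
        # so every page "knows" which pages reference it.
        self.links[src].add(dst)
        self.backlinks[dst].add(src)

    def related(self, concept):
        # Papers offering different views of a concept are simply the
        # back-references of that concept's page.
        return sorted(self.backlinks[concept])

zk = Zettelkasten()
zk.link("qin:pldi:2020", "rust-unsafe-code")
zk.link("astrauskas:oopsla:2020", "rust-unsafe-code")
print(zk.related("rust-unsafe-code"))
# → ['astrauskas:oopsla:2020', 'qin:pldi:2020']
```

<p>In practice a static-site generator can compute these back-references once, when building the pages, rather than maintaining them in memory.</p>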

<p>Last year, I had only just adopted the <a href="https://zettelkasten.de/posts/zettelkasten-improves-thinking-writing">Zettelkasten</a> concept and I was often not adding
new concept pages until after I came across the concept a second time.
This year, I have tried to be more aggressive about adding new concepts.
This has turned out to be much easier when I am reading papers in a completely new field
because my ignorance makes it easier to spot new concepts.
For example, when I started reading about machine learning, every page
I read had a bunch of new acronyms like RNN, CNN and ReLU, or unfamiliar terms
like Softmax, Activation and Attention. I created pages for each of these concepts,
looked them up, linked to the Wikipedia page (or similar), and linked the current
and later papers to the concept.</p>

<p><em>[I wrote a lot more about Zettelkasten and the tools that support it <a href="/a-year-of-papers/">last year</a>.]</em></p>

<h2 id="rust-and-verification">Rust and verification</h2>

<p>I spent most of the year working on the <a href="https://github.com/project-oak/rust-verification-tools/">Rust verification project</a> at Google
so, unsurprisingly, many of the papers are about the <a href="https://alastairreid.github.io/RelatedWork/notes/rust-language/">Rust language</a> with a bit of an emphasis around <a href="https://alastairreid.github.io/RelatedWork/notes/rust-unsafe-code/">Rust unsafe code</a>.</p>

<ul>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/rust-language/">Rust language</a>
    <ul>
      <li>Engineering the Servo web browser engine using Rust [<a href="https://alastairreid.github.io/RelatedWork/papers/anderson:icse:2016/">anderson:icse:2016</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/rust-unsafe-code/">Rust unsafe code</a>
    <ul>
      <li>Safe systems programming in Rust: The promise and the challenge [<a href="https://alastairreid.github.io/RelatedWork/papers/jung:cacm:2021/">jung:cacm:2021</a>]</li>
      <li>Understanding memory and thread safety practices and issues in real-world Rust programs [<a href="https://alastairreid.github.io/RelatedWork/papers/qin:pldi:2020/">qin:pldi:2020</a>]</li>
      <li>How do programmers use unsafe Rust? [<a href="https://alastairreid.github.io/RelatedWork/papers/astrauskas:oopsla:2020/">astrauskas:oopsla:2020</a>]</li>
      <li>Is Rust used safely by software developers? [<a href="https://alastairreid.github.io/RelatedWork/papers/evans:icse:2020/">evans:icse:2020</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/phantom-types/">Phantom types</a>
    <ul>
      <li>GhostCell: Separating permissions from data in Rust [<a href="https://alastairreid.github.io/RelatedWork/papers/yanovski:unknown:2021/">yanovski:unknown:2021</a>]</li>
      <li>Phantom types and subtyping [<a href="https://alastairreid.github.io/RelatedWork/papers/fluet:jfp:2006/">fluet:jfp:2006</a>]</li>
    </ul>
  </li>
</ul>

<p>Also, some pre-Rust papers that these papers build on.</p>

<ul>
  <li>Dependent types for low-level programming [<a href="https://alastairreid.github.io/RelatedWork/papers/condit:esop:2007/">condit:esop:2007</a>]</li>
  <li>Quantifying the performance of garbage collection vs. explicit memory management [<a href="https://alastairreid.github.io/RelatedWork/papers/hertz:oopsla:2005/">hertz:oopsla:2005</a>]</li>
  <li>The meaning of memory safety [<a href="https://alastairreid.github.io/RelatedWork/papers/azevedo:post:2018/">azevedo:post:2018</a>]</li>
  <li>Checking type safety of foreign function calls [<a href="https://alastairreid.github.io/RelatedWork/papers/furr:pldi:2005/">furr:pldi:2005</a>]</li>
</ul>

<h2 id="symbolic-execution-verification-and-testing">Symbolic execution, verification and testing</h2>

<p>The <a href="https://github.com/project-oak/rust-verification-tools/">Rust verification project</a>
focused on <a href="/why-not-both/">creating a continuum of verification techniques and tools</a>
including fuzz-testing, concolic execution, symbolic execution, bounded model checking and abstract interpretation.</p>

<ul>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/verifier-performance/">Performance and efficiency</a>
    <ul>
      <li>QSYM: A practical concolic execution engine tailored for hybrid fuzzing [<a href="https://alastairreid.github.io/RelatedWork/papers/yun:usenix:2018/">yun:usenix:2018</a>]</li>
      <li>Symbolic execution with SymCC: Don’t interpret, compile! [<a href="https://alastairreid.github.io/RelatedWork/papers/poeplau:usenix:2020/">poeplau:usenix:2020</a>]</li>
      <li>TracerX: Dynamic symbolic execution with interpolation [<a href="https://alastairreid.github.io/RelatedWork/papers/jaffar:arxiv:2020/">jaffar:arxiv:2020</a>]</li>
      <li>Efficient state merging in symbolic execution [<a href="https://alastairreid.github.io/RelatedWork/papers/kuznetsov:pldi:2012/">kuznetsov:pldi:2012</a>]</li>
      <li>Evaluating manual intervention to address the challenges of bug finding with KLEE [<a href="https://alastairreid.github.io/RelatedWork/papers/galea:arxiv:2018/">galea:arxiv:2018</a>]</li>
      <li>Finding code that explodes under symbolic evaluation [<a href="https://alastairreid.github.io/RelatedWork/papers/bornholt:oopsla:2018/">bornholt:oopsla:2018</a>]</li>
      <li>Chopped symbolic execution [<a href="https://alastairreid.github.io/RelatedWork/papers/trabish:icse:2018/">trabish:icse:2018</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/hybrid-testing/">Hybrid fuzzing</a>
    <ul>
      <li>SAVIOR: Towards bug-driven hybrid testing [<a href="https://alastairreid.github.io/RelatedWork/papers/chen:sp:2020/">chen:sp:2020</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/symbolic-memory/">Symbolic memory</a>
    <ul>
      <li>Rethinking pointer reasoning in symbolic execution [<a href="https://alastairreid.github.io/RelatedWork/papers/coppa:ase:2017/">coppa:ase:2017</a>]</li>
      <li>A segmented memory model for symbolic execution [<a href="https://alastairreid.github.io/RelatedWork/papers/kapus:fse:2019/">kapus:fse:2019</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/lazy-initialization/">Lazy initialization</a>
    <ul>
      <li>Generalized symbolic execution for model checking and testing [<a href="https://alastairreid.github.io/RelatedWork/papers/khurshid:tacas:2003/">khurshid:tacas:2003</a>]</li>
      <li>Scalable error detection using boolean satisfiability [<a href="https://alastairreid.github.io/RelatedWork/papers/xie:popl:2005/">xie:popl:2005</a>]</li>
      <li>Under-constrained symbolic execution: Correctness checking for real code [<a href="https://alastairreid.github.io/RelatedWork/papers/ramos:sec:2015/">ramos:sec:2015</a>]</li>
      <li>Under-constrained execution: Making automatic code destruction easy and scalable [<a href="https://alastairreid.github.io/RelatedWork/papers/engler:issta:2007/">engler:issta:2007</a>]</li>
      <li>Practical, low-effort equivalence verification of real code [<a href="https://alastairreid.github.io/RelatedWork/papers/ramos:cav:2011/">ramos:cav:2011</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/swarm-verification/">Swarming</a>
    <ul>
      <li>Swarm verification techniques [<a href="https://alastairreid.github.io/RelatedWork/papers/holzmann:ieeetse:2011/">holzmann:ieeetse:2011</a>]</li>
      <li>Swarm testing [<a href="https://alastairreid.github.io/RelatedWork/papers/groce:issta:2012/">groce:issta:2012</a>]</li>
      <li>Scaling symbolic execution using ranged analysis [<a href="https://alastairreid.github.io/RelatedWork/papers/siddiqui:oopsla:2012/">siddiqui:oopsla:2012</a>]</li>
      <li>A synergistic approach for distributed symbolic execution using test ranges [<a href="https://alastairreid.github.io/RelatedWork/papers/qiu:icse:2017/">qiu:icse:2017</a>]</li>
    </ul>
  </li>
  <li>Vacuity checks
    <ul>
      <li>Sanity checks in formal verification [<a href="https://alastairreid.github.io/RelatedWork/papers/kupferman:concur:2006/">kupferman:concur:2006</a>]</li>
    </ul>
  </li>
  <li>Counterexamples and test generation
    <ul>
      <li>Executable counterexamples in software model checking [<a href="https://alastairreid.github.io/RelatedWork/papers/gennari:vstte:2018/">gennari:vstte:2018</a>]</li>
      <li>Formal specification and testing of QUIC [<a href="https://alastairreid.github.io/RelatedWork/papers/mcmillan:sigcomm:2019/">mcmillan:sigcomm:2019</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/separation-logic/">Separation logic</a>
    <ul>
      <li>A local shape analysis based on separation logic [<a href="https://alastairreid.github.io/RelatedWork/papers/distefano:tacas:2006/">distefano:tacas:2006</a>]</li>
      <li>Beyond reachability: Shape abstraction in the presence of pointer arithmetic [<a href="https://alastairreid.github.io/RelatedWork/papers/calcagno:sas:2006/">calcagno:sas:2006</a>]</li>
    </ul>
  </li>
</ul>

<h2 id="cpus-and-security">CPUs and security</h2>

<ul>
  <li>Microarchitecture
    <ul>
      <li><a href="https://alastairreid.github.io/RelatedWork/notes/hardware-faults/">Hardware faults</a>
        <ul>
          <li>Silent data corruptions at scale [<a href="https://alastairreid.github.io/RelatedWork/papers/dixit:arxiv:2021/">dixit:arxiv:2021</a>]</li>
          <li>Cores that don’t count [<a href="https://alastairreid.github.io/RelatedWork/papers/hochschild:hotos:2021/">hochschild:hotos:2021</a>]</li>
        </ul>
      </li>
      <li><a href="https://alastairreid.github.io/RelatedWork/notes/power-virus/">Power viruses</a>
        <ul>
          <li>GeST: An automatic framework for generating CPU stress-tests [<a href="https://alastairreid.github.io/RelatedWork/papers/hadjilambrou:ispass:2019/">hadjilambrou:ispass:2019</a>]</li>
        </ul>
      </li>
      <li><a href="https://alastairreid.github.io/RelatedWork/notes/side-channel/">Side channels</a>
        <ul>
          <li>Spectector: Principled detection of speculative information flows [<a href="https://alastairreid.github.io/RelatedWork/papers/guarnieri:sandp:2020/">guarnieri:sandp:2020</a>]</li>
          <li>CacheQuery: Learning replacement policies from hardware caches [<a href="https://alastairreid.github.io/RelatedWork/papers/vila:pldi:2020/">vila:pldi:2020</a>]</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Security
    <ul>
      <li>CHERI concentrate: Practical compressed capabilities [<a href="https://alastairreid.github.io/RelatedWork/papers/woodruff:tocs:2019/">woodruff:tocs:2019</a>]</li>
      <li>snmalloc: A message passing allocator [<a href="https://alastairreid.github.io/RelatedWork/papers/lietar:ismm:2019/">lietar:ismm:2019</a>]</li>
      <li>PAC: Practical Accountability for CCF [<a href="https://alastairreid.github.io/RelatedWork/papers/shamis:arxiv:2021/">shamis:arxiv:2021</a>]</li>
      <li>Toward confidential cloud computing: Extending hardware-enforced cryptographic protection to data while in use [<a href="https://alastairreid.github.io/RelatedWork/papers/russinovich:acmq:2021/">russinovich:acmq:2021</a>]</li>
    </ul>
  </li>
</ul>

<h2 id="neural-networks--machine-learning"><a href="https://alastairreid.github.io/RelatedWork/notes/neural-network/">Neural networks / Machine learning</a></h2>

<p>Since I work in a machine-learning part of Google, I have been reading about machine learning.</p>

<ul>
  <li>Language and kernels
    <ul>
      <li>TensorFlow: Large-scale machine learning on heterogeneous distributed systems [<a href="https://alastairreid.github.io/RelatedWork/papers/abadi:arxiv:2016/">abadi:arxiv:2016</a>]</li>
      <li>The tensor algebra compiler [<a href="https://alastairreid.github.io/RelatedWork/papers/kjolstad:oopsla:2017/">kjolstad:oopsla:2017</a>]</li>
      <li>Sparse GPU kernels for deep learning [<a href="https://alastairreid.github.io/RelatedWork/papers/gale:arxiv:2020/">gale:arxiv:2020</a>]</li>
    </ul>
  </li>
  <li>Hardware
    <ul>
      <li>In-datacenter performance analysis of a tensor processing unit [<a href="https://alastairreid.github.io/RelatedWork/papers/jouppi:isca:2017/">jouppi:isca:2017</a>]</li>
      <li>SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training [<a href="https://alastairreid.github.io/RelatedWork/papers/qin:hpca:2020/">qin:hpca:2020</a>]</li>
      <li>ExTensor: An accelerator for sparse tensor algebra [<a href="https://alastairreid.github.io/RelatedWork/papers/hedge:micro:2019/">hedge:micro:2019</a>]</li>
    </ul>
  </li>
  <li><a href="https://alastairreid.github.io/RelatedWork/notes/sparse-model/">Sparsity</a>
    <ul>
      <li>The state of sparsity in deep neural networks [<a href="https://alastairreid.github.io/RelatedWork/papers/gale:arxiv:2019/">gale:arxiv:2019</a>]</li>
      <li>Fast sparse ConvNets [<a href="https://alastairreid.github.io/RelatedWork/papers/elsen:arxiv:2019/">elsen:arxiv:2019</a>]</li>
    </ul>
  </li>
  <li>Scaling
    <ul>
      <li>Outrageously large neural networks: The sparsely-gated mixture-of-experts layer [<a href="https://alastairreid.github.io/RelatedWork/papers/shazeer:arxiv:2017/">shazeer:arxiv:2017</a>]</li>
      <li>GShard: Scaling giant models with conditional computation and automatic sharding [<a href="https://alastairreid.github.io/RelatedWork/papers/lepikhin:arxiv:2020/">lepikhin:arxiv:2020</a>]</li>
      <li>Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [<a href="https://alastairreid.github.io/RelatedWork/papers/fedus:arxiv:2021/">fedus:arxiv:2021</a>]</li>
      <li>Attention is all you need [<a href="https://alastairreid.github.io/RelatedWork/papers/vaswani:arxiv:2017/">vaswani:arxiv:2017</a>]</li>
    </ul>
  </li>
</ul>

<h2 id="information-flow-control"><a href="https://alastairreid.github.io/RelatedWork/notes/information-flow/">Information flow control</a></h2>

<ul>
  <li>Noninterference for free [<a href="https://alastairreid.github.io/RelatedWork/papers/bowman:icfp:2015/">bowman:icfp:2015</a>]</li>
</ul>

<h2 id="programming-languages">Programming languages</h2>

<ul>
  <li>The next 700 semantics: A research challenge [<a href="https://alastairreid.github.io/RelatedWork/papers/krishnamurthi:snapl:2019/">krishnamurthi:snapl:2019</a>]</li>
</ul>

<h2 id="miscellaneous">Miscellaneous</h2>

<ul>
  <li>Large teams have developed science and technology; small teams have disrupted it [<a href="https://alastairreid.github.io/RelatedWork/papers/wu:arxiv:2017/">wu:arxiv:2017</a>]</li>
  <li>As we may think [<a href="https://alastairreid.github.io/RelatedWork/papers/bush:atlantic:1945/">bush:atlantic:1945</a>]</li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[Last year, I wrote about the 122 papers that I read in my first year at Google and that I summarize on the RelatedWork site. Over the last 18 months or so, I’ve spent a lot less time doing the one hour train commute between Cambridge and London so I only read 59 papers in the last year and added 188 papers to the backlog of unread papers. You can read my summaries of all the papers, read notes on common themes in the papers, and download BibTeX for all the papers.]]></summary></entry></feed>