Today I am joining Intel Strategic CAD Labs to work on formal specifications. There’s not much to say so far because I’ve only just started but I think it’s going to be a lot of fun (and a lot of work). I figure that a lot of the first month or two will be spent figuring out what people want to do with formal specifications so, if you think that this will be relevant to you, please get in touch. I am equally interested in uses that are on my list of uses for ISA specifications and uses that I haven’t even thought of.
I’ve spent the last couple of years working at Google Research. For the first 5–6 months, I was working in London in an office nestled between King’s Cross station and the iconic St Pancras station. This location was ideal for me because I was still living in Cambridge at the time and it was a very easy journey to either of these stations from Cambridge.
ISA specifications describe the behaviour of a processor: the instructions, memory protection, the privilege mechanisms, debug mechanisms, etc. The traditional form of an ISA specification is as a paper document but, as ISAs have grown, this has become unwieldy. More importantly though, there are more and more potential uses for machine readable, mechanized, executable ISA specifications.
Last year, I wrote about the 122 papers that I read in my first year at Google and that I summarize on the RelatedWork site. Over the last 18 months or so, I’ve spent a lot less time doing the one hour train commute between Cambridge and London so I only read 59 papers in the last year and added 188 papers to the backlog of unread papers. You can read my summaries of all the papers, read notes on common themes in the papers, and download BibTeX for all the papers.
Research is uncertain. It’s not clear what problems you will hit. It’s not clear how many problems you will hit. It’s not clear how long success will take or what success will look like or whether you will even succeed. When we talk about research, we often focus on the ideas, the sudden insight, the stroke of genius, standing on the shoulders of giants, etc. We sometimes talk about uncertainty and persistence. But we very rarely talk about risk and, most importantly, how to manage it.
Last year, I wrote a summary of all the Rust verification tools that I knew about. But formal verification of Rust is a fast-changing field so I took advantage of all the experts at the 2021 Rust Verification Workshop to make an up to date list of tools. This post focuses on the automatic verification tools.
What can we do when designing Rust code to make it easier to test? This is a survey of everything I could find[^survey-method] about testing Rust with a particular focus on design for testability for correctness. Some of the articles show multiple things to do on a worked example, some are more focused on a particular trick.
This is an overview of the papers that I have read over the last 12 months since I joined Google Research. Over the last year, I have read 122 papers and I have added 364 to my paper backlog. You can read my summaries of all the papers, read notes on common themes in the papers, and download BibTeX for all the papers. As the number of papers grew, I reorganized several times and the site became a bit like a Zettelkasten.
Dijkstra famously dissed testing by saying “Program testing can be used to show the presence of bugs, but never to show their absence!” As if you should choose one over the other. I don’t see them as opposites but as complementary techniques that should both be used to improve the quality of your code.
The Rust language and the Rust community are really interesting if you are want to build better quality systems software.
Since joining Google in September, I have spent a lot of time reading about software verification tools and one of the things that keeps coming up is that there are competitions between different verification tools.
ASLi is an interpreter for Arm’s Architecture Specification Language. To use it, you need to download the MRA tools, Arm’s Architecture Specifications and, of course, ASLi. (Note: that I am linking to an updated version of ASLi instead of to Arm’s official release – I had to extend ASLi a bit to be able to execute programs with it.)
Facebook London has built a great team to bring the latest bugfinding techniques into Facebook’s development process. A lot of the time, they are having to develop those techniques themselves. But, they also give grants to academic groups to encourage them and they hold an annual two day symposium to talk about the challenges, progress, techniques, etc. The symposium was open to anybody interested in the topic: I met Facebook staff, many academic researchers (professors and PhD students), people from some of the other major tech companies, people from automative industry, entrepreneurs creating bugfinding tools, and many others. I think it is great that Facebook is investing in developing the testing and verification community in this way.
It feels as if the idea of having a formal specification of a processor has turned a corner. Instead of having incomplete specs for parts of some architectures, we now have specs for Arm and RISC-V that are complete enough that you can boot an OS on them and we have complete specs of the x86 instruction set. Instead of having specs that are tied to some particular project/purpose, we have flexible specs that can be used to reason for many different purposes. So this is a great time to hold a workshop for the people working on all the different specs and applications to get together, compare notes and identify future challenges.
After 15 great years at Arm, I’m making a change.
You can use an SMT solver to find a solution to a set of constraints. But what happens if you want to find multiple solutions? My previous posts in this series have looked at how you can turn execution traces into SMT problems and at how you can use an SMT solver to enumerate all paths through a specification. In this post, I’ll look at how you can generate multiple inputs that will trigger each path. This can be useful if you are trying to generate testcases although it is good for other things too.
SMT solvers are incredibly flexible tools for analyzing complex systems. In my previous post, I showed how you can: generate a symbolic execution trace from running an instrumented interpreter on some input values; turn the trace into an SMT circuit; and use an SMT solver to check that the SMT circuit matches your original trace. This post will explore how we can check assertions, array index bounds, etc. to find bugs in the specification and how we can enumerate all the paths through a specification.
In a previous post about formal validation of the Arm architecture I sketched how you can reason about Arm’s processor specifications using SMT solvers such as Z3, Boolector and CVC4. The main idea is that you translate ASL (the language that Arm’s specifications are written in) to SMT-Lib2 (the language that SMT solvers accept).
Last week’s S-REPLS keynote by Sylvan Clebsch was a talk about the limitations of current microprocessor architecture and how it hides everything of interest from the programmer: instruction level parallelism is hidden behind out-of-order execution, message passing is hidden behind cache coherency, etc. This reminded me that I have been meaning to write about the SoC-C project that aimed to make parallelism explicit by adding some language extensions to C and that achieved very high performance, good scaling and high energy efficiency by using compiler tricks and some programmer annotations to let us exploit a fairly bare-bones parallel system.
One of the tantalising pieces of information contained in ARM’s machine readable specifications is a specification of the assembly syntax. A few years ago (on an earlier) version of the specification, Wojciech Meyer and I decided to try to transform this specification into assemblers and disassemblers. At the time, this was not very useful to ARM because we already had assemblers and disassemblers so, although technically successful, the project died and the code has been slowly bitrotting ever since. In a few days time, I will be giving a talk at the 34th Chaos Communication Congress [video] [pdf] in Leipzig about practical things you can do with ARM’s specification and I thought it would be a good idea to suggest that someone creates a similar tool. But maybe it would be a good idea if I showed you what Wojciech and I did to get you started?
In my last post about natural language specifications, I talked about the problem with executable specifications: they are great for specifying existing behaviour but they are bad about describing overall properties and, in particular, what is not allowed to happen. This makes it possible that extensions that add new behaviour could break fundamental properties of the specification without anybody noticing.
In my efforts to create a formal specification of the Arm architecture, I have focussed on the parts written in “pseudocode”. I have reverse engineered the ASL language hiding inside the pseudocode to create a formal, executable specification that I can execute and test. In the process, I have tended to ignore all the natural language prose that makes up the bulk of the 6,000 page Arm Architecture Reference Manual. In defiance of Betteridge’s Law, this article is going to explain how I finally found a use for all that prose.
Three months ago, Arm released version v8.2 of its processor architecture specification. Arm’s release includes PDF and HTML for the specification but what makes this specification unusual is that it includes a machine readable spec as well. The machine readable spec contains the instruction encodings, the system register encodings and an executable specification (written in ASL) of each instruction and all the supporting code such as exception handling, interrupt handling, page table walks, hypervisor support, debug support, etc.
In my post about dissecting the ARM Machine Readable Architecture files I described how to extract the ASL code from the XML files that ARM provides. In this post, I will describe how to start processing that code by examining the lexical syntax. In doing so, I will be going back to one of the first things I had to figure out when I started trying to use ARM’s documentation as an executable specification so I will be looking at code I have barely thought about in 6 years and trying to remember my thought processes at the time as I reverse engineered the language lurking inside ARM’s pseudocode.
Last week, ARM released their Machine Readable Architecture Specification and I wrote about what you can do with it. But before you can do anything with the specification, you need to extract the useful bits from the release so I thought I would try for myself and describe what I found out (and release some scripts that demonstrate/test what I am saying).
The device you are reading this post on consists of a very tall stack of layers - all the way from transistors and NAND gates all the way up to processors, C, Linux/ Android/ iOS/ Windows to the browser. Each of these layers may be written by a different team possibly in a different company and the interface between these layers is documented and specified so that each team knows what it can assume and what it must provide. One of the most important interfaces in this stack is the one between hardware and software that says what a processor will do if it is configured a certain way, provided with page tables, interrupt handlers, put in a certain privilege level and finallyprovided with a program to execute. If you want to write a compiler, operating system, hypervisor or security layer, then you need to know how the processor will behave. If someone gives you a virus and asks you to figure out how it works and how to defend against it, then you need to know how the processor will behave.
What language should you write a specification in? Should you use the language supported by your favourite verification tool (HOL, Verilog, …)? Or should you write it in a DSL and translate that to whatever your verification tool needs?
As I was explaining the ISA-Formal technique for verifying ARM processors to people, I realized that it was important to be clear about what ISA-Formal does not verify and why.
Part of the reason why the ISA-Formal technique for verifying ARM processors is so effective and so portable across different ARM processors is the fact that we directly use the ARM Instruction Set Architecture (ISA) Specification in our flow. That is, I translate ARM’s official printed documentation into something that I can load into a model checker alongside ARM’s processor Verilog and I verify that the two match each other.
Probably the most important thing I didn’t say enough in my paper about verifying ARM processors is why we focus on finding bugs.
I have spent the last 5 years working on ARM’s processor specifications: making them executable, testable, mechanizing them, formalizing them, making them trustworthy, etc.
The difficult thing about Industrial Research Labs is that you don’t always get to talk about what you are doing so, for the last five years or so, I haven’t published anything except the occasional patent and I have given only abstract descriptions to most of my friends.
I write lots of minor documentation files: Readme’s; release notes; instructions for checking out and building the project; weekly reports; todo lists; internal documentation; etc. If I want to keep these in sync with the source code they refer to, then I need to put it in git/Hg/SVN/CVS version control which rules out using MS Word (not that I was seriously planning to use Word). So it’s plain text then…
I don’t have a problem. My wife thinks it strange that I always have a pen in my pocket - but I feel that she also admires the practicality of it. Almost the first thing I do on getting a new computer is to install software for taking and cross-referencing notes - but everyone does that, don’t they? And when I’m travelling and want to be able to take notes without whipping out a laptop and typing a password, then I use a pocketmod.
My first laptops came with docking stations - hefty beasts you clip your laptop into with USB, DVI, power, printer and possibly even RS-232 connectors. Convenient at the desk but too bulky to travel with so it sits on my desk at work and doesn’t come home with me (or sits at home and …)
I’ve been using a Synology Diskstation for a while as a Network Attached Storage box for sharing music and photos and as a TimeMachine server but I’ve always known that it can do a lot more and I wanted a private git server so I googled and found these useful pages:
I have been wanting to experiment with ARM’s new 64-bit architecture for a while and now ARM has released a simulator.
What does it take to turn a Mac into a decent programming environment?
I recently got a new home computer - my first Mac. As with any new machine, it takes a while to set it up just as you like it. Here are some of the basics that I started off with.
Historically, I’ve changed jobs every 5-6 years; I’ve been at ARM over 7 years now. When I mentioned this to my manager, he started wondering where this was heading. In fact, I was explaining why I felt it was time for me to shift my focus away from parallelism, vector processing, performance optimization, etc. and into something I didn’t know much about. It’s a fast moving industry and if you don’t keep moving, your skills become irrelevant and if you don’t make a big enough shift, you won’t take the risks you need to take to achieve something new. (A very bad paraphrase of Richard Hamming.)
When I worked for universities, I had a very visible public presence. Anyone I talked to could easily google for me and find my recent projects, papers, etc. Without even trying, I was the first hit on Google. But 7 years ago, I moved to industry and suddenly I was invisible. I had no web page, there was no single collection of all my publications, nothing about recent projects, nothing about the people I work with. And I often can’t say much about my current project until all the patents are filed or because we’re working with another company and there are confidentiality requirements.