Machine readable specifications at scale

There are lots of potential uses for machine readable specifications so you would think that every major real world artifact like long-lived hardware and software systems, protocols, languages, etc. would have a formal specification that is used by all teams extending the design, creating new implementations, testing/verifying the system, verifying code that uses the system, doing security analyses or any of the other potential uses. But, in practice, this is usually not true: most real world systems do not have a well tested, up to date, machine readable specification.

Read More

Joining Intel

Intel logo Today I am joining Intel Strategic CAD Labs to work on formal specifications. There’s not much to say so far because I’ve only just started but I think it’s going to be a lot of fun (and a lot of work). I figure that a lot of the first month or two will be spent figuring out what people want to do with formal specifications so, if you think that this will be relevant to you, please get in touch. I am equally interested in uses that are on my list of uses for ISA specifications and uses that I haven’t even thought of.

Read More

Farewell to Google

Google logo I’ve spent the last couple of years working at Google Research. For the first 5–6 months, I was working in London in an office nestled between King’s Cross station and the iconic St Pancras station. This location was ideal for me because I was still living in Cambridge at the time and it was a very easy journey to either of these stations from Cambridge.

Read More

What can you do with an ISA specification?

ISA specifications describe the behaviour of a processor: the instructions, memory protection, the privilege mechanisms, debug mechanisms, etc. The traditional form of an ISA specification is as a paper document but, as ISAs have grown, this has become unwieldy. More importantly though, there are more and more potential uses for machine readable, mechanized, executable ISA specifications.

Read More

Summarizing 12 months of reading papers (2021)

Last year, I wrote about the 122 papers that I read in my first year at Google and that I summarize on the RelatedWork site. Over the last 18 months or so, I’ve spent a lot less time doing the one hour train commute between Cambridge and London so I only read 59 papers in the last year and added 188 papers to the backlog of unread papers. You can read my summaries of all the papers, read notes on common themes in the papers, and download BibTeX for all the papers.

Read More

Retrospective

Rust logo There are lots of hard problems in industrial research but one of the hardest is timing. If you start a project too early, your users are focused on other problems and don’t yet feel a need for your solution. And, if you start a project too late, your users have already hit the problem and you have to work round whatever solution they have found. And so it is with this project.[^but-also-other-things] At the moment, our potential users are still working on all the things you must have to support a new language in a large company and there is not a strong pull for a new, unproven research idea. So, as we close the project down, I thought it would be good to have a bit of a retrospective on what the project was trying to do and how.

Read More

Using KLEE on Rust-for-Linux (part 3)

Linux logo To conclude this series on using KLEE with Rust for Linux, we are going to write a simple verification harness for testing the rust_semaphore device driver written in Rust and symbolically execute the verification harness in KLEE. It’s worth repeating that the goal of these posts is not to verify the Rust for Linux code or to find bugs in it. Instead, my goal is that you should be able to apply these ideas to do that verification yourself. You might choose to continue in the same direction as these posts suggest by creating better mocks and test harnesses. Or you might want to try something else based on different answers to the questions in the first part of this series; or, perhaps, using some tool other than KLEE that I described in the second part of the series. The code changes described in this post are in this branch of my fork of Rust for Linux.

Read More

Using KLEE on Rust-for-Linux (part 2)

Linux logo Using tools like the KLEE symbolic execution tool or bounded model checkers with Linux kernel code is still a bit of a black art. The first part of this series on using KLEE on the Rust for Linux considered what we would want to check. This second part, digs deeply into how to prepare the codebase for use with LLVM-based tools like KLEE. (Warning: it may contain far more detail than most people are interested in.) The final part will show how we can build simple verification frameworks. The code changes described in this post are in this branch of my fork of Rust for Linux.

Read More

Using KLEE on Rust-for-Linux (part 1)

Linux logo The Rust for Linux project is working on adding support for the Rust language to the Linux kernel with the hope that using Rust will make new code more safe, easier to refactor and review, and easier to write. (See the RFC for more detail about goals and for the varied responses of the Linux Kernel community.)

Read More

Research risks in the Rust verification project

Rust logo In larger, long-running research projects, it can be useful to make a list of all the risks and research questions you have and use that as one way to guide what you tackle first. This is a list that we considered as we were planning the Rust verification project. Although some of the questions are specific to our particular context (what we want to achieve, who we want to help, etc.), I think that many of the questions apply to any project that aims to help developers to verify their own code.

Read More

Managing risks in research

Research is uncertain. It’s not clear what problems you will hit. It’s not clear how many problems you will hit. It’s not clear how long success will take or what success will look like or whether you will even succeed. When we talk about research, we often focus on the ideas, the sudden insight, the stroke of genius, standing on the shoulders of giants, etc. We sometimes talk about uncertainty and persistence. But we very rarely talk about risk and, most importantly, how to manage it.

Read More

Using KLEE on Coreutils

KLEE logo A lot of our work over the last year was on identifying and fixing obstacles to using KLEE with Rust and the main technique we used for finding new obstacles was to try to use KLEE with different Rust programs and libraries. One of the largest suites of programs we tackled was the Rust CoreUtils library: a Rust rewrite and drop in replacement for the GNU CoreUtils suite that includes programs like ls, cp, df, cat, and about 90 other standard Unix shell commands.

Read More

Fixing bottlenecks in Rust verification

KLEE logo It is inevitable that automatic verification tools will have performance problems because they push up against “the decidability ceiling”: trying to solve undecidable problems and often getting away with it. In an earlier article, we looked at how to profile the verification process to find which part of your program is causing the problem. But that is only half the problem: we need to actually fix the problem.[^not-even-half] So this article looks at one way that we can fix performance bottlenecks when verifying Rust code using the KLEE symbolic execution tool. In particular, it looks at using path-merging to overcome the path explosion problem.

Read More

Verifying vectorized Rust revisited

Rust logo Research is characterized by allowing yourself to make mistakes: performing experiments; drawing conclusions; later, realizing that your experiment was not sufficient and you got it wrong; and trying again. Back in March, we thought that we knew how to deal with vectorized Rust: tell the compiler not to auto-vectorize code; tell the compiler not to use vector instructions; and use existing conditional compilation feature flags to disable hand-vectorized code. Unfortunately, two of those three ideas don’t work – but we think we have a viable approach now.

Read More

Verifying hand-vectorized Rust code

Rust logo One of the major themes in our work on Rust verification is eliminating verification obstacles: things that mean that you can’t even run a verification tool on your code. So we have worked on how to verify cargo crates, how to verify code that uses the Rust FFI and how to verify programs with command line arguments. One recurring obstacle is that some code depends on processor-specific intrinsic functions. For example the Aho-Corasick crate supports fast string searches using SIMD acceleration and uses intrinsics to access the x86 architecture’s AVX2 or SSE2 vector extensions if they are available. Although verification tools could support these intrinsics, most of them do not – so if your program uses Aho-Corasick (or any crate that depends on it like the regex crate), then you won’t be able to verify your program.

Read More

Profiling Rust verification

KLEE logo Automatic formal verification is usually pushing up against what Leino calls “the decidability ceiling”: pushing the tools beyond what they can be guaranteed to solve in some reasonable time, taking the risk that the tools will blow up, but often getting away with it. But what can we do when the toast lands butter-side-down? This is a summary of a quick investigation to find out if anybody had any answers. Tl;dr: they do.

Read More

Rust Design-for-Testability: a survey

Rust logo What can we do when designing Rust code to make it easier to test? This is a survey of everything I could find[^survey-method] about testing Rust with a particular focus on design for testability for correctness. Some of the articles show multiple things to do on a worked example, some are more focused on a particular trick.

Read More

Using command-line arguments ('argv')

LLVM logo One important difference between C and Rust is that the C main function expects to be given a list of command line arguments via argc/argv function parameters while Rust programs access their command line arguments via the sys::env::args() library function.

Read More

Rust testing or verifying: Why not both?

Rust logo Dijkstra famously dissed testing by saying “Program testing can be used to show the presence of bugs, but never to show their absence!” As if you should choose one over the other. I don’t see them as opposites but as complementary techniques that should both be used to improve the quality of your code.

Read More

Using the `propverify` library to verify Rust programs

Rust logo The goal of the tools and library in this repository is to let you verify interesting things about non-trivial programs. Unfortunately, interesting/non-trivial programs are too large for introducing you to a tool so, for now, we will consider this trivial program.

Read More

Using KLEE

KLEE logo One of the most straightforward and most solid automatic verification tools is the KLEE symbolic execution tool that can be used to search for bugs in programs. KLEE was originally developed for C but, because KLEE and the Rust compiler are both based on the LLVM platform, it is possible to use KLEE with Rust programs.

Read More

Facebook Test and Verification Symposium 2019

Facebook London has built a great team to bring the latest bugfinding techniques into Facebook’s development process. A lot of the time, they are having to develop those techniques themselves. But, they also give grants to academic groups to encourage them and they hold an annual two day symposium to talk about the challenges, progress, techniques, etc. The symposium was open to anybody interested in the topic: I met Facebook staff, many academic researchers (professors and PhD students), people from some of the other major tech companies, people from automative industry, entrepreneurs creating bugfinding tools, and many others. I think it is great that Facebook is investing in developing the testing and verification community in this way.

Read More

ISA Specification Workshop (SpISA)

It feels as if the idea of having a formal specification of a processor has turned a corner. Instead of having incomplete specs for parts of some architectures, we now have specs for Arm and RISC-V that are complete enough that you can boot an OS on them and we have complete specs of the x86 instruction set. Instead of having specs that are tied to some particular project/purpose, we have flexible specs that can be used to reason for many different purposes. So this is a great time to hold a workshop for the people working on all the different specs and applications to get together, compare notes and identify future challenges.

Read More

Generating multiple solutions with SMT

ARM logo You can use an SMT solver to find a solution to a set of constraints. But what happens if you want to find multiple solutions? My previous posts in this series have looked at how you can turn execution traces into SMT problems and at how you can use an SMT solver to enumerate all paths through a specification. In this post, I’ll look at how you can generate multiple inputs that will trigger each path. This can be useful if you are trying to generate testcases although it is good for other things too.

Read More

Using SMT to check specifications for errors

ARM logo SMT solvers are incredibly flexible tools for analyzing complex systems. In my previous post, I showed how you can: generate a symbolic execution trace from running an instrumented interpreter on some input values; turn the trace into an SMT circuit; and use an SMT solver to check that the SMT circuit matches your original trace. This post will explore how we can check assertions, array index bounds, etc. to find bugs in the specification and how we can enumerate all the paths through a specification.

Read More

Making parallelism explicit with SoC-C

ARM logo Last week’s S-REPLS keynote by Sylvan Clebsch was a talk about the limitations of current microprocessor architecture and how it hides everything of interest from the programmer: instruction level parallelism is hidden behind out-of-order execution, message passing is hidden behind cache coherency, etc. This reminded me that I have been meaning to write about the SoC-C project that aimed to make parallelism explicit by adding some language extensions to C and that achieved very high performance, good scaling and high energy efficiency by using compiler tricks and some programmer annotations to let us exploit a fairly bare-bones parallel system.

Read More

Bidirectional ARM Assembly Syntax Specifications

ARM logo One of the tantalising pieces of information contained in ARM’s machine readable specifications is a specification of the assembly syntax. A few years ago (on an earlier) version of the specification, Wojciech Meyer and I decided to try to transform this specification into assemblers and disassemblers. At the time, this was not very useful to ARM because we already had assemblers and disassemblers so, although technically successful, the project died and the code has been slowly bitrotting ever since. In a few days time, I will be giving a talk at the 34th Chaos Communication Congress [video] [pdf] in Leipzig about practical things you can do with ARM’s specification and I thought it would be a good idea to suggest that someone creates a similar tool. But maybe it would be a good idea if I showed you what Wojciech and I did to get you started?

Read More

Are natural language specifications useful?

ARM logo In my efforts to create a formal specification of the Arm architecture, I have focussed on the parts written in “pseudocode”. I have reverse engineered the ASL language hiding inside the pseudocode to create a formal, executable specification that I can execute and test. In the process, I have tended to ignore all the natural language prose that makes up the bulk of the 6,000 page Arm Architecture Reference Manual. In defiance of Betteridge’s Law, this article is going to explain how I finally found a use for all that prose.

Read More

Arm v8.3 Machine Readable Specification Released

ARM logo Three months ago, Arm released version v8.2 of its processor architecture specification. Arm’s release includes PDF and HTML for the specification but what makes this specification unusual is that it includes a machine readable spec as well. The machine readable spec contains the instruction encodings, the system register encodings and an executable specification (written in ASL) of each instruction and all the supporting code such as exception handling, interrupt handling, page table walks, hypervisor support, debug support, etc.

Read More

ASL Lexical Syntax

ARM logo In my post about dissecting the ARM Machine Readable Architecture files I described how to extract the ASL code from the XML files that ARM provides. In this post, I will describe how to start processing that code by examining the lexical syntax. In doing so, I will be going back to one of the first things I had to figure out when I started trying to use ARM’s documentation as an executable specification so I will be looking at code I have barely thought about in 6 years and trying to remember my thought processes at the time as I reverse engineered the language lurking inside ARM’s pseudocode.

Read More

ARM Releases Machine Readable Architecture Specification

ARM logo The device you are reading this post on consists of a very tall stack of layers - all the way from transistors and NAND gates all the way up to processors, C, Linux/ Android/ iOS/ Windows to the browser. Each of these layers may be written by a different team possibly in a different company and the interface between these layers is documented and specified so that each team knows what it can assume and what it must provide. One of the most important interfaces in this stack is the one between hardware and software that says what a processor will do if it is configured a certain way, provided with page tables, interrupt handlers, put in a certain privilege level and finallyprovided with a program to execute. If you want to write a compiler, operating system, hypervisor or security layer, then you need to know how the processor will behave. If someone gives you a virus and asks you to figure out how it works and how to defend against it, then you need to know how the processor will behave.

Read More

ARM's Architecture Specification Language

ARM logo What language should you write a specification in? Should you use the language supported by your favourite verification tool (HOL, Verilog, …)? Or should you write it in a DSL and translate that to whatever your verification tool needs?

Read More

Specification Terminology

ARM logo I have spent the last 5 years working on ARM’s processor specifications: making them executable, testable, mechanizing them, formalizing them, making them trustworthy, etc.

Read More

Talking more to academia

The difficult thing about Industrial Research Labs is that you don’t always get to talk about what you are doing so, for the last five years or so, I haven’t published anything except the occasional patent and I have given only abstract descriptions to most of my friends.

Read More

Using Markdown in projects

I write lots of minor documentation files: Readme’s; release notes; instructions for checking out and building the project; weekly reports; todo lists; internal documentation; etc. If I want to keep these in sync with the source code they refer to, then I need to put it in git/Hg/SVN/CVS version control which rules out using MS Word (not that I was seriously planning to use Word). So it’s plain text then…

Read More

PocketMod

I don’t have a problem. My wife thinks it strange that I always have a pen in my pocket - but I feel that she also admires the practicality of it. Almost the first thing I do on getting a new computer is to install software for taking and cross-referencing notes - but everyone does that, don’t they? And when I’m travelling and want to be able to take notes without whipping out a laptop and typing a password, then I use a pocketmod.

Read More

Minimalist Docking Station

My first laptops came with docking stations - hefty beasts you clip your laptop into with USB, DVI, power, printer and possibly even RS-232 connectors. Convenient at the desk but too bulky to travel with so it sits on my desk at work and doesn’t come home with me (or sits at home and …)

Read More

A Home Git Server

I’ve been using a Synology Diskstation for a while as a Network Attached Storage box for sharing music and photos and as a TimeMachine server but I’ve always known that it can do a lot more and I wanted a private git server so I googled and found these useful pages:

Read More

A New Machine

I recently got a new home computer - my first Mac. As with any new machine, it takes a while to set it up just as you like it. Here are some of the basics that I started off with.

Read More

Verification

ARM logo Historically, I’ve changed jobs every 5-6 years; I’ve been at ARM over 7 years now. When I mentioned this to my manager, he started wondering where this was heading. In fact, I was explaining why I felt it was time for me to shift my focus away from parallelism, vector processing, performance optimization, etc. and into something I didn’t know much about. It’s a fast moving industry and if you don’t keep moving, your skills become irrelevant and if you don’t make a big enough shift, you won’t take the risks you need to take to achieve something new. (A very bad paraphrase of Richard Hamming.)

Read More

Return to the web

When I worked for universities, I had a very visible public presence. Anyone I talked to could easily google for me and find my recent projects, papers, etc. Without even trying, I was the first hit on Google. But 7 years ago, I moved to industry and suddenly I was invisible. I had no web page, there was no single collection of all my publications, nothing about recent projects, nothing about the people I work with. And I often can’t say much about my current project until all the patents are filed or because we’re working with another company and there are confidentiality requirements.

Read More