Facebook London has built a great team to bring the latest bugfinding techniques into Facebook’s development process. Often, they have to develop those techniques themselves. But they also give grants to academic groups to encourage them, and they hold an annual two-day symposium to talk about the challenges, progress, techniques, etc. The symposium was open to anybody interested in the topic: I met Facebook staff, many academic researchers (professors and PhD students), people from some of the other major tech companies, people from the automotive industry, entrepreneurs creating bugfinding tools, and many others. I think it is great that Facebook is investing in developing the testing and verification community in this way.
It feels as if the idea of having a formal specification of a processor has turned a corner. Instead of having incomplete specs for parts of some architectures, we now have specs for Arm and RISC-V that are complete enough to boot an OS on, and we have complete specs of the x86 instruction set. Instead of having specs that are tied to one particular project or purpose, we have flexible specs that can be used for many different kinds of reasoning. So this is a great time to hold a workshop where the people working on all the different specs and applications can get together, compare notes and identify future challenges.
After 15 great years at Arm, I’m making a change.
You can use an SMT solver to find a solution to a set of constraints. But what happens if you want to find multiple solutions? My previous posts in this series have looked at how you can turn execution traces into SMT problems and at how you can use an SMT solver to enumerate all paths through a specification. In this post, I’ll look at how you can generate multiple inputs that will trigger each path. This can be useful if you are trying to generate testcases, although it has other uses too.
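A common way to get more than one solution out of a solver is to "block" each model as you find it: after the solver returns a solution, add a constraint saying that at least one variable must differ from that solution, then ask again until the problem becomes unsatisfiable. The sketch below illustrates that loop. To stay self-contained, it uses a hypothetical brute-force `toy_solve` over small unsigned values as a stand-in for a real SMT solver such as Z3, and the path condition (`x + y == 6` and `x < y`) is invented for illustration rather than taken from an Arm specification.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Model = Dict[str, int]
Constraint = Callable[[Model], bool]


def toy_solve(names: List[str], width: int,
              constraints: List[Constraint]) -> Optional[Model]:
    """Hypothetical stand-in for an SMT solver: brute-force every
    assignment of `width`-bit unsigned values (using plain Python integer
    arithmetic, not wrapping bit-vector arithmetic) and return the first
    model satisfying all constraints, or None if there is none."""
    for values in product(range(2 ** width), repeat=len(names)):
        model = dict(zip(names, values))
        if all(c(model) for c in constraints):
            return model
    return None


def block(model: Model) -> Constraint:
    """Blocking constraint: at least one variable must differ from `model`."""
    snapshot = dict(model)
    return lambda m: any(m[v] != val for v, val in snapshot.items())


def enumerate_models(names: List[str], width: int,
                     constraints: List[Constraint],
                     limit: Optional[int] = None) -> List[Model]:
    """Ask for a model, block it, and repeat until unsat (or `limit` hit)."""
    constraints = list(constraints)
    models: List[Model] = []
    while limit is None or len(models) < limit:
        model = toy_solve(names, width, constraints)
        if model is None:
            break
        models.append(model)
        constraints.append(block(model))
    return models


# Invented path condition over two 3-bit inputs.
path_condition = [lambda m: m["x"] + m["y"] == 6, lambda m: m["x"] < m["y"]]

for model in enumerate_models(["x", "y"], 3, path_condition):
    print(model)
```

A real solver returns models in no particular order, so it is the blocking constraint, not the search order, that guarantees each new input is different. With Z3's Python bindings the same loop is usually written by calling `check()`, reading `model()`, and then adding `Or([v != m[v] for v in vars])` before the next `check()`.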
SMT solvers are incredibly flexible tools for analyzing complex systems. In my previous post, I showed how you can: generate a symbolic execution trace from running an instrumented interpreter on some input values; turn the trace into an SMT circuit; and use an SMT solver to check that the SMT circuit matches your original trace. This post will explore how we can check assertions, array index bounds, etc. to find bugs in the specification and how we can enumerate all the paths through a specification.
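Both ideas reduce to satisfiability questions. A path is feasible if its path condition is satisfiable, and an array access on that path is safe if "path condition and index out of bounds" is unsatisfiable; any model the solver finds for that query is a concrete counterexample. The sketch below is a minimal illustration under stated assumptions: the spec fragment, `TABLE_SIZE` and the hand-written `PATHS` table are all hypothetical, and a brute-force `toy_solve` stands in for a real SMT solver.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Model = Dict[str, int]
Pred = Callable[[Model], bool]


def toy_solve(names: List[str], width: int,
              constraints: List[Pred]) -> Optional[Model]:
    """Hypothetical stand-in for an SMT solver: brute-force search over
    all `width`-bit unsigned assignments; return a model or None (unsat)."""
    for values in product(range(2 ** width), repeat=len(names)):
        model = dict(zip(names, values))
        if all(c(model) for c in constraints):
            return model
    return None


TABLE_SIZE = 5  # hypothetical array length

# Hypothetical spec fragment, split by hand into its three paths:
#   if x < 4:   i = x + 2
#   elif x < 2: i = 0        (dead: x < 2 already implies x < 4)
#   else:       i = x - 4
#   ... table[i] ...
PATHS = {
    "then": (lambda m: m["x"] < 4, lambda m: m["x"] + 2),
    "elif": (lambda m: not m["x"] < 4 and m["x"] < 2, lambda m: 0),
    "else": (lambda m: not m["x"] < 4 and not m["x"] < 2,
             lambda m: m["x"] - 4),
}


def check_paths(width: int = 3) -> Dict[str, object]:
    results: Dict[str, object] = {}
    for name, (cond, index) in PATHS.items():
        # Feasibility: is there any input that follows this path?
        if toy_solve(["x"], width, [cond]) is None:
            results[name] = "infeasible"
            continue
        # Bounds check: is there an input on this path whose index is
        # out of range? (`index=index` binds the current path's index.)
        out_of_bounds = lambda m, index=index: not 0 <= index(m) < TABLE_SIZE
        bad = toy_solve(["x"], width, [cond, out_of_bounds])
        results[name] = bad if bad is not None else "ok"
    return results


print(check_paths())
```

Here the first path is reported with the counterexample `{"x": 3}` (index 5 in a 5-entry table), the dead `elif` branch is reported as infeasible, and the last path is safe. A real tool would discover the paths by exploring branches during symbolic execution rather than listing them by hand.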
In a previous post about formal validation of the Arm architecture I sketched how you can reason about Arm’s processor specifications using SMT solvers such as Z3, Boolector and CVC4. The main idea is that you translate ASL (the language that Arm’s specifications are written in) to SMT-Lib2 (the language that SMT solvers accept).
Last week’s S-REPLS keynote by Sylvan Clebsch was about the limitations of current microprocessor architecture and how it hides everything of interest from the programmer: instruction-level parallelism is hidden behind out-of-order execution, message passing is hidden behind cache coherency, etc. This reminded me that I have been meaning to write about the SoC-C project, which aimed to make parallelism explicit by adding some language extensions to C. SoC-C achieved very high performance, good scaling and high energy efficiency by using compiler tricks and some programmer annotations to let us exploit a fairly bare-bones parallel system.
One of the tantalising pieces of information contained in ARM’s machine readable specifications is a specification of the assembly syntax. A few years ago (on an earlier version of the specification), Wojciech Meyer and I decided to try to transform this specification into assemblers and disassemblers. At the time, this was not very useful to ARM because we already had assemblers and disassemblers so, although the project was technically successful, it died and the code has been slowly bitrotting ever since. In a few days’ time, I will be giving a talk at the 34th Chaos Communication Congress [video] [pdf] in Leipzig about practical things you can do with ARM’s specification, and I thought it would be a good idea to suggest that someone create a similar tool. But maybe it would be a good idea if I showed you what Wojciech and I did, to get you started?
In my last post about natural language specifications, I talked about the problem with executable specifications: they are great for specifying existing behaviour but they are bad at describing overall properties and, in particular, what is not allowed to happen. This means that extensions that add new behaviour could break fundamental properties of the specification without anybody noticing.
In my efforts to create a formal specification of the Arm architecture, I have focussed on the parts written in “pseudocode”. I have reverse engineered the ASL language hiding inside the pseudocode to create a formal specification that I can execute and test. In the process, I have tended to ignore all the natural language prose that makes up the bulk of the 6,000 page Arm Architecture Reference Manual. In defiance of Betteridge’s Law, this article is going to explain how I finally found a use for all that prose.
Three months ago, Arm released v8.2 of its processor architecture specification. Arm’s release includes PDF and HTML versions of the specification, but what makes this specification unusual is that it includes a machine readable spec as well. The machine readable spec contains the instruction encodings, the system register encodings and an executable specification (written in ASL) of each instruction and all the supporting code such as exception handling, interrupt handling, page table walks, hypervisor support, debug support, etc.
In my post about dissecting the ARM Machine Readable Architecture files, I described how to extract the ASL code from the XML files that ARM provides. In this post, I will describe how to start processing that code by examining its lexical syntax. This takes me back to one of the first things I had to figure out when I started trying to use ARM’s documentation as an executable specification, so I will be looking at code I have barely thought about in 6 years and trying to remember my thought processes at the time as I reverse engineered the language lurking inside ARM’s pseudocode.
Last week, ARM released their Machine Readable Architecture Specification and I wrote about what you can do with it. But before you can do anything with the specification, you need to extract the useful bits from the release so I thought I would try for myself and describe what I found out (and release some scripts that demonstrate/test what I am saying).
The device you are reading this post on consists of a very tall stack of layers - all the way from transistors and NAND gates up to processors, C, Linux/Android/iOS/Windows and the browser. Each of these layers may be written by a different team, possibly in a different company, and the interface between these layers is documented and specified so that each team knows what it can assume and what it must provide. One of the most important interfaces in this stack is the one between hardware and software that says what a processor will do if it is configured a certain way, provided with page tables and interrupt handlers, put in a certain privilege level and, finally, provided with a program to execute. If you want to write a compiler, operating system, hypervisor or security layer, then you need to know how the processor will behave. If someone gives you a virus and asks you to figure out how it works and how to defend against it, then you need to know how the processor will behave.
What language should you write a specification in? Should you use the language supported by your favourite verification tool (HOL, Verilog, …)? Or should you write it in a DSL and translate that to whatever your verification tool needs?
As I was explaining the ISA-Formal technique for verifying ARM processors to people, I realized that it was important to be clear about what ISA-Formal does not verify and why.
Part of the reason why the ISA-Formal technique for verifying ARM processors is so effective and so portable across different ARM processors is the fact that we directly use the ARM Instruction Set Architecture (ISA) Specification in our flow. That is, I translate ARM’s official printed documentation into something that I can load into a model checker alongside ARM’s processor Verilog and I verify that the two match each other.
Probably the most important thing I didn’t say enough in my paper about verifying ARM processors is why we focus on finding bugs.
I have spent the last 5 years working on ARM’s processor specifications: making them executable, testable, mechanizing them, formalizing them, making them trustworthy, etc.
The difficult thing about Industrial Research Labs is that you don’t always get to talk about what you are doing so, for the last five years or so, I haven’t published anything except the occasional patent and I have given only abstract descriptions to most of my friends.
I write lots of minor documentation files: READMEs; release notes; instructions for checking out and building the project; weekly reports; todo lists; internal documentation; etc. If I want to keep these in sync with the source code they refer to, then I need to put them in git/Hg/SVN/CVS version control, which rules out using MS Word (not that I was seriously planning to use Word). So it’s plain text then…
I don’t have a problem. My wife thinks it strange that I always have a pen in my pocket - but I feel that she also admires the practicality of it. Almost the first thing I do on getting a new computer is to install software for taking and cross-referencing notes - but everyone does that, don’t they? And when I’m travelling and want to be able to take notes without whipping out a laptop and typing a password, then I use a pocketmod.
My first laptops came with docking stations - hefty beasts you clip your laptop into, with USB, DVI, power, printer and possibly even RS-232 connectors. A dock is convenient at the desk but too bulky to travel with, so it sits on my desk at work and doesn’t come home with me (or sits at home and …)
I’ve been using a Synology DiskStation for a while as a Network Attached Storage box for sharing music and photos and as a Time Machine server, but I’ve always known that it can do a lot more. I wanted a private git server, so I googled and found these useful pages:
I have been wanting to experiment with ARM’s new 64-bit architecture for a while and now ARM has released a simulator.
What does it take to turn a Mac into a decent programming environment?
I recently got a new home computer - my first Mac. As with any new machine, it takes a while to set it up just as you like it. Here are some of the basics that I started off with.
Historically, I’ve changed jobs every 5-6 years; I’ve been at ARM over 7 years now. When I mentioned this to my manager, he started wondering where this was heading. In fact, I was explaining why I felt it was time for me to shift my focus away from parallelism, vector processing, performance optimization, etc. and into something I didn’t know much about. It’s a fast-moving industry: if you don’t keep moving, your skills become irrelevant, and if you don’t make a big enough shift, you won’t take the risks you need to take to achieve something new. (A very bad paraphrase of Richard Hamming.)
When I worked for universities, I had a very visible public presence. Anyone I talked to could easily google for me and find my recent projects, papers, etc. Without even trying, I was the first hit on Google. But 7 years ago, I moved to industry and suddenly I was invisible. I had no web page, there was no single collection of all my publications, nothing about recent projects, nothing about the people I work with. And I often can’t say much about my current project, either until all the patents are filed or because we’re working with another company under confidentiality requirements.