mra_tools

Tools to process ARM's Machine Readable Architecture Specification

Source code: alastairreid/mra_tools on GitHub


These tools unpack the ASL (Architecture Specification Language) code from inside ARM's XML files so that the specification is easy to process.

See the accompanying blog post for an explanation of the structure of ARM’s releases and a description of the innards of these tools, and a second blog post for some ideas on what can be done with the specification once it has been unpacked.
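To give a feel for what "unpacking" involves, here is a minimal Python sketch that pulls ASL text out of an XML fragment. It is not the code in mra_tools: the element and attribute names (`ps`, `name`, `pstext`, and `<a>` links wrapped around identifiers inside the ASL) are assumptions about the layout of ARM's XML, and the sample fragment is made up.

```python
import xml.etree.ElementTree as ET

# Made-up fragment in the style assumed for ARM's XML: ASL lives inside
# <pstext> elements and may contain <a> link markup around identifiers.
SAMPLE = """<instructionsection id="EXAMPLE">
  <ps name="example/execute" sections="1">
    <pstext section="Execute">X[d] = <a link="impl-aarch64.X.read.1">X</a>[n];</pstext>
  </ps>
</instructionsection>"""


def extract_asl(xml_text):
    """Return {ps name: ASL text}, flattening any link markup inside pstext."""
    root = ET.fromstring(xml_text)
    result = {}
    for ps in root.iter("ps"):
        # itertext() yields the text of pstext and of nested <a> elements
        # (plus their tails), so joining it drops the markup but keeps the code.
        text = "".join("".join(pstext.itertext()) for pstext in ps.iter("pstext"))
        result[ps.get("name")] = text
    return result
```

For the sample, `extract_asl(SAMPLE)` gives `{"example/execute": "X[d] = X[n];"}`. ElementTree also decodes entities such as `&lt;`, which matters because ASL uses `<` and `>` freely.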

Usage

The following commands will download ARM’s specification and unpack it.

mkdir -p v8.6
cd v8.6

wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/SysReg_xml_v86A-2019-12.tar.gz
wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/A64_ISA_xml_v86A-2019-12.tar.gz
wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/AArch32_ISA_xml_v86A-2019-12.tar.gz

tar zxf A64_ISA_xml_v86A-2019-12.tar.gz
tar zxf AArch32_ISA_xml_v86A-2019-12.tar.gz
tar zxf SysReg_xml_v86A-2019-12.tar.gz

cd ..

make all
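The download-and-unpack steps can also be scripted. This is a minimal sketch, assuming that the URL pattern visible in the three `wget` lines above (a date directory plus `<Part>_xml_<version>-<date>.tar.gz`) also holds for other releases, which may not be true. `release_urls` and `fetch_and_unpack` are hypothetical helper names, not part of mra_tools.

```python
import os
import tarfile
import urllib.request

# Base URL taken from the wget commands above.
BASE = "https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture"
# The three parts of the specification downloaded above.
PARTS = ["SysReg", "A64_ISA", "AArch32_ISA"]


def release_urls(version, date):
    """Build download URLs, assuming the naming pattern of the 2019-12 release."""
    return [f"{BASE}/{date}/{part}_xml_{version}-{date}.tar.gz" for part in PARTS]


def fetch_and_unpack(version, date, dest):
    """Download each archive into dest and extract it there (like wget + tar zxf)."""
    os.makedirs(dest, exist_ok=True)
    for url in release_urls(version, date):
        path = os.path.join(dest, os.path.basename(url))
        urllib.request.urlretrieve(url, path)
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(dest)
```

For example, `fetch_and_unpack("v86A", "2019-12", "v8.6")` would mirror the shell commands above; `make all` still has to be run afterwards.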

You may need to manually add the following function prototypes (a getter and a setter for _MemTag) to arch.asl:

bits(4) _MemTag[AddressDescriptor desc];
_MemTag[AddressDescriptor desc] = bits(4) value;
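If you regenerate the files often, a small script can append these prototypes only when they are missing. This is a hypothetical helper, not part of the repository. It writes each prototype terminated by a semicolon, as the setter already is, and takes the path to arch.asl as a parameter because its exact location in your build tree is an assumption here.

```python
from pathlib import Path

# Getter and setter prototypes for _MemTag, each ending in ';'.
MEMTAG_PROTOTYPES = [
    "bits(4) _MemTag[AddressDescriptor desc];",
    "_MemTag[AddressDescriptor desc] = bits(4) value;",
]


def add_missing_prototypes(asl_path, prototypes=MEMTAG_PROTOTYPES):
    """Append any prototype lines not already present; return the ones added."""
    path = Path(asl_path)
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [p for p in prototypes if p not in existing]
    if missing:
        with path.open("a") as f:
            # Keep the appended lines separate from an unterminated last line.
            if text and not text.endswith("\n"):
                f.write("\n")
            for p in missing:
                f.write(p + "\n")
    return missing
```

Running it twice is harmless: the second call finds both lines and returns an empty list.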

Generates:

You can also extract various subsets of the full architecture specification. For example, to extract a subset containing the user-mode AArch64 instructions, use the following command:

make FILTER=--filter=usermode.json all

The selected subset may not contain all the instructions you want; see Subsetting for more details.

Help

$ bin/instrs2asl.py  -h
usage: instrs2asl.py [-h] [--verbose] [--altslicesyntax] [--demangle]
                     [--output FILE] [--filter [FILE [FILE ...]]]
                     [--arch {AArch32,AArch64}]
                     <dir> [<dir> ...]

Unpack ARM instruction XML files extracting the encoding information and ASL
code within it.

positional arguments:
  <dir>                 input directories

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Use verbose output
  --altslicesyntax      Convert to alternative slice syntax
  --demangle            Demangle instruction ASL
  --output FILE, -o FILE
                        Basename for output files
  --filter [FILE [FILE ...]]
                        Optional input json file to filter definitions
  --arch {AArch32,AArch64}
                        Optional list of architecture states to extract
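The help text above is standard `argparse` output. The following sketch declares an interface that accepts the same flags; it is an illustration written for this page, not the code in bin/instrs2asl.py, and the choices of `action="append"` for `--arch` and `nargs="*"` for `--filter` are inferred from the usage line rather than taken from the source.

```python
import argparse


def make_parser():
    """Build a parser accepting the same options as the help text above."""
    p = argparse.ArgumentParser(
        prog="instrs2asl.py",
        description="Unpack ARM instruction XML files extracting the encoding "
                    "information and ASL code within it.")
    p.add_argument("--verbose", "-v", action="store_true",
                   help="Use verbose output")
    p.add_argument("--altslicesyntax", action="store_true",
                   help="Convert to alternative slice syntax")
    p.add_argument("--demangle", action="store_true",
                   help="Demangle instruction ASL")
    p.add_argument("--output", "-o", metavar="FILE",
                   help="Basename for output files")
    p.add_argument("--filter", nargs="*", metavar="FILE",
                   help="Optional input json file to filter definitions")
    # append: repeating --arch collects a list of architecture states
    p.add_argument("--arch", action="append", choices=["AArch32", "AArch64"],
                   help="Optional list of architecture states to extract")
    p.add_argument("dir", metavar="<dir>", nargs="+", help="input directories")
    return p
```

In this sketch, `--filter=usermode.json` (with `=`) consumes exactly one file name, so input directories after it stay positional, which suits the `make FILTER=--filter=usermode.json` form. Written as `--filter usermode.json v8.6`, `nargs="*"` would also swallow `v8.6`.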

Subsetting

Various subsets of the architecture can be generated using these additional flags:

--arch=AArch32
--arch=AArch64
--arch=AArch32 --arch=AArch64

For finer control, you can supply a filter file that selects exactly which instructions, and which subset of the call graph, to include:

make FILTER=--filter=usermode.json all

The filter is controlled by a JSON file with the following format:

{
    "instructions": [
        // regexp list goes here
    ],
    "roots": [
        // root definitions go here
    ],
    "cuts": [
        // cut functions go here
    ],
    "canaries": [
        // canary definitions go here
    ]
}
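The template above uses `//` comments, which Python's `json` module rejects, so a real filter file would presumably have to omit them. The sketch below builds a comment-free filter and applies it. What the four keys do is an assumption here, not documented behaviour of the tool: `instructions` regexps are matched against instruction names with `re.fullmatch`, the selected instructions plus `roots` seed a walk over a call graph, the walk keeps a `cut` function but does not follow its callees, and reaching a `canary` is reported as an error. All instruction and function names are illustrative.

```python
import json
import re

# Hypothetical filter; every name below is illustrative only.
FILTER = {
    "instructions": ["ADD_.*", "SUB_.*"],
    "roots": ["Reset"],
    "cuts": ["DebugHalt"],
    "canaries": ["SecureMonitorCall"],
}


def select_instructions(names, patterns):
    """Keep instruction names that fully match any regexp in the filter."""
    regexps = [re.compile(p) for p in patterns]
    return [n for n in names if any(r.fullmatch(n) for r in regexps)]


def reachable(callgraph, roots, cuts, canaries):
    """Walk callees from roots; stop at cuts; fail if a canary is reached."""
    seen, stack = set(), list(roots)
    while stack:
        f = stack.pop()
        if f in seen:
            continue
        if f in canaries:
            raise ValueError(f"canary {f} is reachable")
        seen.add(f)
        if f not in cuts:  # a cut is kept, but its callees are not followed
            stack.extend(callgraph.get(f, []))
    return seen


text = json.dumps(FILTER, indent=4)  # valid JSON: no comments
cfg = json.loads(text)
```

For instance, with a call graph in which `Reset` calls `DebugHalt` and `DebugHalt` calls `SecureMonitorCall`, the cut on `DebugHalt` keeps the canary unreachable.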

The four parts of this are:

Currently implemented

All generated files include ARM’s license notice.

Shared pseudocode

The shared pseudocode is sorted so that definitions come before uses.
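Putting definitions before uses amounts to a topological sort of the "uses" graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) on a hypothetical, simplified dependency map; it illustrates the idea and is not necessarily how mra_tools orders the pseudocode. Mutually recursive definitions form a cycle, which `static_order()` reports by raising `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified map: each definition -> the definitions it uses.
USES = {
    "AddWithCarry": ["IsZero", "UInt", "SInt"],
    "IsZero": ["Zeros"],
    "UInt": [],
    "SInt": [],
    "Zeros": [],
}


def definitions_before_uses(uses):
    """Return names ordered so every definition precedes each of its users."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so a definition's dependencies are emitted before it.
    return list(TopologicalSorter(uses).static_order())


order = definitions_before_uses(USES)
```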

Tagfile format for functions

A tagfile consists of sections that start with a line of the form “TAG:$label:$kind”. There are five different kinds:
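A reader for this format only has to split the file at its header lines. The sketch below assumes that each section's body runs until the next `TAG:` header and that labels and kinds contain no colons; the kind names in the usage example are placeholders rather than the tool's actual kinds.

```python
import re

# A header line looks like TAG:$label:$kind (assumed: no ':' inside fields).
HEADER = re.compile(r"^TAG:([^:]+):([^:]+)$")


def parse_tagfile(text):
    """Return a list of (label, kind, body) sections in file order."""
    sections = []
    label = kind = None
    body = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # A new header closes the previous section, if any.
            if label is not None:
                sections.append((label, kind, "\n".join(body)))
            label, kind = m.group(1), m.group(2)
            body = []
        elif label is not None:
            body.append(line)
        # Lines before the first header are ignored.
    if label is not None:
        sections.append((label, kind, "\n".join(body)))
    return sections
```

For example, `parse_tagfile("TAG:Zeros:kindA\nbits(N) Zeros()\n")` returns `[("Zeros", "kindA", "bits(N) Zeros()")]`.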

Register spec

At the moment, we unpack all the information about fields and declare a variable with the right name and with named fields. This uses an unofficial ASL extension to declare the bit position of each field.

__register 32 {
    31:31 N, 30:30 Z, 29:29 C, 28:28 V, 27:27 Q, 24:24 J, 22:22 PAN, 19:16 GE,
    9:9 E, 8:8 A, 7:7 I, 6:6 F, 5:5 T, 7:2, 1:0 IT, 3:0 M
} CPSR;
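The field list can be read back with a small parser. This sketch rests on guesses about the extension's semantics, not a documented rule: a field may list several `hi:lo` slices before its name (as `7:2, 1:0 IT` suggests), the numbers are bit positions within the register as for the other fields, and multiple slices are concatenated with the first one most significant. It works on a shortened copy of the CPSR declaration.

```python
import re

# Shortened copy of the CPSR field list above.
CPSR_FIELDS = "31:31 N, 30:30 Z, 19:16 GE, 7:2, 1:0 IT"


def parse_fields(spec):
    """Map each field name to its list of (hi, lo) bit slices."""
    fields, pending = {}, []
    for item in spec.split(","):
        m = re.fullmatch(r"\s*(\d+):(\d+)(?:\s+(\w+))?\s*", item)
        if not m:
            raise ValueError(f"bad field entry: {item!r}")
        pending.append((int(m.group(1)), int(m.group(2))))
        if m.group(3):  # a name closes the current field
            fields[m.group(3)] = pending
            pending = []
    return fields


def read_field(value, slices):
    """Extract and concatenate the slices of value, first slice most significant."""
    result = 0
    for hi, lo in slices:
        width = hi - lo + 1
        result = (result << width) | ((value >> lo) & ((1 << width) - 1))
    return result
```

With these definitions, `read_field(0x80000000, parse_fields(CPSR_FIELDS)["N"])` yields 1.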

The system register specification also contains a lot of information about how to refer to a system register, permission checking, constant-value fields, etc., but none of that is extracted at the moment.

Experimental parser, etc.

There is an experimental parser for the language, written in OCaml. It requires some tools to be installed; the following instructions are for macOS with Homebrew.

brew install ocaml opam
opam install menhir core

Test it using the following:

make test

At the moment, all it does is parse the ASL code extracted from the XML files; it does not yet typecheck or evaluate that code.