Verifying hand-vectorized Rust code

One of the major themes in our work on Rust verification is eliminating verification obstacles: things that mean that you can’t even run a verification tool on your code. So we have worked on how to verify cargo crates, how to verify code that uses the Rust FFI and how to verify programs with command line arguments. One recurring obstacle is that some code depends on processor-specific intrinsic functions. For example the Aho-Corasick crate supports fast string searches using SIMD acceleration and uses intrinsics to access the x86 architecture’s AVX2 or SSE2 vector extensions if they are available. Although verification tools could support these intrinsics, most of them do not – so if your program uses Aho-Corasick (or any crate that depends on it like the regex crate), then you won’t be able to verify your program.

[Since writing this post, we have realized that the approach described here does not work very well. We now favour the much more effective “emulate, don’t eliminate” approach described here: using a SIMD emulation library to replace SIMD intrinsics.]

The solution to this problem lies in the cause of the problem. Verification tools don’t implement these intrinsics because they are not available on all processors. For the same reason, any reasonably portable library must have a way of disabling use of these intrinsics. Typically, it has a portable “slow path” that can be used if the vectorized “fast path” is not supported. So the solution is to force the library to always use the more portable slow path.¹ ²

The way that Rust programs dynamically decide whether a given instruction extension is available or not is to use the std-detect crate and to use macros like this which detect whether the AVX2 extension is available.

let has_avx = is_x86_feature_detected!("avx2");

This macro expands to code that calls std::std_detect::detect::check_for: a function that returns ‘true’ only if a processor extension is present.

fn check_for(x: Feature) -> bool

What if we could arrange for this function to always return ‘false’ no matter what feature you ask for?

As part of our verification toolchain, we had already had to write a tool that would preprocess LLVM bitcode files. We use this to collect initializers and make sure that they are run before main using this command

$ rvt-patch-llvm --initializers foo.bc -o bar.bc

Most of the effort of writing a tool like that is in figuring out how to read, write and traverse LLVM IR. The actual transformation itself is fairly easy. So, it’s pretty easy to extend the tool to look for a function with particular name and replace its body with return false.

So, now, the command

$ rvt-patch-llvm --initializers --features foo.bc -o bar.bc

turns an LLVM bitcode file ‘foo.bc’ that uses processor-specific intrinsics on its fast path into a bitcode file ‘bar.bc’ that does not.

If you are using a symbolic execution tool like KLEE that tests the feasibility of a path before going down it, that will be enough to fix the problem because execution will never hit the fast path with its unsupported intrinsics. If you are using a model checker like SeaHorn, that examines paths before testing their feasibility, I suspect that you will need to work a bit harder and use the LLVM optimization tool to perform constant propagation and dead-code elimination to completely remove all mention of these intrinsics from the bitcode file.

Summary

Most verification tools don’t implement every single processor specific intrinsic supported by compilers: especially not intrinsics that are only used for performance optimizations.

Anybody trying to write a portable library has added fallback code that can be used on processors that don’t support those intrinsics.

So all you have to do is ensure that the verifier only has to consider the portable version of the code and you can verify code that uses hand-vectorized libraries.

[This post was originally posted as part of the Rust verification project]

There’s an interesting question (that I won’t explore) about whether the simpler control structures of vectorized code might actually make them simpler to verify? In other words, maybe we should try supporting vector intrinsics in case it makes verification easier/faster? ↩
You might object that now I’m not verifying the code that actually runs/ships. This would be a serious problem if my goal was to verify the vectorized library: I would ideally want to check that the fast path and the slow path are equivalent.

On the other hand, if I am verifying a client library that calls the vectorized library, and I am more interested in bugs in the client library, then I might be happy to use the slow path as a proxy specification for what the fast path does while, reluctantly, accepting the fact that a bug in the vectorized code could break my verified program. ↩

Written on March 20, 2021.
The opinions expressed are my own views and not my employer's.