Large-scale automated refactoring using ClangMR

Hyrum K. Wright, Daniel Jasper, Manuel Klimek, Chandler Carruth, Zhanyong Wan
Read: 21 September 2020

2013 IEEE International Conference on Software Maintenance
Pages 548-551
Sep. 2013
Note(s): google
Papers: potvin:cacm:2016

ClangMR is a refactoring tool that scales to codebases the size of Google’s monorepo (potvin:cacm:2016). It consists of three phases:

  1. Collect a list of all the commands needed to compile code in the entire repo.

  2. Compile the code using a modified compiler that keeps the AST in memory, with the work spread across many machines running in parallel.

    it is roughly as fast to compile C++ code into a memory-held AST as it is to read a completely annotated AST out of distant storage

  3. Apply transformation rules based on AST pattern matching and substitutions.
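To make the three phases concrete, here is a minimal single-process sketch in Python. It is not ClangMR's code: the compile flags, the function names (`collect_compile_commands`, `build_ast`, `apply_rule`, `run`) and the `OldFn`/`NewFn` rename are hypothetical. A toy regex stands in for Clang's parser, and a thread pool stands in for the many machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class CompileCommand:
    source_file: str
    flags: tuple


def collect_compile_commands(repo):
    """Phase 1: one compile command per source file (flags are made up)."""
    return [CompileCommand(path, ("-std=c++11",)) for path in sorted(repo)]


def build_ast(cmd, repo):
    """Phase 2 stand-in: 'parse' a file into (callee, args) call nodes held in memory.

    A regex over flat calls is only a toy substitute for a real Clang AST.
    """
    text = repo[cmd.source_file]
    return [(m.group(1), m.group(2)) for m in re.finditer(r"(\w+)\(([^()]*)\)", text)]


def apply_rule(cmd, ast, old, new):
    """Phase 3: every call node whose callee matches `old` becomes an edit."""
    return [
        (cmd.source_file, f"{callee}({args})", f"{new}({args})")
        for callee, args in ast
        if callee == old
    ]


def run(repo, old, new, workers=4):
    commands = collect_compile_commands(repo)

    def process(cmd):
        # Each translation unit is parsed and rewritten independently.
        return apply_rule(cmd, build_ast(cmd, repo), old, new)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_tu = list(pool.map(process, commands))  # map() preserves input order
    return [edit for edits in per_tu for edit in edits]


if __name__ == "__main__":
    repo = {"a.cc": "x = OldFn(1);", "b.cc": "OldFn(2); Other(3);"}
    print(run(repo, "OldFn", "NewFn"))
```

Running it prints one `(file, before, after)` edit per matching call, here the two `OldFn` calls. Because each translation unit is processed in isolation, the same structure scales out by handing `process` calls to different machines rather than threads.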

Transformations are limited to changes within a single translation unit. (I think this is because the whole system is built on MapReduce, so information cannot propagate from one translation unit to another. I think “MR” is short for “MapReduce”?)
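A hypothetical sketch of why a MapReduce structure would confine transformations to one translation unit: each map call sees only the files of a single translation unit (its source file plus the headers it includes), so it can only emit edits for those files. I am assuming that a reduce step then groups edits by file and drops duplicates, since a header included by many translation units would otherwise be rewritten once per includer. The function names and the `(offset, old, new)` edit format are made up for illustration.

```python
from collections import defaultdict


def rename_old_fn(path, text):
    """Hypothetical rule: yield (offset, old, new) for each 'OldFn' in a file."""
    start = text.find("OldFn")
    while start != -1:
        yield (start, "OldFn", "NewFn")
        start = text.find("OldFn", start + 1)


def map_translation_unit(files_seen, rewrite):
    """Map: sees only this TU's files, so it cannot use facts from other TUs."""
    for path, text in files_seen.items():
        for edit in rewrite(path, text):
            yield path, edit


def reduce_edits(mapped):
    """Reduce: group edits by file, dropping duplicates from shared headers."""
    grouped = defaultdict(set)
    for path, edit in mapped:
        grouped[path].add(edit)
    return {path: sorted(edits) for path, edits in grouped.items()}


if __name__ == "__main__":
    tus = {
        "a.cc": {"a.cc": "OldFn();", "util.h": "void OldFn();"},
        "b.cc": {"b.cc": "g();", "util.h": "void OldFn();"},
    }
    mapped = [pair for files in tus.values() for pair in map_translation_unit(files, rename_old_fn)]
    print(reduce_edits(mapped))
```

Both translation units propose the same edit to `util.h`, and the reduce step keeps only one copy of it.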

The paper gives an example of a transformation that changed calls to an old, error-prone string-splitting API into calls to a better API. The transformation was able to take advantage of constant arguments and similar details in the old code to produce cleaner output.