Fast sparse ConvNets

Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan
[arXiv] [Google Scholar] [DBLP] [Citeseer]
Read: 04 October 2021

arXiv 1911.09723 cs.CV
2019
Note(s): neural network, sparse model, TensorFlow, XNNPACK, google

Describes the motivation for and implementation of sparse convolution operations in XNNPACK (kernels written with Arm NEON). These can be used with the TensorFlow Lite model pruning library, which learns sparse representations.

At sparsity levels of 85-90%, models run about 30-50% faster than their dense counterparts (for MobileNet and EfficientNet).

Three observations

  1. Although the weights are sparse, the activations are dense, so the kernels can perform vector loads from the activation matrix.

  2. It is possible to keep the working set smaller than the L1 cache. (Is this just a corollary of the ability to use vector loads?)

  3. If you don’t have too many channels, you can prefetch activations. (todo: I thought “channel” was just red-green-blue in images - now I think I have that wrong).
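To make the first observation concrete, here is a minimal sketch of a 1x1 convolution written as sparse weights times dense activations. It is not the XNNPACK kernel: the CSR weight layout, the channel-major activation layout, and the names `to_csr` and `sparse_conv1x1` are all assumptions made for illustration. Here a "channel" is one row of the activation matrix (one feature map flattened over all pixels), so an RGB input has 3 channels while later layers typically have many more.

```python
def to_csr(dense_weights):
    """Convert a dense [out_channels][in_channels] weight matrix to CSR form."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense_weights:
        for in_channel, w in enumerate(row):
            if w != 0.0:
                values.append(w)            # keep only the nonzero weights
                col_idx.append(in_channel)  # remember which input channel each one uses
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sparse_conv1x1(values, col_idx, row_ptr, activations):
    """Compute a 1x1 convolution as (sparse weights) x (dense activations).

    activations is [in_channels][pixels] (assumed layout): each input
    channel is one contiguous row of pixels.
    """
    n_pixels = len(activations[0])
    output = []
    for out_channel in range(len(row_ptr) - 1):
        acc = [0.0] * n_pixels
        # Visit only the nonzero weights of this output channel.
        for k in range(row_ptr[out_channel], row_ptr[out_channel + 1]):
            w = values[k]
            channel_row = activations[col_idx[k]]
            # The weight lookup is irregular, but this inner loop walks a
            # dense, contiguous row. A SIMD kernel would replace it with
            # vector loads and multiply-accumulates.
            for p in range(n_pixels):
                acc[p] += w * channel_row[p]
        output.append(acc)
    return output


# Hypothetical example: 3 input channels, 2 output channels, 4 pixels.
weights = [[0.0, 2.0, 0.0],
           [1.0, 0.0, 0.0]]
activations = [[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [9.0, 10.0, 11.0, 12.0]]
result = sparse_conv1x1(*to_csr(weights), activations)
```

In this sketch each nonzero weight touches exactly one activation row, so the number of rows in play at any moment is bounded by the number of input channels. That is one plausible reading of why a modest channel count helps with both cache residency and prefetching.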


XNNPACK