Sparse GPU kernels for deep learning

Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
Read: 04 October 2021

arXiv 2006.10901 cs.LG
2020
Note(s): neural network, sparse model, google

Scientific sparse matrices tend to be 99% sparse and have many short rows (e.g. 4-16 elements), while ML matrices tend to be 60-99% sparse and have many moderate-length rows (e.g. 64-256 elements) with much less variability in row lengths.
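
To make the three properties above concrete (sparsity, typical row length, variability in row lengths), here is a minimal sketch that computes them from the row offsets of a matrix stored in CSR format. The function name `csr_row_stats` and the choice of the coefficient of variation as the variability measure are my own assumptions for illustration, not code from the paper.

```python
from statistics import mean, pstdev


def csr_row_stats(row_offsets, n_cols):
    """Return (sparsity, mean row length, row-length coefficient of variation).

    row_offsets is the CSR offsets array: row i holds the nonzeros
    row_offsets[i] .. row_offsets[i + 1] - 1. (Hypothetical helper.)
    """
    n_rows = len(row_offsets) - 1
    row_lengths = [row_offsets[i + 1] - row_offsets[i] for i in range(n_rows)]
    nnz = row_offsets[-1]
    sparsity = 1.0 - nnz / (n_rows * n_cols)
    avg = mean(row_lengths)
    # Coefficient of variation: standard deviation relative to the mean.
    cov = pstdev(row_lengths) / avg if avg else 0.0
    return sparsity, avg, cov


# Toy 4 x 8 matrix in which every row has exactly 2 nonzeros.
sparsity, avg_row_length, row_length_cov = csr_row_stats([0, 2, 4, 6, 8], n_cols=8)
```

A low coefficient of variation means the work per row is nearly uniform, which is the situation described for ML matrices.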

This paper describes in detail how to implement SpMM (sparse matrix-dense matrix multiplication) and SDDMM (sampled dense-dense matrix multiplication) efficiently in CUDA, resulting in significantly faster inference than with dense models. Figure 12 shows that you can get 1% more accuracy at the same throughput or 20% higher frame rates at the same accuracy.
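
As a reminder of what the two operations compute, here is a plain-Python reference sketch over a CSR matrix. It only illustrates the semantics and says nothing about the paper's CUDA kernels or their optimizations. The function names and argument layout are assumptions of mine, and the `sddmm` below only samples the dense product at the nonzero positions, without the element-wise scaling by the sparse matrix's values that the general definition includes.

```python
def spmm(values, col_indices, row_offsets, dense, n_out_cols):
    """C = A @ B where A is sparse (CSR) and B is dense (list of rows)."""
    n_rows = len(row_offsets) - 1
    out = [[0.0] * n_out_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Visit only the stored nonzeros of row i of A.
        for k in range(row_offsets[i], row_offsets[i + 1]):
            a, j = values[k], col_indices[k]
            for c in range(n_out_cols):
                out[i][c] += a * dense[j][c]
    return out


def sddmm(col_indices, row_offsets, lhs, rhs):
    """Entries of lhs @ rhs^T at the nonzero positions of a CSR mask.

    Returns the sampled values in CSR order (same layout as `values`).
    """
    out = []
    for i in range(len(row_offsets) - 1):
        for k in range(row_offsets[i], row_offsets[i + 1]):
            j = col_indices[k]
            # Dot product of row i of lhs with row j of rhs.
            out.append(sum(x * y for x, y in zip(lhs[i], rhs[j])))
    return out


# A = [[1, 0, 2],
#      [0, 3, 0]] in CSR form.
values, col_indices, row_offsets = [1, 2, 3], [0, 2, 1], [0, 2, 3]

spmm_result = spmm(values, col_indices, row_offsets,
                   dense=[[1, 2], [3, 4], [5, 6]], n_out_cols=2)

sddmm_result = sddmm(col_indices, row_offsets,
                     lhs=[[1, 2], [3, 4]],
                     rhs=[[1, 0], [0, 1], [1, 1]])
```

In both loops the work per output row is proportional to the number of nonzeros in that row, which is why the row-length distribution of the matrix matters for load balancing on a GPU.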