About techniques for learning sparse models: either train a dense model and then make it sparse, change how the initial model is trained, or use a sparse architecture from the start.
They reimplement three techniques and run a large-scale comparison of them across a range of sparsity levels.
The main takeaway seems to be that magnitude pruning is the winner. Some of the other techniques also require a lot of hyperparameter tuning.
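For reference, a minimal sketch of what magnitude pruning means, assuming NumPy and a single weight tensor. The weights with the smallest absolute values are zeroed until a target sparsity (fraction of zeros) is reached. The function name `magnitude_prune` and the one-shot, single-tensor setup are illustrative assumptions, not taken from the paper. The sketch does not model any pruning schedule applied during training.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries of `weights`.

    Hypothetical helper for illustration: one-shot pruning of a single tensor.
    `sparsity` is the fraction of entries to set to zero, in [0, 1].
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to remove
    # Indices of the k smallest |w|; a stable sort breaks ties by position.
    drop = np.argsort(flat, kind="stable")[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    # Keep surviving weights unchanged, replace pruned ones with 0.
    return np.where(mask.reshape(weights.shape), weights, 0.0)

# Usage: prune the same random tensor at several sparsity levels.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))
for s in (0.5, 0.8, 0.9):
    p = magnitude_prune(w, s)
    print(f"target sparsity {s}: actual fraction of zeros {np.mean(p == 0):.2f}")
```

Because the criterion is just |w|, this needs no hyperparameters beyond the target sparsity itself.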