Machine learning

Notes:
Papers:

Mixture of experts

  • Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [fedus:arxiv:2021]
  • Outrageously large neural networks: The sparsely-gated mixture-of-experts layer [shazeer:arxiv:2017]
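The papers above center on sparse top-k gating: a learned gate scores every expert per token, only the k best experts run, and their outputs are mixed by the renormalized gate weights. A minimal NumPy sketch of that routing (all names here are illustrative; the real layers add noisy gating, load-balancing losses, and capacity limits):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, gate_w, experts, k=2):
    """Sparsely-gated MoE: route each token to its top-k experts only."""
    logits = x @ gate_w                         # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        weights = softmax(logits[t, sel])       # renormalize over the selected experts
        for w, e in zip(weights, sel):
            out[t] += w * experts[e](x[t])      # weighted mix of expert outputs
    return out

# Toy setup: 4 linear experts over 8-dim tokens (hypothetical sizes).
d, n_experts, tokens = 8, 4, 3
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: v @ W for W in expert_ws]

x = rng.normal(size=(tokens, d))
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (3, 8)
```

With k=1 this reduces to the Switch Transformer routing (one expert per token); the total parameter count grows with the number of experts while per-token compute stays roughly constant.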