## Notes related to Neural network

Activation function, Attention function, Auto-encoder model, Convolutional neural network (CNN), Deep neural networks, Generative adversarial network (GAN), Long short-term memory (LSTM), Mixture of experts, PyTorch, Rectified linear unit (ReLU), Recurrent neural network (RNN), Softmax, TensorFlow, Transformer machine learning model, XNNPACK
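Two of the activation-related functions named above, ReLU and softmax, can be sketched in plain Python (a minimal illustration, not tied to any particular framework; real code would use a library such as PyTorch or TensorFlow from the same list):

```python
import math

def relu(x):
    # Rectified linear unit: max(0, v) applied elementwise.
    return [max(0.0, v) for v in x]

def softmax(x):
    # Numerically stable softmax: subtract the max before
    # exponentiating, then normalize to a probability distribution.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

print(relu([-1.0, 0.0, 2.0]))        # negatives clamp to zero
print(softmax([1.0, 2.0, 3.0]))      # outputs sum to 1
```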

## Papers related to Neural network

- TensorFlow: Large-scale machine learning on heterogeneous distributed systems [abadi:arxiv:2016]
- Fast sparse ConvNets [elsen:arxiv:2019]
- Rigging the lottery: Making all tickets winners [evci:arxiv:2021]
- Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [fedus:arxiv:2021]
- The state of sparsity in deep neural networks [gale:arxiv:2019]
- Sparse GPU kernels for deep learning [gale:arxiv:2020]
- ExTensor: An accelerator for sparse tensor algebra [hedge:micro:2019]
- In-datacenter performance analysis of a tensor processing unit [jouppi:isca:2017]
- Motivation for and evaluation of the first tensor processing unit [jouppi:micro:2018]
- The tensor algebra compiler [kjolstad:oopsla:2017]
- GShard: Scaling giant models with conditional computation and automatic sharding [lepikhin:arxiv:2020]
- SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training [qin:hpca:2020]
- Outrageously large neural networks: The sparsely-gated mixture-of-experts layer [shazeer:arxiv:2017]
- Attention is all you need [vaswani:arxiv:2017]