Notes related to Natural language processing (NLP)
- Attention function
- Bilingual evaluation understudy (BLEU)
- Transformer machine learning model
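As a quick reminder of the attention function noted above, a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, as defined in "Attention is all you need" [vaswani:arxiv:2017] (the toy shapes below are illustrative, not from any paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention over row-vector queries/keys/values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity, scaled by sqrt(d_k)
    # Row-wise softmax (subtract the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a convex combination of value rows

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output row per query
```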
Papers related to Natural language processing (NLP)
- Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity [fedus:arxiv:2021]
- GShard: Scaling giant models with conditional computation and automatic sharding [lepikhin:arxiv:2020]
- Outrageously large neural networks: The sparsely-gated mixture-of-experts layer [shazeer:arxiv:2017]
- Attention is all you need [vaswani:arxiv:2017]