https://arxiv.org/abs/1409.3215

https://arxiv.org/abs/1409.0473

https://arxiv.org/abs/1706.03762

https://arxiv.org/abs/1801.06146

https://arxiv.org/abs/1810.04805

Improving Language Understanding by Generative Pre-Training (language_understanding_paper.pdf)

Language Models are Unsupervised Multitask Learners

https://arxiv.org/abs/2005.14165

https://arxiv.org/abs/2305.10435

https://arxiv.org/abs/2503.05788