- Stanford CS124: From Languages to Information
- Stanford CS224N: Natural Language Processing with Deep Learning
- Stanford CS224U: Natural Language Understanding
- Stanford CS324: Large Language Models
- Johns Hopkins CS 601.471/671: NLP: Self-supervised Models
- CMU CS 11-711: Advanced NLP
- CMU CS 11-737: Multilingual NLP
- CMU CS 11-747: Neural Networks for NLP
- CMU CS 11-731: Machine Translation and Sequence-to-sequence Models
- CMU CS 11-777: MultiModal Machine Learning
- CMU CS 11-877: Advanced Topics in MultiModal Machine Learning
- MIT 6.806-864: Natural Language Processing
- MIT 6.861: Quantitative Methods for NLP
- Princeton COS 484: Natural Language Processing
- Princeton COS 584: Advanced Natural Language Processing
- Princeton COS 598C: Deep Learning for Natural Language Processing
- Princeton COS 597F: Embodied Language Understanding
- Princeton COS 597G: Understanding Large Language Models
- UC Berkeley Natural Language Processing
- UMass Amherst CS 685: Advanced NLP
- Columbia COMS W4705: Natural Language Processing
- Columbia CS 4705: Introduction to Natural Language Processing
- UT CS388: Natural Language Processing
- UT CS378: Natural Language Processing (Undergrad)
- UT CS395T: Structured Models for NLP
- UvA CS4501: Machine Learning for NLP
- Unimelb COMP90042: Web Search and Text Analysis
- UMD CMSC470: Introduction to Natural Language Processing
- OSU CSE 5525: Speech and Language Processing
- UCSD CSE 256: Statistical Natural Language Processing
- ETH Advanced Formal Language Theory
- ETH Formal Language Theory and Neural Networks
- ETH Dependency Structures and Lexicalized Grammars
- ETH Natural Language Processing
- ETH Large Language Models
- ETH Generating Text from Language Models (Applied)
- Speech and Language Processing, Dan Jurafsky
- A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
- Neural Network Methods for Natural Language Processing, Yoav Goldberg
- Natural Language Processing, Jacob Eisenstein
- Foundations of Statistical Natural Language Processing, Christopher D. Manning (Old Fashioned)
- Introduction to Linear Algebra, Gilbert Strang
- Mathematical Statistics, Peter J. Bickel
- Elements of Information Theory, Thomas M. Cover
- Information Theory, Inference, and Learning Algorithms, David MacKay
- Probability, Random Variables and Stochastic Processes with Errata Sheet, A. Papoulis
- Algorithms for Optimization (Applied), Mykel J. Kochenderfer, Tim Wheeler
- An Introduction to Optimization, Edwin K. P. Chong, Stanislaw H. Zak
- Pattern Recognition and Machine Learning, Christopher M. Bishop
- Deep Learning, Ian Goodfellow
- Dive into Deep Learning (Applied), Course Materials
- Understanding Deep Learning (Applied), Simon J. D. Prince
- The Principles of Deep Learning Theory, D. A. Roberts, S. Yaida
- Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, A. Jentzen, B. Kuckuck, P. von Wurstemberger
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, M. M. Bronstein, J. Bruna, T. Cohen, P. Velickovic