GRS MA752 - Mathematical Foundations of Deep Learning

Instructor: Konstantinos Spiliopoulos

Office: 665 Commonwealth Ave, CCDS 438
Office Hours: Tuesday-Thursday 10am-11am
Email: kspiliop_at_math.bu.edu
(To send me an email, replace _at_ with @.)
Meets: Spring 2024, Tuesday-Thursday 11:00-12:15 at MCS B37

Main source:

The course will be self-contained and, for now, largely based on notes and papers.

Recommended sources:

Lecture notes by Matus Telgarsky
I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, The MIT Press, Cambridge, MA, 2016

Syllabus (pdf)

Course Description: Deep learning is a relatively recent field, and mathematical work on deep learning, which is the focus of this class, is even more recent. Key mathematical results will be presented in class and will often be assigned as readings. The class material will be based on notes provided to the participants.

This course is a rigorous introduction to the mathematical foundations of deep learning. In particular, the course will introduce students to the theory of universal approximation, stochastic gradient-type optimizers and their convergence properties, statistical learning bounds, approximation theory, depth separation results, the neural tangent kernel, the mean-field overparametrized regime, implicit regularization and noisy dynamics, different deep neural network architectures and their properties, reinforcement learning, Q-learning, several computational aspects such as backpropagation, batch normalization and dropout, deep learning for dynamical systems and variational methods.

The course material will be based on theory, methods (both theoretical and computational) and examples from various scientific disciplines. The class will focus on supervised learning.

Course Prerequisites: Students are expected to have knowledge equivalent to the undergraduate level in: (a) probability and/or stochastic processes (MA581 or MA583 or equivalent), (b) basic statistical theory, (c) linear algebra (MA242 or equivalent), (d) differential equations (MA226 or MA231 or equivalent), and (e) basic coding skills (ideally in Python). Some exposure to real analysis at the undergraduate level will also be useful. PDEs and graduate-level probability and statistics will be helpful but are not necessary.