Deep Learning

A comprehensive exploration of neural networks, covering foundational architectures like MLPs and advanced models including CNNs, RNNs, and Transformers.

This course provides a deep dive into the historical evolution and modern application of neural networks. We begin with the mathematical foundations of the McCulloch-Pitts Neuron and Perceptrons, progressing to the representation power of Multi-Layer Perceptrons and the mechanics of Backpropagation. The curriculum covers a wide array of optimization strategies, including modern variants such as Adam and NAdam, alongside regularization techniques such as Dropout and Batch Normalization. Students will gain expertise in state-of-the-art architectures, from Convolutional Neural Networks (ResNet, Inception) to sequence-based models such as LSTMs, and the transformative Attention mechanism.
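
As a concrete taste of the Week 1 material named above, here is a minimal sketch of the perceptron learning algorithm; the toy AND-gate dataset, epoch count, and variable names are illustrative assumptions, not course code.

```python
import numpy as np

# Toy dataset (an illustrative assumption, not course data): the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(X.shape[1])  # weights
b = 0.0                   # bias

# Perceptron learning rule: on a misclassified point, move the weights
# toward the input (false negative) or away from it (false positive).
for epoch in range(10):
    for x_i, y_i in zip(X, y):
        y_hat = 1 if np.dot(w, x_i) + b > 0 else 0
        error = y_i - y_hat   # +1, 0, or -1
        w = w + error * x_i   # no change when the prediction is correct
        b = b + error

print("weights:", w, "bias:", b)  # a separating hyperplane for AND
```

Because the AND data are linearly separable, the update rule converges to a separating hyperplane after a few passes over the data.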


Instructor

Prof. Mitesh M. Khapra, Department of Computer Science and Engineering, IIT Madras


Course Schedule & Topics

The course is structured over 12 weeks, transitioning from foundational theory to complex deep learning architectures.

Week | Primary Focus | Key Topics Covered
1 | Foundations | History of DL, the McCulloch-Pitts Neuron, and the Perceptron Learning Algorithm.
2 | MLPs & Gradient Descent | Multi-Layer Perceptrons, Sigmoid Neurons, and the basics of Gradient Descent.
3 | Feedforward Networks | Representation Power and the Backpropagation algorithm.
4 | Optimization Algorithms | Momentum, Nesterov, Adagrad, RMSProp, Adam, and Learning Rate Schedulers.
5 | Unsupervised Learning | Autoencoders, their relation to PCA, and Denoising and Contractive Autoencoders.
6 | Regularization & Bias | Bias-Variance Tradeoff, L2 Regularization, Data Augmentation, and Dropout.
7 | Training Improvements | Activation functions (ReLU, etc.), Weight Initialization, and Batch Normalization.
8 | Convolutional Networks (CNN) | Vectorial Representations of Words; LeNet, AlexNet, VGGNet, and ResNet architectures.
9 | CNN Visualization | Guided Backpropagation, Deep Dream, Deep Art, and Adversarial Attacks (Fooling CNNs).
10 | Recurrent Neural Networks (RNN) | BPTT, Vanishing and Exploding Gradients, and Truncated BPTT.
11 | Gated RNNs (LSTM/GRU) | Gated Recurrent Units, LSTM Cells, and overcoming the vanishing gradient problem.
12 | Advanced Sequence Models | Encoder-Decoder Models, Attention Mechanisms (see the sketch after this table), and an introduction to Transformers.
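
To give a flavor of where the course ends (Week 12), the following is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the attention mechanisms and Transformers listed above; the matrix shapes and random inputs are illustrative assumptions rather than course-provided code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights         # weighted sum of values, plus the weights

# Illustrative shapes (assumptions): 4 query positions, 6 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))

context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)  # (4, 8) (4, 6)
```

With learned linear projections of the inputs and several such heads applied in parallel, this operation becomes the multi-head attention used inside Transformer encoder-decoder models.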

Materials Used

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (MIT Press).