Deep Learning

A comprehensive exploration of neural networks, covering foundational architectures like MLPs and advanced models including CNNs, RNNs, and Transformers.

This course provides a deep dive into the historical evolution and modern application of neural networks. We begin with the mathematical foundations of the McCulloch-Pitts Neuron and Perceptrons, progressing to the representation power of Multi-Layer Perceptrons and the mechanics of Backpropagation. The curriculum covers a wide array of optimization strategies, including modern variants such as Adam and NAdam, alongside regularization techniques such as Dropout and Batch Normalization. Students will gain expertise in state-of-the-art architectures, from Convolutional Neural Networks (ResNet, Inception) to sequence-based models such as LSTMs, and the transformative Attention mechanism.
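
As a concrete taste of the Week 1 material named above, here is a minimal sketch of the perceptron learning algorithm; the toy AND-gate dataset, epoch count, and variable names are illustrative assumptions, not course code.

```python
import numpy as np

# Toy dataset (an illustrative assumption, not course data): the logical AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(X.shape[1])  # weights
b = 0.0                   # bias

# Perceptron learning rule: on a misclassified point, move the weights
# toward the input (false negative) or away from it (false positive).
for epoch in range(10):
    for x_i, y_i in zip(X, y):
        y_hat = 1 if np.dot(w, x_i) + b > 0 else 0
        error = y_i - y_hat   # +1, 0, or -1
        w = w + error * x_i   # no change when the prediction is correct
        b = b + error

print("weights:", w, "bias:", b)  # a separating hyperplane for AND
```

Because the AND data are linearly separable, the update rule converges to a separating hyperplane after a few passes over the data.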


Instructor

Prof. Mitesh M. Khapra, Department of Computer Science and Engineering, IIT Madras


Course Schedule & Topics

The course is structured over 12 weeks, transitioning from foundational theory to complex deep learning architectures.

Week | Primary Focus | Key Topics Covered
1 | Foundations | History of DL, the McCulloch-Pitts Neuron, and the Perceptron Learning Algorithm.
2 | MLPs & Gradient Descent | Multi-Layer Perceptrons, Sigmoid Neurons, and the basics of Gradient Descent.
3 | Feedforward Networks | Representation Power and the Backpropagation algorithm.
4 | Optimization Algorithms | Momentum, Nesterov, Adagrad, RMSProp, Adam, and Learning Rate Schedulers.
5 | Unsupervised Learning | Autoencoders, their relation to PCA, and Denoising and Contractive Autoencoders.
6 | Regularization & Bias | Bias-Variance Tradeoff, L2 Regularization, Data Augmentation, and Dropout.
7 | Training Improvements | Activation functions (ReLU, etc.), Weight Initialization, and Batch Normalization.
8 | Convolutional Networks (CNN) | Vectorial Representations of Words; LeNet, AlexNet, VGGNet, and ResNet architectures.
9 | CNN Visualization | Guided Backpropagation, Deep Dream, Deep Art, and Adversarial Attacks (Fooling CNNs).
10 | Recurrent Neural Networks (RNN) | BPTT, Vanishing and Exploding Gradients, and Truncated BPTT.
11 | Gated RNNs (LSTM/GRU) | Gated Recurrent Units, LSTM Cells, and overcoming the vanishing gradient problem.
12 | Advanced Sequence Models | Encoder-Decoder Models, Attention Mechanisms (see the sketch after this table), and an introduction to Transformers.
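
To give a flavor of where the course ends (Week 12), the following is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the attention mechanisms and Transformers listed above; the matrix shapes and random inputs are illustrative assumptions rather than course-provided code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every query to every key
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights         # weighted sum of values, plus the weights

# Illustrative shapes (assumptions): 4 query positions, 6 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))

context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)  # (4, 8) (4, 6)
```

With learned linear projections of the inputs and several such heads applied in parallel, this operation becomes the multi-head attention used inside Transformer encoder-decoder models.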

Materials Used

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (MIT Press).