Algorithms for Data Science
A mathematical deep dive into randomized algorithms, dimensionality reduction, statistical learning theory, and differential privacy.
This course explores the theoretical foundations of modern data science. We begin with concentration inequalities and the Johnson-Lindenstrauss Lemma for dimensionality reduction. The curriculum then covers Locality Sensitive Hashing (LSH) for high-dimensional search, randomized SVD, and Spectral Clustering. The second half of the course transitions into Statistical Learning Theory (VC Dimension, ERM, SRM) and concludes with a rigorous introduction to Data Privacy and Differential Privacy mechanisms.
Instructor
Prof. Arun Rajkumar, Department of Computer Science and Engineering, IIT Madras
Course Schedule & Topics
The course is structured to provide a balance between algorithmic techniques for scale and the statistical theory of learning.
| Week | Primary Focus | Key Topics Covered |
|---|---|---|
| 1 | Concentration & JL Lemma | Markov, Chebyshev, and Hoeffding inequalities; Introduction to Johnson-Lindenstrauss (JL) Lemma and Isometry. |
| 2 | Dimensionality Reduction | JL Lemma Final Bound and its Linear Algebra interpretation. |
| 3 | Nearest Neighbors & LSH | Approximate Nearest Neighbors (ANN) and Locality Sensitive Hashing (LSH) using AND/OR constructions. |
| 4 | Randomized Linear Algebra | Recap of SVD, transition from Naive to Fast SVD, and Randomized SVD algorithms. |
| 5 | Spectral Clustering I | Graph Cuts, MinCut theory, and the foundations of Spectral Clustering. |
| 6 | Spectral Clustering II | The Spectral Clustering Algorithm and Scaling via Spectral Sparsification. |
| 7 | Learning Theory Foundations | Underlying Distributions, Test Error, Bayes Classifiers, and Error Decomposition. |
| 8 | Generalization | Generalization Error and Sample Complexity analysis. |
| 9 | VC Dimension | Infinite Hypothesis Classes, VC Dimension for Rectangles and Linear Hypotheses. |
| 10 | Structural Risk | Estimation Error (ERM), Approximation Error (SRM), and Sample Compression. |
| 11 | Data Privacy | Introduction to Data Privacy and the formal definitions of Differential Privacy. |
| 12 | Privacy Mechanisms | Approximate Differential Privacy and the Exponential Mechanism. |
Material Used
-
Understanding Machine Learning: From Theory to Algorithmsby Shai Ben-David and Shai Shalev-Shwartz