Algorithms for Data Science

A mathematical deep dive into randomized algorithms, dimensionality reduction, statistical learning theory, and differential privacy.

This course explores the theoretical foundations of modern data science. We begin with concentration inequalities and the Johnson-Lindenstrauss Lemma for dimensionality reduction. The curriculum then covers Locality Sensitive Hashing (LSH) for high-dimensional search, randomized SVD, and Spectral Clustering. The second half of the course transitions into Statistical Learning Theory (VC Dimension, ERM, SRM) and concludes with a rigorous introduction to Data Privacy and Differential Privacy mechanisms.


Instructor

Prof. Arun Rajkumar, Department of Computer Science and Engineering, IIT Madras


Course Schedule & Topics

The course is structured to provide a balance between algorithmic techniques for scale and the statistical theory of learning.

Week Primary Focus Key Topics Covered
1 Concentration & JL Lemma Markov, Chebyshev, and Hoeffding inequalities; Introduction to Johnson-Lindenstrauss (JL) Lemma and Isometry.
2 Dimensionality Reduction JL Lemma Final Bound and its Linear Algebra interpretation.
3 Nearest Neighbors & LSH Approximate Nearest Neighbors (ANN) and Locality Sensitive Hashing (LSH) using AND/OR constructions.
4 Randomized Linear Algebra Recap of SVD, transition from Naive to Fast SVD, and Randomized SVD algorithms.
5 Spectral Clustering I Graph Cuts, MinCut theory, and the foundations of Spectral Clustering.
6 Spectral Clustering II The Spectral Clustering Algorithm and Scaling via Spectral Sparsification.
7 Learning Theory Foundations Underlying Distributions, Test Error, Bayes Classifiers, and Error Decomposition.
8 Generalization Generalization Error and Sample Complexity analysis.
9 VC Dimension Infinite Hypothesis Classes, VC Dimension for Rectangles and Linear Hypotheses.
10 Structural Risk Estimation Error (ERM), Approximation Error (SRM), and Sample Compression.
11 Data Privacy Introduction to Data Privacy and the formal definitions of Differential Privacy.
12 Privacy Mechanisms Approximate Differential Privacy and the Exponential Mechanism.

Material Used