Statistics for Data Science II
An exploration of advanced probability and statistical inference to model and draw conclusions from complex data.
Statistics for Data Science II was a deep dive into the practical engine of statistical inference. We went beyond basic probability and learned how to model the real world by working with multiple random variables and understanding their dependencies through covariance and correlation. My biggest takeaway was learning how to formally draw conclusions from data: I’m now comfortable with estimating unknown values using both frequentist (like MLE) and Bayesian methods, and with validating claims through rigorous hypothesis testing.
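To make the two estimation approaches concrete, here is a minimal sketch for the simplest case I can think of: estimating the success probability of a coin from a handful of tosses. The function names, the toss counts, and the uniform Beta(1, 1) prior are my own illustrative assumptions, not material taken from the course.

```python
def bernoulli_mle(successes: int, trials: int) -> float:
    """Frequentist estimate: the MLE of p for Bernoulli trials is the sample proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def beta_posterior_mean(successes: int, trials: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Bayesian estimate: posterior mean of p under a Beta(alpha, beta) prior.

    With a Beta prior and Bernoulli data, the posterior is
    Beta(alpha + successes, beta + trials - successes).
    """
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return (alpha + successes) / (alpha + beta + trials)


# Made-up data for illustration: 7 heads in 10 tosses.
mle = bernoulli_mle(7, 10)
bayes = beta_posterior_mean(7, 10)  # uniform Beta(1, 1) prior (an assumption)
print(round(mle, 3))    # 0.7
print(round(bayes, 3))  # 0.667
```

The Bayesian estimate is pulled slightly toward the prior mean of 0.5, and the gap between the two estimates shrinks as the number of tosses grows.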
Instructor
Andrew Thangaraj, Professor, Electrical Engineering Department, IIT Madras
Course Schedule & Topics
The course is structured over 12 weeks, with two weeks dedicated to review and consolidation.
Week | Primary Focus | Key Topics Covered |
---|---|---|
1 | Multiple Random Variables | Two random variables, joint distributions, marginal and conditional distributions. |
2 | Functions of Random Variables | Independence of random variables, functions of one and multiple random variables, visualization techniques. |
3 | Expectation and Key Properties | Expected Value ($E[X]$), Variance ($\text{Var}(X)$), Standard Deviation ($\sigma$), Covariance ($\text{Cov}(X,Y)$), Correlation ($\rho$), and key statistical inequalities. |
4 | Continuous Random Variables | Differences between discrete and continuous variables, Cumulative Distribution Functions (CDF), Probability Density Functions (PDF), analysis of real-world data, Colab illustrations. |
5 | Advanced Topics & Modeling | Jointly continuous variables (e.g., height & weight), averages of random variables, Limit Theorems (incl. Central Limit Theorem), Jointly Gaussian variables, and building probability models from data (e.g., IPL Powerplay analysis). |
6 | Refresher Week | Mid-course review of all topics covered from Week 1 to Week 5. |
7 | Estimation and Inference I | Introduction to statistical inference, point estimation, Maximum Likelihood Estimation (MLE). |
8 | Estimation and Inference II | Interval estimation, confidence intervals, properties of estimators (bias, consistency). |
9 | Bayesian Estimation | Introduction to the Bayesian paradigm, prior and posterior distributions, Bayesian inference for common models. |
10 | Hypothesis Testing I | The framework of hypothesis testing, null ($H_0$) and alternative ($H_1$) hypotheses, Type I & Type II errors, p-values, and tests for population means. |
11 | Hypothesis Testing II | Two-sample tests, chi-squared tests for independence and goodness-of-fit, practical applications. |
12 | Revision Week | Comprehensive course review and preparation for the final examination. |
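As a rough illustration of two of the topics in the schedule (correlation from Week 3 and a test for a population mean from Week 10), here is a small standard-library sketch. The helper names and the sample values are hypothetical, and the test assumes a known population standard deviation, which is a simplification on my part.

```python
import math
from statistics import NormalDist, mean


def sample_correlation(xs, ys):
    """Sample correlation: Cov(X, Y) / (sd(X) * sd(Y)), using n - 1 denominators."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


def z_test_two_sided(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, assuming sigma is known.

    Returns the z statistic and the p-value.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up numbers, purely for illustration.
heights = [150.0, 160.0, 170.0, 180.0]
weights = [50.0, 58.0, 69.0, 77.0]
print(sample_correlation(heights, weights))

z, p = z_test_two_sided([5.1, 4.9, 5.3, 5.2, 4.8], mu0=5.0, sigma=0.2)
print(z, p)  # reject H0 at level 0.05 only if p < 0.05
```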
Material used
- Probability and Statistics with Examples using R, by Siva Athreya, Deepayan Sarkar, and Steve Tanner
Bonus Marks Activities
Activity | Link |
---|---|
Extra Activity 1 | Click Here |
Extra Activity 2 | Click Here |
Extra Activity 3 | Yet to do |
Extra Activity 4 | Yet to do |
Extra Activity 5 | Yet to do |