Statistics for Data Science



Welcome to STAT101 (a.k.a. Statistical and Data Sciences via R)! In this course, we will learn the fundamentals of statistical inference and data analysis using the R programming language. The textbook “ModernDive” will be our primary resource for the material.

The course is designed to be accessible to students with little to no prior experience in statistics or R. We will start by introducing you to the basics of R and data visualization using the ggplot2 package. We will then cover the fundamentals of data summarization, probability, distributions, and statistical inference.

Throughout the course, you will have the opportunity to work with real-world datasets and apply the techniques you learn to answer interesting questions. You will also have the chance to develop your data analysis skills using R and the tidyverse packages (dplyr, ggplot2, tidyr).

By the end of the course, you should have a solid understanding of the fundamental concepts and tools of statistical inference and data analysis. You should also be comfortable using R to manipulate data, create visualizations, and perform statistical tests.

I hope you will find this course engaging and rewarding, and that you will leave with a deeper understanding of the power and importance of statistics and data analysis in our modern world.

Course schedule


  • Week 1: Introduction to data and R (Chapter 1)

  • Week 2: Visualization with ggplot2 (Chapter 2)

  • Week 3: Summarizing data with dplyr (Chapter 3)

  • Week 4: Probability (Chapter 4)

  • Week 5: Distributions (Chapter 5)

  • Week 6: Foundations for statistical inference: sampling distributions and the central limit theorem (Chapter 6)

  • Week 7: Mid-term exam

  • Week 8: Introduction to inference (Chapter 7)

  • Week 9: Confidence intervals (Chapter 8)

  • Week 10: Hypothesis testing (Chapter 9)

  • Week 11: Inference for numerical data (Chapter 10)

  • Week 12: Inference for categorical data (Chapter 11)

  • Week 13: Simple linear regression (Chapter 12)

  • Week 14: Multiple linear regression (Chapter 13)

  • Week 15: Final exam

Textbook for the course

“ModernDive: An Introduction to Statistical and Data Sciences via R” by Chester Ismay and Albert Y. Kim takes an accessible, hands-on approach to statistical inference and data analysis in R. It covers data visualization, data summarization, probability and distributions, statistical inference, hypothesis testing, and linear regression, with an emphasis on reproducibility and statistical thinking. The book uses the tidyverse suite of packages throughout, includes numerous exercises and examples, and offers guidance on using R and RStudio, making it a useful resource for students and practitioners who want to develop their data analysis skills.