Data Science 101

Understanding Data Science

a <- "Data"
b <- "Science"
paste0("Welcome to ", a," ",b," ",100+1)
[1] "Welcome to Data Science 101"

Introduction

Welcome to Data Science 101, a course designed to equip you with the essential skills to analyze, visualize, and communicate data effectively. Over 15 weeks, you will learn the fundamentals of data science, build fluency in R programming, and learn how to create interactive visualizations and websites to showcase your findings.

Throughout the course, you will learn how to import, manipulate, and explore data using R and the tidyverse. You will gain hands-on experience with data cleaning, transformation, and aggregation techniques. Additionally, you’ll dive deep into data visualization with ggplot2 and learn how to create advanced, interactive plots using Shiny and plotly.
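To give a feel for the kind of filtering and aggregation described above, here is a minimal sketch. It assumes the dplyr package is installed and uses the built-in mtcars dataset as a stand-in for course data. The 100-horsepower threshold and the object name mpg_by_cyl are illustrative choices, not part of the course materials.

```r
# Assumption: the dplyr package is installed (install.packages("dplyr")).
library(dplyr)

# mtcars ships with base R, so no file import is needed for this sketch.
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%               # keep cars with more than 100 horsepower
  group_by(cyl) %>%                  # one group per cylinder count
  summarise(
    mean_mpg = mean(mpg),            # average fuel economy within each group
    n_cars   = n()                   # number of cars in each group
  )

print(mpg_by_cyl)
```

The same filter, group, and summarise pattern applies to data read from CSV or Excel files once it is loaded into a data frame.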

By the end of the course, you will have completed a data science project that demonstrates your ability to analyze, visualize, and communicate complex data insights. You will also learn the importance of collaboration, version control, and reproducible research in data science projects. With a solid understanding of the concepts and tools covered, you will be well-prepared to apply your skills in various real-world applications.


Syllabus

Week 1: Introduction to Data Science and R

  • What is Data Science?

  • Introduction to R and RStudio

  • R syntax and basic operations

  • Data types and structures in R

Week 2: Data Import and Export

  • Reading and writing data in R (CSV, Excel, JSON, etc.)

  • Data from APIs and web scraping

  • Handling missing data and errors

Week 3: Data Manipulation I

  • Introduction to tidyverse

  • Data cleaning with dplyr and tidyr

  • Data filtering and aggregation

Week 4: Data Manipulation II

  • Data transformation with dplyr

  • Grouping and summarizing data

  • Joining datasets

Week 5: Data Exploration

  • Descriptive statistics

  • Exploratory data analysis (EDA)

  • Introduction to ggplot2 for data visualization

Week 6: Data Visualization I

  • Grammar of graphics with ggplot2

  • Customizing plots with themes and scales

  • Adding labels, titles, and legends

Week 7: Data Visualization II

  • Advanced ggplot2 techniques

  • Creating different types of plots (scatter plots, bar plots, etc.)

  • Visualizing distributions and relationships

Week 8: Data Visualization III

  • Faceting and multi-panel plots

  • Plotting time series data

  • Interactive plots with plotly or ggplotly

Week 9: Midterm Quiz

Week 10: Introduction to Shiny

  • What is Shiny?

  • Creating Shiny apps with R

  • Adding interactivity to data visualizations

Week 11: Version Control and Collaboration

  • Introduction to Git and GitHub

  • Collaborating with others using version control

  • Best practices for organizing and documenting data science projects

  • Working with AI (feat. ChatGPT)

Week 12: Reproducible Research

  • Introduction to R Markdown

  • Creating reports and presentations with R Markdown

  • Embedding code, visualizations, and results in R Markdown documents

Week 13: Creating Websites with Quarto

  • Introduction to Quarto

  • Creating a Quarto website with R Markdown

  • Customizing the website layout and design

  • Publishing and sharing your Quarto website

Week 14: Team project consultation

Week 15: Project presentation



Textbooks for the course

  • Statistical Inference via Data Science: A ModernDive into R and the Tidyverse (written by Chester Ismay and Albert Y. Kim)
    • is a comprehensive textbook that provides an accessible and hands-on approach to learning the fundamental concepts of statistical inference and data analysis using the R programming language.
  • R for Data Science (written by Hadley Wickham and Garrett Grolemund)
    • is an excellent resource for learning data science using R, covering data manipulation, visualization, and modeling. The book is available for free online.
  • Introductory Statistics with R (written by Peter Dalgaard)
    • is a great resource for learning basic statistics with a focus on R programming. This book covers a wide range of statistical concepts, from descriptive statistics to hypothesis testing and linear regression, along with R code examples.
  • R Cookbook (written by JD Long and Paul Teetor)
    • is a comprehensive resource for data scientists, statisticians, and programmers who want to explore the capabilities of R programming for data analysis and visualization.
  • R Graphics Cookbook (written by Winston Chang)
    • is a practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems.