About the Course


Course description

Welcome to the Cultural Industry & Data Analytics course (a.k.a. Data Science 101), designed to equip you with the essential skills to analyze, visualize, and communicate data effectively. Over the course of 15 weeks, you will study the fundamentals of data science, learn R programming, and create interactive visualizations and websites to showcase your findings.

Throughout the course, you will learn how to import, manipulate, and explore data using R and the tidyverse. You will gain hands-on experience with data cleaning, transformation, and aggregation techniques. Additionally, you’ll dive deep into data visualization with ggplot2 and learn how to create advanced, interactive plots using Shiny and plotly.

By the end of the course, you will have completed a data science project that demonstrates your ability to analyze, visualize, and communicate complex data insights. You will also learn the importance of collaboration, version control, and reproducible research in data science projects. With a solid understanding of the concepts and tools covered, you will be well-prepared to apply your skills in various real-world applications.


Weekly Design

| Week | Date | Pre-class | Class | PBL | Note |
|------|------------|-----------------------------|-----------------------------------------|---------------------------|----------------|
| 1 | 03/04/2025 | | Course intro | | |
| 2 | 03/11/2025 | Variable & Vector | Basic Syntax (1) | | |
| 3 | 03/18/2025 | Array | Basic Syntax (2) | | |
| 4 | 03/25/2025 | Data.frame & List | Basic Syntax (3) | Problem description | |
| 5 | 04/01/2025 | Data import, export, filter | Data Manipulation | Data introduction | |
| 6 | 04/08/2025 | Repetition, Function | Data Exploration (1) (Recorded Lecture) | Team arrangement | |
| 7 | 04/15/2025 | Missing values, Outliers | Data Exploration (2) | Team meeting #1 | |
| 8 | 04/22/2025 | Data viz intro | Data Visualization (1) | Team meeting #2 | |
| 9 | 04/29/2025 | Data viz Practice | Data Visualization (2) | Team meeting #3 | |
| 10 | 05/06/2025 | | No class | | Public holiday |
| 11 | 05/13/2025 | | Quiz | | |
| 12 | 05/20/2025 | Shiny Intro | Interactive Web: Shiny | Team meeting #4 | |
| 13 | 05/27/2025 | Git, GitHub Basic | Version Control and Collaboration | Team meeting #5 | |
| 14 | 06/03/2025 | Quarto Intro | Reproducible Research | Team meeting #6 | |
| 15 | 06/10/2025 | | Team meetings | Team meetings | |
| 16 | 06/17/2025 | Wrap-up | Project Presentation | YouTube link submission | |


Syllabus

Week 1: Introduction to Data Science and R

  • Course Orientation

  • Introduction to R and RStudio

  • What is Data Science?

Week 2: Basic Syntax (1)

  • R syntax and basic operations

  • Data types and structures in R

Week 3: Basic Syntax (2)

  • Data types and structures in R: Array

Week 4: Basic Syntax (3)

  • Data types and structures in R: Data.frame & List

Week 5: Data Manipulation

  • Data import & export

  • Data filtering

Week 6: Data Exploration (1): Recorded lecture

  • Repetition

  • Function

  • Introduction to tidyverse

  • Data cleaning with dplyr and tidyr

  • Data filtering and aggregation

  • Data transformation with dplyr

Week 7: Data Exploration (2)

  • Missing values & Outliers
  • Descriptive statistics
  • Grouping and summarizing data
  • Joining datasets
  • Exploratory data analysis (EDA)

Week 8: Data Visualization (1)

  • Introduction to ggplot2 for data visualization

  • Grammar of graphics with ggplot2

  • Customizing plots with themes and scales

  • Adding labels, titles, and legends

  • Creating different types of plots (scatter plots, bar plots, etc.)

Week 9: Data Visualization (2)

  • Advanced ggplot2 techniques

  • Visualizing distributions and relationships

  • Faceting and multi-panel plots

  • Plotting time series data

  • Interactive plots with plotly or ggplotly


Week 10: Officially No Class (Public Holiday)


Week 11: Mid-term Quiz


Week 12: Interactive Web: Shiny

  • What is Shiny?

  • Creating Shiny apps with R

  • Adding interactivity to data visualizations

Week 13: Version Control and Collaboration

  • Introduction to Git and GitHub

  • Collaborating with others using version control

  • Best practices for organizing and documenting data science projects

  • Working with AI (feat. ChatGPT)

Week 14: Reproducible Research

  • Introduction to Quarto

  • Creating a Quarto website with R Markdown

  • Customizing the website layout and design

  • Publishing and sharing your Quarto website


Week 15: Project Consultation

Week 16: Project Presentation


Course management


  • Lecturer: Changjun Lee (Associate Professor, SKKU School of Convergence)

    • changjunlee@skku.edu
  • TA: Ye Seo Lim (Master's student, SKKU Immersive Media Engineering)

    • ivisy6952@g.skku.edu
  • Time:

    • (1h): Flipped learning content

    • (2h): Tue 09:00 ~ 10:50

  • Location: International Hall High-Tech e+ Lecture Room (9B312)


The course consists of three components: Pre-class, Class, and a PBL project

  • Pre-class

    • Before each in-person class, students are required to watch the recorded lecture (or other assigned videos) on data science concepts and the programming language as part of their self-study.

    • To assess their understanding, students may occasionally be required to submit discussion responses.

  • Class

    • The lecturer will summarize the pre-class lecture and provide additional explanations.
      Students will be asked questions about the pre-class content to assess their self-study efforts.

    • It is acceptable to answer incorrectly, but failure to respond at all will affect their pre-class discussion score.

    • Students will also practice advanced coding techniques during the class.

  • PBL project

    • Students form teams that meet the following conditions.

      • 4 to 5 members per team

      • Background diversity: team members should not all share the same major

      • Exception: a same-major team is allowed if it can give sufficient justification

    • Data will be provided, and each team chooses the data it wants to explore based on its interests.

    • Teams can request a Zoom meeting with the lecturer if needed.


Final outputs (an example, not an exhaustive list)

  • Prepare (or collect) data

  • Explore data (descriptive statistics)

  • Set your hypotheses (or research questions)

  • Visualize data to test your hypotheses or research questions

  • Explain your findings

  • Extend your findings to implications


Textbooks for the course

  • R4DS: R for Data Science (written by Hadley Wickham and Garrett Grolemund)
    • An excellent resource for learning data science using R, covering data manipulation, visualization, and modeling. The book is available as a free online resource.
  • RC2E: R Cookbook (written by JD Long and Paul Teetor)
    • A comprehensive resource for data scientists, statisticians, and programmers who want to explore the capabilities of R programming for data analysis and visualization.
  • RGC: R Graphics Cookbook (written by Winston Chang)
    • A practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems.
  • MDR: Statistical Inference via Data Science (Modern Dive) (written by Chester Ismay and Albert Y. Kim)
    • A comprehensive textbook that provides an accessible, hands-on approach to learning the fundamental concepts of statistical inference and data analysis using the R programming language.
  • ISR: Introductory Statistics with R (written by Peter Dalgaard)
    • A great resource for learning basic statistics with a focus on R programming. This book covers a wide range of statistical concepts, from descriptive statistics onward.


Grading Criteria

See Course intro in Week 1

  • Attendance & Participation (20%)
    Attendance is mandatory. Tardiness will result in a deduction of 0.5 points, and absences will result in a deduction of 1 point. Students who miss more than one-third of the classes will receive an F grade.
  • Quiz (40%)
    A quiz will be conducted to assess students’ understanding of the material covered.

  • Project (40%)
    The project will evaluate students’ ability to apply concepts and skills learned in class. A portion of the project grade will be weighted based on peer evaluation, reflecting team collaboration and individual contributions.


Communication

  • Notices & Questions

    • Please join the Kakao open-chat room

      • https://open.kakao.com/o/gQpZ3hbh

      • When you join, please set your display name to match your name on the attendance sheet.

  • Personal counseling (scholarships, recommendation letters, etc.)