About the Course


Course description

Welcome to the Cultural Industry & Data Analytics course (a.k.a. Data Science 101), designed to equip you with the essential skills to analyze, visualize, and communicate data effectively. Over the course of 15 weeks, you will delve into the fundamentals of data science, master the power of R programming, and learn how to create interactive visualizations and websites to showcase your findings.

Throughout the course, you will learn how to import, manipulate, and explore data using R and the tidyverse. You will gain hands-on experience with data cleaning, transformation, and aggregation techniques. Additionally, you’ll dive deep into data visualization with ggplot2 and learn how to create advanced, interactive plots using Shiny and plotly.

By the end of the course, you will have completed a data science project that demonstrates your ability to analyze, visualize, and communicate complex data insights. You will also learn the importance of collaboration, version control, and reproducible research in data science projects. With a solid understanding of the concepts and tools covered, you will be well-prepared to apply your skills in various real-world applications.


Weekly Design

| Week | Date       | Pre-class                   | Class                                   | PBL                  | Note                               |
|------|------------|-----------------------------|-----------------------------------------|----------------------|------------------------------------|
| 1    | 03/06/2024 |                             | Course intro                            |                      |                                    |
| 2    | 03/13/2024 | Variable & Vector           | Basic Syntax (1)                        |                      |                                    |
| 3    | 03/20/2024 | Array                       | Basic Syntax (2)                        | Problem description  |                                    |
| 4    | 03/27/2024 | Data.frame & List           | Basic Syntax (3)                        | Data introduction    |                                    |
| 5    | 04/03/2024 | Data import, export, filter | Data Manipulation                       | Team arrangement     |                                    |
| 6    | 04/10/2024 | Repetition, Function        | Data Exploration (1) (Recorded Lecture) |                      | Public holiday (N.A. elections)    |
| 7    | 04/17/2024 | Missing values, Outliers    | Data Exploration (2)                    | Team meeting #1      |                                    |
| 8    | 04/24/2024 | Data viz intro              | Data Visualization (1)                  | Team meeting #2      |                                    |
| 9    | 05/01/2024 | Data viz Practice           | Data Visualization (2)                  | Team meeting #3      |                                    |
| 10   | 05/08/2024 |                             | Mid-term Quiz                           |                      |                                    |
| 11   | 05/15/2024 |                             | <No class> Public Holiday               |                      | Public holiday (Buddha’s Birthday) |
| 12   | 05/22/2024 | Shiny Intro                 | Interactive Web: Shiny                  | Team meeting #4      |                                    |
| 13   | 05/29/2024 | Git, GitHub Basic           | Version Control and Collaboration       | Team meeting #5      |                                    |
| 14   | 06/05/2024 | Quarto Intro                | Reproducible Research                   | Team meeting #6      |                                    |
| 15   | 06/12/2024 |                             | <Team meetings>                         | <Team meetings>      |                                    |
| 16   | 06/19/2024 |                             | Project Presentation                    | Project Presentation |                                    |

Syllabus

Week 1: Introduction to Data Science and R

  • Course Orientation

  • Introduction to R and RStudio

  • What is Data Science?

Week 2: Basic Syntax (1)

  • R syntax and basic operations

  • Data types and structures in R

Week 3: Basic Syntax (2)

  • Data types and structures in R: Array

Week 4: Basic Syntax (3)

  • Data types and structures in R: Data.frame & List

Week 5: Data Manipulation

  • Data import & export

  • Data filtering

Week 6: Data Exploration (1): Recorded lecture

  • Repetition (loops)

  • Functions

  • Introduction to tidyverse

  • Data cleaning with dplyr and tidyr

  • Data filtering and aggregation

  • Data transformation with dplyr

Week 7: Data Exploration (2)

  • Missing values & Outliers

  • Descriptive statistics

  • Grouping and summarizing data

  • Joining datasets

  • Exploratory data analysis (EDA)

Week 8: Data Visualization (1)

  • Introduction to ggplot2 for data visualization

  • Grammar of graphics with ggplot2

  • Customizing plots with themes and scales

  • Adding labels, titles, and legends

  • Creating different types of plots (scatter plots, bar plots, etc.)

Week 9: Data Visualization (2)

  • Advanced ggplot2 techniques

  • Visualizing distributions and relationships

  • Faceting and multi-panel plots

  • Plotting time series data

  • Interactive plots with plotly or ggplotly


Week 10: Mid-term Quiz


Week 11: Officially No Class (Public Holiday)


Week 12: Interactive Web: Shiny

  • What is Shiny?

  • Creating Shiny apps with R

  • Adding interactivity to data visualizations

Week 13: Version Control and Collaboration

  • Introduction to Git and GitHub

  • Collaborating with others using version control

  • Best practices for organizing and documenting data science projects

  • Working with AI (feat. ChatGPT)

Week 14: Reproducible Research

  • Introduction to Quarto

  • Creating a Quarto website with R Markdown

  • Customizing the website layout and design

  • Publishing and sharing your Quarto website


Week 15: Project Consultation

Week 16: Project Presentation


Textbooks for the course

  • Statistical Inference via Data Science (ModernDive) (written by Chester Ismay and Albert Y. Kim)
    • A comprehensive textbook that takes an accessible, hands-on approach to the fundamental concepts of statistical inference and data analysis using the R programming language.
  • R for Data Science (written by Hadley Wickham and Garrett Grolemund)
    • An excellent resource for learning data science with R, covering data manipulation, visualization, and modeling. The book is available as a free online resource.
  • Introductory Statistics with R (written by Peter Dalgaard)
    • A great resource for learning basic statistics with a focus on R programming. It covers a wide range of statistical concepts, from descriptive statistics to hypothesis testing and linear regression, along with R code examples.
  • R Cookbook (written by JD Long and Paul Teetor)
    • A comprehensive resource for data scientists, statisticians, and programmers who want to explore the capabilities of R programming for data analysis and visualization.
  • R Graphics Cookbook (written by Winston Chang)
    • A practical guide that provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R’s graphing systems.


Rules

  • Attendance is not scored, but an F will be given if you attend fewer than two-thirds of the classes.