Chapter 1: Introduction to Data

This set of labs teaches both the conceptual ideas behind exploratory data analysis and the practical skills for working with data using computing software. The Golub Case Study and the Arenosa Case Study labs demonstrate how computing is essential for data analysis and illustrate the use of simple statistical ideas for exploring complex data.
Lab Notes


1. Introduction to Data

Introduces the basic commands for working with data, including those for manipulating dataframes, producing numerical and graphical summaries, and drawing pseudorandom samples. The questions in the last section focus on interpreting summaries and exploring relationships between variables.
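As a rough illustration of the kinds of operations this lab covers, here is a minimal sketch in Python using only the standard library. It is not taken from the lab materials: the toy dataset, the variable names, and the seed are all hypothetical, and the lab itself may use different software and commands. Graphical summaries are omitted to keep the sketch dependency-free.

```python
import random
import statistics

# Hypothetical toy dataset (not from the lab): one dict per observation.
rows = [
    {"id": 1, "group": "a", "value": 4.0},
    {"id": 2, "group": "a", "value": 6.0},
    {"id": 3, "group": "b", "value": 5.0},
    {"id": 4, "group": "b", "value": 9.0},
    {"id": 5, "group": "b", "value": 11.0},
]

# Subset the rows, much like filtering a dataframe by a condition.
group_b = [r for r in rows if r["group"] == "b"]

# Numerical summaries of a single variable.
values = [r["value"] for r in rows]
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "sd": statistics.stdev(values),  # sample standard deviation
}

# Pseudorandom sample without replacement; a fixed seed makes it reproducible.
rng = random.Random(2024)
sample_rows = rng.sample(rows, k=3)
```

Seeding the generator is a design choice worth noting: rerunning the script draws the same sample, which makes an analysis easier to check and share.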

Handout, Template, Solutions


2. DDS Case Study

Provides a walk-through for conducting an exploratory analysis of a dataset in which confounding may be present. Most of the commands required were introduced previously in Introduction to Data (Lab 1); this lab more heavily emphasizes interpretation.
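To see why confounding matters in an exploratory analysis, consider the following sketch in Python. The data are entirely made up for illustration and are not the dataset used in the lab; the group labels, age strata, and outcome values are assumptions. It compares group means before and after stratifying by a third variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group, age_stratum, outcome). Not the lab's data.
records = (
    [("X", "young", 10)] * 3 + [("X", "old", 50)] * 1
    + [("Y", "young", 10)] * 1 + [("Y", "old", 50)] * 3
)

def group_means(rows, key):
    """Average the outcome (third field) within each level of key(row)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key(row)].append(row[2])
    return {level: mean(vals) for level, vals in buckets.items()}

# Marginal comparison: group Y appears to have a much higher outcome.
overall = group_means(records, key=lambda r: r[0])

# Stratified comparison: within each age stratum the groups are identical.
stratified = group_means(records, key=lambda r: (r[0], r[1]))
```

In this toy example the overall difference between groups arises only because the two groups have different age compositions, which is the pattern an analyst should check for before interpreting a marginal comparison.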

Handout, Template, Solutions


3. Golub Case Study

Demonstrates the use of statistical first principles in conducting a basic analysis of a small microarray dataset. New commands for manipulating matrices are introduced, along with a first look at control structures.
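The combination of matrix manipulation and control structures can be sketched as follows. This is a hypothetical Python example, not the lab's code or data: the small expression matrix, the class labels "A" and "B", and the choice of a difference in means as the summary are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    [9.0, 8.0, 7.0, 4.0, 5.0, 6.0],
]
# Hypothetical class label for each sample (column).
labels = ["A", "A", "A", "B", "B", "B"]

# Loop over genes and compute the difference in mean expression (B minus A).
diffs = []
for row in expression:
    class_a = [x for x, lab in zip(row, labels) if lab == "A"]
    class_b = [x for x, lab in zip(row, labels) if lab == "B"]
    diffs.append(mean(class_b) - mean(class_a))

# Index of the gene whose classes differ most in absolute terms.
top_gene = max(range(len(diffs)), key=lambda i: abs(diffs[i]))

# Transposing turns the gene-by-sample matrix into sample-by-gene tuples.
by_sample = list(zip(*expression))
```

A plain loop is used here on purpose, since it mirrors the first look at control structures; a vectorized matrix library would be the more usual choice for a dataset of realistic size.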

Handout, Template, Solutions


4. Arenosa Case Study

Reinforces the functions introduced in the Golub Case Study (Lab 3) and the data interpretation skills from Introduction to Data (Lab 1) and the DDS Case Study (Lab 2), in the context of an RNA sequencing dataset.

Handout, Template, Solutions