Data Sets


Acute Exercise, Prostate Cancer Cell Growth, and Immune System Cell Growth

Blood serum is a component of blood important to the immune system. In this study, blood serum was collected from 10 men, both before and immediately after exercise. Each of these serum samples, two from each man, was then exposed to two types of cells:
LNCaP, which is a type of prostate cancer cell.
NIH3T3, which is an immune system cell.
The growth of the two types of cells was recorded after 48 hours.


Dengue vaccine, early trial

Results from an early trial of a vaccine for Dengue fever. Patients were randomized into two groups: those in the treatment group received the vaccine, and those in the control group received a placebo. After 6 months, each patient was injected with a weakened Dengue virus. Researchers subsequently took blood measurements to check whether any virus was present and recorded whether a rash developed.


How does the president's party perform in midterm elections?

Data on the performance of the president's party in the House of Representatives during midterm elections. (A midterm election is held in the even-numbered year between presidential election years, e.g. 2014.) The data covers midterm elections since 1898.
A commonly held political belief is that when unemployment is high, the president's party generally performs poorly in congressional midterm elections. This was the primary motivation for examining this data set.


Gun violence around the world

What is the relationship between gun ownership and gun violence around the world? Do countries whose populations own more guns also have more gun-related deaths?
This dataset contains gun ownership and mortality statistics for 75 countries. Since it may be useful to compare countries only to other "similar" countries, it also includes a Human Development Index that combines data on life expectancy, education, and income into a single metric that is used to rank countries into four tiers of human development (low, medium, high, and very high).


Pokémon Go Evolutions

A key part of Pokémon Go is using evolutions to get stronger Pokémon, and a deeper understanding of evolutions is key to being the greatest Pokémon Go player of all time. This data set covers 75 Pokémon evolutions spread across four species. A wide set of variables is provided, allowing a deeper dive into which characteristics are important in predicting a Pokémon's final combat power (CP).
Example research questions: (1) What characteristics correspond to an evolved Pokémon with a high combat power? (2) How predictable is CP from an evolution?


Soap or saline: which is better for cleaning wounds?

An experiment in which patients' open wounds were cleaned with either soap or saline.
For an informal overview of the experiment and findings, see this United Press International article.


Peanut allergy experiment in young children

In the late 1990s, it was believed that young children should exclude peanuts from their diets to reduce the chance of developing an allergy. In 2008, researchers were no longer so sure. This experiment evaluates whether peanut exposure is helpful or harmful. A peanut diet regimen (consume or avoid) was assigned to over 500 young children during years 2-5, ages during which children had previously been told not to eat peanuts. The key outcome was testing for a peanut allergy when each child turned 5.
A discussion of this study was featured on Healthcare Triage.


Gun violence in the United States

What is the relationship between gun ownership and gun violence in the US? This dataset contains gun ownership and mortality statistics by state.
A broader discussion of gun violence in the US can be found in this video and in this accompanying article.


Ames Residential Home Sales

All residential home sales in Ames, Iowa between 2006 and 2010. The data set contains many explanatory variables on the quality and quantity of physical attributes of these homes. Most of the variables describe information a typical home buyer would like to know about a property (square footage, number of bedrooms and bathrooms, size of lot, etc.). A detailed discussion of variables can be found in the original paper referenced below.


Arbuthnot's Data on Births

Arbuthnot's data describes male and female births for London from 1629 to 1710.
John Arbuthnot (1710) used these time series data to carry out the first known significance test. In every one of the 82 years from 1629 through 1710, there were more male christenings (births) than female christenings. As Arbuthnot did, we might wonder whether this could be due to chance or whether it means the birth ratio is not actually 1:1.
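The "due to chance" question can be made concrete with a small calculation. The sketch below is only an illustration, not Arbuthnot's own computation or any code shipped with the data set. It assumes that, if the birth ratio were 1:1, each year would be equally likely to show more male or more female christenings, that years are independent, and that ties are ignored. The year range is taken from the description above, counted inclusively.

```python
from fractions import Fraction

# Year range stated in the description, counted inclusively.
first_year, last_year = 1629, 1710
n_years = last_year - first_year + 1

# Assumption: under a 1:1 birth ratio, each year independently has
# probability 1/2 of showing more male than female christenings.
p_male_excess = Fraction(1, 2)

# Probability that every single year shows a male excess by chance alone.
p_all_male = p_male_excess ** n_years

print(f"years in the record: {n_years}")
print(f"chance of a male excess in every year: {float(p_all_male):.3g}")
```

Under these assumptions the probability is 1/2 raised to the number of years, which is vanishingly small, so an unbroken run of male excesses is hard to attribute to chance.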


Atheism in the World

Survey results on atheism across several countries and years. Each row represents a single respondent.


Body Measurements

Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals - 247 men and 260 women. These data can be used to provide statistics students practice in the art of data analysis. Such analyses range from simple descriptive displays to more complicated multivariate analyses such as multiple regression and discriminant analysis.


US Counties

Data for 3083 counties in the United States, including variables for demographic, financial, education, and other characteristics. For a more complete set of counties and variables, see the "countyComplete" data set from the openintro package, or for a tab-delimited text file version, download OpenIntro's data set repository.


Behavioral Survey

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey of 350,000 people in the United States conducted by the Centers for Disease Control and Prevention (CDC). As its name implies, the BRFSS is designed to identify risk factors in the adult population and report emerging health trends. For example, respondents are asked about their diet and weekly physical activity, their HIV/AIDS status, possible tobacco use, and even their level of healthcare coverage. The BRFSS Web site contains a complete description of the survey, the questions that were asked, and even research results that have been derived from the data.
This data set is a random sample of 20,000 people from the BRFSS survey conducted in 2000. While the full survey has over 200 questions or variables, this data set includes only 9 variables.


Teacher Evaluations

The data are gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin. In addition, six students rated the professors' physical appearance. The result is a data frame where each row contains a different course and each column has information on either the course or the professor.


Google Transparency Report

The data consist of the number of requests Google received for user account information as part of criminal investigations in the first half of 2011, the rate of compliance, and some other indicators on the countries.


Hot Hands

Data from the five games the Los Angeles Lakers played against the Orlando Magic in the 2009 NBA finals. Each row represents a shot Kobe Bryant took during these games. Kobe Bryant's performance against the Orlando Magic in the 2009 NBA finals earned him the title of Most Valuable Player, and many spectators commented on how he appeared to show a hot hand.


Baseball Player Statistics

Data from all 30 Major League Baseball teams from the 2011 season. This data set is useful for examining the relationships between wins, runs scored in a season, and a number of other player statistics.


North Carolina Births

Data set on 1,000 randomly sampled births from the birth records released by the state of North Carolina in 2004. This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children.


Anti-Piracy Legislation

This data set contains observations on US Senators and Representatives related to their support of anti-piracy legislation that was introduced at the end of 2011.


US Birth Records

Number of male and female births, sex ratio at birth, and number of excess males: United States, 1940–2002.


Flights from New York City airports

On-time data for a random sample of flights departing New York City airports in 2013.