Stats 250 exam 1 study guide

University of Michigan, Fall 2012
by Evan Hahn & Jacob Nestor

Feel free to send this around or to contribute. If you contribute, add your name above!

# What to study

1. Book chapters 1-2, 5.1-5.2, 5.5-5.6, 6, 7.1-7.4, 8.1-8.7, 9.1-9.4, 10.1-10.2, 10.4, 12.1-12.2, 12.4
2. HW 1-3
3. Lecture notes
4. Lab slides

# Types of data

1. Continuous: numbers that can take any value in a range, with no gaps. Things like temperature are continuous, but things like “integer years old” are not.
2. Discrete: values you can count, with gaps in between. I like to think of them as buckets; like “how many apples in this bucket?”

1. Categorical: “buckets” that are categories (labels), not numbers you can do arithmetic on.
2. Quantitative: numbers. Can be discrete or continuous.

# Means and medians and more

1. (Arithmetic) mean: it’s the average. Add up all the numbers and divide by the number of things. There’s an ugly formula that just says the same thing. It’s not very resistant to outliers.
    1. Population mean: μ
    2. Sample mean: x̅
2. Median: middle value once the data are sorted (with an even count, the average of the two middle values). Resistant to outliers.
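To see why the mean is not resistant to outliers but the median is, here is a quick sketch using Python's standard `statistics` module. The scores are made up for illustration.

```python
from statistics import mean, median

# Hypothetical data: five exam scores.
scores = [70, 75, 80, 85, 90]

# The same scores with one extreme outlier tacked on.
scores_with_outlier = scores + [1000]

print("no outlier:   mean =", mean(scores), " median =", median(scores))
print("with outlier: mean =", mean(scores_with_outlier),
      " median =", median(scores_with_outlier))
```

Adding one extreme value drags the mean from 80 to about 233, while the median only moves from 80 to 82.5.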

1. Variance: roughly the average of (each element’s distance from the mean)²
    1. Population variance: σ² (divide the sum of squared distances by N)
    2. Sample variance: s² (divide by n – 1 instead of n)
2. Standard deviation: square root of variance
    1. Standard deviation of population: σ
    2. Standard deviation of sample: s
3. z-score: number of standard deviations away from the mean. (observed – mean) / (standard deviation)
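A small sketch of these formulas, using Python's `statistics` module and made-up numbers. `pvariance` and `pstdev` treat the data as a whole population (divide by n), while `variance` and `stdev` treat it as a sample (divide by n – 1). The `z_score` helper is our own illustrative function, not a library call.

```python
from statistics import mean, pvariance, pstdev, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mu = mean(data)              # 40 / 8 = 5
pop_var = pvariance(data)    # sum of squared distances (32) divided by n = 8
samp_var = variance(data)    # same sum divided by n - 1 = 7
pop_sd = pstdev(data)        # square root of the population variance
samp_sd = stdev(data)        # square root of the sample variance


def z_score(observed, mean_value, sd):
    """Number of standard deviations that `observed` sits from the mean."""
    return (observed - mean_value) / sd


z = z_score(9, mu, pop_sd)   # (9 - 5) / 2

print("population variance:", pop_var, " sample variance:", samp_var)
print("population sd:", pop_sd, " sample sd:", samp_sd)
print("z-score of 9:", z)
```

The sample versions come out a little larger than the population versions, because they divide by a smaller number.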

# Plots and summaries

1. Box plot: allows you to easily see quartiles
2. Histogram: used for quantitative data; shows the shape of the distribution
3. Bar chart: used for categorical data; order on the x-axis doesn’t matter, so you cannot tell shape (i.e. distribution). Looks like a histogram but isn’t!
4. Five-number summary: has minimum, lower quartile (Q1), median, upper quartile (Q3), and maximum
5. Q-Q plot: good for telling whether a distribution is normal; do the points line up with a straight line?

1. Negative/left skew: long tail on the left, so the hump is on the right of the graph
2. Positive/right skew: long tail on the right, so the hump is on the left of the graph

1. Interquartile range (IQR) = Q3 – Q1
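Here is a rough sketch of computing a five-number summary and the IQR in Python. One caveat: textbooks and software don't all agree on how to compute quartiles. This uses the default ("exclusive") method of `statistics.quantiles`, which is an assumption on our part and may not match the method from class for every data set. The data are made up.

```python
from statistics import quantiles

data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 1, 9]  # hypothetical values

# quantiles(..., n=4) returns the three cut points that split the
# sorted data into quarters: Q1, the median, and Q3.
q1, med, q3 = quantiles(data, n=4)

five_number_summary = (min(data), q1, med, q3, max(data))
iqr = q3 - q1

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
```

For this data set Q1 = 5, the median is 9, and Q3 = 14, so the IQR is 14 – 5 = 9.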

# Properties of normal distributions

1. Empirical rule for bell-shaped histograms
    1. About 68% of values fall within one standard deviation of the mean, in either direction
    2. About 95% fall within two
    3. About 99.7% fall within three
2. Q-Q plots are good for determining whether something is a normal distribution
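The empirical rule's percentages can be checked against an exact normal curve. This sketch uses `statistics.NormalDist` (Python 3.8+). The helper name `share_within` is ours, made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The results come out to roughly 0.6827, 0.9545, and 0.9973, which is where the rounded 68%, 95%, and 99.7% come from.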

# Probabilities and random variables

1. P(X) = probability that X will happen, which can be 0, 1, or anything in between
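2. P(A and B) = probability that A and B both happen = P(A) × P(B|A). If A and B are independent, this is just P(A) × P(B)
3. P(A|B) = probability that A happens given that B happened = P(A and B) / P(B)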
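A worked example may help here. It relies on the usual definition of conditional probability, P(A|B) = P(A and B) / P(B), and on a made-up scenario: one roll of a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3". The `prob` helper is our own function, and it only works because every outcome is equally likely.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def is_even(o):        # event A
    return o % 2 == 0


def greater_than_3(o):  # event B
    return o > 3


p_a = prob(is_even)
p_b = prob(greater_than_3)
p_a_and_b = prob(lambda o: is_even(o) and greater_than_3(o))
p_a_given_b = p_a_and_b / p_b

print("P(A) =", p_a)
print("P(B) =", p_b)
print("P(A and B) =", p_a_and_b)
print("P(A|B) =", p_a_given_b)
```

Knowing the roll is greater than 3 narrows things down to {4, 5, 6}, and two of those three are even, so P(A|B) = 2/3 even though P(A) is only 1/2.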

# Confidence intervals

1. 95% confidence interval: a range of values computed from the sample; we are 95% confident that the true parameter is inside it
2. 95% confidence level: if we repeated the sampling many times, we expect 95% of the resulting confidence intervals to contain the true parameter
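The "if we did it many times" idea can be simulated. This is only a sketch under assumptions we picked ourselves: a normal population with a made-up mean of 50 and a known standard deviation of 10, samples of size 25, and the interval (sample mean) ± z* × σ/√n. The course may use a different interval (for proportions, say), but the coverage idea is the same.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(250)           # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD = 50.0, 10.0    # hypothetical population parameters
SAMPLE_SIZE, REPEATS = 25, 2000

z_star = NormalDist().inv_cdf(0.975)             # about 1.96 for a 95% level
margin = z_star * TRUE_SD / SAMPLE_SIZE ** 0.5   # half-width of each interval


def interval_from_new_sample():
    """Draw a fresh sample and return its 95% confidence interval."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    center = fmean(sample)
    return center - margin, center + margin


hits = 0
for _ in range(REPEATS):
    low, high = interval_from_new_sample()
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / REPEATS
print(f"{coverage:.3f} of the intervals contained the true mean")
```

The printed proportion should land near 0.95. Any single interval either contains the true mean or it doesn't; the 95% describes how often the procedure works over many samples.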

# Hypothesis testing

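1. Alternative hypothesis (Hₐ): the claim we are looking for evidence for, usually that there is an effect or a difference
2. Null hypothesis (H₀): the default claim of no effect or no difference; we assume it is true until the data give strong evidence against it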
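3. Type I error: false positive; we reject a null hypothesis that is actually true. For example, an innocent person goes to jail
4. Type II error: miss (false negative); we fail to reject a null hypothesis that is actually false. For example, a guilty person is freed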
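One way to keep the two error types straight is to write out the decision table as code. This sketch assumes the usual courtroom setup, where the null hypothesis is "the defendant is innocent" and a conviction means rejecting it. The function name `classify_decision` is made up for this example.

```python
def classify_decision(null_is_true: bool, rejected_null: bool) -> str:
    """Label the outcome of a hypothesis test, given the (usually unknown) truth."""
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (miss)"
    return "correct decision"


# Courtroom version: null hypothesis = "the defendant is innocent".
innocent_person_jailed = classify_decision(null_is_true=True, rejected_null=True)
guilty_person_freed = classify_decision(null_is_true=False, rejected_null=False)

print("innocent person goes to jail:", innocent_person_jailed)
print("guilty person is freed:", guilty_person_freed)
```

The other two combinations (an innocent person is freed, a guilty person goes to jail) are the correct decisions.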