University of Michigan, Fall 2012

by Evan Hahn & Jacob Nestor

Feel free to send this around or to contribute. If you contribute, add your name above!

- Book chapters 1-2, 5.1-5.2, 5.5-5.6, 6, 7.1-7.4, 8.1-8.7, 9.1-9.4, 10.1-10.2, 10.4, 12.1-12.2, 12.4
- HW 1-3
- Lecture notes
- Lab slides

- Continuous: numbers with no gaps, so any value in a range is possible. Things like temperature are continuous, but things like “integer years old” are not.
- Discrete: I like to think of them as buckets; like “how many apples in this bucket?”

- Categorical: discrete “buckets” that are categories.
- Quantitative: numbers. Can be discrete or continuous.

- (Arithmetic) mean: it’s the average. Add up all the numbers and divide by the number of things. There’s an ugly formula that just says the same thing. It’s not very resistant to outliers.

- Population mean: μ
- Sample mean: x̅

- Median: middle value. Resistant to outliers.
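
Here's a quick sketch of why the median is called "resistant" and the mean isn't. The salary numbers are made up for illustration, and it only uses Python's built-in `statistics` module.

```python
from statistics import mean, median

# Hypothetical data: five salaries, in thousands of dollars.
salaries = [40, 45, 50, 55, 60]

# Same data plus one extreme outlier.
with_outlier = salaries + [1000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps a lot; the median barely moves.
print("mean:  ", mean_before, "->", round(mean_after, 1))
print("median:", median_before, "->", median_after)
```

One outlier drags the mean from 50 to about 208, while the median only moves from 50 to 52.5.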

- Variance: average of (each element’s distance from the mean)². For a population, divide by the number of elements N; for a sample, divide by n – 1 instead of n.

- Population variance: σ²
- Sample variance: s²

- Standard deviation: square root of variance

- Standard deviation of population: σ
- Standard deviation of sample: s
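
To make the population vs. sample difference concrete, here's a small sketch with made-up data. It assumes the usual convention that the population version divides by N and the sample version divides by n – 1, which is what Python's `statistics` module does.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; their mean is 5

# Treating the data as a whole population: divide by N.
pop_var = pvariance(data)
pop_sd = pstdev(data)

# Treating the data as a sample: divide by n - 1.
samp_var = variance(data)
samp_sd = stdev(data)

print("population:", pop_var, pop_sd)
print("sample:    ", round(samp_var, 3), round(samp_sd, 3))
```

The squared distances from the mean add up to 32 here, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, which is a little bigger.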

- z-score: number of standard deviations away from the mean. z = (observed – mean) / (standard deviation). Positive means above the mean, negative means below.
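
The z-score formula as a tiny function. The function name `z_score` and the exam numbers are just made up for the example.

```python
def z_score(observed, mean, std_dev):
    """Number of standard deviations `observed` lies from `mean`."""
    return (observed - mean) / std_dev

# Hypothetical exam: mean 70, standard deviation 10.
z_high = z_score(85, 70, 10)  # 1.5 standard deviations above the mean
z_low = z_score(60, 70, 10)   # 1 standard deviation below the mean

print(z_high, z_low)
```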

- Box plot: allows you to easily see quartiles
- Histogram: used for quantitative data; lets you see the shape of the distribution
- Bar chart: used for categorical data. The order on the x-axis doesn’t matter, so you cannot tell shape (i.e., distribution). Looks like a histogram but isn’t!
- Five-number summary: has minimum, lower quartile (Q1), median, upper quartile (Q3), and maximum
- Q-Q plot: good for telling whether a distribution is normal; do the points line up with a straight line?

- Negative/left skew: long tail on the left, so the hump is on the right of the graph
- Positive/right skew: long tail on the right, so the hump is on the left of the graph

- Interquartile range (IQR) = Q3 – Q1
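
A sketch of the five-number summary and IQR on made-up data. Heads up: different books and programs compute quartiles in slightly different ways. This uses `statistics.quantiles` with its default method (Python 3.8+), which is an assumption and may not exactly match what the book or R gives on other data sets.

```python
from statistics import quantiles

data = sorted([7, 1, 13, 5, 9, 3, 11])  # hypothetical values

# n=4 splits the data into quarters and returns the three cut points.
q1, med, q3 = quantiles(data, n=4)

five_number_summary = (min(data), q1, med, q3, max(data))
iqr = q3 - q1

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
```

For this data the summary is 1, 3, 7, 11, 13, so the IQR is 11 – 3 = 8.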

- Empirical rule for bell-shaped histograms

- About 68% of values fall within one standard deviation of the mean (in either direction)
- 95% fall within two
- 99.7% fall within three
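
You can check the empirical rule against an exact normal curve. This sketch uses `statistics.NormalDist` (Python 3.8+) on the standard normal; the variable names are my own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.1%}")
```

The exact values are roughly 68.3%, 95.4%, and 99.7%, so the 68/95/99.7 numbers are rounded.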

- Q-Q plots are good for determining whether something is a normal distribution

- P(X) = probability that X will happen, which can be 0 (never happens), 1 (always happens), or anything in between
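- P(A and B) = probability that A and B both happen = P(A) × P(B|A). If A and B are independent, this is just P(A) × P(B).
- P(A|B) = probability that A happens given that B happened = P(A and B) / P(B)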
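
A sketch of "and" and "given" probabilities by counting outcomes for two fair dice. The events A and B are made up for the example, and it assumes the standard definition P(A|B) = P(A and B) / P(B).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def is_a(o):  # A: the two dice sum to 8
    return o[0] + o[1] == 8

def is_b(o):  # B: the first die is even
    return o[0] % 2 == 0

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))
p_a_given_b = p_a_and_b / p_b

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

Here P(A) = 5/36 but P(A|B) = 1/6, so knowing the first die is even changes the probability; A and B are not independent.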

- 95% confidence interval: a range computed from a sample; we are 95% confident that the true parameter is inside the interval
- 95% confidence level: if we repeated the sampling many times, we expect 95% of the resulting confidence intervals to contain the true parameter
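
The "if we did it many times" idea can be simulated. This is only a sketch under assumptions I picked: a normal population with a known standard deviation, a z-interval for the mean, and made-up values for the true mean, sample size, and number of repetitions.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and study design.
TRUE_MEAN, SIGMA, N, TRIALS = 50, 10, 30, 1000

# Multiplier for a 95% z-interval (about 1.96).
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * SIGMA / N ** 0.5

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contained the true mean")
```

The printed share should land close to 95%. Any single interval either contains the true mean or it doesn't; the 95% describes the method over many repetitions.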

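- Alternative hypothesis (Hₐ): the claim we are looking for evidence for, such as “there is an effect” or “there is a difference”
- Null hypothesis (H₀): the default claim, usually “no effect” or “no difference”. We assume it’s true unless the data give strong evidence against it.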
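- Type I error: false positive (rejecting the null hypothesis when it’s actually true). For example, an innocent person goes to jail
- Type II error: miss, or false negative (failing to reject the null hypothesis when it’s actually false). For example, a guilty person is freed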

This study guide © 2012 Evan Hahn and contributors. This document is licensed under a Creative Commons Attribution 3.0 Unported License.

(I added this weird legal section because I want to put this on my website for other people to use. If you’re contributing to this document, know that I’ll put it on my website under this license.)