Holt.Blue

Scatterplots and Correlation



Question: Do Florida boat registrations kill manatees?









Example: The table below lists the number of Florida boat registrations and manatee deaths from 1977 to 2009.









Example: Below is a scatterplot of manatee deaths versus the number of boat registrations in Florida. Each point represents a particular year.











In general, when graphing a relationship between two variables, the variable on the horizontal (or $x$) axis is called the explanatory variable, and the variable on the vertical (or $y$) axis is called the response variable.









Looking at the scatterplot, do you think manatee deaths in a given year are explained by the number of boat registrations?

Which is the explanatory variable? Which is the response variable?









When we examine the relationship between two variables (i.e., make a scatterplot), we should look for an overall pattern and describe the
  1. direction
  2. form
  3. strength of the relationship
  4. outliers or other deviations from the overall pattern.
Scatterplots can take on many forms including linear, curved, and clustered forms.

What do we see when we examine the manatee graph? What is the strength of the relationship?







Correlation is a way of measuring the strength and direction of a linear relationship.

Correlation is very often denoted by $r$.



If the correlation $r$ is positive, the association is positive.

If the correlation $r$ is negative, the association is negative.



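The correlation $r$ is always between $-1$ and $1$. The closer $r$ is to zero, the weaker the linear relationship; the closer $r$ is to $-1$ or $1$, the stronger the linear relationship.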









How do we calculate correlation?

$$r=\frac{1}{n-1}\left[ \left(\frac{x_1-\overline{x}}{s_x}\right) \left(\frac{y_1-\overline{y}}{s_y}\right)+\left(\frac{x_2-\overline{x}}{s_x}\right) \left(\frac{y_2-\overline{y}}{s_y}\right) +\cdots+\left(\frac{x_n-\overline{x}}{s_x}\right) \left(\frac{y_n-\overline{y}}{s_y}\right) \right]$$

Here $n$ is the number of data points $(x_j,y_j)$, $\overline{x}$ and $\overline{y}$ are the sample means, and $s_x$ and $s_y$ are the sample standard deviations of the $x$- and $y$-values.

The same formula can be written more compactly in sigma notation:

$$r=\frac{1}{n-1}\sum_{j=1}^{n}\left(\frac{x_j-\overline{x}}{s_x}\right) \left(\frac{y_j-\overline{y}}{s_y}\right)$$
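The formula above can be turned into a few lines of code. What follows is a minimal Python sketch, not a library routine: the function name `correlation` and the sample data are made up for illustration, and the code assumes the sample standard deviations $s_x$ and $s_y$ use the $n-1$ divisor.

```python
import math

def correlation(xs, ys):
    """Sample correlation r, computed term by term from the formula above."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    # Sample means.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations (assumed here to use the n - 1 divisor).
    # Note: if all x-values or all y-values are equal, s_x or s_y is zero
    # and r is undefined (this sketch would raise ZeroDivisionError).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

    # Sum the products of the standardized x- and y-values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Made-up illustrative data: four points lying exactly on a line with
# positive slope, so r should come out (up to rounding) as 1.
print(round(correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))
```

Reversing the direction of the line (for example, $y$-values 8, 6, 4, 2) gives a correlation of $-1$ up to rounding, which matches the rule that the sign of $r$ gives the direction of the association.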

Example: Suppose we collect the data set $(1,1)$, $(2,4)$, and $(3,4)$. What is the correlation?
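One way to work this out by hand, assuming the sample standard deviations use the $n-1$ divisor: first find the means and standard deviations, $$\overline{x}=\frac{1+2+3}{3}=2,\qquad \overline{y}=\frac{1+4+4}{3}=3,$$ $$s_x=\sqrt{\frac{(-1)^2+0^2+1^2}{2}}=1,\qquad s_y=\sqrt{\frac{(-2)^2+1^2+1^2}{2}}=\sqrt{3}.$$ Then substitute into the formula with $n=3$: $$r=\frac{1}{2}\left[\left(\frac{-1}{1}\right)\left(\frac{-2}{\sqrt{3}}\right)+\left(\frac{0}{1}\right)\left(\frac{1}{\sqrt{3}}\right)+\left(\frac{1}{1}\right)\left(\frac{1}{\sqrt{3}}\right)\right]=\frac{1}{2}\cdot\frac{3}{\sqrt{3}}=\frac{\sqrt{3}}{2}\approx 0.866.$$ The result is positive and fairly close to $1$, so these three points show a fairly strong positive linear association.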









Savvy Citizen Fact #3: Correlation DOES NOT Imply Causation.













Example: Consider the relationship between lemon imports from Mexico and traffic deaths in the United States.