Question: What do we do if our data don't satisfy the conditions of the statistical test we want to use (i.e., we can't trust its conclusions)?
Answer: We use non-parametric procedures!
Fact: Non-parametric procedures do not assume that your data come from a normal distribution.
Pros: Non-parametric procedures work well even if we have small sample sizes, outliers, or very non-normal data. So even if you have a small data set that is skewed and has outliers, you can still trust the results!
Cons: Non-parametric procedures are generally less powerful than parametric methods when the parametric conditions actually hold. That is, when effects are present, they are harder to detect using non-parametric methods.
Example: Does the presence of small numbers of weeds reduce the yield of corn? A researcher planted corn at the same rate in 8 small plots of ground, then weeded the corn rows by hand to allow no weeds in 4 randomly selected plots and exactly 3 weeds per meter of row in the other 4 plots. Here are the yields of corn (bushels per acre) in each of the plots:
0 weeds per meter: | 166.7 | 172.2 | 165.0 | 176.9 |
3 weeds per meter: | 158.6 | 176.4 | 153.1 | 156.0 |
Big Question: Is it safe to use the $t$ procedures? Let's take a look.
Procedure: Arrange ALL of the data from smallest to largest.
153.1
156.0
158.6
165.0
166.7
172.2
176.4
176.9
Do we notice anything?
Observe: the ranks of the 0 Weeds/Meter group are generally higher than the ranks of the 3 Weeds/Meter group.
153.1 | 156.0 | 158.6 | 165.0 | 166.7 | 172.2 | 176.4 | 176.9 |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
The Big Deal: we will be using the ranks of the values rather than the values themselves to detect if there is any difference between treatments.
Sum of Ranks for 0 Weeds/Meter group: | 4+5+6+8=23 |
Sum of Ranks for 3 Weeds/Meter group: | 1+2+3+7=13 |
Fact: the sum of the ranks for one of the groups (choose either one) is our test statistic, which we will call $W$.
In this case we choose $W=23$, the rank sum for the 0 Weeds/Meter group.
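The ranking and rank sums can be checked with a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and the simple ranking assumes no tied values (true for these yields).

```python
# Yields (bushels per acre) from the corn example.
no_weeds = [166.7, 172.2, 165.0, 176.9]
three_weeds = [158.6, 176.4, 153.1, 156.0]

# Pool all observations and rank them from smallest (rank 1) to largest.
# This simple ranking assumes there are no ties, which holds for these data.
pooled = sorted(no_weeds + three_weeds)
rank_of = {value: position + 1 for position, value in enumerate(pooled)}

# Sum the ranks within each group.
w_no_weeds = sum(rank_of[y] for y in no_weeds)
w_three_weeds = sum(rank_of[y] for y in three_weeds)

print(w_no_weeds, w_three_weeds)  # 23 13
```

The two sums always add up to $1+2+\dots+8=36$, so knowing one tells you the other.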
The Wilcoxon Rank Sum Test: Draw an SRS of size $n_1$ from one population and draw an independent SRS of size $n_2$ from a second population. There are $N=n_1+n_2$ observations in all. The sum $W$ of the ranks for the first sample is the Wilcoxon rank sum statistic. If the two populations have the same continuous distribution, then $W$ has mean and standard deviation $$\mu_W=\frac{n_1(N+1)}{2},\,\,\,\,\,\,\,\,\,\sigma_W=\sqrt{\frac{n_1 n_2(N+1)}{12}}.$$ If $W$ is further from the mean than we would expect by chance alone, we reject $H_0.$
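Worked check: plugging the corn data into these formulas (with $n_1=n_2=4$, $N=8$, and $W=23$) gives $$\mu_W=\frac{4(8+1)}{2}=18,\,\,\,\,\,\,\,\,\,\sigma_W=\sqrt{\frac{4\cdot 4\cdot(8+1)}{12}}=\sqrt{12}\approx 3.46,$$ so the observed $W=23$ sits about $(23-18)/3.46\approx 1.44$ standard deviations above what we would expect if the weeds made no difference.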
Fact: The Wilcoxon Rank Sum Test is a non-parametric alternative to the 2-sample $t$ test.
Annoying Question: How do we calculate the $p$-value from the test statistic $W$?
Nice Answer: Use software! :D
Let's run a Wilcoxon Rank Sum Test on our data.
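For a data set this small, we can sketch what the software does by brute force. Under $H_0$, every way of picking 4 of the 8 ranks for the first group is equally likely, so we can list them all and count how many give a rank sum at least as large as ours. The sketch below assumes a one-sided alternative (yields are higher with no weeds) and no ties; the variable names are illustrative only.

```python
from itertools import combinations

no_weeds = [166.7, 172.2, 165.0, 176.9]
three_weeds = [158.6, 176.4, 153.1, 156.0]

# Rank the pooled data (no ties here) and compute the observed W.
pooled = sorted(no_weeds + three_weeds)
rank_of = {value: i + 1 for i, value in enumerate(pooled)}
w_observed = sum(rank_of[y] for y in no_weeds)

# Under H0, each set of 4 ranks out of 8 is equally likely for the first group.
all_rank_sums = [
    sum(ranks)
    for ranks in combinations(range(1, len(pooled) + 1), len(no_weeds))
]

# One-sided p-value (assumed alternative: no-weeds yields tend to be higher).
as_extreme = sum(1 for w in all_rank_sums if w >= w_observed)
p_one_sided = as_extreme / len(all_rank_sums)

print(f"W = {w_observed}, p = {as_extreme}/{len(all_rank_sums)} = {p_one_sided:.3f}")
# W = 23, p = 7/70 = 0.100
```

Statistical packages do this counting for you; for instance, SciPy's `scipy.stats.mannwhitneyu` runs the same test but reports the equivalent Mann-Whitney $U$ statistic rather than $W$.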
Recall the Homework Question: Researchers claim that women speak significantly more words per day than men. One estimate is that a woman uses about 20000 words per day while a man uses about 7000.
Some data for men are:
28408 | 10084 | 15931 | 21688 | 37786 |
10575 | 12880 | 11071 | 17799 | 13182 |
8918 | 6495 | 8153 | 7015 | 4429 |
10054 | 3998 | 12639 | 10974 | 5255 |
Is it reasonable to use the $t$ procedures in this case? Let's re-examine the data.
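One quick, informal check (a sketch, not a substitute for plotting the data) is to compare the mean with the median: when the mean is pulled well above the median, the data are likely right-skewed. The variable names below are illustrative.

```python
from statistics import mean, median

# Words per day for the 20 men in the sample.
words = [
    28408, 10084, 15931, 21688, 37786,
    10575, 12880, 11071, 17799, 13182,
    8918, 6495, 8153, 7015, 4429,
    10054, 3998, 12639, 10974, 5255,
]

sample_mean = mean(words)
sample_median = median(words)

# A mean noticeably larger than the median hints at right skew or high outliers.
print(f"mean = {sample_mean:.1f}, median = {sample_median:.1f}, max = {max(words)}")
```

Here the mean is well above the median and the largest value is far from the bulk of the data, which is a warning sign for the $t$ procedures with only 20 observations.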
Step 1: Subtract the median under the null hypothesis, in this case 7000 words, from each value:
21408 | 3084 | 8931 | 14688 | 30786 |
3575 | 5880 | 4071 | 10799 | 6182 |
1918 | -505 | 1153 | 15 | -2571 |
3054 | -3002 | 5639 | 3974 | -1745 |
Step 2: Arrange the absolute values in ascending order and assign ranks:
1 | 15 |
2 | 505 |
3 | 1153 |
4 | 1745 |
5 | 1918 |
6 | 2571 |
7 | 3002 |
8 | 3054 |
9 | 3084 |
10 | 3575 |
11 | 3974 |
12 | 4071 |
13 | 5639 |
14 | 5880 |
15 | 6182 |
16 | 8931 |
17 | 10799 |
18 | 14688 |
19 | 21408 |
20 | 30786 |
The sum of the positive ranks is:
1+3+5+8+9+10+11+12+13+14+15+16+17+18+19+20=191.
The sum of the negative ranks is:
2+4+6+7=19.
These are the Wilcoxon Signed Rank Sums. The positive sum is denoted $W^{+}=191$, and the negative sum is denoted $W^{-}=19$.
Traditionally we take $W^{+}$ as our test statistic.
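Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration with made-up variable names; it assumes no difference is exactly zero and no two absolute differences are tied, which is true for these data.

```python
words = [
    28408, 10084, 15931, 21688, 37786,
    10575, 12880, 11071, 17799, 13182,
    8918, 6495, 8153, 7015, 4429,
    10054, 3998, 12639, 10974, 5255,
]
m0 = 7000  # hypothesized median under H0

# Step 1: differences from the hypothesized median.
diffs = [w - m0 for w in words]

# Step 2: order by absolute value; position in this order is the rank.
# (Assumes no zero differences and no ties, as in this data set.)
ordered = sorted(diffs, key=abs)

# Sum the ranks separately for positive and negative differences.
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

print(w_plus, w_minus)  # 191 19
```

As a sanity check, the two sums must add up to $1+2+\dots+20=210$.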
The Wilcoxon Signed Rank Test: Draw an SRS of size $n$ from a population and take the differences from the median $M_0$ under $H_0$. Rank the absolute values of these differences. The sum $W^{+}$ of the ranks for the positive differences is the Wilcoxon Signed Rank Statistic. Under the null hypothesis, $W^{+}$ has mean and standard deviation: $$\mu_{W^{+}}=\frac{n(n+1)}{4},\,\,\,\,\,\,\sigma_{W^{+}}=\sqrt{\frac{n(n+1)(2n+1)}{24}}.$$ If $W^{+}$ is further from the mean than we would expect by chance alone, we reject $H_0.$
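Worked check: for the men's data, $n=20$ and $W^{+}=191$, so $$\mu_{W^{+}}=\frac{20(21)}{4}=105,\,\,\,\,\,\,\sigma_{W^{+}}=\sqrt{\frac{20(21)(41)}{24}}=\sqrt{717.5}\approx 26.79.$$ The observed $W^{+}=191$ is about $(191-105)/26.79\approx 3.21$ standard deviations above its mean under $H_0$, which already hints that the $p$-value will be small.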
Running the Test: again, we will use software to compute the $p$-value of the test statistic $W^{+}$.
Back to Our Question: Researchers claim that men speak significantly fewer words per day than women. One estimate is that a man uses about 7000 words per day. The data for the men:
28408 | 10084 | 15931 | 21688 | 37786 |
10575 | 12880 | 11071 | 17799 | 13182 |
8918 | 6495 | 8153 | 7015 | 4429 |
10054 | 3998 | 12639 | 10974 | 5255 |
Is there good evidence to suggest the true median is higher than 7000 words?
Let's run a Wilcoxon Signed Rank Test!
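Here is a rough sketch of what the software computes behind the scenes. Under $H_0$, each of the 20 ranks is equally likely to carry a plus or a minus sign, so there are $2^{20}$ equally likely sign patterns. The code counts how many of them give a positive rank sum at least as large as ours, for the one-sided alternative that the median is higher than 7000. Variable names are illustrative, and the sketch assumes no zero differences and no ties.

```python
words = [
    28408, 10084, 15931, 21688, 37786,
    10575, 12880, 11071, 17799, 13182,
    8918, 6495, 8153, 7015, 4429,
    10054, 3998, 12639, 10974, 5255,
]
m0 = 7000  # hypothesized median under H0

# Observed statistic: sum of ranks of the positive differences.
diffs = sorted((w - m0 for w in words), key=abs)
n = len(diffs)
w_plus = sum(rank for rank, d in enumerate(diffs, start=1) if d > 0)

# counts[s] = number of subsets of the ranks {1, ..., n} whose sum is s.
# Built one rank at a time; iterating s downward uses each rank at most once.
max_sum = n * (n + 1) // 2
counts = [0] * (max_sum + 1)
counts[0] = 1
for rank in range(1, n + 1):
    for s in range(max_sum, rank - 1, -1):
        counts[s] += counts[s - rank]

# One-sided exact p-value: P(W+ >= observed) under H0.
p_one_sided = sum(counts[w_plus:]) / 2 ** n

print(f"W+ = {w_plus}, one-sided p = {p_one_sided:.6f}")
```

In practice you would let a package do this; SciPy, for example, provides `scipy.stats.wilcoxon` for the signed rank test.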
Another Fact: The Wilcoxon Signed Rank Test is a non-parametric alternative to the one-sample $t$ test.
One More Example: Recall the homework question:
Durable press treatment reduces wrinkling, but how much? "Wrinkle recovery angle" measures how well a fabric recovers from wrinkles. Higher is better. Here are data on wrinkle recovery angle:
Untreated: | 79 | 80 | 78 | 80 | 78 |
Permafresh 55: | 136 | 135 | 132 | 137 | 134 |
Permafresh 48: | 125 | 131 | 125 | 145 | 145 |
Hylite HF: | 143 | 141 | 146 | 141 | 145 |
Who remembers why we can't trust the ANOVA $F$ test in this situation?
One More Question: Is there a non-parametric test which tests multiple samples?
One More Answer: Yes! It's called the Kruskal-Wallis Test.
Run a Kruskal-Wallis test on the wrinkle recovery data:
Untreated: | 79 | 80 | 78 | 80 | 78 |
Permafresh 55: | 136 | 135 | 132 | 137 | 134 |
Permafresh 48: | 125 | 131 | 125 | 145 | 145 |
Hylite HF: | 143 | 141 | 146 | 141 | 145 |
One More Fact: The Kruskal-Wallis Test is a non-parametric alternative to One-Way ANOVA.
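To see what a Kruskal-Wallis test does with the wrinkle recovery data, here is a minimal sketch that ranks all 20 values together, sums the ranks in each group, and computes the usual statistic $H=\frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i}-3(N+1)$. These notes do not spell out the formula, so treat the details as assumptions: tied values get the average of their ranks, the standard tie correction is applied, and the $p$-value uses the chi-square approximation with $4-1=3$ degrees of freedom, which is rough with only 5 observations per group. All names are illustrative.

```python
import math

groups = {
    "Untreated": [79, 80, 78, 80, 78],
    "Permafresh 55": [136, 135, 132, 137, 134],
    "Permafresh 48": [125, 131, 125, 145, 145],
    "Hylite HF": [143, 141, 146, 141, 145],
}

pooled = sorted(v for values in groups.values() for v in values)
n_total = len(pooled)

# Midranks: tied values share the average of the positions they occupy.
positions = {}
for position, value in enumerate(pooled, start=1):
    positions.setdefault(value, []).append(position)
midrank = {value: sum(p) / len(p) for value, p in positions.items()}

# Rank sum R_i for each treatment group.
rank_sums = {name: sum(midrank[v] for v in values) for name, values in groups.items()}

# Kruskal-Wallis statistic before the tie correction.
h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (n_total + 1)

# Standard tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
h_corrected = h / (1 - tie_term / (n_total ** 3 - n_total))

# Upper-tail chi-square probability; this closed form is valid for 3 df only.
p_value = math.erfc(math.sqrt(h_corrected / 2)) + math.sqrt(
    2 * h_corrected / math.pi
) * math.exp(-h_corrected / 2)

print(rank_sums)
print(f"H = {h_corrected:.2f}, approximate p = {p_value:.4f}")
```

Notice that the untreated fabric takes the five smallest ranks, which is what drives $H$ up. A package routine such as SciPy's `scipy.stats.kruskal` should give a comparable answer.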
The Fine Print: Conditions for Inference
The non-parametric procedures we've discussed may be used in the presence of outliers, skew, or anything else that might prevent us from using parametric techniques. However, there are still conditions for using them:
1) Each of our samples is an independent SRS (as always).
2) Your data come from a continuous distribution*.
Bonus: If we somehow know that the shapes of the distributions for multi-sample data are all the same, we can be more specific and say that the null and alternative hypotheses apply to the medians.
*Small caveat: As long as the quantity you are measuring is continuous, the recorded values don't necessarily have to be decimals. For example, someone's opinion of a film can be any decimal number between 0 and 5, but is usually rounded to the nearest integer. We can still treat such data as continuous.
On the other hand, the number of cars a person owns is a discrete measurement, and the procedures discussed above are not advisable for such data.