Holt.Blue

When The Tried and True Fails: Non-Parametric Statistics



Question: What do we do if our data don't satisfy the guidelines of the statistical test we want to use (i.e., we can't trust its conclusions)?











Answer: We use non-parametric procedures!











Fact: Non-parametric procedures do not assume that your data come from a normal distribution.

Pros: Non-parametric procedures work well even if we have small sample sizes, outliers, or very non-normal data. So even if you have a small data set which is skewed and has outliers, you can still trust the results, provided the conditions for inference listed at the end of these notes are met.

Cons: Non-parametric procedures aren't as powerful as parametric methods when the assumptions of the parametric methods hold. That is, when effects are present, they are harder to detect using non-parametric methods.











Example: Does the presence of small numbers of weeds reduce the yield of corn? A researcher planted corn at the same rate in 8 small plots of ground, then weeded the corn rows by hand to allow no weeds in 4 randomly selected plots and exactly 3 weeds per meter of row in the other 4 plots. Here are the yields of corn (bushels per acre) in each of the plots:

0 weeds per meter: 166.7 172.2 165.0 176.9
3 weeds per meter: 158.6 176.4 153.1 156.0

Big Question: Is it safe to use the $t$ procedures? Let's take a look.











Procedure: Arrange ALL of the data from smallest to largest.

153.1 156.0 158.6 165.0 166.7 172.2 176.4 176.9


Do we notice anything?











Observe: the ranks of the 0 Weeds/Meter group are generally higher than the ranks of the 3 Weeds/Meter group.

153.1 156.0 158.6 165.0 166.7 172.2 176.4 176.9
1 2 3 4 5 6 7 8

The Big Deal: we will be using the ranks of the values rather than the values themselves to detect if there is any difference between treatments.











153.1 156.0 158.6 165.0 166.7 172.2 176.4 176.9
1 2 3 4 5 6 7 8

Sum of Ranks for 0 Weeds/Meter group: 4+5+6+8=23
Sum of Ranks for 3 Weeds/Meter group: 1+2+3+7=13

Fact: the sum of the ranks for one of the groups (we may choose either group) is our test statistic, which we will call $W$.

In this case we choose $W=23$.
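
The ranking and rank sums above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are our own, and the dictionary lookup relies on the fact that this data set has no tied values.

```python
# Yields (bushels per acre) from the corn example.
no_weeds = [166.7, 172.2, 165.0, 176.9]
three_weeds = [158.6, 176.4, 153.1, 156.0]

# Pool ALL of the data and rank from smallest (rank 1) to largest.
# Assumption: no ties, so each value gets a unique integer rank.
pooled = sorted(no_weeds + three_weeds)
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

# Sum the ranks within each group.
w_no_weeds = sum(rank_of[y] for y in no_weeds)
w_three_weeds = sum(rank_of[y] for y in three_weeds)

print("Rank sum, 0 weeds/meter:", w_no_weeds)
print("Rank sum, 3 weeds/meter:", w_three_weeds)
```

The two rank sums always add up to $N(N+1)/2$ (here $8\cdot 9/2=36$), which is why it does not matter which group we pick for $W$.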











The Wilcoxon Rank Sum Test: Draw an SRS of size $n_1$ from one population and draw an independent SRS of size $n_2$ from a second population. There are $N=n_1+n_2$ observations in all. The sum $W$ of the ranks for the first sample is the Wilcoxon rank sum statistic. If the two populations have the same continuous distribution, then $W$ has mean and standard deviation $$\mu_W=\frac{n_1(N+1)}{2},\,\,\,\,\,\,\,\,\,\sigma_W=\sqrt{\frac{n_1 n_2(N+1)}{12}}.$$ If $W$ is further from the mean than we would expect by chance alone, we reject $H_0.$
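
To see how the formulas for $\mu_W$ and $\sigma_W$ are used, here is a rough sketch in Python for the corn data ($n_1=n_2=4$, $W=23$). The normal approximation and the one-sided alternative (the 0 weeds group tends to have higher yields) are our own assumptions for illustration; with samples this small the approximation is crude, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

n1, n2 = 4, 4          # sample sizes (0 weeds/meter, 3 weeds/meter)
N = n1 + n2
W = 23                 # rank sum for the 0 weeds/meter group

# Mean and standard deviation of W when H0 is true.
mu_w = n1 * (N + 1) / 2
sigma_w = math.sqrt(n1 * n2 * (N + 1) / 12)

# How many standard deviations is W above its mean?
z = (W - mu_w) / sigma_w

# Approximate one-sided p-value (assumes W is roughly normal).
p_approx = 1 - NormalDist().cdf(z)

print(f"mu_W = {mu_w}, sigma_W = {sigma_w:.3f}, z = {z:.3f}, p ~ {p_approx:.3f}")
```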











Fact: The Wilcoxon Rank Sum Test is a non-parametric alternative to the 2-sample $t$ test.











Annoying Question: How do we calculate the $p$-value from the test statistic $W$?











Nice Answer: Use software! :D

Let's run a Wilcoxon Rank Sum Test on our data.
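
If no statistics package is handy, the $p$-value for a data set this small can be found by brute force. The sketch below assumes there are no ties and that the alternative is one-sided (0 weeds gives higher yields). Under $H_0$, every way of handing 4 of the 8 ranks to the 0 weeds group is equally likely, so we simply count the assignments whose rank sum is at least as large as the one we observed.

```python
from itertools import combinations

n1, N = 4, 8        # size of the 0 weeds/meter group, total observations
observed_w = 23     # observed rank sum for the 0 weeds/meter group

# Every possible set of n1 ranks the first group could have received.
rank_sums = [sum(ranks) for ranks in combinations(range(1, N + 1), n1)]

# One-sided exact p-value: proportion of assignments with W >= observed.
p_one_sided = sum(s >= observed_w for s in rank_sums) / len(rank_sums)

print("Number of possible assignments:", len(rank_sums))
print("Exact one-sided p-value:", p_one_sided)
```

Statistical software carries out essentially this calculation (or a normal approximation to it) behind the scenes.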











Recall the Homework Question: Researchers claim that women speak significantly more words per day than men. One estimate is that a woman uses about 20000 words per day while a man uses about 7000.

Some data for men are:

28408 10084 15931 21688 37786
10575 12880 11071 17799 13182
8918 6495 8153 7015 4429
10054 3998 12639 10974 5255

Is it reasonable to use the $t$ procedures in this case? Let's re-examine the data.











Step 1: Subtract the median under the null hypothesis, in this case 7000 words, from each value:

21408 3084 8931 14688 30786
3575 5880 4071 10799 6182
1918 -505 1153 15 -2571
3054 -3002 5639 3974 -1745












Step 2: Arrange the absolute values in ascending order and assign ranks:
1 15
2 505
3 1153
4 1745
5 1918
6 2571
7 3002
8 3054
9 3084
10 3575
11 3974
12 4071
13 5639
14 5880
15 6182
16 8931
17 10799
18 14688
19 21408
20 30786












The sum of the positive ranks is:

1+3+5+8+9+10+11+12+13+14+15+16+17+18+19+20=191.

The sum of the negative ranks is:

2+4+6+7=19.

These are the Wilcoxon Signed Rank Sums. The positive sum is denoted $W^{+}=191$, and the negative sum is denoted $W^{-}=19$.

Traditionally we take $W^{+}$ as our test statistic.
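
Steps 1 and 2 and the two rank sums can be checked with a short Python sketch. The variable names are our own, and the code assumes what happens to be true for this data set: no difference is exactly zero and no two absolute differences are tied.

```python
# Words per day for the 20 men in the sample.
words = [28408, 10084, 15931, 21688, 37786,
         10575, 12880, 11071, 17799, 13182,
         8918, 6495, 8153, 7015, 4429,
         10054, 3998, 12639, 10974, 5255]
m0 = 7000  # median under the null hypothesis

# Step 1: differences from the hypothesized median.
diffs = [x - m0 for x in words]

# Step 2: order the differences by absolute value; position = rank.
ordered = sorted(diffs, key=abs)

# Sum the ranks of the positive and of the negative differences.
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

print("W+ =", w_plus, " W- =", w_minus)
```

As a check, $W^{+}+W^{-}$ must equal $n(n+1)/2=210$ when $n=20$.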











The Wilcoxon Signed Rank Test: Draw an SRS of size $n$ from a population and take the differences from the median $M_0$ under $H_0$. Rank the absolute values of these differences. The sum $W^{+}$ of the ranks for the positive differences is the Wilcoxon Signed Rank Statistic. Under the null hypothesis, $W^{+}$ has mean and standard deviation: $$\mu_{W^{+}}=\frac{n(n+1)}{4},\,\,\,\,\,\,\sigma_{W^{+}}=\sqrt{\frac{n(n+1)(2n+1)}{24}}.$$ If $W^{+}$ is further from the mean than we would expect by chance alone, we reject $H_0.$
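
As an illustration of these formulas, the sketch below plugs in the words-per-day data ($n=20$, $W^{+}=191$). Treating $W^{+}$ as approximately normal and using a one-sided alternative (median above 7000) are assumptions made here for illustration; no continuity correction is applied.

```python
import math
from statistics import NormalDist

n = 20          # number of men in the sample
w_plus = 191    # observed sum of the positive ranks

# Mean and standard deviation of W+ when H0 is true.
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Standardize W+ and find an approximate one-sided p-value.
z = (w_plus - mu) / sigma
p_approx = 1 - NormalDist().cdf(z)

print(f"mu = {mu}, sigma = {sigma:.3f}, z = {z:.3f}, p ~ {p_approx:.5f}")
```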











Running the Test: again, we will use software to compute the $p$-value of the test statistic $W^{+}$.













Back to Our Question: Researchers claim that men speak significantly fewer words per day than women. One estimate is that a man uses about 7000 words per day. The data for the men:
28408 10084 15931 21688 37786
10575 12880 11071 17799 13182
8918 6495 8153 7015 4429
10054 3998 12639 10974 5255

Is there good evidence to suggest the true median is higher than 7000 words?

Let's run a Wilcoxon Signed Rank Test!
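
For readers curious about what the software is doing, here is a sketch of an exact one-sided $p$-value computed with plain Python. It assumes no ties and no zero differences. Under $H_0$ each of the ranks $1,\dots,20$ is equally likely to belong to a positive or a negative difference, so all $2^{20}$ sign patterns are equally likely; we count how many of them give a $W^{+}$ of at least 191.

```python
n = 20
observed_w_plus = 191

# counts[s] = number of subsets of the ranks 1..n whose sum is s,
# i.e. the number of sign patterns that give W+ = s.
max_sum = n * (n + 1) // 2
counts = [0] * (max_sum + 1)
counts[0] = 1  # the empty subset (all differences negative)
for rank in range(1, n + 1):
    # Go downward so each rank is used at most once per subset.
    for s in range(max_sum, rank - 1, -1):
        counts[s] += counts[s - rank]

# Exact one-sided p-value: P(W+ >= observed) under H0.
p_one_sided = sum(counts[observed_w_plus:]) / 2 ** n

print(f"Exact one-sided p-value: {p_one_sided:.6f}")
```

A $p$-value this small is good evidence that the true median is higher than 7000 words.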











Another Fact: The Wilcoxon Signed Rank Test is a non-parametric alternative to the one-sample $t$ test.











One More Example: Recall the homework question:

Durable press treatment reduces wrinkling, but how much? "Wrinkle recovery angle" measures how well a fabric recovers from wrinkles. Higher is better. Here are data on wrinkle recovery angle:

Untreated: 79 80 78 80 78
Permafresh 55: 136 135 132 137 134
Permafresh 48: 125 131 125 145 145
Hylite HF: 143 141 146 141 145

Who remembers why we can't trust the ANOVA $F$ test in this situation?











One More Question: Is there a non-parametric test which tests multiple samples?











One More Answer: Yes! It's called the Kruskal-Wallis Test.











Run a Kruskal-Wallis test on the wrinkle recovery data:

Untreated: 79 80 78 80 78
Permafresh 55: 136 135 132 137 134
Permafresh 48: 125 131 125 145 145
Hylite HF: 143 141 146 141 145
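
These notes leave the Kruskal-Wallis calculation to software, so the sketch below is only one way it might be done by hand in Python. It rests on several assumptions that do not come from the notes: the usual textbook statistic $H=\frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i}-3(N+1)$, where $R_i$ is the rank sum of group $i$; average ranks for tied values together with the standard tie correction; and a chi-square approximation with $3$ degrees of freedom, which is rough for groups of size 5.

```python
import math

groups = {
    "Untreated":     [79, 80, 78, 80, 78],
    "Permafresh 55": [136, 135, 132, 137, 134],
    "Permafresh 48": [125, 131, 125, 145, 145],
    "Hylite HF":     [143, 141, 146, 141, 145],
}

pooled = sorted(x for data in groups.values() for x in data)
N = len(pooled)

def midrank(value):
    """Average of the ranks occupied by `value` in the pooled data."""
    first = pooled.index(value) + 1      # rank of the first copy
    copies = pooled.count(value)         # number of tied copies
    return first + (copies - 1) / 2

# Rank sum for each treatment.
rank_sums = {name: sum(midrank(x) for x in data) for name, data in groups.items()}

# Kruskal-Wallis statistic (before the tie correction).
H = 12 / (N * (N + 1)) * sum(
    r ** 2 / len(groups[name]) for name, r in rank_sums.items()
) - 3 * (N + 1)

# Standard correction for tied values.
tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
H_corrected = H / (1 - tie_term / (N ** 3 - N))

# Upper-tail chi-square probability; this closed form is valid for 3 df ONLY.
p_value = (math.erfc(math.sqrt(H_corrected / 2))
           + math.sqrt(2 * H_corrected / math.pi) * math.exp(-H_corrected / 2))

print(rank_sums)
print(f"H = {H_corrected:.3f}, approximate p-value = {p_value:.4f}")
```

A dedicated statistics package is the safer choice in practice; the point here is only that the test works with rank sums, just as the Wilcoxon tests do.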












One More Fact: The Kruskal-Wallis Test is a non-parametric alternative to One-Way ANOVA.











The Fine Print: Conditions for Inference

The non-parametric procedures we've discussed so far may be used in the presence of outliers, skew, or anything else that might prevent us from using parametric techniques. However, there are still conditions for using them:

1) Each of our samples is an independent SRS (as always).

2) Your data come from a continuous distribution*.

Bonus: If we somehow know that the shapes of the distributions for multi-sample data are all the same, we can be more specific and say that the null and alternative hypotheses apply to the medians.













*Small caveat: As long as the quantity being measured is continuous, the recorded values don't necessarily have to be decimals. For example, someone's opinion of a film can be any decimal number between 0 and 5, but is usually rounded to the nearest integer. We can still treat such data as continuous.

On the other hand, the number of cars a person owns is a discrete measurement, and the procedures discussed above are not advisable for such data.