Non-Parametric Statistics

Question: What do we do if our data don't satisfy the conditions of the statistical test we want to use? (In that case, we can't trust its conclusions.)

Answer: We use non-parametric procedures!

Fact: Non-parametric procedures do not assume that your data come from a normal distribution.

Pros: Non-parametric procedures work well even with small samples, outliers, or strongly non-normal data. So even if you have a small data set that is skewed and has outliers, you can still trust the results (provided the conditions in The Fine Print below hold)!

Cons: When the conditions for a parametric method are actually met, non-parametric procedures aren't as powerful. That is, when effects are present, they are harder to detect using non-parametric methods.

Example: Does the presence of small numbers of weeds reduce the yield of corn? A researcher planted corn at the same rate in 8 small plots of ground, then weeded the corn rows by hand to allow no weeds in 4 randomly selected plots and exactly 3 weeds per meter of row in the other 4 plots. Here are the yields of corn (bushels per acre) in each of the plots:

0 weeds per meter:	166.7	172.2	165.0	176.9
3 weeds per meter:	158.6	176.4	153.1	156.0

Big Question: Is it safe to use the $t$ procedures? Let's take a look.

Procedure: Arrange ALL of the data, from both groups together, from smallest to largest.

153.1 156.0 158.6 165.0 166.7 172.2 176.4 176.9

Do we notice anything?

Observe: the 0 Weeds/Meter values tend to sit toward the top of the ordering, so their ranks are generally higher than the ranks of the 3 Weeds/Meter group.

Yield:	153.1	156.0	158.6	165.0	166.7	172.2	176.4	176.9
Rank:	1	2	3	4	5	6	7	8
Weeds/meter:	3	3	3	0	0	0	3	0

The Big Deal: we will be using the ranks of the values rather than the values themselves to detect if there is any difference between treatments.

Sum of Ranks for 0 Weeds/Meter group:	4+5+6+8=23
Sum of Ranks for 3 Weeds/Meter group:	1+2+3+7=13
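If you'd like to see the ranking step in code, here is a minimal Python sketch (Python is our choice; the notes don't prescribe a language). It pools both groups, gives the smallest value rank 1, and adds up the ranks within each group. It assumes there are no tied values, which is true for these eight yields.

```python
# Corn yields (bushels per acre) from the example above
zero_weeds = [166.7, 172.2, 165.0, 176.9]
three_weeds = [158.6, 176.4, 153.1, 156.0]

# Pool both groups and rank from smallest (rank 1) to largest.
# Assumes no ties; tied values would need average ranks instead.
pooled = sorted(zero_weeds + three_weeds)
rank = {value: i + 1 for i, value in enumerate(pooled)}

w_zero = sum(rank[v] for v in zero_weeds)
w_three = sum(rank[v] for v in three_weeds)
print(w_zero, w_three)  # 23 13
```

As a sanity check, the two rank sums always add up to $N(N+1)/2$, here $8 \cdot 9/2 = 36$.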

Fact: the sum of the ranks for one of the two groups (we choose which) is our test statistic, which we will call $W$.

In this case we choose the 0 Weeds/Meter group, so $W=23$.

The Wilcoxon Rank Sum Test: Draw an SRS of size $n_1$ from one population and draw an independent SRS of size $n_2$ from a second population. There are $N=n_1+n_2$ observations in all. The sum $W$ of the ranks for the first sample is the Wilcoxon rank sum statistic. If the two populations have the same continuous distribution, then $W$ has mean and standard deviation $$\mu_W=\frac{n_1(N+1)}{2},\,\,\,\,\,\,\,\,\,\sigma_W=\sqrt{\frac{n_1 n_2(N+1)}{12}}.$$ If $W$ is further from the mean than we would expect by chance alone, we reject $H_0.$
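As a quick check on the formulas above, here is a small Python sketch that plugs in the corn example ($n_1=n_2=4$, $N=8$) and standardizes the observed $W=23$. The function name `rank_sum_null_moments` is just an illustrative choice. With only four plots per group, the resulting $z$-score is a rough approximation at best.

```python
import math

def rank_sum_null_moments(n1, n2):
    """Mean and standard deviation of W under H0 (same continuous distribution)."""
    N = n1 + n2
    mu = n1 * (N + 1) / 2
    sigma = math.sqrt(n1 * n2 * (N + 1) / 12)
    return mu, sigma

mu, sigma = rank_sum_null_moments(4, 4)
W = 23  # rank sum for the 0 weeds/meter group
z = (W - mu) / sigma  # how many SDs W sits above its null mean
print(mu, round(sigma, 4), round(z, 3))  # 18.0 3.4641 1.443
```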

Fact: The Wilcoxon Rank Sum Test is a non-parametric alternative to the 2-sample $t$ test.

Annoying Question: How do we calculate the $p$-value from the test statistic $W$?

Nice Answer: Use software! :D

Let's run a Wilcoxon Rank Sum Test on our data.
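What the software does for small samples can be sketched by brute force: if $H_0$ is true (and there are no ties), every way of assigning 4 of the ranks 1 through 8 to the 0 Weeds/Meter group is equally likely, so we can list all $\binom{8}{4}=70$ of them and count how many give a rank sum at least as large as 23. This minimal Python sketch assumes a one-sided alternative (weeds reduce yield, so the 0 Weeds/Meter group has the larger ranks).

```python
from fractions import Fraction
from itertools import combinations

n1, n2 = 4, 4
N = n1 + n2
W_observed = 23  # rank sum for the 0 weeds/meter group

# Under H0 every set of n1 ranks out of 1..N is equally likely,
# so enumerating them gives the exact null distribution of W.
subsets = list(combinations(range(1, N + 1), n1))
at_least = sum(1 for s in subsets if sum(s) >= W_observed)

# One-sided p-value: P(W >= 23) when H0 is true
p_one_sided = Fraction(at_least, len(subsets))
print(at_least, len(subsets), p_one_sided, float(p_one_sided))  # 7 70 1/10 0.1
```

Keep in mind that some packages report a different but equivalent statistic. SciPy's `scipy.stats.mannwhitneyu`, for example, reports the Mann-Whitney $U = W - n_1(n_1+1)/2$, so the same data give $U=13$ rather than $W=23$.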

Recall the Homework Question: Researchers claim that women speak significantly more words per day than men. One estimate is that a woman uses about 20000 words per day while a man uses about 7000.

Some data for men are:

28408	10084	15931	21688	37786
10575	12880	11071	17799	13182
8918	6495	8153	7015	4429
10054	3998	12639	10974	5255

Is it reasonable to use the $t$ procedures in this case? Let's re-examine the data.

Step 1: Subtract the median under the null hypothesis, in this case 7000 words, from each value:

21408	3084	8931	14688	30786
3575	5880	4071	10799	6182
1918	-505	1153	15	-2571
3054	-3002	5639	3974	-1745

Step 2: Arrange the absolute values of the differences in ascending order and assign ranks:

Rank	|Difference|	Sign
1	15	+
2	505	-
3	1153	+
4	1745	-
5	1918	+
6	2571	-
7	3002	-
8	3054	+
9	3084	+
10	3575	+
11	3974	+
12	4071	+
13	5639	+
14	5880	+
15	6182	+
16	8931	+
17	10799	+
18	14688	+
19	21408	+
20	30786	+

The sum of the ranks of the positive differences is:

1+3+5+8+9+10+11+12+13+14+15+16+17+18+19+20=191.

The sum of the ranks of the negative differences is:

2+4+6+7=19.

These are the Wilcoxon Signed Rank Sums. The positive sum is denoted $W^{+}=191$, and the negative sum is denoted $W^{-}=19$.

Traditionally we take $W^{+}$ as our test statistic.
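Steps 1 and 2 and the two rank sums can be reproduced with a short Python sketch. It assumes, as is true for these data, that no difference is exactly zero and that no two absolute differences are tied; data with zeros or ties need extra handling, and software conventions for that vary.

```python
men = [28408, 10084, 15931, 21688, 37786,
       10575, 12880, 11071, 17799, 13182,
       8918, 6495, 8153, 7015, 4429,
       10054, 3998, 12639, 10974, 5255]
M0 = 7000  # median under H0

# Step 1: differences from the hypothesized median
diffs = [x - M0 for x in men]

# Step 2: order by absolute value; position in this list is the rank.
# Assumes no zero differences and no tied absolute differences.
ordered = sorted(diffs, key=abs)

w_plus = sum(r for r, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(r for r, d in enumerate(ordered, start=1) if d < 0)
print(w_plus, w_minus)  # 191 19
```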

The Wilcoxon Signed Rank Test: Draw an SRS of size $n$ from a population and compute the difference between each observation and the median $M_0$ specified by $H_0$. Rank the absolute values of these differences. The sum $W^{+}$ of the ranks of the positive differences is the Wilcoxon Signed Rank Statistic. Under the null hypothesis, $W^{+}$ has mean and standard deviation: $$\mu_{W^{+}}=\frac{n(n+1)}{4},\,\,\,\,\,\,\sigma_{W^{+}}=\sqrt{\frac{n(n+1)(2n+1)}{24}}.$$ If $W^{+}$ is further from the mean than we would expect by chance alone, we reject $H_0.$
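Plugging in our $n=20$ men gives the null mean and standard deviation of $W^{+}$. Here is a minimal Python sketch that also standardizes the observed $W^{+}=191$; the $z$-score is only a normal approximation, but it shows how far out in the tail 191 sits.

```python
import math

n = 20        # number of men in the sample
w_plus = 191  # observed signed rank statistic

mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_plus - mu) / sigma  # standardized distance from the null mean
print(mu, round(sigma, 3), round(z, 2))
```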

Running the Test: again, we will use software to compute the $p$-value of the test statistic $W^{+}$.

Back to Our Question: Researchers claim that men speak significantly fewer words per day than women. One estimate is that a man uses about 7000 words per day. The data for the men:

28408	10084	15931	21688	37786
10575	12880	11071	17799	13182
8918	6495	8153	7015	4429
10054	3998	12639	10974	5255

Is there good evidence to suggest the true median is higher than 7000 words?

Let's run a Wilcoxon Signed Rank Test!
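For a sample this size, software can compute the exact $p$-value. Under $H_0$, each rank is equally likely to belong to a positive or a negative difference, independently of the others, so all $2^{20}$ sign patterns are equally likely. The Python sketch below counts, for every possible value $w$, how many patterns give $W^{+}=w$, then adds up the tail at and above 191. It assumes the one-sided alternative that the median is greater than 7000 words; the helper name `signed_rank_null_counts` is our own.

```python
from fractions import Fraction

def signed_rank_null_counts(n):
    """counts[w] = number of sign patterns of ranks 1..n whose positive ranks sum to w."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # with no ranks yet, W+ = 0 in exactly one way
    for r in range(1, n + 1):
        # Rank r is either positive (adds r to W+) or negative (adds nothing).
        # Loop downward so each rank is used at most once.
        for w in range(max_sum, r - 1, -1):
            counts[w] += counts[w - r]
    return counts

n, w_plus = 20, 191
counts = signed_rank_null_counts(n)
tail = sum(counts[w_plus:])            # sign patterns with W+ >= 191
p_one_sided = Fraction(tail, 2 ** n)   # H_a: median > 7000 words
print(tail, float(p_one_sided))
```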

Another Fact: The Wilcoxon Signed Rank Test is a non-parametric alternative to the one-sample $t$ test.

One More Example: Recall the homework question:

Durable press treatment reduces wrinkling, but how much? "Wrinkle recovery angle" measures how well a fabric recovers from wrinkles. Higher is better. Here are data on wrinkle recovery angle, five measurements for each of four treatments:

Untreated:	79 80 78 80 78
Permafresh 55:	136 135 132 137 134
Permafresh 48:	125 131 125 145 145
Hylite HF:	143 141 146 141 145

Who remembers why we can't trust the ANOVA $F$ test in this situation?

One More Question: Is there a non-parametric test that compares more than two samples?

One More Answer: Yes! It's called the Kruskal-Wallis Test.

Run a Kruskal-Wallis test on the wrinkle recovery data:

Untreated:	79 80 78 80 78
Permafresh 55:	136 135 132 137 134
Permafresh 48:	125 131 125 145 145
Hylite HF:	143 141 146 141 145
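The notes leave the computation to software, but a sketch helps show what it does. The Kruskal-Wallis statistic is usually written $H=\frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i}-3(N+1)$, where $R_i$ is the rank sum and $n_i$ the size of sample $i$, and it is compared with a chi-square distribution with $k-1$ degrees of freedom for $k$ samples. The minimal Python sketch below applies this to the wrinkle data. These data contain ties (for example, 145 appears three times), so tied values get the average of their ranks and $H$ is divided by the usual tie correction factor; SciPy's `scipy.stats.kruskal` also applies this correction. The helper `chi2_sf_3df` is our own, using the closed-form upper tail of a chi-square distribution with 3 degrees of freedom. With only five observations per group, the chi-square $p$-value is an approximation.

```python
import math

groups = {
    "Untreated": [79, 80, 78, 80, 78],
    "Permafresh 55": [136, 135, 132, 137, 134],
    "Permafresh 48": [125, 131, 125, 145, 145],
    "Hylite HF": [143, 141, 146, 141, 145],
}

# Rank all N observations together; tied values share the average of their ranks.
pooled = sorted(v for values in groups.values() for v in values)
N = len(pooled)
avg_rank = {}
for v in set(pooled):
    first = pooled.index(v) + 1  # rank of the first occurrence (1-based)
    count = pooled.count(v)
    avg_rank[v] = first + (count - 1) / 2

rank_sums = {name: sum(avg_rank[v] for v in values) for name, values in groups.items()}

# H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
H = 12 / (N * (N + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values
ties = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
H_corrected = H / (1 - ties / (N ** 3 - N))

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p = chi2_sf_3df(H_corrected)  # k - 1 = 3 degrees of freedom for 4 samples
print(rank_sums)
print(round(H, 3), round(H_corrected, 3), round(p, 4))
```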

One More Fact: The Kruskal-Wallis Test is a non-parametric alternative to One-Way ANOVA.

The Fine Print: Conditions for Inference

The non-parametric procedures we've discussed so far may be used in the presence of outliers, skewness, or other departures from normality that would prevent us from using parametric techniques. However, they still come with conditions:

1) Each of our samples is an independent SRS (as always).

2) Your data come from a continuous distribution (see the caveat below).

Bonus: If we know that the distributions in a multi-sample setting all have the same shape, we can be more specific and state the null and alternative hypotheses in terms of the medians.

Small caveat: What matters is that the underlying quantity is continuous; the recorded values don't have to include decimals. For example, someone's opinion of a film could in principle be any number between 0 and 5, but it is usually rounded to the nearest integer. We can still treat such data as continuous.

On the other hand, the number of cars a person owns is a discrete measurement, and such data are not well suited to the procedures discussed above.
