Holt.Blue

Producing Data: Sampling

Worksheet
Question 1: what is statistics?

Question 2: why do we do statistics?











Very Important Vocab:

The population is the entire group of individuals (people, cars, animals, ball bearings, etc.) that we want information about.

A small part of the population we choose to gain information about the whole is called a sample.

Gaining information about the entire population from a sample is called inference.











Question: Why do we take samples instead of looking at the whole population?











Sampling Design: the methods we use to collect our sample should be easy to describe in writing so that they can be repeated. These methods are called our sampling design.

In order to create a sampling design we must
  1. define exactly what our population is
  2. say exactly what we want to measure












Example: A department store mails a customer satisfaction survey to people who make credit card purchases at the store. This month, 45,000 people made credit card purchases. Surveys are mailed to 1000 of these people and 137 people return the survey form.

(a) What is the population of interest for this survey?

(b) What is the sample? From what group is information actually obtained?













The BIG Question: How do we choose a sample which is representative of our population?









Answer: We don't choose. We let chance decide.











Simple Random Sampling: Give every individual in our population a numerical label and then choose randomly from these numbers.

Since simple random sampling is such a common method of sampling, we often call a "simple random sample" an "SRS."
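
A minimal sketch of this procedure in Python, assuming a hypothetical population of 45 labeled individuals and a sample of size 5 (both numbers are made up for illustration). The helper name simple_random_sample is also hypothetical. The standard-library call random.sample chooses without replacement, so no label can be picked twice.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Label individuals 1..population_size and draw an SRS of labels."""
    # The seed is optional; fixing it makes the draw repeatable.
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    return rng.sample(labels, sample_size)

# Assumed numbers: 45 individuals, sample of 5
srs = simple_random_sample(population_size=45, sample_size=5, seed=1)
print(sorted(srs))
```

Recording the seed along with the population size and sample size is one way to keep the sampling design easy to describe and repeat.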













Another Big Question: why random sampling?











Answer: to eliminate bias.











Very Important Vocab: A study is biased if it favors, or tends toward, a certain outcome which may or may not truly reflect the population.











How To Sample Badly.

Bad Sample Design 1: Start interviewing people at a shopping mall.

Who is over- or underrepresented in such a sample?

This brand of sampling is called convenience sampling.
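
To see how a convenience sample can mislead, here is a small simulation sketch. Everything in it is invented for illustration: a hypothetical town of 10,000 people, 30% of whom are frequent mall shoppers, with frequent shoppers assumed to be much more likely to favor building a new mall. The mall interviewer only reaches frequent shoppers, while an SRS can reach anyone.

```python
import random

rng = random.Random(0)

# Hypothetical population: each person is (is_frequent_shopper, favors_new_mall)
population = []
for _ in range(10_000):
    shopper = rng.random() < 0.30
    # Assumed opinion rates: 80% of frequent shoppers favor, 40% of everyone else
    favors = rng.random() < (0.80 if shopper else 0.40)
    population.append((shopper, favors))

def percent_in_favor(group):
    """Percentage of the group that favors the new mall."""
    return 100 * sum(favors for _, favors in group) / len(group)

# Convenience sample: 200 people found at the mall (frequent shoppers only)
at_the_mall = [person for person in population if person[0]]
convenience = rng.sample(at_the_mall, 200)

# SRS: every individual in the population has the same chance of selection
srs = rng.sample(population, 200)

print(f"true percent in favor:   {percent_in_favor(population):.1f}")
print(f"convenience sample says: {percent_in_favor(convenience):.1f}")
print(f"SRS says:                {percent_in_favor(srs):.1f}")
```

Under these assumptions the convenience sample should land near the shoppers' rate of about 80%, well above the population's rate of roughly 52%, while the SRS should land close to the truth.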









Bad Sample Design 2: Create an online poll and let people respond.

Example: Former CNN evening commentator Lou Dobbs doesn't like illegal immigration. One of his broadcasts in 2007 was largely devoted to attacking a proposal by the governor of New York State to offer drivers’ licenses to illegal immigrants as a public safety measure. During the show, Mr. Dobbs invited his viewers to go to loudobbs.com to vote on the question “Would you be more or less likely to vote for a presidential candidate who supports giving drivers’ licenses to illegal aliens?” We aren't surprised that 97% of the 7350 people who voted by the end of the broadcast said, “Less likely.”

This kind of sample is called a voluntary response sample.











Choosing An SRS: Recall, to take a random sample of a population, we assign a label to each individual, and choose randomly.

Method 1: use a table of random digits. Your text has one: Table B.

Example: Starting from line 125 of Table B, draw a random sample of students in our class.

Assign a label to each student:


Line 125 in Table B reads:

96746 12149 37823 71868 18442 35119 62103 39244

Therefore, our random sample consists of ____, ____, ____, ____, and ____.
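
The table-reading procedure can be sketched in Python. The class size is not given in these notes, so the sketch assumes a hypothetical class of 30 students labeled 01 through 30 and a sample of 5 students; change both numbers to match the actual class. It reads line 125 two digits at a time, skipping pairs that are not valid labels and labels that were already chosen.

```python
LINE_125 = "96746 12149 37823 71868 18442 35119 62103 39244"

def sample_from_digits(line, population_size, sample_size):
    """Read two-digit labels left to right from a line of random digits."""
    digits = line.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        # Keep a label only if it belongs to a student and is not a repeat
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Assumed: 30 students labeled 01-30, sample of 5
print(sample_from_digits(LINE_125, population_size=30, sample_size=5))
# → [21, 18, 23, 19, 10]
```

The pairs read from the line are 96, 74, 61, 21, 49, 37, 82, 37, 18, 68, 18, 44, 23, 51, 19, 62, 10, and so on; only 21, 18, 23, 19, and 10 fall in the assumed range 01 to 30, and the second 18 is skipped as a repeat. Two-digit labels only work for populations of at most 99 individuals.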









Method 2: use a random number generator or a random number service such as random.org.













Another Sampling Design: Stratified Random Sampling.

When a population is spread out over large areas, or there are many groups and subcategories, we often divide the population into these regions or groups first. We then
  1. assign each individual to exactly one region or group
  2. take an SRS from each region or group
  3. combine these SRSs into a single sample.
The regions, groups, or subcategories are generically referred to as strata.







Example: Let's take the tables in our room as our strata and use random.org to take a random sample.
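
A minimal sketch of this example in Python, using the standard random module in place of random.org. The table names and student names are hypothetical placeholders, and taking one student per table is an assumption. Here each table is treated as a stratum: an SRS is drawn within every table and the results are combined into a single sample.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw an SRS of size per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical seating chart: each table is one stratum
tables = {
    "Table 1": ["Ana", "Ben", "Cy", "Dee"],
    "Table 2": ["Eli", "Fay", "Gus"],
    "Table 3": ["Hal", "Ivy", "Jo", "Kai"],
}

print(stratified_sample(tables, per_stratum=1, seed=2))
```

Because every table contributes to the sample, no table can be left out by chance, which is not guaranteed when a single SRS is drawn from the whole room.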



Other examples:
  1. Sampling small plots in a large forest: break up forest into larger strata.
  2. Break a large region up into smaller regions, where states and/or counties serve as strata.














Problems In Practice: Human subjects are the hardest to get good information about.

Example: Suppose we want to know the percentage of people who will vote for Hillary or Trump.

How are we going to ask people the question: "Who do you plan to vote for this November?"

What problems could we run into when trying to get good information?











Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate. Any honest pollster will tell you their rate of nonresponse. It is a red flag if they don't.
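
For the department store survey earlier in these notes, the nonresponse rate follows directly from the numbers given there: 1000 surveys mailed and 137 returned. A short sketch of the arithmetic:

```python
mailed = 1000    # surveys mailed (from the department store example)
returned = 137   # surveys returned

response_rate = returned / mailed
nonresponse_rate = 1 - response_rate

print(f"response rate: {response_rate:.1%}, nonresponse rate: {nonresponse_rate:.1%}")
# → response rate: 13.7%, nonresponse rate: 86.3%
```

With more than 86% of the sample not responding, the 137 returned forms may look quite different from the 1000 people who were mailed a survey.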

The behavior of the respondent or of the interviewer can cause response bias in sample results.

The wording of questions is the most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias, and changes in wording can greatly change a survey’s outcome.











Example: Wording of Questions

Two differently worded questions, which essentially ask the same thing, can give you vastly different results. For example, in a survey about illegal immigration in the U.S., we have the following results.

“Should illegal immigrants be prosecuted and deported for being in the U.S. illegally, or shouldn’t they?” In this opinion poll, 69% favored deportation.

When the very same sample was asked whether illegal immigrants who have worked in the United States for two years “should be given a chance to keep their jobs and eventually apply for legal status,” 62% said that they should.











A common method for interviewing people in a large population is random digit dialing.

Question: What effect do technologies such as cell phones and caller ID have on undercoverage and nonresponse?

Can web-based technologies help remedy some of these issues?











Very, Very, Very Important Vocab: Inference

Question: Why do we do sampling?