About our Live Votes and surveys
How 1,000 people can be more representative than 200,000
One week in the middle of the Clinton-Lewinsky scandal, more than 200,000 people took part in an MSNBC Live Vote that asked whether President Clinton should leave office. Seventy-three percent said yes. That same week, an NBC News-Wall Street Journal poll found that only 34 percent of about 2,000 people surveyed thought so. To explain the vast gap between those numbers in this and similar cases, it is necessary to look at the differences between the two kinds of surveys.
POLLS
Journalists use polls to gauge what the public is thinking. The most statistically accurate picture is captured by using a randomly selected sample of individuals within the group that is being targeted, typically adult Americans.
Actually, more correctly, these polls use a randomly selected sample of individuals who are willing to be polled. While this seems like a subtle point, the whole "science" of polling rests on the claim that there is no statistical difference in opinion between people who answer polls and people who don't. But there is a problem: that claim cannot be tested, because by definition you cannot collect data from the people who refuse. Polling is therefore built on an unverifiable assumption: those who answer polls are statistically no different from those who don't. And since the fraction of people who answer polls, versus those (like me) who just hang up the phone without so much as a "no thanks," is small, this assumption seems to have no "scientific" basis. The other assumption behind polling is that people answer honestly.
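To see why that assumption carries so much weight, here is a toy simulation (a sketch with made-up response rates, not a model of any real poll): if the people willing to answer differ systematically from the people who hang up, a bigger sample just pins down the wrong number more precisely.

    import random

    random.seed(42)

    # Toy population: 1,000,000 people, true support exactly 50%.
    # Made-up response rates: supporters agree to answer 30% of the
    # time, opponents only 20% of the time.
    POP = 1_000_000
    population = [True] * (POP // 2) + [False] * (POP // 2)
    answer_rate = {True: 0.30, False: 0.20}

    def poll(n):
        """Keep dialing random people until n of them agree to answer."""
        answers = []
        while len(answers) < n:
            person = random.choice(population)
            if random.random() < answer_rate[person]:
                answers.append(person)
        return sum(answers) / n

    for n in (100, 1000, 10000):
        print(f"n={n:>5}: measured support = {poll(n):.1%}  (true: 50.0%)")

Every run converges on 0.30 / (0.30 + 0.20) = 60 percent, not 50. The sample size only controls how tightly the poll clusters around the biased answer.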
While a poll of 100 people will be more accurate than a poll of 10, studies have shown that gains in accuracy begin to level off at about 500 people and are only minor beyond 1,000.
OK, while this sounds somewhat profound, all they are saying is that the "margin of error" (itself a statistical quantity) shrinks to the point of diminishing returns. It is still *possible* (just unlikely) that all 1,000 people will "get it wrong". And since most people found elementary statistics in high school or college challenging, most readers simply don't understand the basic mathematics behind what they are talking about.
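For the curious, the mathematics being waved at is the standard textbook margin-of-error formula: for a yes/no question, the worst case at 95 percent confidence is roughly 1.96 x sqrt(0.25 / n). A few lines make the diminishing returns obvious:

    from math import sqrt

    # Worst-case (50/50 split) margin of error at the 95 percent
    # confidence level for a simple random sample of size n.
    def margin_of_error(n, z=1.96):
        return z * sqrt(0.25 / n)

    for n in (10, 100, 500, 1000, 2000, 200_000):
        print(f"n={n:>7}: +/- {margin_of_error(n):.1%}")

Going from 100 respondents to 1,000 shaves the margin from about 9.8 points to 3.1; going from 1,000 to 200,000 only buys another 2.9. And n = 2,000 lands at about 2.2 points, which is precisely the figure quoted in the next paragraph. None of this helps, of course, if the sample is biased in the first place.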
So, in the case of that NBC-WSJ poll, only 2,005 adults were surveyed by the polling organizations of Peter D. Hart and Robert M. Teeter. The poll was conducted by telephone and had a margin of error of plus or minus 2.2 percentage points at the 95 percent confidence level. The confidence level means that if the same poll were conducted 100 times, each one randomly selecting the people polled, only five of the polls would be expected to yield results outside the margin of error.
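To make the "conducted 100 times" interpretation concrete, here is a small simulation (assuming an idealized random sample, and treating the poll's 34 percent as the true value):

    import random

    random.seed(1)

    TRUE_SUPPORT = 0.34   # treat the NBC-WSJ result as ground truth
    N, MOE = 2005, 0.022  # the poll's sample size and stated margin

    def one_poll():
        yes = sum(random.random() < TRUE_SUPPORT for _ in range(N))
        return yes / N

    inside = sum(abs(one_poll() - TRUE_SUPPORT) <= MOE for _ in range(100))
    print(f"{inside} of 100 simulated polls fell within +/-2.2 points")

Typical runs print 94 to 98: roughly 5 in 100 polls land outside the margin, which is all that "95 percent confidence" promises.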
Random selection of those polled is necessary to ensure a broad representation of the population at large. For example, a nationwide poll asking which NBA team is the best would likely yield a far different answer in Philadelphia than in Los Angeles. (And neither one would be a good sample of the population at large.)
Sure, if you load the question, you can clearly influence the result. "Which NBA team is best?" will give different answers in the City of Brotherly Love vs. the City of Angels because the question is ambiguous: "best" is not defined. Best at what? If instead the question were "Which NBA team do you feel is most likely to win the championship?" you might find more alignment between the two cities. "Be careful what you ask for" is all the more true for polls.
In the NBC-WSJ survey, pollsters first randomly selected a number of geographic areas, then generated telephone numbers in a way that gave all numbers in those areas (both listed and unlisted) an equal chance of being called. Only one adult in each household was then selected to answer the poll.
OK, except everything they said was wrong. The numbers were generated so that every telephone number had an equal chance of being dialed, but telephone numbers are not distributed evenly across households: some homes have several lines and some have none. The number of adults per household is not even, either. Also, they are not *really* sure they were talking to an adult. And it is not at all clear how they selected only one adult per household in an even, unbiased way.
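The "one adult per household" step, at least, is easy to quantify. A toy simulation (invented household mix; real pollsters try to correct for this with weighting) shows who actually ends up in the sample:

    import random
    from collections import Counter

    random.seed(7)

    # Invented universe: 400 one-adult, 400 two-adult and 200
    # three-adult households (1,800 adults in total).
    households = [1] * 400 + [2] * 400 + [3] * 200

    picked = Counter()
    for _ in range(100_000):
        size = random.choice(households)  # reach a random household...
        picked[size] += 1                 # ...and interview ONE adult in it

    adults = {1: 400, 2: 800, 3: 600}     # adults living in each bucket
    for size in (1, 2, 3):
        pop_share = adults[size] / sum(adults.values())
        sample_share = picked[size] / sum(picked.values())
        print(f"{size}-adult homes: {pop_share:.0%} of adults, "
              f"{sample_share:.0%} of the sample")

An adult living alone is interviewed whenever their number comes up; an adult in a three-adult home has only a one-in-three chance. So people from small households are over-counted (here, 22 percent of adults supply 40 percent of the sample) unless the pollster weights for it, and that is before you add the uneven distribution of phone lines per household.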
While variation can occur depending on what questions are asked and how they are asked, similar questions tend to yield similar answers. One way to account for variation, however, is to ask the same question over a period of time.
Another way is to engage in a dialogue with somebody about the true meaning of their opinion. This takes time and requires subtleties not easily conveyed in a yes/no format. Asking whether dubya should be impeached and getting the answer "it depends" is probably the most accurate result, but not a very informative one once it is forced into that binary yes/no, black/white, good/evil polarization.
ONLINE SURVEYS
In contrast, MSNBC's online surveys (Live Votes) may reflect the views of far more individuals, but they are not necessarily representative of the general population.
To begin with, the people who respond choose to do so — they are not randomly selected and asked to participate, but instead make the choice to read a story about a certain topic and then vote on a related question. There is thus no guarantee that the votes would reflect anything close to a statistical sample, even of MSNBC.com users: The participants in a Sports Live Vote and a Politics Live Vote may overlap, but each group is likely to be dominated by people with an interest in each particular area. In addition, while MSNBC.com’s Live Votes are designed to allow only one vote per user, someone who wants to vote more than once could simply use another computer or another Internet account.
According to Nielsen//NetRatings, nearly 75 percent of Americans, or 204.3 million people, had access to the Internet from home in early 2004. By contrast, more than 90 percent of Americans live in homes with a telephone.
This does not mean that Internet polling cannot be scientific. Harris Interactive, for example, has set up a system with checks and balances that allow it to use the Internet to obtain survey results comparable with more traditional methods.
But MSNBC’s Live Votes are not intended to be a scientific sample of national opinion. Instead, they are part of the same interactive dialogue that takes place in our online chat sessions: a way to share your views on the news with MSNBC writers and editors and with your fellow users. Let us know what you think.