Electoral predictions have traditionally been the territory of pundits and politicians, but as polling results become more widely accessible on the web, a new group of analysts has emerged. These statisticians gather polling data from across the country and run it through mathematical models that predict the victor on November 4. And while the accuracy of these models is still debatable, one analyst seems to be doing something right: Nate Silver of fivethirtyeight.com constructed an election model that out-predicted the pros in the Democratic primary last May, and now he’s looking to repeat the feat on Tuesday.

The success of Silver’s model is due in part to his ability to use the information collected from national polls and heavily polled states in order to inform and strengthen the data on less frequently polled states. He also accounts for regional and national trends that individual polls might overlook.

Silver writes of one particular example, “The late February SurveyUSA polls had Barack Obama four points ahead of John McCain in North Dakota, but behind by four points in South Dakota. Since North Dakota and South Dakota are very similar, it is unlikely that there is a true eight-point differential in the polling in these states. The regression estimate is able to sniff out such discrepancies.”
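The idea behind such a regression estimate can be sketched with a toy example. This is not Silver’s actual model; it simply shrinks each state’s raw poll margin toward the mean of a group of similar states, and the 0.5 shrinkage weight is an assumption chosen for illustration.

```python
# Toy "regression estimate": pull each state's raw margin (Obama minus
# McCain, in points) toward the mean of demographically similar states.
# The shrinkage factor of 0.5 is purely illustrative.

def regression_adjust(raw_margins, shrinkage=0.5):
    """Blend each state's raw poll margin with the group mean."""
    group_mean = sum(raw_margins.values()) / len(raw_margins)
    return {state: (1 - shrinkage) * margin + shrinkage * group_mean
            for state, margin in raw_margins.items()}

# The SurveyUSA example from the text: Obama +4 in ND, -4 in SD.
adjusted = regression_adjust({"ND": 4.0, "SD": -4.0})
print(adjusted)  # {'ND': 2.0, 'SD': -2.0}
```

After adjustment the implausible eight-point gap between the two Dakotas is cut in half: each state’s estimate is informed by its neighbor’s data rather than standing alone.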

Illustration by Joe Kloc.

As Silver points out, “Polls are an imperfect measure of voter sentiment.” Polling organizations employ different techniques to try to minimize this imperfection, and some do it better than others. He looks at the historical performance of each poll in order to determine how reliable its polling methodology is. Then he weights each poll accordingly. His weights also factor in the sample size and age of each poll.
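The three ingredients named above — historical reliability, sample size, and age — can be combined into a single weight. The sketch below is an assumption-laden stand-in for Silver’s proprietary formula: the 600-respondent baseline, the 30-day half-life, and the multiplicative combination are all hypothetical choices, not his published method.

```python
import math

def poll_weight(reliability, sample_size, age_days, half_life=30.0):
    """Weight a poll by the three factors the text mentions.

    reliability: 0-1 score from the pollster's historical accuracy.
    sample_size: larger samples cut sampling error roughly as 1/sqrt(n),
                 so the weight grows with sqrt(n); 600 is an arbitrary baseline.
    age_days:    older polls decay exponentially; the 30-day half-life
                 is an assumption for illustration.
    """
    recency = 0.5 ** (age_days / half_life)
    return reliability * math.sqrt(sample_size / 600.0) * recency

# A fresh, large poll from a reliable firm outweighs a stale, small one
# from a weaker firm.
w_strong = poll_weight(reliability=0.9, sample_size=1200, age_days=2)
w_weak = poll_weight(reliability=0.6, sample_size=400, age_days=45)
```

The weighted polling average is then just the sum of each poll’s margin times its weight, divided by the sum of the weights.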

Of course, there is no foolproof mathematical method for determining how these weights should be assigned, and doing so successfully is as much an art as a science. Modeling real-world systems, particularly those influenced by human behavior, presents statisticians with a further problem: an overwhelming number of factors influence the behavior of the system. Attempting to keep track of and account for all of them would be something akin to trying to herd a large group of cats across Manhattan at rush hour.

So inevitably statisticians must determine which factors to consider and which to disregard. This ultimately amounts to making guesses. And while these guesses are educated, they are still essentially guesses.

Once these initial assumptions are made, we can trust the theory to provide an accurate model of the system as described by those assumptions. But the result is only an approximation of the actual system, since many factors, some quite influential, are left out. The objectivity of statistics begins only after the assumptions are made. Silver’s success is thus a credit not only to his statistical prowess but also to his keen intuition about the social habits and relationships of the population he is analyzing.

We tend to overlook this more creative element of statistical modeling, perhaps because we associate math with a particular degree of certainty. Usually that is a sound instinct: by the time an idea wiggles free from the mathematical community and permeates the public sphere, we can be relatively certain the concept has been reinforced by a substantial amount of rigorous proof.

But in this way statistical models can be deceptive. We often conflate the objectivity of statistical theory with the uncertainty of statistical results. The British politician Benjamin Disraeli once commented that as a consequence of this confusion, “there are three kinds of lies: lies, damned lies, and statistics.”

So in less than a week voters will head to the polls, testing Silver’s assumptions and determining if the predictions of his election model are valid — or just more of those damnedest lies.

Originally published October 30, 2008