Polling is Hard: How Methodological Foundations Drive or Derail Results
In the first entry of our ongoing series, we examine the first steps of how a poll is made, evaluating best practices and pitfalls - why and how the sample shapes a poll's final numbers.
I don't think anybody would accuse them of it, but polling firms don't exactly have an easy job; in fact, it's quite a tricky business. Even before the scrutiny of the public eye (and the expectations that come with it), setting up and conducting a survey is full of possible pitfalls that can lead even the most honest pollster astray. I've often seen, heard and - if I'm honest - experienced the complete shock of voters pointing at surveys showing one result while the other side won. So instead of taking these polls at face value, but without diving into the broader societal and political forces at play, let's explore the challenges a survey has to overcome just to produce the surface-level numbers - starting with the fundamentals.
Why Polling is Hard
While the last few years saw an increase in online polling of public sentiment – whether it's about elections, policies or cultural shifts and trends – the polling industry itself is slow to change its practical approaches and persists with the more traditional methods we might picture. This presents a growing challenge as generational culture shifts around anonymous calls, shaped by decades of experience with robocalls, spam and so forth. With younger generations preferring other methods of communication, it is increasingly difficult to reach a statistically valid sample, as depending on who answers, it's easy to slip into various forms of sampling bias. Here we will focus on how polling firms reach what is (from their point of view) a scientifically satisfactory sample, and how the groundwork is laid for the part the public actually finds interesting: the numbers and line charts themselves.
Dialing In
The first thing a responsible and adept pollster must settle is who they are going to ask. As asking the entire population (relevant to the topic of the poll) is practically impossible, the main task is getting a workable slice that contains all the elements of the full picture - that is to say, a representative sample - which reflects the characteristics and can be used to predict the behavior of the larger population. For the purpose of thematic consistency, let's take an election poll as an example. In this case, the full population we can't reasonably reach is the entire electorate: every person who legally has a vote. We still need data we can use to measure support for one party or the other, so for the sample to be representative we need to consider all the nuances (or as many as possible) that make up the actual electorate. This means making sure we have enough responses - preferably roughly following the societal distribution on the ground - from groups such as:
genders
different age groups and generations
regions and counties
religions and ethnicities
education levels and occupations (in broad categories)
This distribution is often based on census or other statistical data but it's ultimately up to the polling firm to use the sources they prefer.
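To make that concrete, here is a minimal sketch (in Python) of how a firm might compare one dimension of its raw sample - age, in this case - against census-style targets. Both the brackets and the shares are made-up placeholders for illustration, not real census figures or any particular pollster's workflow.

```python
from collections import Counter

# Hypothetical census targets: share of each age bracket in the electorate.
# These numbers are illustrative only.
census_targets = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.33, "65+": 0.22}

# Each respondent record carries the demographic tags collected during the interview.
respondents = [
    {"id": 1, "age_group": "45-64"},
    {"id": 2, "age_group": "18-29"},
    {"id": 3, "age_group": "65+"},
    # ... the rest of the sample
]

counts = Counter(r["age_group"] for r in respondents)
total = len(respondents)

# Report how far each bracket's share in the sample sits from the target.
for group, target in census_targets.items():
    observed = counts.get(group, 0) / total
    print(f"{group}: sample {observed:.1%} vs. target {target:.1%} (gap {observed - target:+.1%})")
```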
Some, like Quinnipiac and Gallup, use random digit dialing, which is exactly what it sounds like: calling random phone numbers. Others use mixed methods - Ipsos, for example, sets up participation groups (called panels) recruited via cold calls and mail invitations, while Rasmussen uses what are essentially interactive robocalls. Traditional methods are very much in play; in fact, 6 of our 10 core pollsters still use them extensively. However, we cannot pass over the fact that these surveys often require serious weighting to counteract the low(er) response rates. In practice this can mean that if younger generations are less inclined to pick up the phone, the responses of those who do have to be given more weight to represent their proportion of the electorate. This doesn't mean they have to be inaccurate - based on our own analysis, Gallup boasts a mere 2.2% error rate - but demographic corrections have to be applied precisely, otherwise the results can be heavily skewed.
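As a rough illustration of what that up-weighting looks like - a simplified sketch assuming we correct along a single age dimension with made-up shares, whereas real firms typically weight on several dimensions at once (for instance via raking) - the weight for each group is simply its target share divided by its observed share:

```python
# Illustrative shares only: the electorate's age distribution vs. who actually answered.
target_shares = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.33, "65+": 0.22}
sample_shares = {"18-29": 0.10, "30-44": 0.22, "45-64": 0.38, "65+": 0.30}

# Weight = how much each respondent in a group should count for.
weights = {group: target_shares[group] / sample_shares[group] for group in target_shares}

print(weights["18-29"])            # 2.0 -> the few young respondents count double
print(round(weights["65+"], 2))    # ~0.73 -> over-represented retirees are scaled down
```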
Another important element is whether a firm (technically) starts fresh with every survey or polls a group of participants repeatedly. The latter option, called a panel, can create huge datasets like the one behind YouGov's MRP methodology, but it can also require considerable weighting, as the respondents are not filtered based on representative criteria.
A common way of normalizing data at the sample level is introducing quotas, meaning that pollsters will try to reach a predetermined number or ratio of participants, for instance based on the age distribution of the electorate. A number of factors can influence the accuracy of a sample built with quotas - whether the participants were selected randomly, picked by the interviewer or drawn from a panel - as any of these can introduce its own set of human- or chance-based biases. Using quotas seems to yield positive results though: YouGov's error rate is only 1.5% in the US (though a high 3.5% in Europe), and Morning Consult is also at the top of our chart with only a 1.8% aggregated error rate!
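Here is a bare-bones sketch of how quota-filling might work during fieldwork (the quota numbers are illustrative, not any firm's actual targets): once a demographic cell hits its quota, further respondents from that cell are screened out.

```python
# Illustrative quotas: an overall target of 1,000 interviews split by age bracket.
quotas = {"18-29": 200, "30-44": 250, "45-64": 330, "65+": 220}
filled = {group: 0 for group in quotas}

def accept(age_group: str) -> bool:
    """Return True if this respondent still fits an open quota cell."""
    if filled[age_group] >= quotas[age_group]:
        return False  # cell already full -> screen the respondent out
    filled[age_group] += 1
    return True

# Example: after 200 under-30s are interviewed, further under-30 contacts are skipped,
# while interviewers keep dialing until the remaining cells fill up as well.
```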
More complex (and more expensive and laborious) methods can create samples that are more representative 'out of the box', but that cost and difficulty are also why they are rarer. A combination of effective sample creation and weighting is nevertheless practically always needed, so the method a firm uses doesn't inherently determine its value.
Almost needless to say, there are many different ways and combinations of sampling and it's entirely up to the pollster to decide how to approach it.
We also need the responses tagged ('cross-tabulated' or 'crosstabbed' - broken down by demographic groups, something polling enthusiasts often want to see as soon as possible) so we can look at combinations like college-educated middle-aged women from the capital, under-25 blue-collar men from rural towns, or retirees on one side of the country or the other, as there can be stark differences between the perspectives these different 'labels' represent. Polling firms usually target at least 1,000 responses, the academically accepted minimum and a balance between statistical significance and practical feasibility; some collect more, but that doesn't necessarily translate into better results (YouGov's MRP, based on limited data, has a 2.5% error rate, which is roughly average). As we established, because of the number of responses and the holes in the sample, we'll have to introduce corrections, since some groups may only contain a few individuals. The exact methodology of the weighting (sometimes even whether it's done at all) depends on the polling firm. The general idea, however, is to make sure responses represent the actual presence of each demographic group - at which point pollsters also have to be careful not to over-weight a group and disproportionately distort the results (more on this in the next post of the series). Classic examples of this step going awry are the recent oversampling and over-weighting of college-educated voters, leading to - based on the polls - some surprising results from and around 2016 (a real low point for the industry, as most people would aptly note). Sampling and weighting errors are of course not the only reasons for inaccuracies in, for example, the polling around Brexit, and we'll dive into concepts such as biases, contradictions, 'shy voters' and 'disappointed voters' (both mythical and actual) elsewhere.
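For a sense of why roughly 1,000 responses is the usual target, here is a back-of-the-envelope margin-of-error calculation. It assumes a simple random sample and a 50/50 split (the statistical worst case), conditions real polls only approximate, so published error margins tend to be somewhat larger:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% confidence margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.1%}")  # ~3.1%
print(f"{margin_of_error(2000):.1%}")  # ~2.2%
```

Doubling the sample to 2,000 only shaves the margin from about 3.1% to about 2.2%, which is part of why firms rarely chase much larger samples.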
Reaching Out
Once the sample creation method is settled, a pollster also has to decide how they reach out to people. This intersects with sample creation to a degree, as random digit dialing naturally leads to recording responses over the telephone. This method is very common - in Hungary, for example, most firms rely heavily on phone calls. The other most common method is web-based surveys, used either exclusively (YouGov or Morning Consult, for example) or in tandem with phone calls (like Kantar or Ipsos). There are still some who rely on face-to-face interviews (Eurobarometer, for example), but for national-level polls it's by far the rarest approach.
So there we have it - one step closer to a thorough understanding of how polling works, and consequently, where it can fail. However, even with all possible tinkering with sample setups, experimenting with contact methods and tweaking the weighting formulas, the data ultimately still depends on the people polled. Whether they respond at all, whether they are honest with themselves and the interviewer, how much they trust the pollsters, and whether they hold contradictory preferences all shape the data fundamentally. Join us next time, when we'll examine how and what kind of questions are asked and what can be done there to further refine the final numbers.


