Introducing Wick’s Weighted Poll Averages

The nerds have taken over the political space over the last decade-plus, as the tools that started to revolutionize sports in the previous decade have been brought to bear on politics with wild success. Every major election draws a mind-numbing amount of numerical and mathematical analysis, and poll averages, forecasts, and other numerical tools abound. For many, poring over polls has become as much of a pastime as following what the candidates themselves are doing, if not more.

Perhaps surprisingly, there is no single “poll average”; indeed, there seem to be as many different poll averages as there are outlets collating the polls. The two most prominent, widely cited poll averages are the ones from RealClearPolitics and FiveThirtyEight, and as the race for the Democratic presidential nomination has progressed I find that neither of them quite fits what I want. RealClearPolitics publishes a straight average of whatever polls they record and deem worthy, usually from the most prominent outlets, over whatever period they choose. The only quality control, if any, is in which polls are included; among the polls included, there is no attempt to control for sample size, methodology, or overall quality, and polls simply age out of the average once they get too old (however “too old” is defined) or the next poll from that pollster comes along.

FiveThirtyEight, on the other hand, weights its poll average based on those factors, but the details of their methodology aren’t public. It also includes their own model’s assumptions about how the race should develop: in the days immediately after a contest, the “average” tries to predict how much of a “bump” candidates will get based on their performance, and states with little recent polling will have their “average” extrapolated from larger national trends. Such extrapolations don’t always incorporate mitigating factors or common sense; for example, the current FiveThirtyEight “average” of South Carolina has Mike Bloomberg in fourth place at 9.5%, despite him not actually being on the ballot there. The copious polling conducted in South Carolina that doesn’t include Bloomberg is merely interpreted as failing to catch whatever bump Bloomberg might have received. The result is so complex, with so many mitigating factors, that it’s hard to accurately call it a “poll average” at all; it’s more an attempt to capture the state of the race based on local and national trends and past history, and FiveThirtyEight themselves readily admit that it’s not really intended to be much more than the backbone of their election forecasts. It’s useful in its own way, but not really the best way of capturing what the polls are actually saying right now, which is what RealClearPolitics and most other media outlets try to do. But is there a middle ground between a straight average of the topline numbers and FiveThirtyEight’s complex model?

It’s with that in mind that I present the Wick Weighted Poll Averages, available on Google Sheets here. The Wick Weighted Average uses a fairly simple, transparent methodology, incorporating readily available tools to weight each poll by quality and sample size. I start by taking FiveThirtyEight’s Predictive Plus-Minus for each pollster and subtracting it from 1. (Thus, FiveThirtyEight’s best pollster, Monmouth University, with its Predictive Plus-Minus of -1.5, gets a weight of 2.5, while pollsters with positive Predictive Plus-Minuses receive weights below 1.) A pollster with a Predictive Plus-Minus over +1 (which would therefore end up with a negative weight), or one not assessed by FiveThirtyEight at all, is not included in the averages; at some point I may come up with some means of incorporating new pollsters. For pollsters where FiveThirtyEight only has a provisional grade, I multiply this raw weight by the number of polls they have assessed divided by 25, or, if they have 25 or more, divided by one more than the number of polls they have assessed. The result is the pollster’s base weight, and the adjustment serves to de-emphasize pollsters without enough data to properly assess. (At some point I may attempt to scale this effect by how recent their assessed polls are, to better approximate how close they are to getting an actual grade from FiveThirtyEight.) For each individual poll, this base weight is multiplied by 1-(1/√S), where S is the sample size, to arrive at a final weight. Only the most recent poll from each pollster is listed. Each candidate’s result in the poll is multiplied by the poll’s final weight, and the sum of all the weighted percentages is divided by the sum of all weights. (Note that the way this works in the spreadsheet, if a candidate is not mentioned by a poll, it’s treated like a 0. I’d like such polls to be ignored for such candidates, but weeding a poll out of the denominator only for specific candidates doesn’t feel worth the work and adjustment to the spreadsheet it would entail; in the meantime, though, I do try to exclude polls from pollsters with provisional grades that don’t include all candidates included by every poll with a full grade.)
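
The weighting steps above can be sketched in Python roughly as follows. This is a minimal sketch rather than the spreadsheet’s actual formulas: the function names, the dictionary layout for a poll, and the candidates “A” and “B” are all made up for illustration, and the provisional-grade scaling assumes the factor is n/25 below 25 assessed polls and n/(n+1) at 25 or more.

```python
import math


def pollster_weight(plus_minus, provisional=False, polls_assessed=0):
    """Base weight for a pollster, or None if the pollster is excluded."""
    # Unrated pollsters and those with a Plus-Minus above +1 are left out.
    if plus_minus is None or plus_minus > 1:
        return None
    weight = 1 - plus_minus
    if provisional:
        n = polls_assessed
        # Assumed reading of the provisional scaling: n/25, or n/(n+1) from 25 up.
        weight *= n / 25 if n < 25 else n / (n + 1)
    return weight


def poll_weight(base_weight, sample_size):
    """Final weight for one poll: base weight times 1 - 1/sqrt(S)."""
    return base_weight * (1 - 1 / math.sqrt(sample_size))


def weighted_average(polls):
    """Weighted average per candidate; a candidate missing from a poll counts as 0."""
    total_weight = sum(p["weight"] for p in polls)
    candidates = {name for p in polls for name in p["results"]}
    return {
        name: sum(p["weight"] * p["results"].get(name, 0.0) for p in polls) / total_weight
        for name in candidates
    }


# Made-up polls for illustration only (not real results).
polls = [
    {"weight": poll_weight(pollster_weight(-1.5), 400), "results": {"A": 25.0, "B": 20.0}},
    {"weight": poll_weight(pollster_weight(0.0), 100), "results": {"A": 20.0}},
]
averages = weighted_average(polls)
```

Because the second made-up poll omits candidate “B”, its weight still counts in the denominator for “B”, which mirrors the treat-as-0 behavior described above.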

I only publish averages where 1) there are at least three polls, 2) the weights of all the polls add up to at least 1 (so a bunch of crummy or untested polls can’t produce an average unless there are enough of them for the wisdom of crowds to kick in), and 3) the weights of all the polls except for the one with the largest weight add up to more than the largest weight (so one poll can’t have an outsized effect on the average). In addition, I only list a poll if it’s no more than a month to five weeks older than the next-oldest poll I already have listed. I generally average polls over the shortest period for which the above criteria are met, with emphasis on polls conducted fully or primarily after major events, such as a prominent contest on the road to the presidential nomination, a debate, or a major news story, or, if no such event has occurred, over simple, discrete time periods such as a week or a month. For certain situations, I maintain the Date-Weighted Average (DWA), where each weight is divided by the number of days since the poll’s end date (I may eventually add some way of incorporating the entire time-span of a poll into the date adjustment) and no cut-off applies (though particularly old polls may not be assessed). (Note that because the current date is used to determine the age of a poll, the DWA may change even when no new polls have been published, as the impact of recent polls fades.) This is used when polling is too sparse for the straight average to be useful, or conversely, where a recent event has had a clear effect on the polls but there aren’t enough polls of sufficient quality to move the cut line to that point; in the latter case, the DWA is only published if the total date-adjusted weight of polls published since that event is greater than the total weight of polls not included in the straight average. The above criteria for publication also apply to the date-adjusted weights. I have another idea I’m working on, for which the infrastructure is already present in the spreadsheet, but I still have some details to work out, and it would be applicable only in very specific situations involving only the most prominent, heavily polled races.
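
The three publication criteria and the date adjustment described above can be sketched in Python along these lines. Again, this is only an illustrative sketch, not the spreadsheet’s formulas: the function names are hypothetical, and the post does not say how a poll that ended today is handled, so the sketch assumes it counts as one day old to avoid dividing by zero.

```python
from datetime import date


def publishable(weights):
    """Check the three publication criteria against a list of poll weights."""
    # 1) At least three polls.
    if len(weights) < 3:
        return False
    # 2) The weights must add up to at least 1.
    if sum(weights) < 1:
        return False
    # 3) The largest poll must be outweighed by all the others combined.
    ordered = sorted(weights, reverse=True)
    return sum(ordered[1:]) > ordered[0]


def date_weight(weight, end_date, today):
    """Divide a poll's weight by the number of days since its end date."""
    # Assumption: a poll that ended today counts as one day old.
    days = max((today - end_date).days, 1)
    return weight / days


# Made-up weights and dates for illustration only.
today = date(2020, 2, 24)
adjusted = [
    date_weight(2.0, date(2020, 2, 20), today),
    date_weight(1.5, date(2020, 2, 22), today),
    date_weight(0.9, date(2020, 2, 23), today),
]
can_publish_dwa = publishable(adjusted)
```

Since the same `publishable` check is run on the date-adjusted weights, an average can drop below the publication threshold simply because its polls have aged.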

For those interested in assessing the accuracy or usefulness of this new average, here’s how it would have assessed Iowa and New Hampshire, both of which I hope to add to the spreadsheet at some point. (Nevada, a notoriously difficult state to poll, did not produce enough polls of sufficient quality to post an average before the caucuses.) The Iowa average includes all polls conducted after the last debate before Iowa, and the New Hampshire average includes polls conducted after Iowa:

Candidate         Iowa     NH
Sanders           23.3%    27.5%
Buttigieg         16.4%    21.2%
Warren            14.5%    12.7%
Biden             21.2%    11.9%
Klobuchar          9.4%    11.1%
Yang               3.4%     3.2%
Steyer             3.2%     1.8%
Gabbard            1.0%     2.3%
Bloomberg (WI)     0.4%     0.1%
Bennet             0.5%     0.6%
Patrick            0.0%     0.7%

Click here to see the latest averages on Google Sheets, where as this post goes up you can see averages of the national Democratic presidential primary polls as well as averages for South Carolina and four of the five biggest delegate states on Super Tuesday: California, Texas, North Carolina, and Massachusetts, plus a tab for poll results from Florida and (in the future) any other state where I have three polls I can list based on the above criteria even if I can’t post an average. Over the next few days I hope to add poll lists and averages for upcoming congressional, Senate, and gubernatorial primaries, and possibly also for the generic Congressional ballot; as time goes on general election races will be added to the spreadsheet. If people like the concept enough I may add a link to the spreadsheet to the top bar, or possibly to the sidebar.
