There was an uproar among political observers in Pennsylvania over a poll analysis posted on PoliticsPA. I’d like to take readers through it point by point so they have a better understanding of what the data does and does not show.
(1) The irrelevance of this single poll as a bellwether for the Pennsylvania governor’s race cannot be emphasized enough.
From the CBS News/New York Times Battleground Tracker’s own documentation of methodology:
YouGov has created a panel of over 100,000 registered voters who will be interviewed four times between July and November for the nation’s newspaper of record and premier television network. The panelists come from each of the 50 states and 435 Congressional districts. They are interviewed on the internet and have been weighted to be representative of registered voters in each Congressional district using data from the U.S. Bureau of the Census and other sources. The weighting variables are age, race, gender, education, 2012 vote for President and Congress, and party identification.
As with any survey, the estimates are subject to sampling error with larger samples usually giving more accurate estimates than smaller ones. Panelists were selected disproportionately from the most competitive states and Congressional districts to provide more accurate estimates for these races. If we had divided the sample of 100,000 evenly among districts, we would have had only 250 persons in each district. Instead, we oversampled competitive districts (ending up with about 800 persons in each of the 60 most competitive districts), leaving smaller samples (around 150 people) in the less competitive ones. To say anything about the overall outcome of the election—which party will control the House and the Senate in the 114th Congress—requires estimates for every House and Senate race, even in places where our sample is too small to make the most reliable predictions.
How can we make accurate predictions when the sample size is so small in many Congressional districts? We have quite a bit of information about the people that are not in our sample and the races in those districts. From the Census, we know their demographics. From the 2012 election returns, we know the proportion who voted for each candidate in 2012 (or didn’t vote). From the 2012 Exit poll, we know the relationship between voter demographics and 2012 vote. And from our 2014 panel, we have data on how these variables relate to 2014 voting intentions.
We have combined these data into a statistical model that predicts 2014 vote on the basis of demographics and past vote. The model uses common patterns in the data to make estimates for people not interviewed. For example, if most of the 18-24 year old white female respondents in the sample who voted for Romney in 2012 tell us that they intend to vote for the Republican Congressional candidate in 2014, the model then predicts similar behavior for 18-24 year old white female voters in a district where our sample doesn’t include any voters of this type. Where we have a few voters in a particular group, we average the model predictions with the sample, with the model estimates discounted as the sample size in that group increases. These techniques have been developed by statisticians and are commonly used for small area estimates by the Census.
The output of the statistical model is a range of estimates for each state and Congressional district, reflecting the uncertainty around the predictions from the model. We have made 100,000 random draws from these estimates, producing thousands of different combinations of election outcomes—some with Republicans winning both houses, others with Democrats maintaining control of the Senate, and countless variations of outcomes for particular races. In these simulations, we can calculate the proportion of times that, say, the Republicans gain a 51 or more seats in the Senate. This is the probability of Republican control of the Senate implied by the model.
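To make the two techniques in that passage more concrete, here is a toy sketch in Python. It is not YouGov’s model; every name and number in it is a hypothetical assumption. The first function shows the general idea of averaging a group’s sample result with a model prediction, giving the model less weight as the group’s sample size grows (a simple partial-pooling rule, with a made-up pooling constant `k`). The second shows how repeated random draws turn per-race estimates into a probability of Senate control, by counting the share of simulations in which Republicans end up with 51 or more seats. The shared “national swing” term is my own simplifying assumption, included so the races don’t move completely independently.

```python
import random
from typing import List, Optional, Tuple


def pooled_estimate(sample_share: Optional[float], n: int,
                    model_share: float, k: float = 20.0) -> float:
    """Blend a group's observed share with the model's prediction.

    The weight on the sample, n / (n + k), grows with the group's sample
    size n, so the model estimate is discounted as n increases.
    k is a hypothetical pooling constant, not a value from YouGov.
    """
    if n == 0 or sample_share is None:
        return model_share  # nobody of this type in the sample: use the model
    w = n / (n + k)
    return w * sample_share + (1.0 - w) * model_share


def prob_republican_control(races: List[Tuple[float, float]],
                            r_seats_not_up: int,
                            n_sims: int = 100_000,
                            swing_sd: float = 0.02,
                            seed: int = 0) -> float:
    """Share of simulations in which Republicans hold 51+ Senate seats.

    races: (expected Republican two-party vote share, standard deviation)
    for each seat on the ballot. r_seats_not_up: Republican seats not up
    for election. Each simulation draws one hypothetical national swing
    shared by all races, plus independent race-level noise.
    """
    rng = random.Random(seed)
    control = 0
    for _ in range(n_sims):
        swing = rng.gauss(0.0, swing_sd)
        r_seats = r_seats_not_up
        for mean, sd in races:
            # A race goes Republican if the drawn share exceeds 50%.
            if rng.gauss(mean + swing, sd) > 0.5:
                r_seats += 1
        if r_seats >= 51:
            control += 1
    return control / n_sims
```

For example, with no respondents in a group, `pooled_estimate(None, 0, 0.62)` simply returns the model’s 0.62, while `pooled_estimate(0.50, 20, 0.70)` weights sample and model equally and lands at about 0.6. With enough simulations, the fraction returned by `prob_republican_control` is the kind of “probability of Republican control” the documentation describes, and it is only as good as the per-race estimates fed into it.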
So this was an online survey, not a traditional poll, which means we can all agree it carries at least a somewhat higher risk of inaccuracy than a traditional phone-based polling methodology.
Most would agree that, outside the most competitive districts, the samples behind this online survey are pretty small. Other, more respected polls have used much larger samples of respondents.
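Here’s a quick back-of-the-envelope check of the sample sizes in the quoted methodology, sketched in Python. It assumes, as the passage frames it, that the whole 100,000-person panel is divided among the 435 House districts, with about 800 panelists in each of the 60 most competitive districts; all of the documentation’s figures are approximate.

```python
PANEL_SIZE = 100_000    # panelists, per the CBS/NYT documentation
DISTRICTS = 435         # U.S. House districts
COMPETITIVE = 60        # districts the documentation says were oversampled
PER_COMPETITIVE = 800   # approximate panelists in each competitive district


def even_split() -> float:
    """Panelists per district if the panel were spread evenly."""
    return PANEL_SIZE / DISTRICTS


def per_other_district() -> float:
    """Panelists left for each non-competitive district after oversampling."""
    remaining = PANEL_SIZE - COMPETITIVE * PER_COMPETITIVE
    return remaining / (DISTRICTS - COMPETITIVE)


print(f"even split: {even_split():.0f} per district")
print(f"after oversampling: {per_other_district():.0f} per remaining district")
```

An even split works out to roughly 230 people per district (a bit fewer than the 250 the documentation cites), and once the 60 competitive districts take their share, the remaining 375 districts average under 140 respondents each, in line with the documentation’s “around 150.”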
This survey also oversampled the competitive congressional districts within each state. In the case of Pennsylvania, that causes some issues, as the most competitive congressional districts are in more urban areas. One might be tempted to conclude that the race is actually even closer than the results indicate, but we really can’t know that. This brings me to my next point.
(2) 12% of Republican voters are definitely voting for Tom Wolf.
Among “Independent” voters, Wolf and Corbett are basically in a dead heat once you add the leaners to the candidate they lean toward, but a fairly significant number of Republicans want to trade Toms.
Among respondents who label themselves “conservative,” a term with a more polarized (right-wing) connotation, 13% said they’d be voting for Wolf.
(3) Tom Wolf is beating Tom Corbett among women, Hispanic voters, and African American voters, but he’s also beating him among seniors.
Though I don’t put much stock in this poll, if the trends it suggests are at all accurate, I’d guess that rising property taxes tied to the $1 billion in education cuts have hurt Corbett with seniors, a voting bloc his campaign is probably counting on to win. It also probably didn’t help that Governor Corbett was such a loud, proud advocate of voter ID, a law that could have affected seniors’ constitutional right to vote.
I strongly urge everyone not to take this particular poll seriously, but if you would like to, please, for my sanity, read the whole darn thing before publishing your inferences about the election.
There are 98 days until Election Day, so I guess I have a few more points to make before I wrap up:
(1) Polls get analyzed inaccurately for three reasons: (a) click-bait, plain and simple; (b) an effort to influence voter behavior by driving turnout of a particular demographic group up or down; or (c) looming deadlines that make it harder to catch all the details.
(2) If we want to talk about how the Wolf campaign might be in trouble, we should discuss the massive rift between the PA Democratic Party and the Wolf campaign. Almost every staff member at the PA Dems has resigned to go work for Wolf’s “Fresh Start PAC,” and by all appearances, this new PAC has done very little in terms of coordination, fundraising, or ID-ing prospective Democratic voters.
(3) Any Corbett staffer who truly believes, deep down, that their candidate isn’t in trouble right now is pretty silly. There are actual polls that consistently show his vulnerability across every demographic and on almost every policy decision he has made while in office.
Original post found HERE.