Philip Stark with his boxes of ballots. Photo: Cyrus Farivar
Today is Election Day in the United States, so we are resurfacing this story on auditing election results that originally ran in 2012.
NAPA, CALIFORNIA—Armed with a set of 10-sided dice (we’ll get to those in a moment), a Web-based tool, and a stack of hundreds of ballots, University of California, Berkeley statistics professor Philip Stark spent last Friday unleashing both science and technology upon a recent California election. He wanted to answer a very simple question—had the vote counting produced the proper result?—and he had developed a stats-based system to find out.
On June 2, 6,573 citizens went to the polls in Napa County and cast primary ballots for supervisor of the 2nd District in one of California’s most famous wine-producing regions, on the northern edge of the San Francisco Bay Area. The three candidates—Juliana Inman, Mark van Gorder, and Mark Luce—would all have liked to come in first, but they really didn’t want to be third. That’s because only the two top vote-getters in the primary would proceed to the runoff election in November; number three was out.
Napa County officials announced the official results a few days later: Luce, the incumbent, took in 2,806 votes, van Gorder got 1,911 votes, and Inman received 1,856 votes—a difference between second and third place of just 55 votes. Given the close result, even a small number of counting errors could have swung the election.
Vote counting can go wrong in any number of ways, and even the auditing processes designed to ensure the integrity of close races can be a mess (did someone say “hanging, dimpled, or pregnant chads”?). Measuring human intent at the ballot box can be tricky. To take just one example, in California, many ballots are cast by completing an arrow, which is then optically read. While voters are instructed to fully complete the thickness of the arrow, in practice some only draw a thin line. The vote tabulation systems used by counties do not always count those as votes.
So Napa County invited Philip Stark to look more closely at their results. Stark has been on a four-year mission to encourage more elections officials to use statistical tools to ensure that the announced victor is indeed correct. He first described his method back in 2008, in a paper called “Conservative statistical post-election audits,” but he generally uses a catchier name for the process: “risk-limiting auditing.”
Napa County had no reason to believe that the results in this particular election were wrong, explained John Tuteur, the County Assessor, when I showed up to watch. But, anticipating that the election would be close, Tuteur had asked that Napa County be the latest participant in a state-sponsored pilot project to audit various elections across the Golden State.
While American public policy, particularly since the 2000 Bush v. Gore debacle, has focused on voting technology, not as much attention has been paid to vote audits. If things continue to move forward, Stark could have an outsized effect on how election audits are conducted in California, and perhaps the country, for years to come.
“What this new auditing method does is count enough to have high confidence that [a full recount] wouldn’t change the answer,” Stark explained to me. “You can think of this as an intelligent recount. It stops as soon as it becomes clear that it’s pointless to continue. It gives stronger evidence that the outcome is right.”
The process has been endorsed in recent years by numerous academics and voting officials, and by the American Statistical Association, the League of Women Voters, the Brennan Center for Justice, and many others.
And it begins with those 10-sided dice.
A ballot from the audit; note the use of a thin connecting line.
To kick off the process, all 6,573 votes tallied in the 2nd District supervisor contest were re-scanned by county elections officials in the City of Napa. They sent the scans to a separate computer science team at Berkeley, led by Professor David Wagner. Along with a group of graduate students, Wagner has developed software meant to read voter intent from ballots. His system, for instance, will flag ballots where the arrow was not filled in according to the instructions, and it takes a different approach to filtering out stray marks. The Wagner team created a spreadsheet listing each ballot (they also created a numbering system to identify and locate individual ballots) and how its voter cast his or her vote.
One problem that cropped up early on was the discrepancy between the number of ballots cast and the number of ballots scanned. While 6,573 total votes were recorded in this particular contest, the Wagner team scanned a total of 6,809 ballots, while Napa County recorded 7,116 votes cast in the election as a whole. (Not every voter in the election chose to vote in this particular contest.) In short, over 300 ballots were missing. While that seems problematic, the margins stayed more or less the same.
“If both systems say ‘Abraham Lincoln won’ then if the unofficial system is right, so is the official system, even if their total votes differ and even if they interpreted every vote differently,” wrote Stark in an e-mail on Tuesday. “That’s the transitive idea. A transitive audit is really only checking who won, not checking whether the official voting system counted any particular ballot correctly. That said, we do compare the precinct totals for the two systems to make sure they (approximately) agree, which they did here.”
To deal with the missing ballots when confirming the winner, he added, he treated them all as if they were votes for the runner-up; even with those 300-plus additional votes, Luce was still the victor.
“To confirm the runner-up, we could not do that; instead, I treated them two different ways, neither completely rigorous,” he added. “In other audits, I’ve been able to deal with any mismatches between the ballot counts completely rigorously, so that the chance of a full hand count if the reported result was wrong remained over 90 percent.”
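Stark’s worst-case treatment of the missing ballots reduces to simple arithmetic, which can be sketched from the totals reported in the story (the 307-ballot figure is derived here from the 7,116 recorded versus 6,809 scanned):

```python
# Worst-case check: assign every missing ballot to the runner-up and
# see whether the reported winner still comes out ahead.
luce, van_gorder = 2806, 1911      # official contest totals
missing = 7116 - 6809              # ballots recorded but not scanned: 307

worst_case_runner_up = van_gorder + missing
assert worst_case_runner_up < luce  # Luce wins even in the worst case
print(luce, worst_case_runner_up)   # 2806 vs 2218
```

Because even this most pessimistic allocation cannot change first place, the missing ballots do not threaten the winner's confirmation.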
With that out of the way, the first step in the actual audit was to randomly select a seed number that would be used to feed a pseudo-random number generator found on a website that Stark created. For this, Stark had some high-level help in the form of Ron Rivest, one of America’s foremost experts on cryptography and voting systems, a professor of computer science at MIT who had also helped create the RSA crypto algorithm. Using 20 store-bought 10-sided dice, Rivest and Stark rolled out a 20-digit number. (73567556725160627585, for those keeping score at home.)
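The dice give a publicly verifiable seed; once that seed feeds a deterministic pseudo-random number generator, anyone can reproduce exactly which ballot numbers get pulled. A minimal sketch of the idea follows—note that Stark’s website uses its own generator, so Python’s built-in Mersenne Twister here is only a stand-in, and the sample size of 559 is the one reported later in the story:

```python
import random

# The 20-digit seed rolled with the 10-sided dice
seed = 73567556725160627585

# Deterministic PRNG seeded with the dice roll (a stand-in for the
# generator on Stark's audit website)
rng = random.Random(seed)

total_ballots = 6809   # ballots scanned by the Wagner team
sample_size = 559      # sample size from Stark's calculation

# Draw the ballot numbers to pull for hand inspection, without
# replacement, and sort them so workers can walk the boxes in order
sample = sorted(rng.sample(range(1, total_ballots + 1), sample_size))
print(len(sample), sample[0], sample[-1])
```

Because the draw is deterministic given the seed, observers who saw the dice rolled can rerun the selection themselves and confirm that officials audited the right ballots.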
Risk-limiting auditing relies on a published statistical formula, based on an accepted risk limit, and on the margin of victory to determine how many randomly selected ballots should be manually checked.
“The risk limit is not the chance that the outcome (after auditing) is wrong,” Stark wrote in a paper published in March 2012. “A risk-limiting audit amends the outcome if and only if it leads to a full hand tally that disagrees with the original outcome. Hence, a risk-limiting audit cannot harm correct outcomes. But if the original outcome is wrong, there is a chance the audit will not correct it. The risk limit is the largest such chance. If the risk limit is 10 percent and the outcome is wrong, there is at most a 10 percent chance (and typically much less) that the audit will not correct the outcome—at least a 90 percent chance (and typically much more) that the audit will correct the outcome.”
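To get a feel for how margin and risk limit translate into a sample size, here is a back-of-the-envelope calculation. It uses a Kaplan-Markov-style bound for a ballot-level comparison audit that finds no discrepancies—an illustration of the general shape of the math, not Stark’s exact formula, which accounts for discrepancies and other details:

```python
import math

def comparison_sample_size(margin_votes, total_votes, risk_limit):
    """Approximate initial sample size for a ballot-level comparison
    audit that finds no discrepancies: stop once
    (1 - mu/2)**n <= risk_limit, where mu is the diluted margin."""
    mu = margin_votes / total_votes
    return math.ceil(math.log(risk_limit) / math.log(1 - mu / 2))

# Napa County's 2nd District: 55-vote margin out of 6,573 votes,
# at a 10 percent risk limit
n = comparison_sample_size(55, 6573, 0.10)
print(n)  # 550 under these simplified assumptions
```

The simplified bound lands in the same ballpark as the 559 ballots Stark’s own tools called for, which is what you would expect: the tighter the margin, the smaller the diluted margin `mu`, and the more ballots the audit must inspect before it can stop.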
Ron Rivest, an MIT cryptographer, helped Stark use 10-sided dice to produce a random seed.
To decide how many ballots should be sampled in the Napa County audit, Stark used his own online tools and calculated that it should be 559. With that number in hand, Napa County’s John Tuteur supervised a team of temporary ballot counters in another room. They sorted through stacks of ballots in numbered boxes, affixing a sticky note to the individual ballots in question, preserving the order in which all ballots were kept.
After locating the individual ballots, the team delivered the boxes containing them back to Stark, Rivest, and a few observers (including me). Each marked ballot was then pulled from its box and displayed to the room. Once everyone agreed that the ballot showed a vote for a particular candidate, an undervote (no vote at all), or an overvote (a vote for multiple candidates, which cannot be counted), the result was tallied on Wagner’s spreadsheet. After each batch of ballots, those results were compared to what the Wagner image-scanning team had recorded.
“You want cast as intended, and counted as cast, and verified,” Stark said.
Temporary elections workers sifted through stacks of voted ballots to locate which ones needed to be audited.
Statistically significant audits
Over a dozen counties have now participated in a California-wide pilot project to provide a real-world test of what had previously been an academic theory. The pilot was authorized under California Assembly Bill 2023, which passed in 2010. Including audits conducted before the bill’s passage, 23 contests have been audited across several county-level elections in the state in recent years, and other counties, including Orange, Marin, and Yolo, will hold additional audits in the coming weeks.
California already has a mandatory audit law, which stipulates that a public manual tally of 1 percent of the precincts, chosen at random, must take place. But in Stark’s view, this is the wrong way to proceed.
“There is no statistical justification for the 1 percent tally,” Stark explained. “It is a check on the accuracy of the system, but it is not well tied to ensuring that outcomes are right. It doesn’t require more counting for small margins than for large ones, and it does not require a full hand count, even if something is obviously wrong.”
“In a contest I audited in Orange County,” Stark added, “the chance the 1 percent count might not find any errors at all even if the outcome had been wrong could have been as large as 88 percent.” Risk-limiting auditing, by contrast, takes into account the margin of victory. A wider margin of victory means there’s less risk that something went wrong, so the system requires fewer votes to audit—sometimes dramatically fewer.
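The claim that wider margins need dramatically fewer ballots can be made concrete with the same simplified no-discrepancy bound used above (an illustration only, not Stark’s exact formula):

```python
import math

def sample_size(diluted_margin, risk_limit=0.10):
    """Approximate ballots to inspect for a comparison audit that
    finds no discrepancies, at the given diluted margin."""
    return math.ceil(math.log(risk_limit) / math.log(1 - diluted_margin / 2))

# Sample size shrinks sharply as the margin of victory widens
for margin in (0.01, 0.05, 0.20):
    print(f"{margin:.0%} margin -> {sample_size(margin)} ballots")
```

A 20 percent landslide needs only a couple dozen ballots checked, while a 1 percent squeaker needs hundreds—exactly the margin-sensitivity that California’s flat 1 percent tally lacks.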
Some vote registrars appreciate the new system. “Academics like Professor Stark bring an unbiased, fact-based approach to solving problems, unlike some election reform activists that promote changes based on superstition and emotion,” said Marin County’s registrar of voters, Elaine Ginnold, in a 2010 UC Berkeley news release. “It is the more objective approach that will result in meaningful election reform such as the proposal in this election audit bill.”
Rivest, who has published academic papers with Stark on this issue, also lauded the process, which until last week he had not witnessed in person.
“Post-election auditing is a great way of making sure that the voting system is working as it should,” he said. “Given the difficulty of checking the election outcome by looking at the paper ballots, I’d like to see a lot more post-election auditing. The work here is based [on] having a foundation in paper ballots. Assuming you have a solid paper trail, you can confirm the election outcome with the process that we’re seeing today.”
And the impact of Stark’s work is spreading. Around the country, counties in Colorado and Ohio have used Stark’s methods to conduct similar audits, though he has not participated in them. Starting in 2014, all elections in Colorado will use risk-limiting auditing. As for California’s pilot project, its audits will continue through the November 2012 election.
Stark’s spreadsheet compared the scanned vote (right-hand name column) with the votes as human-read on each audited ballot (left).
The results are in
But risk-limiting auditing does have one real downside: time. A full recount can sometimes take days, of course, but even doing a risk-limiting audit on a relatively small Napa County contest of around 6,500 votes took four hours (including a lunch break) and collectively involved around 15 people, to say nothing of the prep work required to set up the process.
“At the moment, I think that until and unless we get [officials] to report [votes] at the ballot level, it is going to be a lot of trouble to do it this way,” Stark said. “For large jurisdictions, it’s just hard—it’s hard to do quickly enough.” He has ideas for speeding up the process, but they don’t align well with the current crop of voting machines, which don’t record their per-ballot vote interpretations.
The Napa audit encountered a few minor discrepancies, such as when a numbered ballot (for example, Ballot 32 from a stack of 50) was not properly marked because a worker miscounted. Those glitches, however, were all corrected by the Stark and Rivest team. In the end, all 559 audited ballots the team examined matched the votes as they were recorded by the Wagner scanning software.
As the day wound down, the original results stood—and Napa County could have confidence in its election.
“I am committed to having the right count,” Napa County’s Tuteur said on Friday. “My goal is to make sure that the people of Napa County, those who voted and those who didn’t, have full confidence in our system.”