Union of Concerned Scientists Inc.

09/09/2025 | News release | Distributed by Public on 09/09/2025 05:30

Missing Pieces: Why Filling Gaps in Voting Data is Key to Building Trust in Elections

Rose Nafa
Research Consultant

When scientists study anything at all, the first step is nearly always the same: sourcing reliable, standardized data. Without it, analysis is impossible, like knitting a sweater with no yarn. Election science is no exception. In theory, I should be able to request this data from election commissioners, receive it, easily analyze it, and understand how it was gathered. It is, after all, public and available upon request. The reality, however, is far different. For an upcoming report for the UCS Center for Science and Democracy, we requested election data from swing counties across the country for the last presidential election. And what did we find? The data is inconsistent, incomplete, and rarely structured in ways that allow for direct analysis.

Take the 2024 election data from Wayne County, Michigan, which includes the city of Detroit. It has one of the largest populations in a key swing state and some of the lowest turnout rates. Unfortunately, data labeling inconsistencies made it impossible to include rejection data from most precincts of Detroit when modeling ballot rejections across the country. The voter turnout data labeled each precinct in Detroit with a simple numbering system, ranging from Precinct 1 to Precinct 502. The ballot rejection data, however, used a five-digit code ranging from Precinct 01261 to Precinct 07415. The two datasets came from the same county election office and described the same election, but because the numbering systems don't match, there was no way to connect them.
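The problem can be seen in a few lines of code. This is a minimal sketch with invented sample rows, not the actual Wayne County files: when two files describing the same precincts use different identifier schemes, a straightforward join finds nothing to connect.

```python
# Hypothetical sample rows illustrating the Wayne County identifier mismatch.
turnout = {"1": 812, "2": 640, "3": 955}           # precincts numbered 1..502
rejections = {"01261": 4, "01262": 7, "07415": 2}  # five-digit precinct codes

# A naive join on the precinct identifier finds zero matches, so
# per-precinct rejection rates simply cannot be computed.
merged = {pid: (turnout[pid], rejections[pid])
          for pid in turnout if pid in rejections}

print(merged)  # -> {}
```

With no documented crosswalk between the two numbering systems, there is no reliable way to recover the link after the fact.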

The result? We cannot answer questions about whose votes were counted and whose were not in this critical swing state. The data exists, but not in a usable form. This isn't just inconvenient for us as researchers; it has real consequences for how well we can understand elections. Because we couldn't merge the datasets, we couldn't answer basic questions, such as which communities in Wayne County were most likely to see their ballots rejected. These insights matter, but instead of generating answers, we were left with mismatched spreadsheets.

Election results aren't just data for data's sake: they are official records of political participation. They record the democratic process by telling us who voted, which types of ballots were used, and which ballots were rejected. To evaluate the health of our elections, we need this information to be clear, complete, and consistently recorded. When data is unusable, the consequences ripple outward: media outlets can't verify claims in real time, policymakers lack timely information to guide reforms, and voters are left with a system that looks opaque and confusing. Poor data usability introduces errors, both human and technical. Every manual copy-and-paste step is a chance for mistakes. In a time when public trust in elections is already fragile, clarity and consistency in reporting are more than conveniences for researchers; they're safeguards against misinformation for all of us.

Ensuring access to usable election data is not just a matter of saving researchers time. It's about ensuring that democratic information, the official record of who voted and how those votes were counted, can be accessed, analyzed, and trusted.

Status quo: the problems we face

The way that election results are shared varies dramatically across counties and across states, creating a patchwork system that makes analysis difficult.

Availability

The first barrier to utilizing precinct-level election data is getting the data in the first place. Precinct-level data means election results broken down into the smallest voting areas, usually neighborhood-sized. Relying on county totals makes us miss local variation in the voting behavior of large, diverse groups of voters. Despite its importance, there is no coordination of this data at the precinct level. Instead of accessing the data in one place, as one might for other types of governmental data, researchers must identify which level of government manages the data, locate it, confirm that it is in a format computers can work with, and often obtain it through a formal request or even a paid fee.

It is a complicated process that plays out differently in each county and varies from election cycle to election cycle. Some jurisdictions respond quickly, others slowly, and some not at all. At each step, an issue could delay access to the data or prevent it entirely. Once access is granted, another barrier emerges: usability. Some of the historical data we requested isn't usable at all because it only comes as an image-based PDF (essentially a scanned page that most software can't read). Using this data would require manual transcription, which costs time and lends itself to errors.

Format

While most counties provide data in a format ready for analysis, formatting inconsistencies can cause major delays while the data is cleaned before analysis can begin. Shifting columns and spacing inconsistencies were common across the datasets used in our analysis, and each issue took hours to correct. Without standardization, the simple act of opening a file and confirming that it contains the necessary information in a usable format can be a major roadblock.

Quality

Even when the data can be loaded into statistical analysis software easily, combining various datasets (for example, the turnout, ballot rejection, census demographic, and mapping data used in our analysis) can be a herculean task. Each county has its own method of identifying precincts, and most change that method depending on which file you are looking at. One file may refer to the "City of Allen Park" while others list "Allen Park City". Directional words like "North" and "South" may be abbreviated while "West" and "East" are not, and vice versa in other datasets from the same county. Add in words like "Heights", "Village", and "Township", inconsistent spacing between the words in a name, and the inclusion of the word "precinct" versus just a number for identification, and we have the recipe for data disaster. To compare two datasets, precinct names must match perfectly.
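In practice, reconciling names like these means writing a normalization pass. The sketch below shows one plausible set of rules for the kinds of inconsistencies described above; the specific abbreviations and sample names are illustrative, not the exact rules used in our analysis.

```python
import re

# Illustrative abbreviation table; real county data would need a fuller list.
ABBREVIATIONS = {"n": "north", "s": "south", "e": "east", "w": "west",
                 "twp": "township", "hts": "heights"}

def normalize_precinct(name: str) -> str:
    """Reduce a precinct name to one canonical form so datasets can be joined."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9\s]", " ", name)          # strip punctuation
    words = [ABBREVIATIONS.get(w, w) for w in name.split()]
    words = [w for w in words if w not in {"of", "precinct"}]  # drop filler words
    if words and words[0] == "city":
        words = words[1:] + ["city"]  # "city of allen park" -> "allen park city"
    return " ".join(words)

print(normalize_precinct("City of Allen Park"))          # -> "allen park city"
print(normalize_precinct("Allen Park City"))             # -> "allen park city"
print(normalize_precinct("N. Dearborn Hts Precinct 3"))  # -> "north dearborn heights 3"
```

Even a pass like this has to be checked name by name, because two genuinely different precincts can collapse to the same string if the rules are too aggressive.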

There are also changes in precinct boundaries across different elections, and updating and releasing that data is subject to the discretion of state and local election administrators. Some counties have the resources to publicly release new boundaries as shapefiles (a data type used to store information about the geography of a place), but many are unable to do so due to capacity constraints, and instead release the data as PDF files (or even just lists of included addresses). If jurisdictions do have shapefiles online, they often update them every election without archiving previous iterations, essentially erasing the shapefiles from previous elections and making historical analysis difficult. Using this data requires an even more specialized skill set than that needed to address the naming inconsistencies above, meaning that individual researchers are less likely to be able to fix both issues. This contributes to a lack of precinct-level analysis of election data.

Local Resources

It is important to note that these are not issues put in place by election administrators to prevent research. Instead, election administrators are working with limited staffing, outdated systems, inconsistent funding, and frequently updated election laws which change what they are required to do each election and how. Urban counties may manage thousands of precincts, while rural ones may lack full-time staff trained in data management. This makes it difficult for election offices to collect, store, and share data.

This is the reality of the status quo: election data exists, but it is locked behind unusable formatting, inconsistent labeling, and fragmented accessibility. Until these barriers are addressed, understanding, evaluating, and improving elections will remain slow, error-prone, and incomplete.

What better election data could look like

A better system is possible.

At the most foundational level, election data needs standardization. Precinct identifiers should match across data files and across elections. Naming conventions within each jurisdiction should be consistent. Any changes in how the data was collected and what it represents should be documented clearly. With such standardization, preparing election data for analysis would no longer take weeks of manual cleaning and guesswork.

Second, as we suggest in our report on improving election data transparency, election data should be provided in usable formats by default. Text files should be structured into consistent rows and columns, spreadsheets should be easy to import into statistical software, and image-based PDFs should never be used to share data. Most importantly, data systems should minimize errors from manual transcription.

Third, access to election data should be centralized so that researchers, media, and the public can find it quickly and easily. Right now, access to election data depends on which county you want to study, what its historic funding priorities have been, and whether local administrators have time to process the request. Ideally, election data should be accessible through a state or national portal that is updated as soon as possible after each election. The strongest version of this would be a federal API (Application Programming Interface) for election data. APIs are interfaces between the software that stores large amounts of data and the software used to perform statistical analysis. They are common tools for other types of government data: we can easily look to APIs run by the Census, the EPA, or NOAA for inspiration and adapt those systems to fit the needs of election administrators and researchers.
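To make the API idea concrete, here is a hedged sketch of what a response from such a service might look like, loosely modeled on the row-oriented JSON the Census Bureau API returns. The endpoint, field names, and figures are entirely hypothetical; no federal election-data API exists today.

```python
import json

# Hypothetical response body from an imagined national election-data API,
# in the header-row-plus-data-rows style used by the Census Bureau API.
response_body = json.dumps([
    ["precinct_id", "ballots_cast", "ballots_rejected"],
    ["01261", "812", "4"],
    ["01262", "640", "7"],
])

# Turn the rows into records keyed by the header, ready for analysis.
header, *rows = json.loads(response_body)
records = [dict(zip(header, r)) for r in rows]
total_rejected = sum(int(r["ballots_rejected"]) for r in records)
print(total_rejected)  # -> 11
```

Because every consumer would parse the same documented structure, the per-county cleaning work described above would happen once, at the source, instead of being repeated by every researcher.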

How do we get there?

Taking steps to improve the election data landscape requires investment and coordination. Local election offices are already underfunded and understaffed. Without adequate resources, they can't be expected to improve their systems or build new data infrastructure. Any plan to improve election data must include direct financial and technical support for administrators. If election officials and the public are sincerely concerned about election integrity, they need to invest in election data transparency and provide administrators with the resources to collect and share data accurately and efficiently, so that election integrity can be tested and confirmed rather than guessed at.

These steps require oversight. For a coordinated, standardized system to work, it needs a national set of rules for formatting and data upload timelines. This does not mean stripping control from local election administrators, but it does mean building a common data system that is accessible and usable for everyone. Researchers, policy advocates, and election officials need to collaborate on what usable data looks like, what is feasible for administrators, and what is necessary for accountability.

By investing in accessible, standardized election data we can turn a patchwork system of confusing data into a transparent, reliable record of democracy. Those interested in election research could answer critical questions quickly, reducing errors and building trust in our democracy. This isn't just about efficiency. It's about accountability, equity, and confidence in our electoral process.

With standardization, resources, and oversight, we can ensure that the data needed to understand and improve elections is available to everyone, making our democracy stronger and more resilient.

Join us for a webinar on October 8 to learn more about our upcoming report, which leveraged geospatial data to better understand how, where, and why voting rights of racially diverse communities have been diminished in recent elections.
