For anyone wishing to replicate this work, the Pennsylvania Voter Registration data may be obtained by filling out this form and submitting it as instructed. Snapshots dating back to 2012 may be ordered. If you write the date(s) of interest on the form you will be sent the snapshot that is closest in date. Snapshots are archived weekly.
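The data snapshots used for this analysis have dates of April 4, 2016; August 15, 2016; November 7, 2016; November 28, 2016; February 27, 2017; and July 31, 2017.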
For background on voter registration database vulnerabilities please read my Medium article “Voter Data and Election Integrity”.
Let’s start with the 1993 National Voter Registration Act, also known as NVRA, or the “motor voter” law. NVRA tells the states what they must do to maintain accurate and current voter registration lists.
Under NVRA, all who register to vote must submit a signed voter registration application. Submitting a false application is prohibited by law. No one can register another person to vote.
Under NVRA, states are required to mail a response to anyone who applies to register to vote, telling them whether the application was successful or not. This could include a valid voter ID card, or a rejection notice if the voter is found to be ineligible.
NVRA also restricts how states remove voters, telling the states that they are not allowed to remove voters from the rolls due to a failure to vote (“use it or lose it” is prohibited). Even voters who move may not be removed from voter rolls before the state has mailed them a notification enabling them to easily update their registration information to a new address.
NVRA mandates recordkeeping, telling states that “all records and papers relating to any application, registration, or other act requisite to voting in any election for federal office, [must] be preserved for a period of twenty-two months from that federal election”. States are also required to document any changes to voter registration records.
The state of Pennsylvania has its own law about voter registration. Pennsylvania’s Title 25 statutes deal with all election-related legislation. The majority of its current framework became law Jan. 31, 2002.
Section 1222 provides for the creation of the Statewide Uniform Registry of Electors, also known as the SURE database, to store data on each unique voter in the state, including identifying information such as name, date of birth, and state ID (PennDOT) number. SURE was set up to be the definitive repository of voter registrations in the state – ensuring that people who die, change their names, or move between counties are reflected accurately in the registration.
Once a voter is registered, ANY CHANGE to his or her registration requires some action by the voter, except for changes to address or district caused by street renaming or redistricting. In these cases the voter must be notified.
A voter can request a change of political party, or report an address or name change fairly easily, either in person, by mail, or via an online form. This makes sense, for they are common enough changes.
Changing other personal data is more complicated. Date of birth and gender are considered part of an individual’s identity, thus they cannot easily be changed without full documentation. To correct a birthdate mistake you need to provide a copy of your birth certificate, and submit this form.
Title 25 does not directly address gender changes, but presumably a similar process is followed. This form can be signed by a licensed medical or social worker and submitted in person to a PennDOT licensing center.
Once records have been altered to reflect the voter’s desire to change identity, the voter then re-submits their registration with the updated information. Under section 1328 this will be evaluated, and the voter informed of the acceptance or rejection of the registration by mail.
To recap, street name changes or redistricting can cause changes without voter involvement. All other changes need to be initiated by the voter, involve mailed confirmation, and generally require additional documentation from the voter. What we are saying here is that voter registration information isn’t supposed to just change by itself. It isn’t supposed to change because of some kind of database error. Or sunspots. Or Mercury retrograde. There is a specific procedure, and the voter is always involved.
So, how’s Pennsylvania doing with all this?
By both their own strong state laws and by federal law, we shouldn’t find too many oddities in the SURE voter registration database. Especially not ones the voter doesn’t know about.
But that’s not what we found.
Each data set has over 8.5 million voter registration records. Each voter’s record contains 153 data fields. That’s over 50 million records and nearly 8 billion pieces of data. We uploaded this data to a high-speed cloud computer and wrote scripts to look for duplicate and anomalous data within each data set, and also to look for changes in voter data between data sets.
At first we didn’t know what to look for, so we just probed the data. We counted the voters in each county, for each data set, by party. We started noticing odd things. Duplicate records. Voter records changing in ways that didn’t make any sense at all.
We wrote more scripts. Scripts to look for duplicate and anomalous data within each data set. Then we looked for changes in voter data between data sets. We rubbed our eyes and went back to double-check the raw data. And yep, it was THAT weird. We pulled out over a million lines of data for closer examination. The more we looked the less it made sense that we were seeing normal organic changes in Pennsylvania’s electorate.
Even that simple exercise yields interesting results. We found that in the seven months between April 4, 2016 and the November election the total number of registration records increased by nearly 460,000. Seven months later, however, 316,000 of those registered voters were gone from the rolls. That’s a 5.5% increase right before the election and a 3.6% drop right afterwards. Are such large changes normal?
Looking at party affiliation, 20 of 67 counties in Pennsylvania saw a drop in registered Democrats in the run up to the election. No counties saw a drop in registered Republicans. In fact there was a 0.7% drop in Democratic share of registered voters in the state after the primaries, meaning either Democratic voters changed party or Republicans registered far more new voters in 2016.
60% of the loss in voters after the election were registered Democrats, 25% were Republicans. By July 2017, there were nearly 30k fewer registered Democrats in Pennsylvania than there were in April 2016, but 96k more Republicans and an increase of 142k registered voters total. Did Pennsylvania truly shift that dramatically towards the GOP or Independents right before and after the 2016 election? This data is perplexing but not necessarily proof of anything untoward.
DATA: A spreadsheet containing the voter counts by party and county for three different snapshot dates is here.
We became curious about exactly what was causing this statewide change in apparent political affiliation. Were people actually making changes to their political party after the deadline for the primary, when such changes would make no difference at all?
To answer this question we began looking at the records of individual voters. How many were added? How many were removed? How many showed a change in party? As long as we were checking, we looked for other changes, again tracking specific voters.
Changes to the Data
The increase in records between the April 4 data set and the November 7 data set was caused by the addition of nearly 550,000 new voter IDs, the removal of over 93,000 voter IDs, and approximately 2000 additional duplicate voter IDs.
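The bookkeeping in this paragraph can be checked with simple arithmetic. The sketch below uses only the rounded figures quoted above (the variable names are ours, chosen for illustration), so it shows no more than that the pieces are roughly consistent with the increase of nearly 460,000 records.

```python
# Rounded figures quoted in the text; the exact counts are not given here.
added_ids = 550_000        # "nearly 550,000 new voter IDs"
removed_ids = 93_000       # "over 93,000 voter IDs" removed
extra_duplicates = 2_000   # "approximately 2000 additional duplicate voter IDs"

# Net growth in records = additions - removals + extra duplicate rows.
net_change = added_ids - removed_ids + extra_duplicates
print(f"Net change in records: {net_change:,}")
```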
Duplicate voter IDs you ask? Yes. We will get to that later.
Oddly enough, 5000 of the voters who were added during this time period were marked as “Inactive” in the November 7 data set. “Inactive” voters, as you recall, are registered voters who haven’t voted in years, not newly registered voters prior to their first election.
Over 200,000 voters changed party in the seven months between April 4, 2016 and the election. This is 2.3% of the voting population. Nearly 120,000 of these changes occurred after August 15. Why would any voter—much less 2.3% of all voters—change party in the months after the primary?
Looking at the “record changed” date for voters that appear in both the April 4th and the November 7th data set, we see there were nearly 1.5 million changes made to the roughly 8.5 million PA voter registration records in this seven month period. In other words, more than one in six records was altered in some way during this time.
Last Minute Changes
As reported in the Philadelphia Inquirer, in the weeks before the election county officials reported a huge surge in the number of voter registrations that were being filed. Although the deadline for registering or changing registrations was October 11, election officials worked hard up to the day of the election processing these registrations. The dates of registration shown in the November 7 data set show this surge. But weirdly, the records show an enormous number of changes to registrations: 3 times as many modified voter registrations as new voter registrations, especially in the first 11 days of October.
Out of 8.5 million registered voters, the database tells us that nearly 1.5 million of those voters changed their voter registration information, over 17% of all PA voters. Remember, PA law requires voter participation and often documentation for any changes to their information.
These numbers defy common sense, but it gets worse. Looking at the February 2017 voter registration data set we see that 20% of the voters who theoretically rushed to change their voter registrations at the last minute are recorded as not even voting in the November 2016 election.
Here are the counts of recorded new registrations and changed registrations for each two week period between May 1st and the election:
All of these changes, especially the last minute ones, perplexed us.
So we went even deeper.
A few features of the data immediately became apparent.
1. There are a great many January 1 birthdays. We counted them. Our November 7, 2016 data set shows that 34,384 of our 8.73 million voters were born on January 1. Since January 1 occurs once every 365.25 days, we roughly estimated that one in 365.25 voters, or approximately 24,000, should have that birthday. The odds against 10,000 extra people having that particular birth date seem high to us, especially since date of birth is considered an identifying feature and a correct date of birth must be provided in order to register to vote or to obtain a Pennsylvania identification card.
2. Looking at these January 1 birth dates more closely, we noticed a great many birth dates of January 1, 1900, and January 1, 1800. In fact 202 voters have a birth date of January 1, 1900, and 1689 voters were born on January 1, 1800.
It has been suggested that the Jan 1, 1800 birth dates were a placeholder for unknown birth dates – perhaps from older registration data that was missing this information. We thought we would have a closer look at these voters to see if that was true.
Of the 1689 voters with a Jan 1, 1800 birthday, 426 apparently registered to vote in 2000 or later, and 928 registered in 1990 or later, leaving only 711, or 42% who registered before 1990.
Looking even more closely, we started searching for these people by name, one by one. The results were very interesting. We found good matches for 17 of the first 19 we looked for. The matches we found ranged in age from 29 to 83, with only five over the age of 60. Most of these voters must have registered in the 1980s, or later. It seems unlikely that the registration system used at that time would have omitted date of birth.
3. The discovery of these extremely old voters led us to count the voters in this data set who were over the age of 100 on election day. The answer? 7,572. This is particularly extraordinary considering that only about 0.0173% of Americans are 100 or older. Given Pennsylvania’s population of 12.78 million, there should be approximately 2,200 centenarians in the entire state.
4. Since it’s possible that some people who had passed away were not removed from the voter rolls, an even more interesting question is how many people over the age of 100 cast a ballot on November 8. To estimate this we looked at the February 27, 2017 data set. 6.2 million of the records in that data set show a “last voted” date of 11/08/2016. Of these voters, how many were over 100?
Of these voters, 2,109 were over 100.
That’s nearly all of Pennsylvania’s 2,200 centenarians.
Of these very old voters, 798 were apparently born on Jan 1, 1800. Three other voters had a birthday of zero, making them over 2000 years old. Additionally, 69 were born between Feb 9, 1800 and Jan 1, 1900, making them older than the oldest known living person at the time of the election. 13 people were born on Jan 1, 1900.
That leaves 1,223 people between the ages of 100 and 117 with legitimate looking birth dates who voted in the election.
31 of these voters, ranging in age from 100.2 to 109.9 registered in 2016. 440 of them first registered after 2000, at a minimum age of 84.
5. Finally, we noticed a voter whose “last voted” date was 11/06/2012, and whose birth date was listed as 12/04/1997. This indicates that that voter was under the age of 15 when she last voted. Looking at the November 7 data, we counted all such voters and found 257 who were under 18 when they last voted. Looking at the February 27, 2017 data set we found 144 such voters, including two who voted in the November 8, 2016 election.
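Two of the checks in this list are easy to reproduce. The sketch below first compares the observed number of January 1 birthdays with what an even spread of birthdays would predict (a simplification, since real births are not evenly distributed across the year), then computes a voter's age in completed years on a given date, which is the test behind the over-100 and under-18 counts. The helpers `parse_date` and `age_on` are hypothetical names, and the MM/DD/YYYY storage format is an assumption based on the dates quoted above.

```python
import math
from datetime import date, datetime

# Item 1: expected January 1 birthdays, assuming birthdays are spread
# evenly over the year (the same simplification used in the text).
total_voters = 8_730_000   # "8.73 million voters" (rounded)
observed_jan1 = 34_384
p_jan1 = 1 / 365.25
expected_jan1 = total_voters * p_jan1
# Binomial standard deviation: how much random variation to expect.
std_jan1 = math.sqrt(total_voters * p_jan1 * (1 - p_jan1))
excess_in_std = (observed_jan1 - expected_jan1) / std_jan1

def parse_date(text: str) -> date:
    """Parse an MM/DD/YYYY string (assumed storage format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def age_on(birth: date, when: date) -> int:
    """Age in completed years on the day `when`."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)

# Item 5: the voter born 12/04/1997 who last voted on 11/06/2012.
age_at_last_vote = age_on(parse_date("12/04/1997"), parse_date("11/06/2012"))

# Items 3 and 4: age on election day for a January 1, 1900 birth date.
age_on_election_day = age_on(date(1900, 1, 1), date(2016, 11, 8))

print(f"Expected Jan 1 birthdays: {expected_jan1:,.0f}")
print(f"Excess in standard deviations: {excess_in_std:.0f}")
print(f"Age at last vote: {age_at_last_vote}")
print(f"Age on election day: {age_on_election_day}")
```

Counting voters with `age_on(birth, last_voted) < 18`, or with `age_on(birth, election_day) >= 100`, over a full snapshot would give figures comparable to the ones reported above.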
Looking closer at the supposedly century-old and child voters in Pennsylvania’s voter lists, we found a few of these voters were listed twice, with identical information, including identical voter IDs. This simply should never happen, and if it does, basic database maintenance should be able to identify these problems easily.
In Pennsylvania each voter record has a “root id” of nine digits, then a hyphen, then a two digit code indicating the county. For instance the “full id” 001234567-01 indicates that the voter lives in county 01, or Adams County. If that voter moved to Allegheny County, his or her new “full id” would be 001234567-02. That voter’s “root id” is 001234567, and that id should identify the voter as long as he or she is a voter in Pennsylvania. Voter IDs are never reused for new voters, and voters who reactivate their registrations don’t change ID.
As stated in Pennsylvania’s Title 25 election law, each voter should have exactly one ID. This means that when a voter moves to a new PA county, the previous ID MUST be replaced by the new ID.
Curious, we made lists of all the “full ids” and “root ids” in each data set, and checked for duplicates.
We found nearly 13,000 exact duplicate records (same root id and same county) and over 2000 “two county” duplicates (same root id, different counties) in our November 7th and November 28th data sets. Very simple and standard database maintenance should have detected these duplicates and flagged them for deletion, perhaps after mailing requests for address verification in the case of “two county” duplicates.
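The duplicate check described here can be sketched in a few lines. This is a minimal illustration, not the scripts we actually ran: the function names and the sample IDs are made up, and it assumes every ID follows the nine-digit root, hyphen, two-digit county layout described above.

```python
from collections import Counter, defaultdict

def split_voter_id(full_id: str) -> tuple[str, str]:
    """Split a full id such as '001234567-01' into (root id, county code)."""
    root_id, county = full_id.split("-")
    return root_id, county

def find_duplicates(full_ids):
    """Return (exact duplicate full ids, root ids seen in more than one county)."""
    full_counts = Counter(full_ids)
    exact = {fid for fid, n in full_counts.items() if n > 1}

    counties_by_root = defaultdict(set)
    for fid in full_counts:
        root_id, county = split_voter_id(fid)
        counties_by_root[root_id].add(county)
    two_county = {root for root, seen in counties_by_root.items() if len(seen) > 1}
    return exact, two_county

# Invented sample ids, for illustration only.
sample_ids = [
    "001234567-01",
    "001234567-01",   # exact duplicate: same root id, same county
    "007654321-02",
    "007654321-51",   # "two county" duplicate: same root id, different county
    "009999999-15",
]
exact, two_county = find_duplicates(sample_ids)
print("Exact duplicates:", sorted(exact))
print("Two-county duplicates:", sorted(two_county))
```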
These duplicates could not have been created by any action, inaction, or attempted fraud on the voter’s part.
Even more surprisingly, we found that ALL of the exact duplicate “voters” were missing from our post-election data sets, and nearly all of the “two county” duplicates were gone too.
We then took a closer look at the complete records for the voters with two identical IDs, and we found something truly bizarre and extraordinary.
The two records were completely identical, except in one respect. Each Pennsylvania voter record includes fields indicating whether that voter did or did not vote in the last 40 elections. For each pair of duplicate voter IDs, the two IDs apparently never voted in the same election. Because of these staggered election dates between voters and their clones, if the vote count was tallied for a given election, only one of any pair of duplicate IDs would show up for any election examined.
The “Last Voted Date” is how we were able to distinguish the original voters from their “clones”. In addition to voting history, the PA database also records a “last voted date” as a separate field. The last voted date for the original voter should correspond to the most recent of the elections recorded in each voter’s election history. However, for all clone voters the recorded “last voted date” is not the same as any of the elections the data says they “voted” in. This is what the pairs of voter records look like:
How many of the clones cast a ballot in November 2016?
It’s hard to know for certain because only 53% of voter “last voted dates” had been recorded by November 28, and in our next snapshot (Feb 2017) all these clone voter IDs had disappeared.
So looking only at the Nov 28 data set, we counted the clones who were marked as having voted November 8, 2016, just to get a sense. The answer: 7,299 of our 12,939 clones “voted”. That’s 56%. And only 53% of the vote histories had been recorded. This makes it a safe guess that almost all of our nearly 13,000 clone voters were marked as having voted.
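The reasoning behind that guess is a simple scaling. The sketch below assumes, as the paragraph above does, that clone vote histories were being recorded at the same rate as everyone else's. That assumption is ours and cannot be verified from the data.

```python
clones_total = 12_939
clones_marked_voted = 7_299
recorded_fraction = 0.53   # share of vote histories recorded by November 28

# Share of clones already marked as having voted on November 8, 2016.
share_marked = clones_marked_voted / clones_total

# If only 53% of histories were recorded, scale up (capped at 100%).
implied_share = min(1.0, share_marked / recorded_fraction)

print(f"Marked as voted so far: {share_marked:.0%}")
print(f"Implied share once all histories are recorded: {implied_share:.0%}")
```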
Does this mean that the clone votes were counted in official election tallies? We have no idea.
The discovery of 13,000 clone voters who all apparently “voted” in 2016 before vanishing from the data was unsettling. The next question we had was, Are there any other duplicates in the data? Turns out the answer is yes, we found many more strange Pennsylvania voting doubles.
We looked for registered voters who shared an address in the November 7 data set. Straight away, we noticed that sometimes two women voters apparently shared both date of birth and first name, but had different last names and different voter IDs. The straightforward explanation is that these double records were caused by an error. If a woman changed her name upon marriage and re-registered to vote, perhaps a new ID was mistakenly created instead of updating the existing record. If so, however, those pairs of women should exist in data sets going back to their creation date. But when we checked the other data sets—before and after the election—in most cases only one of the two women existed. So if an error was introduced it only happened in October and was cleaned up after the election.
That made us wonder: Were there any more voters with the same date of birth sharing a household? In a large population we expect a few legitimate examples of this. Twins in the same household, for instance, or a couple who are exactly the same age.
In our November 7 data we found over 20,000 such households, which is a staggering number (twice as many as you’d expect even if all voters live with voters of the same age). In over 2500 of these cases though, the two occupants with the same birth date had identical names, which is even more unlikely. In over 10,000 more of these households the “birthday twins” had different last names, which casts doubt on the “young adult twins living at home” theory.
That made us curious how many duplicates in the pre-election November data lived at different addresses.
Of course it’s perfectly normal for people in a very large population to share a name, age and birthday, even a first name, last name and birth date, especially for those with more common names. But we wondered whether we would find MORE of these “name and birthday” twins in the November 7 data than in the other data sets.
To do this, we pulled lists of first name + last name + date-of-birth for each voter, from all six of our voter registration data sets. We then counted all the unique values in each list and subtracted that number from the total.
For an illustration of what we did, let’s say you have a group of 100 people in which there are two people named Jennifer who were born September 1, 1960, and no other matches. That group has 99 unique first name/birthdate combos. Subtracting 99 from 100 shows us that one person in the set has the same first name and birth date as another person.
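In code, the "total minus unique" count looks roughly like this. It is a sketch only: the field names and the sample people are invented, and the upper-casing and whitespace trimming is our assumption about how names would need to be normalized before comparison.

```python
def count_shared(records, fields):
    """Total records minus unique key combinations for the given fields."""
    keys = [tuple(str(rec[f]).strip().upper() for f in fields) for rec in records]
    return len(keys) - len(set(keys))

# Invented sample records, for illustration only.
people = [
    {"first": "Jennifer", "last": "Smith", "dob": "09/01/1960"},
    {"first": "Jennifer", "last": "Jones", "dob": "09/01/1960"},
    {"first": "Maria", "last": "Lopez", "dob": "03/15/1975"},
    {"first": "David", "last": "Smith", "dob": "09/01/1960"},
]

shared_first_dob = count_shared(people, ("first", "dob"))
shared_full_name_dob = count_shared(people, ("first", "last", "dob"))
print("Share first name + birth date:", shared_first_dob)
print("Share first name + last name + birth date:", shared_full_name_dob)
```

Note that this counts the "extra" records only: three people with the same combination contribute two to the total, not three.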
In a very large set of people (8.75 million in the case of the November 7 PA registered voter list) you would expect to find a great many people who share name, age, and birthday. But we wanted to know whether there were MORE registered voters who shared name and birth date just before the election than in the months before or after.
And indeed there were. Of particular interest to us were those who share a first name, last name AND date of birth. In the most recent data snapshot we studied, from July 31, 2017, there were just under 7,500 voters with identical name/age/birthday combinations. In the pre-election data there were over three times as many: 22,500. While some of the names on our “twins” list are quite common, a great many are not common at all.
Also striking was how many more first name/birth date combinations appear in the pre-election database than in other data sets. There were nearly 175,000 more voters in the November 7 data set who shared a first name and date of birth than there were in the July 31, 2017 data set.
Our earlier investigations caused us to become interested in people living at the same address. So once again we made a list of all addresses in each data set, and searched for duplicate records.
What we found was very interesting.
While nearly 500,000 voters were added to the voter rolls in the seven months before the election, fewer than 100,000 new addresses were added during this same time period.
To take a closer look at this phenomenon, we examined the addresses in which there were more voters in November than either August 15 (three months before) or February 27 (four months after). We found 220,000 registration records at addresses where there were more voters in November than EITHER three months prior or four months after.
In April 2016 there were on average 2.08 voters per household. The day before the election there were 2.14 voters per household. By February 2017, households were almost back to their pre-election state, at 2.09 voters per household. A very strange phenomenon indeed.
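Both measurements can be sketched with a counter keyed on address. The function names and sample addresses below are invented, each input is assumed to be a list with one address string per voter record, and we read "more than either" as "more than both the earlier and the later snapshot", which is one possible interpretation.

```python
from collections import Counter

def voters_per_household(addresses):
    """Average number of voter records per distinct address."""
    return len(addresses) / len(set(addresses))

def overfull_addresses(before, during, after):
    """Addresses with more voters in `during` than in both other snapshots."""
    b, d, a = Counter(before), Counter(during), Counter(after)
    # Counter returns 0 for addresses missing from a snapshot.
    return {addr for addr, n in d.items() if n > b[addr] and n > a[addr]}

# Invented sample data: one extra voter appears at "1 Main St" in November.
august = ["1 Main St", "1 Main St", "2 Oak Ave"]
november = ["1 Main St", "1 Main St", "1 Main St", "2 Oak Ave"]
february = ["1 Main St", "1 Main St", "2 Oak Ave"]

print("Voters per household in November:", voters_per_household(november))
print("Over-full addresses:", overfull_addresses(august, november, february))
```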
We made a list of these “over-full” households, and it’s very interesting to look at. For instance, one address in Philadelphia apparently housed six people on November 7. Three of them were named Carmen, and two of those Carmens shared a last name and a birthday (month and day), but their birth years were exactly three years apart. One of these two Carmens apparently moved to that address between August 15, 2016 and November 7. Between November 7 and February 27, 2017, she disappeared from the voter rolls, along with one of the other housemates. A third housemate apparently moved in between August and November, and stayed there through at least February.
We found over 100,000 households that displayed similar patterns: voters who were not clearly related to each other by name appearing in time for the November election, then disappearing within a few months.
Very peculiar indeed.
So we kept digging.
The huge differences in the numbers of duplicate voters between data sets caused us to look more closely at individual voters. Comparing a voter’s data from one data set to the same voter’s data in a second data set we suddenly noticed something odd.
The voter’s birth date had changed.
Of course we became curious about how often that happened. Perhaps it was a corrected error, or an odd fluke.
We wrote scripts that compared the records of voters that had the same ID, looking for changes in their “identifying features” – name, date of birth, and gender.
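A comparison of this kind can be sketched as follows. This is an illustration rather than our actual scripts: the field names and sample records are hypothetical, and each snapshot is assumed to be a dictionary keyed by voter ID, which means duplicate IDs would have to be handled separately.

```python
IDENTITY_FIELDS = ("first_name", "last_name", "dob", "gender")

def identity_changes(old_snapshot, new_snapshot, fields=IDENTITY_FIELDS):
    """Map voter id -> {field: (old value, new value)} for ids in both snapshots."""
    changes = {}
    for voter_id, old in old_snapshot.items():
        new = new_snapshot.get(voter_id)
        if new is None:
            continue  # voter not present in the later snapshot
        diff = {f: (old[f], new[f]) for f in fields if old[f] != new[f]}
        if diff:
            changes[voter_id] = diff
    return changes

# Invented sample records, for illustration only.
april = {
    "001234567-01": {"first_name": "ANN", "last_name": "LEE", "dob": "02/03/1970", "gender": "F"},
    "007654321-02": {"first_name": "BOB", "last_name": "RAY", "dob": "05/06/1980", "gender": "U"},
}
november = {
    "001234567-01": {"first_name": "ANN", "last_name": "LEE", "dob": "02/03/1971", "gender": "F"},
    "007654321-02": {"first_name": "BOB", "last_name": "RAY", "dob": "05/06/1980", "gender": "M"},
}

changed = identity_changes(april, november)
for voter_id, diff in changed.items():
    print(voter_id, diff)
```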
What we found was very surprising.
In Pennsylvania, any change to a voter record, including change of address or political party, must be voter initiated, either via an online form, or by fax or mail.
A change in marital status is the most common reason for a name change. Pennsylvania had 73,876 marriages and 33,749 divorces in 2016. According to a survey conducted by the New York Times, approximately 20% of women keep their maiden names after marriage, meaning 80% go through the name change process. That gives an estimated 59,100 name changes due to marriage for the entire year for all Pennsylvania adults, or an expected 35,000 for a 7 month period. Statistics for name changes following a divorce are harder to come by, but the information available on the process suggests this is a more difficult and time consuming process.
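The estimate above is straightforward arithmetic on the quoted figures. The sketch below reproduces it under the same assumptions (every name change comes from a marriage, and marriages are spread evenly across the year); the seven-month result comes out slightly under the rounded 35,000 used in the text.

```python
marriages_2016 = 73_876      # Pennsylvania marriages in 2016, as quoted
share_changing_name = 0.80   # 80% assumed to change their name
months_in_window = 7

annual_name_changes = marriages_2016 * share_changing_name
window_name_changes = annual_name_changes * months_in_window / 12

print(f"Estimated name changes per year: {annual_name_changes:,.0f}")
print(f"Estimated name changes in 7 months: {window_name_changes:,.0f}")
```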
Date of birth and gender are considered to be identifying features used to distinguish voters with identical or similar names, and errors in these data fields should be relatively rare. That said, there is NO voter facing mechanism by which a voter can easily change his or her date of birth via the internet. Instead the voter must fill out the Department of Transportation form found here, then bring it in person to a PennDOT Driver’s License Center along with his or her original state-issued birth certificate with embossed seal.
To change legal gender identification, this form is required, along with the signature of a licensed medical or social services worker. The form too must be presented in person at a State Driver’s License Center. The most common reason for a change of gender, and seemingly the only one recognized by the state of Pennsylvania, is gender reassignment. Out of Pennsylvania’s population of 12,702,379, transgender adults make up 0.44% of the population. This gives us a population of 55,890 transgender residents. Research has shown that amongst the other challenges this population faces, getting official documentation to match their gender identity is a challenge. As of 2015, nationally 71% of those surveyed did not have their updated gender on any form of identification. This gives us a rough estimate of 16,208 Pennsylvania residents in total whose gender has EVER been changed on legal forms of identification. Of course, only a fraction of those would go through the process of changing identification in any given year.
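The rough estimate in this paragraph can be reproduced from the quoted figures. The sketch below simply applies the national survey percentage to Pennsylvania's population, which is the same simplifying assumption made above.

```python
pa_population = 12_702_379
transgender_share = 0.0044       # 0.44% of the population, as quoted
share_without_update = 0.71      # 71% had no ID with their updated gender

transgender_residents = round(pa_population * transgender_share)
with_updated_id = round(transgender_residents * (1 - share_without_update))

print(f"Estimated transgender residents: {transgender_residents:,}")
print(f"Estimated residents with gender ever updated on ID: {with_updated_id:,}")
```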
The data shows that 2.6% of all voter records showed a change to some part of the voter’s identity (name, DOB, or gender) in the seven months leading up to the election, and another 1.7% of records showed such change in the nine months following the election, for a total of 378,675 such changes. This seems like an extremely high number, especially since all of those changes are laborious and must be voter initiated IF valid.
Over 10,000 voter records, or over one in a thousand voters, showed changes to their dates of birth during this time period.
Over 175,000, or 2% of all voter records showed a change to the gender field before the election and over 95,000 showed a change in the nine months following the election. Pennsylvania has four gender options – F, M, U and an empty field. Over 98% of these changes to the field denoting gender showed a change to or from an “undesignated” state, with nearly 85% changing from undesignated to F or M before the election, but with over 12% actually changing from F or M to undesignated. About 1.5% of these registrations showed a change from male to female, or vice versa, but in only a small number of cases was there a name change that seemed to correspond to the new gender. Again, a legitimate gender reassignment almost always includes a name change.
As with the other seemingly bizarre changes to the Pennsylvania voter records we found, these changes accelerated in the last months before the election. In the period between August 15th and the election 146,764 registrations showed changes to the voter’s name, DOB, or gender, with 110,669 of these voters showing changes to DOB or gender.
Since there is no clear mechanism for voters to change their dates of birth or genders, and since names were changed at nearly twice the expected rate before the election it’s hard to believe most of these changes were voter-initiated.
Even more extraordinarily, nearly 5000 of the records that showed a name, DOB, or gender change in the seven months before the election made another change in the nine months following the election. Many of these voting records also show a changed address, some of which later changed back. Here is an excerpt from our spreadsheet showing voters whose dates of birth changed twice:
As we attempted to determine which of these “flexible” voters had actually cast a ballot in November, we noticed one more very disturbing set of changes. Comparing “last voted” dates from the February 27, 2017 data set to “last voted” dates six months later in the July 31, 2017 data set, we found 3,413 voters whose records had been changed in ways that made no sense at all. 2602 of these records were changed so that the July data showed that the voter had voted in the November 2016 election. 460 of these records were changed so that a voter who had previously been shown as voting in the November election no longer was. One voter was actually marked as having voted in the upcoming November 7, 2017 election as of July 31, 2017. Only 262 of the 3,413 records that were changed in this way showed a “last changed” date after February 27, which indicates that most of these changes were not made in a standard way. Or to put it another way, their voter data changed AFTER the last time the database said their data had changed. This simply should not be possible for any legitimate database modification.
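The check for these "silent" changes can be sketched as below. The field names, the function name, and the sample records are hypothetical, and each snapshot is assumed to be a dictionary keyed by voter ID with dates already parsed.

```python
from datetime import date

FEB_SNAPSHOT = date(2017, 2, 27)

def silent_last_voted_changes(feb, jul, snapshot_date=FEB_SNAPSHOT):
    """Ids whose "last voted" date changed without a newer "last changed" date."""
    flagged = []
    for voter_id, old in feb.items():
        new = jul.get(voter_id)
        if new is None or new["last_voted"] == old["last_voted"]:
            continue
        # The record differs, yet claims no modification after the snapshot.
        if new["last_changed"] <= snapshot_date:
            flagged.append(voter_id)
    return flagged

# Invented sample records, for illustration only.
feb = {
    "A": {"last_voted": date(2012, 11, 6), "last_changed": date(2016, 10, 1)},
    "B": {"last_voted": date(2016, 11, 8), "last_changed": date(2016, 12, 1)},
    "C": {"last_voted": date(2014, 11, 4), "last_changed": date(2016, 9, 1)},
}
jul = {
    "A": {"last_voted": date(2016, 11, 8), "last_changed": date(2016, 10, 1)},  # silent change
    "B": {"last_voted": date(2016, 11, 8), "last_changed": date(2016, 12, 1)},  # unchanged
    "C": {"last_voted": date(2016, 11, 8), "last_changed": date(2017, 3, 15)},  # change with new stamp
}

flagged = silent_last_voted_changes(feb, jul)
print("Silently changed records:", flagged)
```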
How much of an effect did these registration changes have on the outcome of the election? It’s hard to know for sure. Read my interview with a Pennsylvania election judge and draw your own conclusions.