If you have voted before, then you can understand SQL tables using this wonderfully non-political analogy.
After this particularly controversial election, you probably know more about the U.S. voting system than you would ever want to.
But I bet that you never thought it would help you learn how to organize a relational database.
Fortunately, the basic rules of presidential elections are actually great guidelines to organize SQL tables and understand the concept of keys. Keys are the most important tool for setting up scalable and efficient SQL databases. They are also the most confusing part when you are just starting!
All you need to know before you read this article are the very basics of SQL- mainly, how tables are organized by rows and columns.
Rewind to Election Day
You have just left your voting booth with paper ballot in hand. You head over to the processing machine, and put it in the slot.
But what happens next? What do you think would be the best way to store your vote for president? Remember that about 136,000,000 other people voted as well in just this election!
Check out the ballot above. You can see that there are three key identifiers that we need to track on the ballot:
- The election that this ballot is associated with (2016)
- A ballot number (012)
- A voter ID, in this case, Social Security Number (012 34 5678)
Your initial thought might be to store this whole thing as one entry, like this:
But let’s think this through, considering that there are over 100 million other voters! Do we really need to store every piece of information on the ballot in every entry in the database? For example, do we really need to list “candidateA” and “candidateB” as the two choices 100 million times? Surely not!
And this does not even consider multiple elections. Imagine if we wanted to view Jennifer Hardy’s voting activity from the past 3 elections. Even then, it would not make sense to have 3 entries in our database that included her full name, SSN, gender etc. with every vote. This encourages bugs and inconsistent information when you update your database to change her party affiliation, for example.
Let’s take a step back and consider some basic principles of the voting system.
- One election has many votes (100 million+)
- One voter has many votes over the course of their lifetime
- One voter can participate in many elections (as many as 20)
Our three unique IDs- ballot #, social security number, election date- show that there are really three components in this system. If we put all three components in one entry, like we did above, we cannot capture the relationships between them. And we will repeat a lot of information.
Connecting Multiple Tables
With our three key identifiers, we have now outlined the three tables that are needed to properly store this information.
- Election table, with a date, Party A nominee and Party B nominee. Date is the unique identifier because some candidates may run for President twice, but two elections never happen on the same date.
- Vote table, that includes the ballot number, the SSN of the voter that it is associated with, the election that the ballot is associated with, the issue that is being voted on (president), and a 0 or 1 for which candidate was chosen. 0 stands for Party A, 1 for Party B due to alphabetical order.
- Voter table, which includes the voter’s first name, last name, Social Security #, year of birth, party affiliation, gender and state registered to vote.
This does a much better job of describing the different pieces of the voting system. There are actually three distinct events.
- You had to register to vote at some point before voting day! You were added to the voter database. Your SSN is the primary key, in this case. That means that it is the unique identifier for that row.
- Each party held a National Convention to finalize their candidates in July 2016 for the November 8 election. This created a new entry in the election table. The date is the primary key.
Now we are back to the original scenario. It’s November 8, 2016. You just put your ballot in the processing machine. So how should the machine really handle your ballot? First, your ballot should have the minimum amount of information necessary.
The machine then needs to answer a few questions to determine if the ballot is valid.
- Have you already voted in this election?
- Is the ballot connected to the 2016 election, or did you save one from a past election?
- Are you a registered voter?
Each one of these is a separate SQL query. I want to focus on the first three pieces of info on the ballot above. The ballot ID is the primary key for the votes table. The SSN and date are actually foreign keys. That means that they reference primary keys from the other two tables.
If we want to check if the ballot is coming from a registered voter, we are going to need to use a table join. A join references info from multiple tables using the primary/foreign key system. This means that our vote table must store both a unique identifier for the voter, and for the election as a whole. But it also means that we do not need to store all the voter’s info or the election info in that row! We just need a reference to the corresponding table.
The primary/foreign key system forms connections between tables.
We want to check if Jennifer Hardy is a registered voter after a ballot with Hardy’s name is submitted. We just need to confirm that she has an entry in the voters table. We use the SSN foreign key from the votes table, and the primary key from the voters table to link the two records. We also probably need to check if the state registered on her record matches the state where the vote was processed.
And that is it! If you want to practice, try and set up a database with the last 4 US presidential elections. Add 20 sample votes from 10 voters over the course of the 4 elections. And see if you can write the query that will check if a voter is attempting to submit a 2nd ballot in one election!