Don’t Be Afraid To Break Database Normalization
Posted on 18th August 2008 by SameerFor those of us who took an intro database class, the topic of database normalization was covered ad homonym as an absolute necessity in any database design. So, it takes a little bit of effort to pull yourself away from such a time tested concept. But, sometimes its absolutely necessary. (For those who are not familiar with the term, database normalization basically means “don’t repeat any information ever”).
Let’s say you have a social network site with a huge table of private messages between users with each row representing an individual message between two users. Each time a user logs in you want to query the number of unread messages to the user with a query like:
select count(*) from privatemessages where unread = 1 and userid = $userid
However, you find that because your private messages table is 6gb’s large and that some users have thousands of messages that the query is taking an unacceptably long time to run. So what do you do?
You could use a cache such as Memcached to store the number of unread messages for each user in memory, which would yield a significant improvement. However, Memcached still needs to occasionally run the query. To speed up the query, you break database normalization. Why not create another table called privatemessagescount which could store the number of unread messages per user (with columns “userid” and “numunread”). Each time the user receives a new message or reads a new message there will be an additional write to the DB, but it will be a simple write. And more importantly, your reads will be much faster because you will be looking up on a primary key (“userid”). Now the query simply becomes
select numunread from privatemessagescount where userid = $userid
So for performance sakes you’ve broken database normalization and added additional overhead to your application. But, I’d rather have a fast site than a normalized site that is too slow to be used.
[...] the database is laid out in a highly normalized form which is not ideal for performance, as it requires joins to be performed over multiple tables. A quick look at the db profiler shows [...]

