Wednesday, March 28, 2012

Normalize your database: NOT OPTIONAL!

I've been away for a while, but now that I'm back, one of the first things I noticed is how many of the problems on this forum could have been avoided simply by normalizing the database in question.

For those of you without formal SQL training, know this: You have to normalize your database. This is part of the process of database development. You cannot just whack together a few tables in whatever way is easiest for the problem at hand, because eventually your requirements will expand and then you will need your data to be flexible.

Seriously, in the last three weeks I've seen many, many questions where it is obvious that the problem is a flawed DB design, but all the so-called SQL gurus here just answer the question without addressing the fact that the problem lies much deeper.

Here's a link I found on Google:
http://www.cse.unsw.edu.au/~cs3710/PMmaterials/Tutorials/normalise.htm

Good point, but a qualifier:

Sometimes de-normalizing is required, or at least advisable, from a performance point of view, though your point holds in the vast majority of cases. It certainly holds for virtually anyone asking questions in this forum.

j

Yes, although your starting point is always a normalized DB. If you then realize you need to de-normalize for performance, then so be it. Never the other way around, though.

"sometimes de-normalizing is required or at least advisable from a performance point of view"

Yes, and professionals then call this a data warehouse, where completely different rules apply.

Interestingly enough, I have never seen questions about data warehousing here; everyone is just dealing with transactional (OLTP) databases.

It's a given to normalize your database to a certain degree. It's faster to select on an integer or bit-based WHERE clause than on a float or varchar.

However, Atrax is correct: you can't really say it's always the best choice, especially when you have a normalized design of more than 15 tables and some of those tables only hold 3-5 records of two fields, one of them being an identity.

Let's take states, for instance. States don't change, and a char(2) is all you really need to store that data. Why would you add another field that takes up 4 bytes just to store an identity FK into a normalized table holding "states"?
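For illustration, that design might look something like this (a T-SQL-flavored sketch only; the table and column names are made up for the example):

-- Sketch: use the natural char(2) key directly, no surrogate identity needed.
CREATE TABLE States (
    StateCode char(2) NOT NULL PRIMARY KEY,   -- natural key: 'CA', 'NY', ...
    StateName varchar(50) NOT NULL
);

CREATE TABLE Customers (
    CustomerID int IDENTITY PRIMARY KEY,
    CustomerName varchar(100) NOT NULL,
    StateCode char(2) NOT NULL REFERENCES States (StateCode)   -- FK on the char(2) itself
);

The FK still enforces integrity; you just skip the extra 4-byte surrogate and the join needed to display the state code.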

So Pierre, as valiant as your intentions were, it's still at the discretion of the developer/DBA to decide how these tables will be designed for optimal storage as well as retrieval.

I'm not saying take your DB to 5NF or DKNF; just 3NF is fine. What you're illustrating is still a 3NF normalized DB, and thus still OK for just about any purpose out there. I'm not saying your DB should always be 100% normalized. I'm just pointing out that many developers skip normalization altogether.

All I'm saying is that no normalization at all is BAD. E.g.:

UserID int, Languages varchar(100)
(1, '1, 2, 5, 7')
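The normalized alternative would be something along these lines (a rough sketch; every name here is invented for the example):

-- Sketch of the normalized version of the example above.
CREATE TABLE Users (
    UserID int NOT NULL PRIMARY KEY
);

CREATE TABLE Languages (
    LanguageID int NOT NULL PRIMARY KEY,
    LanguageName varchar(50) NOT NULL
);

CREATE TABLE UserLanguages (
    UserID int NOT NULL REFERENCES Users (UserID),
    LanguageID int NOT NULL REFERENCES Languages (LanguageID),
    PRIMARY KEY (UserID, LanguageID)   -- one row per user/language pair
);
-- The (1, '1, 2, 5, 7') row above becomes four rows in UserLanguages, one per language.

Finding every user who speaks a given language is then a plain join with an index on it, instead of string parsing in a WHERE clause.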

I agree with your example; I often use non-integer fields as PKs myself. And if updates are possible but unlikely, you can even use cascading updates. But the DB is still normalized and should still have FKs, etc.

The problem is, many of the people on this forum don't even know what normalization is...

Heck, even in the professional world there are a lot of people who don't understand this.

I am on a project where a programmer needed to know what type of item something was.

Instead of adding a table of the types and a foreign key to his original table (see the sketch below), he added a column for each of the types, with a "Y" in the correct one and an "N" in all the incorrect ones...

Probably someone with COBOL experience...

I think that Pierre's original point, and his intention, are solid. Even though there are certainly exceptions, many people who post here would benefit significantly from learning the basics of normalization, and the link looks like a solid tutorial. Learn the rules, then learn how (and when) to break the rules.
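To make the type-table approach mentioned above concrete, here is a rough sketch (table and type names are invented; the flag-column version appears only as a comment):

-- The column-per-type design described above looks roughly like this:
-- CREATE TABLE Items (ItemID int PRIMARY KEY, IsTypeA char(1), IsTypeB char(1), IsTypeC char(1), ...);

-- The table-plus-foreign-key alternative:
CREATE TABLE ItemTypes (
    ItemTypeID int NOT NULL PRIMARY KEY,
    TypeName varchar(50) NOT NULL
);

CREATE TABLE Items (
    ItemID int NOT NULL PRIMARY KEY,
    ItemTypeID int NOT NULL REFERENCES ItemTypes (ItemTypeID)   -- adding a new type is an INSERT, not a schema change
);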
