
Wednesday, March 28, 2012

normalizing using sql commands

Hi

Can someone please point me in the right direction? I built a very badly designed database consisting of only one huge table when I first started with databases. Since learning about normalization I have designed and set up a new database which consists of many more tables instead of just the one. My question is: where do I start in transferring the data from the old single-table database to my new multi-table database?

I have MS SQL Server 2005 Management Studio if that helps, and want to transfer around 200,000 rows of data into the new database. Both new and old databases are on the same server.

thanks in advance|||Without some idea of the target and source schemas this is a broad question.

I would add a column to each of your tables, for the purpose of the transfer, that holds the primary key of your source table. This will allow you to establish your relationships while populating your tables. Remember to populate your parent tables before you populate your child tables so as not to violate your foreign keys.
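
A minimal sketch of the idea (NewCustomers, OldBigTable and old_id are made-up names for illustration):

ALTER TABLE NewCustomers ADD source_id int NULL -- holds the source PK; drop it after the migration
GO
INSERT INTO NewCustomers (name, email, source_id)
SELECT name, email, old_id -- (dedupe with DISTINCT/GROUP BY as needed)
FROM OldBigTable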

have fun.|||You will also find that the number of records is pretty well irrelevant - it is the number of tables in the newly normalised db that will more accurately determine your required effort.

If this is a one-off then you need to work with copies of your databases and write one script at a time to normalise the data and populate the tables. As Sean says, start with the parents. You just have to write a lot of scripts starting with "SELECT DISTINCT..." - no real shortcuts. Once you have written (and tested) all the required scripts you need to take your db offline, run them and then put your newly normalised db (you have changed the sproc interface\application too, yes?) back online.

To echo Sean - have fun :)|||hi guys

thanks for the advice! I'm really still at the starting blocks - how do I actually transfer the data?

I'm sure it will be a mixture of SELECT and INSERT statements, but can someone give me a simple example of selecting a table's data and inserting it into another using only SQL?

cheers|||INSERT INTO OneTable (Col1, Col2)
SELECT ColA, ColB
FROM AnotherTable

Check Books Online and familiarise yourself with DML SQL.|||thanks pootle - I'm a noobie to Books Online, don't think I've used it before - I've just done a Google search and downloaded the following:

link (http://www.microsoft.com/downloads/details.aspx?FamilyId=BE6A2C5D-00DF-4220-B133-29C1E0B6585F&displaylang=en)

Is this what you mean?|||AKA BoL BTW.

That's the chappy although that is the 2005 version. If you are using 2K then you can access it via Enterprise Manager:
Action-> Help
or Query Analyser:
Help-> Transact SQL Help

Read, absorb, learn - it is excellent.

A more gentle introduction to SQL:
http://www.w3schools.com/sql/default.asp|||Sorry, back again with another question!

I can easily move data around now using your simple example and BOL; however it's a little more complicated as I need to take newly created foreign keys into account when transferring from the old database (single table) to the new database (multiple tables). The following is the SQL I use to insert into my new database; I basically need to somehow get this to do the same when transferring data from my old database.

eg:
BEGIN TRANSACTION

DECLARE @new_customer_id int

INSERT INTO CUSTOMER(name, email)
VALUES('Joe Bloggs', 'joe@bloggs.com')

SET @new_customer_id = @@IDENTITY

INSERT INTO VENUE(customer_id, business_name, address1)
VALUES(@new_customer_id, 'bloggs corp', 'london')

COMMIT TRANSACTION

I'm guessing that I may need to use a WHILE loop (or is there an easier option?), but either I don't know how to use BOL properly or I can't seem to find any good examples in it!

thanks (again! ;) )|||Well - you could join your data using natural keys and propagate the surrogate (identity) values as sets rather than looping. You defo want to avoid doing this in while loops - sets of data are the way to go.

For example - imagining that your surname\forename combinations are unique (YOU NEED TO TEST THIS FIRST! - if they are not unique you need to either find a combination of fields that can serve as a natural key or eliminate the duplicates and handle those later on a case-by-case basis) and so will do as a natural key for the normalisation exercise...

Insert ALL your customers into the customers table in one query.

Link your newly populated customers table to the customers' child data via surname and forename and include the identity value in your queries. You can now insert these into the child entities in one fell swoop rather than looping and inserting absolutes (like you are now).|||hi

Not sure if I confused things in the last post - I meant primary, not foreign, key.

Right, read your post about 10 times, think I've got it, or a similar alternative!

Basically, I have a customers table and a venue table. Each venue has one customer, but one customer can have many venues, hence normalisation required me to split these into two separate tables.

I gave each customer a primary key, and also each venue a primary key.

My objective is to have the customer id (auto-incremented primary key from the customers table) inserted/matched into the venues table so I will be able to tell which customer relates to which venue.

Link your newly populated customers table to the customers' child data via surname and forename and include the identity value in your queries.

OK, so I can copy over all the customer details to the customer table, and the venue details into the venue table, but I can insert the venue_id field in both the customers and the venue tables so they will have that in common.

This is my 'SQL thinking out loud, I know it's wrong' way of thinking...

INSERT INTO VENUE(customer_id)
SELECT customer_id FROM customer WHERE venue.venue_id = customer.venue_id

I'm thinking I need some type of join in there maybe?|||You want to check those SQL links again :) Your current route requires an UPDATE not an INSERT.

In case I didn't make much sense earlier I meant something along these lines:

INSERT INTO NewCustomers(ForeName, Surname, OtherCols)
SELECT DISTINCT ForeName, Surname, OtherCols FROM BigTable

INSERT INTO NewVenue(Venue_Name, Customer_ID, OtherCols)
SELECT DISTINCT BigTable.Venue_Name, NewCustomers.Customer_ID, BigTable.OtherCols
FROM BigTable INNER JOIN NewCustomers ON
NewCustomers.Surname = BigTable.Surname
AND NewCustomers.Forename = BigTable.Forename
To repeat - I am only flagging up surname and forename as possible join columns. You know your data - you may well have something much more appropriate.
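
If the venue rows already existed and only needed their customer_id filling in, the same join would drive an UPDATE instead - a sketch using the table names from the example above:

UPDATE v
SET v.Customer_ID = c.Customer_ID
FROM NewVenue v
INNER JOIN BigTable b ON b.Venue_Name = v.Venue_Name
INNER JOIN NewCustomers c ON c.Surname = b.Surname
AND c.Forename = b.Forename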

HTH|||thanks flump, big hi-5 from the other side of the pennines! :D

Normalizing the data

Hi,

I have a table like this:

Col1 First_Year Last_Year
a 1990 1993

I want this data to be converted to
a 1990
a 1991
a 1992
a 1993

Is there any simple way to do it in SSIS without using a Script Component?

Thx

No - you'll need to use script.

There are components for pivot and unpivot, but not for imputing missing members of a series.
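
If the rows pass through SQL Server at some point, a set-based T-SQL workaround is also possible - a sketch, assuming a helper Numbers table holding integers 0, 1, 2, ... and a source table named YearRanges:

SELECT r.Col1, r.First_Year + n.Number AS [Year]
FROM YearRanges r
INNER JOIN Numbers n ON n.Number <= r.Last_Year - r.First_Year
ORDER BY r.Col1, [Year]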

Donald

|||

Thiru_, you can use the DataDefractor SSIS source data flow component to accomplish this with no programming. This component pivots and normalizes fact data and dimensional metadata located in unstructured and semi-structured matrix data sources. You can download a free beta of the component at http://www.datadefractor.com.


Normalizing sub-categories?

Hi,
I'm designing a database, but am having a problem with categories and sub
categories.
Basically we have documents. Each document has a category, and a sub
category. The selection of sub category is dependent on which category is
selected (so that you cannot select a sub category that does not relate to
its category).
If anyone has any ideas, I would be all ears.
Many thanks in advance.
Andy|||If you have a separate table with the combinations in, you can define a CHECK
constraint to check whether the inserted combination exists in this table. Or
you can use a FOREIGN KEY constraint as follows:
CREATE TABLE [dbo].[Tab_Categories] (
[CategoryID] [char] (10),
[SubCategoryID] [char] (10)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Tab_Docs] (
[DocID] [char] (10) ,
[CategoryID] [char] (10) ,
[SubcategoryID] [char] (10)
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Tab_Categories] ADD
CONSTRAINT [PK_Tab_Categories] PRIMARY KEY CLUSTERED
(
[CategoryID],
[SubCategoryID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Tab_Docs] ADD
CONSTRAINT [PK_Docs] PRIMARY KEY CLUSTERED
(
[DocID]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Tab_Docs] ADD
CONSTRAINT [FK_Docs_Tab_Categories] FOREIGN KEY
(
[CategoryID],
[SubcategoryID]
) REFERENCES [dbo].[Tab_Categories] (
[CategoryID],
[SubCategoryID]
)
GO
HTH, Jens Süßmeyer.
http://www.sqlserver2005.de
--
"Andy" <Andy@.discussions.microsoft.com> schrieb im Newsbeitrag
news:25C8585D-3986-4DC6-9997-160B27BB49CC@.microsoft.com...
> Hi,
> I'm designing a database, but am having a problem with categories and sub
> categories.
> Basically we have documents. Each document has a category, and a sub
> category. The selection of sub category is dependant on which category is
> selected (so that you cannot select a sub category that does not relate to
> its category).
> If anyone has any ideas, I would be all ears.
> Many thanks in advance.
> Andy|||To model this properly you need 3 tables (additional columns omitted for
clarity):
CREATE TABLE Categories (
Category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Categories
PRIMARY KEY (Category_name)
)
CREATE TABLE Sub_categories (
Sub_category_name VARCHAR(50) NOT NULL,
Category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Sub_categories
PRIMARY KEY (Sub_category_name, Category_name),
CONSTRAINT FK_Sub_categories__Categories
FOREIGN KEY (Category_name)
REFERENCES Categories (Category_name)
)
CREATE TABLE Documents (
Document_Name VARCHAR(100) NOT NULL,
Sub_category_name VARCHAR(50) NOT NULL,
Category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Documents
PRIMARY KEY (Document_Name),
CONSTRAINT FK_Documents__Sub_categories
FOREIGN KEY (Sub_category_name, Category_name)
REFERENCES Sub_categories (Sub_category_name, Category_name)
)
Jacco Schalkwijk
SQL Server MVP
"Andy" <Andy@.discussions.microsoft.com> wrote in message
news:25C8585D-3986-4DC6-9997-160B27BB49CC@.microsoft.com...
> Hi,
> I'm designing a database, but am having a problem with categories and sub
> categories.
> Basically we have documents. Each document has a category, and a sub
> category. The selection of sub category is dependant on which category is
> selected (so that you cannot select a sub category that does not relate to
> its category).
> If anyone has any ideas, I would be all ears.
> Many thanks in advance.
> Andy|||Jens,
Thanks for the reply.
Surely having sub-category in the documents table doesn't meet 3rd normal
form, since it relies on a non-key table, i.e. category? This is my problem.
It's similar to what I have currently...
Documents
--
Doc ID (PK)
CatID (FK)
SubCatID (FK)
Categories
--
CATID (PK)
Cat Description
SubCategories
--
CatID (PK)
SubCatID (PK)
SubCat Description
...With a constraint between Categories and SubCategories
But I don't like it. Maybe categories and sub categories are just one of
those things that isn't pretty or doesn't meet the rules?
"Jens Sü?meyer" wrote:

> [snip]|||Jacco,
Thanks for that, it's basically what I currently have. Should I not worry
about it breaking 3NF? - i.e. In the Documents table, sub_category_name
relies on category_name, and not the primary key?
Many thanks
Andy
"Jacco Schalkwijk" wrote:

> [snip]|||Andy,
Do not store the CategoryID in the Documents table (store only the
SubCategoryID). The Category combobox should be unbound, used only in
the form (not stored in the database).
Razvan|||create table CATEGORY
(
CODE varchar(3), --PK
DESCRIPTION varchar(30)
)
create table SUB_CATEGORY
(
CODE varchar(3), --PK
DESCRIPTION varchar(30)
)
create table CAT_DEF
(
RID uniqueidentifier, --PK
CATEGORY varchar(3), -- FK to CATEGORY
SUB_CATEGORY varchar(3) -- FK to SUB_CATEGORY
)
go
-- plus a unique constraint on the category/sub-category pair:
alter table CAT_DEF add constraint UQ_CAT_DEF unique (CATEGORY, SUB_CATEGORY)
go
create table DOCUMENT
(
DOC_ID integer, -- PK
DOC_NAME varchar(30),
CAT_DEF uniqueidentifier -- FK to CAT_DEF
)
you don't have to use uniqueidentifiers for the link between DOCUMENT and
CAT_DEF, they could be integers etc. Some may say you don't need the
RID/PRID columns at all, but it's just a bit neater than having the category
and sub category fields repeated in DOCUMENT.
CAT_DEF allows you to associate the sub categories to the categories, hence
you cannot incorrectly associate a sub category with a category
"Andy" <Andy@.discussions.microsoft.com> wrote in message
news:25C8585D-3986-4DC6-9997-160B27BB49CC@.microsoft.com...
> Hi,
> I'm designing a database, but am having a problem with categories and sub
> categories.
> Basically we have documents. Each document has a category, and a sub
> category. The selection of sub category is dependant on which category is
> selected (so that you cannot select a sub category that does not relate to
> its category).
> If anyone has any ideas, I would be all ears.
> Many thanks in advance.
> Andy|||Category_name depends on sub_category_name, not the other way around. And
even that is only the case if sub_category_name is UNIQUE. If the same
sub_category_name can appear in multiple categories, the model I presented
is in 3NF. If sub_category_names are unique, the model below is in 3NF. It
wasn't clear to me from your original post whether sub_category_names are
unique or not.
CREATE TABLE Categories (
Category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Categories
PRIMARY KEY (Category_name)
)
CREATE TABLE Sub_categories (
Sub_category_name VARCHAR(50) NOT NULL,
Category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Sub_categories
PRIMARY KEY (Sub_category_name),
CONSTRAINT FK_Sub_categories__Categories
FOREIGN KEY (Category_name)
REFERENCES Categories (Category_name)
)
CREATE TABLE Documents (
Document_Name VARCHAR(100) NOT NULL,
Sub_category_name VARCHAR(50) NOT NULL,
CONSTRAINT PK_Documents
PRIMARY KEY (Document_Name),
CONSTRAINT FK_Documents__Sub_categories
FOREIGN KEY (Sub_category_name)
REFERENCES Sub_categories (Sub_category_name)
)
Jacco Schalkwijk
SQL Server MVP
"Andy" <Andy@.discussions.microsoft.com> wrote in message
news:B1A8D1FE-743E-43BC-B38F-0A784A542C26@.microsoft.com...
> Jacco,
> Thanks for that, its basically what I currently have. Should I not worry
> about it breaking 3NF? - i.e. In the Documents table, sub_category_name
> relies on category_name, and not the primary key?
> Many thanks
> Andy
> "Jacco Schalkwijk" wrote:
>

Normalizing my Database

Please, I have created some tables: Delivary, with these columns (DelivaryId, DelivaryNo, QtyRecieved, DelivaryDate, ProductId), and Product, with these columns (ProductId, ProductCode, ProductName, ProductPrice). As you can see, the Product table keeps a record of products while the Delivary table keeps a record of stock supplied. I would like to create another table that will keep a record of stock sold out (an Invoice table) based on the qty received from the Delivaries table.

Please help

I am thinking you need two more tables: one for the inventory and another for the dates - delivery date, received date, purchase date and more. You want to separate out functional dependencies. Try the links below for sample data models and a normalization tutorial. Hope this helps.

http://www.databaseanswers.org/data_models/

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

|||

Caddre is right, you probably need at the least, an inventory table (Tracks current stock), a receiving table (Tracks products received, date/time, etc), and a sales table (Tracks products sold, date/time) in addition to your products table.

Create triggers on the receiving table to update the quantities when records are added/updated/deleted.

Same for the sales table.

|||

Motley:

[snip]

Hi, thanks for your response. The receiving table is my Delivaries table (DelivaryId, DelivaryNo, Quantity, Date). If I understand you very well, you mean I should create another table for keeping the current stock record, but if I do, what then happens to the quantity column in the Delivaries table?

|||It would be the quantity delivered. You would then use that in the insert/update trigger to update the inventory table's quantity (or what is commonly called "on hand" quantity).|||I think I'm beginning to understand you, but please try to expand more on what a trigger does to a table or table column|||

CREATE TRIGGER (Transact-SQL)

Designing DDL Triggers

You can find more if you search on the internet.
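
A minimal sketch of the kind of trigger Motley describes, reusing the column names from this thread (the Inventory table and its OnHandQty column are assumed names):

CREATE TRIGGER trg_Delivaries_Insert ON Delivaries
AFTER INSERT
AS
BEGIN
-- add the delivered quantities to the on-hand stock;
-- grouping handles multi-row inserts, and a row is assumed
-- to already exist in Inventory for each ProductId
UPDATE inv
SET inv.OnHandQty = inv.OnHandQty + d.Qty
FROM Inventory inv
INNER JOIN (SELECT ProductId, SUM(QtyRecieved) AS Qty
FROM inserted
GROUP BY ProductId) d ON d.ProductId = inv.ProductId
END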

Normalizing help and naming conventions

I have an Access Database that I am working on normalizing to SQL Server.
This Access database has been revised multiple times, each time the
programmer (not me!) didn't work to get the existing data to the same level
as the revised program. For instance, the HairColor column has the
following DISTINCT values,
Auburn
Black
Black/Gray
Blonde
Brown
Brown/Gray
Dark Brown
Gray
Gray/Black
Gray/Brown
Lt. Brown
Red
Red-blonde
Red/Blonde
Red/Brown
Now the current revision of the Access database limits the values for the
HairColor column to,
Black
Blonde
Brown
Gray
Red
So I need to make the current values reflect the current dropdown. I know
how to go about this, but have a question about normalizing this database.
Should I create a separate table, HairColor, and then assign IDs to the
HairColor column? What is the benefit of doing it this way, instead of just
leaving the text in there?
Also, if I am to create a new table, how does the naming convention work
with that? Should I prefix my table name with fk_HairColor, or something
like that?
Any advice is appreciated!
Thanks!
drew|||I would create a table named HairColor with the columns:
HairColorID int
HairColor varchar (20)
In any table that references this table, you can create a foreign key. You
will take up less space in your DB overall. Also, consider what would
happen if you decided to change "Lt. Brown" to "Light Brown". With the
normalized version, you change it in only one place.
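
The migration itself could go something like this - a sketch, where Applicants and its HairColor text column are assumed names for your existing table:

CREATE TABLE HairColor (
HairColorID int IDENTITY (1, 1) PRIMARY KEY,
HairColor varchar (20) NOT NULL UNIQUE
)
GO
INSERT INTO HairColor (HairColor)
SELECT 'Black' UNION ALL SELECT 'Blonde' UNION ALL
SELECT 'Brown' UNION ALL SELECT 'Gray' UNION ALL SELECT 'Red'

-- collapse each legacy value onto an approved one first, e.g.:
UPDATE Applicants SET HairColor = 'Brown' WHERE HairColor = 'Lt. Brown'

-- then swap the text for the key:
ALTER TABLE Applicants ADD HairColorID int NULL
GO
UPDATE a
SET a.HairColorID = h.HairColorID
FROM Applicants a
INNER JOIN HairColor h ON h.HairColor = a.HairColor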
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
.
"Drew" <drewDOTlaingATswvtc.dmhmrsas.virginia.gov> wrote in message
news:eh9TXyLXGHA.5024@.TK2MSFTNGP04.phx.gbl...
I have an Access Database that I am working on normalizing to SQL Server.
This Access database has been revised multiple times, each time the
programmer (not me!) didn't work to get the existing data to the same level
as the revised program. For instance, the HairColor column has the
following DISTINCT values,
Auburn
Black
Black/Gray
Blonde
Brown
Brown/Gray
Dark Brown
Gray
Gray/Black
Gray/Brown
Lt. Brown
Red
Red-blonde
Red/Blonde
Red/Brown
Now the current revision of the Access database limits the values for the
HairColor column to,
Black
Blonde
Brown
Gray
Red
So I need to make the current values reflect the current dropdown. I know
how to go about this, but have a question about normalizing this database.
Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of just
leaving the text in there?
Also, if I am to create a new table, how does the naming convention work
with that? Should I prefix my table name with fk_HairColor, or something
like that?
Any advice is appreciated!
Thanks!
drew|||Thanks for the response... I thought it was better to replace it with a FK,
but am trying to convince my boss the same...
So you don't use a different naming convention with FKs? I just have about
30 tables on this database and it is getting harder to read for each FK
table I include...
Thanks,
Drew
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:O5jEK4LXGHA.3808@.TK2MSFTNGP04.phx.gbl...
>I would create a table named HairColor with the columns:
> HairColorID int
> HairColor varchar (20)
> In any table that references this table, you can create a foreign key.
> You
> will take up less space in your DB overall. Also, consider what would
> happen if you decided to change "Lt. Brown" to "Light Brown". With the
> normalized version, you change it in only one place.
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Toronto, ON Canada
> .
> "Drew" <drewDOTlaingATswvtc.dmhmrsas.virginia.gov> wrote in message
> news:eh9TXyLXGHA.5024@.TK2MSFTNGP04.phx.gbl...
> I have an Access Database that I am working on normalizing to SQL Server.
> This Access database has been revised multiple times, each time the
> programmer (not me!) didn't work to get the existing data to the same
> level
> as the revised program. For instance, the HairColor column has the
> following DISTINCT values,
> Auburn
> Black
> Black/Gray
> Blonde
> Brown
> Brown/Gray
> Dark Brown
> Gray
> Gray/Black
> Gray/Brown
> Lt. Brown
> Red
> Red-blonde
> Red/Blonde
> Red/Brown
> Now the current revision of the Access database limits the values for the
> HairColor column to,
> Black
> Blonde
> Brown
> Gray
> Red
> So I need to make the current values reflect the current dropdown. I know
> how to go about this, but have a question about normalizing this database.
> Should I create a seperate table, HairColor and then assign ID's to the
> HairColor column? What is the benefit of doing it this way, instead of
> just
> leaving the text in there?
> Also, if I am to create a new table, how does the naming convention work
> with that? Should I prefix my table name with fk_HairColor, or something
> like that?
> Any advice is appreciated!
> Thanks!
> drew
>|||I did say to use a foreign key. Here's an example. Let's say that you have
a Persons table, and each person has a hair colour. Here's how the Persons
table would go:
create table Persons
(
PersonID int primary key
, HairColorID int not null
CONSTRAINT FK1_Persons FOREIGN KEY
REFERENCES HairColor (HairColorID)
, ...
)
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
.
"Drew" <drewDOTlaingATswvtc.dmhmrsas.virginia.gov> wrote in message
news:uW2Bs7LXGHA.196@.TK2MSFTNGP04.phx.gbl...
Thanks for the response... I thought it was better to replace it with a FK,
but am trying to convince my boss the same...
So you don't use a different naming convention with FKs? I just have about
30 tables on this database and it is getting harder to read for each FK
table I include...
Thanks,
Drew
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:O5jEK4LXGHA.3808@.TK2MSFTNGP04.phx.gbl...
>I would create a table named HairColor with the columns:
> HairColorID int
> HairColor varchar (20)
> In any table that references this table, you can create a foreign key.
> You
> will take up less space in your DB overall. Also, consider what would
> happen if you decided to change "Lt. Brown" to "Light Brown". With the
> normalized version, you change it in only one place.
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Toronto, ON Canada
> .
> "Drew" <drewDOTlaingATswvtc.dmhmrsas.virginia.gov> wrote in message
> news:eh9TXyLXGHA.5024@.TK2MSFTNGP04.phx.gbl...
> I have an Access Database that I am working on normalizing to SQL Server.
> This Access database has been revised multiple times, each time the
> programmer (not me!) didn't work to get the existing data to the same
> level
> as the revised program. For instance, the HairColor column has the
> following DISTINCT values,
> Auburn
> Black
> Black/Gray
> Blonde
> Brown
> Brown/Gray
> Dark Brown
> Gray
> Gray/Black
> Gray/Brown
> Lt. Brown
> Red
> Red-blonde
> Red/Blonde
> Red/Brown
> Now the current revision of the Access database limits the values for the
> HairColor column to,
> Black
> Blonde
> Brown
> Gray
> Red
> So I need to make the current values reflect the current dropdown. I know
> how to go about this, but have a question about normalizing this database.
> Should I create a seperate table, HairColor and then assign ID's to the
> HairColor column? What is the benefit of doing it this way, instead of
> just
> leaving the text in there?
> Also, if I am to create a new table, how does the naming convention work
> with that? Should I prefix my table name with fk_HairColor, or something
> like that?
> Any advice is appreciated!
> Thanks!
> drew
>|||>> Should I create a seperate table, HairColor and then assign ID's to the
HairColor column? What is the benefit of doing it this way, instead of
just leaving the text in there? <<
Let's start with basics:
1) Do your research! Is there an industry standard code for hair
colors? Use it.
2) If not, then design an encoding that has a missing and/or misc code
value -- Hey, I am bald!
3) If you expect the values to change, the list to be long, or that you
will need to add other information later, then put the encoding into a
separate table.
4) If you expect the values to be static, the list to be short or that
you will not need to add other information later, then put the encoding
into a CHECK() constraint. For example, the ISO sex codes are going to
stay the same for awhile, as are the two-letter state codes.
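
For example, a CHECK() version of a short static list might look like this (a sketch; the table is hypothetical and the values follow the ISO 5218 sex codes):

CREATE TABLE Personnel
(personnel_id INTEGER NOT NULL PRIMARY KEY,
sex_code CHAR(1) DEFAULT '0' NOT NULL
CHECK (sex_code IN ('0', '1', '2', '9')));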
Get a copy of SQL PROGRAMMING STYLE. You never, never prefix a data
element name with things to tell you (1) where it is used -- so no
table names (2) how it is used -- so no pk- or fk- crap (3) how it is
stored -- so no data types. Follow ISO-11179 rules and name a thing for
what it is, independent of its location or usage.
Do not blindly design an encoding as a numbered list. Think about
hierarchical ("Dewey Decimal") codes, abbreviation codes, etc. That
is also in SQL PROGRAMMING STYLE.|||I second getting SQL Programming Style. I read it in two weeks. It is
an easy read.|||For the record, I was agreeing with you!
Drew
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:eYsy3EMXGHA.752@.TK2MSFTNGP02.phx.gbl...
>I did say to use a foreign key. Here's an example. Let's say that you
>have
> a Persons table, and each person has a hair colour. Here's how the
> Persons
> table would go:
> create table Persons
> (
> PersonID int primary key
> , HairColorID int not null
> FOREIGN KEY FK1_Persons
> REFERENCES HairColor
> , ...
> )
>
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Toronto, ON Canada
> .
> "Drew" <drewDOTlaingATswvtc.dmhmrsas.virginia.gov> wrote in message
> news:uW2Bs7LXGHA.196@.TK2MSFTNGP04.phx.gbl...
> Thanks for the response... I thought it was better to replace it with a
> FK,
> but am trying to convince my boss the same...
> So you don't use a different naming convention with FKs? I just have
> about
> 30 tables on this database and it is getting harder to read for each FK
> table I include...
> Thanks,
> Drew
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:O5jEK4LXGHA.3808@.TK2MSFTNGP04.phx.gbl...
>

Normalizing full-text search rank?

I read many good postings here about the weird logic of full-text
ranking, and I hope I finally got it (if it is possible at all ;).
Yet... We built an application to search our database in various
ways, incl. full-text search. I want to show users only "relevant"
matches, i.e. matches that score above some threshold. For other
(simple) search options one can easily return a score between 0 and
100% - for example, matching 4 of 5 parameters searched will count
80%. How about full-text search, though? I can't simply normalize the
range 0-1000 saying rank 0 = 0%, rank 1000 = 100%, since even exact
matches (e.g. searching for "salt" in a row containing just that one
single word) return ranks as low as 160.
Any ideas, thoughts, considerations are welcome!
This can't really be done. The problem is that rank changes per search
argument (SARG)/token. So searching on a word like "salt" might return a
set of relatively low rankings, but searching on a word which occurs
relatively rarely in your document set like "anthropomorphological" would
return a higher rank.
Words/SARGs/Tokens which occur relatively rarely have higher resolving power
than Words/SARGs/Tokens which occur frequently. For instance, consider
searching for a friend named John Smith. John Smith is a very common name,
so you need something that distinguishes your John Smith from all other John
Smiths - i.e. something relatively rare - like a middle name. Ranking weights
common Words/SARGs/Tokens lower than Words/SARGs/Tokens which occur more
rarely, as these Words/SARGs/Tokens have higher resolving power and hence
relevance.
Ranking is an attempt to measure relevance. It is a statistical measure and
is useful for ordering search results. Your actual "hit" could be the first,
second, or 15th result.
The best thing to do, is to order your results, pick the highest ranked
item, divide all other ranks by this value and multiply by 100.
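
In T-SQL that might look something like this - a sketch, with a hypothetical Documents table whose Body column is full-text indexed:

SELECT d.DocID, k.[RANK] * 100 / m.MaxRank AS RelevancePct
FROM CONTAINSTABLE(Documents, Body, 'salt') k
INNER JOIN Documents d ON d.DocID = k.[KEY]
CROSS JOIN (SELECT MAX([RANK]) AS MaxRank
FROM CONTAINSTABLE(Documents, Body, 'salt')) m
ORDER BY k.[RANK] DESC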
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
"istoynev" <istoynev@.hotmail.com> wrote in message
news:3b49ac82.0405142252.1a80edab@.posting.google.c om...
> I read many good postings here about the weird logic of full-text
> ranking, and I hope I finally got it (if it is possible at all ;).
> Yet... We biult an application to search in our database in varoius
> ways, incl. full-text search. I want to show users only "relevant"
> matches, i.e. matches that score above some threshold. For other
> (simple) search options one can easily return a score between 0 and
> 100% - for example, macthing 4 of 5 parameters searched will count
> 80%. How about full-text search, though? I can't simply normalize the
> range 0-1000 saying rank 0 = 0%, rank 1000 = 100%, since even exact
> matches (e.g. searching for "salt" in a row containing just that one
> single word) return ranks as low as 160.
> Any ideas, thoughts, considerations are welcome!
|||Thanks, Hilary!
I also thought about this trick, although it has a bitter
performance impact, I am afraid... But it might be the only feasible
solution.
"Hilary Cotter" <hilaryk@.att.net> wrote in message news:<OiOcxB0OEHA.3300@.TK2MSFTNGP09.phx.gbl>...
> The best thing to do, is to order your results, pick the highest ranked
> item, divide all other ranks by this value and multiply by 100.
|||Again, performance-wise a T-SQL solution would be expensive; if it was done on the client it would not be that expensive.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

normalizing data warehouse

Here's a story. I'm in a warehouse and I'm to normalize 8M records
The employee dimension is composed of the following fields
Employee_key, employee_no, center_id, date_hired and other fields
What I want is to
1. Select distinct employee_no, centerid, datehired
Plus
2. the "top 1 employee_key" per group if grouped by
(employee_no,centerid,datehired)
3. no cursor pls.
the resultset
employee_no, centerid, datehired, employee_key --<--top 1
thank you, thank you...
thanks,
Jose de Jesus Jr. Mcp,Mcdba
Data Architect
Sykes Asia (Manila philippines)
MCP #2324787|||Does this help? You could consider using the min() function instead...
select
max(employee_key) as EmpKey,
employee_no,
centerid,
datehired
from
YourTable -- (substitute your table name; "table" itself is a reserved word)
group by
employee_no, centerid, datehired
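
On SQL Server 2005 you could also keep the whole row per group with ROW_NUMBER() - a sketch, with EmployeeDim standing in for your dimension table:

SELECT employee_key, employee_no, centerid, datehired
FROM (SELECT employee_key, employee_no, centerid, datehired,
ROW_NUMBER() OVER (PARTITION BY employee_no, centerid, datehired
ORDER BY employee_key) AS rn
FROM EmployeeDim) x
WHERE rn = 1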
MC
"Jose G. de Jesus Jr MCP, MCDBA" <Email me> wrote in message
news:AF491592-2324-42A0-B397-FBA301F3DD6F@.microsoft.com...
> Here's a story. I'm in a warehouse and I'm to normalize 8M records
> The employee dimesion is composed of the following fields
> Employee_key, employee_no, center_id, date_hired and other fields
> What I want is to
> 1. Select disctint employee_no, centerid, datehired
> Plus
> 2. the "top 1 employee_key" per group if grouped by
> (employee_no,centerid,datehired)
> 3. no cursor pls.
> the resultset
> employee_no, centerid, datehired, employee_key --<--top 1
> thank you, thank you.
>
> --
> thanks,
> --
> Jose de Jesus Jr. Mcp,Mcdba
> Data Architect
> Sykes Asia (Manila philippines)
> MCP #2324787

Normalizing Address Information...

THE LAYOUT:
I have two tables: "Applicant_T" and "StreetSuffix_T"

The "Applicant_T" table contains fields for the applicant's current address, previous address and employer address. Each address is broken up into parts (i.e., street number, street name, street suffix, etc.). For this discussion, I will focus on the street suffix. For each of the addresses, I have a street suffix field as follows:

[Applicant_T]
CurrSuffix
PrevSuffix
EmpSuffix

The "StreetSuffix_T" table contains the postal service approved street suffix names. There are two fields as follows:

[StreetSuffix_T]
SuffixID <--this is the primary key
Name

For each of the addresses in the Applicant_T table, I input the SuffixID of the StreetSuffix_T table.

THE PROBLEM:
I have never created a view that would require the primary key of one table to be associated with multiple fields of another table (i.e., SuffixID-->CurrSuffix, SuffixID-->PrevSuffix, SuffixID-->EmpSuffix). I want to create a view of the Applicant_T table that will show the suffix name from the StreetSuffix_T table for each of the suffix fields in the Applicant_T table. How is this done?|||I got the solution from another forum. It is as follows:


create view ApplicantAddresses
( currstreetnumber
, currstreetname
, ...
, currsuffixname
, prevsuffixname
, empsuffixname
)
as
select currstreetnumber
, currstreetname
, ...
, c.name
, p.name
, e.name
from Applicant_T
inner
join StreetSuffix_T c
on currsuffix = c.SuffixID
inner
join StreetSuffix_T p
on prevsuffix = p.SuffixID
inner
join StreetSuffix_T e
on empsuffix = e.SuffixID
|||Having the primary key of one table be associated with multiple fields in another table is not unusual. However, it does bring a "Spock's raised eyebrow" indicating a table design that could be improved upon. To use the same lookup table more than once you just need to create table aliases so that the query can distinguish between the tables. You also need aliases in the Select clause so you can tell them apart.

SELECT dbo.StreetSuffix_T.Suffix AS CurrSuffix, StreetSuffix_T_1.Suffix AS PrevSuffix, StreetSuffix_T_2.Suffix AS EmpSuffix
FROM dbo.Applicant_T INNER JOIN
dbo.StreetSuffix_T ON dbo.Applicant_T.CurrSuffix = dbo.StreetSuffix_T.SuffixID INNER JOIN
dbo.StreetSuffix_T StreetSuffix_T_1 ON dbo.Applicant_T.PrevSuffix = StreetSuffix_T_1.SuffixID INNER JOIN
dbo.StreetSuffix_T StreetSuffix_T_2 ON dbo.Applicant_T.EmpSuffix = StreetSuffix_T_2.SuffixID

However, the design of Applicant_T seems a bit suspect since you have multiple addresses. Maybe today you want current, former, and employer addresses but in the future you might need another one (spouses employer address, delivery address, second employer address, whatever). With your current design you'd need to add a bunch of new columns whenever you add a new address type.

It may be better to have a separate table that just handles addresses. It might have two keys, one to link back to the applicant_T table and another for the type of address held in that record. (0=current address, 1=previous address, 100=employer address, 1000=dog groomers address).|||McMurdoStation,

You are right. In my pursuit of a solution to a short-term issue, I neglected to consider the longer-term impact of my decision. I am going to change the table design.

Thanks for your input :)|||Yeah, you save less than a byte to normalize that, and lose tons of rotations in the joins.

Sacrifice space for efficiency in runtimes every once in a while.|||The idea of normalization isn't to save space. If normalizing causes performance bottlenecks you can do things to address that. But that problem is easier to address when and if necessary than having a data model that can't handle change.|||In my particular case, scalability is of high importance. Normalizing the addresses in the manner McMurdoStation suggested provides such scalability.

Normalizing a Crosstab

I re-designed a predecessor's database so that it is more properly
normalized. Now, I must migrate the data from the legacy system into
the new one. The problem is that one of the tables is a CROSSTAB
TABLE. Yes, the actual table is laid out in a cross-tabular fashion.
What is a good approach for moving that data into normalized tables?

This is the original table:

CREATE TABLE [dbo].[Sensitivities](
[Lab ID#] [int] NULL,
[Organism name] [nvarchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[Source] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[BACITRACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CEPHALOTHIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CHLORAMPHENICOL] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[CLINDAMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ERYTHROMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[SULFISOXAZOLE] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[NEOMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[OXACILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[PENICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TETRACYCLINE] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[TOBRAMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[VANCOMYCIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TRIMETHOPRIM] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[CIPROFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[AMIKACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[AMPICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CARBENICILLIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[CEFTAZIDIME] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[GENTAMICIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[OFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[POLYMYXIN B] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[MOXIFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[GATIFLOXACIN] [nvarchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL,
[SENSI NOTE] [nvarchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]imani_technology_spam@.yahoo.com wrote:

Quote:

Originally Posted by

I re-designed a predecessor's database so that it is more properly
normalized. Now, I must migrate the data from the legacy system into
the new one. The problem is that one of the tables is a CROSSTAB
TABLE. Yes, the actual table is laid out in a cross-tabular fashion.
What is a good approach for moving that data into normalized tables?
>
This is the original table:
>


[snip]

What does the new structure look like? If these are drugs then I guess
you ought to use a formal coding scheme for medical terms (SNOMED for
example).

In general you can un-pivot as follows. SQL Server 2005 also has an
UNPIVOT keyword that makes things a bit easier. Look that up in Books
Online if you are using 2005.

SELECT s.LabIDNo, s.OrganismName, s.Source, c.Code, s.Sensitivity
FROM
(SELECT [Lab ID#], [Organism name], [Source],
BACITRACIN, 'BACITRACIN'
FROM dbo.Sensitivities
WHERE BACITRACIN <> ''
UNION ALL
SELECT [Lab ID#], [Organism name], [Source],
CEPHALOTHIN, 'CEPHALOTHIN'
FROM dbo.Sensitivities
WHERE CEPHALOTHIN <> ''
UNION ALL
SELECT [Lab ID#], [Organism name], [Source],
CHLORAMPHENICOL, 'CHLORAMPHENICOL'
FROM dbo.Sensitivities
WHERE CHLORAMPHENICOL <> ''
/* UNION ...etc */
) AS s (LabIDNo, OrganismName, Source, Sensitivity, Term)
LEFT JOIN MedicalCodes AS c
/* Note: I assume the target table won't allow nulls
LEFT JOIN guarantees you'll get an error if the code is missing
*/
ON s.Term = c.Term ;

(untested)
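
For comparison, the 2005 UNPIVOT form would look roughly like this (a sketch covering just three of the antibiotic columns; UNPIVOT skips NULL cells by itself, but the empty-string filter is still needed):

SELECT [Lab ID#], [Organism name], [Source], Term, Sensitivity
FROM (SELECT [Lab ID#], [Organism name], [Source],
BACITRACIN, CEPHALOTHIN, CHLORAMPHENICOL
FROM dbo.Sensitivities) p
UNPIVOT (Sensitivity FOR Term IN
(BACITRACIN, CEPHALOTHIN, CHLORAMPHENICOL)) AS u
WHERE Sensitivity <> '' ;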

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/...US,SQL.90).aspx
--|||Sorry, I guess the new table got cut off. Hopefully, this can help:

CREATE TABLE [dbo].[Sensitivities](
[SensiID] [smallint] IDENTITY(1,1) NOT NULL,
[AntibioticID] [smallint] NULL,
[SourceID] [smallint] NULL,
[SusceptID] [smallint] NULL,
[OrganismID] [smallint] NULL,
[SensiNote] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

Each ID field represents an FK except for the IDENTITY (of course). In
the old table (the crosstab), each Antibiotic is in a separate column.
In the new structure, those columns will become rows in the
Antibiotics table.|||imani_technology_spam@yahoo.com wrote:

Quote:

Originally Posted by

Sorry, I guess the new table got cut off. Hopefully, this can help:
>
CREATE TABLE [dbo].[Sensitivities](
[SensiID] [smallint] IDENTITY(1,1) NOT NULL,
[AntibioticID] [smallint] NULL,
[SourceID] [smallint] NULL,
[SusceptID] [smallint] NULL,
[OrganismID] [smallint] NULL,
[SensiNote] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,


You call that normalized!? Why make all those columns nullable? What is
the natural key supposed to be?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/...US,SQL.90).aspx
--|||Wow, back to the drawing board.

The original table looks like this:

Lab ID | Organism Name | Source | BACITRACIN

Monday, March 26, 2012

Normalization question

Hello,
I have a question about the benefits/gain or correctness
according to RDBMS convention of normalizing data tables.
My table in question is a single table that contains about
5,000,000 records that are queried to generate reports.
It does not feed any other tables. The un-normalized
table (tbl1) looks like this:
idNum Area Date Metric Value
1 A 1/1/2005 Q 3
1 A 1/1/2005 R 0
1 A 1/1/2005 S 17
1 A 1/1/2005 T 9
2 B 1/1/2005 Q 11
2 B 1/1/2005 R 8
2 B 1/1/2005 S 2
2 B 1/1/2005 T 30
...
10 A 1/2/2005 Q 3
10 A 1/2/2005 R 0
10 A 1/2/2005 S 17
10 A 1/2/2005 T 9
11 B 1/2/2005 Q 20
11 B 1/2/2005 R 22
11 B 1/2/2005 S 0
11 B 1/2/2005 T 17
...
So in this table I am capturing 4 metrics, Q, R, S, T for
each Area (A, B, ...) for each day. So for each Metric, I
am repeating the idNum, Area, and Date in each row.
In the normalized version, tbl1 would contain one
record/row for each idNum, Area, Date, and tbl2 would
contain the 4 rows of idNums, Metrics for each idNum,
Area, Date in tbl1
tbl1 tbl2
idNum Area Date idNum Metric Value
1 A 1/1/2005 1 Q 3
1 R 0
1 S 17
1 T 9
tbl1 would now shrink from 5,000,000 rows to about 50,000
(there's actually like 20 metrics for each area) but tbl2
would still have the 5,000,000 rows. My server has
300gigs drivespace, two 3gig cpu's, 8gigs ram. With the
normalized tables, I have 2 tables to manage now and have
to add a join for all the queries. I don't see the
benefit of normalizing my table for this scenario.
However, there may be concepts of RDBMS that I don't
understand where normalization is the correct way to go.
Any advice on this would be greatly appreciated.
Thanks,
Ron"Ron" <anonymous@.discussions.microsoft.com> wrote in message
news:1eec01c515d5$d62407b0$a601280a@.phx.gbl...
> idNum Area Date Metric Value
> 1 A 1/1/2005 Q 3
> 1 A 1/1/2005 R 0
> 1 A 1/1/2005 S 17
> 1 A 1/1/2005 T 9
> 2 B 1/1/2005 Q 11
> 2 B 1/1/2005 R 8
> 2 B 1/1/2005 S 2
> 2 B 1/1/2005 T 30
The main practical benefit of normalization (IMO) is elimination of
duplicate data. This has two sub-benefits: 1) reduction of storage space,
and 2) ease of updating. Depending on your circumstances, either or both of
these sub-benefits may not apply.|||Thanks for your reply. The main thing I was checking for
was if there were some concept I wasn't aware of. I can
see the reduced storage. And I suppose it is easier to
search through 50,000 rows vs 5,000,000 rows for stuff.
The conclusion I come to is that you can get more
performance with Normalization - or - it is easier to
write code for a single table but less performance. At my
place, the coding seems to change more than the data. I
guess the bottom line will be who is going to do more
work - me or the computer.

> [snip]|||Normalizing the table will prevent update anomalies and may also increase
performance, despite the joins. Querying a table with 50,000 records with an
index takes roughly 16 compares, whereas querying 5,000,000 records with an
index takes roughly 23 compares. In addition, smaller records mean more
records per data page, which should reduce disk reads. There is no doubt
that reading the entire table will perform slower with a join, but it has
been my experience that most queries retrieve only a small fraction of the
data, so an inner loop or merge join (with a clustered index on the common field)
will actually perform better with two tables because of the reduced number of
disk reads required to retrieve the data.
I make it a habit to fully normalize a database and then denormalize only to
simplify joins of many tables, and then only when there is a perceived
performance problem that cannot be fixed with a judiciously placed index.
"Ron" wrote:

> [snip]|||>> So in this table I am capturing 4 metrics, Q, R, S, T for each Area
(A, B, ...) for each day. So for each Metric, I am repeating the
idNum, Area, and Date in each row. <<
1) id_num is redundant.
2) DATE is a reserved word.
3) Since the metrics are attributes of a reading, they need to be in
columns, not expressed as values. A normalized table would look like
this:
CREATE TABLE Readings
(area_code CHAR(1) NOT NULL,
collection_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
Q_value INTEGER DEFAULT 0 NOT NULL,
R_value INTEGER DEFAULT 0 NOT NULL,
S_value INTEGER DEFAULT 0 NOT NULL,
T_value INTEGER DEFAULT 0 NOT NULL,
PRIMARY KEY (area_code, collection_date ));
Obviously you will need to add CHECK() constraints and the right
DEFAULT values.

Normalisation question? Help

I have unnormalised data, and I would like to know if I'm normalizing the data correctly. It's my homework, and it's my first time doing normalisation as it's so hard to understand. Thanks in advance.

UNF

Course code

Course title

Student Code

Student Name

Address

DOB

Telephone number

Module code

Module title

Tutor code

Tutor name

1NF

Student Code (Primary key)

Student Name

Address

DOB

Telephone number

-

Student Code( Foreign key)

Course code (Primary key)

Course title

Module code

Module title

Tutor code

Tutor name

2NF

Student Code (Primary key)

Student Name

Address

DOB

Telephone number

-

Course code (Primary key)

Student Code (Foreign key)

Course title

--

Module code (Primary key)

Module title

Tutor code

Tutor name

3NF

Student Code (Primary key)

Student Name

Address

DOB

Telephone number

-

Course code (Primary key)

Student Code (Foreign key)

Course title

Module Code(Primary key)

Student code (Foreign key)

Module title

--

Tutor code

Student code

Tutor name

In addition,

1. each course is taken by one or more students

2. each course is made up of one or more modules

3. a tutor may be teaching on one or more modules

4. each module must be delivered by one and only one tutor.

help pls..... :( I'm stuck