Monday, March 26, 2012

Normalization Question....

Hi,
I have a table of entities. Each of which has a unique ID. For a certain
subset of those entities, a particular set of values can apply to a pair of
entities.
Let's call the entity ID's E1 and E2 and the associated values x, y and z.
E1 and E2 are GUID's and x, y and z are floats.
My temptation is to create a table that has E1 and E2 as the primary key and
have columns for x, y and z. However, since E1 and E2 both represent an ID
column that relate to the same table, I feel I am breaking the first normal
form.
Essentially what needs to happen is that a pair of entity ID's needs to be
related to x, y and z values and I want the table normalized. Any advice
would be appreciated. Thanks in advance.
-PetePete Wittig wrote:
> Hi,
> I have a table of entities. Each of which has a unique ID. For a certain
> subset of those entities, a particular set of values can apply to a pair o
f
> entities.
> Let's call the entity ID's E1 and E2 and the associated values x, y and z.
> E1 and E2 are GUID's and x, y and z are floats.
> My temptation is to create a table that has E1 and E2 as the primary key a
nd
> have columns for x, y and z. However, since E1 and E2 both represent an I
D
> column that relate to the same table, I feel I am breaking the first norma
l
> form.
> Essentially what needs to happen is that a pair of entity ID's needs to be
> related to x, y and z values and I want the table normalized. Any advice
> would be appreciated. Thanks in advance.
> -Pete
It is extremely difficult to give goosd advise on such questions
without the opportunity to analyse your requirements in detail. Here
are some ideas. If you are saying that (x,y,z) uniquely defines a set
of entities then you can implement a cascading foreign key:
CREATE TABLE foo (x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT NOT NULL,
PRIMARY KEY (x,y,z));
CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, x FLOAT NOT
NULL, y FLOAT NOT NULL, z FLOAT NOT NULL, FOREIGN KEY (x,y,z)
REFERENCES foo (x,y,z) ON UPDATE CASCADE);
Perhaps more likely though is that your set is defined by some other
attribute(s) you haven't mentioned (foo_key in this example):
CREATE TABLE foo (foo_key INTEGER NOT NULL PRIMARY KEY, x FLOAT NOT
NULL, y FLOAT NOT NULL, z FLOAT NOT NULL);
CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, foo_key /* ?
*/ INTEGER NOT NULL REFERENCES foo (foo_key));
Your choice of datatypes would make me want to review this design. I've
almost never used FLOAT in tables. Although it certainly has legitimate
uses, most of the time the exact types (INTEGER or DECIMAL for example)
are much more useful. FLOAT probably isn't a good choice for a key
because of its imprecise nature. I have difficulty making sense of your
business requirement if you don't have the additional key I proposed
for the "foo" table but then I don't know your business...
Also, UNIQUEIDENTIFIER, while it may be used as a PK isn't generally
good as the only key of a table. Certainly it shouldn't be so if its
purpose is an artificial surrogate.
David Portas
SQL Server MVP
--|||Hi David,
Thank you for the response. Allow me to clarify: I have a table that
describes my entities (call it Table E). E contains 1...n entries.
CREATE TABLE E (E_key UNIQUEIDENTIFIER PRIMARY KEY, Name nvarchar(100) NOT
NULL);
The other table I was tempted to create would have looked like this:
CREATE TABLE Params (E1 UNIQUEIDENTIFIER, E2 UNIQUEIDENTIFIER, X FLOAT NOT
NULL, Y FLOAT NOT NULL, Z FLOAT NOT NULL,
PRIMARY KEY (E1, E2))
Where E1 and E2 would have been entries from Table E. So certain pairs of
entities (i.e. entry 5 and 2, entry 19 and 20) have parameters X, Y and Z.
X, Y and Z are dependent on the pair of entities.
If I understand you correctly, I think this method would work for me best:
"CREATE TABLE foo (foo_key INTEGER NOT NULL PRIMARY KEY, x FLOAT NOT
NULL, y FLOAT NOT NULL, z FLOAT NOT NULL);
CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, foo_key /* ?
*/ INTEGER NOT NULL REFERENCES foo (foo_key));"
However, allow me ask a few questions for clarification. In this case,
foo_key is an arbitrary key (i.e. an autonumber or a GUID)?
Thanks again.
BTW: My reasoning for using floats is because I am storing math/scientific
data so I need a high level of precision.
-Pete
"David Portas" wrote:

> Pete Wittig wrote:
> It is extremely difficult to give goosd advise on such questions
> without the opportunity to analyse your requirements in detail. Here
> are some ideas. If you are saying that (x,y,z) uniquely defines a set
> of entities then you can implement a cascading foreign key:
> CREATE TABLE foo (x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT NOT NULL,
> PRIMARY KEY (x,y,z));
> CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, x FLOAT NOT
> NULL, y FLOAT NOT NULL, z FLOAT NOT NULL, FOREIGN KEY (x,y,z)
> REFERENCES foo (x,y,z) ON UPDATE CASCADE);
> Perhaps more likely though is that your set is defined by some other
> attribute(s) you haven't mentioned (foo_key in this example):
> CREATE TABLE foo (foo_key INTEGER NOT NULL PRIMARY KEY, x FLOAT NOT
> NULL, y FLOAT NOT NULL, z FLOAT NOT NULL);
> CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, foo_key /* ?
> */ INTEGER NOT NULL REFERENCES foo (foo_key));
> Your choice of datatypes would make me want to review this design. I've
> almost never used FLOAT in tables. Although it certainly has legitimate
> uses, most of the time the exact types (INTEGER or DECIMAL for example)
> are much more useful. FLOAT probably isn't a good choice for a key
> because of its imprecise nature. I have difficulty making sense of your
> business requirement if you don't have the additional key I proposed
> for the "foo" table but then I don't know your business...
> Also, UNIQUEIDENTIFIER, while it may be used as a PK isn't generally
> good as the only key of a table. Certainly it shouldn't be so if its
> purpose is an artificial surrogate.
> --
> David Portas
> SQL Server MVP
> --
>|||Pete Wittig wrote:
> Hi David,
> Thank you for the response. Allow me to clarify: I have a table that
> describes my entities (call it Table E). E contains 1...n entries.
> CREATE TABLE E (E_key UNIQUEIDENTIFIER PRIMARY KEY, Name nvarchar(100) NOT
> NULL);
> The other table I was tempted to create would have looked like this:
> CREATE TABLE Params (E1 UNIQUEIDENTIFIER, E2 UNIQUEIDENTIFIER, X FLOAT NOT
> NULL, Y FLOAT NOT NULL, Z FLOAT NOT NULL,
> PRIMARY KEY (E1, E2))
> Where E1 and E2 would have been entries from Table E. So certain pairs of
> entities (i.e. entry 5 and 2, entry 19 and 20) have parameters X, Y and Z.
> X, Y and Z are dependent on the pair of entities.
> If I understand you correctly, I think this method would work for me best:
> "CREATE TABLE foo (foo_key INTEGER NOT NULL PRIMARY KEY, x FLOAT NOT
> NULL, y FLOAT NOT NULL, z FLOAT NOT NULL);
> CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, foo_key /* ?
> */ INTEGER NOT NULL REFERENCES foo (foo_key));"
> However, allow me ask a few questions for clarification. In this case,
> foo_key is an arbitrary key (i.e. an autonumber or a GUID)?
> Thanks again.
> BTW: My reasoning for using floats is because I am storing math/scientific
> data so I need a high level of precision.
> -Pete
>
Your choice of keys in the Params table looks very suspect to me. If E1
or E2 can appear multiple times then apparently these are groups, not
pairs. I had undestood that E1 determined (x,y,z) and that E2
determined the SAME (x,yz). If so, your table creates redundancy
because you allow E1 or E2 to be added multiple times with different
values of (x,y,z). Based on your original post I would have guessed:
CREATE TABLE Params (E1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
REFERENCES e (e_key), E2 UNIQUEIDENTIFIER NOT NULL UNIQUE REFERENCES e
(e_key), x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT NOT NULL)
but now I have my doubts as to whether I understood you.
Design-by-newsgroup is a recipe for these kinds of misunderstandings
unfortunately.

> CREATE TABLE entities (e UNIQUEIDENTIFIER PRIMARY KEY, foo_key /* ?
> */ INTEGER NOT NULL REFERENCES foo (foo_key));"
> However, allow me ask a few questions for clarification. In this case,
> foo_key is an arbitrary key (i.e. an autonumber or a GUID)?
I doubt it, but you tell me. I was guessing there might be some
meaningful attribute that relates pairs or sets of entities together.
You wouldn't (shouldn't) need to use an artificial key unless (x,y,z)
are unique in foo.
David Portas
SQL Server MVP
--|||Thanks for the reply. Let me try and put this into context. In my
application, I want to look up an entity. That entity has certain attribute
s
which are returned such as its name. Another type of information is its
interaction parameters, how it interacts with other entities. This is the
data that I am trying to store.
So for a entity 1, I would return its name. Then I would want to return all
the ID's of the other entities it interacts with as well as the x, y and z
parameters associated with each particular pair/interaction (so potentially,
entity 1 & entity 2 and their associated x, y and z, entity 1 & entity 5 and
their associated x, y and z...).
Additionally, I will have a tool in which I can enter two entities and
return their x, y and z parameters.
In the application I will have the ID for entity 1, what I need to get is
all entity ID's that have an interaction with entity 1 and the associated x,
y and z parameters for that interaction.
I think you are correct in saying the key in my table Params looks suspect.
I agree. Since the x, y and z parameters would be the same for the pairs
entity1 & entity 2 and entity 2 & entity 1.
Does this help to clarify?
"David Portas" wrote:

> Pete Wittig wrote:
> Your choice of keys in the Params table looks very suspect to me. If E1
> or E2 can appear multiple times then apparently these are groups, not
> pairs. I had undestood that E1 determined (x,y,z) and that E2
> determined the SAME (x,yz). If so, your table creates redundancy
> because you allow E1 or E2 to be added multiple times with different
> values of (x,y,z). Based on your original post I would have guessed:
> CREATE TABLE Params (E1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
> REFERENCES e (e_key), E2 UNIQUEIDENTIFIER NOT NULL UNIQUE REFERENCES e
> (e_key), x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT NOT NULL)
> but now I have my doubts as to whether I understood you.
> Design-by-newsgroup is a recipe for these kinds of misunderstandings
> unfortunately.
>
> I doubt it, but you tell me. I was guessing there might be some
> meaningful attribute that relates pairs or sets of entities together.
> You wouldn't (shouldn't) need to use an artificial key unless (x,y,z)
> are unique in foo.
> --
> David Portas
> SQL Server MVP
> --
>|||Pete Wittig wrote:

> Thanks for the reply. Let me try and put this into context. In my
> application, I want to look up an entity. That entity has certain attribu
tes
> which are returned such as its name. Another type of information is its
> interaction parameters, how it interacts with other entities. This is the
> data that I am trying to store.
> So for a entity 1, I would return its name. Then I would want to return a
ll
> the ID's of the other entities it interacts with as well as the x, y and z
> parameters associated with each particular pair/interaction (so potentiall
y,
> entity 1 & entity 2 and their associated x, y and z, entity 1 & entity 5 a
nd
> their associated x, y and z...).
> Additionally, I will have a tool in which I can enter two entities and
> return their x, y and z parameters.
> In the application I will have the ID for entity 1, what I need to get is
> all entity ID's that have an interaction with entity 1 and the associated
x,
> y and z parameters for that interaction.
> I think you are correct in saying the key in my table Params looks suspect
.
> I agree. Since the x, y and z parameters would be the same for the pairs
> entity1 & entity 2 and entity 2 & entity 1.
> Does this help to clarify?
>
>
If we are only talking about pairs of entities then I think we're
nearly there. Just add the constraint E1<E2 so that you can never get
redundant pairs (A-B and B-A):
CREATE TABLE Params (E1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
REFERENCES e (e_key), E2 UNIQUEIDENTIFIER NOT NULL UNIQUE REFERENCES e
(e_key), CHECK (e1 < e2), x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT
NOT NULL);
BTW I have left out the constraint names for brevity. You should always
give constraints your own names rather than let the system generate
them. It makes them much easier to maintain.
David Portas
SQL Server MVP
--|||Thanks for the reply.
In regards to this table:
CREATE TABLE Params (E1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
REFERENCES e (e_key), E2 UNIQUEIDENTIFIER NOT NULL UNIQUE REFERENCES e
(e_key), CHECK (e1 < e2), x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT
NOT NULL);
If I read this table correctly, it has only E1 as the primary key. I
believe that the primary key will have to be E1 and E2 since I could have
values like this (I've substituted ints for unique identifiers for
convenience in this example):
E1 E2 x y z
-- -- -- -- --
1 5 0.1 0.2 0.3
1 6 0.4 0.5 0.6
9 1 0.7 0.8 0.9
If I make the addition of including E2 in the primary key, will that effect
the "CHECK (e1 < e2)"?
I still have a question with regards to normalization. Since there is an E1
and E2 column in the Params table, does this mean it violates the 1NF? If
not, could you please explain why?
Thanks again.
"David Portas" wrote:

> Pete Wittig wrote:
>
> If we are only talking about pairs of entities then I think we're
> nearly there. Just add the constraint E1<E2 so that you can never get
> redundant pairs (A-B and B-A):
> CREATE TABLE Params (E1 UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
> REFERENCES e (e_key), E2 UNIQUEIDENTIFIER NOT NULL UNIQUE REFERENCES e
> (e_key), CHECK (e1 < e2), x FLOAT NOT NULL, y FLOAT NOT NULL, z FLOAT
> NOT NULL);
> BTW I have left out the constraint names for brevity. You should always
> give constraints your own names rather than let the system generate
> them. It makes them much easier to maintain.
> --
> David Portas
> SQL Server MVP
> --
>|||Pete Wittig wrote:
> E1 E2 x y z
> -- -- -- -- --
> 1 5 0.1 0.2 0.3
> 1 6 0.4 0.5 0.6
> 9 1 0.7 0.8 0.9
> If I make the addition of including E2 in the primary key, will that effec
t
> the "CHECK (e1 < e2)"?
Still looks OK to me.

> I still have a question with regards to normalization. Since there is an
E1
> and E2 column in the Params table, does this mean it violates the 1NF? If
> not, could you please explain why?
>
First Normal Form is an elusive concept. Strictly speaking any table
conforms to 1NF if it has a candidate key and doesn't permit nulls. In
practice we look out for attributes that contain more than one domain
of values or have several columns representing the same attribute or
contain data structures encoded in strings. Loosely speaking we say
that these are a violation of 1NF. These are all subjective notions.
Sometimes it seems "obvious" when the spirit of 1NF is being violated.
Sometimes it is more tricky and the judgement will come down to one's
knowledge and experience of what works and what doesn't.
In your case we can say that using a pair of foreign keys in a single
table to represent a many-to-many relationship is a proven design that
appears frequently and that almost every database architect must have
used it. There are alternatives but the major benefit of your design is
that it's very easy to apply the necessary constraints for the business
rules. Your table is in 1NF and I don't see a better design.
David Portas
SQL Server MVP
--|||Thanks very much David. I appreciate the advice.
"David Portas" wrote:

> Pete Wittig wrote:
> Still looks OK to me.
>
> First Normal Form is an elusive concept. Strictly speaking any table
> conforms to 1NF if it has a candidate key and doesn't permit nulls. In
> practice we look out for attributes that contain more than one domain
> of values or have several columns representing the same attribute or
> contain data structures encoded in strings. Loosely speaking we say
> that these are a violation of 1NF. These are all subjective notions.
> Sometimes it seems "obvious" when the spirit of 1NF is being violated.
> Sometimes it is more tricky and the judgement will come down to one's
> knowledge and experience of what works and what doesn't.
> In your case we can say that using a pair of foreign keys in a single
> table to represent a many-to-many relationship is a proven design that
> appears frequently and that almost every database architect must have
> used it. There are alternatives but the major benefit of your design is
> that it's very easy to apply the necessary constraints for the business
> rules. Your table is in 1NF and I don't see a better design.
> --
> David Portas
> SQL Server MVP
> --
>|||Thanks for your help David. I appreciate it.
"David Portas" wrote:

> Pete Wittig wrote:
> Still looks OK to me.
>
> First Normal Form is an elusive concept. Strictly speaking any table
> conforms to 1NF if it has a candidate key and doesn't permit nulls. In
> practice we look out for attributes that contain more than one domain
> of values or have several columns representing the same attribute or
> contain data structures encoded in strings. Loosely speaking we say
> that these are a violation of 1NF. These are all subjective notions.
> Sometimes it seems "obvious" when the spirit of 1NF is being violated.
> Sometimes it is more tricky and the judgement will come down to one's
> knowledge and experience of what works and what doesn't.
> In your case we can say that using a pair of foreign keys in a single
> table to represent a many-to-many relationship is a proven design that
> appears frequently and that almost every database architect must have
> used it. There are alternatives but the major benefit of your design is
> that it's very easy to apply the necessary constraints for the business
> rules. Your table is in 1NF and I don't see a better design.
> --
> David Portas
> SQL Server MVP
> --
>

No comments:

Post a Comment