Monday, March 26, 2012

Normalization connundrum

I've come up with this issue in several apps now. There are things that, from
one perspective, are all handled the same, so it would be desirable that they
all be handled in the same table with some field as a type specification.
From other perspective of foreign key relationships, however, they are
different things and can't be stored in the same table.

For example, I have a scheme for indicating mappings between dimension records
at one time period to new dimension records at another time period. I could
use one set of tables for all mappings since they all work exactly the same
way, but then I can't set up DRI between the mapping tables and the dimension
tables. If I just make separate mapping tables for each dimension table, then
I'm creating 4 new tables per dimension table, all identical with respect to
what fields they contain, what kinds of unique constraints they have, and what
relationships they have to each other with the sole distinction that they each
map to the integer-type key of a different dimension table. I would not look
forward to doing maintenance on this schema!

Is there any strategy for having the cake and eating it, too?Steve Jorgensen wrote:

> I've come up with this issue in several apps now. There are things that, from
> one perspective, are all handled the same, so it would be desirable that they
> all be handled in the same table with some field as a type specification.
> From other perspective of foreign key relationships, however, they are
> different things and can't be stored in the same table.
> For example, I have a scheme for indicating mappings between dimension records
> at one time period to new dimension records at another time period. I could
> use one set of tables for all mappings since they all work exactly the same
> way, but then I can't set up DRI between the mapping tables and the dimension
> tables. If I just make separate mapping tables for each dimension table, then
> I'm creating 4 new tables per dimension table, all identical with respect to
> what fields they contain, what kinds of unique constraints they have, and what
> relationships they have to each other with the sole distinction that they each
> map to the integer-type key of a different dimension table. I would not look
> forward to doing maintenance on this schema!
> Is there any strategy for having the cake and eating it, too?

That things are handled the same way is not a criterion for putting them
into a single table. Let me refer you to Date and Cobb.

A table is a set. If the data constitutes a set as defined by set theory
it belongs in one table. If not ... it doesn't. The sole exceptions
being non-relational models such as star and object schemas.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||On Mon, 05 Jan 2004 17:42:22 -0800, Daniel Morgan <damorgan@.x.washington.edu>
wrote:

>Steve Jorgensen wrote:
>> I've come up with this issue in several apps now. There are things that, from
>> one perspective, are all handled the same, so it would be desirable that they
>> all be handled in the same table with some field as a type specification.
>> From other perspective of foreign key relationships, however, they are
>> different things and can't be stored in the same table.
>>
>> For example, I have a scheme for indicating mappings between dimension records
>> at one time period to new dimension records at another time period. I could
>> use one set of tables for all mappings since they all work exactly the same
>> way, but then I can't set up DRI between the mapping tables and the dimension
>> tables. If I just make separate mapping tables for each dimension table, then
>> I'm creating 4 new tables per dimension table, all identical with respect to
>> what fields they contain, what kinds of unique constraints they have, and what
>> relationships they have to each other with the sole distinction that they each
>> map to the integer-type key of a different dimension table. I would not look
>> forward to doing maintenance on this schema!
>>
>> Is there any strategy for having the cake and eating it, too?
>That things are handled the same way is not a criterion for putting them
>into a single table. Let me refer you to Date and Cobb.
>A table is a set. If the data constitutes a set as defined by set theory
>it belongs in one table. If not ... it doesn't. The sole exceptions
>being non-relational models such as star and object schemas.

Then, the correct wording of the question might refer to avoiding schema
duplication (leading to difficulty of maintenance) rather than normalization.
Perhaps, the best answer in this case would not be the best normalized answer.|||Steve Jorgensen wrote:

> Then, the correct wording of the question might refer to avoiding schema
> duplication (leading to difficulty of maintenance) rather than normalization.
> Perhaps, the best answer in this case would not be the best normalized answer.

Not unless there is some compelling case for denormalization. And I
don't see one in what was posted.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||On Mon, 05 Jan 2004 19:38:41 -0800, Daniel Morgan <damorgan@.x.washington.edu>
wrote:

>Steve Jorgensen wrote:
>> Then, the correct wording of the question might refer to avoiding schema
>> duplication (leading to difficulty of maintenance) rather than normalization.
>> Perhaps, the best answer in this case would not be the best normalized answer.
>Not unless there is some compelling case for denormalization. And I
>don't see one in what was posted.

The compelling aspects as I see them are ...

1. It is simply a good practice to eliminate duplication in code or in a data
model. Any intentional duplication is a source of potential trouble, since
it's possible to make subtle mistakes when attempting to change that shared,
repeated code.

2. Having the number of supporting tables multiplied by the number of
dimension tables makes for a vastly more complex ER diagram that someone will
have to decipher in the future.

3. I wouldn't have to resort to dynamic table names in code or writing a code
generator to avoid duplicating code as well. This is especially an issue if I
am I use stored procedures to update the data in these tables.

Perhaps, you don't think these are compelling, and that's fine. I find them
compelling, though, and I'm still hoping for some advice on how to accomplish
it.

I had one idea in which I can use an intervening table to translate between
the general and specific cases. Basically, I would have a generic dimension
table with a type code, then a specific dimension table (say category) that
also has a dimension ID and a type code and a 1-1 relationship to the generic
dimension on both fields. The specific table, then has a constraint that only
allows the applicable dimension type code as a valid value for the type code
field.

Any opinions on this? Other ideas?|||Steve Jorgensen wrote:

> Perhaps, you don't think these are compelling, and that's fine. I find them
> compelling, though, and I'm still hoping for some advice on how to accomplish
> it.

You are correct. I'd suggest you ask Mr. Celko for his opinion. ;-)

> I had one idea in which I can use an intervening table to translate between
> the general and specific cases. Basically, I would have a generic dimension
> table with a type code, then a specific dimension table (say category) that
> also has a dimension ID and a type code and a 1-1 relationship to the generic
> dimension on both fields. The specific table, then has a constraint that only
> allows the applicable dimension type code as a valid value for the type code
> field.
> Any opinions on this? Other ideas?

The fact that you don't know how to implement this and are talking about
a translation table, at least to me, would be warning enough you are
heading down the wrong road.

--
Daniel Morgan
http://www.outreach.washington.edu/...oad/oad_crs.asp
http://www.outreach.washington.edu/...aoa/aoa_crs.asp
damorgan@.x.washington.edu
(replace 'x' with a 'u' to reply)|||On Mon, 05 Jan 2004 22:06:55 -0800, Daniel Morgan <damorgan@.x.washington.edu>
wrote:

>Steve Jorgensen wrote:
>> Perhaps, you don't think these are compelling, and that's fine. I find them
>> compelling, though, and I'm still hoping for some advice on how to accomplish
>> it.
>You are correct. I'd suggest you ask Mr. Celko for his opinion. ;-)
>> I had one idea in which I can use an intervening table to translate between
>> the general and specific cases. Basically, I would have a generic dimension
>> table with a type code, then a specific dimension table (say category) that
>> also has a dimension ID and a type code and a 1-1 relationship to the generic
>> dimension on both fields. The specific table, then has a constraint that only
>> allows the applicable dimension type code as a valid value for the type code
>> field.
>>
>> Any opinions on this? Other ideas?
>The fact that you don't know how to implement this and are talking about
>a translation table, at least to me, would be warning enough you are
>heading down the wrong road.

I'm not sure what you're saying. I know how to implement what I just
described. As you say, perhaps it's not the "right" approach, and certainly
more complicated than what I'd prefer, but in the absence of any other
suggestions, it's the only one I've come up with besides either implementing
the massive duplication or forgoing the use of DRI.

Also, I'm not sure what seems so obviously wrong to you about the translation
table idea (though I would welcome an explanation). It's defintitely a
kludge, but on the surface, it doesn't look too terrible to me as kludges go.|||Steve Jorgensen (nospam@.nospam.nospam) writes:
> I've come up with this issue in several apps now. There are things
> that, from one perspective, are all handled the same, so it would be
> desirable that they all be handled in the same table with some field as
> a type specification. From other perspective of foreign key
> relationships, however, they are different things and can't be stored in
> the same table.
> For example, I have a scheme for indicating mappings between dimension
> records at one time period to new dimension records at another time
> period. I could use one set of tables for all mappings since they all
> work exactly the same way, but then I can't set up DRI between the
> mapping tables and the dimension tables. If I just make separate
> mapping tables for each dimension table, then I'm creating 4 new tables
> per dimension table, all identical with respect to what fields they
> contain, what kinds of unique constraints they have, and what
> relationships they have to each other with the sole distinction that
> they each map to the integer-type key of a different dimension table. I
> would not look forward to doing maintenance on this schema!

I'm not at all into data warehousing, so I don't know what dimension tables
are, but it sounds to me that you have a bunch of table which are like:

CREATE TABLE entity(entityid int NOT NULL PRIMARY KEY,
entityname varchar(30) NOT NULL)

and you want to collapse them into one. Yes, I have seen this. And I
killed it when I took over that data model. You get less tables, but
you also get less flexibility. In the system I work with, we may have
over fifty such tables. But not all are equal. Suddenly there is a need
for an extra column in such a table, one is specific for the entity in
question. Not very fun if you have all in one table.

Two other drawbacks with this arrangement is when you look at a graphical
data model, you can only see that a table refers to the general entity
table, but not for which entity. And each reference requires two columns
in the referring table, whereof one is a constant column. I rather have
extra tables than superfluous tables.

Now, judging the previous thread, you seem to have a special problem
with mappings that are in the same domain, but different at different
points in time. These mappings might be worthwhile to lump into one
table, four by four so to speak. Actually, we have such an arrangement
somewhere in our data model for one specific entity. But it would be
dangerous to lump everything into this basket.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

No comments:

Post a Comment