Wednesday, March 28, 2012

Normalize data

Hi,

I have a table like this:

Pk_QuestionID

Question

Choice1

Choice2

Choice3

Choice4

correctChoice

1

XYZ?

a

b

C

d

3

This has to be transferred to a more normalized schema like this

Pk_QuestionID

Question

1

XYZ?

And

Pk_OptionID

Fk_QuestionID

Choice_Description

isCorrect

1

1

a

0

2

1

b

0

3

1

c

1

4

1

d

0

We can use either the unpivot-merge or a SQL Union ALL statement to achieve most of this requirement. I am unable to figure out how to transfer the ‘correctChoice’ column. Is there a way to achieve this?

Thanks

This looks like a classic unpivot operation to me. Have you looked at the UNPIVOT component?

-Jamie

|||

Jamie Thomson wrote:

This looks like a classic unpivot operation to me. Have you looked at the UNPIVOT component?

-Jamie

I believe he has... The issue is his "correctchoice" column in that it's a column number, not a value. (WHY?)

Never-the-less, this is easy. Load your destination table first, without regard to the iscorrect column.

NOTE: I assume that in the dest table, the Pk_OptionID and Fk_QuestionID fields are the table's primary key, otherwise, you're in for a world of hurt and this advice won't work.

Now, in a separate dataflow, pull the data from your original source table, only carrying two columns, the primary key (Pk_QuestionID) and correctchoice. Next up, is the LOOKUP transform. Simply join Pk_QuestionID to the foreign key field (Fk_QuestionID) of the dest table, and also join the correctchoice field to the pk field of the dest table (PK_OptionID). (If you have a match in the lookup, that row is the correct choice)

Send the flow to the OLE DB COMMAND transform and perform an update like:
update destination set iscorrect = 1 where Pk_OptionID = ? and Fk_QuestionID = ?. Map the fields accordingly with your input columns.|||

Hi Phil,

Unfortunately only pk_OptionID is the primary key for that table.

Thanks

|||

kasi124 wrote:

Hi Phil,

Unfortunately only pk_OptionID is the primary key for that table.

Thanks

Well, then, I think you need to reevaluate the structure of your tables/data. If your "correctchoice" column was a value, and not a column indicator, you'd be okay, but alas you're not. If you simply add the FK in the dest table to its primary key structure, and then reset the Pk_OptionID field everytime the Fk_QuestionID field changes, you'd be in a perfect world.

The way your data is structured now, it sounds really complex. You might be able to achieve it with a custom script, but I don't even know where to start. Perhaps someone else here might have an idea on that.

As it is, your destination table hasn't reached full normalization yet.|||

You are correct, both my source and destination data stores are bad designs (not just specific to this issue), but there is very small scope for a redesign.

The pk_optionID column is an identity. I dont know how this will help but may be a different set of values will help someone strike gold in this issue.

Source:

Pk_QuestionID

Question

Choice1

Choice2

Choice3

Choice4

correctChoice

101

XYZ?

a

b

C

d

3

This has to be transferred to a more normalized schema like this

Pk_QuestionID

Question

134

XYZ?

And

Pk_OptionID

Fk_QuestionID

Choice_Description

isCorrect

533

134

a

0

534

134

b

0

535

134

c

1

536

134

d

0

|||I hear ya on the issues, and I completely follow your structure.
I do have a solution though, and it doesn't require any scripting. Fun having free Fridays at work to learn other people's issues and trying to solve them. Stretching the brain!
Anyway, here goes:
Throw your source into a derived column transform. You will be adding a new column, altered_correct_choice, which has the following expression:
correctchoice == 1 ? choice1 : correctchoice == 2 ? choice2 : correctchoice == 3 ? choice3 : choice4
Then, perform an unpivot on the choicen fields. I called the output column for each field 'data'. Next, throw on another derived column after the unpivot. In here, we compare the altered_correct_choice value to the value in the data column. If they match, we set 'iscorrect' (a new column) to 1 else 0:
altered_correct_choice == Choice_Description ? 1 : 0
Feed all of that into your destination and map appropriately.
Here's what I did at a high level if that didn't make sense. Take the source in and use a derived column transform to lookup the "correct" value for the correctChoice field instead of using the column number. So using your new data above, the value coming out of the derived column transform would be 'c' instead of 3.
Next we unpivot the data. You already have this working.
Next we take all of that and make a comparison against each row (4 now instead of 1). If the value of the Choice_Description column matces that of the output column from the first derived column transform, set iscorrect to 1, otherwise 0.
Next up is the destination adaptor. Map your fields appropriately.
Let me know if that doesn't make sense as I was able to get it working for me.
Phil
|||

Thanks Phil, that worked like a charm.

However my problems are not yet. I am trying to figure out how to unpivot using multiple keys, if that is even possible.

No comments:

Post a Comment