Well i guess it is a bit more than just group by but here we go...
General description :( background info on the problem; may or may not be needed)
I am trying to get a list of active card holders for a particular month. A card holder is considered active if their status was active any time during that month. I have a card holder status in my CardHolder table but that will only give me their current status. I also have a CardHolderHistory table that saves any changes to a card holder including their status. Using the CardHolderHistory table i can find out who was active for any month in question by using the following algorithm:
A card holder is active if they meet one or both of the following conditions;
1) The last entry before the month in question has them as active in CardHolderHistory or
2) They were set to active any time during the month in question
The problem:
I create a temporary table and copy the card holders into the table with separate select into statements for each of the above conditions. For the first condition, I first gather all the CardHolderHistory entries before the month in question with the following select
select EmployerName, cardholderhistory.FirstName, cardholderhistory.Initial, cardholderhistory.LastName, ChangeDate, CardHolderHistory.CardState, FKEmployerID
into #MonthlyCards from cardholderhistory left join Employer on PKEmployerID = cardholderhistory.FKEmployerID
where changedate<@.datespecified
This gives me ALL of the CardHolderHistory before the month in question (@.dateSpecified).
It would look something like this
EmployerName First Initial Last Status ChangeDate EmployerID
-
.
.
.
Some Fake Company John P Doe A 2006-05-02 11:34:26.360 109
Some Fake Company John P Doe A 2006-03-28 11:14:10.520 109
Some Fake Company Jane T Doe S 2006-05-30 15:34:14.900 109
Some Fake Company Jane T Doe A 2006-03-28 12:20:03.670 109
Some Fake Company Jane T Doe A 2006-03-01 10:12:45.320 109
.
. (S = suspended; A = Active)
.
Now I need to make sure people are only listed once and remove everyone whose status = suspended for their last change date. That is my problem; I can’t figure out how to remove these people. The next step i took was to do a group by with EmployerName, FirstName, Initial, LastName, Status, Max(ChangeDate), FKEmployerID. This removes everyone listed multiple times except if their status changes. After the group by the list would be
EmployerName First Initial Last Status ChangeDate EmployerID
-
Some Fake Company John P Doe A 2006-05-02 11:34:26.360 109
Some Fake Company Jane T Doe S 2006-05-30 15:34:14.900 109
Some Fake Company Jane T Doe A 2006-03-28 12:20:03.670 109
Now I need to delete Jane completely, since her latest status is S. I could delete where Status = ‘S’ but that leaves her as active from 3/28. I can’t figure out how to completely remove Jane. The rest I can figure out I think. Please help me get rid of Jane for good, I don’t like her.
Thanks for any help,
Rico
Instead of grabbing too many records, how about grabbing only the records you want. In this case, construct a query that uses a full outer join of the two conditions you have stated. Now you will have only one record for the active members and you can sort out the values of the columns.
Hopefully, this code gives you something to work with. I have assumed that there is a CardHolderID that is unique to each CardHolder record.
SelectFrom(
select c1.CardHolderID,
c1.FirstName,
c1.Initial,
c1.LastName,
c1.ChangeDate,
C1.CardState,
FKEmployerID
From cardholderhistory c1
Where ChangeDate = (
Select Top 1
changeDate
From cardholderhistory c2
Where c1.CardHolderID = c2.CardHolderID
And c2.ChangeDate < @.DateSpecified
Order by changeDate desc )
And CardState = 'A'
) t1
Full Outer Join
(
select c3.CardHolderID,
c3.FirstName,
c3.Initial,
c3.LastName,
c3.ChangeDate,
C3.CardState,
FKEmployerID
From cardholderhistory c3
Where ChangeDate between @.monthStart and @.monthEnd
) t2
On t1.CardHolderID = t2.CardHolderID
Join Employer
On t1.FKEmployerID = Employer.PKEmployerID
|||
Anomolus,
Thank you very much for your reply. I have wasted an embarrassing number of hours trying to figure this out but with your help I was able to get what I needed. The code i used was
declare @.dateSpecified datetime
--@.employerID int
declare @.endDate datetime
set @.dateSpecified = '8/1/2006'
set @.endDate = dateadd(m, 1, @.dateSpecified)
select EmployerName, chh.FirstName, chh.Initial, chh.LastName, FKEmployerID
into #MonthlyCards
From CardHolderHistory chh left join
Employer on PKEmployerID = FKemployerID
Where ChangeDate = ( Select Top 1 ChangeDate
From CardHolderHistory chh2
Where chh.PKCardHolderID = chh2.PKCardHolderID
And chh2.ChangeDate < @.DateSpecified
Order by changeDate desc )
And (CardState = 'A' or CardState = '')
Order By EmployerName, LastName, FirstName, Initial
insert into #MonthlyCards([EmployerName],[FirstName],[Initial],[LastName],[FKEmployerID])
(select EmployerName, chh.FirstName, chh.Initial, chh.LastName, FKEmployerID
From CardHolderHistory chh left join
Employer on PKEmployerID = FKemployerID
where (changedate between dateadd(ms, (-5),@.dateSpecified) and @.enddate)
And (CardState = 'A' or CardState = ''))
Select * from #monthlycards
group by EmployerName, FirstName, Initial, LastName, FKEmployerID
--order by pkcardholderid
drop table #MonthlyCards
I was having a problem getting that first condition in the form of a select. I have few follow-up questions, now that i got it to work.
1) Why use the Full outer join? It seems to get a bunch of info that i don't need and has a bunch of nulls. Wouldn't the full outer join join the two conditions into the same record?
2) What if there are too ChangeDates with the same value? Will this cause errors or is it generally acceptable to rely on the millisecond precision of the datetime data type?
3) Is there anything i should do to optimize this code? My CardHolderHistory table has thousands of entries already and it will continue to get bigger.
Again, thanks Anomolus, and thank you to anyone that can help with these questions,
Rico
|||
1) The Full outer join allows you to get the list of customers in a single select. The customers will match at least one of the two conditions and if it matches both, then they are combined into a single record. You will get nulls when the customer only satisfies one condition. It occurs to me that all you really need out of this is the CustomerID. Isn't there a risk of getting old information if you grab the name from history as it could have been what was changed last?
2) This is really a business question since your rule says the last change. If it is possible for two changes to happen at the same time, then you ought to have another field that designates the version order in the history table.
|||"Isn't there a risk of getting old information if you grab the name from history as it could have been what was changed last?"
Yes there is, thanks for catching that! I thouught that i would save some resources by using one less join but you are right... i had thought of that in my earlier failed attemps but looked over it in this version.
"If it is possible for two changes to happen at the same time"
I dont think this is possible, maybe you could help me find out. The ChangeDate value is filled in automatically (the default value is getdate())when a new CardHolderHistory record is created. It is not changes beyond that. If 2 different administrators used the website and tried to update card holders at the same and my SaveCardHolder stored procedure is called at the same time im guessing this could generate the same ChangeDate. Besides the fact that it is unlikly, can more than one record be saved at the same time? I insert into the CardHolderHistory table from Save CardHolder using transactions (roll back on error otherwise commit) if this makes a difference.
Thank you for your help! You have really saved me!
sql
No comments:
Post a Comment