I have a table that has around 100,000 records in it, growing at a rate
of 200 records/day:
CREATE TABLE [Main] (
[ID] [int] IDENTITY (1, 1) NOT NULL ,
[Code] [char] (17) NOT NULL ,
[AgentID] [int] NULL ,
[DueDate] [smalldatetime] NOT NULL ,
[Status] [varchar] (18) NOT NULL
)
(This table also has about 30 more fields, but I have left them out for
brevity.)
I am looking to speed up a few specific search queries. I already have
a clustered index on my primary key field [ID], and a non-clustered
index on field [Code]. These indexes are fine performance wise.
I am looking to speed up 2 searches:
1) A combination of [AgentID] and [DueDate]: "... WHERE AgentID = n
AND DueDate BETWEEN (...)
2) [Status]: WHERE [Status]=xxx. Actually, this search is already
fast, and [Status] will really only have about 7 distinct values, so I
didn't index this field, as I figured a table scan would be faster in
this case. IS THIS A CORRECT ASSUMPTION. The result set for this
query is almost aways under 25 rows
My main question is: Should I put nonclustered indexes on [DueDate]
and [AgentID]. AgentID will have maybe 25 distinct values throughout
the table. The result set will usually be under 50 rows.
I just want to make sure I am not overindexing the table, and that if I
index these fields it will be beneficial.
Any help would be appreciated.for query 1, an index on ([AgentID], [DueDate]) could be an option, but
it looks like AgentID is not too sleective:
> AgentID will have maybe 25 distinct values throughout
so I'd also consider indexes on ([DueDate], [AgentID]) and just on
([DueDate])|||> 2) [Status]: WHERE [Status]=xxx. Actually, this search is already
> fast, and [Status] will really only have about 7 distinct values, so I
> didn't index this field, as I figured a table scan would be faster in
> this case. IS THIS A CORRECT ASSUMPTION. The result set for this
> query is almost aways under 25 rows
How exactly are you using this column? (i.e. how many rows per each value)
Searching for individual status values might be improved through the use of
indexed views - one per each status value. But may have a negative impact on
inserts/updates.
ML
http://milambda.blogspot.com/|||With 7 distinct values for Status, 100,000 rows in the table, and
WHERE [Status]=xxx returning under 25 rows, I have to assume that the
values for Status are severely skewed, and the status being tested is
one of the low frequency ones. Even with so few values, a query on a
specific low frequency value may in fact benefit from an index. I
would add an index on Status and see if it was used and if it helped.
With 100,000 rows, 25 distinct values for AgentID, and a result set of
around 50 rows, I have to assume that the DueDate BETWEEN test is
quite restrictive. I would try an index on (AgentID,DueDate).
Roy
On 23 Feb 2006 08:29:24 -0800, kaczmar2@.hotmail.com wrote:
>I have a table that has around 100,000 records in it, growing at a rate
>of 200 records/day:
>CREATE TABLE [Main] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [Code] [char] (17) NOT NULL ,
> [AgentID] [int] NULL ,
> [DueDate] [smalldatetime] NOT NULL ,
> [Status] [varchar] (18) NOT NULL
> )
>(This table also has about 30 more fields, but I have left them out for
>brevity.)
>I am looking to speed up a few specific search queries. I already have
>a clustered index on my primary key field [ID], and a non-clustered
>index on field [Code]. These indexes are fine performance wise.
>I am looking to speed up 2 searches:
>1) A combination of [AgentID] and [DueDate]: "... WHERE AgentID = n
>AND DueDate BETWEEN (...)
>2) [Status]: WHERE [Status]=xxx. Actually, this search is already
>fast, and [Status] will really only have about 7 distinct values, so I
>didn't index this field, as I figured a table scan would be faster in
>this case. IS THIS A CORRECT ASSUMPTION. The result set for this
>query is almost aways under 25 rows
>My main question is: Should I put nonclustered indexes on [DueDate]
>and [AgentID]. AgentID will have maybe 25 distinct values throughout
>the table. The result set will usually be under 50 rows.
>I just want to make sure I am not overindexing the table, and that if I
>index these fields it will be beneficial.
>Any help would be appreciated.|||kaczmar2@.hotmail.com wrote:
> I have a table that has around 100,000 records in it, growing at a
> rate of 200 records/day:
> CREATE TABLE [Main] (
> [ID] [int] IDENTITY (1, 1) NOT NULL ,
> [Code] [char] (17) NOT NULL ,
> [AgentID] [int] NULL ,
> [DueDate] [smalldatetime] NOT NULL ,
> [Status] [varchar] (18) NOT NULL
> )
> (This table also has about 30 more fields, but I have left them out
> for brevity.)
> I am looking to speed up a few specific search queries. I already
> have a clustered index on my primary key field [ID], and a
> non-clustered index on field [Code]. These indexes are fine
> performance wise.
> I am looking to speed up 2 searches:
> 1) A combination of [AgentID] and [DueDate]: "... WHERE AgentID = n
> AND DueDate BETWEEN (...)
Depending on the selectiveness of your query criteria and distribuition of
values it might in fact be better to change the PK to non clustered and
have a clustered index on (DueDate, AgentID). If inserted DueDates are
increasing then inserts may also benefit from this index layout. On the
other hand, ID will be increasing, too. But for a range query a clustered
index is usually favourable.
> 2) [Status]: WHERE [Status]=xxx. Actually, this search is already
> fast, and [Status] will really only have about 7 distinct values, so I
> didn't index this field, as I figured a table scan would be faster in
> this case. IS THIS A CORRECT ASSUMPTION. The result set for this
> query is almost aways under 25 rows
Assuming equal distribution of values you'll get only 14% of the rows with
this criterion. I would assume that an index pays off here as this seems
pretty selective. Since you seem to be mostly searching for low freq
values it's highly likely that and index would help, especially as the
table grows. The more often you search for low freq values the more the
index pays off.
> My main question is: Should I put nonclustered indexes on [DueDate]
> and [AgentID]. AgentID will have maybe 25 distinct values throughout
> the table. The result set will usually be under 50 rows.
> I just want to make sure I am not overindexing the table, and that if
> I index these fields it will be beneficial.
You'll have to do some testing to get definitive answers. And you
probably should verify your findings from time to time as the table grows.
You might as well create a test table with the same layout and much more
data (but with typical distribution) if you want to see the behavior for
large tables beforehand.
Kind regards
robert