Monday, March 19, 2012

Nonclustered Indexes

Is this true or false based on reading this:
http://msdn.microsoft.com/library/en...asp?frame=true
If a table has a nonclustered index AND a clustered index the
NONCLUSTERED index will use the clustered index key as a row locater?
Meaning SQL has to
1. Decide based on the execution plan whether or not it should use the
non clustered index
2. Scan the nonclustered index to get to the leaf node which contains
the clustered index key
3. Scan the clustered index by that key to get to its leaf node ( which
will be the data page )
4. Find the row within that page
Am I correct?
The clustered index IS the table order. Therefore it becomes the lookup key
for any non-clustered index operation. Steps 2. and 3. should be Search,
not Scan operations on the nonclustered and clustered indexes. Searching
for an item in a modified B-tree structure is a very fast operation. FYI,
the operation in step 3 is called a bookmark lookup.
One of the optimizer's challenges is deciding when to skip the index use and
just scan the table. If the nonclustered index is poorly selective or the
filter criteria returns a very large result set or returns a large
proportion of the underlying table, SQL may decide that the "extra hop" to
do the bookmark lookups is slower than a single pass table scan.
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"g3000" <carlton_gregory@.yahoo.com> wrote in message
news:1141933932.804305.235590@.u72g2000cwu.googlegr oups.com...
> Is this true or false based on reading this:
> http://msdn.microsoft.com/library/en...asp?frame=true
> If a table has a nonclustered index AND a clustered index the
> NONCLUSTERED index will use the clustered index key as a row locater?
> Meaning SQL has to
> 1. Decide based on the execution plan whether or not it should use the
> non clustered index
> 2. Scan the nonclustered index to get to the leaf node which contains
> the clustered index key
> 3. Scan the clustered index by that key to get to its leaf node ( which
> will be the data page )
> 4. Find the row within that page
> Am I correct?
>
|||Thanks Geoff for your reply.
Now a nonclustered index on a heap table uses the combination of
File ID, page number, and number of row on the page as the ROW ID in
its leaf node.
The docs state that the row locator when a clustered is available "is
the clustered index key for the row"
Does the above mean that the clustered index key takes you DIRECTLY to
the row or it takes you directly to the DATA PAGE that contains the
row?
Because from my understanding the leaf node of a clustered index IS the
data page.
|||For resolving data lookups, Row and Data Page containing the row are
identical concepts. SQL loads and saves data in pages. Going from a page
to a row within a page is a trivial exercise and the two are sometimes used
interchangably when talking about lookup operations. And you are correct,
the leaf level IS the data row for a clustered index.
Note that a clustered index doesn't have to be unique. Earlier versions of
SQL (6.5 and before) worked better when the clustered index was not unique.
That changed with SQL 7.0 and true row-level locking. Now, the default for
SQL is to create a clustered index out of the Primary Key. This is not a
requirement and can be overridden at design time. When the clustered index
is not unique, a uniquifier is added to each row so the index lookup
functions work correctly.
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"g3000" <carlton_gregory@.yahoo.com> wrote in message
news:1141938175.459830.100580@.i39g2000cwa.googlegr oups.com...
> Thanks Geoff for your reply.
> Now a nonclustered index on a heap table uses the combination of
> File ID, page number, and number of row on the page as the ROW ID in
> its leaf node.
> The docs state that the row locator when a clustered is available "is
> the clustered index key for the row"
> Does the above mean that the clustered index key takes you DIRECTLY to
> the row or it takes you directly to the DATA PAGE that contains the
> row?
> Because from my understanding the leaf node of a clustered index IS the
> data page.
>
|||One issue not touched upon. Clustered indexes in general, are not a
great idea for most tables.
Non-clustered indexes have relatively low overhead, super fast
addition, update, and delete capabilities, and are extremely fast to
traverse.
Clustered indexes take HUGE hits when you modifiy a column in the
index, potentially when you insert a lot into the middle of the table,
but are equivalent for deletes.
An interesting side note. The non-clustered index finds itself very
quickly into cache for even the largest of tables. Not true of a
clustered index.
And pay attention to the size of the columns in the clustered index.
Pick large columns, and your performance will be seriously downgraded.
In general, if you will ALWAYS be pulling multiple rows of sequential
data, with sequential ALWAYS being defined exactly teh same, then
clustered can make sense. As an example, time stamped data, whree you
need 1000 rows at a time.
Pulling a single row, or reporting data based upon several variables,
and the non-clustered index will typically outperform the clustered
index in all manners.
regards,
doug
|||Perhaps an example.
Public library, the fiction section. The clustered index is the
placement on the shelves due to the author's last name. Really nice if
ALL you EVER did was check out one author's books. You can walk right
to the correct shelf, and grab your books.
OTOH, the card catalogue is the non-clustered index. You know the
subject, or title, you don't start scanning book shelves.
it is MUCH quicker to go to teh card catalogue, get the author's name
and book name, and go to the shelf to get teh right book.
IMO, an EFFICIENT library would just throw the books on any old shelf,
noting where you stuffed it, and updating the card catalogue. Then, on
lookups, book losses, or additions, you just go to teh catalogue, find
the shelf/position, adn get your book.
regards,
doug
|||So basically the clustered index is best when the selectivity is HIGH?
and the queries against the tables are generally the same ( like a
reference table )
non-clustered indexes are good when you are doing something like a
search on a catalog of parts ( or a library )
So seems to me more often then not non clustered will be the best
choice.
Why is the overhead so high for clustered?
|||A poorly selected Clustered index can cause performance degradation. A
well-chosen clustered index will improve performance.
Your example illustrates a poorly chosen Clustered order. Here is a good
one:
Books are assigned a sequential number as they are purchased. They are
stocked on the shelves in that numbered order. Card catalog is used to find
books according to pre-determined criteria. Note that the physical ordering
of the books has nothing to do with most catalog requests. Only when
checking books by purchase date would the indexes align, and then only by
coincidence. This improves insert performance since all the insertion work
happens in the same part of the library. In most libraries, the more recent
a book or periodical, the more frequently it is accessed. A smart librarian
would put those high traffic books near each other and in an easily accesed
place (think cache).
Just because SQL creates a clustered index on the Primary Key by default,
doesn't mean those two entities are irrevocably tied together. A clustered
index is a physical construct. A Primary Key is a logical database design
component. They can be implemented differently.
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"Doug" <drmiller100@.hotmail.com> wrote in message
news:1142203270.733531.260280@.e56g2000cwe.googlegr oups.com...
> Perhaps an example.
> Public library, the fiction section. The clustered index is the
> placement on the shelves due to the author's last name. Really nice if
> ALL you EVER did was check out one author's books. You can walk right
> to the correct shelf, and grab your books.
> OTOH, the card catalogue is the non-clustered index. You know the
> subject, or title, you don't start scanning book shelves.
> it is MUCH quicker to go to teh card catalogue, get the author's name
> and book name, and go to the shelf to get teh right book.
> IMO, an EFFICIENT library would just throw the books on any old shelf,
> noting where you stuffed it, and updating the card catalogue. Then, on
> lookups, book losses, or additions, you just go to teh catalogue, find
> the shelf/position, adn get your book.
> regards,
> doug
>
|||so in your example, you'd have a clustered index on the identity key,
and this woudl be more efficient then have a non-clustered index?
I disagree. I would suggest having no clustered index would be faster.
the key would be "shorter." Inserts faster.
I am curious as to the logic that a clustered index would make this
scenario faster.
|||Yes Im curious also.
I looking for an aswer as to why the overhead is so high just because
its physical construct.
If I load physical in the same data page then most likely if its high
transaction on that table the same data pages should be found in the
cache.
Obviously the process for searching for an data page with free space
within an extent is slow for a clustered index?
Should clustered index be used when rows will be selected "together"
most of the time? Meaning a low cardinality column?
So in a employee database all sex columns with a "Male" value can have
a clustred index because those records will be selected together?

No comments:

Post a Comment