Tuesday, March 20, 2012

Nonclustered Indexes

Is this true or false based on reading this:
http://msdn.microsoft.com/library/en-us/architec/8_ar_da2_75mb.asp?frame=true
If a table has a nonclustered index AND a clustered index the
NONCLUSTERED index will use the clustered index key as a row locater?
Meaning SQL has to
1. Decide based on the execution plan whether or not it should use the
non clustered index
2. Scan the nonclustered index to get to the leaf node which contains
the clustered index key
3. Scan the clustered index by that key to get to its leaf node ( which
will be the data page )
4. Find the row within that page
Am I correct?The clustered index IS the table order. Therefore it becomes the lookup key
for any non-clustered index operation. Steps 2. and 3. should be Search,
not Scan operations on the nonclustered and clustered indexes. Searching
for an item in a modified B-tree structure is a very fast operation. FYI,
the operation in step 3 is called a bookmark lookup.
One of the optimizer's challenges is deciding when to skip the index use and
just scan the table. If the nonclustered index is poorly selective or the
filter criteria returns a very large result set or returns a large
proportion of the underlying table, SQL may decide that the "extra hop" to
do the bookmark lookups is slower than a single pass table scan.
--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"g3000" <carlton_gregory@.yahoo.com> wrote in message
news:1141933932.804305.235590@.u72g2000cwu.googlegroups.com...
> Is this true or false based on reading this:
> http://msdn.microsoft.com/library/en-us/architec/8_ar_da2_75mb.asp?frame=true
> If a table has a nonclustered index AND a clustered index the
> NONCLUSTERED index will use the clustered index key as a row locater?
> Meaning SQL has to
> 1. Decide based on the execution plan whether or not it should use the
> non clustered index
> 2. Scan the nonclustered index to get to the leaf node which contains
> the clustered index key
> 3. Scan the clustered index by that key to get to its leaf node ( which
> will be the data page )
> 4. Find the row within that page
> Am I correct?
>|||Thanks Geoff for your reply.
Now a nonclustered index on a heap table uses the combination of
File ID, page number, and number of row on the page as the ROW ID in
its leaf node.
The docs state that the row locator when a clustered is available "is
the clustered index key for the row"
Does the above mean that the clustered index key takes you DIRECTLY to
the row or it takes you directly to the DATA PAGE that contains the
row?
Because from my understanding the leaf node of a clustered index IS the
data page.|||For resolving data lookups, Row and Data Page containing the row are
identical concepts. SQL loads and saves data in pages. Going from a page
to a row within a page is a trivial exercise and the two are sometimes used
interchangably when talking about lookup operations. And you are correct,
the leaf level IS the data row for a clustered index.
Note that a clustered index doesn't have to be unique. Earlier versions of
SQL (6.5 and before) worked better when the clustered index was not unique.
That changed with SQL 7.0 and true row-level locking. Now, the default for
SQL is to create a clustered index out of the Primary Key. This is not a
requirement and can be overridden at design time. When the clustered index
is not unique, a uniquifier is added to each row so the index lookup
functions work correctly.
--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"g3000" <carlton_gregory@.yahoo.com> wrote in message
news:1141938175.459830.100580@.i39g2000cwa.googlegroups.com...
> Thanks Geoff for your reply.
> Now a nonclustered index on a heap table uses the combination of
> File ID, page number, and number of row on the page as the ROW ID in
> its leaf node.
> The docs state that the row locator when a clustered is available "is
> the clustered index key for the row"
> Does the above mean that the clustered index key takes you DIRECTLY to
> the row or it takes you directly to the DATA PAGE that contains the
> row?
> Because from my understanding the leaf node of a clustered index IS the
> data page.
>|||One issue not touched upon. Clustered indexes in general, are not a
great idea for most tables.
Non-clustered indexes have relatively low overhead, super fast
addition, update, and delete capabilities, and are extremely fast to
traverse.
Clustered indexes take HUGE hits when you modifiy a column in the
index, potentially when you insert a lot into the middle of the table,
but are equivalent for deletes.
An interesting side note. The non-clustered index finds itself very
quickly into cache for even the largest of tables. Not true of a
clustered index.
And pay attention to the size of the columns in the clustered index.
Pick large columns, and your performance will be seriously downgraded.
In general, if you will ALWAYS be pulling multiple rows of sequential
data, with sequential ALWAYS being defined exactly teh same, then
clustered can make sense. As an example, time stamped data, whree you
need 1000 rows at a time.
Pulling a single row, or reporting data based upon several variables,
and the non-clustered index will typically outperform the clustered
index in all manners.
regards,
doug|||Perhaps an example.
Public library, the fiction section. The clustered index is the
placement on the shelves due to the author's last name. Really nice if
ALL you EVER did was check out one author's books. You can walk right
to the correct shelf, and grab your books.
OTOH, the card catalogue is the non-clustered index. You know the
subject, or title, you don't start scanning book shelves.
it is MUCH quicker to go to teh card catalogue, get the author's name
and book name, and go to the shelf to get teh right book.
IMO, an EFFICIENT library would just throw the books on any old shelf,
noting where you stuffed it, and updating the card catalogue. Then, on
lookups, book losses, or additions, you just go to teh catalogue, find
the shelf/position, adn get your book.
regards,
doug|||So basically the clustered index is best when the selectivity is HIGH'
and the queries against the tables are generally the same ( like a
reference table )
non-clustered indexes are good when you are doing something like a
search on a catalog of parts ( or a library )
So seems to me more often then not non clustered will be the best
choice.
Why is the overhead so high for clustered?|||A poorly selected Clustered index can cause performance degradation. A
well-chosen clustered index will improve performance.
Your example illustrates a poorly chosen Clustered order. Here is a good
one:
Books are assigned a sequential number as they are purchased. They are
stocked on the shelves in that numbered order. Card catalog is used to find
books according to pre-determined criteria. Note that the physical ordering
of the books has nothing to do with most catalog requests. Only when
checking books by purchase date would the indexes align, and then only by
coincidence. This improves insert performance since all the insertion work
happens in the same part of the library. In most libraries, the more recent
a book or periodical, the more frequently it is accessed. A smart librarian
would put those high traffic books near each other and in an easily accesed
place (think cache).
Just because SQL creates a clustered index on the Primary Key by default,
doesn't mean those two entities are irrevocably tied together. A clustered
index is a physical construct. A Primary Key is a logical database design
component. They can be implemented differently.
--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"Doug" <drmiller100@.hotmail.com> wrote in message
news:1142203270.733531.260280@.e56g2000cwe.googlegroups.com...
> Perhaps an example.
> Public library, the fiction section. The clustered index is the
> placement on the shelves due to the author's last name. Really nice if
> ALL you EVER did was check out one author's books. You can walk right
> to the correct shelf, and grab your books.
> OTOH, the card catalogue is the non-clustered index. You know the
> subject, or title, you don't start scanning book shelves.
> it is MUCH quicker to go to teh card catalogue, get the author's name
> and book name, and go to the shelf to get teh right book.
> IMO, an EFFICIENT library would just throw the books on any old shelf,
> noting where you stuffed it, and updating the card catalogue. Then, on
> lookups, book losses, or additions, you just go to teh catalogue, find
> the shelf/position, adn get your book.
> regards,
> doug
>|||so in your example, you'd have a clustered index on the identity key,
and this woudl be more efficient then have a non-clustered index?
I disagree. I would suggest having no clustered index would be faster.
the key would be "shorter." Inserts faster.
I am curious as to the logic that a clustered index would make this
scenario faster.|||Yes Im curious also.
I looking for an aswer as to why the overhead is so high just because
its physical construct.
If I load physical in the same data page then most likely if its high
transaction on that table the same data pages should be found in the
cache.
Obviously the process for searching for an data page with free space
within an extent is slow for a clustered index?
Should clustered index be used when rows will be selected "together"
most of the time? Meaning a low cardinality column?
So in a employee database all sex columns with a "Male" value can have
a clustred index because those records will be selected together?|||Again, SQL "tunes" heaps for minimum space usage, not for maximum
performance.
Inserts are faster on clustered indexes, even with a non-monotonically
increasing key which will lead to page splits. It is faster because SQL
does not have to scan for a free slot to insert the page. To continue
beating up the library analogy, the librarian has to search the stacks to
find a free space for each new book. With a clustered index, the librarian
knows where the book has to go. With the correct index choice, such as an
Identity column, inserts happen near to each other which gives the cache
manager a big advantage.
--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
"Doug" <drmiller100@.hotmail.com> wrote in message
news:1142278820.789510.195480@.p10g2000cwp.googlegroups.com...
> so in your example, you'd have a clustered index on the identity key,
> and this woudl be more efficient then have a non-clustered index?
> I disagree. I would suggest having no clustered index would be faster.
> the key would be "shorter." Inserts faster.
> I am curious as to the logic that a clustered index would make this
> scenario faster.
>|||Hi,
Clustered tables are tables that have a clustered index:
The data rows are stored in order based on the clustered index key. The
index is implemented as a B-tree index structure that supports fast
retrieval of the rows based on their clustered index key values. The
pages in each level of the index, including the data pages in the leaf
level, are linked in a doubly-linked list, but navigation from one level
to another is done using key values.
Heaps are tables that have no clustered index:
The data rows are not stored in any particular order, and there is no
particular order to the sequence of the data pages. The data pages are
not linked in a linked list.
From these descriptions you can see that there is more overhead when
using clustered indexes because when you execute SQL DML there will be
extra resources used to insure your data page is still sorted correctly,
never mind the extra data behind the scenes being stored as a reference
table for your original data. Although you?re SELECT queries will be
fast if you?re querying on the clustered index. Non - clustered indexes
are more common because lots of the time you are giving your end - users
the option to query on multiple fields from multiple tables. A good
place to use clustered indexes would be on lookup tables. For instance a
table containing country codes where there will only be a certain amount
of rows and the table will be relatively small.
Happy Coding,
Stefan
C# GURU
www.DotNETovation.com
"You always have to look beyond the horizon and can never be complacent
-- God forbid we become complacent."
Jozef Straus
*** Sent via Developersdex http://www.developersdex.com ***|||As you can see Geoff from Stefans post that we have somebody that feels
that inserts on a clustered index has MORE OVERHEAD because of the
"extra" work of keeping the data page stored/sorted correctly.
Your library example made sense to me.
I did personally conclude that clustered indexes where better for
reference/lookup tables. Stefan made a mention of that.
Which makes me think selects are faster on clustered then non clustered
Im going to trust what you said Geoff and expect that inserts are
faster on clustered indexes and select can be fast if used on the right
"type" of data/table.|||I think that the librarian example is a perfect example of the "it depends" concept. Because it
really does. The examples provided here has focused on rapid inserts. Which can be of major
important for me.
But what if my major concern it to allow people to go an look at all the books written by a specific
author? Clustering on author allow the person to look up in the index where the books written by
author X are, then he can go an browse through those books.
Or perhaps, in order to give high degree of service to my customers, I want people to browse through
books from a certain genre? Cluster on Genre, and same principle as above applies.
Trying to narrow down a "perfect" indexing scheme is a moot point, as the requirements and
priorities varies.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Geoff N. Hiten" <SQLCraftsman@.gmail.com> wrote in message
news:eEAOJvrRGHA.5552@.TK2MSFTNGP14.phx.gbl...
>A poorly selected Clustered index can cause performance degradation. A well-chosen clustered index
>will improve performance.
> Your example illustrates a poorly chosen Clustered order. Here is a good one:
> Books are assigned a sequential number as they are purchased. They are stocked on the shelves in
> that numbered order. Card catalog is used to find books according to pre-determined criteria.
> Note that the physical ordering of the books has nothing to do with most catalog requests. Only
> when checking books by purchase date would the indexes align, and then only by coincidence. This
> improves insert performance since all the insertion work happens in the same part of the library.
> In most libraries, the more recent a book or periodical, the more frequently it is accessed. A
> smart librarian would put those high traffic books near each other and in an easily accesed place
> (think cache).
> Just because SQL creates a clustered index on the Primary Key by default, doesn't mean those two
> entities are irrevocably tied together. A clustered index is a physical construct. A Primary
> Key is a logical database design component. They can be implemented differently.
> --
> Geoff N. Hiten
> Senior Database Administrator
> Microsoft SQL Server MVP
>
>
> "Doug" <drmiller100@.hotmail.com> wrote in message
> news:1142203270.733531.260280@.e56g2000cwe.googlegroups.com...
>> Perhaps an example.
>> Public library, the fiction section. The clustered index is the
>> placement on the shelves due to the author's last name. Really nice if
>> ALL you EVER did was check out one author's books. You can walk right
>> to the correct shelf, and grab your books.
>> OTOH, the card catalogue is the non-clustered index. You know the
>> subject, or title, you don't start scanning book shelves.
>> it is MUCH quicker to go to teh card catalogue, get the author's name
>> and book name, and go to the shelf to get teh right book.
>> IMO, an EFFICIENT library would just throw the books on any old shelf,
>> noting where you stuffed it, and updating the card catalogue. Then, on
>> lookups, book losses, or additions, you just go to teh catalogue, find
>> the shelf/position, adn get your book.
>> regards,
>> doug
>|||Point taken Tibor. I agree nothing is a perfect scheme.
I was honestly just trying to get a good understanding of WHAT is
happening "underneath" when using a clustered versus a non clustered.
You get an "explanation" in books online but you can get from the docs
the "facts" of overhead in clustered indexes or non clustered indexs
are generally better then selects.
I was hoping to get good direction from others because I am new to SQL
Server and I personally dont have the insight.
Thanks for all the replies. It is appreciated.

No comments:

Post a Comment