Showing posts with label inconsistency. Show all posts
Showing posts with label inconsistency. Show all posts

Tuesday, March 20, 2012

nonempty performance inconsistency

I'm trying to understand the huge (almost 5 times) difference between these two queries:

select nonempty(dim1.dim1.dim1) on 1,

[fact count] on 0

from MyCube
where [dim3].[dim3].&[P1]
runs 5 secs

select nonempty(dim2.dim2.dim2) on 1,

[fact count] on 0

from MyCube
where [dim3].[dim3].&[P1]
runs 23 secs

some more details:

SQL 2005 Enterprise SP2

Both dimensions have regular relation type to the fact.

select count(*) from dim1 ==> 11M rows
select count(*) from dim2 ==> 15M rows
select count(distinct dim1) from fact where dim3 = 'P1' => 96 rows
select count(distinct dim2) from fact where dim3 = 'P1' => 103 rows
(meaning - both dimensions are big, but specific for dim3=P1 are very sparse.)

no aggregations exists in the DB at all !!

Both dimensions are pretty much similar in their properties, the only difference is that dim1 key is string (has no NameColumn) and dim2 key is int64 (has NameColumn pointing to a string attribute).


Looking at the profiler - both queries looks similar, the main difference is at the 5th line:

Query begin (start time = x)

Query cube begin (start time = x)

Calculate non empty begin (start time = x)

Calculate non empty begin (start time = x)

Get data from cache (query 1: start time = x + 1; query 2: start time = x + 10)

What am I missing?

The data type of the key is probably the cause. with SP2 the nonempty should be performed in block mode so the bulk of the time would be spent looking up the member keys in the respective indexes. An int64 key will take 8 bytes per member where as strings are stored as unicode, so a 16 character key will result in an index which is 4 times the size. Plus there is extra CPU work involved in string comparisons compared to integer comparisons.

With dimensions this large I would really encourage you to look into using integer keys.

|||

It sounds obvious it would be better using Int64 as keys but the interesting thing here is that the string key dimension was the faster one.

|||

Sorry, I got them transposed. Darn it, that would have been a neat answer Sad . Integer keys should put a lot less IO and CPU load on the system than string keys. You might be able to get the first query down to 1-2 seconds, but that does not explain why the second query is so slow.

Are there any scopes or assignments that could be affecting one dimension and not the other? Are they both related directly to the fact table or are there any many-to-many or non-materialized reference relationships involved? Are all the properties for the two attributes the same?

|||

That would have been a neat answer, but sorry, this is not in this case.

No scopes or assignment (what's that anyway?), both directly and regular related to the fact.
What is non-materialized reference?

All properties are the same beside the datatype and the NameColumn of the dim2 (the slow)

|||

When you create a referenced relationship there is a tick box that says whether to materialize the relationship (which is ticked by default). When referenced relationships are materialized they are resolved at processing time and should be as fast as regular relationships, when they are not materialized the relationships are resolved at run time.

Am I correct in assuming by the names that both of these attributes are the key attributes for thier dimensions?

Any security roles that might be applied to one and not the other?