Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

80 cubes joined in a virtual cube performance miracle

0 views
Skip to first unread message

Jeroen

unread,
Oct 10, 2008, 12:31:01 PM10/10/08
to
Dear reader,

Having 80 cubes with exactly the same shared dimensions (8), containing a
few megabytes of data, results in a terrible MDX query performance.

Background:
There are 80 cubes representing 80 companies who deliver their data
asynchroneously. Therefore they need individual processing and are stored in
separate physical cubes (these are not partitions since we are using the
standard edition). They are joined in one virtual cube that has exactly the
same dimensionility.

Problem:
When executing a MDX query on the virtual cube, the client where this query
is executed shows a high processor utilization of 100% (when having 4
processors 25% utilization can be seen, so the process is executed serial).
When starting 2 queries, I have noticed a utilization of exactly 50% when
having 4 processors. The average query time is about 90 seconds (with new
hardware).

Exactly the same query executed on one physical cube, where I have put all
data from the 80 cubes into one physical cube, is executed in a split second.

So somehow it is evident that when using a virtual cube, the query is not
executed at the MSAS server, but the client! It feels like it is retrieving
the data per cube, providing it to the client and then executing the formulas
(MDX statements etc) and aggregation of the companies at the client site.

So there is a significant difference on processing the query onto the
virtual cube or physical cube.

Also memory consumption is exploding at the client site instead of the
server site. The qeury I have used easly uses 300 Mb at the client while the
entire database is probably not that big! The server site shows a very flat
utilization, almost 0% processor utilization (it is hardly used!) and memory
usage is constant. So the server holding the virtual cube etc is almost
sleeping!

Question:
Could anybody explain why at the virtual version the query utilization is
moved to the client instead of execution at the server and what could be done
to improve this situation? Why do I notice these very strange characteristics?

Thanks for any suggestion.

fim...@exologic.com

unread,
Oct 10, 2008, 1:03:18 PM10/10/08
to
My first question is: why have 80 cubes, just 1 cube, but add a
dimension called "SourceCompany" that will provide you with a slicer
to the specific company, and am All member that is a blend of them
all?

Whatever the repair is for the query time issue, this will at least
make it 80x easier to apply...

You could also have 80 perspectives (is AS2005 or 2008) but still have
just one cube, each perspective containing a company.
you could have security drive which users (or roles) can see which
members in each dimension -- if that was the original purpose for
making 80 cubes.

-exologic

fim...@exologic.com

unread,
Oct 10, 2008, 6:05:34 PM10/10/08
to
One more thing, since you had mentioned that each of the 80 need
individual processing, you might just set up 80 partitions in one
cube, and have each partition processed one by one, or through a DTS
or SSIS package.

Jeroen

unread,
Oct 12, 2008, 2:10:01 AM10/12/08
to
The purpose of having 80 cubes is primarily the individual processing. Since
it is an highly interactive system where companies deliver their financial
data monthly and they need seperate processing. Why? Because the 'top' cube
may not be disabled for a while when somebody is processing his company cube.

The company dimension consist of an organization hierarchy with several
sublevels. Only one top member level would not be sufficient. Furthermore
they need to see other companies to compare intercompany data.

The most convenient solution is creating individual cubes, not attached to
the virtual cube, plus introducing the 'full' cube containing all data. The
results of this query are very satisfying. However this 'full' cube need to
be available instantly. So when a company has delivered its data it would
like to start analyzing.

This could be achieved by a "refresh data" option, where it is processed
each 5 minutes (via DTS Agent for instance). But this is not pure OLAP.

If you have any suggestion or other idea on this please let me know.

On the other hand it is still the questionable why the query is executed at
the client side instead of the MSAS server where I would expect it... I
believe this is pretty much a strange behaviour.

Thanks for ginving any reply on this!

fim...@exologic.com

unread,
Oct 12, 2008, 3:20:20 PM10/12/08
to
Not sure if your shop is on 2000 or 2005 (you had mentioned virtual
cubes, so I'm assuming 2000)...but I'm still not getting why you need
80 cubes. When you state "The purpose of having 80 cubes is primarily
the individual processing."...
As I mentioned in the 2nd message above, just have 80 partitions in
the cube, each of which contains SQL that pulls in just the data for
the proper SourceCompanyID (or name).
No "top" cube would therefore exist; the "All" member of the new
dimension SourceCompany would be the complete picture of all sources.
It would still fully aggregate.
If you had 2005, you could also have 80 perspectives.

Also, if you had 2005, you could still also use proactive MOLAP
partitions (not fully MOLAP) that automatically pulls new data in an
update fashion.

When you state: "The company dimension consist of an organization
hierarchy with several sublevels", so what? Make the Company
dimension a properly structured dimension, but avoid parent/child.
Just make a regualr dimension and fill in the "unbalanced" regions
with a blank member and hide the blank member with the dimension
options (2000 or 2005).
Also, whenever you process any single partition (not full or even
full), the cube still should be available while processing, so it
almost does not even matter how much data is being processed.

Maybe there's something I'm missing...but when I hear 80 cubes, that
just sounds like a bad idea...I don't even like to do anything fun 80
times.

-exologic


Jeroen

unread,
Oct 13, 2008, 4:00:01 AM10/13/08
to
Thanks for your answer,

A few things. We are working on SQL 2000 standard edition, so no
partitioning option available. Futhermore MOLAP is being used for good query
performance. Some dimensions contain MDX logic (like from Periodic to YTD
calculation, currency conversion against AVG rate (P&L) and EOM rate (BS), it
calculates the difference average to closing from P&L to Balance Sheet via
CalculationPassValue, etc) but also a reference to a specific company table
for drill-through logic is linked to each company cube. The company structure
has a asymetric shape and is based on parent-child relationship, since its
depth is variable (we have avoided as much as we could, but still 5
dimensions are based on parent-child, since it was required that the solution
was more generic and not limited to a certain level depth (we know the
alternative and consequences but that's fine).

Probably not all facts have to do with this issue (just to give more
background), but mainly when the due date for reporting is close we don't
want to have 80 companies processing one main cube (if there would be). If I
understand correctly, you are still recommending to put everything in one
cube and have them processing in a busy reporting period...
I believe you are assuming we do have the Enterprise Edition or am I wrong?
What should we do against a locked cube since somebody else is processing? By
using "refresh data" instead of "Rebuild" we can reduce processing time but
there is still a high chance that somebody else is locking the main cube.
With no partioning option available the only option is to process the
"main/overall partition/cube".

So I do hope I understand correctly, but please correct me if I'm wrong.

As an alternative we are now thinking of completely seperating the 80 cubes.
So no attachment to the virtual cube, because even with a very short
processing time of a virtual cube the chance of getting a collision is still
there... This means the companies cannot query from the virutal cube anymore.
With fully seperate company cubes they can query their individual company
data at their own cube and they keep their drill-through functionality.
Furthermore, with an interval of for instance 5 minutes a seperate main cube
will be setup and process (via "refresh data") the latest data that can be
found in the SQL source. This prevents that 80 companies are processing the
main cube. The downside is that the latest data is only available after a
maximum of 5 minutes. This is purely a work-around... (if you got better
suggestions please let me know).

But apart from the new approach it is still a miracle why the query time is
that slow when pulling the data from the virtual cube compared to the
one-main cube solution.

Thanks for any input you have given so far!

By the way: with all MDX logic within the dimensions build and parent-child
relationships the "one cube" solution does query very reasonable (1 or 2
seconds for an extensive query). Also, when we did have 30 cubes and the
holding was much smaller, the query time was much better. Tests have shown
that adding companies has a negative effect (linear) to the performance.

Jeroen

unread,
Oct 13, 2008, 4:01:01 AM10/13/08
to
Thanks for your answer,

fim...@exologic.com

unread,
Oct 13, 2008, 11:37:45 AM10/13/08
to
OK, 2000. 2000 has partitions, but maybe only with Enterprise edition
(I believe calculated cells are one of the differences, maybe
partitions as well).
You may want to check that. Perspectives were introduced in 2005, so
that is not going to work for you.


Jeroen

unread,
Oct 13, 2008, 11:55:01 AM10/13/08
to
partitioning and caculated cells are part of the Enterprise Edition. 99% sure!

Do you have any reason why the MDX takes that long on the virtual cube?

What do you think of interval processing by conducting "refresh data" every
5 minutes (triggered by DTS). Might that approach be a good one or do you
think it will give an issue when someone is retrieving at the same moment?

Would it be an option to re-install the MSAS server based on 2005? Is it
possible to keep the SQL server 2000 (which resides on a seperate server)
because of too many DTS packages. Next to this install 2005/MSAS on the other
server? So is it possible to combine MSAS/2005 with SQL/2000?
I have some fears that the (complex) MDX formula's in the dimensions might
fail...

thanks for your input.

0 new messages