A trick to avoid VERY LARGE DIMENSION

1199 views
Skip to first unread message

dxd

unread,
Jan 31, 2003, 10:21:24 AM1/31/03
to

As we know that OLAPServer will load all dimensions into memory
(unless SP3 changed it), VLDM has caused us lots of troubles for the
shortage of RAM.

We have a dimension (dimDataID) to keep Data ID and it has millions
members. We came up the following ideas: using formula
DataID = DataMID * 1,000,000 + DataKID * 1000 + DataRID.

So we spread the dimDataID as 3 dimensions:
dimDataMID x dimDataKID x dimDataRID.

Because each of these 3 dimensions has maximum 1000 members, therefore,
in the worst case, we need
3000 * 125 byte = 400K for the RAM.

Then we modify our MDXs as
NONEMPTYCROSSJOIN(
[DataMID].Children,
[DataKID].Children,
[DataRID].Children)

of cause, clients have to use the formula to get back the oringial
DataID.

Our experiment shows MDX queries with the 3 dims is also faster than
older one.We are happy with this approach.

If you see any drawback for this kind appoach, please let the
public know.

Also is there any tools (like SQL Profile) we could trace the responce
time of MDX queries????

Thanks for your attention.

Derek

--
Posted via http://dbforums.com

Acquiesce

unread,
Mar 18, 2003, 5:16:29 AM3/18/03
to

Hi dxd,

We are facing the similar problem here now (if i'm not mistaken..) we
have a dimension consists of 200K members (visible = false) which are
join to the fact table. Reason being of adding this dimension is to
facilitate the distinct count on one of the calculate measures.

Iif (IsEmpty (Count ( CrossJoin ({[SYS Measure]},Descendants (
[ID_US].CurrentMember,[US ID])),ExcludeEmpty)),0,Count ( CrossJoin
({[SYS Measure]},Descendants ( [ID_US].CurrentMember,[US
ID])),ExcludeEmpty))


The US_ID dimension has over 200K members in it. Currently we are
grouping it during the dimension design. Based on the calculated
measure created, it cost the front end performance to drag. Takes around
10-15minutes to display out the data (our front end is OWC10). I was
looking at your 'dimension splitting' trick to solve this problem. and
need some pointers in how you split those dimensions into smaller
dimensions.

I dont understand on how you can come up with 3 dimensions with a max of
1000 members per dimensions!

Hope you can get back to me ASAP

Thanks

dxd

unread,
Mar 18, 2003, 9:16:34 AM3/18/03
to

We use the formula

DataID = DataMID * 1,000,000 + DataKID * 1000 + DataRID.

to break the original DataID into a tuple
(DataMID, DataKID, DataRID).
For example
DataID =12,234,567 maps to (12, 234, 567) where
12 becomes a member of DataMID dimension,
234 becomes a member of DataKID dimension
567 becomes a member of DataRID dimension.
Therefore DataMID, DataKID, DataRID have maximum 1000 members.

This solution has definitely solved our memory shortage issue.

We also do the distinct count as the following and the responce
time is OK.

COUNT(nonemptycrossjoin(
Descendants([DataMID].CurrentMember, [DataIDM].[DSM]),
Descendants([DataKID].CurrentMember, [DataIDK].[DSK]),
Descendants([DataRID].CurrentMember, [DataIDR].[DSR]),
{[Measures].[auxDataCount]}, 3))


Please let me know your testing result if you are tring this approach.
thanks your attention.

dxd

unread,
Mar 19, 2003, 8:51:00 AM3/19/03
to

Yes, you need to add three columns to your FactTable (we are
using a view).
The three new columns could be:

DSKKey=cast((DataID % 1000000) / 1000 as smallint),
DSMKey=cast(DataID / 1000000 as smallint),
DSRKey=cast(DataID % 1000 as smallint),

Acquiesce

unread,
Mar 19, 2003, 5:18:36 AM3/19/03
to

Hi there...

Thanks for your prompt reply. I managed to figure out how you segregate
the single dimension into 3 dimensions and your crossjoin method
...very good idea indeed :)

But i have a question, how do i join those 3 dimensions to my fact table
in my cube editor? do you manipulate or add new columns in your fact
table (or retain the existing fields) to cater for your new dimensions?

I'm pretty much stuck in the cube editor part.

The formula you mentioned


DataID = DataMID * 1,000,000 + DataKID * 1000 + DataRID

is just a pointer? or do i need to include this formula somewhere?


Thanks & Regards

Harry

unread,
Mar 19, 2003, 10:11:24 AM3/19/03
to
Sounds like a very very interesting idea.

Tell me can also do the following:

after doing the cross join

NONEMPTYCROSSJOIN (


[DataMID].Children,
[DataKID].Children,
[DataRID].Children)

you get a number say: 12, 234, 567

But now I want to convert this ID to a NAME..... say MICROSOFT...

Any ideas??

Would love to try your approach if this works....
Thanks

Harry

unread,
Mar 19, 2003, 2:47:08 PM3/19/03
to
I gave this a try and it's seems quite useful.

But I am still not able to get the original ID back on the client. (Or
refer it to some list that contains the names.)

Would be cool if that worked...

dxd

unread,
Mar 19, 2003, 8:44:18 PM3/19/03
to

The clients can send the MDX

select [Measures].Allmembers on columns,
nonemptycrossjoin([DataIDM].Children, [DataIDK].Children,
[DataIDR].Children)) on rows
from [Cube]

you get the record set whose first three columns are
[DataIDM], [DataIDK], [DataIDR] ,

then using the formula to recover the original ID
OriID = [DataIDM]*1000000 + [DataIDK]*1000 + [DataIDR]


Originally posted by Harry

--
Posted via http://dbforums.com

Acquiesce

unread,
Mar 19, 2003, 10:23:19 PM3/19/03
to

Thanks Dxd...will try it out..

Let you knwo the results soon! :)

Harry

unread,
Mar 20, 2003, 9:10:45 AM3/20/03
to
I understand that using the formula you can get the original ID back
on the clients. I can do that visually in my mind...
But... where do you use the formula in the client ?

In other words... in a cross join, I would get the following:

___________Sales$
12|345|678| xxx
12|345|679| xxx
13|345|678| xxx
13|345|679| xxx
14|345|678| xxx

Now... I want to convert 12, 345, 678 and display "Customer 1"
or 12, 345, 679 to display "Customer 2"... and so forth...

Where in the client software can I write a plugin to make that
conversion?
OR is there an MDX that can read the crossjoin data and convert it for
me?
Thanx

Acquiesce

unread,
Mar 20, 2003, 2:56:33 AM3/20/03
to

Dxd,

yeah finally got it working. I've tried the distinct count and its the
correct figure...But there are a few "problems"...not sure if you face
this problem or not... as you know my front end is OWC10 webpage.
(Before that , the response time is reduce from 5 minutes to 1 minute
which i think its acceptable, for the time being)

Errors
1. In Excel, when you do a filtering on the rows to deselect certain
members of the dimension, the grand total figure does not reflect the
exact total of the table.
2. In OWC, in my internet explorer when i click the "export to excel"
button there will be a error message - "Format of the initialisation
string does not confrom to the OLE DB specification"

any ideas?

dxd

unread,
Mar 20, 2003, 8:48:01 AM3/20/03
to

Hi, Acquiesce:

Are you saying the query response time is reduced from 5 minutes to 1
minute? Glad to hear that.

Our clients are in VB, sending queries in MDX, work fine so far.
The original purpose of this trick was to reduce the RAM usage of
OLAPServer because the old DataID dimension had 10M+ members.

I will give a try on Excel today.
I have no experience on OWC. Can't help this issue.

Have a nice day.

Derek.

Originally posted by Acquiesce

Acquiesce

unread,
Mar 21, 2003, 4:27:04 AM3/21/03
to

Gauss,

Yupe it solve the problem...but i havent tested the performance yet...

Thanks!

Gauss

unread,
Mar 21, 2003, 4:36:19 AM3/21/03
to

Unfortunately the filter function seems to kill performance as
first tests
with a greater amount of data showed :-(

Searching for other options...

Regards
Gauss

Gauss

unread,
Mar 21, 2003, 3:24:44 AM3/21/03
to

Good approach, Derek!


Acquiesce,

I am also using OWC10, encountering the same problem with distinct count
and grand total.
Have you tried one of the following MDX:

COUNT(filter(nonemptycrossjoin(
[DataMID].children,
[DataKID].children,
[DataRID].children), [Measures].[auxDataCount] <>0))


or to ensure that only the IDs under the current member in the [Data_ID]
dimensions are counted:


COUNT(filter(nonemptycrossjoin(
Descendants([DataMID].CurrentMember, [DataMID].[DSM]),
Descendants([DataKID].CurrentMember, [DataKID].[DSK]),
Descendants([DataRID].CurrentMember, [DataRID].[DSR]))
, [Measures].[auxDataCount] <>0)
)

or something like (better performance?):

COUNT(filter(nonemptycrossjoin(
[DataMID].children,
[DataKID].children,
[DataRID].children,{ [Measures].[auxDataCount] },3)
, [Measures].[auxDataCount] <>0))

By using the Filter-Function Grand Total or Sub Totals should be
correct.
But now there might be a greater problem with poor performance.
Have to do some more testing.


Regards

Gauss

Originally posted by Acquiesce

>
>
> 1. In Excel, when you do a filtering on the rows to deselect certain
> members of the dimension, the grand total figure does not reflect
> the exact total of the table.
>

Acquiesce

unread,
Mar 21, 2003, 4:37:31 AM3/21/03
to

I noticed! :|

dxd

unread,
Mar 21, 2003, 9:22:43 AM3/21/03
to

This was we did,
defining a calculated

[Measures].[RealCount] as '


COUNT(nonemptycrossjoin(
Descendants([DataMID].CurrentMember, [DataIDM].[DSM]),
Descendants([DataKID].CurrentMember, [DataIDK].[DSK]),
Descendants([DataRID].CurrentMember, [DataIDR].[DSR]),

{[Measures].[auxDataCount]}, 3))'.

MDX:

select {[Measures].[RealCount]} on columns,
Filter([somedimension], [Measures].[auxDataCount] > 0) on rows
from [Cube].

The performance to us is quite OK. ( currently Our system has about
800,000 original DataID, which means we have 800 members in [DataKID]
dim and 1000 in [DataRID] dim, 1 is [DataMID] dim)

dxd

unread,
Mar 21, 2003, 8:55:42 AM3/21/03
to

Is that possible you define [Measures].[auxDataCount] such that
it is either empty or >0 , therefore the filter is not needed.

Originally posted by Gauss

--
Posted via http://dbforums.com

Gauss

unread,
Mar 21, 2003, 10:43:53 AM3/21/03
to

Hi Derek.

"My" [Measures].[auxDataCount] is already defined such that it is NULL
or > 0.

The OWC-Pivottable itself seems to have a problem with the MDX-Function
"nonemptycrossjoin" and the way it is calculating totals when members
are deselected.

So it seems to be a front-end problem.

Regards
Gauss

Originally posted by dxd

> Is that possible you define [Measures].[auxDataCount] such that
it is either empty or >0 , therefore the filter is not needed.

dxd

unread,
Mar 24, 2003, 9:08:59 AM3/24/03
to

Hello:

I have to admit that I am currently confused by the behavior of
nonemptycrossjoin function, especially when other dimensions are
selected or aggregated.

Can you please try

[Measures].[RealCount] as '
COUNT(
Descendants([DataMID].CurrentMember, [DataIDM].[DSM]) *
Descendants([DataKID].CurrentMember, [DataIDK].[DSK]) *
Descendants([DataRID].CurrentMember, [DataIDR].[DSR]) *
{[Measures].[auxDataCount]}, ExcludeEmpty)'?

Regards

Derek


Originally posted by Gauss

> Hi Derek.
>
> "My" [Measures].[auxDataCount] is already defined such that it is
> NULL
> or > 0.
>
> The OWC-Pivottable itself seems to have a problem with the
> MDX-Function
> "nonemptycrossjoin" and the way it is calculating totals when members
> are deselected.
>
> So it seems to be a front-end problem.
>
> Regards
Gauss

--
Posted via http://dbforums.com

Gauss

unread,
Mar 25, 2003, 3:31:32 AM3/25/03
to

Hi Derek.

In my example-cube calculations do not seem to come to an end.
Pivottable takes all of client's memory, CPU usage: 100%.

Regards
Gauss


Originally posted by dxd

> Hello:
>
> I have to admit that I am currently confused by the behavior of
> nonemptycrossjoin function, especially when other dimensions are
> selected or aggregated.
>
> Can you please try
>
> [Measures].[RealCount] as '
> COUNT(
> Descendants([DataMID].CurrentMember, [DataIDM].[DSM]) *
> Descendants([DataKID].CurrentMember, [DataIDK].[DSK]) *
> Descendants([DataRID].CurrentMember, [DataIDR].[DSR]) *
> {[Measures].[auxDataCount]}, ExcludeEmpty)'?
>
> Regards
>

Reply all
Reply to author
Forward
0 new messages