MDDB vs RDBMS warehouse?

Matt Reedy

unread,

Dec 14, 1994, 2:38:39 PM12/14/94

to

I've been thinking (uh, oh :-) about and talking with people about
the trade offs involved in doing multidimensional analysis using a
proprietary multidimensional database (MDDB) and a data warehouse
stored in an RDBMS. The MDDB offers significantly
better performance for drill down and slice-n-dice operations, at
the expense of (potentially) duplicating data that is stored in an
RDBMS data warehouse. The RDBMS data warehouse might already be setup
at some companies to offer data access for decision support, but if
part of that decision support means drill down, the performance can
really suffer.

I have a couple of questions. How are you doing multidimensional
analysis? Are you using an existing RDBMS data warehouse, and putting
up with the response time delays? Are you creating a data warehouse in
a proprietary MDDB (like Arbor's Essbase, Kenan's Acumate, IRI's Express,
etc.) and not using any RDBMS data warehouse? Or do you have
an RDBMS data warehouse and are feeding data into an MDDB and living
with the duplication?

If you are doing multidimensional analysis using an RDBMS, is the
performance even an issue? Do you believe (as I do) that the fact
that RDBMS vendors are pouring millions of dollars into improving the
performance of their engines and adding multidimensional extensions
to their databases, coupled with the emergence of parallel query
processing systems will solve the performance issue?

I'm interested in stimulating debate. By all means, contribute. Thanks.

matt
--
Matthew Reedy Chief Technical Officer
IQ Software Corporation Internet: ma...@iqsc.com
400 N Loop 1604 E, Suite 100
San Antonio, TX 78232 (210) 490 6684 Fax: (210) 490-3590

Seth Grimes

unread,

Dec 14, 1994, 4:44:35 PM12/14/94

to

ma...@iqsc.com (Matt Reedy) writes:
> Are you creating a data warehouse in
>a proprietary MDDB (like Arbor's Essbase, Kenan's Acumate, IRI's Express,
>etc.) and not using any RDBMS data warehouse? Or do you have
>an RDBMS data warehouse and are feeding data into an MDDB and living
>with the duplication?

One comment: data warehouse to me implies a robust database engine --
functions like backup/recovery, multi-writers, object-level (at least)
security, transaction management. Acumate and Express have few of these
capabilities, and if Essbase has them (beyond multiple writers and security)
they're probably elementary. SAS is in the same boat, by the way.

Also, data warehouse implies accessibility and Express is still notably
closed although this is improving in most respects with mid-'95 releases.

Seth

Ron McKay

unread,

Dec 14, 1994, 5:43:46 PM12/14/94

to

In article <1994Dec14.1...@iqsc.com> ma...@iqsc.com (Matt Reedy) writes:
>Date: Wed, 14 Dec 1994 19:38:39 GMT

>I've been thinking (uh, oh :-) about and talking with people about
>the trade offs involved in doing multidimensional analysis using a
>proprietary multidimensional database (MDDB) and a data warehouse
>stored in an RDBMS. The MDDB offers significantly
>better performance for drill down and slice-n-dice operations, at
>the expense of (potentially) duplicating data that is stored in an
>RDBMS data warehouse. The RDBMS data warehouse might already be setup
>at some companies to offer data access for decision support, but if
>part of that decision support means drill down, the performance can
>really suffer.

>I have a couple of questions. How are you doing multidimensional
>analysis? Are you using an existing RDBMS data warehouse, and putting
>up with the response time delays? Are you creating a data warehouse in
>a proprietary MDDB (like Arbor's Essbase, Kenan's Acumate, IRI's Express,
>etc.) and not using any RDBMS data warehouse? Or do you have
>an RDBMS data warehouse and are feeding data into an MDDB and living
>with the duplication?

I'll jump in! At present, we will be migrating technologies in '95, we have
what I will call 3 levels of users. Level 1, your typical EIS users, all
their drill down detail is in pre-formatted pre-run
pre-determined reports/graphs stored on a LAN server(read: performance isn't
an issue). Level 2 users, those that want the EIS access but aren't
satisfied, ie. they want information we couldn't(or didn't want to
anticipate), we give them a front end that they enter their question into with
clicks on boxes and choices from lists, and send this query off to a RDB and
they wait for the results. and level 3 users, the analysts with the full
power to get at the RDB and slice and dice the info anyway they want. This is
all done with IBM's DIS and DB/2 on a mainframe.
the answer to your question would be, we put up with the response time delays,
which can be good or bad, considering some of the 'aggregate' tables have
150,000,000 rows

>If you are doing multidimensional analysis using an RDBMS, is the
>performance even an issue? Do you believe (as I do) that the fact
>that RDBMS vendors are pouring millions of dollars into improving the
>performance of their engines and adding multidimensional extensions
>to their databases, coupled with the emergence of parallel query
>processing systems will solve the performance issue?

I will give you that doing multidimensional analysis is better done with a
multidimentional model/store/DB/whatever. Performance is an issue right now
for us, using an RDBMS for a MDDBS job! I would also agree that vendors
putting money behind their products will cure most of the performance issues.
I understand that Cognos' MD model will soon have the same performance with
something like 2,000,000 rows like it had with 200,000 rows.

>I'm interested in stimulating debate. By all means, contribute. Thanks.
>matt
>--
>Matthew Reedy Chief Technical Officer
>IQ Software Corporation Internet: ma...@iqsc.com
>400 N Loop 1604 E, Suite 100
>San Antonio, TX 78232 (210) 490 6684 Fax: (210) 490-3590

Ron McKay
at work -->>USTC...@IBMMAIL.COM at play -->> RMC...@INTERLOG.COM
Metaphor Marauder
"Preconceived notions can lead to utter madness" - Luba

Craig Schiro

unread,

Dec 15, 1994, 4:05:09 PM12/15/94

to

Seth Writes:
>...data warehouse implies accessibility and Express is still
>notably closed...

Seth:
I just want to ensure that your perceptions of our products are
based on all the current facts.

What do you mean by "closed"?

Most of the IRI Software customers and prospects I speak with
require OLAP products to provide access to other source databases
(DB2, Oracle, Sybase and others) as well as allow front end tools
(Visual Basic, Excel and others) to access the OLAP database.

IRI Software's Express meets these requirements.

If you share this definition of "open" or "NOT closed", then is
it possible that your information regarding our products is not
current? I will be glad to provide the latest product facts if
you (or others) are interested.
Regards,
Craig Schiro, IRIFORUM CompuServer Sysop, IRI Software
Regards,
Craig Schiro, IRIFORUM CompuServer Sysop, IRI Software

--
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Craig Schiro
IRI Software Internet: Craig....@InfoRes.Com
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Martin Richmond-Coggan

unread,

Dec 16, 1994, 11:23:03 AM12/16/94

to

Tim;

If you are concerned about sizes in Essbase, have you had a look
at TM/1?

Database sizes in MDDMs are frequently exacrebated by
hyper-sparse matrices.

TM/1 architecture allows mutiple multi-dimensional tables to form
the database, each table having different dimensionality to suit
different structure of data, thus reducing sparcity problems.
Also, TM/1 tables can be split over multiple servers to increase
overall capacity and performance.

Regards

Martin

Tim Harris

unread,

Dec 15, 1994, 5:59:37 PM12/15/94

to

In article <1994Dec14.1...@iqsc.com> ma...@iqsc.com (Matt Reedy) writes:

>From: ma...@iqsc.com (Matt Reedy)
>Subject: MDDB vs RDBMS warehouse?

>Date: Wed, 14 Dec 1994 19:38:39 GMT

>I've been thinking (uh, oh :-) about and talking with people about

>the trade offs involved in doing multidimensional analysis using a
>proprietary multidimensional database (MDDB) and a data warehouse
>stored in an RDBMS. The MDDB offers significantly
>better performance for drill down and slice-n-dice operations, at
>the expense of (potentially) duplicating data that is stored in an
>RDBMS data warehouse. The RDBMS data warehouse might already be setup
>at some companies to offer data access for decision support, but if
>part of that decision support means drill down, the performance can
>really suffer.

>I have a couple of questions. How are you doing multidimensional
>analysis? Are you using an existing RDBMS data warehouse, and putting
>up with the response time delays? Are you creating a data warehouse in
>a proprietary MDDB (like Arbor's Essbase, Kenan's Acumate, IRI's Express,
>etc.) and not using any RDBMS data warehouse? Or do you have
>an RDBMS data warehouse and are feeding data into an MDDB and living
>with the duplication?

>If you are doing multidimensional analysis using an RDBMS, is the
>performance even an issue? Do you believe (as I do) that the fact
>that RDBMS vendors are pouring millions of dollars into improving the
>performance of their engines and adding multidimensional extensions
>to their databases, coupled with the emergence of parallel query
>processing systems will solve the performance issue?

>I'm interested in stimulating debate. By all means, contribute. Thanks.

Hi Matt. (and everyone else)

You've raised some interesting issues. We are currently developing an RDBMS
Data Warehouse (using Sybase), whilst at the same time using Essbase for
Multi-Dimensional analysis. The reason we're doing both is because:

a) We need to build a warehouse to clean up our legacy data, resolving
inconsistencies etc. for use by general DSS apps which don't need the
drilldown/slice&dice facilities of MDDBs. By using this data to populate
our MDDB tables, we can ensure that the data is in line with what others
are using in non MDDB apps.

The duplication issue is to our minds a positive thing - it guarantees
everyone is comparing apples with apples.

b) The warehouse is being used as the basis for a world-wide data glossary,
based on a rigorous logical data model - so we don't want to put it on an
MDDB, to keep it simple.

c) We're also a bit concerned at the capacity of Essbase to hold a large
warehouse: we're up to around 8GB on Essbase, and our warehouse is
currently around 15GB per timeslice.

d) Although I broadly agree with your thoughts on RDBMS performance, we're
suffering _today_ with performance problems. I always find it amusing that
if you give someone a computer system that takes a couple of minutes to do
something that used to take them a couple of days, they complain at
the response times!

Have a cool yule,

Tim.

Seth Grimes

unread,

Dec 16, 1994, 7:32:44 AM12/16/94

to

Craig Schiro <76703...@CompuServe.COM> writes:

>Seth Writes:
>>...data warehouse implies accessibility and Express is still
>>notably closed...

>What do you mean by "closed"?

To me, an "open" product should be scalable and platform-independent.
Express sells different products (albeit sharing the same data manipulation
language) for PCs and Unix systems and pcExpress doesn't even run on
Windows NT. (As I acknowleged, this will change in mid-'95.) As to scalable,
I believe that the Unix product will not run on more than one processor at a
time of an SMP machine.

Open products should also have open APIs into and out of the product. So
far as I know, you can't call the Express server (MDB) from a 3GL program --
Fortran, C, or Visual Basic for that matter -- and if Express will offer ODBC
or "objects" (table, graph, etc.) they won't be for the Unix Express server
except through a pcExpress pass-through. As to calls out, they don't allow
arrays or dimensions to be passed even though these objects are the raison
d'etre of a multi-dimensional DBMS.

>Most of the IRI Software customers and prospects I speak with
>require OLAP products to provide access to other source databases
>(DB2, Oracle, Sybase and others) as well as allow front end tools
>(Visual Basic, Excel and others) to access the OLAP database.

>IRI Software's Express meets these requirements.

Only on the PC, and sometimes quite lamely. For instance, the SQL interface
is cursor-based (and doesn't, all the same, allow nested cursors), which is
terrible performance-wise when large amounts of data are to be passed. I
know, I've tried it. As another example, current inter-process communication
on the PC is based on DDE, not a performance winner, and each front-end to
the pcExpress DB *shares the same instance of the DBMS*, a potential process-
integrity nightmare. Maybe your customers are limiting their expectations
of the product.

>If you share this definition of "open" or "NOT closed", then is
>it possible that your information regarding our products is not
>current? I will be glad to provide the latest product facts if
>you (or others) are interested.

Of course. Please do post development updates here.

Seth

Seth Grimes

unread,

Dec 17, 1994, 5:36:34 PM12/17/94

to

Tim.H...@mail.bt.net (Tim Harris) writes:

>d) Although I broadly agree with your thoughts on RDBMS performance, we're
> suffering _today_ with performance problems. I always find it amusing that
> if you give someone a computer system that takes a couple of minutes to do
> something that used to take them a couple of days, they complain at
> the response times!

This is another point in the MDDB vs RDBMS debate regarding their qualities
as DBMSes, something I got at in my note on Express, scalability. Oracle
and Informix now have query parallelization, the ability to decompse
queries and index building and other DBMS functions into threads that are
scheduled on as many processors as you have available in your machine.
Sybase has a more limited form of this in their Navigation Server. And of
course these functions have been available in specialized engines (e.g.,
Teradata) for quite a while. What do Arbor (Essbase), IRI (Express),
Kenan (Acumate), or others have? Not even a research program *so
far as I know*, and not even scheduling of different server functions in
different processes or threads on multiple processors. Of course, one
way to take advantage of advances in products like the most popular RDBMSes
is to open your engine to use data stored in them automatically and trans-
parently as Essbase appears to be doing and as Acumate and SAS do with
forms of descriptors (called focal points in the case of Acumate and
through SAS/Access descriptors). I can't vouch for performance but the
idea is on track.

Seth

Daniel Druker

unread,

Dec 18, 1994, 2:43:23 PM12/18/94

to

In article <grimes.787703082@access1>, gri...@access1.digex.net (Seth
Grimes) wrote:

Seth brings up some great points here. In fact, these reflect some of the
central objections to MDDB I typically encounter with analysts and
prospects. To summarize the argument : "Since RDBMS's are getting
better at doing complex queries, and hardware is getting cheaper all the
time, why do I need to invest in a new technology, an OLAP (or MDDB)
server ?"

There are lots of ways to answer this, I'll address it in two parts,
functionality and performance. This also happens to be very
relevant to the original question in this thread, where does Arbor
think that OLAP servers fit in the data warehousing environment.

Functionality
First, allow me to take it as a given that you need to be able to
structure/view/query data dimensionally. Otherwise we wouldn't even
be talking here. When I do presentations on this subject I
spend a long time talking about the database environment,
showing that OLTP and OLAP are on the far ends of
the spectrum of information processing needs. I also show that
OLAP is really a highly specialized part of a much larger universe
of complex queries.
Given this:

You get a lot of things for "free" with an OLAP server. Structuring
the data dimensionally is an automatic, as is creating hierarchies
and aggregating data from the bottom of the model to the top.
Though you can do this in relational, usually using the "Star Model"
it's not free. Building the code to import data into relational,
structure it dimensionally, build the Star tables, create the proper
indices, roll-up the database, etc is decidedly non-trivial. There
is no tool out there yet that can do this automatically, to my knowledge.
Companies that have built these databases report 6 month or longer
development cycles with lots of COBOL and Syncsort- data
massaging is almost always perfomed outside of the
RDBMS for performance reasons.

Adding an OLAP server also makes it easy to create complex
derived data. Since the OLAP servers are knowledgable about
common business needs, it is reasonably easy to do very sophisticated
things like cost allocations, activity based costing, etc. These involve moving
data values around the various levels of the hierarchy according to a complex
set of business rules. In relational, you're back to COBOL or C and custom
programming to do this kind of stuff. In the warehousing environment,
which is usually read-only, adding an OLAP server in front of the relational
warehouse lets you build fully read/write applications. (budget, forecast,
what if, etc.) Usually the warehouse contains "Actual" information
taken from production business systems, but doesn't include "What If",
budget, or forecasting data which starts out from the actual results. In
general I believe that OLAP servers add lots of value to
static warehouse data.

OLAP servers have lots of functionality built in that relationals
may have in 3-5 years, but since our payback is usually 3-12
months we have little trouble overcoming the FUD factor.

Performance
It is true that relational databases are gettting better at throwing
more hardware at a given problem (Parallel Query, etc) or at doing
things smarter (bit mapped indexes in Sybase - Expressway, etc)

However, until the marginal cost of hardware gets so cheap that
it's negligible, you will be better off doing dimensional queries
with a true OLAP server. Also, the OLAP market targets
"Gold Collar Workers" and strives for instantaneous response
time, while query accelleration targets the entire enterprise, and
strives for improved response time. This is an important distinction.

Doing OLAP in the Star relational model means you have an I/O
penalty of 1 to 4 orders of magnitude for a given operation. If you sit
down and figure out the SQL statements you need to use to
do various types of retrievals and pivots, you will quickly come to this
conclusion. The math on this is indisputable, and not too hard to do.
Eventually hardware will get so cheap that this won't matter, but not
in the near term future. The cost of buying the additional hardware
to get the same performance as a true OLAP server is still a lot
greater than the price of most OLAP servers.

There was a front page article a few weeks ago in open systems
today about Quantum corporation using an OLAP server in front of their
Oracle/HP warehouse. There was also a good article in this weeks
Computerworld by Rich Finkelstein that touches on this subject.

As to the questions about threading in Seth's post, Essbase has background
processes associated with each database, and, much like Oracle or Sybase,
starts a seperate thread on the server for each concurrent user. Thus we
scale very well on SMP platforms. We don't have a large number of discrete
background tasks for the server engine like an Oracle or Sybase, but
realize that RDBMS background tasks (Logwriter, database
writer, archiver, recovery daemon, etc) have to do with Transaction
Processing.
And OLAP servers are NOT OLTP databases.

Hope this helps,

- Dan

Home: Office:
Daniel Druker Daniel Druker
14190 Wild Plum Lane Arbor Software Corporation
Los Altos Hills, CA, 94022 1325 Chesapeake Terrace
ddr...@netcom.com Sunnyvale, CA, 94089
(800) 858-1666 x4027
ddr...@arborsoft.com

Richard Finkelstein

unread,

Jan 7, 1995, 7:35:51 PM1/7/95

to

Hi Tim.

Thanks for the helpful information. However, I am reasonably shocked
(excited?) that you are able to get 8 gigabytes of data into Essbase running
on WIndows NT. Since this is the largest I have heard of so far, I would like
to ask some questions if you do not mind:

1) How many users are on the system?

2) How much time does it take to load the database? To update the database?

3) How do you backup the database and how long does it take?

4) How is response time affected as you enlarge the database?

Thanks for any information that you can share.

Regards,

Rich

In article <Tim.Harris...@mail.bt.net> Tim.H...@mail.bt.net (Tim
Harris) writes:>From: Tim.H...@mail.bt.net (Tim Harris)
>Subject: Re: MDDB vs RDBMS warehouse?
>Date: 8 Jan 95 05:04:12 GMT

>Have a cool yule,

>Tim.

Richard Finkelstein
Performance Computing, Inc.
312-549-8325 (Voice)
312-549-4824 (Fax)

Richard Saville

unread,

Jan 9, 1995, 8:53:24 PM1/9/95

to

Hi Tim:

I am in the process of evaluating Essbase. I have a concern about data size
and number of dimensions. Can you provide some details on your 8GB database?

How many dimensions does it have?

Do you have any feel for what percent of that 8GB is data and what part is
indexes?

Have you ever had to perform a complete recalc, if so how long did it take?

I would also be interested in your responses to the questions asked by
Richard Finkelstein.

Thanks,
Richard Saville

>c) We're also a bit concerned at the capacity of Essbase to hold a large
> warehouse: we're up to around 8GB on Essbase, and our warehouse is
> currently around 15GB per timeslice.
>

>Tim.