Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Should you avoid updating the fact table fields?

1 view
Skip to first unread message

Stephen

unread,
Aug 19, 2003, 1:39:59 AM8/19/03
to
I'm reading this data warehouse book and it says that you should never
update a record in the fact table unless it is to correct an error but it
doesn't say what the consequences of doing this would be.

Could someone give me an example of why it is a bad idea to update fields in
the fact table?

I'm currently working on a data warehouse that has an invoice number in the
table that is the basis for the fact table. Sometimes this is null
depending on how far along the record is in the billing process before I try
to import it into the data warehouse. Should I import the record and then
at a later time update it when the invoice field is not null or wait until
the record is complete to add it to the fact table?


Kevin

unread,
Aug 19, 2003, 10:21:54 AM8/19/03
to
The invoice number should not be in the fact table, it should be in a
dimension.

--
Kevin Connell, MCDBA
--------------------------------------------------
The views expressed here are my own
and not of my employer.
----------------------------------------------------
"Stephen" <swrot...@hotmail.com> wrote in message
news:#5tQhRhZ...@TK2MSFTNGP10.phx.gbl...

Richard R

unread,
Aug 19, 2003, 1:12:59 PM8/19/03
to
Stephen,

Whether or not you update records in a fact table depends on how you are
processing OLAP cubes built over the underlying SQL server database. It also
has a philosophical aspect in that a data warehouse should have one version
of the data, which is recognised as "The Truth", so you only change it if it
is in error.

If you don't have any OLAP cubes over the database, then there is no problem
updating the data, users' queries will report the latest data. So if you
change that sale of 10,000,000 USD back to 10,000 USD as it should be, then
the warehouse will reflect this. No problem. Unless of course you have just
declared this year-end to the government :-<

If you have OLAP cubes built, then there are consequences, because the cube
is a copy of the data, in a new format. So if you change the fact table,
then you have to make sure that the cube is up to date. Assuming you use MS
Analysis services, then you might have your cube set to incrememtal update,
refresh, or full process. If you have incremental update, then the cube
won't go back to pick up the changes, and your warehouse and cubes will be
out of sync. If you refresh or re-process you will be fine.

I have several warehouses where I load in a rolling 60 day's worth of data,
becuase the users need data up to yesterday, but accept that the data may be
changed for up to 2 months after the initial load (these are for a shipping
organisation, and there are ALWAYS arguments over who pays for what on a
ship!). So every night I delete and update the last 60 days of the fact
table. The cubes either re-process or refresh, and everything is OK.
Eventually when everyone has agreed the fees, the relevant rows are marked
"certified", so the users know they are OK.

In response to Kevin's comment that the invoice number should be on a
dimension, I would disagree. I have the same setup, and the Invoice number
is on the fact table for *non-OLAP* reporting, although I have allowed it to
appear in drill-to-detail.
Even if you had created an Invoices dimension (with information about
invoice type and payment terms etc), then you still need a key field on the
fact table, and it makes sense to use the invoice number itself as the key.
If it is not a purely numeric field then it would be faster to use an
artificial integer key, but probably not worth it. The users won't see any
difference in performance, and I bet the additional time to do the join is
almost imeasurable..
Of course If you want to use the invoice number on the fact table as a
dimension, (which you can if you want to), or as a key to a dimension table
then you MUSTchange the NULL to a valid value. "Unknown" or 00000 would be
obvious choices.

Hope this helps,

Rich


"Stephen" <swrot...@hotmail.com> wrote in message
news:#5tQhRhZ...@TK2MSFTNGP10.phx.gbl...

Stephen

unread,
Aug 19, 2003, 5:06:52 PM8/19/03
to
Thanks for all the information Richard.


"Richard R" <general@adslleamcres> wrote in message
news:edgIxPnZ...@TK2MSFTNGP10.phx.gbl...

0 new messages