This is an excellent question. In my experience, the event of deleting
records can be a very valuable source of information about a business
process, and thus, it is very useful to capture the event in the data
warehouse. I have typically handled this by adding a DELETED column to
the fact or dimension table that stores a value of Y or N (or 0 or 1)
for deleted versus valid records.
Then, from the end-user tool, during query execution, you can modify
your query criteria to check for records that are marked deleted versus
valid. To ensure adequate performance, make sure that the DELETED
column is indexed using the appropriate technique for your database.
In Oracle DBs, low cardinality columns like this are usually retrieved
most efficiently by using bitmap indexes.
Note that for auditing purposes, this method works very well, because
the underlying integrity of the systems are not challenged by using
flags to mark deleted records. It is entirely possible to see the
deleted records with the appropriate query.
Good Luck,
Nenshad
Thanks in advance.
Doug
"Nenshad Bardoliwalla" <nen...@gmail.com> wrote in message
news:1129519520.0...@o13g2000cwo.googlegroups.com...
I would look into Time Variance as a concept to handle this situation.
have a look at this post on my blog it will help you get started.
check out:
http://bi-on-sql-server.blogspot.com/2005/07/news-group-post-slowly-changing.html
Myles Matheson
Data Warehouse Architect
http://bi-on-sql-server.blogspot.com/
I was thinking using an offseting fact record because the the fact tables
has a lot more rows (in the order of 100 million) than that of dimension (in
the order of 1000 to 10,000) typically.
It seems to be much more expensive in terms of storage and space to maintain
the Start Data, End Date and the flag in the fact.
Also, it will require two role playing time dimensions, one for start date
and another for end date, unless I degenerate the dates directly into the
fact record.
This is why I was leaning towards using an offseting fact record to achieve
the deletion.
e.g. Surrogate key, fkey1, fkey2, fkey3 that link to 1/3/2005, 100 will
become
Surrogate key +1, fkey1, fkey2, fkey3 that link to 1/4/2005, -100.
Any comments or thoughts? Thanks in advance.
<Myles.M...@gmail.com> wrote in message
news:1129632995.2...@g44g2000cwa.googlegroups.com...
Deletes can be a real pain. I tend not to have them in the fact table
and identify them in ODS in my data warehouses. The main reason is
performance. Having a flag tagging the status of a record will force a
table scan with some RDBMS. This is expensive with large tables.
In your case I would use the type 2 concept. This would allow you to
identify records that are no longer current. But be warned you will
have to educate your users to understand the types of transactions in
the fact table.
I have one question do your users want to report on delete
transactions?