Database/Table Design Question - Object/Event Model

orandov

unread,

Dec 11, 2007, 12:06:16 PM12/11/07

to

Hi,

Facts:
I created a database to support an application that tracks events on
different objects. The two main tables are tbl_Object and
tbl_EventLog. Each table has unique ID and on the tbl_EventLog there
is FK for a record in the tbl_Object. The events are inserted all the
time for the same or different objects from the tbl_Object. There are
about 600,000 objects in the tbl_Object and 1,500,000 (and growing)
events in tbl_EventLog.

Question:
The user often wants to know what the last event was for a specific
object.

What is the best way of retrieving the last event?

Should I simply do a max(eventdatetime) on a specific object? or
Should I add a LastEventID column to tbl_Object and update it every
time a new event is inserted? or any other way to implement it?

I chose the second method because I didn't think it made sense search
the event table everytime the user wants to know the last event, but I
wanted to know what the experts thought.

Please let me know what you think.

Thank you,
Oran Levin

--CELKO--

unread,

Dec 11, 2007, 5:29:58 PM12/11/07

to

>> I created a database to support an application that tracks events on different objects. The two main tables are tbl_Object [sic: violates ISO-11179 rules and is too vague] and tbl_EventLog [sic: violates ISO-11179 rules].

Pull off that silly "tbl-" prefix and start thinking in sets; in this
case, you should have plural names, unless there actually is only one
"object" and only one "event"

>> Each table has unique ID and on the tbl_EventLog there is FK for a record [sic: rows are not records] in the tbl_Object. <<

>> The events are inserted all the time for the same or different objects from the tbl_Object. There are about 600,000 objects in the tbl_Object [sic] and 1,500,000 (and growing) events in tbl_EventLog. <<

This is the wrong data model. The usual design error is to have only
one time in a row to capture when an event started, then do horrible
self-joins to get the duration
of the status change. Let me use a history table for price changes.
The fact to store is that a price had a duration:

CREATE TABLE PriceHistory
(upc CHAR(13) NOT NULL
REFERENCES Inventory(upc),
start_date DATE NOT NULL,
end_date DATE, -- null means current
CHECK(start_date < end_date),
PRIMARY KEY (upc, start_date),
item_price DECIMAL (12,4) NOT NULL
CHECK (item_price > 0.0000),
etc.);

You actually needs more checks to assure that the start date is at
00:00 and the end dates is at 23:59:59.999.. Hrs. You then use a
BETWEEN predicate to get the appropriate price.

SELECT ..
FROM PriceHistory AS H, Orders AS O
WHERE O.sales_date BETWEEN H.start_date
AND COALESCE (end_date, CURRENT_TIMESTAMP);

It is also a good idea to have a VIEW with the current data:

CREATE VIEW CurrentPrices (..)
AS
SELECT ..
FROM PriceHistory
WHERE end_date IS NULL;

Now, download the Rick Snodgrass book on Temporal Queries in SQL from
the University of Arizona website (it is free). And finally Google up
my article at www.DBAzine.com on transition constraints.

orandov

unread,

Dec 11, 2007, 5:54:19 PM12/11/07

to

Hi --CELKO--,

Thank you for your response. Just to make sure I understand you
correctly, you suggested having 2 datetimes for every event that
occurs. The 2nd one will represent when that event ends and the next
one (which will be a separate record) starts. Therefore, if you want
to know the last event you just look for the event record for that
object that has a null in the end date (or create a view like you
suggested).

That sounds like a good idea.

Oran

orandov

unread,

Dec 11, 2007, 6:37:56 PM12/11/07

to

Can't find the link on the University of Arizona's website to the book
that works.

This is where I looked...
http://www.cs.arizona.edu/people/rts/tsql2.html

Oran

--CELKO--

unread,

Dec 12, 2007, 5:27:49 AM12/12/07

to

>> Can't find the link on the University of Arizona's website to the book that works. <<

http://www.cs.arizona.edu/~rts/tdbbook.pdf

Developing Time-Oriented Database Applications in SQL, Richard T.
Snodgrass, Morgan Kaufmann Publishers, Inc., San Francisco, July,
1999, 504+xxiii pages, ISBN 1-55860-436-7.

The PDF of this book is here (which looks a little fuzzy but prints
fine, except for pages 30-31, which are here) and its associated CD-
ROM in zip (59MB) or gzipped tar (57MB) formats.

Message has been deleted

jhof...@googlemail.com

unread,

Dec 12, 2007, 12:29:54 PM12/12/07

to

Hi Oran,

While Celko's response is an excellent example of how to store history
data, it is not necessarily the best solution for your application...

Could you supply some additional information (DDL, sample data) about
your Object and EventLog tables? Based on the table names and your
description, I would guess that Time (eventdatetime in your original
post, but as events could feasibly have a duration, I would expect to
see an endtime/duration column as well) is actually an attribute of
the "Event" that occurred. This is because Events are things that
occur at a point in time (or over a period of time). To implement
Celko's suggested solution, you would need to add 2 datetime columns -
something like "validfromdatetime" and "validtodatetime".

Further - if the EventLog table is anything like the other "log"
tables I've worked with, I would guess that the use of validfrom/to
date ranges is probably overkill. The reason to use date ranges is if
you ever need to answer the question "At this point in history, what
was the last event recorded against each object". My experience is
that usually when looking at log tables, you're more interested in
"What events occurred during this period" questions which would look
at the eventdatetime (i.e. attribute) column, not at the validfrom/to
(audit) columns.

I would probably just use MAX(eventdatetime) or a bit flag (setting
myself up for slaughter here :-/) to indicate the latest event.
Either method will work, my main concern is in highlighting the
difference between attribute information and audit information ...
maybe the eventdatetime column that you allude to *is* actually an
audit column and not an attribute - without a DDL and sample data it
is hard to say :)

I would definitely use a view to facilitate queries of this nature
either way.

Good luck!
J

orandov

unread,

Dec 12, 2007, 3:01:33 PM12/12/07

to

Here are my two tables...

Object Table:

CREATE TABLE [dbo].[tbl_Folder](
[FolderID] [int] IDENTITY(1,1) NOT NULL,
[FolderName] [char](12) NOT NULL CONSTRAINT
[DF_tbl_Folder_FolderName] DEFAULT (0),
[LastEventID] [int] NULL,
[PurgedFlag] [char](1) NOT NULL CONSTRAINT
[DF_tbl_Folder_PurgedFlag] DEFAULT ('N'),
[InsertBy] [varchar](50) NULL,
[InsertDateTime] [datetime] NULL,
[UpdateBy] [varchar](50) NULL,
[UpdateDateTime] [datetime] NULL,
[DeleteBy] [varchar](50) NULL,
[DeleteDateTime] [datetime] NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT (newid()),
CONSTRAINT [PK_tbl_Folder] PRIMARY KEY CLUSTERED
(
[FolderID] ASC
) ON [PRIMARY]
) ON [PRIMARY]

Event Table:

CREATE TABLE [dbo].[tbl_EventLog](
[EventID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[FolderID] [int] NOT NULL,
[EventType] [varchar](50) NOT NULL,
[FromDepartmentType_EmployeeID] [int] NOT NULL,
[ToDepartmentType_EmployeeID] [int] NOT NULL,
[EventDateTime] [datetime] NOT NULL,
[InsertBy] [varchar](50) NULL,
[InsertDateTime] [datetime] NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT (newid()),
CONSTRAINT [PK_tbl_Event] PRIMARY KEY CLUSTERED
(
[EventID] ASC
) ON [PRIMARY]
) ON [PRIMARY]

EventDateTime is the time of the event. Then InsertDateTime is the
audit time. The rowguid's are used for replication.
The LastEventID on the folder table is a FK. As is FolderID on the
event table.

Thanks,

Oran

jhof...@googlemail.com

unread,

Dec 13, 2007, 5:39:22 AM12/13/07

to

Hi Oran,

If you have a LastEventID on tbl_Folder, why do you need the same
functionality on tbl_EventLog? Aah - going back to your original
question, it looks like you've implemented the "add a LastEventID
column to tbl_Object" approach :)

Maybe I'm just getting pedantic about the naming, but for me a log
table is there to record the where's and when's of events occurring.
Adding history tracking to a log table seems like a pretty unusual
thing to have to do, as the history is already intrinsic to the data.

You still haven't really gone into detail on how these tables are/will
be used... As always, there are a number of ways to produce the
results you are after, the "best" way comes down to balancing your
costs vs requirements equation.

Good luck!
J

orandov

unread,

Dec 13, 2007, 12:26:30 PM12/13/07

to

Hi J,

The requirements for the project were to be able determine the status
and location of the Folder (the object) at all times and to track a
history of what has happened to this folder in the past.
The users were really only interested in knowing the current status
and location of a folder. The history requirement was something that
they mentioned would be nice but they didn't really care what it
looked. At the time I had never worked with history or log tables and
I had some deadlines to meet so after consulting a co-worker I put
this tbl_EventLog together (he reccommended the "log" being added on).

There are about 15 event types that can happen to a folder and there
are rules of which events can happen when. After an event occurs the
Location and Status change of the folder. Location is based on where
the event ended up, the "ToDepartmentType_EmployeeID" column. The
status depends on the "EventType" of the event that just occurred.

Looking back at what I did I know I missed a number of things.

1) The history/log should be done differently. Some people
reccommended two tables. One to store the active data and one that
logs the history.
2) There are two levels of statuses. Some statuses stay with a folder
and only get changed by certain events while the level statuses will
be changed by different events.

For example, the event "check out" happens to a folder. The
status is "checked out". The event "check in" will change the status
to "checked in." While it is "checked in" the folder can have the
event "purge" happen to it which
gives the folder the status of "checked in" and "purged."
If the folder now has the "check out" event occur it will be "purged"
and "checked out."

I didn't realize this untill the middle of development and it was
already too late to change the design. Looking back I am trying to
think of the best way to design this.

I hope I made it clearer what the tables are used for.

What is the difference b/n a "log" and "history"?

Would you have three separate tables for current data, log, and
history?

Thanks,

Oran

jhof...@googlemail.com

unread,

Dec 14, 2007, 5:45:32 AM12/14/07

to

Hi Oran,

In my view (note - this is just my opinion):
Log - an ordered record of what happened when
History - a record of the state of your data at a given point in time.

The difference is that a log would be used to satisfy the questions
"What happened next?", or "What happened at this time?", a history
would satisfy the question "At this point in time, what did my data
look like?" The difference is pretty subtle, and in theory they are
interchangeable (you can derive the log from the history, and vice
versa) - so really it comes down to your requirements, and the cost
equation. Storing history data is never a bad idea in a database, it
is up to you to decide whether it will cost too much to implement in
your project. With unlimited resources I would choose to store all
history all the time - in reality this is rarely a possibility :)

WRT the statuses .. a quick-fix could be to simply add statuses for
all status combinations. So you could have a single status which
means "checked in and purged" or "checked out and purged". I'm fairly
sure Celko can give you a long list of reasons why this is not a good
idea though :) If the relationship between Folder and Status is many-
to-many, make it so in your database.

<quote>

Would you have three separate tables for current data, log, and
history?

</quote>

It sounds like you're developing some kind of document management
system (or maybe you've changed the names of your objects to
illustrate your point?) - if this is the case I would imagine that
history tracking/audit capability is fairly important in this
application. I see you already have a number of audit columns on your
Folder table (Inserted/Updated/Deleted ...). Something to remember
here, is that even though you're tracking insertion and deletion - for
updates you are only going to have the audit details of the *last*
update to run against the table. It might be better to implement the
table more like:

CREATE TABLE [dbo].[tbl_Folder](
[FolderID] [int] IDENTITY(1,1) NOT NULL,
[FolderName] [char](12) NOT NULL CONSTRAINT
[DF_tbl_Folder_FolderName] DEFAULT (0),
[LastEventID] [int] NULL,
[PurgedFlag] [char](1) NOT NULL CONSTRAINT
[DF_tbl_Folder_PurgedFlag] DEFAULT ('N'),

[CreatedBy] [varchar](50) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ValidFrom] [datetime] NOT NULL, -- Valid from is NOT NULL
[ValidTo] [datetime] NULL, -- This is NULLable with the NULL
row being the "current" one.

[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT
(newid()),
CONSTRAINT [PK_tbl_Folder] PRIMARY KEY CLUSTERED
(
[FolderID] ASC
) ON [PRIMARY]
) ON [PRIMARY]

You don't need Inserted and Updated data as you are only inserting and
this is reflected by the CreatedBy/On columns. Deleted is also
unnecessary - when a row is deleted, simply set the ValidTo datetime
and don't insert a new row. When you query the data, the row will not
meet the "WHERE ValidTo IS NULL" constraint. Regarding current data -
I would do this using a VIEW, not a whole new table.

I hope this helps!
J

orandov

unread,

Dec 14, 2007, 12:11:17 PM12/14/07

to

Hi J,

Thank you for all of your suggestions and explainantions. It is very
helpful.
I guess this is a document management system but it is for physical
folders not files on a computer.

> WRT the statuses .. a quick-fix could be to simply add statuses for
> all status combinations. So you could have a single status which
> means "checked in and purged" or "checked out and purged". I'm fairly
> sure Celko can give you a long list of reasons why this is not a good
> idea though :) If the relationship between Folder and Status is many-
> to-many, make it so in your database.

This was actually my work around. I made statuses like "Purged Check
Out" and "Purged Check In."

> [CreatedBy] [varchar](50) NOT NULL,
> [CreatedOn] [datetime] NOT NULL,
> [ValidFrom] [datetime] NOT NULL, -- Valid from is NOT NULL
> [ValidTo] [datetime] NULL, -- This is NULLable with the NULL
> row being the "current" one.

How would it look if I was actually updating a record?
Would the ValidTo get the current date and a new record is created
with ValidTo being NULL?
Or add more fields?

> The difference is that a log would be used to satisfy the questions
> "What happened next?", or "What happened at this time?", a history
> would satisfy the question "At this point in time, what did my data
> look like?"

So it would sound like the tbl_EventLog would be classified as a "Log"
table. I don't have any history tables with data of what happened at a
certain point in time.

Thanks,

Oran

--CELKO--

unread,

Dec 14, 2007, 12:19:21 PM12/14/07

to

>> I guess this is a document management system but it is for physical folders not files on a computer. <<

PAPER!? They still make that stuff? :)

You might want to look at actual document management systems rather
than RDBMS. Another system to look at is "Shephardization" or talk
to a law student. This is the system used for USA court cases which
links decisions together to give a complete picture of the state of
affairs.

http://www.19thcircuitcourt.state.il.us/bkshelf/l_libr/citation_searching.htm

jhof...@googlemail.com

unread,

Dec 14, 2007, 12:50:40 PM12/14/07

to

Hi Oran,

<quote>

How would it look if I was actually updating a record?
Would the ValidTo get the current date and a new record is created
with ValidTo being NULL?

</quote>

That is exactly right :) Your Create* columns show you who made the
change (insert/update/delete's are all just changes to the current
state of your data), and the Valid* columns reflect the time period
for which the row was "current". So for an update, you simply
timestamp the ValidTo column in the old "current" row and insert a new
row with a NULL ValidTo column.

<quote>

So it would sound like the tbl_EventLog would be classified as a
"Log"
table. I don't have any history tables with data of what happened at
a
certain point in time.

</quote>

It does sound like EventLog is a true "log", but I would probably keep
full history on the Folder table.

<quote>

>> I guess this is a document management system but it is for physical folders not files on a computer. <<

PAPER!? They still make that stuff? :)

</quote>

Hehe - Celko has a sense of humor?!? ;)

Good luck!
J

orandov

unread,

Dec 14, 2007, 1:15:16 PM12/14/07

to

You have both mentioned that is a good idea to use a view to hold the
current data.
I assume that in this case the view would Join the Folder with the
LastEvent that occured to the Folder.

I am using SQL Server 2000 sp3.

I am new to views. I have been reading up on them and trying to
determine whether the data in the view gets refreshed when the
underlying tables get new records or do you have to execute the view
everytime you refresh it?

If the latter is true then what advantage does the view have over
exceuting a stored procedure?
The view would have to be refreshed every time I need to know the
status of a folder b/c new events are inserted all the time.

Thanks,
Oran

Message has been deleted

orandov

unread,

Dec 14, 2007, 4:21:00 PM12/14/07

to

I don't know if the previous post worked, but I figured out from
testing the view that the data is refreshed.

Oran

--CELKO--

unread,

Dec 14, 2007, 11:15:45 PM12/14/07

to

>> I have been reading up on them and trying to determine whether the data in the view gets refreshed when the underlying tables get new records [sic: rows are not records!] or do you have to execute the view every time you refresh it? <<

A VIEW is a virtual table. Whenever you invoke its name, the
optimizer decides if it wants to make it into a materialized table in
physical storage, or if it wants to expand it in the query like an in-
line macro. Either way, its data is always current when you use it.
It is refreshed only if it has to be.