log table changes with a trigger...

John Hopfield

unread,

Jul 6, 2007, 11:26:07 AM7/6/07

to

Hi to all,

Can you make a (simple) example about logging table changes into a
"log" table?
It is possible using triggers?

thank you
JH

Robo

unread,

Jul 6, 2007, 12:39:36 PM7/6/07

to

Hi,

There is actually a number of ways to accomplish this.

Triggers are a very valid way. Using that appraoch here is a possible
design:

For the sake of the example, let's assume you had a table called
dbadmin.person and that table had the following columns with the assumption
that a process couldn't change the primary key of the table (in this
example, the PK is person_id):

dbadmin.person (person_id integer not null, person_ssn char(9) not null with
default, person_last_name char(50) not null w/ default, dob date nullable);

Build a 'log' table to keep track of before and after data.

new 'log table' dbadmin.person_audit (person_id integer not null,
b_person_ssn char(9) nullable, a_person_ssn char(9) nullable,
b_person_last_name
char(50) nullable, a_person_last_name char(50) nullable,
b_dob date
nullable, a_dob date nullable);

Create a trigger that performs following steps(see SQL guide or application
guide for code examples)

- CREATE TRIGGER DBADMIN.LOGDATA
AFTER update
REFERENCING OLD AS OLD_ROW, NEW AS NEW_ROW
FOR EACH ROW MODE DB2SQL
INSERT INTO DBADMIN.PERSON_LOG
(person_id, b_person_ssn,
a_person_ssn,b_person_last_name, a_person_last_name, b_dob, a_dob)
VALUES(OLD_ROW.person_id, OLD_ROW.b_person_ssn,
NEW_ROW.a_person_ssn,
OLD_ROW.b_person_last_name,
NEW_ROW.a_person_last_name,
OLD_ROW.b_dob, NEW_ROW.a_dob);

- Roland
"John Hopfield" <Hopf...@freemail.it> wrote in message
news:1183735567....@c77g2000hse.googlegroups.com...

Dave Hughes

unread,

Jul 6, 2007, 10:42:40 PM7/6/07

to

John Hopfield wrote:

Logging table changes can indeed be implemented using triggers.
However, depending on what you want to do with the logged history, you
need to be rather careful about how you design the history table and
the triggers that update it.

I've been meaning to write a tutorial on this (partly for my own
reference) for a while. Now's as good a time as any I guess :-)

STARTING SIMPLE
=================

I'll start with a simple example, and demonstrate how to extend it to
overcome various problems. Firstly, the base table for which we wish to
log changes - a simple table which records details about employees of a
company (name, pay grade (where 1 is the highest), and department,
keyed by a generated integer value):

CREATE TABLE EMPLOYEES (
ID INTEGER GENERATED BY DEFAULT AS IDENTITY,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
);

Now for the history table. Initially, this will have the same
structure, with an additional column to record when the change occurred:

CREATE TABLE EMP_HISTORY (
CHANGED TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
ID INTEGER NOT NULL,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PK PRIMARY KEY (ID, CHANGED),
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
);

Note that a compound key is needed and that the KEY column is not
generated. Now for a couple of triggers to record changes to the base
table in the history table:

CREATE TRIGGER INSERTED_EMP
AFTER INSERT ON EMPLOYEES
REFERENCING NEW AS N
FOR EACH ROW
INSERT INTO EMP_HISTORY
(ID, PAYGRADE, DEPT, NAME) VALUES
(N.ID, N.PAYGRADE, N.DEPT, N.NAME);

CREATE TRIGGER UPDATED_EMP
AFTER UPDATE ON EMPLOYEES
REFERENCING NEW AS N
FOR EACH ROW
INSERT INTO EMP_HISTORY
(ID, PAYGRADE, DEPT, NAME) VALUES
(N.ID, N.PAYGRADE, N.DEPT, N.NAME);

Now some sample data:

INSERT INTO EMPLOYEES (PAYGRADE, DEPT, NAME) VALUES
(1, '0000', 'Tom'),
(4, '0010', 'Dick'),
(5, '0010', 'Harry');

SELECT * FROM EMP_HISTORY ORDER BY ID, CHANGED;

CHANGED ID PAYGRADE DEPT NAME
------------------- -- -------- ---- -----
2007-07-06-21.50.00 1 1 0000 Tom
2007-07-06-21.50.00 2 4 0010 Dick
2007-07-06-21.50.00 3 5 0010 Harry

Now, Harry gets a promotion to the same level as Dick, and Dick (no
doubt fed up with Harry's puerile comments about his name) changes his
name to Richard:

UPDATE EMPLOYEES SET PAYGRADE = 4 WHERE ID = 3;
UPDATE EMPLOYEES SET NAME = 'Richard' WHERE ID = 2;
SELECT * FROM EMP_HISTORY ORDER BY ID, CHANGED;

CHANGED ID PAYGRADE DEPT NAME
------------------- -- -------- ---- -------
2007-07-06-21.50.00 1 1 0000 Tom
2007-07-06-21.50.00 2 4 0010 Dick
2007-07-06-22.03.40 2 4 0010 Richard
2007-07-06-21.50.00 3 5 0010 Harry
2007-07-06-22.03.40 3 4 0010 Harry

So, we can see that the triggers are working and that we can track the
history of the employees in the company. We can also write relatively
simple queries to answer several questions about the data:

"When did Harry get promoted to his current pay grade?"

SELECT MIN(CHANGED)
FROM EMP_HISTORY H
WHERE ID = 3
AND PAYGRADE = (SELECT PAYGRADE FROM EMPLOYEES E WHERE E.ID = H.ID);

However, other questions are considerably more difficult to answer:

"What was Harry's pay grade before his promotion?"

SELECT PAYGRADE
FROM EMP_HISTORY H1
WHERE ID = 3
AND CHANGED = (
SELECT MAX(CHANGED)
FROM EMP_HISTORY H2
WHERE H1.ID = H2.ID
AND H2.PAYGRADE <> (SELECT PAYGRADE FROM EMPLOYEES E WHERE E.ID =
H2.ID)
);

And some questions are impossible to answer:

"What was Harry's pay grade at time X?"

Currently, we could answer this question with the following query.
However, this will NOT work in the general case...

WITH H AS (
SELECT CHANGED, PAYGRADE
FROM EMP_HISTORY
WHERE ID = 3
AND CHANGED < X
)
SELECT PAYGRADE
FROM H
WHERE CHANGED = (SELECT MAX(CHANGED) FROM H);

How does this fail in the general case? There's an operation we haven't
tried yet. What happens if Harry gets fired (possibly for being too
expensive after his promotion :-)...

DELETE FROM EMPLOYEES WHERE ID = 3;

Obviously, all the records in the history table remain (we don't want
to throw away an employee's history of employment when they leave -
incidentally, this was the reason for not defining a foreign key from
the EMP_HISTORY table to the EMPLOYEES table). However, the history
table itself does not indicate that Harry has gone, nor when he left.
Combining the history table with the base table we could tell that
Harry was once employed but has now left, but we still couldn't answer
the question "When did Harry leave?"

Worse still, what happens if Harry is re-hired, albeit at a lower pay
grade? (incidentally, this is why the ID column in EMPLOYEES was
defined BY DEFAULT AS IDENTITY instead of ALWAYS AS IDENTITY)

INSERT INTO EMPLOYEES (ID, PAYGRADE, DEPT, NAME) VALUES
(3, 5, '0010', 'Harry');

Now what does the history table tell us about Harry?

SELECT * FROM EMP_HISTORY WHERE ID = 3 ORDER BY CHANGED;

CHANGED ID PAYGRADE DEPT NAME
------------------- -- -------- ---- -----
2007-07-06-21.50.00 3 5 0010 Harry
2007-07-06-22.03.40 3 4 0010 Harry
2007-07-06-22.04.48 3 5 0010 Harry

For all we know, Harry got demoted rather fired and rehired. The
representation is exactly the same in each case. In other words, the
fact that Harry left has been "lost".

This is why we can't answer the question "What was Harry's pay grade at
time X?" in the general case. If we query the time during which Harry
was NOT employed by the company it will return Harry's pay grade prior
to his leaving, when it should return nothing (as he was not employed).

We can generalise this further. It is impossible with this structure to
accurately answer the question: "what was the content of the base table
at time X?". It seems to me that this is the whole point of having a
history table - to be able to see the state of the base table at any
given point in time. So, some enhancement is obviously required.

FIXING THE PROBLEMS
=====================

There are several ways of improving the structure. One could add a flag
field indicating whether the recorded change was an insertion, update,
or deletion. However, this doesn't really simplify the queries above.
The most elegant solution I've found in practice is to have *two*
fields to track the change timestamp: an effective and expiry timestamp
for each record. For the "current" record in the history table, the
expiry timestamp is set to some arbitrary future time (for convenience;
one could use NULL here instead). The new structure is defined as
follows (@ is used as the statement terminator as we need to make use
of a compound dynamic statement in one of the triggers):

CREATE TABLE EMPLOYEES (
ID INTEGER GENERATED BY DEFAULT AS IDENTITY,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
)@

CREATE TABLE EMP_HISTORY (
EFFECTIVE TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
EXPIRY TIMESTAMP NOT NULL DEFAULT '9999-12-31 23:59:59.999999',
ID INTEGER NOT NULL,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PK PRIMARY KEY (ID, EFFECTIVE),
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
)@

CREATE TRIGGER INSERTED_EMP
AFTER INSERT ON EMPLOYEES
REFERENCING NEW AS N
FOR EACH ROW
INSERT INTO EMP_HISTORY
(ID, PAYGRADE, DEPT, NAME) VALUES
(N.ID, N.PAYGRADE, N.DEPT, N.NAME)@

CREATE TRIGGER UPDATED_EMP
AFTER UPDATE ON EMPLOYEES
REFERENCING OLD AS O NEW AS N
FOR EACH ROW
BEGIN ATOMIC
DECLARE T TIMESTAMP DEFAULT CURRENT TIMESTAMP;
UPDATE EMP_HISTORY SET
EXPIRY = T - 1 MICROSECOND
WHERE ID = O.ID
AND EXPIRY = '9999-12-31 23:59:59.999999';
INSERT INTO EMP_HISTORY
(EFFECTIVE, ID, PAYGRADE, DEPT, NAME) VALUES
(T, N.ID, N.PAYGRADE, N.DEPT, N.NAME);
END@

CREATE TRIGGER DELETED_EMP
AFTER DELETE ON EMPLOYEES
REFERENCING OLD AS O
FOR EACH ROW
UPDATE EMP_HISTORY SET
EXPIRY = CURRENT TIMESTAMP - 1 MICROSECOND
WHERE ID = O.ID
AND EXPIRY = '9999-12-31 23:59:59.999999'@

Now to try out the new structure (if trying this out for real, pause
for a minute or two between each statement to get a nicely dispersed
history, which will help with the next bit):

INSERT INTO EMPLOYEES (PAYGRADE, DEPT, NAME) VALUES
(1, '0000', 'Tom'),
(4, '0010', 'Dick'),
(5, '0010', 'Harry');
UPDATE EMPLOYEES SET PAYGRADE = 4 WHERE ID = 3;
UPDATE EMPLOYEES SET NAME = 'Richard' WHERE ID = 2;
DELETE FROM EMPLOYEES WHERE ID = 3;
INSERT INTO EMPLOYEES (ID, PAYGRADE, DEPT, NAME) VALUES
(3, 5, '0010', 'Harry');

SELECT * FROM EMP_HISTORY ORDER BY ID, EFFECTIVE;

EFFECTIVE EXPIRY ID PAYGRADE DEPT NAME
------------------- ------------------- -- -------- ---- -------
2007-07-06-23.50.58 9999-12-31-23.59.59 1 1 0000 Tom
2007-07-06-23.50.58 2007-07-06-23.51.02 2 4 0010 Dick
2007-07-06-23.51.02 9999-12-31-23.59.59 2 4 0010 Richard
2007-07-06-23.50.58 2007-07-06-23.51.02 3 5 0010 Harry
2007-07-06-23.51.02 2007-07-06-23.52.38 3 4 0010 Harry
2007-07-06-23.55.14 9999-12-31-23.59.59 3 5 0010 Harry

Now, using this structure we can easily find out exactly what the base
table looked like at any given point in time, X:

SELECT
ID, PAYGRADE, DEPT, NAME
FROM EMP_HISTORY
WHERE X BETWEEN EFFECTIVE AND EXPIRY;

We can also do some rather interesting things with recursive queries.
For example, if we wished to "explode" the history table to see how the
base table looked over several days (okay, minutes in this example) for
a specific user:

WITH
RANGE(I, D) AS (
(VALUES (1, TIMESTAMP('2007-07-06 23:50:00')))
UNION ALL
(
SELECT I + 1, D + 1 MINUTE
FROM RANGE
WHERE I <= 1000 AND D < TIMESTAMP('2007-07-07 00:00:00')
)
)
SELECT R.D, H.ID, H.PAYGRADE, H.DEPT, H.NAME
FROM RANGE R INNER JOIN EMP_HISTORY H
ON R.D BETWEEN H.EFFECTIVE AND H.EXPIRY
WHERE H.ID = 3
ORDER BY R.D;

D ID PAYGRADE DEPT NAME
------------------- -- -------- ---- -----
2007-07-06-23.51.00 3 5 0010 Harry
2007-07-06-23.52.00 3 4 0010 Harry
2007-07-06-23.56.00 3 5 0010 Harry
2007-07-06-23.57.00 3 5 0010 Harry
2007-07-06-23.58.00 3 5 0010 Harry
2007-07-06-23.59.00 3 5 0010 Harry
2007-07-07-00.00.00 3 5 0010 Harry

The I value in the recursive common table expression is simply there to
prevent a warning about possible infinite recursion. Note that there
are several missing timestamps in the result where we removed Harry
from the table.

Such a query can be adapted to numerous purposes. For example: to plot
on a graph the population of the company as it changes over time.
Combining the example above with the OLAP aggregate functions leads to
even more interesting possibilities - in particular, the ability to
combine two separate and unsynchronized history tables into a single
coherent history (although the query to do this is quite horribly
complex, so I won't go into it here).

DATES INSTEAD OF TIMESTAMPS
=============================

There are still a couple of refinements we can make to this structure.
Firstly: we probably don't want to track changes down to the nearest
microsecond. In the case of data such as our example (employee
attributes), tracking changes to the nearest day is likely sufficient.

In this case, one would need to replace the TIMESTAMP columns with DATE
columns. However, we would also need two different UPDATE and DELETE
triggers. For the UPDATE triggers, one is needed for the case where an
update is performed and the last update is at least 1 day old (UPDATE &
INSERT into the history as above), and another where an update is
performed on the same day that another update is made (just UPDATE the
history in this case, losing the older change). The second DELETE
trigger exists for the same purpose: in case a row is deleted which was
last updated on the same day (to handle this, just DELETE the last row
in the history table). The WHEN clause is used in the CREATE TRIGGER
statements to determine which UPDATE or DELETE trigger fires:

CREATE TABLE EMPLOYEES (
ID INTEGER GENERATED BY DEFAULT AS IDENTITY,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
)@

CREATE TABLE EMP_HISTORY (
EFFECTIVE DATE NOT NULL DEFAULT CURRENT DATE,
EXPIRY DATE NOT NULL DEFAULT '9999-12-31',
ID INTEGER NOT NULL,
PAYGRADE INTEGER NOT NULL,
DEPT CHAR(4) NOT NULL,
NAME VARCHAR(64) NOT NULL,
CONSTRAINT PK PRIMARY KEY (ID, EFFECTIVE),
CONSTRAINT PAY_CK CHECK (PAYGRADE BETWEEN 1 AND 10)
)@

CREATE UNIQUE INDEX EMP_HISTORY_IX
ON EMP_HISTORY (ID, EXPIRY)
INCLUDE (EFFECTIVE)
ALLOW REVERSE SCANS@

CREATE TRIGGER INSERTED_EMP
AFTER INSERT ON EMPLOYEES
REFERENCING NEW AS N
FOR EACH ROW
INSERT INTO EMP_HISTORY
(ID, PAYGRADE, DEPT, NAME) VALUES
(N.ID, N.PAYGRADE, N.DEPT, N.NAME)@

CREATE TRIGGER UPDATED_EMP1
AFTER UPDATE ON EMPLOYEES
REFERENCING OLD AS O NEW AS N
FOR EACH ROW
WHEN (
CURRENT DATE > (
SELECT EFFECTIVE
FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'
)
)
BEGIN ATOMIC
DECLARE D DATE DEFAULT CURRENT DATE;
UPDATE EMP_HISTORY SET
EXPIRY = D - 1 DAY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31';
INSERT INTO EMP_HISTORY
(EFFECTIVE, ID, PAYGRADE, DEPT, NAME) VALUES
(D, N.ID, N.PAYGRADE, N.DEPT, N.NAME);
END@

CREATE TRIGGER UPDATED_EMP2
AFTER UPDATE ON EMPLOYEES
REFERENCING OLD AS O NEW AS N
FOR EACH ROW
WHEN (
CURRENT DATE <= (
SELECT EFFECTIVE
FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'
)
)
UPDATE EMP_HISTORY SET
ID = N.ID,
PAYGRADE = N.PAYGRADE,
DEPT = N.DEPT,
NAME = N.NAME
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'@

CREATE TRIGGER DELETED_EMP1
AFTER DELETE ON EMPLOYEES
REFERENCING OLD AS O
FOR EACH ROW
WHEN (
CURRENT DATE > (
SELECT EFFECTIVE
FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'
)
)
UPDATE EMP_HISTORY SET EXPIRY = CURRENT DATE - 1 DAY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'@

CREATE TRIGGER DELETED_EMP2
AFTER DELETE ON EMPLOYEES
REFERENCING OLD AS O
FOR EACH ROW
WHEN (
CURRENT DATE <= (
SELECT EFFECTIVE
FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'
)
)
DELETE FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'@

Note the addition of an index on the ID and EXPIRY columns of the
EMP_HISTORY table. This is to improve the performance of the triggers'
WHEN clauses and bodies.

Another refinement we can make is to ensure the UPDATE triggers only
fire if something has *actually* changed (at the moment, an UPDATE that
changes nothing would still cause additional rows to be added to the
history table). For this, the WHEN clauses of the UPDATE trigger
definitions simply need expanding:

CREATE TRIGGER UPDATED_EMP1
AFTER UPDATE ON EMPLOYEES
REFERENCING OLD AS O NEW AS N
FOR EACH ROW
WHEN
(
(
O.ID <> N.ID OR
O.PAYGRADE <> N.PAYGRADE OR
O.DEPT <> N.DEPT OR
O.NAME <> N.NAME
)
AND
(
CURRENT DATE > (
SELECT EFFECTIVE
FROM EMP_HISTORY
WHERE ID = O.ID
AND EXPIRY = '9999-12-31'
)
)
)
...

Other possible refinements include: adding a column to the history
table to track the user that performed the change on the base table,
adding a view for the exploded view of the history demonstrated above
(in practice it's more useful to define a table function for this to
allow users to specify the start and end dates and increment step for
the exploded result).

DELTA VIEWS
=============

However, the final refinement I'll demonstrate is how to define a view
which can convert this structure into one that tracks "before" and
"after" states. This is using the TIMESTAMP structures above, but is
easily modified to fit the alternative DATE structure:

CREATE VIEW EMP_CHANGES AS
WITH
OLD_EMP_HISTORY AS (
SELECT *
FROM EMP_HISTORY
WHERE EXPIRY < '9999-12-31 23:59:59.999999'
)
SELECT
COALESCE(OLD.EXPIRY, NEW.EFFECTIVE) AS CHANGED,
OLD.ID AS OLD_ID,
NEW.ID AS NEW_ID,
OLD.PAYGRADE AS OLD_PAYGRADE,
NEW.PAYGRADE AS NEW_PAYGRADE,
OLD.DEPT AS OLD_DEPT,
NEW.DEPT AS NEW_DEPT,
VARCHAR(OLD.NAME,8) AS OLD_NAME,
VARCHAR(NEW.NAME,8) AS NEW_NAME
FROM
OLD_EMP_HISTORY OLD FULL OUTER JOIN EMP_HISTORY NEW
ON OLD.ID = NEW.ID
AND NEW.EFFECTIVE - 1 MICROSECOND BETWEEN OLD.EFFECTIVE AND
OLD.EXPIRY@

Now to test the new view (I've chopped the DEPT columns out below as
the result was too wide):

SELECT * FROM EMP_CHANGES WHERE COALESCE(OLD_ID, NEW_ID) = 3 ORDER BY
CHANGED;

CHANGED OLD_ID NEW_ID OLD_PAY NEW_PAY OLD_NAME NEW_NAME
------------------- ------ ------ ------- ------- -------- --------
2007-07-06-23.50.58 - 3 - 5 - Harry
2007-07-06-23.51.02 3 3 5 4 Harry Harry
2007-07-06-23.52.38 3 - 4 - Harry -
2007-07-06-23.55.14 - 3 - 5 - Harry

As you can see, the three history rows in the EMP_HISTORY table have
expanded into four rows in the EMP_CHANGED view, explicitly reflecting
the deletion and reinsertion of Harry in the base table (the deletion
and reinsertion is merely implied by a gap between an expiry time and
an effective time in the history table itself).

Also note the slightly unusual WHERE clause in the query against the
view. In order to ensure we get all records pertaining to Harry, we
need to COALESCE() the OLD_ID and NEW_ID columns.

Okay, I think that about wraps up how to track the history of a table
using triggers - if you have any questions about anything above, let me
know.

HTH,

Dave.

--