Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Checking data consistency in the context of from-to-dimensions

3 views
Skip to first unread message

Hans Mayr

unread,
Jun 28, 2009, 11:51:38 AM6/28/09
to
Hello,

I was analyzing the problem of logic in connection with ranges like a
price for a certain article which is valid from a certain date till a
certain date. Especially in cases where there is not only one from-to-
dimension but several I found some results quite interesting to me. I
would like to summarize them here to share my results and to see if I
get any feedback from experts, any easier solution than the ones I
found.

The following examples are based on MySQL 5.

A) The simple case
If there is only one from-to-dimension like for the price of an
article you do not have to give any Valid_To dates at all.

CREATE TABLE `pricelist`
(
`Article_ID` INT(11) NOT NULL DEFAULT
'0' ,
`Price` DOUBLE NOT NULL DEFAULT
'0' ,
`Valid_From` DATE NOT NULL DEFAULT
'0000-00-00',
PRIMARY KEY (`Article_ID`,`Valid_From`)
)
TYPE=MyISAM;


INSERT INTO `pricelist` VALUES (123, 15, '2007-10-01');
INSERT INTO `pricelist` VALUES (123, 16, '2008-01-01');
INSERT INTO `pricelist` VALUES (123, 17, '2008-07-01');


To find the valid price for an article at a specific date you will
have to use some logic:

SELECT Price
FROM pricelist
WHERE Article_ID = 123
AND Valid_From <= '2008-02-01'
ORDER BY Valid_From DESC LIMIT 0,1;

If you add a Valid_To column this will be much easier:

CREATE TABLE `pricelist2`
(
`Article_ID` INT(11) NOT NULL DEFAULT
'0' ,
`Price` DOUBLE NOT NULL DEFAULT
'0' ,
`Valid_From` DATE NOT NULL DEFAULT
'0000-00-00',
`Valid_To` DATE NOT NULL DEFAULT
'0000-00-00',
PRIMARY KEY (`Article_ID`,`Valid_From`)
)
TYPE=MyISAM;

INSERT INTO `pricelist2` VALUES (123, 15, '2007-10-01', '2007-12-31');
INSERT INTO `pricelist2` VALUES (123, 16, '2008-01-01', '2008-06-31');
INSERT INTO `pricelist2` VALUES (123, 17, '2008-07-01', '9999-12-31');

SELECT Price
FROM pricelist2
WHERE '2008-02-01' BETWEEN Valid_From AND Valid_To;

But the disadvantage of this concept is, that you will have to make
sure that the valid ranges do not overlap and, if you want to have a
price for all times, that there is no date which is not within at
least one Valid_From Valid_To period. As validation of this is not
different to the more complex case below, I will not treat it here.

B) Multidimensional Rule Sets

Imagine that a sales agent gets a bonus payment depending on two or
more rules. As more than two rules are not more complex than two rules
I will assume two rules e.g. volume and customer satisfaction.

CREATE TABLE `bonusrules`
(
`Agent_ID` INT(11) NOT NULL DEFAULT
'0' ,
`Bonus` DOUBLE NOT NULL DEFAULT
'0' ,
`Min_Volume` DOUBLE NOT NULL DEFAULT
'0' ,
`Min_Satisfaction` DOUBLE NOT NULL DEFAULT
'0',
PRIMARY KEY
(`Agent_ID`,`Min_Volume`,`Min_Satisfaction`)
)
TYPE=MyISAM;

INSERT INTO `bonusrules` VALUES (123, 1000, 10000, 0.50);
INSERT INTO `bonusrules` VALUES (123, 2000, 20000, 0.60);
INSERT INTO `bonusrules` VALUES (123, 3000, 15000, 0.75);

Which bonus has to be paid for a volume of 30000 and a satisfaction
rate of 0.80? This is not obvious, technically speaking, as the 2000
bonus payment is more demanding in terms of volume and the 3000 bonus
payment is more difficult to reach in terms of customer satisfaction.
Therefore we need a new logic to select the bonus like

SELECT MAX(Bonus)
FROM bonusrules
WHERE Agent_ID = 123
AND Min_Volume <= 30000
AND Min_Satisfaction <= 0.80;


In other words there is no need for upper bounds Max_Volume and
Max_Satisfaction if and only if we have additional logic like the
maximum principle aplied here. You should however check that your
rules are consistent, that more challenging conditions always lead to
a higher bonus:

SELECT a_rules.Agent_ID ,
a_rules.Min_Volume ,
a_rules.Min_Satisfaction,
a_rules.Bonus ,
b_rules.Min_Volume ,
b_rules.Min_Satisfaction,
b_rules.Bonus
FROM bonusrules a_rules,
bonusrules b_rules
WHERE a_rules.Agent_ID = b_rules.Agent_ID
/*Rules are not identical*/
AND NOT (
a_rules.Min_Volume = b_rules.Min_Volume
AND a_rules.Min_Satisfaction = b_rules.Min_Satisfaction
)
/*A is easier to reach as B*/
AND a_rules.Min_Volume <= b_rules.Min_Volume
AND a_rules.Min_Satisfaction <= b_rules.Min_Satisfaction
/*Bonus A is higher than Bonus B*/
AND a_rules.Bonus >= b_rules.Bonus;

So let's look at an example where such an maximum principle can not be
applied. Such examples are not easy to find but they exist. Imagine a
bank lending money. The two parameters determining the interest rate
are credit duration and credit amount. Neither increasing duration nor
increasing amount will necessarily lead to higher interest rates
(compare market interest rates during autumn 2008).

CREATE TABLE `interestrates`
(
`Rate` DOUBLE NOT NULL DEFAULT '0' ,
`Min_Amount` DOUBLE NOT NULL DEFAULT '0' ,
`Max_Amount` DOUBLE NOT NULL DEFAULT '0' ,
`Min_Duration` DOUBLE NOT NULL DEFAULT '0',
`Max_Duration` DOUBLE NOT NULL DEFAULT '0',
PRIMARY KEY (`Min_Amount`,`Min_Duration`)
)
TYPE=MyISAM;

Through a trigger or (if available) a check constraint you have to
make sure that min_amount < max_amount and min_duration <
max_maxduration for all entries.

INSERT INTO `interestrates` VALUES (0.05, 0, 10000, 1, 12);
INSERT INTO `interestrates` VALUES (0.06, 0, 10000, 12, 24);
INSERT INTO `interestrates` VALUES (0.055, 10000, 20000, 1, 24);
INSERT INTO `interestrates` VALUES (0.053, 0, 20000, 24, 36);

For convenience we assume that the lower boundaries are not part of
the interval and that the bank does not lend more than 20000 nor for
longer durations than 36 months.

Now we are facing two problems:

* Are there overlapping (two dimensional) intervals? Is there more
than one price for a given amount / duration?
* Did we forget to define a price for a certain duration and amount?

Finding overlaps is easy if one understands what is going on. Imagine
our amount-duration ranges as rectangles in the plane. Two rectangles
do not overlap if rectangle A is either left, right, up or down from
rectangle B. To check for overlaps run the following query:

SELECT a_rates. * ,
b_rates. *
FROM interestrates a_rates,
interestrates b_rates
WHERE NOT (
/*a right of b*/
a_rates.Min_Amount >= b_rates.Max_Amount
OR
/*a left of b*/
a_rates.Max_Amount <= b_rates.Min_Amount
OR
/*a over b*/
a_rates.Min_Duration >= b_rates.Max_Duration
OR
/*a under b*/
a_rates.Max_Duration <= b_rates.Min_Duration
)
AND
/*a <> b*/
(
a_rates.Min_Amount, a_rates.Min_Duration
)
<> (b_rates.Min_Amount, b_rates.Min_Duration );

Now we have to check if we forgot to define a rate for a certain
amount / duration, which turns out to be much more difficult.

If we already know that there are no overlaps, we can perform a very
easy check:

SELECT SUM((Max_Amount - Min_Amount)*(Max_Duration-Min_Duration))-
(20000-0)*(36-1)
FROM interestrates;

We compare the sum of the area of the rectangles to the total
rectangle of possible amounts and durations. If the result is 0 we are
covered.

However in a complex scenario with many rectangles or even many more
dimensions a result <> 0 will not help us finding the hole in our
pricing rules. So let's look for better solution:

It is obvious that a hole in our definition set must have at least one
corner of our definition rectangles on its boundaries or probably that
each corner of a hole is also the corner of at least one of the
definition rectangles. So it is enough to test, if there is a hole
next to one of the corners of the definition rectangles.

If the corner A1 of rectangle A lies on the boundaries of another
rectangle B it can either lie on a corner B1 of B or on the edge
between the boundaries B1 and B2 of B. If we look at a 3-dimensional
example it could also lie on a 2 dimensional area between B1, B2, B3
and B4 and so on for n-dimensional examples. To check if A1 is not
neighbouring a hole in the definition set we must make sure that all
four (or 2^n in the n-dimensional case) directions around A1 are
covered by definition rectangles. Rectangle A will cover 1 direction.
If A1 = B1, another corner, than rectangle B will cover one more
direction. If A1 lies on the edge between B1 and B2, rectangle B will
cover two directions. Generally speaking if A1 lies on a 2^m
dimensional boundary of B then B will cover m+1 directions.

As in our example we want to cover all durations from 1 to 36 months
and amounts from 0 to 20000 only we have to treat those limits
separately.

Let's do so:

SELECT
test_corners.pk_min_amount

,

test_corners.pk_min_duration

,

test_corners.corner_amount

,

test_corners.corner_duration

,
COUNT(DISTINCT identical_corners.corner_amount,
identical_corners.corner_duration)

AS
nbr_identical_corners ,
COUNT(DISTINCT identical_edges.edge_start_amount,
identical_edges.edge_end_amount,

identical_edges.edge_start_duration,
identical_edges.edge_end_duration) AS nbr_identical_edges ,
COUNT(DISTINCT definition_corners.boundary_amount,
definition_corners.boundary_duration)

AS
nbr_definition_corners,
COUNT(DISTINCT definition_edges.boundary_start_amount,
definition_edges.boundary_end_amount,

definition_edges.boundary_start_duration,
definition_edges.boundary_end_duration) AS nbr_definition_edges
FROM
( SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates
) test_corners
LEFT JOIN
/*Corners*/
( SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates
) identical_corners
ON (
test_corners.pk_min_amount,
test_corners.pk_min_duration
)
<> (identical_corners.pk_min_amount,
identical_corners.pk_min_duration)
AND (
test_corners.corner_amount,
test_corners.corner_duration
)
= (identical_corners.corner_amount,
identical_corners.corner_duration)
LEFT JOIN
/*Edges*/
(
/*"Left"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
min_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates

UNION

/*"Right"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
max_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates

UNION

/*"Top"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
min_duration AS edge_end_duration
FROM interestrates

UNION

/*"Bottom"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
max_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates
) identical_edges
ON (
test_corners.pk_min_amount,
test_corners.pk_min_duration
)
<> (identical_edges.pk_min_amount,
identical_edges.pk_min_duration)
AND (
/*Test Corner on "vertical" edge*/
(
identical_edges.edge_start_amount
= identical_edges.edge_start_amount
AND test_corners.corner_amount
= identical_edges.edge_start_amount
AND test_corners.corner_duration
> identical_edges.edge_start_duration
AND test_corners.corner_duration
< identical_edges.edge_end_duration
)
OR
/*Test Corner on "horizontal" edge*/
(

identical_edges.edge_start_duration =
identical_edges.edge_start_duration
AND
test_corners.corner_duration =
identical_edges.edge_start_duration
AND
test_corners.corner_amount >
identical_edges.edge_start_amount
AND
test_corners.corner_amount < identical_edges.edge_end_amount
)
)
LEFT JOIN
/*Definition Corners*/
( SELECT 0 AS boundary_amount,
1 AS boundary_duration
FROM dual

UNION

SELECT 0 AS boundary_amount,
36 AS boundary_duration
FROM dual

UNION

SELECT 20000 AS boundary_amount,
1 AS boundary_duration
FROM dual

UNION

SELECT 20000 AS boundary_amount,
36 AS boundary_duration
FROM dual
) definition_corners
ON test_corners.corner_amount =
definition_corners.boundary_amount
AND test_corners.corner_duration=
definition_corners.boundary_duration
LEFT JOIN
/*Definition Edges*/
(
/*Left*/
SELECT 0 AS boundary_start_amount ,
1 AS boundary_start_duration,
0 AS boundary_end_amount ,
36 AS boundary_end_duration
FROM dual

UNION

/*Right*/
SELECT 20000 AS boundary_start_amount ,
1 AS boundary_start_duration,
20000 AS boundary_end_amount ,
36 AS boundary_end_duration
FROM dual

UNION

/*Top*/
SELECT 0 AS boundary_start_amount ,
36 AS boundary_start_duration,
20000 AS boundary_end_amount ,
36 AS boundary_start_duration
FROM dual

UNION

/*Bottom*/
SELECT 0 AS boundary_start_amount ,
1 AS boundary_start_duration,
20000 AS boundary_end_amount ,
1 AS boundary_end_duration
FROM dual
) definition_edges
ON
/*Test corner on "vertical" edge*/
(
definition_edges.boundary_start_amount =
definition_edges.boundary_end_amount
AND test_corners.corner_amount =
definition_edges.boundary_start_amount
AND test_corners.corner_duration >
definition_edges.boundary_start_duration
AND test_corners.corner_duration <
definition_edges.boundary_end_duration
)
OR
/*Test corner on "horizontal" edge*/
(
definition_edges.boundary_start_duration =
definition_edges.boundary_end_duration
AND test_corners.corner_duration =
definition_edges.boundary_start_duration
AND test_corners.corner_amount >
definition_edges.boundary_start_amount
AND test_corners.corner_amount <
definition_edges.boundary_end_amount
)
GROUP BY test_corners.pk_min_amount ,
test_corners.pk_min_duration,
test_corners.corner_amount ,
test_corners.corner_duration
ORDER BY test_corners.pk_min_amount ,
test_corners.pk_min_duration,
test_corners.corner_amount ,
test_corners.corner_duration;

Almost there! Just another wrapper and adding the final check if there
is a hole in the definitions or not:

SELECT pk_min_amount ,
pk_min_duration ,
nbr_identical_corners ,
nbr_identical_edges ,
nbr_definition_corners,
nbr_definition_edges ,
1 + nbr_identical_corners + 2*nbr_identical_edges
+3*nbr_definition_corners+2*nbr_definition_edges

check_sum
FROM
( SELECT
test_corners.pk_min_amount

,

test_corners.pk_min_duration

,

test_corners.corner_amount

,

test_corners.corner_duration

,
COUNT(DISTINCT identical_corners.corner_amount,
identical_corners.corner_duration)


AS nbr_identical_corners ,
COUNT(DISTINCT identical_edges.edge_start_amount,
identical_edges.edge_end_amount,

identical_edges.edge_start_duration,
identical_edges.edge_end_duration) AS

nbr_identical_edges ,
COUNT(DISTINCT definition_corners.boundary_amount,
definition_corners.boundary_duration)


AS nbr_definition_corners,
COUNT(DISTINCT definition_edges.boundary_start_amount,
definition_edges.boundary_end_amount,

definition_edges.boundary_start_duration,
definition_edges.boundary_end_duration) AS nbr_definition_edges
FROM
( SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates
) test_corners
LEFT JOIN
/*Corners*/
( SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
min_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
min_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates

UNION

SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration,
max_amount AS corner_amount ,
max_duration AS corner_duration
FROM interestrates
) identical_corners
ON (
test_corners.pk_min_amount,
test_corners.pk_min_duration
)
<> (identical_corners.pk_min_amount,
identical_corners.pk_min_duration)
AND (
test_corners.corner_amount,
test_corners.corner_duration
)
= (identical_corners.corner_amount,
identical_corners.corner_duration)
LEFT JOIN
/*Edges*/
(
/*"Left"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
min_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates

UNION

/*"Right"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
max_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates

UNION

/*"Top"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
min_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
min_duration AS edge_end_duration
FROM interestrates

UNION

/*"Bottom"*/
SELECT min_amount AS pk_min_amount ,
min_duration AS pk_min_duration ,
min_amount AS edge_start_amount ,
max_duration AS edge_start_duration,
max_amount AS edge_end_amount ,
max_duration AS edge_end_duration
FROM interestrates
) identical_edges
ON (
test_corners.pk_min_amount,
test_corners.pk_min_duration
)
<> (identical_edges.pk_min_amount,
identical_edges.pk_min_duration)
AND (
/*Test Corner on "vertical" edge*/
(

identical_edges.edge_start_amount = identical_edges.edge_start_amount
AND
test_corners.corner_amount = identical_edges.edge_start_amount
AND
test_corners.corner_duration >

identical_edges.edge_start_duration
AND
test_corners.corner_duration < identical_edges.edge_end_duration
)
OR
/*Test Corner on "horizontal" edge*/
(

identical_edges.edge_start_duration =

identical_edges.edge_start_duration
AND
test_corners.corner_duration =

identical_edges.edge_start_duration
AND
test_corners.corner_amount >

identical_edges.edge_start_amount
AND
test_corners.corner_amount < identical_edges.edge_end_amount
)
)
LEFT JOIN
/*Definition Corners*/
( SELECT 0 AS boundary_amount,
1 AS boundary_duration
FROM dual

UNION

SELECT 0 AS boundary_amount,
36 AS boundary_duration
FROM dual

UNION

SELECT 20000 AS boundary_amount,
1 AS boundary_duration
FROM dual

UNION

SELECT 20000 AS boundary_amount,
36 AS boundary_duration
FROM dual
) definition_corners
ON test_corners.corner_amount =
definition_corners.boundary_amount
AND test_corners.corner_duration=
definition_corners.boundary_duration
LEFT JOIN
/*Definition Edges*/
(
/*Left*/
SELECT 0 AS boundary_start_amount ,
1 AS boundary_start_duration,
0 AS boundary_end_amount ,
36 AS boundary_end_duration
FROM dual

UNION

/*Right*/
SELECT 20000 AS boundary_start_amount ,
1 AS boundary_start_duration,
20000 AS boundary_end_amount ,
36 AS boundary_end_duration
FROM dual

UNION

/*Top*/
SELECT 0 AS boundary_start_amount ,
36 AS boundary_start_duration,
20000 AS boundary_end_amount ,
36 AS boundary_start_duration
FROM dual

UNION

/*Bottom*/
SELECT 0 AS boundary_start_amount ,
1 AS boundary_start_duration,
20000 AS boundary_end_amount ,
1 AS boundary_end_duration
FROM dual
) definition_edges
ON
/*Test corner on "vertical" edge*/
(

definition_edges.boundary_start_amount =
definition_edges.boundary_end_amount
AND
test_corners.corner_amount =
definition_edges.boundary_start_amount
AND
test_corners.corner_duration >

definition_edges.boundary_start_duration
AND
test_corners.corner_duration <
definition_edges.boundary_end_duration
)
OR
/*Test corner on "horizontal" edge*/
(

definition_edges.boundary_start_duration =

definition_edges.boundary_end_duration
AND
test_corners.corner_duration =

definition_edges.boundary_start_duration
AND
test_corners.corner_amount >

definition_edges.boundary_start_amount
AND
test_corners.corner_amount <
definition_edges.boundary_end_amount
)
GROUP BY test_corners.pk_min_amount ,
test_corners.pk_min_duration,
test_corners.corner_amount ,
test_corners.corner_duration
ORDER BY test_corners.pk_min_amount ,
test_corners.pk_min_duration,
test_corners.corner_amount ,
test_corners.corner_duration
) test_result
WHERE 1 + nbr_identical_corners + 2*nbr_identical_edges
+3*nbr_definition_corners+2*nbr_definition_edges <> 4 ;

Done!

My only problem is: This seems pretty complicated for something I
would consider a standard task of checking data consistency. So does
anybody have any comments or know an easier way to perform such a
test?

I am looking forward to your answers.

Best,

Hans

Cimode

unread,
Jun 28, 2009, 1:20:53 PM6/28/09
to

Google up Lorentzos for a good reference on Temporal data...

Bob Badour

unread,
Jun 28, 2009, 3:53:26 PM6/28/09
to
Hans Mayr wrote:
> Hello,
>
> I was analyzing the problem of logic in connection with ranges like a
> price for a certain article which is valid from a certain date till a
> certain date. Especially in cases where there is not only one from-to-
> dimension but several I found some results quite interesting to me. I
> would like to summarize them here to share my results and to see if I
> get any feedback from experts, any easier solution than the ones I
> found.

<snip>

> My only problem is: This seems pretty complicated for something I
> would consider a standard task of checking data consistency. So does
> anybody have any comments or know an easier way to perform such a
> test?
>
> I am looking forward to your answers.
>
> Best,
>
> Hans

I found what you wrote too long to bother reading entirely. The
Lorentzos, Date and Darwen book on Temporal Databases provides a
theoretical framework for dealing with intervals and interval type
generators. Snodgrass has an earlier book length treatment of the topic
in SQL.

Philipp Post

unread,
Jun 30, 2009, 5:17:50 AM6/30/09
to
>Snodgrass has an earlier book length treatment of the topic in SQL.<

This book is available at the site of university of Arizona for
download

http://www.cs.arizona.edu/~rts/tdbbook.pdf

Other books also cover temporal data in some chapters, such as Joe
Celko's SQL for Smarties.

The general patter I know off is having <something>_start_date and
<something>_end_date. NULL in the end_date means current. That makes
it easy to get the current data, for what you could build a view.

Regarding data integrity the following comes to mind (certainly not a
complete list):
- CHECK(start_date < end_date)
- regarding overlaps you could say, that the new start_date or
end_date is not allowed to be between the existing start_date and
end_date pairs applicable.
- just one NULL in the end_date by article to be allowed. COUNT can be
helpfull there. COUNT(end_date) gives you the count of all NOT NULL
entries, COUNT(*) applied to all columns, includes those with NULL in
the end_date
- triggers, views with check option or assertions (most products do
not have that) could be used to enforce your rules.

brgds

Philipp Post

Bob Badour

unread,
Jun 30, 2009, 12:47:47 PM6/30/09
to
Philipp Post wrote:

>>Snodgrass has an earlier book length treatment of the topic in SQL.<
>
> This book is available at the site of university of Arizona for
> download
>
> http://www.cs.arizona.edu/~rts/tdbbook.pdf
>
> Other books also cover temporal data in some chapters, such as Joe
> Celko's SQL for Smarties.

Except that Joe is an idiot. One reads his nonsense at one's own risk.


> The general patter I know off is having <something>_start_date and
> <something>_end_date. NULL in the end_date means current. That makes
> it easy to get the current data, for what you could build a view.

NULL anywhere is a really dumb idea. Why not check out the theoretical
work that has already been done instead of making uninformed observations?


> Regarding data integrity the following comes to mind (certainly not a
> complete list):
> - CHECK(start_date < end_date)
> - regarding overlaps you could say, that the new start_date or
> end_date is not allowed to be between the existing start_date and
> end_date pairs applicable.

But that doesn't even begin to deal with the declarative constraints for
the normal requirements of full coverage, adjacency and non-overlap.

Hans Mayr

unread,
Jun 30, 2009, 4:36:39 PM6/30/09
to
Hello,

Thanks to everybody.

For the recommendation to read Lorentzos, Date and Darwen's book:
Another reader has sent me an email, strongly recommending to read
Date's "Introduction to Database Systems" first. I will look at
Snodgrass’ book, but from the title I assume that it does not cover
exactly the multi-dimensional cases I described.

Philipp: Sorry, but I think the problems I wrote about are more
complex than the one you offer a solution for.

To be honest: It was fun to think about my problem and come up with
the solution I stated in my original message. However I do not have
the time to dig into database theory though I'd love to do so. What I
need now is a solution, best a hint if there is one "standard
solution" for the type of problem I described. So maybe I posted to
the wrong group.

Thanks anyway.

Best,

Hans

Philipp Post

unread,
Jul 1, 2009, 8:39:01 AM7/1/09
to
> NULL anywhere is a really dumb idea. Why not check out the theoretical work that has already been done instead of making uninformed observations? <

Basically I fully agree with your comments on NULL. However due to the
fact that SQL supports CURRENT_DATE / CURRENT_TIMESTAMP in queries but
not stored in a table, which would constantly change, you would most
likely end up using a date long in the future to represent current
valid data. But which one would you pick? 9999-12-31 or 3000-01-01 or
2999-31-12 or whatever else? Further this is no logically correct
representation of "now" resp. "today" either. Therefore I consider it
a valid approach to use in this specific case NULL for the current
entries which do not have an end_date specified yet and build a VIEW
on top of it using COALESCE(end_date, CURRENT_DATE) in order to always
have the right values in it.

> But that doesn't even begin to deal with the declarative constraints for the normal requirements of full coverage, adjacency and non-overlap. <

Fair enough. Mea culpa.

brgds

Philipp Post

Kevin Kirkpatrick

unread,
Jul 1, 2009, 12:54:01 PM7/1/09
to

Hi Hans - I may not have thought this through enough, but it seems the
problem domain is such that the min/max values for the dimensions will
usually be limitted to a set of discrete values (in your interestrates
example, amount borders would be chosen from $1000 intervals under
$20000 and months would be whole numbers under 36)

With that assumption, constraints can be handled as follows:

Let D1, D2, ... Dn represent discrete sets of allowable boundary
values (in your interestrate example, D1={<d1=0>, <d1=1000>, ...,
<d1=20000>} and D2 = {<d2=1>, <d2=2>, <d2=3>, ...<d2=36>} )

Let Dn_INT be the interval between discrete values in Dn (1000 for
D1_INT in your example)

Let T = {key, min_d1, max_d1, min_d2, max_d2, ... min_dn, max dn}

Then you can check

SELECT D1.d1, D2.d2, ... Dn.dn , COUNT (T.key) AS matches
FROM
D1 JOIN D2 JOIN ... DN
left outer join T
ON (
D1.d1 >= T.min_d1 AND D1.d1+D1_INT <= T.max_d1 AND
D2.d2 >= T.min_d2 AND D2.d2+D2_INT <= T.max_d2 AND
...
Dn.dn >= T.min_dn AND Dn.dn+Dn_INT <= T.max_dn
)
GROUP BY D1.d1, D2.d2, ... Dn.dn
HAVING COUNT (T.key) <> 1

This query will return all overlaps (matches > 1) and all gaps
(matches=0).

Of course, this approach only works if "D1 JOIN D2 JOIN ... DN" isn't
computationally infeasible due to large cardinalities of the
underlying sets.


0 new messages