Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

SQL: Working with huge tables of chronological data

9 views
Skip to first unread message

John

unread,
Apr 5, 2007, 10:27:23 AM4/5/07
to
Hi,

I'm trying to figure out an efficient way to search for the non
existence of events in chronological data with SQL. The goal (detailed
below) seems pretty simple but so far it looks like it's tricky to do
with Oracle. Here's my problem:

I'm working with 2 simple but huge tables each recording a different
kind of event associated with a timestamp. For instance:

Table A
(03:50pm, A1)
(03:55pm, A2)
(03:58pm, A3)

Table B
(03:51pm, B1)
(04:00pm, B2)

I'm looking for all the chronological sequences (Ax, Ay) where no B
event is present between Ax and Ay. In this example, the result would
be (A2, A3).

I've been searching actively for an efficient solution for this
problem and I couldn't find any fast enough. Do you have any idea?

Thanks a lot,

John

Ana C. Dent

unread,
Apr 5, 2007, 10:51:28 AM4/5/07
to
"John" <acide.as...@gmail.com> wrote in news:1175783243.167337.195580
@n59g2000hsh.googlegroups.com:

> Hi,
>
> I'm trying to figure out an efficient way to search for the non
> existence of events in chronological data with SQL. The goal (detailed
> below) seems pretty simple but so far it looks like it's tricky to do
> with Oracle. Here's my problem:
>
> I'm working with 2 simple but huge tables each recording a different
> kind of event associated with a timestamp. For instance:
>
> Table A
> (03:50pm, A1)
> (03:55pm, A2)
> (03:58pm, A3)
>
> Table B
> (03:51pm, B1)
> (04:00pm, B2)
>
> I'm looking for all the chronological sequences (Ax, Ay) where no B
> event is present between Ax and Ay. In this example, the result would

I do not understand the logic that would lead the answer above;
since none of the timestamps in Table A match any timestamp in Table B.

> be (A2, A3).
>
> I've been searching actively for an efficient solution for this
> problem and I couldn't find any fast enough. Do you have any idea?

How fast is fast enough?
How do we know what you tried & deemed unacceptable?

DA Morgan

unread,
Apr 5, 2007, 10:59:25 AM4/5/07
to

To me your example and explantion make no sense.

What Ax and Ay? Please try for more clarity with your explanation.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

John

unread,
Apr 5, 2007, 11:29:18 AM4/5/07
to
Thanks for your answers, here is a clarification:

- Ax and Ay are any different A events (A1, A2...). But since I'm
looking for chronological sequences, Ay has to happen after Ax.

- If I merge the two tables A and B while respecting the chronology,
this would lead to the following. I've also add an additionnal A4
event here to clarify.

Table A
(03:50pm, A1)
(03:55pm, A2)
(03:58pm, A3)

(03:59pm, A4)

Table B
(03:51pm, B1)
(04:00pm, B2)

Chronology
(03:50pm, A1)
(03:51pm, B1)


(03:55pm, A2)
(03:58pm, A3)

(03:59pm, A4)
(04:00pm, B2)

I'm looking for all the sequences of events A in the chronology with
no B event in the middle. Here the results would be:
(A2, A3) ; (A2, A4) and (A3, A4)

Thanks!

John

> damor...@x.washington.edu


> (replace x with u to respond)

> Puget Sound Oracle Users Groupwww.psoug.org- Hide quoted text -
>
> - Show quoted text -


John

unread,
Apr 5, 2007, 12:02:10 PM4/5/07
to
Thanks for your answer,

Here are the technical details and the query I've been using so far.

TableA is ~100 millions row and contains (timestamp, evtA)
TableB is ~30 millions row and contains (timestamp, evtB)

The following query took ~60h (on a private but quite slow server) to
compute. ~1h is what I'm aiming to.

select TA1_B.evtA, TA2.evtA
from
(
select TA1.evtA, TA1.timestamp timeA1, TB.evtB, min(TB.timestamp)
min_timeB
from tableA TA1 left outer join tableB TB on (TA1.timestamp <
TB.timestamp)
group by TA1.evtA, TA1.timestamp, TB.evtB
) TA1_B,
tableA TA2
where
TA1_B.timeA1 < TA2.timestamp
and (TA2.timestamp < TA1_B.min_timeB or TA1_B.min_timeB is null)
and TA1_B.evtA <> TA2.evtA;

Thanks!

John

On Apr 5, 10:51 am, "Ana C. Dent" <anaced...@hotmail.com> wrote:
> "John" <acide.ascorbi...@gmail.com> wrote in news:1175783243.167337.195580

> How do we know what you tried & deemed unacceptable?- Hide quoted text -

DA Morgan

unread,
Apr 5, 2007, 2:48:08 PM4/5/07
to

Sorry but your answer begets more questions:

Is this the real problem or a simplification?
Is this something that will be run once or repeatedly?
Is it possible for the same time to be in A and B?
Is it possible to have a B before an A beginning the sequence?
Is it possible for there to be multiple Bs between As?

And I am not at all surprised it is taking a lot of time.


--
Daniel A. Morgan
University of Washington

damo...@x.washington.edu

John

unread,
Apr 5, 2007, 3:50:37 PM4/5/07
to
> Is this the real problem or a simplification?
It's a simplification but not that much. The real problem involves
user_ids but this part can be skipped here.

> Is this something that will be run once or repeatedly?

Only once.

> Is it possible for the same time to be in A and B?

No, A and B are completely different data.

> Is it possible to have a B before an A beginning the sequence?
> Is it possible for there to be multiple Bs between As?

Yes everything is possible, A events and B events happen
independently.

Thanks for being interested in my problem!

John

Jonathan Lewis

unread,
Apr 5, 2007, 3:50:44 PM4/5/07
to

"John" <acide.as...@gmail.com> wrote in message
news:1175783243.1...@n59g2000hsh.googlegroups.com...


Interesting problem.

The data size and available resources are likely
to make a big difference when testing solutions
for feasibility.

Here's a possibility, start with:

select 'A' flag, event, timestamp from tableA
union all
select 'B', event, timestamp from tableB

Option a)
Order by timestamp. Open a pl/sql cursor
on the result set and walk the data one row
at a time, reporting rows when the current
and previous rows are 'A' rows.


Option b)
Use the analytic lag(,1) function

select
flag, event, prior_event, timestamp, prior_timestamp
from (
select
flag, lag(flag,1) over (order by timestamp) prior_flag,
event, lag(event,1) over (order by timestamp) prior_event
timestamp, lag(timestamp,1) over (order by timestamp) prior_timestamp
from
(the union all query)
where
flag = prior_flag
and flag = 'A' -- if you just want A's without a B in between.
;

I may have some errors in the analtyic code, but I hope
there's enough there to give you the right idea.

Either option will lead to a massive sort operation on
all your data.

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


Charles Hooper

unread,
Apr 5, 2007, 5:07:36 PM4/5/07
to

I would be inclined to not handle this by not just using SQL. You
could potentially have a nearly full Cartesian join on the first table
to itself, for example:
SELECT
A1.V1,
A2.V1
FROM
T1 A1,
T1 A2
WHERE
A1.V1<A2.V1;

A quick setup:
CREATE TABLE T1 (V1 DATE NOT NULL, V2 VARCHAR2(10));
CREATE TABLE T2 (V1 DATE NOT NULL, V2 VARCHAR2(10));

CREATE INDEX T1_IND1 ON T1(V1);
CREATE INDEX T2_IND1 ON T2(V1);

INSERT INTO
T1
SELECT
TRUNC(TRUNC(SYSDATE) + (ROWNUM*2.5/24/60),'MI'),
TO_CHAR(ROWNUM)
FROM
DUAL
CONNECT BY
LEVEL<=1000;

COMMIT;

INSERT INTO
T2
SELECT
TRUNC(TRUNC(SYSDATE) + (ROWNUM*9.415/24/60),'MI'),
TO_CHAR(ROWNUM)
FROM
DUAL
CONNECT BY
LEVEL<=300;

COMMIT;

We now have two tables, T1 and T2, that correspond to your table A and
B, respectively. If we perform a full outer join between these two
table, we obtain all time values in the two tables with no duplicates
(9i+ syntax) (note that TO_CHAR is used to limit the width of the
columns for display purposes):
SELECT
TO_CHAR(T1.V1,'HH24:MI') T1_V1,
T1.V2 T1_V2,
TO_CHAR(T2.V1,'HH24:MI') T2_V1,
T2.V2 T2_V2
FROM
T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY
NVL(T1.V1,T2.V1);

T1_V1 T1_V2 T2_V1 T2_V2
===== ========== ===== ==========
00:02 1
00:05 2
00:07 3
00:09 1
00:10 4
00:12 5
00:15 6
00:17 7
00:18 2
00:20 8
00:22 9
00:25 10
00:27 11
00:28 3
00:30 12
00:32 13
00:35 14
00:37 15 00:37 4
00:40 16
00:42 17
00:45 18
00:47 19 00:47 5
00:50 20
00:52 21
00:55 22
00:56 6
00:57 23
01:00 24
01:02 25
01:05 26 01:05 7
01:07 27
01:10 28
01:12 29
01:15 30 01:15 8
01:17 31
01:20 32
01:22 33
01:24 9
01:25 34
01:27 35
01:30 36
01:32 37
01:34 10

Extending the above to give more detail:
SELECT
TO_CHAR(NVL(T1.V1,T2.V1),'HH24:MI') TIME_DATE,
DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,
TO_CHAR(T1.V1,'HH24:MI') T1_V1,
T1.V2 T1_V2,
TO_CHAR(T2.V1,'HH24:MI') T2_V1,
T2.V2 T2_V2
FROM
T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY
NVL(T1.V1,T2.V1);
TIME_DATE TIME_SLOT T1_V1 T1_V2 T2_V1 T2_V2
========= ========= ===== ========== ===== ==========
00:02 A 00:02 1
00:05 A 00:05 2
00:07 A 00:07 3
00:09 B 00:09 1
00:10 A 00:10 4
00:12 A 00:12 5
00:15 A 00:15 6
00:17 A 00:17 7
00:18 B 00:18 2
00:20 A 00:20 8
00:22 A 00:22 9
00:25 A 00:25 10
00:27 A 00:27 11
00:28 B 00:28 3
00:30 A 00:30 12
00:32 A 00:32 13
00:35 A 00:35 14
00:37 AB 00:37 15 00:37 4
00:40 A 00:40 16
00:42 A 00:42 17
00:45 A 00:45 18
00:47 AB 00:47 19 00:47 5
00:50 A 00:50 20
00:52 A 00:52 21
00:55 A 00:55 22
00:56 B 00:56 6

What can we do with the above to avoid the Cartesian join as much as
possible? We can use LEAD to peek at the next set of values:
SELECT
TO_CHAR(TIME_DATE,'HH24:MI') TIME_DATE,
TO_CHAR(LEAD(TIME_DATE,1) OVER (ORDER BY TIME_DATE),'HH24:MI')
NEXT_TIME_DATE,
TIME_SLOT,
LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,
T1_V1,
T1_V2
FROM
(SELECT
NVL(T1.V1,T2.V1) TIME_DATE,
DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,
TO_CHAR(T1.V1,'HH24:MI') T1_V1,
T1.V2 T1_V2,
TO_CHAR(T2.V1,'HH24:MI') T2_V1,
T2.V2 T2_V2
FROM
T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY
NVL(T1.V1,T2.V1));

TIME_DATE NEXT_TIME_DATE TIME_SLOT NEXT_TIME_SLOT T1_V1 T1_V2
========= ============== ========= ============== ===== ==========
00:02 00:05 A A 00:02 1
00:05 00:07 A A 00:05 2
00:07 00:09 A B 00:07 3
00:09 00:10 B A
00:10 00:12 A A 00:10 4
00:12 00:15 A A 00:12 5
00:15 00:17 A A 00:15 6
00:17 00:18 A B 00:17 7
00:18 00:20 B A
00:20 00:22 A A 00:20 8
00:22 00:25 A A 00:22 9
00:25 00:27 A A 00:25 10
00:27 00:28 A B 00:27 11
00:28 00:30 B A
00:30 00:32 A A 00:30 12
00:32 00:35 A A 00:32 13
00:35 00:37 A AB 00:35 14
00:37 00:40 AB A 00:37 15
00:40 00:42 A A 00:40 16
00:42 00:45 A A 00:42 17
00:45 00:47 A AB 00:45 18
00:47 00:50 AB A 00:47 19
00:50 00:52 A A 00:50 20
00:52 00:55 A A 00:52 21
00:55 00:56 A B 00:55 22
00:56 00:57 B A

Now, if you can scan through the rows returned programmatically,
creating a processing break when TIME_SLOT or NEXT_TIME_SLOT is not A,
you should be able to handle the processing. In this case remember
00:02, since TIME_SLOT is A and NEXT_TIME_SLOT is A, and report 00:02
- 00:05. Process the next line, and remember 00:05 also, since
TIME_SLOT is A and NEXT_TIME_SLOT is A, and output 00:02 - 00:07 and
00:05 - 00:07. Process the next line, either TIME_SLOT is not A or
NEXT_TIME_SLOT is not A, so clear the remembered list and process the
next line. It is quite simple to handle programmatically.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.

DA Morgan

unread,
Apr 5, 2007, 5:37:00 PM4/5/07
to

Charles seems to have provided a good starting point for you.
The reason for all my questions is that without knowing the business
rules, now clear, it would be easy to point you in the wrong direction.

But I take issue, at least in theory, with your response that the same
times can not occur in A and B. They are are independent ... why not?

HTH

joel garry

unread,
Apr 5, 2007, 5:39:08 PM4/5/07
to

I'm too stupid to do this in SQL. I'd select out the timestamp, data
and a tag for which table, from the two tables into two files, send
both files through unix sort, awk for the pattern, and then back into
the db with sqlloader or maybe as an external table. This often is
faster because you don't upset Oracle with too-large sorting within
the db, and you just blast sequentially through the data rather than
trying to lag values.

Please don't top-post.

jg
--
@home.com is bogus.
"Oh boy, we need to work on the I/O on this test system, I swear there
are gerbils in that server running back and forth with some floppies
in their mouths transferring the data between disks. " - Herod T

Mladen Gogala

unread,
Apr 5, 2007, 6:22:25 PM4/5/07
to

John, allow me to take a shot: Ralph Kimball describes the timeline
table in his DW toolkit book. He uses fixed length intervals (connected
with the warehouse granularity). Also, you may want to introduce a
synthetic event C, defined as the absence of B in the given period.
SQL, as opposed to Perl is not a solution for all problems.

--
http://www.mladen-gogala.com

DA Morgan

unread,
Apr 5, 2007, 6:46:11 PM4/5/07
to

Similarly a pipelined table function could generate all possible times
and be joined.

Maxim Demenko

unread,
Apr 5, 2007, 7:44:50 PM4/5/07
to Charles Hooper
Charles Hooper schrieb:

> On Apr 5, 3:50 pm, "John" <acide.ascorbi...@gmail.com> wrote:
>>> Is this the real problem or a simplification?
>> It's a simplification but not that much. The real problem involves
>> user_ids but this part can be skipped here.
>>
>>> Is this something that will be run once or repeatedly?
>> Only once.
>>
>>> Is it possible for the same time to be in A and B?
>> No, A and B are completely different data.
>>
>>> Is it possible to have a B before an A beginning the sequence?
>>> Is it possible for there to be multiple Bs between As?
>> Yes everything is possible, A events and B events happen
>> independently.
>>
>> Thanks for being interested in my problem!
>>
>> John
>
> I would be inclined to not handle this by not just using SQL. You
> could potentially have a nearly full Cartesian join on the first table
> to itself, for example:
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.
>

I suggest, pure SQL solution could be like this:

create table a as
with t as (
select 'A' event,dbms_random.value(1,31) + trunc(sysdate,'MM') edate
from dual
connect by level <= 60 )
select * from t;


create table b as
with t as (select 'B' event,dbms_random.value(1,31) +
trunc(sysdate,'MM') edate
from dual
connect by level <= 15)
select * from t;

WITH u as ( SELECT * FROM a UNION ALL SELECT * FROM b ORDER BY 2)
,
t as
(SELECT u.*,
row_number() over(order by edate) id, -- to enumerate
all events


decode(lag(event) over (order by edate), event, 0, 1)
start_of_group
FROM u
ORDER BY edate
)
,
t1 as ( SELECT id,event,edate,sum(start_of_group) over(order by
edate) group_id FROM t)
,
t2 as
(SELECT id,
event,
edate,
group_id,
count(*) over(partition by group_id) cnt_in_group,
row_number() over(partition by group_id order by edate)
row_num
FROM t1
)
,
t3 as
(SELECT connect_by_root(id) first_event,
connect_by_root(edate) first_date,
id last_event,
edate last_date
FROM t2
WHERE cnt_in_group >1
AND event ='A' connect by prior row_num=row_num-1
AND prior group_id=group_id
)
SELECT *
FROM t3
WHERE first_event != last_event
ORDER BY first_event,
last_event;

However, considering the data volumes ( 1e8 for A Events and 3e7 for B
Events), this could result in worst case ( all B events are
chronologically after or before A events) in 1e8!/2!(1e8-2)!
permutations, what nearly equals 1e16 rows, in best case, if all B
events are evenly distributed among A events, this will result in
approximately 1e7 rows, assuming the distribution is somewhere in the
middle - still would expect a lot of data in result set - i would second
the Charles suggestion to process programmatically, possibly dividing
the source data in chunks.

Best regards

Maxim

Charles Hooper

unread,
Apr 5, 2007, 10:51:28 PM4/5/07
to
On Apr 5, 7:44 pm, Maxim Demenko <mdeme...@gmail.com> wrote:
> Charles Hooper schrieb:
> > On Apr 5, 3:50 pm, "John" <acide.ascorbi...@gmail.com> wrote:
> >>> Is this the real problem or a simplification?
> >> It's a simplification but not that much. The real problem involves
> >> user_ids but this part can be skipped here.
>
> >>> Is this something that will be run once or repeatedly?
> >> Only once.
>
> >>> Is it possible for the same time to be in A and B?
> >> No, A and B are completely different data.
>
> >>> Is it possible to have a B before an A beginning the sequence?
> >>> Is it possible for there to be multiple Bs between As?
> >> Yes everything is possible, A events and B events happen
> >> independently.
>
> >> Thanks for being interested in my problem!
>
> >> John
>
> > I would be inclined to not handle this by not just using SQL. You
> > could potentially have a nearly full Cartesian join on the first table
> > to itself, for example:
>
> > Charles Hooper
> > PC Support Specialist
> > K&M Machine-Fabricating, Inc.
>
> I suggest, pure SQL solution could be like this:

(SNIP)

Nice. I stopped short of providing a solution in my previous post.
Here is an example of an all SQL solution, continuing from my previous
post:

Let's try an experiment to see if we can generate a sequential counter
that skips when either TIME_SLOT or NEXT_TIME_SLOT is not A:
SELECT
TIME_DATE,
NEXT_TIME_DATE,
(ROW_NUMBER() OVER (ORDER BY TIME_DATE))*DECODE(TIME_SLOT,'A',
1,NULL)*DECODE(NEXT_TIME_SLOT,'A',1,NULL) TT,
TIME_SLOT,


NEXT_TIME_SLOT,
T1_V1,
T1_V2
FROM
(SELECT

TIME_DATE TIME_DATE,
LEAD(TIME_DATE) OVER (ORDER BY TIME_DATE) NEXT_TIME_DATE,


TIME_SLOT,
LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,
T1_V1,
T1_V2
FROM
(SELECT
NVL(T1.V1,T2.V1) TIME_DATE,
DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,
TO_CHAR(T1.V1,'HH24:MI') T1_V1,
T1.V2 T1_V2,
TO_CHAR(T2.V1,'HH24:MI') T2_V1,
T2.V2 T2_V2
FROM
T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY

NVL(T1.V1,T2.V1)));

TIME_DATE NEXT_TIME_DATE TT TIME_SLOT NEXT_TIME_SLOT T1_V1
T1_V2
==================== ==================== =========== =========
============== ===== ==========
05-APR-2007 00:02:00 05-APR-2007 00:05:00 1 A A 00:02 1
05-APR-2007 00:05:00 05-APR-2007 00:07:00 2 A A 00:05 2
05-APR-2007 00:07:00 05-APR-2007 00:09:00 A B 00:07 3
05-APR-2007 00:09:00 05-APR-2007 00:10:00 B A
05-APR-2007 00:10:00 05-APR-2007 00:12:00 5 A A 00:10 4
05-APR-2007 00:12:00 05-APR-2007 00:15:00 6 A A 00:12 5
05-APR-2007 00:15:00 05-APR-2007 00:17:00 7 A A 00:15 6
05-APR-2007 00:17:00 05-APR-2007 00:18:00 A B 00:17 7
05-APR-2007 00:18:00 05-APR-2007 00:20:00 B A
05-APR-2007 00:20:00 05-APR-2007 00:22:00 10 A A 00:20 8
05-APR-2007 00:22:00 05-APR-2007 00:25:00 11 A A 00:22 9
05-APR-2007 00:25:00 05-APR-2007 00:27:00 12 A A 00:25 10
05-APR-2007 00:27:00 05-APR-2007 00:28:00 A B 00:27 11
05-APR-2007 00:28:00 05-APR-2007 00:30:00 B A
05-APR-2007 00:30:00 05-APR-2007 00:32:00 15 A A 00:30 12
05-APR-2007 00:32:00 05-APR-2007 00:35:00 16 A A 00:32 13
05-APR-2007 00:35:00 05-APR-2007 00:37:00 A AB 00:35 14
05-APR-2007 00:37:00 05-APR-2007 00:40:00 AB A 00:37 15
05-APR-2007 00:40:00 05-APR-2007 00:42:00 19 A A 00:40 16
05-APR-2007 00:42:00 05-APR-2007 00:45:00 20 A A 00:42 17
05-APR-2007 00:45:00 05-APR-2007 00:47:00 A AB 00:45 18
05-APR-2007 00:47:00 05-APR-2007 00:50:00 AB A 00:47 19
05-APR-2007 00:50:00 05-APR-2007 00:52:00 23 A A 00:50 20
05-APR-2007 00:52:00 05-APR-2007 00:55:00 24 A A 00:52 21
05-APR-2007 00:55:00 05-APR-2007 00:56:00 A B 00:55 22
05-APR-2007 00:56:00 05-APR-2007 00:57:00 B A
05-APR-2007 00:57:00 05-APR-2007 01:00:00 27 A A 00:57 23
05-APR-2007 01:00:00 05-APR-2007 01:02:00 28 A A 01:00 24
05-APR-2007 01:02:00 05-APR-2007 01:05:00 A AB 01:02 25
05-APR-2007 01:05:00 05-APR-2007 01:07:00 AB A 01:05 26
05-APR-2007 01:07:00 05-APR-2007 01:10:00 31 A A 01:07 27
...

Now, let's use SYS_CONNECT_BY_PATH to connect the start and end times
together:
SELECT
TIME_DATE,
NEXT_TIME_DATE,
TO_CHAR(TIME_DATE,'HH24:MI') ||'-' ||
SUBSTR(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(NEXT_TIME_DATE,'HH24:MI'),','),
2,50)||',',
1,INSTR(SUBSTR(SYS_CONNECT_BY_PATH(TO_CHAR(NEXT_TIME_DATE,'HH24:MI'),','),
2,50)||',',',')-1) TIME_RANGE
FROM
(SELECT
TIME_DATE,
NEXT_TIME_DATE,
(ROW_NUMBER() OVER (ORDER BY TIME_DATE))*DECODE(TIME_SLOT,'A',
1,NULL)*DECODE(NEXT_TIME_SLOT,'A',1,NULL) TS,
TIME_SLOT,
NEXT_TIME_SLOT
T1_V1,
T1_V2
FROM
(SELECT
TIME_DATE TIME_DATE,
LEAD(TIME_DATE,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_DATE,


TIME_SLOT,
LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,
T1_V1,
T1_V2
FROM
(SELECT
NVL(T1.V1,T2.V1) TIME_DATE,
DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,
TO_CHAR(T1.V1,'HH24:MI') T1_V1,
T1.V2 T1_V2,
TO_CHAR(T2.V1,'HH24:MI') T2_V1,
T2.V2 T2_V2
FROM
T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
ORDER BY

NVL(T1.V1,T2.V1))))
CONNECT BY PRIOR
TS=TS+1;

TIME_DATE NEXT_TIME_DATE TIME_RANGE
==================== ====================
=========================================================
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:05
05-APR-2007 00:05:00 05-APR-2007 00:07:00 00:05-00:07
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:07
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:12
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:15
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:15
05-APR-2007 00:15:00 05-APR-2007 00:17:00 00:15-00:17
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:17
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:17
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:22
05-APR-2007 00:22:00 05-APR-2007 00:25:00 00:22-00:25
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:25
05-APR-2007 00:25:00 05-APR-2007 00:27:00 00:25-00:27
05-APR-2007 00:22:00 05-APR-2007 00:25:00 00:22-00:27
05-APR-2007 00:20:00 05-APR-2007 00:22:00 00:20-00:27
05-APR-2007 00:30:00 05-APR-2007 00:32:00 00:30-00:32
05-APR-2007 00:32:00 05-APR-2007 00:35:00 00:32-00:35
05-APR-2007 00:30:00 05-APR-2007 00:32:00 00:30-00:35
05-APR-2007 00:40:00 05-APR-2007 00:42:00 00:40-00:42
05-APR-2007 00:42:00 05-APR-2007 00:45:00 00:42-00:45
05-APR-2007 00:40:00 05-APR-2007 00:42:00 00:40-00:45
05-APR-2007 00:50:00 05-APR-2007 00:52:00 00:50-00:52
05-APR-2007 00:52:00 05-APR-2007 00:55:00 00:52-00:55
05-APR-2007 00:50:00 05-APR-2007 00:52:00 00:50-00:55
05-APR-2007 00:57:00 05-APR-2007 01:00:00 00:57-01:00
05-APR-2007 01:00:00 05-APR-2007 01:02:00 01:00-01:02
05-APR-2007 00:57:00 05-APR-2007 01:00:00 00:57-01:02
05-APR-2007 01:07:00 05-APR-2007 01:10:00 01:07-01:10
05-APR-2007 01:10:00 05-APR-2007 01:12:00 01:10-01:12
05-APR-2007 01:07:00 05-APR-2007 01:10:00 01:07-01:12
05-APR-2007 01:17:00 05-APR-2007 01:20:00 01:17-01:20
05-APR-2007 01:20:00 05-APR-2007 01:22:00 01:20-01:22
05-APR-2007 01:17:00 05-APR-2007 01:20:00 01:17-01:22
05-APR-2007 01:25:00 05-APR-2007 01:27:00 01:25-01:27
05-APR-2007 01:27:00 05-APR-2007 01:30:00 01:27-01:30
05-APR-2007 01:25:00 05-APR-2007 01:27:00 01:25-01:30
...

(Rows 1652) CONNECT BY WITHOUT FILTERING (cr=23 pr=0 pw=0
time=31559 us)
(Rows 1194) VIEW (cr=23 pr=0 pw=0 time=29665 us)
(Rows 1194) WINDOW NOSORT (cr=23 pr=0 pw=0 time=24882 us)
(Rows 1194) VIEW (cr=23 pr=0 pw=0 time=20154 us)
(Rows 1194) WINDOW SORT (cr=23 pr=0 pw=0 time=14131 us)
(Rows 1194) VIEW (cr=23 pr=0 pw=0 time=15595 us)
(Rows 1194) SORT ORDER BY (cr=23 pr=0 pw=0 time=12007 us)
(Rows 1194) VIEW (cr=23 pr=0 pw=0 time=28726 us)
(Rows 1194) UNION-ALL (cr=23 pr=0 pw=0 time=23946 us)
(Rows 1000) HASH JOIN RIGHT OUTER (cr=14 pr=0 pw=0
time=11251 us)
(Rows 300) TABLE ACCESS FULL T2 (cr=7 pr=0 pw=0
time=662 us)
(Rows 1000) TABLE ACCESS FULL T1 (cr=7 pr=0 pw=0
time=2026 us)
(Rows 194) MERGE JOIN ANTI (cr=9 pr=0 pw=0 time=4647
us)
(Rows 300) TABLE ACCESS BY INDEX ROWID T2 (cr=2 pr=0
pw=0 time=2128 us)
(Rows 300) INDEX FULL SCAN T2_IND1 (cr=1 pr=0 pw=0
time=918 us)
(Rows 106) SORT UNIQUE (cr=7 pr=0 pw=0 time=2528 us)
(Rows 1000) INDEX FAST FULL SCAN T1_IND1 (cr=7 pr=0
pw=0 time=2026 us)

Ana C. Dent

unread,
Apr 5, 2007, 11:03:56 PM4/5/07
to
"John" <acide.as...@gmail.com> wrote in news:1175802637.291526.37550
@p77g2000hsh.googlegroups.com:

I would NOT use SQL to solve this problem.
I'd use Perl or PL/SQL.
Treat Table_A & Table_B as ordered lists (based upon field of choice;
timestamp, ID , whatever).
You only needs to pass thru each list a single time.
Open two read loops; one on A & one on B.
Read 1 record from each list.
1) If A < B, then read record from A.
2) If A < B, then you found 1st hit & read record from A
3) If B < A, then read record from B
4) If B < A, the you have a hit & read record from B
etc. loop until no more records

Mladen Gogala

unread,
Apr 6, 2007, 1:12:48 AM4/6/07
to
On Fri, 06 Apr 2007 03:03:56 +0000, Ana C. Dent wrote:

> I would NOT use SQL to solve this problem. I'd use Perl

That's the spirit!

--
http://www.mladen-gogala.com

EdStevens

unread,
Apr 6, 2007, 10:44:29 AM4/6/07
to
On Apr 5, 10:03 pm, "Ana C. Dent" <anaced...@hotmail.com> wrote:
> "John" <acide.ascorbi...@gmail.com> wrote in news:1175802637.291526.37550

Classic sort/merge logic. Something every cobol programmer learned
the first week on the job. Something we often forget with all of our
point/click/object programming. Sometimes the old, simple solutions
are still the best.

Ana C. Dent

unread,
Apr 6, 2007, 10:49:27 AM4/6/07
to
"EdStevens" <queti...@yahoo.com> wrote in
news:1175870668.9...@y66g2000hsf.googlegroups.com:

I did not claim it to be original, just effective.
Yes, I did learn this techique when doing Cobol programming.

John

unread,
Apr 6, 2007, 10:54:12 AM4/6/07
to
On Apr 5, 3:50 pm, "John" <acide.ascorbi...@gmail.com> wrote:
> > Is this the real problem or a simplification?
>
> It's a simplification but not that much. The real problem involves
> user_ids but this part can be skipped here.
>
> > Is this something that will be run once or repeatedly?
>
> Only once.
>
> > Is it possible for the same time to be in A and B?
>
> No, A and B are completely different data.

Ho I misunderstood the question. I understood the question as "can A
and B have the same values ?", which answer is no. But yes, A and B
can have the same timestamp.

Sorry, thanks for helping,

John

John

unread,
Apr 6, 2007, 10:55:13 AM4/6/07
to
Hi Jonathan,

Jonathan Lewis wrote:
> Option b)
> Use the analytic lag(,1) function
>
> select
> flag, event, prior_event, timestamp, prior_timestamp
> from (
> select
> flag, lag(flag,1) over (order by timestamp) prior_flag,
> event, lag(event,1) over (order by timestamp) prior_event
> timestamp, lag(timestamp,1) over (order by timestamp) prior_timestamp
> from
> (the union all query)
> where
> flag = prior_flag
> and flag = 'A' -- if you just want A's without a B in between.
> ;

Won't this option b) only compute the consecutive A pairs? If I have:

Chronology
(03:50pm, A1)
(03:51pm, B1)

(03:55pm, A2)
(03:58pm, A3)

(03:59pm, A4)
(04:00pm, B2)

I'm looking for all the sequences of events A in the chronology with
no B event in the middle. The results would be:
(A2, A3) ; (A3, A4) and also (A2, A4) even if A2 and A4 are not
consecutive.

Thanks for helping!

John

John

unread,
Apr 6, 2007, 11:06:36 AM4/6/07
to
On Apr 5, 5:07 pm, "Charles Hooper" <hooperc2...@yahoo.com> wrote:
> What can we do with the above to avoid the Cartesian join as much as
> possible? We can use LEAD to peek at the next set of values:
> SELECT
> TO_CHAR(TIME_DATE,'HH24:MI') TIME_DATE,
> TO_CHAR(LEAD(TIME_DATE,1) OVER (ORDER BY TIME_DATE),'HH24:MI')
> NEXT_TIME_DATE,
> TIME_SLOT,
> LEAD(TIME_SLOT,1) OVER (ORDER BY TIME_DATE) NEXT_TIME_SLOT,
> T1_V1,
> T1_V2
> FROM
> (SELECT
> NVL(T1.V1,T2.V1) TIME_DATE,
> DECODE(T1.V1,NULL,'B',NVL2(T2.V1,'AB','A')) TIME_SLOT,
> TO_CHAR(T1.V1,'HH24:MI') T1_V1,
> T1.V2 T1_V2,
> TO_CHAR(T2.V1,'HH24:MI') T2_V1,
> T2.V2 T2_V2
> FROM
> T1 FULL OUTER JOIN T2 ON T1.V1=T2.V1
> ORDER BY
> NVL(T1.V1,T2.V1));

Wouo thanks for such a big and detailed answer!

However I think this query only search for consecutive A pairs. As in
the previous post, if I have:

Chronology
(03:50pm, A1)
(03:51pm, B1)
(03:55pm, A2)
(03:58pm, A3)
(03:59pm, A4)
(04:00pm, B2)

I'm looking for all the sequences of events A in the chronology with

John

unread,
Apr 6, 2007, 11:10:49 AM4/6/07
to
On Apr 5, 5:37 pm, DA Morgan <damor...@psoug.org> wrote:
> But I take issue, at least in theory, with your response that the same
> times can not occur in A and B. They are are independent ... why not?

I misunderstood your question. They can occur at the same time.

Thanks for helping,

John

Charles Hooper

unread,
Apr 6, 2007, 11:49:36 AM4/6/07
to

At first glance, it appears to be only giving consecutive pairs, but
that is not the case. The input looks like this, from a previous step
in the build of the final SQL statement:
05-APR-2007 00:02:00 05-APR-2007 00:05:00 1 A
A 00:02 1
05-APR-2007 00:05:00 05-APR-2007 00:07:00 2 A
A 00:05 2
05-APR-2007 00:07:00 05-APR-2007 00:09:00 A
B 00:07 3


05-APR-2007 00:09:00 05-APR-2007 00:10:00 B A

05-APR-2007 00:10:00 05-APR-2007 00:12:00 5 A
A 00:10 4
05-APR-2007 00:12:00 05-APR-2007 00:15:00 6 A
A 00:12 5
05-APR-2007 00:15:00 05-APR-2007 00:17:00 7 A
A 00:15 6
05-APR-2007 00:17:00 05-APR-2007 00:18:00 A
B 00:17 7

With the above, you would like to have the pairs:
00:02-00:05 (1a), 00:02-00:07 (1b), 00:05-00:07 (1c)
The next sequence:
00:10-00:12 (2a), 00:10-00:15 (2b), 00:12-00:15 (2c), 00:10-00:17
(2d), 00:12-00:17 (2e), 00:15-00:17 (2f)
...

The output of my last post shows this, which is essentially the same
as the above, just in a slightly different order:
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:05 (1a)
05-APR-2007 00:05:00 05-APR-2007 00:07:00 00:05-00:07 (1c)
05-APR-2007 00:02:00 05-APR-2007 00:05:00 00:02-00:07 (1b)
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:12 (2a)
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:15 (2c)
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:15 (2b)
05-APR-2007 00:15:00 05-APR-2007 00:17:00 00:15-00:17 (2f)
05-APR-2007 00:12:00 05-APR-2007 00:15:00 00:12-00:17 (2e)
05-APR-2007 00:10:00 05-APR-2007 00:12:00 00:10-00:17 (2d)

The execution plan does not look as bad as I expected - a proper
DBMS_XPLAN (rather than the execution plan I stripped from a 10046
trace):
PLAN_TABLE_OUTPUT
=================
-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-
Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY WITHOUT FILTERING | | 1 | | 1652 |
00:00:00.07 | 23 | 19 | | | |
| 2 | VIEW | | 1 | 1035 | 1194 |00:00:00.07
| 23 | 19 | | | |
| 3 | WINDOW NOSORT | | 1 | 1035 | 1194 |
00:00:00.07 | 23 | 19 | 73728 | 73728 | |
| 4 | VIEW | | 1 | 1035 | 1194 |00:00:00.07
| 23 | 19 | | | |
| 5 | WINDOW SORT | | 1 | 1035 | 1194 |
00:00:00.06 | 23 | 19 | 43008 | 43008 |38912 (0)|
| 6 | VIEW | | 1 | 1035 | 1194 |00:00:00.06
| 23 | 19 | | | |
| 7 | SORT ORDER BY | | 1 | 1035 | 1194 |
00:00:00.06 | 23 | 19 | 64512 | 64512 |57344 (0)|
| 8 | VIEW | | 1 | 1035 | 1194 |00:00:00.04
| 23 | 19 | | | |
| 9 | UNION-ALL | | 1 | | 1194 |00:00:00.04
| 23 | 19 | | | |
|* 10 | HASH JOIN RIGHT OUTER | | 1 | 1000 | 1000 |
00:00:00.04 | 14 | 12 | 1155K| 1155K| 1219K (0)|
| 11 | TABLE ACCESS FULL | T2 | 1 | 300 | 300 |
00:00:00.02 | 7 | 6 | | | |
| 12 | TABLE ACCESS FULL | T1 | 1 | 1000 | 1000 |
00:00:00.02 | 7 | 6 | | | |
| 13 | MERGE JOIN ANTI | | 1 | 35 | 194 |
00:00:00.02 | 9 | 7 | | | |
| 14 | TABLE ACCESS BY INDEX ROWID| T2 | 1 | 300 | 300
|00:00:00.01 | 2 | 1 | | | |
| 15 | INDEX FULL SCAN | T2_IND1 | 1 | 300 |
300 |00:00:00.01 | 1 | 1 | | | |
|* 16 | SORT UNIQUE | | 300 | 1000 | 106 |
00:00:00.02 | 7 | 6 | 64512 | 64512 |57344 (0)|
| 17 | INDEX FAST FULL SCAN | T1_IND1 | 1 | 1000 |
1000 |00:00:00.02 | 7 | 6 | | | |

Predicate Information (identified by operation id):
1 - access("TS"+1=PRIOR NULL)
10 - access("T1"."V1"="T2"."V1")
16 - access("T1"."V1"="T2"."V1")
filter("T1"."V1"="T2"."V1")

DA Morgan

unread,
Apr 6, 2007, 12:02:37 PM4/6/07
to

I guess the professor just comes out in me but Ok ... given they can
have the same timestamp ... when they do ... does B come before A or
after A.

Even the simple can grow rather complex when all possibilities are
considered. <g>

DA Morgan

unread,
Apr 6, 2007, 12:04:29 PM4/6/07
to

Which leads to the question I asked above ... then which comes first?
The chicken or the egg?

I think the programming logic to do what you actually need will be
substantially more complex than that of what you originally asked. <g>

John

unread,
Apr 6, 2007, 2:54:26 PM4/6/07
to
On Apr 6, 12:04 pm, DA Morgan <damor...@psoug.org> wrote:
> John wrote:
> > On Apr 5, 5:37 pm, DA Morgan <damor...@psoug.org> wrote:
> >> But I take issue, at least in theory, with your response that the same
> >> times can not occur in A and B. They are are independent ... why not?
>
> > I misunderstood your question. They can occur at the same time.
>
> > Thanks for helping,
>
> > John
>
> Which leads to the question I asked above ... then which comes first?
> The chicken or the egg?

They are in a separate tables so there is no one coming before the
other or not.

Thanks for your help,

John

EdStevens

unread,
Apr 6, 2007, 3:07:24 PM4/6/07
to
On Apr 6, 9:49 am, "Ana C. Dent" <anaced...@hotmail.com> wrote:

Ah, the limitations of the written word. It sounds to me as if you
thought I was being sarcastic. Maybe I'm misunderstanding you now, or
maybe you misunderstood me ... I was not meaning to be sarcastic
toward your post. If anything, I was giving a bit of a dig to everyone
else (including myself) for missing the obvious.

As an old, unrepentant cobol programmer myself, please accept my
apology.

EdStevens

unread,
Apr 6, 2007, 4:31:56 PM4/6/07
to
On Apr 6, 9:49 am, "Ana C. Dent" <anaced...@hotmail.com> wrote:

Didn't mean to sound sarcastic (if that's what you were thinking --
the shortcomings of written communications, both ways). Just taking a
mild dig at everyone else (myself included) for losing sight of the
simple but un-sexy solution.

As an unrepentant old cobol programmer myself, I miss the old days.

Maxim Demenko

unread,
Apr 6, 2007, 4:34:56 PM4/6/07
to John
John schrieb:


<snip>


Chronology
(03:50pm, A1)
(03:51pm, B1)
(03:55pm, A2)
(03:58pm, A3)
(03:59pm, A4)
(04:00pm, B2)

I'm looking for all the sequences of events A in the chronology with
no B event in the middle. Here the results would be:
(A2, A3) ; (A2, A4) and (A3, A4)

Thanks!

John
<snip>

At this point i am lost.

Best regards

Maxim

DA Morgan

unread,
Apr 6, 2007, 5:11:31 PM4/6/07
to

Let me repeat my question by way of an example. Which is correct?

Chronology
(03:50pm, A1)
(03:51pm, B1)
(03:55pm, A2)

(03:55pm, B2)
(03:56pm, B3)

or

Chronology
(03:50pm, A1)
(03:51pm, B1)

(03:55pm, B2)
(03:55pm, A2)
(03:56pm, B3)

Jonathan Lewis

unread,
Apr 6, 2007, 5:45:44 PM4/6/07
to

"John" <acide.as...@gmail.com> wrote in message

news:1175871313.4...@n76g2000hsh.googlegroups.com...

Correct, the option using the analytic lag() would
only give you the adjacent pairs.

Mladen Gogala

unread,
Apr 6, 2007, 7:36:15 PM4/6/07
to
On Fri, 06 Apr 2007 22:45:44 +0100, Jonathan Lewis wrote:

> Correct, the option using the analytic lag() would only give you the
> adjacent pairs.

Actually, it wouldn't. In version 10g, lag has an "offset" argument
which would tell you by how much is it trailing.

--
http://www.mladen-gogala.com

Ana C. Dent

unread,
Apr 6, 2007, 7:45:05 PM4/6/07
to
"EdStevens" <queti...@yahoo.com> wrote in
news:1175891515.9...@n59g2000hsh.googlegroups.com:

SQL is a great tool, but not appropriate for every task.
A hammer is a great tool, but appropriate for every task.
Better results occur when selecting the right tool for the job at hand.

No offense was taken.
I was just surprised that others are as prehistoric as I am.

Maxim Demenko

unread,
Apr 7, 2007, 3:41:47 AM4/7/07
to Mladen Gogala
Mladen Gogala schrieb:

> On Fri, 06 Apr 2007 22:45:44 +0100, Jonathan Lewis wrote:
>
>> Correct, the option using the analytic lag() would only give you the
>> adjacent pairs.
>
> Actually, it wouldn't. In version 10g, lag has an "offset" argument
> which would tell you by how much is it trailing.
>

Actually, lag() has an "offset" argument since 8i, nevertheless, lag()
seems not to be suitable to find *all* consecutive pairs.

Best regards

Maxim

Andrea B.

unread,
Apr 7, 2007, 2:57:53 AM4/7/07
to
John ha scritto:

> I'm trying to figure out an efficient way to search for the non
> existence of events in chronological data with SQL. The goal (detailed
> below) seems pretty simple but so far it looks like it's tricky to do
> with Oracle. Here's my problem:
>
> I'm working with 2 simple but huge tables each recording a different
> kind of event associated with a timestamp. For instance:
>
> Table A
> (03:50pm, A1)
> (03:55pm, A2)
> (03:58pm, A3)
>
> Table B
> (03:51pm, B1)
> (04:00pm, B2)
>
> I'm looking for all the chronological sequences (Ax, Ay) where no B
> event is present between Ax and Ay. In this example, the result would
> be (A2, A3).
>
> I've been searching actively for an efficient solution for this
> problem and I couldn't find any fast enough. Do you have any idea?

I'm not a SQL guru...this is just a hint.
First make a union of the two table sorting for timestamp and event
type, add a sequence field.

1 3.55 A1
2 3.56 B1
3 3.58 A2
...

Then you can make a second query selecting every event and the next in
sequence

3:55, A1, 3.56 B1
3.56, B1, 3.58 A2
...

and filtering the records where both event are A-type.

Jonathan Lewis

unread,
Apr 7, 2007, 4:54:56 AM4/7/07
to

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


"Mladen Gogala" <mgogala.S...@verizon.net> wrote in message
news:pan.2007.04...@verizon.net...

Jonathan Lewis

unread,
Apr 7, 2007, 5:10:13 AM4/7/07
to

"Mladen Gogala" <mgogala.S...@verizon.net> wrote in message
news:pan.2007.04...@verizon.net...


Tsk, Tsk,

Mladen,

After your complaints about lazy participants in Oracle-l, you
didn't read the whole of this fairly short posting before responding.

The first few lines were as follows:

==================================


> Jonathan Lewis wrote:
>> Option b)
>> Use the analytic lag(,1) function
>>
>> select
>> flag, event, prior_event, timestamp, prior_timestamp
>> from (
>> select
>> flag, lag(flag,1) over (order by timestamp) prior_flag,

....
==================================

followed by

==================================


>> Correct, the option using the analytic lag() would only give you the
>> adjacent pairs.

==================================

So option (b), as stated, would only give the adjacent pairs.
Moreover it looks like I already knew that the lag() function
allowed for an offset - because I used it (making the default
value explicit).

In fact, if you read the old manuals, you will find lag() has had
an offset since it was first introduced in Oracle 8.1.6.

charely

unread,
Apr 8, 2007, 4:08:35 AM4/8/07
to

"John" <acide.as...@gmail.com> schreef in bericht
news:1175788930....@e65g2000hsc.googlegroups.com...
> Thanks for your answer,
>
> Here are the technical details and the query I've been using so far.
>
> TableA is ~100 millions row and contains (timestamp, evtA)
> TableB is ~30 millions row and contains (timestamp, evtB)
>
> The following query took ~60h (on a private but quite slow server) to
> compute. ~1h is what I'm aiming to.
>
> select TA1_B.evtA, TA2.evtA
> from
> (
> select TA1.evtA, TA1.timestamp timeA1, TB.evtB, min(TB.timestamp)
> min_timeB
> from tableA TA1 left outer join tableB TB on (TA1.timestamp <
> TB.timestamp)
> group by TA1.evtA, TA1.timestamp, TB.evtB
> ) TA1_B,
> tableA TA2
> where
> TA1_B.timeA1 < TA2.timestamp
> and (TA2.timestamp < TA1_B.min_timeB or TA1_B.min_timeB is null)
> and TA1_B.evtA <> TA2.evtA;
>
> Thanks!
>
> John

>
> On Apr 5, 10:51 am, "Ana C. Dent" <anaced...@hotmail.com> wrote:
>> "John" <acide.ascorbi...@gmail.com> wrote in
>> news:1175783243.167337.195580
>> @n59g2000hsh.googlegroups.com:
>>
>>
>>
>>
>>
>> > Hi,

>>
>> > I'm trying to figure out an efficient way to search for the non
>> > existence of events in chronological data with SQL. The goal (detailed
>> > below) seems pretty simple but so far it looks like it's tricky to do
>> > with Oracle. Here's my problem:
>>
>> > I'm working with 2 simple but huge tables each recording a different
>> > kind of event associated with a timestamp. For instance:
>>
>> > Table A
>> > (03:50pm, A1)
>> > (03:55pm, A2)
>> > (03:58pm, A3)
>>
>> > Table B
>> > (03:51pm, B1)
>> > (04:00pm, B2)
>>
>> > I'm looking for all the chronological sequences (Ax, Ay) where no B
>> > event is present between Ax and Ay. In this example, the result would
>>
>> I do not understand the logic that would lead the answer above;
>> since none of the timestamps in Table A match any timestamp in Table B.

>>
>> > be (A2, A3).
>>
>> > I've been searching actively for an efficient solution for this
>> > problem and I couldn't find any fast enough. Do you have any idea?
>>
>> How fast is fast enough?
>> How do we know what you tried & deemed unacceptable?- Hide quoted text -
>>
>> - Show quoted text -
>
>

What about

select ta1.timestamp , ta2.timestamp from tablea ta1 , tablea ta2
where ta2.timestamp > ta1.timestamp and ta2.timestamp <=
nvl((select min(timestamp) from tableb b where b.timestamp >
ta1.timestamp) ,
(select max(timestamp) from tablea))

The nvl function is only needed to also catch events where no later event
in b exist.

I have not tested this for performance , but assuming you have indexes on
the timestamp
columns ( or using IOTs for the tables) , the optimizer will probably use
range scans on
those indexes

Charles Hooper

unread,
Apr 8, 2007, 11:03:51 AM4/8/07
to
On Apr 8, 4:08 am, "charely" <nos...@skynet.be> wrote:
> "John" <acide.ascorbi...@gmail.com> schreef in berichtnews:1175788930....@e65g2000hsc.googlegroups.com...

> > Here are the technical details and the query I've been using so far.
>
> > TableA is ~100 millions row and contains (timestamp, evtA)
> > TableB is ~30 millions row and contains (timestamp, evtB)
>
> > The following query took ~60h (on a private but quite slow server) to
> > compute. ~1h is what I'm aiming to.
>
> > select TA1_B.evtA, TA2.evtA
> > from
> > (
> > select TA1.evtA, TA1.timestamp timeA1, TB.evtB, min(TB.timestamp)
> > min_timeB
> > from tableA TA1 left outer join tableB TB on (TA1.timestamp <
> > TB.timestamp)
> > group by TA1.evtA, TA1.timestamp, TB.evtB
> > ) TA1_B,
> > tableA TA2
> > where
> > TA1_B.timeA1 < TA2.timestamp
> > and (TA2.timestamp < TA1_B.min_timeB or TA1_B.min_timeB is null)
> > and TA1_B.evtA <> TA2.evtA;
>
> > Thanks!
>
> > John
>
> What about
>
> select ta1.timestamp , ta2.timestamp from tablea ta1 , tablea ta2
> where ta2.timestamp > ta1.timestamp and ta2.timestamp <=
> nvl((select min(timestamp) from tableb b where b.timestamp >
> ta1.timestamp) ,
> (select max(timestamp) from tablea))
>
> The nvl function is only needed to also catch events where no later event
> in b exist.
>
> I have not tested this for performance , but assuming you have indexes on
> the timestamp
> columns ( or using IOTs for the tables) , the optimizer will probably use
> range scans on
> those indexes.

I don't that that the SQL statement you provided will offer the data
that the OP was wanting. Assume the following:
TABLEA (T1)
05-APR-2007 00:07:00


05-APR-2007 00:10:00
05-APR-2007 00:12:00

05-APR-2007 00:15:00
05-APR-2007 00:17:00

05-APR-2007 00:20:00

TABLEB (T2)
05-APR-2007 00:09:00
05-APR-2007 00:18:00

The OP wanted to retrieve pairs of data from two rows of TABLEA (T1)
where a value from TABLEB (T2) does not fall between the pairs from
TABLEA (T1). In this case, we need to report all pairs of values
between the time values 05-APR-2007 00:10:00, 05-APR-2007 00:12:00, 05-
APR-2007 00:15:00, and 05-APR-2007 00:17:00 since those fall between
the two values from TABLEB (T2). That should yield the following
list:
00:10-00:12
00:12-00:15
00:10-00:15
00:15-00:17
00:12-00:17
00:10-00:17

Let's reformat the query that you provided so that it can use the
sample tables and indexes that I provided previously, in order to see
if it is an efficient starting point for the OP:
SELECT /*+ GATHER_PLAN_STATISTICS */
TA1.V1,
TA2.V1
FROM
T1 TA1,
T2 TA2
WHERE
TA2.V1 > TA1.V1
AND TA2.V1 <= NVL(
(SELECT
MIN(V1)
FROM
T2 B
WHERE
B.V1 > TA1.V1),
(SELECT
MAX(V1)
FROM
T1));

SELECT
*
FROM
TABLE (DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST'));

| Id | Operation | Name | Starts | E-Rows | A-

Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | |
1000 |00:00:00.47 | 79 | | | |
| 2 | MERGE JOIN | | 1 | 15000
| 167K|00:00:01.17 | 8 | | | |
| 3 | SORT JOIN | | 1 | 300
| 300 |00:00:00.01 | 1 | 11264 | 11264 |10240 (0)|
| 4 | INDEX FULL SCAN | T2_IND1 | 1 | 300


| 300 |00:00:00.01 | 1 | | | |

|* 5 | SORT JOIN | | 300 | 1000
| 167K|00:00:00.50 | 7 | 36864 | 36864 |32768 (0)|
| 6 | INDEX FAST FULL SCAN | T1_IND1 | 1 | 1000 |
1000 |00:00:00.01 | 7 | | | |
| 7 | SORT AGGREGATE | | 71341 | 1 |
71341 |00:00:02.09 | 69 | | | |
| 8 | FIRST ROW | | 71341 | 15 |
71341 |00:00:01.21 | 69 | | | |
|* 9 | INDEX RANGE SCAN (MIN/MAX) | T2_IND1 | 71341 | 15 |
71341 |00:00:00.47 | 69 | | | |
| 10 | SORT AGGREGATE | | 1 | 1
| 1 |00:00:00.01 | 2 | | | |
| 11 | INDEX FULL SCAN (MIN/MAX)| T1_IND1 | 1 | 1000
| 1 |00:00:00.01 | 2 | | | |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------
1 - filter("TA2"."V1"<=NVL(,))
5 -
access(INTERNAL_FUNCTION("TA2"."V1")>INTERNAL_FUNCTION("TA1"."V1"))

filter(INTERNAL_FUNCTION("TA2"."V1")>INTERNAL_FUNCTION("TA1"."V1"))
9 - access("B"."V1">:B1)

Notice the Starts column in the above - that is the number of times
that portion of the plan was executed. Also note the significant
difference between estimated and actual rows, as well as what is
reported in the actual time column. Where did that bind variable in
the predicate information for ID 9 come from (Oracle optimization)?

Here is the plan for the final SQL statement that I provided, just for
comparison:


| Id | Operation | Name | Starts | E-
Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-
Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY WITHOUT FILTERING | | 1

| | 1652 |00:00:00.06 | 23 | 12 | |
| |
| 2 | VIEW | | 1 |
1035 | 1194 |00:00:00.07 | 23 | 12 | |
| |
| 3 | WINDOW NOSORT | | 1 |
1035 | 1194 |00:00:00.06 | 23 | 12 | 73728 | 73728
| |
| 4 | VIEW | | 1 |
1035 | 1194 |00:00:00.06 | 23 | 12 | |
| |
| 5 | WINDOW SORT | | 1 |
1035 | 1194 |00:00:00.05 | 23 | 12 | 43008 | 43008 |38912
(0)|
| 6 | VIEW | | 1 |
1035 | 1194 |00:00:00.05 | 23 | 12 | |


| |
| 7 | SORT ORDER BY | | 1 |

1035 | 1194 |00:00:00.05 | 23 | 12 | 57344 | 57344 |51200
(0)|
| 8 | VIEW | | 1 |
1035 | 1194 |00:00:00.07 | 23 | 12 | |
| |
| 9 | UNION-ALL | | 1
| | 1194 |00:00:00.06 | 23 | 12 | |


| |
|* 10 | HASH JOIN RIGHT OUTER | | 1 |

1000 | 1000 |00:00:00.05 | 14 | 12 | 898K| 898K| 1132K


(0)|
| 11 | TABLE ACCESS FULL | T2 | 1 |

300 | 300 |00:00:00.04 | 7 | 6 | |


| |
| 12 | TABLE ACCESS FULL | T1 | 1 |

1000 | 1000 |00:00:00.01 | 7 | 6 | |


| |
| 13 | MERGE JOIN ANTI | | 1 |

35 | 194 |00:00:00.01 | 9 | 0 | | |


|
| 14 | TABLE ACCESS BY INDEX ROWID| T2 | 1 |

300 | 300 |00:00:00.01 | 2 | 0 | |


| |
| 15 | INDEX FULL SCAN | T2_IND1 | 1 |

300 | 300 |00:00:00.01 | 1 | 0 | |


| |
|* 16 | SORT UNIQUE | | 300 |

1000 | 106 |00:00:00.01 | 7 | 0 | 36864 | 36864 |32768


(0)|
| 17 | INDEX FAST FULL SCAN | T1_IND1 | 1 |

1000 | 1000 |00:00:00.01 | 7 | 0 | |
| |

Predicate Information (identified by operation id):

---------------------------------------------------


1 - access("TS"+1=PRIOR NULL)
10 - access("T1"."V1"="T2"."V1")
16 - access("T1"."V1"="T2"."V1")
filter("T1"."V1"="T2"."V1")

If nothing else, this shows how efficient analytic functions can be
when compared to other methods.

Regarding Mladen Gogala suggestion to use LAG(V1,1), LAG(V1,2),
LAG(V1,3), ... LAG(V1, n) - I looked at using that method initially.
The problems that I encountered: #1 what should the value of n be so
that I do not miss any potential matches, #2 how do I create a new
result row for each of the matches, since all of the matches will be
returned on the same result row. In other words, you will only be
able to return one set of matches per result row when using LAG,
unless you can find some way to uncoil all of the resulting matches
into new result rows. That may be the reason that Jonathan Lewis
stated "the option using the analytic lag() would only give you the
adjacent pairs."

I would also suggest not retrieving all rows from the two tables
(TableA is ~100 millions rows, TableB is ~30 millions rows) and
processing the data client side, since you would then also need to
consider latency caused by network traffic when the 130 million rows
are retrieved from the database and sent to the client for processing.

Tony_Miller

unread,
Apr 8, 2007, 11:48:54 AM4/8/07
to
So does this mean we can all tell Mladen to go and read the friggin manual,
since he has WAYYY to much time here to criticize others asking questions??

"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in message
news:xoWdnTbZfObx_Yrb...@bt.com...

Mladen Gogala

unread,
Apr 8, 2007, 1:02:31 PM4/8/07
to
On Sat, 07 Apr 2007 10:10:13 +0100, Jonathan Lewis wrote:

> In fact, if you read the old manuals, you will find lag() has had an
> offset since it was first introduced in Oracle 8.1.6.

I have only 10G manuals installed at home. The mistake is mine. As for
the oracle-l, the problem was inaccurate and harmful "advice", not people
not reading manuals.

--
http://www.mladen-gogala.com

Maxim Demenko

unread,
Apr 8, 2007, 1:09:32 PM4/8/07
to Tony_Miller
Tony_Miller schrieb:

> So does this mean we can all tell Mladen to go and read the friggin manual,
> since he has WAYYY to much time here to criticize others asking questions??

If somebody asks a question which can be easily answered by reading the
manual or simply by an *attempt* to think, he obviously deserve to get
some portion critic.
On the other side, this thread has about 40 postings and counts, and
till now, all were related to OP. I don't see your point.

Best regards

Maxim

Mladen Gogala

unread,
Apr 8, 2007, 1:42:10 PM4/8/07
to
On Sun, 08 Apr 2007 10:48:54 -0500, Tony_Miller wrote:

> So does this mean we can all tell Mladen to go and read the friggin
> manual, since he has WAYYY to much time here to criticize others asking
> questions??

This must be one of your legendary useful posts? I don't want to ruin
the thread by starting a flame but I you can always put me in your
kill file. As for the protected environment where bad people like me
are thrown out, go to the oracle-l. Just send an email with "SUBSCRIBE
ORACLE-L" in the subject line to oracle-l...@freelists.org

--
http://www.mladen-gogala.com

charely

unread,
Apr 8, 2007, 4:45:42 PM4/8/07
to

"Charles Hooper" <hoope...@yahoo.com> schreef in bericht
news:1176044631....@y66g2000hsf.googlegroups.com...

As far as I can tell, the provided query does match the OP's request,
but I do agree that the CBO has generated a poor execution plan.
I got a better plan with

select /*+USE_NL(ta2 ta1)*/ ta1.v1 , ta2.v1 from t1 ta1 , t1 ta2
where ta2.v1 > ta1.v1 and ta2.v1 <=
nvl((select min(v1) from t2 b where b.v1 > ta1.v1) ,
(select max(v1) from t1))


-----------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost
(%CPU)| Time |

-----------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 25 | 650 | 7514K
(1)| 25:02:57 |

| 1 | NESTED LOOPS | | 249K| 6346K| 20258
(2)| 00:04:04 |

| 2 | TABLE ACCESS FULL | T1 | 9999 | 126K| 9
(0)| 00:00:01 |

|* 3 | INDEX RANGE SCAN | IX_T1 | 25 | 325 | 2
(0)| 00:00:01 |

| 4 | SORT AGGREGATE | | 1 | 13 |
| |

| 5 | FIRST ROW | | 500 | 6500 | 2
(0)| 00:00:01 |

|* 6 | INDEX RANGE SCAN (MIN/MAX) | IX_T2 | 500 | 6500 | 2
(0)| 00:00:01 |

| 7 | SORT AGGREGATE | | 1 | 13 |
| |

| 8 | INDEX FULL SCAN (MIN/MAX)| IX_T1 | 9999 | 126K|
| |

-----------------------------------------------------------------------------------------

the plan still does two index range scans for every row in T1 ,

but avoids sorting the union of T1 and T2.


Maxim Demenko

unread,
Apr 8, 2007, 6:25:07 PM4/8/07
to charely
charely schrieb:

An interesting approach, i think, this can be rewritten (to avoid
mentioned min/max index range scans) into

with u as (select 'A' event,timestamp from tablea
union all
select 'B' event,timestamp from tableb),
u1 as (select u.*,
last_value(decode(event,'B',timestamp) ignore nulls) over(order by
timestamp desc) b_timestamp
from u)
select *
from u1, u1 u2
where u1.timestamp < u2.b_timestamp
and u2.timestamp <= nvl(u1.timestamp,(select max(timestamp) from tablea))
and u1.event='A' and u2.event='A'

which in my tests on relatively small datasets yields better plan (and
runtime). However, it seems (again on small datasets), the earlier
suggested query with connect_by_root still performs better...


Best regards

Maxim

Jonathan Lewis

unread,
Apr 9, 2007, 6:23:43 AM4/9/07
to

"John" <acide.as...@gmail.com> wrote in message
news:1175871313.4...@n76g2000hsh.googlegroups.com...
>
> Chronology
> (03:50pm, A1)
> (03:51pm, B1)
> (03:55pm, A2)
> (03:58pm, A3)
> (03:59pm, A4)
> (04:00pm, B2)
>

> I'm looking for all the sequences of events A in the chronology with
> no B event in the middle. The results would be:
> (A2, A3) ; (A3, A4) and also (A2, A4) even if A2 and A4 are not
> consecutive.
>

I may have missed know if anyone has come

From the above, it looks as if you need the
following query (in principle). The hints assume
an index on ta(timed) and tb(timed)

select
/*+ ordered use_nl(ta2) index(ta1) index(ta2) */
ta1.timed, ta1.event,
ta2.timed, ta2.event
from
ta ta1,
ta ta2
where
ta2.timed > ta1.timed
and ta2.timed < (
select /*+ index(tb) no_unnest */
min(tb.timed)
from tb
where tb.timed > ta1.timed
)
order by
ta1.timed
;

1) You need to join TA to itself because a single row in
TA can produce a result set of unknown size.

2) The condition for generating a row is that another row
in TA should have a larger time stamp, which does not
exceed the next occurrence from TB.

3) I haven't yet considered problems relating to non-unique
values for timestamp, and timestamps in TA matching
timestamps in TB

If you can create indexes on the (timed) column, then
the plan for the query above gets as close to the
simple COBOL merge (described by other posters)
in action and performance as a non-procedural statement
could.

If you can't create the necessary indexes, then the
code probably has to sort TA twice, and TB once
which means that for large data sets it probably
won't be able to match the non-procedural option.

Execution plan (9i and 10g)


-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
Cost |

-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10090 | 216K|
204K|
| 1 | TABLE ACCESS BY INDEX ROWID | TA | 5 | 55 |
101 |
| 2 | NESTED LOOPS | | 10090 | 216K|
204K|
| 3 | TABLE ACCESS BY INDEX ROWID | TA | 2009 | 22099 |
2007 |
| 4 | INDEX FULL SCAN | TA_U1 | 2009 | |
7 |
|* 5 | INDEX RANGE SCAN | TA_U1 | 100 | |
1 |
| 6 | SORT AGGREGATE | | 1 | 7 |
|
| 7 | FIRST ROW | | 1 | 7 |
1 |
|* 8 | INDEX RANGE SCAN (MIN/MAX)| TB_U1 | 3 | |
1 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("TA2"."TIMED">"SYS_ALIAS_1"."TIMED" AND "TA2"."TIMED"<
(SELECT
/*+ NO_UNNEST INDEX("TB") */ MIN("TB"."TIMED") FROM "TB" "TB"
WHERE
"TB"."TIMED">:B1))
8 - access("TB"."TIMED">:B1)

The plan walks the index on TA in order fetching rows.

For each row it gets index entries from TA where the timestamp
exceeds the current timestamp - but because the subquery to
TB is used as part of the access predicate you know that the
range scan on the index is the minimum range scan. Because the
subquery on the index on TB uses range scan (min/max) this
too will be minimal.

The costs appear as they do simply because of the amount
of data I happened to create in the table over and above the
sample set shown.

charely

unread,
Apr 9, 2007, 7:34:36 AM4/9/07
to

"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> schreef in bericht
news:FN2dnelHbc44iYfb...@bt.com...

This is basically the same query and execution as I suggested in an other
part of this thread.
One issue however , you will be missing all rows combinations in ta that are
later than any
event in tb , that is where "select min(tb.timed) from tb where tb.timed >
ta1.timed"
returns null. This can be remedied by using nvl and some arbitrary large
date:
nvl((select min(tb.timed) from tb where tb.timed > ta1.timed") , sysdate +
1)
- assuming there is are no future events in ta.


Jonathan Lewis

unread,
Apr 9, 2007, 8:44:19 AM4/9/07
to


"charely" <nos...@skynet.be> wrote in message
news:461a24bd$0$13860$ba62...@news.skynet.be...


>
> This is basically the same query and execution as I suggested in an other
> part of this thread.
> One issue however , you will be missing all rows combinations in ta that
> are later than any
> event in tb , that is where "select min(tb.timed) from tb where tb.timed
> > ta1.timed"
> returns null. This can be remedied by using nvl and some arbitrary large
> date:
> nvl((select min(tb.timed) from tb where tb.timed > ta1.timed") , sysdate
> + 1)
> - assuming there is are no future events in ta.
>

Charley,

You are correct - I lost sight of that entire branch
of the thread. It is your solution, minus the boundary
condition that you had captured in your version of
the code. (I leave the need for < or <= operators
to the original poster to decide - the choice depends
on his exact requirements).

I think the 'select max(timed) from ta' option in your
original solution is actually more elegant than picking
an arbitrary future date - it doesn't add materially to
the resource usage as it will only execute once through
a min/max range scan.


The answer to Charles Hooper's question in the follow-up
to your original post: where did the bind variable come from ?
It's how Oracle handles the correlated column from the outer
table as it generates the plan for the correlated subquery.

Jonathan Lewis

unread,
Apr 9, 2007, 8:47:23 AM4/9/07
to

"Jonathan Lewis" <jona...@jlcomp.demon.co.uk> wrote in message news:...

>
> "charely" <nos...@skynet.be> wrote in message
> news:461a24bd$0$13860$ba62...@news.skynet.be...
>>
>
> Charley,
>
> You are correct - I lost sight of that entire branch
> of the thread. It is your solution, minus the boundary
> condition that you had captured in your version of
> the code. (I leave the need for < or <= operators
> to the original poster to decide - the choice depends
> on his exact requirements).
>
> I think the 'select max(timed) from ta' option in your
> original solution is actually more elegant than picking
> an arbitrary future date - it doesn't add materially to
> the resource usage as it will only execute once through
> a min/max range scan.
>
>
> The answer to Charles Hooper's question in the follow-up
> to your original post: where did the bind variable come from ?
> It's how Oracle handles the correlated column from the outer
> table as it generates the plan for the correlated subquery.
>

charely

Apologies for spelling your name incorrectly.
I let the spell-checker "correct" what I had
typed before my brain caught up with what
I was reading.

John

unread,
Apr 11, 2007, 5:41:34 PM4/11/07
to

Both are correct. This Chronology table doesn't exist but can be
created if needed to have the results.

Thanks,
John

John

unread,
Apr 11, 2007, 6:58:58 PM4/11/07
to
Thanks everyone for your answers,

The best query so far (trusting the Oracle's optimizer forecasting) is
the following:

TableA is ~100 millions row and contains (timestamp, evtA)
TableB is ~30 millions row and contains (timestamp, evtB)

select TA1_B.evtA, TA2.evtA
from
(select rownum, TA1.evtA, TA1.timestamp timeA1,
(select min(TB.timestamp)
from tableB TB
where TA1.timestamp < TB.timestamp) min_timeB
from tableA TA1


) TA1_B,
tableA TA2
where
TA1_B.timeA1 < TA2.timestamp
and (TA2.timestamp < TA1_B.min_timeB or TA1_B.min_timeB is null)
and TA1_B.evtA <> TA2.evtA;

It took ~5h to compute instead of the ~60h with the original query
posted in one of my previous post (which had a left outer join this
new one don't have).
It's still a bit slow for me so I'm still interested by your comments
and suggestions!

John

0 new messages