SQL Kung Fu

Adam Jensen

unread,

Aug 8, 2018, 10:04:51 PM8/8/18

to

Hi,

I'm an SQL noob experimenting with tdbc::sqlite3 for a hobby project.
Given a table looks like this:

CREATE TABLE Example (start REAL, stop REAL, description TEXT);
INSERT INTO Example VALUES (10, 30, "interesting");
INSERT INTO Example VALUES (15, 20, "fail");
INSERT INTO Example VALUES (50, 80, "interesting");
INSERT INTO Example VALUES (60, 65, "fail");

SELECT * FROM Example;
10.0|30.0|interesting
15.0|20.0|fail
50.0|80.0|interesting
60.0|65.0|fail

How complicated would it be to construct from it a dataset that looks
something like this:

10.0|15.0|interesting
20.0|30.0|interesting
50.0|60.0|interesting
65.0|80.0|interesting

Could this be done in SQL, or is it better done in Tcl, or some
combination of the two?

Any ideas, hints, pointers, speculations, etc. will be very
appreciated! :)

heinrichmartin

unread,

Aug 9, 2018, 3:33:50 AM8/9/18

to

On Thursday, August 9, 2018 at 4:04:51 AM UTC+2, Adam Jensen wrote:
> Hi,
>
> I'm an SQL noob experimenting with tdbc::sqlite3 for a hobby project.
> Given a table looks like this:

hobby or homework?

> CREATE TABLE Example (start REAL, stop REAL, description TEXT);
> INSERT INTO Example VALUES (10, 30, "interesting");
> INSERT INTO Example VALUES (15, 20, "fail");
> INSERT INTO Example VALUES (50, 80, "interesting");
> INSERT INTO Example VALUES (60, 65, "fail");
>
> SELECT * FROM Example;
> 10.0|30.0|interesting
> 15.0|20.0|fail
> 50.0|80.0|interesting
> 60.0|65.0|fail
>
> How complicated would it be to construct from it a dataset that looks
> something like this:
>
> 10.0|15.0|interesting
> 20.0|30.0|interesting
> 50.0|60.0|interesting
> 65.0|80.0|interesting

what are the rules to compile the result? initial data punches holes in interesting intervals? one fail per interesting or any number? overlapping intervals or disjunct interesting with fully contained fails?

> Could this be done in SQL, or is it better done in Tcl, or some
> combination of the two?

my assumption would be: implementation in Tcl is more obvious, but it is possible in SQL. Performance to be measured.

> Any ideas, hints, pointers, speculations, etc. will be very
> appreciated! :)

too many open questions ...

Adam Jensen

unread,

Aug 9, 2018, 11:57:13 AM8/9/18

to

On Thu, 09 Aug 2018 00:33:47 -0700, heinrichmartin wrote:

> On Thursday, August 9, 2018 at 4:04:51 AM UTC+2, Adam Jensen wrote:
>> Hi,
>>
>> I'm an SQL noob experimenting with tdbc::sqlite3 for a hobby project.
>> Given a table looks like this:
>
> hobby or homework?
>

It is a hobby project that might eventually make its way into the public
domain. The goal is to create a Media Annotation, Analysis and Playback
Synthesis System (I call the project MAAPSS). Conceptually, it is
somewhat similar to vcode[1,2] but more "hackable" with broader general
applications.

http://social.cs.uiuc.edu/projects/vcode.html
https://youtu.be/Sv_OZ174wpg

>> CREATE TABLE Example (start REAL, stop REAL, description TEXT); INSERT
>> INTO Example VALUES (10, 30, "interesting");
>> INSERT INTO Example VALUES (15, 20, "fail");
>> INSERT INTO Example VALUES (50, 80, "interesting");
>> INSERT INTO Example VALUES (60, 65, "fail");
>>
>> SELECT * FROM Example;
>> 10.0|30.0|interesting 15.0|20.0|fail 50.0|80.0|interesting
>> 60.0|65.0|fail
>>
>> How complicated would it be to construct from it a dataset that looks
>> something like this:
>>
>> 10.0|15.0|interesting 20.0|30.0|interesting 50.0|60.0|interesting
>> 65.0|80.0|interesting
>
> what are the rules to compile the result? initial data punches holes in
> interesting intervals? one fail per interesting or any number?
> overlapping intervals or disjunct interesting with fully contained
> fails?
>

In the configuration that I am currently exploring, the database will
contain many entries with intervals that could overlap in every possible
way. The entries could be labeled in many ways other than "interesting"
and "fail". This little example is a simplified attempt to isolate what I
see as the core problem, which is probably mostly based in my current
lack of understanding of SQL/ite.

>> Could this be done in SQL, or is it better done in Tcl, or some
>> combination of the two?
>
> my assumption would be: implementation in Tcl is more obvious, but it is
> possible in SQL. Performance to be measured.
>

I am getting up to speed on Tcl fairly rapidly (I bought a couple of
books recently which are very useful - something I should have done a
long time ago), but I am not yet very familiar with relational database
design and SQL beyond the very basics. So any pointers to similar SQL
problems with solutions and explanations would be tremendously useful at
this point. Any hints at all, really, will be much appreciated.

Adam Jensen

unread,

Aug 9, 2018, 12:57:24 PM8/9/18

to

On Thu, 09 Aug 2018 15:20:06 +0000, Stefan Ram wrote:

> Adam Jensen <han...@riseup.net> writes:
>>Given a table looks like this:
>>CREATE TABLE Example (start REAL, stop REAL, description TEXT); INSERT
>>INTO Example VALUES (10, 30, "interesting");
>>INSERT INTO Example VALUES (15, 20, "fail");
>>INSERT INTO Example VALUES (50, 80, "interesting");
>>INSERT INTO Example VALUES (60, 65, "fail");

>>How complicated would it be to construct from it a dataset that looks
>>something like this:
>>10.0|15.0|interesting 20.0|30.0|interesting 50.0|60.0|interesting
>>65.0|80.0|interesting
>

> Easy (I'm using MySQL),
>
> DROP SCHEMA S; CREATE SCHEMA S; USE S;

>
> CREATE TABLE Example( start REAL, stop REAL, description TEXT );

> INSERT INTO Example( start, stop, description )VALUES( 10, 30,
> 'interesting' );
> INSERT INTO Example( start, stop, description )VALUES( 15, 20, 'fail' );
> INSERT INTO Example( start, stop, description )VALUES( 50, 80,
> 'interesting' );
> INSERT INTO Example( start, stop, description )VALUES( 60, 65, 'fail' );
>
> SELECT 10.0, 15.0, 'interesting' FROM Example UNION SELECT 20.0, 30.0,
> 'interesting' FROM Example UNION SELECT 50.0, 60.0, 'interesting' FROM
> Example UNION SELECT 65.0, 80.0, 'interesting' FROM Example;

These SELECT statements seem to be based on the human-generated desired
result. Said differently, how would you generate these SELECT statements
based on the data in the "Example" table?

Rich

unread,

Aug 9, 2018, 1:39:50 PM8/9/18

to

Adam Jensen <han...@riseup.net> wrote:
> On Thu, 09 Aug 2018 00:33:47 -0700, heinrichmartin wrote:
>
>> On Thursday, August 9, 2018 at 4:04:51 AM UTC+2, Adam Jensen wrote:
>>> Hi,
>>>
>>> I'm an SQL noob experimenting with tdbc::sqlite3 for a hobby project.
>>> Given a table looks like this:
>>
>> hobby or homework?
>>
>
> It is a hobby project that might eventually make its way into the public
> domain. The goal is to create a Media Annotation, Analysis and Playback
> Synthesis System (I call the project MAAPSS). Conceptually, it is
> somewhat similar to vcode[1,2] but more "hackable" with broader general
> applications.

A bit more info - but you've still inadquedaly described what these
data points mean, and you've not at all described how the output is
obtained given the inputs.

> In the configuration that I am currently exploring, the database will
> contain many entries with intervals that could overlap in every possible
> way. The entries could be labeled in many ways other than "interesting"
> and "fail". This little example is a simplified attempt to isolate what I
> see as the core problem, which is probably mostly based in my current
> lack of understanding of SQL/ite.

Ok, my best guess - given your very lacking description. You've got a
database of start times and stop times denoting 'ranges' of a media
file (audio/video/etc.).

And you are wanting some output that relates to the how ranges overlap.

If I'm on the right track, while you could use sqlite as the storage db
for this, you'll likely need to create custom interval overlap handling
functions to do what you want.

Alternately, you might consider a different DB backend, one that has an
actual interval type, and that already includes functions for computing
interval overlaps.

https://www.postgresql.org/docs/10/static/datatype-datetime.html (see
the "interval" type).

https://www.postgresql.org/docs/10/static/functions-datetime.html (see
the bits that are useful for the "interval" type).

Adam Jensen

unread,

Aug 9, 2018, 8:09:07 PM8/9/18

to

> By translating the English-language transformation rules (ELTR) from
> the customer-supplied English-language requirements specification
> (CSELRS) into SQL.
>
> Since no such CSELRS with ELTRs but only an example was available,
> I couldn't use such rules, but was able to show how to write an SQL
> query which produces the example output and was formally based on the
> Example input (using »from Example«).

Is that a bit like answering the question: "How do I get the area of a
unit circle in Tcl?"

Answer:

puts 0.7854

Adam Jensen

unread,

Aug 9, 2018, 9:37:53 PM8/9/18

to

On Thu, 09 Aug 2018 17:39:48 +0000, Rich wrote:

> A bit more info - but you've still inadquedaly described what these data
> points mean, and you've not at all described how the output is obtained
> given the inputs.
>

Is that you, Dick?

http://assets.amuniversal.com/628360f0a42b0132cbc1005056a9545d

> Ok, my best guess - given your very lacking description. You've got a
> database of start times and stop times denoting 'ranges' of a media file
> (audio/video/etc.).
>

You're famous! <smirk>

http://assets.amuniversal.com/e1996ea0c0af0132d64a005056a9545d

> And you are wanting some output that relates to the how ranges overlap.
>
> If I'm on the right track, while you could use sqlite as the storage db
> for this, you'll likely need to create custom interval overlap handling
> functions to do what you want.
>

Yes, I am looking for a combination of SQL and Tcl that will derive a set
of start and stop numbers from the contents of the database and a query.
I am surprised the example wasn't clear for most everyone.

> Alternately, you might consider a different DB backend, one that has an
> actual interval type, and that already includes functions for computing
> interval overlaps.
>
> https://www.postgresql.org/docs/10/static/datatype-datetime.html (see
> the "interval" type).
>
> https://www.postgresql.org/docs/10/static/functions-datetime.html (see
> the bits that are useful for the "interval" type).

The 'start' and 'stop' values stored in the database are numbers of type
REAL, representing seconds. There is no need for calculations with, or
transformations between, elaborate time representation formats.

Rich

unread,

Aug 9, 2018, 10:03:36 PM8/9/18

to

Adam Jensen <han...@riseup.net> wrote:
> On Thu, 09 Aug 2018 17:39:48 +0000, Rich wrote:
>> If I'm on the right track, while you could use sqlite as the storage
>> db for this, you'll likely need to create custom interval overlap
>> handling functions to do what you want.
>
> Yes, I am looking for a combination of SQL and Tcl that will derive a
> set of start and stop numbers from the contents of the database and a
> query. I am surprised the example wasn't clear for most everyone.

You gave inputs, you gave an output, and what the method of deriving
the outputs from the inputs was given as a black box.

It should have been blatantly obvious that was never going to be clear
to anyone but you.

>> Alternately, you might consider a different DB backend, one that has an
>> actual interval type, and that already includes functions for computing
>> interval overlaps.
>>
>> https://www.postgresql.org/docs/10/static/datatype-datetime.html (see
>> the "interval" type).
>>
>> https://www.postgresql.org/docs/10/static/functions-datetime.html (see
>> the bits that are useful for the "interval" type).
>
> The 'start' and 'stop' values stored in the database are numbers of type
> REAL, representing seconds. There is no need for calculations with, or
> transformations between, elaborate time representation formats.

Except that the interval type in Postgres already provides operators to
test for overlappingness. Which seems to have something to do (maybe)
with deriving the outputs from the inputs.

Sqlite provides no such operators, so you'll have to create them
yourself.

Rich

unread,

Aug 9, 2018, 10:06:10 PM8/9/18

to

Stefan is pointing out that you have not given us any information on
how you want the inputs transformed into the outputs.

But since you continue to fail to explain what the outputs really
represent, we may as well quit trying and let you go back to answering
your homework questions from class yourself.

Adam Jensen

unread,

Aug 9, 2018, 11:02:26 PM8/9/18

to

On Fri, 10 Aug 2018 02:10:34 +0000, Stefan Ram wrote:

> Adam Jensen <han...@riseup.net> writes:

>>Yes, I am looking for a combination of SQL and Tcl that will derive a
>>set of start and stop numbers from the contents of the database and a
>>query. I am surprised the example wasn't clear for most everyone.
>

> While I now can understand this for the special case given, the
> description of the range of possible inputs is still not clear to me.
>
> For example, what should be the output for:
>
> 10.0|30.0|interesting 12.0|32.0|interesting 15.0|20.0|fail
> 16.0|19.0|interesting 17.0|19.0|interesting 17.0|20.0|fail
>
> , or, if this should not be possible as an input, what are the rules
> that specify what /is/ possible as an input?

I thought the simplified example that isolate the basic problem would be
more convenient than an example with a lot of context. I guess that
turned out to be a little frustrating for some folks. Oops.

Anyway, here's a basic process with your example.

10.0|30.0|interesting
12.0|32.0|interesting
15.0|20.0|fail
16.0|19.0|interesting
17.0|19.0|interesting
17.0|20.0|fail

1. Find all of the "interesting" segments.
10.0|30.0|interesting
12.0|32.0|interesting
16.0|19.0|interesting
17.0|19.0|interesting

2. Find any "interesting" segment overlaps and merge.
10.0|32.0|interesting

3. Find all "fail" segments with a period that overlaps the period of any
"interesting" segment.
15.0|20.0|fail
17.0|20.0|fail

4. Find any overlaps in those (3) "fail" segments and merge.
15.0|20.0|fail

5. Derive a new list of "interesting" segments that omits all (4) "fail"
segments.
10.0|15.0|interesting
20.0|32.0|interesting

I had hoped to get some comments about effective ways to do something
like this in SQL (sqlite) maybe augmented with Tcl.

The labels "interesting" and "fail" are almost arbitrary. I chose them
for the example in an attempt to keep it simple.

Not to complicate the original question, but for anyone who likes more
context, what I was preparing to use for the early tests and experiments
is something like this:

------------------------------------------------------------
CREATE TABLE MediaFiles (
file_id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL,
file_name TEXT NOT NULL,
source TEXT);

CREATE TABLE Segments (
segment_id INTEGER PRIMARY KEY AUTOINCREMENT,
file_id INTEGER REFERENCES MediaFiles( file_id )
ON UPDATE CASCADE ON DELETE CASCADE,
begin_time REAL,
end_time REAL);

CREATE TABLE Attributes(
attribute_id INTEGER PRIMARY KEY AUTOINCREMENT,
tag TEXT NOT NULL,
description TEXT);

CREATE TABLE SegmentAttributes(
segment_id INTEGER REFERENCES Segments( segment_id )
ON UPDATE CASCADE ON DELETE CASCADE,
attribute_id INTEGER REFERENCES Attributes( attribute_id )
ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY ( segment_id, attribute_id ));
------------------------------------------------------------

This is very simple. It's just a placeholder for tests while I work on
other parts of the system.

ze

unread,

Aug 9, 2018, 11:40:57 PM8/9/18

to

On Wednesday, August 8, 2018 at 10:04:51 PM UTC-4, Adam Jensen wrote:
> Hi,
> ...

> Any ideas, hints, pointers, speculations, etc. will be very
> appreciated! :)

Interesting question! Here is an answer, at least a partial one, and you should be able to complete it given your familiarity with SQL:

First, create a new table as follows (I renamed your example table to kungfu):

create table kungfu_points2 (tpoint real, description text);
insert into kungfu_points2 (tpoint, description)
select start, description from kungfu
union
select stop, description from kungfu;

Then, this gives you almost the result you want:

select fromx, min(tox) tox, min(description) description from
(select a.tpoint fromx, b.tpoint tox, a.description
from kungfu_points2 a, kungfu_points2 b
where (a.tpoint < b.tpoint)
and exists
(select 1 from kungfu
where a.tpoint >= start and
b.tpoint <= stop)
) temp_table group by fromx

With your sample data, it gives you this result:

"fromx","tox","description"
10.0,15.0,"interesting"
15.0,20.0,"fail"
20.0,30.0,"fail"
50.0,60.0,"interesting"
60.0,65.0,"fail"
65.0,80.0,"fail"

So, the last entry retains the previous sub-segment description, and not the containing segment. It is getting late, and I had my 15 minutes to spare, so....

heinrichmartin

unread,

Aug 10, 2018, 4:18:09 AM8/10/18

to

On Thursday, August 9, 2018 at 5:57:13 PM UTC+2, Adam Jensen wrote:
> On Thu, 09 Aug 2018 00:33:47 -0700, heinrichmartin wrote:
> > what are the rules to compile the result? initial data punches holes in
> > interesting intervals? one fail per interesting or any number?
> > overlapping intervals or disjunct interesting with fully contained
> > fails?
> >
>
> In the configuration that I am currently exploring, the database will
> contain many entries with intervals that could overlap in every possible
> way. The entries could be labeled in many ways other than "interesting"
> and "fail". This little example is a simplified attempt to isolate what I
> see as the core problem, which is probably mostly based in my current
> lack of understanding of SQL/ite.

I'd use SQL to select and sort the correct data, but do the interval math in Tcl. I am not aware of an existing implementation.

> I am getting up to speed on Tcl fairly rapidly (I bought a couple of
> books recently which are very useful - something I should have done a
> long time ago), but I am not yet very familiar with relational database
> design and SQL beyond the very basics. So any pointers to similar SQL
> problems with solutions and explanations would be tremendously useful at
> this point. Any hints at all, really, will be much appreciated.

Finally, let's talk Tcl on c.l.t.

Representing the intervals as list of lists sounds reasonable. Note that lsearch and lsort provide efficient means to use nested elements. Also look at lsearch -bisect and lreplace.

Don't be afraid of multiple copies of your (temporary) lists, Tcl is quite efficient unless you let them shimmer. Make sure you are not creating string representations of those lists.

Finally (ze was faster with this remark!), it might be more efficient to process a list of on-off-times than a list of intervals. You can transform it in SQL to: time|label|onoff

(untested, but you'll get the idea)
select start as time, description, 1 as onoff from kungfu
union
select stop as time, description, -1 as onoff from kungfu
order by time;

Then increment/aggregate the onoffs before any point in time.

Adam Jensen

unread,

Aug 10, 2018, 12:59:30 PM8/10/18

to

I would like to extend the original example and add some explicit
description of a humanistic approach to deriving the desired output. This
might enable a more meaningful understanding of the program requirements.

0. Starting with a dataset like this:

10.0|30.0|interesting
12.0|32.0|interesting
15.0|20.0|fail

16.0|39.0|interesting
17.0|21.0|fail

50.0|80.0|interesting
60.0|65.0|fail

85.0|90.0|fail

1. Find all of the "interesting" segments.
10.0|30.0|interesting
12.0|32.0|interesting

16.0|39.0|interesting
50.0|80.0|interesting

2. Find any "interesting" segments with overlaps and merge them.
10.0|39.0|interesting
50.0|80.0|interesting

3. Find all "fail" segments with a period that overlaps the period of any

(step 2) "interesting" segment.
15.0|20.0|fail
17.0|21.0|fail
60.0|65.0|fail

4. Find any overlaps in those (step 3) "fail" segments and merge.
15.0|21.0|fail
60.0|65.0|fail

5. Derive a new list of "interesting" segments from (step 2) that omits
all "fail" segments from (step 4).
10.0|15.0|interesting
21.0|39.0|interesting

50.0|60.0|interesting
65.0|80.0|interesting

Initial setup to experiment with this example:
-------------------------------------------------

CREATE TABLE Example (start REAL, stop REAL, description TEXT);
INSERT INTO Example VALUES (10, 30, "interesting");

INSERT INTO Example VALUES (12, 32, "interesting");

INSERT INTO Example VALUES (15, 20, "fail");

INSERT INTO Example VALUES (16, 39, "interesting");
INSERT INTO Example VALUES (17, 21, "fail");

INSERT INTO Example VALUES (50, 80, "interesting");
INSERT INTO Example VALUES (60, 65, "fail");

INSERT INTO Example VALUES (85, 90, "fail");
-------------------------------------------------

Adam Jensen

unread,

Aug 10, 2018, 1:04:51 PM8/10/18

to

Hi! Thanks for having a look at this problem. I've been examining your
approach. To make it more convenient for anyone who would like to follow
this thread, I've restated your approach along with the original example
dataset and included some output:

CREATE TABLE kungfu (start REAL, stop REAL, description TEXT);
INSERT INTO kungfu VALUES (10, 30, "interesting");
INSERT INTO kungfu VALUES (15, 20, "fail");
INSERT INTO kungfu VALUES (50, 80, "interesting");
INSERT INTO kungfu VALUES (60, 65, "fail");

SELECT * FROM kungfu;

10.0|30.0|interesting
15.0|20.0|fail
50.0|80.0|interesting
60.0|65.0|fail

ze

unread,

Aug 10, 2018, 2:03:17 PM8/10/18

to

On Friday, August 10, 2018 at 1:04:51 PM UTC-4, Adam Jensen wrote:
> On Thu, 09 Aug 2018 20:40:54 -0700, ze wrote:
>
>
> Hi! Thanks for having a look at this problem. I've been examining your
> approach. To make it more convenient for anyone who would like to follow
> this thread, I've restated your approach along with the original example
> dataset and included some output:
>

Well, with some slight modification of the original and a where clause, you can get the desired output:

create table kungfu_points (tpoint real, description text, pos integer);
insert into kungfu_points (tpoint, description, pos)
select start, description, 1 from kungfu
union
select stop, description, 2 from kungfu;

select fromx, tox, (case when pos = 4 then description2 else description1 end) description
from (
SELECT fromx, MIN(tox) tox,
MIN(description1) description1,
MIN(description2) description2,
MIN(pos) pos
FROM
(SELECT a.tpoint fromx, b.tpoint tox, a.description description1, b.description description2, a.pos + b.pos pos
FROM kungfu_points a, kungfu_points b

WHERE (a.tpoint < b.tpoint) AND EXISTS
(SELECT 1 FROM kungfu
WHERE a.tpoint >= start AND b.tpoint <= stop)
) temp_table GROUP BY fromx

) temp_table2
where (pos = 4 and description2 = 'interesting')
or
(pos <> 4 and description1 = 'interesting')

Adam Jensen

unread,

Aug 10, 2018, 2:04:40 PM8/10/18

to

On Fri, 10 Aug 2018 01:18:06 -0700, heinrichmartin wrote:

> (untested, but you'll get the idea)
> select start as time, description, 1 as onoff from kungfu union select
> stop as time, description, -1 as onoff from kungfu order by time;

select * from kungfu;

10.0|30.0|interesting
15.0|20.0|fail
50.0|80.0|interesting
60.0|65.0|fail

select start as time, description, 1 as onoff from kungfu
union
select stop as time, description, -1 as onoff from kungfu
order by time;

10.0|interesting|1
15.0|fail|1
20.0|fail|-1
30.0|interesting|-1
50.0|interesting|1
60.0|fail|1
65.0|fail|-1
80.0|interesting|-1

Maybe something like this would get closer:
* That which is below is almost certainly NOT valid SQL. (Learning SQL is
on my ToDo list).

select
start as time,
description,

(1 if description="interesting",
-1 if description="fail") as onoff from kungfu

union
select
stop as time,
description,

(-1 if description="interesting",
1 if description="fail") as onoff from kungfu
order by time;

10.0|interesting|1
15.0|fail|-1
20.0|fail|1
30.0|interesting|-1
50.0|interesting|1
60.0|fail|-1
65.0|fail|1
80.0|interesting|-1

This result is (above) looks like it would solve the problem.

Alternatively, it might be written like this (except the UNIONs don't
seem to be valid):

select
start as time,
description,
1 as onoff

from kungfu where description="interesting"
union

select
start as time,
description,

-1 as onoff
from kungfu where description="fail"

select
stop as time,
description,
-1 as onoff

from kungfu where description="interesting"

union
select
stop as time,
description,
1 as onoff

from kungfu where description="fail"
order by time;

Rich

unread,

Aug 10, 2018, 2:16:51 PM8/10/18

to

Adam Jensen <han...@riseup.net> wrote:
> (1 if description="interesting",
> -1 if description="fail") as onoff from kungfu

Look up the SQL "case" statement here. It will do what your psudeo
code wants to have happen.

Adam Jensen

unread,

Aug 10, 2018, 2:53:39 PM8/10/18

to

-- Thanks! This query runs:

select
start as time,
description,

CASE description
WHEN "interesting" THEN 1
WHEN "fail" THEN -1
END

as onoff from kungfu
union
select
stop as time,
description,

CASE description
WHEN "interesting" THEN -1
WHEN "fail" THEN 1
END

as onoff from kungfu
order by time;

-- On a more elaborate dataset:

CREATE TABLE kungfu (start REAL, stop REAL, description TEXT);
INSERT INTO kungfu VALUES (10, 30, "interesting");
INSERT INTO kungfu VALUES (12, 32, "interesting");
INSERT INTO kungfu VALUES (15, 20, "fail");
INSERT INTO kungfu VALUES (16, 39, "interesting");
INSERT INTO kungfu VALUES (17, 21, "fail");
INSERT INTO kungfu VALUES (50, 80, "interesting");
INSERT INTO kungfu VALUES (60, 65, "fail");
INSERT INTO kungfu VALUES (85, 90, "fail");

SELECT * FROM kungfu;

10.0|30.0|interesting
12.0|32.0|interesting
15.0|20.0|fail
16.0|39.0|interesting

17.0|21.0|fail

50.0|80.0|interesting
60.0|65.0|fail

85.0|90.0|fail

-- The new query gives this result:
10.0|interesting|1
12.0|interesting|1
15.0|fail|-1
16.0|interesting|1
17.0|fail|-1
20.0|fail|1
21.0|fail|1
30.0|interesting|-1
32.0|interesting|-1
39.0|interesting|-1

50.0|interesting|1
60.0|fail|-1
65.0|fail|1
80.0|interesting|-1

85.0|fail|-1
90.0|fail|1

-- Except for the redundancies, it's good.

heinrichmartin

unread,

Aug 10, 2018, 3:15:30 PM8/10/18

to

Great you found sth that works.
Note that it might become tricky in general cases (e.g. multiple overlapping fail intervals). I'd use symmetric patterns for all intervals: increment on start and decrement on end; then at any time you'd have interesting iff sum(interesting) > 0 and sum(fail) <= 0.

Btw data sanity expected, i.e. start <= stop for all intervals.

Adam Jensen

unread,

Aug 10, 2018, 4:27:12 PM8/10/18

to

On Fri, 10 Aug 2018 12:15:27 -0700, heinrichmartin wrote:

> Great you found sth that works.

The ball is rolling, yes, in the sense that the problem looks to be
solvable possibly entirely in SQL. That's significant.

https://idioms.thefreedictionary.com/got+the+ball+rolling

> Note that it might become tricky in general cases (e.g. multiple
> overlapping fail intervals). I'd use symmetric patterns for all
> intervals: increment on start and decrement on end; then at any time
> you'd have interesting iff sum(interesting) > 0 and sum(fail) <= 0.

If you could put together an example to illustrate what you have in mind,
I would be interested in having a very close look at it.

> Btw data sanity expected, i.e. start <= stop for all intervals.

Even a very simple database for the early experiments and development of
the larger system will need to be quite a bit more elaborate than the
single table that has been used in this thread to illustrate the core
problem. Eventually, the working system will be constrained, bolstered
and decorated with the appropriate checks and error handling to maintain
data integrity and system stability. At the moment, there still isn't a
basic method that solves the fundamental problem for the simple single
table example. But I think you are right in that those concerns will
eventually come into scope.

ze

unread,

Aug 10, 2018, 9:24:31 PM8/10/18

to

On Friday, August 10, 2018 at 4:27:12 PM UTC-4, Adam Jensen wrote:
>
> The ball is rolling, yes, in the sense that the problem looks to be
> solvable possibly entirely in SQL. That's significant.
>

In the tradition of your earlier responses to Rich, in regards to the Internet Dick, here's what this comment deserves :-)

https://i0.kym-cdn.com/photos/images/original/000/001/582/picard-facepalm.jpg?1240934151

Adam Jensen

unread,

Aug 11, 2018, 7:11:22 AM8/11/18

to

I haven't yet invested sufficient time in earnest research and study of
database design and SQL programming to estimate the costs or even
possibilities of some things. You folks provided insight into this
problem that I lacked. My concerns that there might be an architectural
problem or performance bottleneck in my project, which is in the very
early stages of development, are alleviated. Many thanks!

Adam Jensen

unread,

Aug 11, 2018, 7:34:16 AM8/11/18

to

On Fri, 10 Aug 2018 17:54:45 +0000, Stefan Ram wrote:

> I'm just an amateur, so I used Python.

I've read most of this book this week:

The Tcl Programming Language: A Comprehensive Guide

http://www.magicsplat.com/ttpl/
https://wiki.tcl.tk/48868

It is surprisingly well written for a programming book and it explains so
much. This book has answered so many outstanding questions that I've had.
My experience is that this is the Tcl documentation that was missing.

Adam Jensen

unread,

Aug 11, 2018, 8:38:31 AM8/11/18

to

Hmm, I don't know enough about SQL to understand what's happening here
but I can test it. I set up PostgreSQL-10.4 on FreeBSD-11.2 yesterday and
for fun, I'll run this program on that setup.

-- -----------------------------------------------------------

CREATE TABLE kungfu (start REAL, stop REAL, description TEXT);

INSERT INTO kungfu VALUES

(10, 30, 'interesting'),
(12, 32, 'interesting'),
(15, 20, 'fail'),
(16, 39, 'interesting'),
(17, 21, 'fail'),
(50, 80, 'interesting'),
(60, 65, 'fail'),
(85, 90, 'fail');

CREATE TABLE kungfu_points (tpoint REAL, description TEXT, pos INTEGER);

INSERT INTO kungfu_points (tpoint, description, pos)
SELECT start, description, 1 FROM kungfu
UNION
SELECT stop, description, 2 FROM kungfu;

SELECT fromx,
tox,
( CASE
WHEN pos = 4 THEN description2
ELSE description1
END ) description
FROM (SELECT fromx,
Min(tox) tox,
Min(description1) description1,
Min(description2) description2,
Min(pos) pos

FROM (SELECT a.tpoint fromx,
b.tpoint tox,
a.description description1,
b.description description2,
a.pos + b.pos pos
FROM kungfu_points a,
kungfu_points b
WHERE ( a.tpoint < b.tpoint )
AND EXISTS (SELECT 1
FROM kungfu
WHERE a.tpoint >= start
AND b.tpoint <= stop))
temp_table
GROUP BY fromx) temp_table2

WHERE ( pos = 4
AND description2 = 'interesting' )
OR ( pos <> 4
AND description1 = 'interesting' );
-- -----------------------------------------------------------
fromx | tox | description
-------+-----+-------------
10 | 12 | interesting
12 | 15 | interesting
16 | 17 | interesting
21 | 30 | interesting
30 | 32 | interesting
32 | 39 | interesting
50 | 60 | interesting
65 | 80 | interesting

The manually (or cognitively?) generated output dataset is:

start | stop | description
-------+------+-------------
10 | 15 | interesting
21 | 39 | interesting
50 | 60 | interesting
65 | 80 | interesting

Your approach is pretty close but the time segment:
16 | 17 | interesting
should be "fail".

SELECT * FROM kungfu;
start | stop | description
-------+------+-------------
10 | 30 | interesting
12 | 32 | interesting
15 | 20 | fail
16 | 39 | interesting
17 | 21 | fail
50 | 80 | interesting
60 | 65 | fail
85 | 90 | fail

Your approach looks to be very close and I am keeping all of this in my
project notes. Thanks!