Spatial join query and ExecutionException

12 views
Skip to first unread message

Thomas Leduc

unread,
Mar 26, 2008, 5:56:39 AM3/26/08
to JEQL Users
hi,
Congratulation Martin for this interesting job.

I've tried to use JEQL to compute an intersection between 2 shapefiles
(1235 rows * 1489 rows) using this script :

basedir = "/home/leduc/data/datas2tests/shp/mediumshape2D/";
ta = null;
ShapefileReader ta file: basedir + "landcover2000.shp";
tb = null;
ShapefileReader tb file: basedir + "bzh5_communes.shp";
tt = select Geom.intersection(ta.GEOMETRY,tb.GEOMETRY) from ta
join tb on Geom.intersects(ta.GEOMETRY,tb.GEOMETRY);
t = select count(*) from tt;
Print t;

It has crashed with following exception :

Jeql 0.4-alpha (c) 2007-2008 Martin Davis
/home/leduc/data/datas2tests/shp/mediumshape2D/bzh5_communes.dbf (Too
many open files)
jeql.engine.ExecutionException: /home/leduc/data/datas2tests/shp/
mediumshape2D/bzh5_communes.dbf (Too many open files)
at
jeql.io.shapefile.ShapefileRowList.iterator(ShapefileRowList.java:29)
at jeql.engine.query.SimpleJoinRowList
$SimpleJoinRowIterator.next(SimpleJoinRowList.java:98)
at jeql.engine.query.FilterRowList
$FilterRowIterator.next(FilterRowList.java:64)
at jeql.engine.query.SelectedItemsRowList
$SelectedItemsRowIterator.next(SelectedItemsRowList.java:73)
at
jeql.engine.query.GroupEvaluator.evalAggregatedBaseRows(GroupEvaluator.java:
84)
at jeql.engine.query.GroupEvaluator.eval(GroupEvaluator.java:
72)
at
jeql.engine.query.QueryEvaluator.computeGroupResult(QueryEvaluator.java:
127)
at jeql.engine.query.QueryEvaluator.eval(QueryEvaluator.java:
47)
at jeql.syntax.SelectNode.eval(SelectNode.java:99)
at jeql.syntax.AssignmentNode.eval(AssignmentNode.java:31)
at jeql.syntax.StatementListNode.eval(StatementListNode.java:
41)
at jeql.engine.JeqlEngine.evalProgram(JeqlEngine.java:52)
at jeql.JeqlRunner.exec(JeqlRunner.java:87)
at jeql.JeqlRunner.run(JeqlRunner.java:66)
at jeql.JeqlRunner.main(JeqlRunner.java:16)
Run completed in 12.734 s

then I've tried to limitate the size of the join query this way
(inserting 2 intermediate small tables) :
basedir = "/home/leduc/data/datas2tests/shp/mediumshape2D/";
tta = null;
ShapefileReader tta file: basedir + "landcover2000.shp";
ta = select * from tta limit 10;
ttb = null;
ShapefileReader ttb file: basedir + "bzh5_communes.shp";
tb = select * from ttb limit 10;
tt = select Geom.intersection(ta.GEOMETRY,tb.GEOMETRY) from ta
join tb on Geom.intersects(ta.GEOMETRY,tb.GEOMETRY);
t = select count(*) from tt;
Print t;

it crashed also with a NullPointerException :
Jeql 0.4-alpha (c) 2007-2008 Martin Davis
NullPointerException :
jeql.engine.query.SingleGroupRowList.loadRows(GroupEvaluator.java:235)
Run completed in 75 ms

I'm probably wrong somewhere but don't know where ? Thanks for your
help.

PS :
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b19, mixed mode)

and the Bourne shell script I've started (with a 256 MB heap space) :
#! /bin/sh
MEM=256M;
LIB=../lib;
java -Xms${MEM} -Xmx${MEM} -cp ${LIB}/jeql.jar:${LIB}/jts-1.9-
alpha4.jar:${LIB}/jump-core-1.2.jar:${LIB}/junit.jar jeql.JeqlRunner $
{@};

Martin Davis

unread,
Mar 26, 2008, 10:52:01 AM3/26/08
to jeql-...@googlegroups.com
Thanks, Thomas.

Hmmm...  Ok, bear with me since JEQL is a work-in-progress.  The problem you're seeing is due to the fact that by default JEQL tries not to load everything into memory, but just stream it in as needed.  However, it's clear that this doesn't work well when you do that on both sides of a join!  I'm going to have to think about how to handle this "behind the scenes".

In the meantime, you can force the shapefile tables to be loaded into memory by sorting them, like this:


   ta = null;
   ShapefileReader ta file: basedir + "landcover2000.shp";
   taSort = select * from ta sort by xxx;

(where xxx is some field in your table - it doesn't really matter which one, although a numeric one would be best)

Disclaimer: I haven't actually tried this, but it should work.  Let me know if it helps.  I'll try it too when I get time.

I'm planning to add a Memorize t command which would cause table t to be "materialized" in memory directly.  Also, I'll probably have to either force one side of a join into memory, or require at least one of the tables to be in memory.

The idea behind the streaming is to let you scan a very large file and do some processing on it. But obviously this doesn't generalize to joins very well (at least, not without basically building an entire database system - which isn't really my intent).

HTH - Martin

Thomas Leduc

unread,
Mar 26, 2008, 3:47:53 PM3/26/08
to JEQL Users
On Mar 26, 3:52 pm, "Martin Davis" <mtncl...@gmail.com> wrote:
> In the meantime, you can force the shapefile tables to be loaded into memory
> by sorting them, like this:
>
> ta = null;
> ShapefileReader ta file: basedir + "landcover2000.shp";
> taSort = select * from ta sort by xxx;
[...]
> I'm planning to add a Memorize t command which would cause table t to be
> "materialized" in memory directly. Also, I'll probably have to either force

Ok Martin,
Thanks for your answer. Indeed, I think that the "Memorize" option is
really meaningful.
I've tried your workaround (replacing sort by order - gid is a serial
field) :

basedir = "/home/leduc/data/datas2tests/shp/mediumshape2D/";
tta = null;
ShapefileReader tta file: basedir + "landcover2000.shp";
ta = select * from tta order by gid;
tb = null;
ShapefileReader tb file: basedir + "bzh5_communes.shp";
tt = select Geom.intersection(ta.GEOMETRY,tb.GEOMETRY) from ta
join tb on Geom.intersects(ta.GEOMETRY,tb.GEOMETRY);
t = select count(*) from tt;
Print t;

but problem still remains :

$ ./jeql.sh testJointureShp-1.jql
Run completed in 11.975 s

Martin Davis

unread,
Mar 26, 2008, 3:53:02 PM3/26/08
to jeql-...@googlegroups.com
Thomas,

I'm puzzled by the run output that you give.  Do you mean that Jeql still throws an error, or just that it takes a long time?

Thomas Leduc

unread,
Mar 26, 2008, 4:14:40 PM3/26/08
to JEQL Users
On 26 mar, 20:53, "Martin Davis" <mtncl...@gmail.com> wrote:
> Thomas,
>
> I'm puzzled by the run output that you give. Do you mean that Jeql still
> throws an error, or just that it takes a long time?

Reading your 1st mail, I thought that storing one of the 2 "tables" in
memory (using "order by") will be enough for the join query to
succeed. But, it seems not to be the case. The 2nd one must also be
stored. Following script does not crash any more :

basedir = "/home/leduc/data/datas2tests/shp/mediumshape2D/";
tta = null;
ShapefileReader tta file: basedir + "landcover2000.shp";
ta = select * from tta order by gid;
ttb = null;
ShapefileReader ttb file: basedir + "bzh5_communes.shp";
tb = select * from ttb order by OBJECTID;
tt = select Geom.intersection(ta.GEOMETRY,tb.GEOMETRY) from ta join tb
on Geom.intersects(ta.GEOMETRY,tb.GEOMETRY);
t = select count(*) from tt;
Print t;

-> Run completed in 2812 ms !
Ok, nice.
Many thanks.

PS : "Memorize" option is an interesting one, so as the ability to
garbage collect corresponding object on demand... I mean flush the
corresponding table on disk and not "Memorize" it any more to free
memory.

Martin Davis

unread,
Mar 26, 2008, 4:27:21 PM3/26/08
to jeql-...@googlegroups.com
Great, good that it worked.  2.8 secs seems pretty good!

Actually I would have thought that just "memorizing" the join table (the second one) would have fixed the problem.  So maybe I should do some investigation here and see what's going on.

Good point about being able to flush a table.  It's a bit tricky, because you can easily create many aliases for a single memory table.  Java will GC it if it's not being used any more - so if you set all references to null it should disappear eventually.  But it might be nice to be able to force this.

I'll definitely work on "Memorize" soon.  Also, I think I might make the default to memorize tables when they are read in, and provide an optional stream: parameter to Readers to allow indicating that the table should be streamed in.  This would allow handling the big case when needed, but would avoid surprises like the one you ran into.
Reply all
Reply to author
Forward
0 new messages