Problem with ST_SymDifference function

31 views
Skip to first unread message

artkraw

unread,
May 17, 2018, 6:26:58 PM5/17/18
to SpatiaLite Users
Dear all, 

I have a problem with the ST_SymDifference function. If I have two tables (tab_1, tab_xp3), each of which contains one polygon, and then perform a symmetrical difference, then I get the correct very good result.
In the case when I add the second polygon (or more polygons) to the cutting table tab_xp3 I get the wrong result ... Why is this happening? What should I do to make this function work properly?

See attached picture 

Code: 

Drop Table IF EXISTS tab_1; 
Create Table tab_1 (Id Integer Primary Key, Name Text);
SELECT AddGeometryColumn('tab_1', 'geometria', 2180, 'POLYGON', 'XY', 0);
Insert into tab_1 (Id, Name, geometria) Values (1, 'bbox', GeomFromText('POLYGON((500000.0000 320000.0000, 620000.0000 320000.0000, 620000.0000 200000.0000, 500000.0000 200000.0000, 500000.0000 320000.0000))', 2180));


ONE POLYGON 
Drop Table IF EXISTS tab_xp3; 
Create Table tab_xp3 (Id Integer Primary Key, Name Text);
SELECT AddGeometryColumn('tab_xp3', 'GEOMETRY', 2180, 'POLYGON', 'XY', 0);
Insert into tab_xp3 (Id, Name, GEOMETRY) Values (3, 'xp3', GeomFromText('POLYGON((495000.0000 325000.0000, 495000.0000 321000.0000, 621000.0000 195000.0000, 625000.0000 195000.0000, 625000.0000 199000.0000, 499000.0000 325000.0000, 495000.0000 325000.0000))', 2180));


TWO POLYGONS 
Drop Table IF EXISTS tab_xp3; 
Create Table tab_xp3 (Id Integer Primary Key, Name Text);
SELECT AddGeometryColumn('tab_xp3', 'GEOMETRY', 2180, 'POLYGON', 'XY', 0);
Insert into tab_xp3 (Id, Name, GEOMETRY) Values (3, 'xp3', GeomFromText('POLYGON((495000.0000 325000.0000, 495000.0000 321000.0000, 621000.0000 195000.0000, 625000.0000 195000.0000, 625000.0000 199000.0000, 499000.0000 325000.0000, 495000.0000 325000.0000))', 2180));
Insert into tab_xp3 (Id, Name, GEOMETRY) Values (1, 'p1x', GeomFromText('POLYGON((495000.0000 260000.0000, 560000.0000 195000.0000, 530000.0000 195000.0000, 495000.0000 230000.0000, 495000.0000 260000.0000))', 2180));

CREATE TABLE sdif_tab_x AS 
SELECT tab_1.Name, SymDifference(tab_1.geometria, tab_xp3.GEOMETRY) 
FROM tab_1, tab_xp3;

Best regards, 
Artur K.
Przechwytywanie.PNG

a.fu...@lqt.it

unread,
May 17, 2018, 7:52:17 PM5/17/18
to spatiali...@googlegroups.com
On Thu, 17 May 2018 15:26:58 -0700 (PDT), artkraw wrote:
> Dear all, 
>
> I have a problem with the ST_SymDifference function. If I have two
> tables (tab_1, tab_xp3), each of which contains one polygon, and then
> perform a symmetrical difference, then I get the correct very good
> result.
> In the case when I add the second polygon (or more polygons) to the
> cutting table tab_xp3 I get the wrong result ... Why is this
> happening? What should I do to make this function work properly?
>

Hi Artur,

the results of your SQL queries are exactly what they are expected
to be, there is nothing wrong in ST_SymDifference().
I suppose that you are just misunderstanding the very basic
principles about SQL queries.

1. the ST_SymDifference() function (exactly as many other Spatial SQL
functions) will evaluate a couple of Geometries at each time.
2. a SQL query joining two (or more) tables, when no WHERE clause is
specified, will compute the cartesian product of both tables.
3. so, in your "two polygs" case, the cartesian product (1X2) will
insert into the output table _TWO_ geometries:
- the first one corresponding to the SymDiff of the unique
polygon in "tab_1" and the first polygon in "tab_xp3"
- the second one corresponding to the SymDiff of the same
polygon in "tab_1" and the second polygon in "tab_xp3"

I easily imagine that your real intention was computing just
once the SymDiff between the single polygon in "tab_1" and
_ALL_ the polygons in "tab_xp3".

you can easily do such a thing in Spatial SQL: you are just
required to invoke ST_Collect() in order to aggregate all
polygons from "tab_xp3" before computing the SymDiff:

CREATE TABLE sdif_tab_x AS
SELECT tab_1.Name, SymDifference(tab_1.geometria,
ST_Collect(tab_xp3.GEOMETRY))
FROM tab_1, tab_xp3;

the result of this modified query is shown in the
attached figure (and I imagine that is what you
were axpecting to get).

bye Sandro
SymDiff.png

Artur Krawczyk

unread,
May 18, 2018, 4:58:31 AM5/18/18
to spatiali...@googlegroups.com
Thank you very much Sandro !!!! 

Thanks for the lesson " mDifference() function (exactly as many other Spatial SQL

   functions)  will evaluate a couple of Geometries at each time. "
I don't know about this ... 
From your answer is the next question for me.
If in table tab_1 I will have more polygons than one, then these polygons will also have to be included in the collect function?
SymDifference(ST_Collect(tab_1.geometria),

Something like:
 
CREATE TABLE sdif_tab_x AS
SELECT tab_1.Name, SymDifference(ST_Collect(tab_1.geometria),
       ST_Collect(tab_xp3.GEOMETRY))
FROM tab_1, tab_xp3;

Thank you very very much,
Artur K. 

a.fu...@lqt.it

unread,
May 18, 2018, 6:00:46 AM5/18/18
to spatiali...@googlegroups.com
On Fri, 18 May 2018 10:58:28 +0200, Artur Krawczyk wrote:
> From your answer is the next question for me.
> If in table tab_1 I will have more polygons than one, then these
> polygons will also have to be included in the collect function?
> SymDifference(ST_Collect(tab_1.geometria),
>
> Something like:
>  
> CREATE TABLE sdif_tab_x AS
> SELECT tab_1.Name, SymDifference(ST_Collect(tab_1.geometria),
>        ST_Collect(tab_xp3.GEOMETRY))
> FROM tab_1, tab_xp3;
>

Hi Artur,

you are probably accustomed to desktop-GIS tools, where usually
a spatial operation applies to two different layers as a whole.

but Spatial SQL works in a completely different way; a couple
of individual geometries will be fetched during each execution
step (one from the first table, the other from the second table),
and all the resulting geometries will be then inserted into the
resultset.

so, if table_1 contains 1,000 features and table_2 contains 1,000
features as well, the resultset will contain all possible combinations
(the so called "cartesian product") between individual rows from
table_1 and table_2, and the resultset will consequently contain
1 million rows (1,000 X 1,000).

the most usual way to opportunely restrict the resultset is by
defining some appropriate filter in a WHERE clause; just as a
very basic example:

SELECT *
FROM table_1, table_2
WHERE table_1.id = table_2.id;

assuming that both tables contain two different versions
of the same dataset, the above WHERE clause will then
return just 1,000 rows, and each row from the resulset
will correspond to a matching pair of features.

if, for any good reason, you really want to process
_ALL_ geometries from both tables in a single step
you just have to aggregate them by calling ST_Collect()
on behalf of both table_1 and table_2.

Note: this is not the most usual way to take profit from
Spatial SQL processing capabilities, and it will probably
be highly inefficient when processing huge dataset containing
many and many rows. However it will nicely work (at least,
assuming that your machine has enough free memory to
complete the preliminary aggregation process).

bye sandro

artkraw

unread,
May 18, 2018, 9:20:30 AM5/18/18
to SpatiaLite Users
Hi Sandro,  

you are probably accustomed to desktop-GIS tools, where usually
a spatial operation applies to two different layers as a whole.

Yes, you hit the nail on the head, exactly. !!! 
Sometimes in the GIS you have to do a geometry-based operation without using attributes, just geometry is important. 


tab_xp3 with two polygons and table tab_4 with four polygons 

Drop Table IF EXISTS tab_4; 
Create Table tab_4 (Id Integer Primary Key, Name Text);
SELECT AddGeometryColumn('tab_4', 'GEOMETRY', 2180, 'POLYGON', 'XY', 0);
Insert into tab_4 (Id, Name, GEOMETRY ) Values (1, '1', GeomFromText('POLYGON((500000.0000 320000.0000, 560000.0000 320000.0000, 560000.0000 260000.0000, 500000.0000 260000.0000, 500000.0000 320000.0000))', 2180));
Insert into tab_4 (Id, Name, GEOMETRY ) Values (2, '2', GeomFromText('POLYGON((560000.0000 320000.0000, 620000.0000 320000.0000, 620000.0000 260000.0000, 560000.0000 260000.0000, 560000.0000 320000.0000))', 2180));
Insert into tab_4 (Id, Name, GEOMETRY ) Values (3, '3', GeomFromText('POLYGON((500000.0000 260000.0000, 560000.0000 260000.0000, 560000.0000 200000.0000, 500000.0000 200000.0000, 500000.0000 260000.0000))', 2180));
Insert into tab_4 (Id, Name, GEOMETRY ) Values (4, '4', GeomFromText('POLYGON((560000.0000 260000.0000, 620000.0000 260000.0000, 620000.0000 200000.0000, 560000.0000 200000.0000, 560000.0000 260000.0000))', 2180));

CREATE TABLE sdif_tab_4x AS
SELECT tab_4.Name, ST_SymDifference(ST_Collect(tab_4.GEOMETRY ),
       ST_Collect(tab_xp3.GEOMETRY))
FROM tab_4, tab_xp3;

Returning no geometry  (one row in resultable)

Summarizing

1. ST_SymDifference  polygon - collect(polygons) -> works fine 
2. ST_SymDifference  collect(polygons) -  collect(polygons)  ->   doesn't  work 
3. ST_SymDifference  collect(polygons) -   polygon ->  doesn't work

I can imagine why point 2. doesn't work (no junction table?), but why doesn't work 3. But any way great that 1. works .. :-). 

Thanks very much for help.

Best regards, 
Artur K,       

a.fu...@lqt.it

unread,
May 18, 2018, 10:08:32 AM5/18/18
to spatiali...@googlegroups.com
On Fri, 18 May 2018 06:20:29 -0700 (PDT), artkraw wrote:
> I can imagine why point 2. doesn't work (no junction table?), but why
> doesn't work 3.
>

Artur,

the golden rule to debug any complex SQL query is by
breaking it in smaller/simpler parts, then carefully
checking each single part.

both your cases 2) and 3) are strictly similar,
because both depends on collecting all geometries
from "tab_4".
so the most obvious thing to be done is carefully
checking this simpler query:

SELECT ST_Collect(tab_4.GEOMETRY)
FROM tab_4;

as you can easily check the result of this aggregation
is a MULTIPOLYGON, as shown in the attached figure.
unhappily, as this second diagnostic query confirms, a
MULTIPOLYGON like this one is _INVALID_:

SELECT ST_IsValid(ST_Collect(tab_4.GEOMETRY) )
FROM tab_4;
--------------------
0

and an invalid geometry is hardly expected to produce
valid results.
the reason explaining why this MultiPolygon is invalid
are rather obvious; all individual Polygons in a
MultiPolygon are always expected to be separated, and
must never share any edge.
this MultiPolygon badly violates the rule.

under such a condition we need a different aggregate
function in order to produce a valid result, that is
ST_Union(); notice the difference:

- ST_Collect will blindly aggregate all input
geometries returning a MultiPolygon (that
could possibly be invalid).
- ST_Union will do the same but in a more intelligent
(and slower) way; whenever possible it will attempt
to merge together smaller polygons into bigger
polygons; so in this case it will return just a single
"big square".

now we are finally ready to rewrite our complex SQL
query:

CREATE TABLE sdif_tab_4x AS
SELECT tab_4.Name, ST_SymDifference(ST_Union(tab_4.GEOMETRY),
ST_Collect(tab_xp3.GEOMETRY))
FROM tab_4, tab_xp3;

and this time it nicely works as expected ;-)

bye Sandro

collect_tab_4.png

artkraw

unread,
May 18, 2018, 5:19:54 PM5/18/18
to SpatiaLite Users
Dear  Sandro,

Thank you for the pointing out for ST_IsValid(ST_Collect(tab_4.GEOMETRY) 
From this time I can finish my query series... great thanks a lot 
Thanks a lot ... ! 

BTW ...  I will try to save our conversation and convert it into web page ... 

--
Best regards,
Artur K, 


Przechwytywanie.PNG
Reply all
Reply to author
Forward
0 new messages