[postgis-users] creating topology data is slow

95 views
Skip to first unread message

Marc-André Goderre

unread,
Jun 2, 2015, 1:25:20 PM6/2/15
to postgi...@lists.osgeo.org

Hello all,

I'm creating topology data from almost all world road OSM data ( More than 77 Million ways).

Processing the data  with postgis topology in one shot is so long than that I created a hexagonal grid that let me create topological data from only one or many cell of the grid.

Even this way, It' s going to take many month to process the world's data.

Is there a way to make it faster?

This is my function that I use to process data.

 

CREATE OR REPLACE FUNCTION cm_update_tile_data(tile_id integer)

  RETURNS character varying AS

$BODY$

DECLARE r RECORD;

 

BEGIN

  FOR r IN SELECT * FROM way, join world_grid  on line&& the_geom where gid=tile_id and topo_geom is NULL LOOP

    BEGIN

      UPDATE way

SET topo_geom = topology.toTopoGeom(st_force2d(line), 'way_topo', 1)

      WHERE id = r.id;

    EXCEPTION

      WHEN OTHERS THEN

        RAISE WARNING 'Loading of record % failed: %', r.id, SQLERRM;

    END;

  END LOOP;

  UPDATE world_grid set last_update = now() where gid=tile_id;

  return 'OK';

END

$BODY$

  LANGUAGE plpgsql VOLATILE

  COST 100;

 

Thanks

 

Marc

 

Rémi Cura

unread,
Jun 2, 2015, 3:33:31 PM6/2/15
to PostGIS Users Discussion
I'm working on a batch way to import topology,
it is not ready yet.

If you now for sure your data is correct, you can skip a lot of testing and be efficient in a batch way.

You could use grass and the topological export.

Cheers,
Rémi-C

_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Lars Aksel Opsahl

unread,
Jun 3, 2015, 2:02:10 AM6/3/15
to PostGIS Users Discussion
We are working quite big datasets (about 1 billion points and 20 million lines) that we convert from simple feature to topology. I ran a test yesterday and we added these 20 million lines in about 9 hours with attached attributes. This means heavy parallel pressing and grid usage on a fast database, but we use the Postgis topology package mostly as it is (picked from trunk about a moth ago) .

We will give a talk about how we do this at foss4g 2015 in Como in July.

Lars

________________________________
Fra: postgis-us...@lists.osgeo.org [postgis-us...@lists.osgeo.org] på vegne av Rémi Cura [remi...@gmail.com]
Sendt: 2. juni 2015 21:33
Til: PostGIS Users Discussion
Emne: Re: [postgis-users] creating topology data is slow

I'm working on a batch way to import topology,
it is not ready yet.

If you now for sure your data is correct, you can skip a lot of testing and be efficient in a batch way.

You could use grass and the topological export.

Cheers,
Rémi-C

2015-06-02 19:25 GMT+02:00 Marc-André Goderre <mago...@cgq.qc.ca<mailto:mago...@cgq.qc.ca>>:
Hello all,
I'm creating topology data from almost all world road OSM data ( More than 77 Million ways).
Processing the data with postgis topology in one shot is so long than that I created a hexagonal grid that let me create topological data from only one or many cell of the grid.
Even this way, It' s going to take many month to process the world's data.
Is there a way to make it faster?
This is my function that I use to process data.

CREATE OR REPLACE FUNCTION cm_update_tile_data(tile_id integer)
RETURNS character varying AS
$BODY$
DECLARE r RECORD;

BEGIN
FOR r IN SELECT * FROM way, join world_grid on line&& the_geom where gid=tile_id and topo_geom is NULL LOOP
BEGIN
UPDATE way
SET topo_geom = topology.toTopoGeom(st_force2d(line), 'way_topo', 1)
WHERE id = r.id<http://r.id>;
EXCEPTION
WHEN OTHERS THEN
RAISE WARNING 'Loading of record % failed: %', r.id<http://r.id>, SQLERRM;
END;
END LOOP;
UPDATE world_grid set last_update = now() where gid=tile_id;
return 'OK';
END
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

Thanks

Marc


_______________________________________________
postgis-users mailing list
postgi...@lists.osgeo.org<mailto:postgi...@lists.osgeo.org>
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Sandro Santilli

unread,
Jun 3, 2015, 4:49:14 AM6/3/15
to PostGIS Users Discussion
On Wed, Jun 03, 2015 at 05:56:24AM +0000, Lars Aksel Opsahl wrote:
> We are working quite big datasets (about 1 billion points and 20 million lines) that we convert from simple feature to topology. I ran a test yesterday and we added these 20 million lines in about 9 hours with attached attributes. This means heavy parallel pressing and grid usage on a fast database, but we use the Postgis topology package mostly as it is (picked from trunk about a moth ago) .
>
> We will give a talk about how we do this at foss4g 2015 in Como in July.

I hope a video of your talk and a copy of the slides will be made available
as I'd love to see them but will probably not come to Como.

Last time I've heard about parallel processing during topology building
it resulted in a corrupted topology.
How long does it take to ST_ValidateTopology on it ?

--strk;

Lars Aksel Opsahl

unread,
Jun 3, 2015, 5:55:53 AM6/3/15
to Sandro Santilli, PostGIS Users Discussion

Yes that was the first problem we we got and we got dead locks so we where not able to finish the job at all.

If we get a topology exception we roll back the all the transactions to avoid the problem and having invalid topology data stored in the database.

The way avoid this problem was to only work on lines that was inside each cell and gradually extend/merge the size of the cells. That way the jobs did not cause any problems for each other.

I have only run the “select * from ValidateTopology('topo_ar5_sysdata');” for parts of the data earlier and will start it for the whole layer later this week because I right now I am testing some other stuff. Thanks for reminding me about running the ValidateTopology again. I will get back to you with the results.

Lars

________________________________________
Fra: Sandro Santilli [sandro....@gmail.com] p&#229; vegne av Sandro Santilli [st...@keybit.net]
Sendt: 3. juni 2015 10:49
Til: PostGIS Users Discussion
Kopi: Lars Aksel Opsahl
Emne: Re: [postgis-users] creating topology data is slow

Marc-André Goderre

unread,
Jun 4, 2015, 11:22:15 AM6/4/15
to PostGIS Users Discussion
Does the processing became slower when the edge_data table became bigger?

Marc

-----Message d'origine-----
De : postgis-us...@lists.osgeo.org [mailto:postgis-us...@lists.osgeo.org] De la part de Lars Aksel Opsahl
Envoyé : 3 juin 2015 01:56
À : PostGIS Users Discussion
Objet : Re: [postgis-users] creating topology data is slow

Lars Aksel Opsahl

unread,
Jun 5, 2015, 12:57:03 AM6/5/15
to PostGIS Users Discussion

Yes there is slow down but I am not ready with the numbers yet and we have not focused to much on that now, because to go from simple feature to topology is a one time job in our case.

Whats more important is the update time when we apply changes to a exiting layer after it's created and that also involves adding new lines so we have to look more into this later, but an update in our case also involves line attribute handling and surfaces and surface attributes.

I will have more numbers ready in July.

Lars

________________________________________
Fra: postgis-us...@lists.osgeo.org [postgis-us...@lists.osgeo.org] p&#229; vegne av Marc-André Goderre [mago...@cgq.qc.ca]
Sendt: 4. juni 2015 17:21

Lars Aksel Opsahl

unread,
Jun 5, 2015, 3:32:59 AM6/5/15
to PostGIS Users Discussion
Then it's finally done validating and it took about 40 hours.

[local] postgres@sl=# select * from ValidateTopology('topo_ar5_sysdata');

error | id1 | id2
-------+-----+-----
(0 rows)

Time: 145477755.129 ms

Lars

________________________________________
Fra: Sandro Santilli [sandro....@gmail.com] p&#229; vegne av Sandro Santilli [st...@keybit.net]
Sendt: 3. juni 2015 10:49
Til: PostGIS Users Discussion
Kopi: Lars Aksel Opsahl
Emne: Re: [postgis-users] creating topology data is slow

Lars Aksel Opsahl

unread,
Jul 18, 2015, 5:24:02 PM7/18/15
to PostGIS Users Discussion
Hi

Yes it becomes slower.

When I run 1 thread we start with out about 150 rows pr second,
but after 18 mill. edges we are down to about half the performance

With 15 threads we start out with adding 1600 edges pr. second and here also we end with about the half the
performance after 18 mill edges.

When adding edges that is covers by many cells the performance goes down a lot but I have not had any time to look at this because this is not a big issue since the total time is down to 16 hours.

More info at http://www.slideshare.net/laopsahl/foss4-g-topologyjuly162015

Lars

________________________________________
Fra: postgis-us...@lists.osgeo.org [postgis-us...@lists.osgeo.org] p&#229; vegne av Marc-André Goderre [mago...@cgq.qc.ca]
Sendt: 4. juni 2015 17:21
Reply all
Reply to author
Forward
0 new messages