Transit data - 2.2 million trackpoints


Gavin Treadgold

Jun 23, 2008, 5:23:06 AM
to nzopengis
(sending again, I always pick the wrong email account...)

Hi all,

Just a brief email to let you know the outcome of an interesting email
exchange that started last Monday. I'll write more about this once
I've got the data uploaded, but the short story is that I have
obtained from Transit 2.2 million trackpoints from the 2008 High Speed
Data Collection Survey and I've just uploaded the first region (North
Canterbury of course) to OSM.

<http://www.transit.govt.nz/hsdc/overview.jsp>

I've got quite a few regions that need converting, but have a good
process worked out now so hope to get most of them up evenings this
week. I'm tagging them with TransitHSDC2008 in addition to the usual
NZ and New+Zealand tags.

<http://openstreetmap.org/traces/tag/TransitHSDC2008>

Will write more once they're uploaded, including thanks to others that
have helped make this happen.

Cheers Gav

Robin Paulson

Jun 23, 2008, 6:12:58 AM
to nzop...@googlegroups.com
2008/6/23 Gavin Treadgold <redi...@gmail.com>:

hi gavin, that looks like hugely useful info, i'm sure it'll help lots,
particularly in the areas with no yahoo aerial photo coverage.

one thing that concerns me though, is the same question that came out
of the discussions a couple of months ago on the linz data. there was
no consensus reached on how osm were going to display attribution, and
the suggestion was not to include the data, at least until the new
license is settled; i'm concerned these track points from transit fall
into the same category, and could potentially pollute the data

that's assuming this is crown copyright also, and that there will be
similar restrictions placed on its use?

Gavin Treadgold

Jun 23, 2008, 6:39:21 AM
to nzop...@googlegroups.com
I talked with my contact at Transit about this, and their main issue
is actually around liability, and how the raw data would be used. I
think that trackpoints are actually easier to deal with than more
structured data that comes from LINZ. The points I made...

1. As the trackpoints will for most people blur into a general mess,
it won't be an issue to identify and attribute individual points.

2. Not everyone is going to be accessing the trackpoints - they are
raw data to produce structured data.

3. Each GPX file has had a disclaimer included in the header
description tag. I have also pointed the URL in the GPX header to a
page on the gis.org.nz wiki which is going to be the effective home
page for this data outlining (again) the disclaimer etc.

The disclaimer is as follows.
"DISCLAIMER: The data contained in this data set is collected and used
by Transit New Zealand for specific purposes. Transit and its
employees or agents involved in preparation of this database cannot
accept liability for its contents or for any consequences arising from
its use. People using the contents of the database should apply, and
rely upon, their own skill and judgement. The contents should not be
used in isolation from other sources of advice and information."
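
As an illustration of point 3 above, here is a minimal Python sketch of how a disclaimer and a home-page link could be written into a GPX header. It assumes GPX 1.1, where the header is a `<metadata>` element holding `<desc>` and `<link>`; the files discussed in this thread may well have used a different GPX version or layout. The function name `build_gpx`, the `creator` value and the sample arguments are hypothetical.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # GPX 1.1 namespace (assumed version)


def build_gpx(points, disclaimer, home_url):
    """Return a GPX 1.1 document (as a string) with the disclaimer in the header.

    points     -- iterable of (lat, lon) pairs in WGS84 degrees
    disclaimer -- text placed in <metadata><desc>
    home_url   -- URL placed in <metadata><link href="...">
    """
    ET.register_namespace("", GPX_NS)  # serialise GPX as the default namespace
    gpx = ET.Element(f"{{{GPX_NS}}}gpx", version="1.1", creator="example-script")

    # Header: description first, then link (the order the GPX 1.1 schema expects).
    meta = ET.SubElement(gpx, f"{{{GPX_NS}}}metadata")
    ET.SubElement(meta, f"{{{GPX_NS}}}desc").text = disclaimer
    ET.SubElement(meta, f"{{{GPX_NS}}}link", href=home_url)

    # One track with one segment holding every point, without timestamps.
    trk = ET.SubElement(gpx, f"{{{GPX_NS}}}trk")
    seg = ET.SubElement(trk, f"{{{GPX_NS}}}trkseg")
    for lat, lon in points:
        ET.SubElement(seg, f"{{{GPX_NS}}}trkpt", lat=f"{lat:.7f}", lon=f"{lon:.7f}")

    return ET.tostring(gpx, encoding="unicode")
```

Keeping the disclaimer inside each file means it travels with the data even when a trace is downloaded on its own, away from the wiki page.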

I was quite up front in explaining this to my contact at Transit, and
after a couple of emails, he was happy with what I proposed.

It might be a different matter if I had attempted to get shapefiles of
the network, but given that trackpoints blend into everyone else's I
don't think there are going to be any pollution issues, not when
compared to loading structured data from LINZ.

Also on this note, a new guideline was developed on the Govt Web
Standards wiki earlier this year that is about to be more widely
promoted to encourage more flexible licensing and the sort of thing
you are after.

<http://www.gis.org.nz/wiki/Government_Geospatial_Information_Web_Access_Guideline>

So I think Government are becoming more aware of the approach that is
needed for these sort of community-led ventures, and these are good
means of getting Government to test the waters of opening up more
information.

Cheers Gav

Hamish

Jun 24, 2008, 7:48:44 AM
to nzop...@googlegroups.com
Gavin Treadgold wrote:
> the short story is that I have obtained from Transit 2.2 million trackpoints
> from the 2008 High Speed Data Collection Survey and I've just uploaded
> the first region (North Canterbury of course) to OSM.
>
> <http://www.transit.govt.nz/hsdc/overview.jsp>
>
> I've got quite a few regions that need converting, but have a good
> process worked out now so hope to get most of them up evenings this
> week. I'm tagging them with TransitHSDC2008 in addition to the usual
> NZ and New+Zealand tags.
>
> <http://openstreetmap.org/traces/tag/TransitHSDC2008>

Hi Gavin ... what can I say, nice work.


Slight issue -- I notice that the track lines are not broken between
segments and so you get massive "leaps" in the data. for example:
http://openstreetmap.org/user/rediguana/traces/132459


Perhaps OSM's filters take care of that automatically? If not, it is
easily fixed with GpsBabel. In the following I split a track into
multiple segments when the jump between points > 100m. (FWIW I got an
identical split with a 50m threshold)

gpsbabel -i gpx -f transit-20080312-coastalotago.gpx \
-x track,sdistance=0.1k \
-o gpx -F transit-20080312-coastalotago_split.gpx


For all tracks in a directory, a little Unix shell script magic:

# may need to add the "pack" filter, -x track,pack,sdistance=0.1k
for MAP in transit-*.gpx ; do
  gpsbabel -i gpx -f "$MAP" -x track,sdistance=0.1k \
    -o gpx -F "$(basename "$MAP" .gpx)_split.gpx"
done
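
For readers without GpsBabel to hand, the idea behind distance-based splitting can be sketched in a few lines of Python. This is only an illustration of the technique, not GpsBabel's implementation: the function names, the spherical-earth radius and the 100 m default are assumptions chosen to mirror the `sdistance=0.1k` setting above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical earth)."""
    radius_m = 6371000.0  # mean earth radius; adequate for a ~100 m threshold
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def split_by_distance(points, threshold_m=100.0):
    """Split an ordered list of (lat, lon) points into segments.

    A new segment starts whenever the jump from the previous point
    exceeds threshold_m.
    """
    segments = []
    for pt in points:
        if segments and haversine_m(*segments[-1][-1], *pt) <= threshold_m:
            segments[-1].append(pt)   # close enough: continue the current segment
        else:
            segments.append([pt])     # first point, or a "leap": start a new segment
    return segments
```

Like the GpsBabel filter, this only looks at consecutive points, so it removes long leaps but cannot repair points that are out of order within a short distance.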


After that it all looks good. To test I loaded them into a GRASS GIS
lat/lon WGS84 location using the v.in.gpsbabel module:

#import as track
v.in.gpsbabel -t in=transit-20080312-coastalotago.gpx \
out=coastalotago_312_trk format=gpx

#import as points (grass 6.4svn)
v.in.gpsbabel -tp in=transit-20080312-coastalotago.gpx \
out=coastalotago_312_pts format=gpx

#import as points (grass 6.3.0)
cat transit-20080312-coastalotago.gpx | grep '<trkpt' | \
cut -f2-4 -d'"' | sed -e 's/" lon="/|/' | \
v.in.ascii out=coastalotago_312_pts x=2 y=1
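
The grep/cut/sed pipeline above relies on each `<trkpt` sitting on its own line with `lat` before `lon`. A minimal Python sketch that produces the same `lat|lon` lines with an XML parser instead is shown here; the function name `trkpts_to_ascii` is hypothetical, and the namespace handling assumes nothing about which GPX version the files use.

```python
import xml.etree.ElementTree as ET


def trkpts_to_ascii(gpx_text):
    """Return one 'lat|lon' line per <trkpt>, in document order.

    The column order matches the pipeline above, so the result suits
    v.in.ascii with x=2 y=1.
    """
    root = ET.fromstring(gpx_text)
    lines = []
    for elem in root.iter():
        # Tags look like '{namespace}trkpt'; compare only the local name so
        # that GPX 1.0, GPX 1.1 and un-namespaced files are all accepted.
        if elem.tag.rsplit("}", 1)[-1] == "trkpt":
            lines.append(f"{elem.get('lat')}|{elem.get('lon')}")
    return "\n".join(lines)
```

The output can be written to a file or piped to `v.in.ascii` in the same way as the shell version.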

regards,
Hamish

Gavin Treadgold

Jun 24, 2008, 7:14:14 PM
to nzop...@googlegroups.com
(again, sent from the correct account. Gmail is doing something funny with outgoing email. Hopefully fixed)



On Tue, Jun 24, 2008 at 11:48 PM, Hamish <hamish...@gmail.com> wrote:
> Hi Gavin ... what can I say, nice work.

Thanks Hamish :)
 
> Slight issue -- I notice that the track lines are not broken between
> segments and so you get massive "leaps" in the data. for example:
> http://openstreetmap.org/user/rediguana/traces/132459

Yep - the data I received had only date, not timestamps, and the points were sometimes out of order in the text files I received.

> Perhaps OSM's filters take care of that automatically?

If the points were properly ordered by time, they should have, but in this case I couldn't see an easy fix, and as the main application is viewing as points, rather than worrying about the connecting segments, I wasn't too worried.

> If not, it is easily fixed with GpsBabel. In the following I split a track into
> multiple segments when the jump between points > 100m. (FWIW I got an
> identical split with a 50m threshold)

Nice - I didn't even think of splitting the tracks.

I'll give that a go now. Note that pack/merge won't work with this data as all of them have timestamps set to 00:00:00, which for GPSBabel means that you're going to end up with only one point at that time - additional points with the same time are dropped. So distance is the only way we can do this with this data.
 
Update - it appears to work fine on removing the ugly rendering appearance on the tracklogs, but there are still some areas that are not properly formed - e.g. where there are points on both sides of the highway, the tracks zig-zag from one side to the next, and these are only 20m apart, so I think the files will be unreasonably large if we split the tracks to that level. Note that even 0.1km has significantly increased the file size in some files, by anywhere from 20% to 60%, because of the additional overhead associated with creating multiple tracks.

So, I probably won't modify the source files as they are uploaded to OSM as their original task was to be standalone trackpoints rather than polylines, and this was reflected in the data I got from Transit. I have added your code suggestions to the wiki page for the data, so that anyone that visits it can understand the limitations of the data, and see how to refine it further, and tidy up the rendering of the tracklogs. Hope you're OK with me recording your advice there :)

<http://www.gis.org.nz/wiki/Transit_High_Speed_Data_Collection_Survey#GPX_File_Limitations>

This is probably as far as we can take this without asking Transit to provide timestamps so we can properly order the tracklogs. I will suggest that we try to get timestamps for the 2009 survey as long as there are no privacy concerns.

Cheers Gav


Hamish

Jun 25, 2008, 8:27:16 AM
to nzop...@googlegroups.com
Hamish:

>> Slight issue -- I notice that the track lines are not broken between
>> segments and so you get massive "leaps" in the data. for example:
>> http://openstreetmap.org/user/rediguana/traces/132459

Gavin:


> Yep - the data I received had only date, not timestamps, and the points were
> sometimes out of order in the text files I received.

I assume they were not random; perhaps the dump from the DB lists them north->south, or..?
Presumably they were originally loaded into the system in a sequential way.


>> Perhaps OSM's filters take care of that automatically?
>
> If the points were properly ordered by time, they should have, but in this
> case I couldn't see an easy fix, and as the main application is viewing as
> points, rather than worrying about the connecting segments, I wasn't too
> worried.

ok, understood.

>> If not, it is easily fixed with GpsBabel. In the following I split a track
>> into multiple segments when the jump between points > 100m.
>> (FWIW I got an identical split with a 50m threshold)
>
> Nice - I didn't even think of splitting the tracks.
>
> I'll give that a go now. Note that pack/merge won't work with this data as
> all of them have timestamps set to 00:00:00, which for GPSBabel means
> that you're going to end up with only one point at that time - additional points
> with the same time are dropped. So distance is the only way we can do this
> with this data.

ah, I hadn't realized that gpsbabel used the timestamp as the key.

I hadn't actually tried pack,merge. No matter: it turns out the files
for which I saw the warning "trackfilter-split: Cannot split more than
one track, please pack (or merge) before!" were not Transit files at
all, just my own experiments that the wildcard caught.


> Update - it appears to work fine on removing the ugly rendering appearance
> on the tracklogs, but there are still some areas that are not properly
> formed - e.g. where there are points on both sides of the highway, the
> tracks zig-zag from one side to the next, and these are only 20m apart, so I
> think the files will be unreasonably large if we split the tracks to that
> level. Note that even 0.1km has significantly increased the file size in
> some files, by anywhere from 20% to 60%, because of the additional
> overhead associated with creating multiple tracks.

ok; if these are going in as points, then not a problem for the purpose.
The one road I looked at with travel on both sides of the street
formed nice lines, but ok, not all of them will be like that. Of
course multi-lane roads will have positional issues too.

> So, I probably won't modify the source files as they are uploaded to OSM as
> their original task was to be standalone trackpoints rather than polylines,
> and this was reflected in the data I got from Transit.

sounds fine.

> I have added your
> code suggestions to the wiki page for the data, so that anyone that visits
> it can understand the limitations of the data, and see how to refine it
> further, and tidy up the rendering of the tracklogs. Hope you're OK with me
> recording your advice there :)

sure; it's a public list with the word "open" in it after all. :)

> <http://www.gis.org.nz/wiki/Transit_High_Speed_Data_Collection_Survey#GPX_File_Limitations>
>
> This is probably as far as we can take this without asking Transit to
> provide timestamps so we can properly order the tracklogs. I will suggest
> that we try to get timestamps for the 2009 survey as long as there are no
> privacy concerns.

there may well be privacy concerns. I know if I were doing the survey
it would be annoying for some jerk to go into the data and analyze how
long I stopped for a pie at lunch or when I didn't slow down to 50kph
until 12m after the speed zone change then start writing letters to
the editor/boss...

But using timestamp as the key is mostly just a quirk of the gpsbabel
tool, not a serious shortcoming of the data. As long as they are
sequential in time other tools may be easily used.

re. wiki page:
----------
"Data Quality
From Transit's website.
The SCRIM+ is fitted with Trimble GPS equipment sampling the
Omni-Star satellite to record the differential GPS coordinates of the
centreline. Tilt sensors for crossfall and gradient together with a
gyroscope provide alignment details when out of sight of satellites.
The original data was provided by Transit in GD49/NZMG and was
reprojected using the NTv2 grid to WGS84 to maintain a high level of
accuracy. Each point should be accurate to ~+/- 1m. "
----------

I am not sure what DGPS level they used, but it is a real shame they
chose to go with NZMG. I hope internally they saved as lat/lon WGS84!
NZMG is both mathematically and practically meaningless below ~1m, so
it would be a shame if they paid a lot of money and went to a lot of
effort to get sub-meter accuracy then degraded it by using an inferior
projection system when NZTM+GD2k is right there.

Another thing, great that you used the NTv2 grid to do the datum
transform, but what assurance do you have that Transit* did the same
when they converted from the GPS's raw WGS84 to NZMG on the way into
the dataset?

* actually from what I've seen/recall the Trimble software does this
internally somewhere.

Regardless, even 5m error probably wouldn't be noticed. Attached find
a little test image where I put the transit data (red dots, green
line) over the top of LINZ topo4 road_cl layer (some years old;
reprojected from NZTM -> LL/wgs84), where Highway 1 South heads past
Dunedin's Octagon. At the top right the GPS track is about 24m NW of
the old LINZ road data.

I'd be interested to know how the latest v14 LINZ data lines up there.
Maybe this is part of the reason why Transit commissioned the job?

---

If anyone is interested, here is the PROJ.4 (proj.maptools.org)
command to do NZMG/GD49->LL/WGS84 with the distortion grid:

cs2cs -f "%.7f" +init=epsg:27200 +nadgrids=/path/to/nzgd2kgrid0005.gsb \
+to +proj=longlat +datum=WGS84 \
< coord_NZMG.txt > coord_LLwgs84.txt

data files should be two columns separated by whitespace, "easting northing".


enough from me already,
Hamish

octagon_transit_gps.png

kimo

Aug 17, 2008, 9:59:45 AM
to nzopengis
Let's not worry about the reprojection using Trimble equipment. It
will be OK.

Besides, if you have a look at the points, they are lanes, not roads,
so we will need to average the lanes for most purposes.
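
One naive way to "average the lanes" can be sketched in Python as follows. It is an illustration only and makes strong assumptions: the two lanes have already been separated and ordered, and the coordinates are planar metres (for example NZTM eastings and northings) rather than lat/lon. The function name `average_lanes` is hypothetical.

```python
import math


def average_lanes(lane_a, lane_b):
    """Approximate a centreline from two lanes of (x, y) points in metres.

    Each point on lane_a is paired with its nearest point on lane_b and
    the midpoint of the pair is emitted, following the order of lane_a.
    """
    centre = []
    for ax, ay in lane_a:
        # Nearest lane_b point by straight-line distance (brute force).
        bx, by = min(lane_b, key=lambda p: math.hypot(p[0] - ax, p[1] - ay))
        centre.append(((ax + bx) / 2, (ay + by) / 2))
    return centre
```

The brute-force search is quadratic in the number of points, so with millions of trackpoints a spatial index would be needed; off-ramps and multi-lane sections would also need handling that this sketch does not attempt.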

Chopping out the skips may work for a few tracks, but in general they
are more scrambled than that, there are often multiple lanes and
offramps recorded.

I have been intrigued to see if I could recover the tracks with
spatial analysis, and I have had preliminary success for simple cases.
Is it worth continuing? A few exceptions could be edited by hand. I
can get other attributes, default directions for the lanes, road
names, changes from old transit centrelines and lots of other
metadata. I expect that a bulk snapping exercise to existing segments
may be the way to go.

Robin Paulson

Apr 14, 2010, 8:32:46 PM
to nzop...@googlegroups.com

does anyone know what happened with this; did all the tracks get uploaded?

gavin?

Gavin Treadgold

Apr 14, 2010, 8:41:58 PM
to nzop...@googlegroups.com
On 2010-04-15, at 12:32 , Robin Paulson wrote:
>> I'm tagging them with TransitHSDC2008 in addition to the usual
>> NZ and New+Zealand tags.
>>
>> <http://openstreetmap.org/traces/tag/TransitHSDC2008>
>
> does anyone know what happened with this; did all the tracks get uploaded?

Did you try that link? ;)

All of the traces were uploaded as GPX and the trackpoints have been available for nearly two years now. They were _not_ converted to vector and imported. I felt the best solution at the time was just a conversion to GPX and upload so it becomes part of the underlying source data.

Cheers Gav

Robin Paulson

Apr 14, 2010, 8:49:17 PM
to nzop...@googlegroups.com
On 15 April 2010 12:41, Gavin Treadgold <redi...@gmail.com> wrote:
> Did you try that link? ;)
> All of the traces were uploaded as GPX and the trackpoints have been
> available for nearly two years now. They were _not_ converted to vector and
> imported. I felt the best solution at the time was just a conversion to GPX
> and upload so it becomes part of the underlying source data.
> Cheers Gav
>

yes, i did. the reason i ask is because when i looked at osm within
potlatch, not all the roads (in fact very few) had gps tracks
associated with them. i was under the impression the transit logs were
for every road in the country? my question would have been better
worded as; were there any problems/missing data i guess?

i've tried the other link (to transit), but that appears to be dead

Gavin Treadgold

Apr 14, 2010, 9:00:26 PM
to nzop...@googlegroups.com

On 2010-04-15, at 12:49 , Robin Paulson wrote:
> i was under the impression the transit logs were
> for every road in the country? my question would have been better
> worded as; were there any problems/missing data i guess?

Yes - that would be the disconnect. This data was solely for the National Highway network. So, you should only see the data for the highways and nothing else. No roads other than state highways are included in that upload.

As I understand it, and unless things have changed with the newer NZ Transport Agency, the 73 (for the time being) road controlling authorities are the agencies that hold authoritative road data for their jurisdiction. The old Transit was only authoritative for the highway network.

Hope that clears it up! :)

Cheers Gav

kimo

Apr 27, 2010, 5:36:46 PM
to nzopengis


On Apr 15, 12:41 pm, Gavin Treadgold <redigu...@gmail.com> wrote:
> On 2010-04-15, at 12:32 , Robin Paulson wrote:
>
> >> I'm tagging them with TransitHSDC2008 in addition to the usual
> >> NZ and New+Zealand tags.
>
> >> <http://openstreetmap.org/traces/tag/TransitHSDC2008>
>
> > does anyone know what happened with this; did all the tracks get uploaded?
>
I downloaded them all at the time and managed to find a way of
unscrambling the 'random' order to build lane vectors even on the
multi-lanes for the motorways. I did not get around to editing the
exceptions and uploading them again, but could if they are still
useful.

But what use are the lines versus the points?
The only purpose for the gps tracks I could see is to pick up new
roadworks that straightened corners and you can see that from the
points.
They have no attributes, and you need to decide if lanes are to be
drawn instead of centrelines.

> Did you try that link? ;)
>
> All of the traces were uploaded as GPX and the trackpoints have been available for nearly two years now. They were _not_ converted to vector and imported. I felt the best solution at the time was just a conversion to GPX and upload so it becomes part of the underlying source data.
>
> Cheers Gav

--
You received this message because you are subscribed to the Google Groups "nzopengis" group.
To post to this group, send email to nzop...@googlegroups.com.
To unsubscribe from this group, send email to nzopengis+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nzopengis?hl=en.

Gavin Treadgold

Apr 27, 2010, 6:29:30 PM
to nzop...@googlegroups.com
Disclaimer - I didn't get the data in ideal form from Transit ;)

On 2010-04-28, at 09:36 , kimo wrote:
> I downloaded them all at the time and managed to find a way of
> unscrambling the 'random' order to build lane vectors even on the
> multi-lanes for the motorways.

Yes, this was a PITA. It looked like the original data I received was an export from the road survey, and without looking at what I received a couple of years ago, I think that the coords in the text file may have been ordered by latitude desc (or maybe asc) - this of course threw all the points out of order :(
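
To illustrate the problem, and one possible way of recovering an order, here is a minimal Python sketch of greedy nearest-neighbour chaining. It is not necessarily the method kimo used. It assumes planar coordinates in metres, a single lane with no branches, and a known starting point; the function name `chain_nearest` is hypothetical.

```python
import math


def chain_nearest(points, start_index=0):
    """Order (x, y) points by repeatedly stepping to the nearest unused point.

    points      -- list of (x, y) tuples in any order
    start_index -- index of the point known to be one end of the track
    """
    remaining = list(points)
    ordered = [remaining.pop(start_index)]
    while remaining:
        last = ordered[-1]
        # Index of the closest point not yet placed in the chain.
        i = min(range(len(remaining)), key=lambda k: math.dist(last, remaining[k]))
        ordered.append(remaining.pop(i))
    return ordered
```

Sorting a winding road by latitude interleaves points from different stretches of it, which is why the rendered lines leap about. The greedy chain undoes that on simple cases, but it can jump lanes wherever two carriageways are closer together than consecutive points along one of them, which matches the 20m zig-zag problem described earlier in the thread.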

> But what use are the lines versus the points?


Lines hold absolutely no value. The points hold all the value.

OSM GPX export appears to assume that the points in a GPX are in correct order, and renders the thumbnail to display lines. I think it is important to note that GPX trackpoint lines hold no direct value as they are only derived from the point at either end, and any error in the points is magnified in the line segment. JOSM is good - last time I used it, it only displayed GPX points and didn't calculate and render the line segments. I would much rather have 1sec tracklogs and rely on quantity of points, than 10 or 30sec tracklogs and use the lines as some people do.

> The only purpose for the gps tracks I could see is to pick up new
> roadworks that straightened corners and you can see that from the
> points.

My real intent was to open Transit (at the time) up to the idea that general citizens are actually interested in the data ;)

I've driven many of the state highways myself, and have reasonably accurate tracklogs, but this was just a means of adding some raw underlying data that adds to the corpus of real data. I've never been comfortable with the tracing of streets in OSM from satellite imagery - some of it many years old. Nor the reliance upon LINZ roads which have often proven themselves to be quite inaccurate.

Hence my focus has always been to get more raw trackpoint data in - so we can actually say, someone has been there and driven, walked, cycled it at a given point in time. It gives more credibility to the data. Having Transit data in there for the highways therefore also provides a little insurance against someone coming in and making incorrect edits. And if there is ever a copyright issue with a commercial/proprietary provider, having raw trackpoint data makes it a lot easier to make the case that we built it up ourselves and did not copy a commercial product, nor had the need.

> They have no attributes, and you need to decide if lanes are to be
> drawn instead of centrelines.

Original data had no attributes - this was raw data from the driven survey, not the value-added output after processing.

Yes - the data was far from ideal, but it did serve a few needs at the time, which is why I invested a little time to get it in there. At some point hopefully we'll be able to get richer data from NZTA under a suitable license.

kimo

May 10, 2010, 8:22:18 AM
to nzopengis


On Apr 28, 10:29 am, Gavin Treadgold <redigu...@gmail.com> wrote:
> Disclaimer - I didn't get the data in ideal form from Transit ;)
>
I understood the deliberate obfuscation and I took up the challenge
to defeat it.

The result of my test joining the dots was a superb set of main
highway lanes to very high accuracy, much better than OSM needs: they
have kinematic correction and inertial navigation, and have been tidied
up. At the time it picked up all the new state highways from rural road
straightening. My question was: Is it the goal to draw each lane? If
so, then I could finish the process and make it available to all as
lines as well as points. There are so many points they are very hard
to deal with, but as lines they are much more manageable.

I'm not into manual editing; loading other people's work is much more
rewarding.