Definition of "conflating"; Garlynn, can MTC get us the desired datasets?


Amar Pai

Dec 7, 2009, 1:58:02 PM
to sf-bike-planner, Matthew Heberger,
[Matt said]:

There is definitely a gulf between programmers and GIS types... you and Amar kept talking about "conflating" datasets. In ten years of using GIS, I've never heard anyone use that word, and I'm still not sure exactly what you were trying to do!

Hi Matt,

All that's meant by conflating is "combining different layers into a single layer."  So for instance, I have one shapefile that lists the SF road network - it shows all streets and identifies each street segment by name, e.g. "Market Street."  I have another shapefile that lists the bike network for SF - i.e. it draws all bike paths and lanes in SF - but it doesn't identify which of these correspond to streets or what those streets are.  So it shows a bike lane running along Market, but it doesn't actually label those segments as Market.

What's needed from our end is a data file that combines all these different layers so all the data exists in one file, similar to the Google map format I sent out earlier.  The TIGER data that MTC had available didn't have all this data combined ('conflated'), and John and I couldn't figure out how to do this.  The data applies to the exact same map (SF), so coordinates should agree across the different feature sets; we just need a way to mash the feature sets together so there's a single map w/ all relevant details.
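For illustration only, here is a minimal Python sketch of one crude way to do this kind of conflation: for each bike segment, find the nearest street segment and copy its name across. It assumes both layers have already been read into straight two-point segments in the same projected coordinate system (so distances are in meters or feet, not degrees). The function names, the midpoint-only matching, and the tolerance value are all hypothetical choices, not how SFGIS or MTC actually do it. Segments with no street within the tolerance come back as None, which is where manual editing would take over.

```python
import math
from typing import Optional

# Hypothetical types: a segment is two (x, y) points in a projected CRS.
Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance from point p to the line segment seg."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both ends are the same point
        return math.dist(p, (x1, y1))
    # Project p onto the infinite line, then clamp to the segment's extent.
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def conflate(bike_segments: list[Segment],
             streets: list[tuple[str, Segment]],
             tolerance: float) -> list[tuple[Segment, Optional[str]]]:
    """Copy the name of the nearest street onto each bike segment.

    Matching uses only the bike segment's midpoint (a deliberate
    simplification). None means no street lies within `tolerance`;
    those segments need review by hand.
    """
    matched = []
    for seg in bike_segments:
        (x1, y1), (x2, y2) = seg
        mid = ((x1 + x2) / 2, (y1 + y2) / 2)
        best_name, best_dist = None, tolerance
        for name, street in streets:
            d = point_segment_distance(mid, street)
            if d <= best_dist:
                best_name, best_dist = name, d
        matched.append((seg, best_name))
    return matched


# Toy data (made up): two named streets and three bike segments.
streets = [("Market St", ((0, 0), (100, 0))),
           ("Valencia St", ((0, 50), (0, 150)))]
bike = [((10, 2), (60, 2)), ((1, 60), (1, 120)), ((500, 500), (510, 500))]
for seg, name in conflate(bike, streets, tolerance=5):
    print(seg, name)  # Market St, Valencia St, then None (no match)
```

Real tools are smarter than this (they compare direction and overlap length, not just one point), but the shape of the problem is the same: geometry decides the match, and attributes get transferred across.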

Garlynn, did the other email I sent out w/ the data formats used give you an idea of what's needed?  The one dataset I forgot to mention there is orientation data.  In my case I had to take a CSV containing one-way orientations (-1 or 1 for one-way edges, 0 if two-way) for each edge in the street graph.  I cadged this from SFGIS, but it was out of date, and I obviously would have preferred to have this data exist in the shapefile/dbf itself.
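As a rough sketch of what joining that kind of orientation CSV onto a street graph might look like in Python: the column names `edge_id` and `oneway` are hypothetical, and so is the meaning of the flags (assumed here: 1 = travel only in the direction the line was digitized, -1 = only the reverse, 0 = both ways). Treating edges missing from the CSV as two-way is also an assumption.

```python
import csv
import io


def load_orientations(csv_text: str) -> dict[str, int]:
    """Read a CSV with hypothetical columns edge_id,oneway into {edge_id: flag}."""
    flags = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        flag = int(row["oneway"])
        if flag not in (-1, 0, 1):
            raise ValueError(f"unexpected oneway flag {flag} for edge {row['edge_id']}")
        flags[row["edge_id"]] = flag
    return flags


def directed_edges(edges, orientations):
    """Expand undirected edges into the directed arcs a router may traverse.

    edges: {edge_id: (from_node, to_node)} in digitized order.
    Assumed flags: 1 = from->to only, -1 = to->from only, 0 = two-way.
    Edges missing from the CSV are treated as two-way (an assumption).
    """
    arcs = []
    for edge_id, (a, b) in edges.items():
        flag = orientations.get(edge_id, 0)
        if flag in (0, 1):
            arcs.append((edge_id, a, b))
        if flag in (0, -1):
            arcs.append((edge_id, b, a))
    return arcs


csv_text = "edge_id,oneway\nE1,1\nE2,-1\nE3,0\n"
edges = {"E1": ("A", "B"), "E2": ("B", "C"), "E3": ("C", "D")}
print(directed_edges(edges, load_orientations(csv_text)))
```

If the one-way attribute lived in the shapefile's .dbf instead of a separate CSV, the second function would stay the same; only the loading step would change.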

If we could get the Google maps format for entire bay area, up to date as possible, that would be a great starting point.  If MTC could help with this it would be awesome.



Garlynn Woodsong

Dec 8, 2009, 12:31:49 PM
to Amar Pai, sf-bike-planner, Matthew Heberger
Hey guys,

Conflation was a big problem for us at MTC, as well. We built (heads-up digitized) the original bicycle route dataset as a stand-alone data package, traced on top of (or copied from) the street basemap or other data sources. When we started working on the bicycle trip planner, we quickly realized that we needed to conflate this data with our basemap, at least for Class 2 & 3 facilities (Class 1, being off-street paths, could just be added to the network as new links), so as to get the other attributes, like one-way, that we needed to support the routing engine. I don't quite remember what the outcome was, but I think MTC should now have a version of the regional bike route database that has already been conflated to their TeleAtlas North America (TANA) basemap. As mentioned at the meeting, this is both good and bad. Good, because that will allow for routing. Bad, because TANA began life as proprietary data, so it might be tough to convince MTC that they have created a new data product and thus have license to share it -- though I think this argument can be made successfully and convincingly, as there is past precedent for this at the agency.

I'll try and set up a meeting for this group with the relevant folks at MTC soon. I'll get back to you with potential dates.

       O           ...let's go do
   O                        something
 O                              i n t e r e s t i n g
O                 .garlynn.g.woodsong.
o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o  o

Amar Pai

Dec 8, 2009, 1:43:14 PM
to Matthew Heberger, sf-bike-planner
Hi Matt,

Agree that the problem of computing intersections is a major one.  I dealt with it in a fairly kludgy manner (though it did work): I reduced the # of digits of precision for all coords enough that two 'very close' points on the map would end up as the same #.  This, along with the SF intersections shapefile that explicitly laid out intersections for the road network, and some ugly data massaging, was enough to generate nodes for a directed graph.  But it definitely would have been preferable to actually run some software process (in ArcGIS or whatever) that calculates all intersection points in a robust way.  I just didn't want to get into the computational geometry headache of it.
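A minimal Python sketch of the rounding trick described above, assuming each road segment is a list of (lon, lat) points in decimal degrees. `snap` and `build_graph` are hypothetical names, and this leaves out the separate intersections shapefile and the other data massaging; it only shows how rounding turns "very close" endpoints into shared node ids.

```python
def snap(point, digits):
    """Round both coordinates so nearly identical points collapse to one key."""
    return (round(point[0], digits), round(point[1], digits))


def build_graph(polylines, digits=5):
    """Turn polylines (lists of (lon, lat) points) into node ids and edges.

    Only each line's two endpoints become nodes. In decimal degrees,
    5 digits is roughly 1 m of latitude.
    """
    node_ids = {}
    edges = []
    for line in polylines:
        start, end = snap(line[0], digits), snap(line[-1], digits)
        for p in (start, end):
            node_ids.setdefault(p, len(node_ids))  # new point gets the next id
        edges.append((node_ids[start], node_ids[end]))
    return node_ids, edges


# Two made-up segments whose shared endpoint differs in the 7th decimal place.
lines = [
    [(-122.419416, 37.774929), (-122.418001, 37.775500)],
    [(-122.4180012, 37.7755003), (-122.417000, 37.776000)],
]
nodes, edges = build_graph(lines)
print(len(nodes), edges)  # 3 [(0, 1), (1, 2)]
```

The kludgy part is that rounding cuts the map into a fixed grid: two points a hair apart can still land on opposite sides of a rounding boundary (e.g. 0.0000149 and 0.0000151 at 5 digits) and become separate nodes. A distance-tolerance search or a proper topology tool avoids that.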

Not sure what you mean by "Google maps format." Do you mean data that is in decimal degrees, i.e. lat/long?

This is the vector data specification Google provides on its website, for anybody who wants to submit bike network data to them:

It seems to encompass most of the stuff that we need.

On Tue, Dec 8, 2009 at 9:32 AM, Matthew Heberger <> wrote:
Hi Amar,

I was being slightly tongue-in-cheek, but I did actually go home and look up the word... and found out it means "blend: mix together different elements."

I see what you're trying to do. You've got two different datasets: the road network in one file, and bike lanes in another file, and you want to somehow merge them together.

I can't imagine any kind of automated method for this that would produce the desired output.

When GIS users talk about combining data, there are actually lots of different methods. Some methods combine the geographic features (merge, union, intersect). Others transfer attributes from one dataset's table to another (e.g., join or spatial join).

Also, when you are working with transportation datasets, the connectivity is extremely important. E.g. when one line segment crosses another, do they connect? This would be the case for two roads that come together at an intersection, and you can turn from one onto another. In another case, two lines may cross, but there is no connection. Imagine an overpass or a tunnel... The technical term for this is "topology". Unfortunately, shapefiles are not the best data format to work with when topology is important.
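Since shapefiles don't store topology, one common check is to look for lines that cross geometrically but share no endpoint: each hit is either a genuine overpass/tunnel or a missing intersection node that needs fixing. Below is a small Python sketch of that check (not an ArcGIS tool), assuming straight two-point segments; real data would need each polyline split into segments and a spatial index instead of the all-pairs loop. The edge names are made up.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(s1, s2):
    """True if the segments properly cross, i.e. their interiors meet at one point.

    Segments that merely touch at a shared endpoint give a zero orientation
    and therefore do NOT count as crossing.
    """
    a, b = s1
    c, d = s2
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def find_unconnected_crossings(edges):
    """edges: {edge_id: ((x1, y1), (x2, y2))}.

    Returns pairs that cross without sharing a node: an overpass/tunnel,
    or an intersection the data is missing. Either way, a human decides.
    """
    ids = sorted(edges)
    flagged = []
    for i, e1 in enumerate(ids):
        for e2 in ids[i + 1:]:
            if segments_cross(edges[e1], edges[e2]):
                flagged.append((e1, e2))
    return flagged


edges = {
    "freeway": ((0, 0), (10, 10)),   # crosses "street" with no shared node
    "street": ((0, 10), (10, 0)),
    "side": ((10, 10), (20, 10)),    # touches "freeway" at a shared endpoint
}
print(find_unconnected_crossings(edges))  # [('freeway', 'street')]
```

A naive "compute every intersection point" pass would have joined the freeway to the street here, which is exactly the overpass mistake described above.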

Really, getting in there and closely examining the geometry and the attributes is the only way to do it. Automated methods (which we sometimes call geoprocessing) can sometimes get you 80% of the way there. But the last 20% of the editing is probably going to be manual, and it's going to take 99% of your time and effort.

Anyhow, now that you've got several GIS pros on the team, we should be able to deal with all of these issues. Perhaps we could have a meetup in Jan where I could do a tutorial on the GIS side of things for folks who are interested and want to help with that. 

Not sure what you mean by "Google maps format." Do you mean data that is in decimal degrees, i.e. lat/long?

Code from bike mapper
Tried following John's instructions, but couldn't get Cygwin installed on my old Windows machine. I've installed software called TortoiseSVN and have been able to download code from the Google Code page, but haven't been able to upload. I'll try again this evening.


On Mon, Dec 7, 2009 at 10:58 AM, Amar Pai <> wrote:

"I don't walk, I get CARRIED" -- ODB (RIP)