Friendly Forks

80n

unread,

Aug 14, 2010, 10:42:11 AM8/14/10

to osm-...@googlegroups.com

What can be done to make the various forks that are likely to appear more friendly, both to each other and to the "Steve Coast controlled OSM" (TM).

I'm thinking here about such things as:
* Continuity of OSM accounts (is there a way an OSM user could be re-associated with the same user Id in a forked database?)
* Ensuring tagging schemes do not diverge
* Where licenses and usage permits, enabling data from different OSM forks to be combined (unique ranges for element ID's?)

Are there other areas where forks can be made to be more cooperative with each other?

80n

John Smith

unread,

Aug 14, 2010, 11:04:12 AM8/14/10

to osm-...@googlegroups.com

On 15 August 2010 00:42, 80n <80n...@gmail.com> wrote:
> What can be done to make the various forks that are likely to appear more
> friendly, both to each other and to the "Steve Coast controlled OSM" (TM).

It's trivial to be friendly to each other, just don't be nasty :)

> I'm thinking here about such things as:
> * Continuity of OSM accounts (is there a way an OSM user could be
> re-associated with the same user Id in a forked database?)

This would only be an issue for forks, so the CC-BY and PD "forks"
which are really forks could just add new accounts.

If they want to ensure things so that the same userid is the same
between them, regardless if they are a fork or not could hook into the
oauth stuff.

> * Ensuring tagging schemes do not diverge

This is bound to happen, if for no other reason than differing
personal opinions, who gets there first etc etc etc.

One solution might be to setup a translation matrix, where a DB keeps
tabs on various tagging schemes, OSM might end up going this way in
any case. Alternatively the OSM wiki could be deemed authoritative and
those wanting to expand tags could keep working within current and
future OSM tagging structures/schemes...

> * Where licenses and usage permits, enabling data from different OSM forks
> to be combined (unique ranges for element ID's?)

Unique IDs is an option, there is also some scripts with the French
data that tries to match attributes from existing data, although once
OSM switches to ODBL that may no longer be an option, just as it might
be difficult to shift data from a CC database into a PD database.

Another option might be to do something in editors, where multiple
layers are downloaded from each system and then anything new is
uploaded into 1 or more databases that are compatible with the user's
preference. That would bypass the issue of licenses since the author
is responsible for where their data ends up.

Anthony DiPierro

unread,

Aug 14, 2010, 11:41:17 AM8/14/10

to osm-...@googlegroups.com

On Sat, Aug 14, 2010 at 10:42 AM, 80n <80n...@gmail.com> wrote:
> * Continuity of OSM accounts (is there a way an OSM user could be
> re-associated with the same user Id in a forked database?)

oauth seems like the best way to do that, so long as OSMF cooperates
or at least doesn't hinder things.

However, there's a potential snafu

> * Ensuring tagging schemes do not diverge

Presumably the fork and OSM will continue to use the same
editors/renderers. In my opinion this provides an optimum balance
between ensuring tagging schemes do not diverge too much, but allowing
flexibility in both projects. If a fork wants to try out a new
tagging scheme for a while, they're free to do so, and if it turns out
to be a really good idea maybe the editors/renderers will adopt it and
it'll wind up being adopted by OSM as well.

Ensuring tagging schemes do not diverge too much between OSM and the
fork is basically the same problem as ensuring that they don't diverge
too much between Europe and North America. Document and communicate.

80n

unread,

Aug 14, 2010, 1:24:45 PM8/14/10

to osm-...@googlegroups.com

On Sat, Aug 14, 2010 at 4:41 PM, Anthony DiPierro <dipi...@gmail.com> wrote:

On Sat, Aug 14, 2010 at 10:42 AM, 80n <80n...@gmail.com> wrote:
> * Continuity of OSM accounts (is there a way an OSM user could be
> re-associated with the same user Id in a forked database?)

oauth seems like the best way to do that, so long as OSMF cooperates
or at least doesn't hinder things.

However, there's a potential snafu

Can oauth be used in this way?

I have a Steve's OSM account: 80n. The status-quo fork would have all my contributions under the same account Id. When I register as 80n in the fork, how could I verifiably claim to be the same 80n and thus establish ownership of all those contributions?

In some ways it doesn't matter at all. Everyone could be given new accounts. But establishing continuity would help with things like contributors stats (these are important to some people) and make the transition less alien for established OSM users.

I don't know enough about oauth to know if it can do this? Would existing OSM users have to invest time in setting up an oauth account? Is that easy to do?

If we want people to switch to the fork it has to be easy to do.

> * Ensuring tagging schemes do not diverge

Presumably the fork and OSM will continue to use the same
editors/renderers. In my opinion this provides an optimum balance
between ensuring tagging schemes do not diverge too much, but allowing
flexibility in both projects. If a fork wants to try out a new
tagging scheme for a while, they're free to do so, and if it turns out
to be a really good idea maybe the editors/renderers will adopt it and
it'll wind up being adopted by OSM as well.

Ensuring tagging schemes do not diverge too much between OSM and the
fork is basically the same problem as ensuring that they don't diverge
too much between Europe and North America. Document and communicate.

This should be pretty easy. Or at least no more difficult than it is within OSM.

Can anyone think of any other areas where things could be made easier?

Anthony DiPierro

unread,

Aug 14, 2010, 1:45:50 PM8/14/10

to osm-...@googlegroups.com

On Sat, Aug 14, 2010 at 1:24 PM, 80n <80n...@gmail.com> wrote:
> On Sat, Aug 14, 2010 at 4:41 PM, Anthony DiPierro <dipi...@gmail.com>
> wrote:
>>
>> On Sat, Aug 14, 2010 at 10:42 AM, 80n <80n...@gmail.com> wrote:
>> > * Continuity of OSM accounts (is there a way an OSM user could be
>> > re-associated with the same user Id in a forked database?)
>>
>> oauth seems like the best way to do that, so long as OSMF cooperates
>> or at least doesn't hinder things.
>>
>> However, there's a potential snafu
>>
> Can oauth be used in this way?

In theory. I haven't actually used oauth with OSM yet so I'm not sure
of the exact details. It looks like you have to "register your
application as an oauth consumer", though, and that requires an OSM
account, so there is a lot of room for OSMF to prevent us from using
oauth if they decide they want to.

(By the way, I think I had decided that "potential snafu" I was
talking about wasn't actually a problem, and just forgot to delete
that sentence. In any case, I forget the details of it.)

> I have a Steve's OSM account: 80n. The status-quo fork would have all my
> contributions under the same account Id. When I register as 80n in the
> fork, how could I verifiably claim to be the same 80n and thus establish
> ownership of all those contributions?

My understanding is that our website would provide a link to OSM, the
user would log in to their OSM account, and then they'd return to our
website with a token which verifies that they are the user they say
they are.

> In some ways it doesn't matter at all. Everyone could be given new
> accounts. But establishing continuity would help with things like
> contributors stats (these are important to some people) and make the
> transition less alien for established OSM users.
>
> I don't know enough about oauth to know if it can do this? Would existing
> OSM users have to invest time in setting up an oauth account? Is that easy
> to do?
>
> If we want people to switch to the fork it has to be easy to do.

I would suggest that the fork maintain its own user accounts, and
allow people to link their OSM accounts. Linking would be a one-time
deal, and we'd just maintain a table of links locally. Probably no
reason not to let people link multiple OSM accounts if they want.

I'm not sure exactly how easy it is. My understanding is that it's
similar to buying something with Paypal. But until I get a chance to
actually implement a test case, I don't know.

John Smith

unread,

Aug 14, 2010, 1:47:53 PM8/14/10

to osm-...@googlegroups.com

On 15 August 2010 03:24, 80n <80n...@gmail.com> wrote:
> I have a Steve's OSM account: 80n. The status-quo fork would have all my
> contributions under the same account Id. When I register as 80n in the
> fork, how could I verifiably claim to be the same 80n and thus establish
> ownership of all those contributions?

Since all we have is a userid and/or username we can only authenticate
against a known database, in this case OSM's.

> In some ways it doesn't matter at all. Everyone could be given new
> accounts. But establishing continuity would help with things like
> contributors stats (these are important to some people) and make the
> transition less alien for established OSM users.
>
> I don't know enough about oauth to know if it can do this? Would existing
> OSM users have to invest time in setting up an oauth account? Is that easy
> to do?

What would be easier would be just to send an email through the OSM
system, to automatically setup oauth we have to know their password. I
guess leave the option up to the end user if they want to use oauth or
password authentication, as emails would probably be blocked sooner
rather than later.

TimSC

unread,

Aug 14, 2010, 6:20:08 PM8/14/10

to OSM Fork

Another thing is to have as much code and tools in common as possible.
Forks have slightly different requirements than the parent project. As
I understand it, the rails paradigm puts as much configuration as
possible in the database. E.g. the license that the fork is using
probably should be in a database table, rather than a configuration
file (as it is now I believe). This would enable us to update to the
latest version of the rail port without much software changes.

All keeping in step with API changes will make the tools easier for
users too.

Another thing is to allow users to license under several different
licenses. Or to license GPS traces more permissively than data. Or
license simple data (roads) permissively but restrict fancy tagging -
basically give the users fine grain control. All these options should
be in a machine readable format to enable tools to extract data for a
particular license. This would improve migration of data between the
trunk and forks.

Brendan Morley

unread,

Aug 14, 2010, 7:46:57 PM8/14/10

to osm-...@googlegroups.com

On 15/08/2010 12:42 AM, 80n wrote:

What can be done to make the various forks that are likely to appear more friendly, both to each other and to the "Steve Coast controlled OSM" (TM).

As opposed to the "Brendan Morley controlled CM" (-: (p.s. you can always run for a management committee position if you don't like it, but I'd prefer you give me a chance to hear your concerns first.) I'll share my thinking on you topics:

* Continuity of OSM accounts (is there a way an OSM user could be re-associated with the same user Id in a forked database?)

You could use OpenID but then the OSM website would have to be an OpenID provider/server. In CM's case I wouldn't force it at all. However, both the Drupal site and the API site can act as OpenID consumers, so if you want to do things that way then fine.

To take an analogy, can *anyone* publish a book as "Chris Ryan" or "Stephen King"?

* Ensuring tagging schemes do not diverge

Considering the OSM tagging scheme can wander off anywhere, this is impossible to ensure 100% compliance. However I'm confident people with run with the herd when it comes to things like highway=primary and building=yes.

* Where licenses and usage permits, enabling data from different OSM forks to be combined (unique ranges for element ID's?)

Unlikely, and I've given up on the possibility. You're describing a case of "multi-master replication", or in source code control circles, the "git" model. Even in relatively highly controlled environments (e.g. in a government department) it's very difficult to keep track of UFI/IDs if you are combining data sources with different lines of control.

However what I am encouraging is preserving the foreign key of the imported database. This allows the "foreign" owner to occasionally compare lists and choose what to take back over time.

I should mention another confounding factor ... MMR within the CommonMap community. For example there's already going to be at least 2 sources of ID generation:

Within the official API (backed by the Ruby of Rails code)
Within an import stream

Number (2) comes about because the Rails code is so slooow for imports - (see http://trac.openstreetmap.org/ticket/3035)
Instead my scripts will prepare a series of tab separated values for direct import into the PostgreSQL server. To cope, I am likely to tell the PostgreSQL server to issue autoincrement ID's in 2's. Expanding to 4's or 8's etc if/when we get more database servers and/or importers.

Brendan

John Smith

unread,

Aug 14, 2010, 8:57:59 PM8/14/10

to osm-...@googlegroups.com

On 15 August 2010 08:20, TimSC <map...@sheerman-chase.org.uk> wrote:
> file (as it is now I believe). This would enable us to update to the
> latest version of the rail port without much software changes.

The rails port seems horribly inefficient if they need 5 servers just
to run 1 website with the current userbase, so from a technically
point of view this seems very poorly written or a very poor choice of
language. I think Brenden has re-written a lot/all of this in
drupal/php, although using C might be a better option for high load or
even just PHP + the facebook C conversion tool might still ok.

Brendan Morley

unread,

Aug 15, 2010, 8:45:23 AM8/15/10

to osm-...@googlegroups.com

On 15/08/2010 10:57 AM, John Smith wrote:
> On 15 August 2010 08:20, TimSC<map...@sheerman-chase.org.uk> wrote:
>
>> file (as it is now I believe). This would enable us to update to the
>> latest version of the rail port without much software changes.
>>
> The rails port seems horribly inefficient if they need 5 servers just
> to run 1 website with the current userbase, so from a technically
> point of view this seems very poorly written or a very poor choice of
> language.

The OSM-specific rails code is reasonably tight from the parts I saw,
but the ruby/rails paradigms are highly dynamic and so doesn't lend
itself to high amounts of repetition. Which might be great when you
have a page of blog, but when you're dealing with many map nodes on a
map viewport, it tends to slow down quite a lot more agressively than
I'm used to. From memory, the "heavy" rails calls are "map" (return
everything in a viewport) and the "upload" function - in OSM they are
proxied to dedicated servers.

I'm not quite sure why SteveC went rails, but since the main business
case for rails is essentially, "more hardware is less expensive than
more developer hours", I can see his point. Maybe for them, less
developer hours = more time out there surveying.

> I think Brenden has re-written a lot/all of this in
> drupal/php,

The API remains as rails-backed code. I also intend to run an
additional bulk upload method (preprocessing psql data on the client
side and direct upload to the core database).

The front end website I intend to migrate over to drupal-backed code
where possible. The User Diaries function in particular seems to be
better off on a drupal base where it can benefit from a larger developer
pool. Drupal would then also handle forums, wiki, mailing list, issue
list - therefore building a strong, highly interlinked community site.
I'd possibly also use it to host source code management and GPS trace
management - drupal modules exist to do this but I haven't checked them
out much yet.

The main hassle I have with drupal is it's not obvious how to do
cross-database calls (I may not have found the right switch yet) - not a
showstopper though. But it would help reimplement, say, the History
page in a drupal skin. Another approach might be to implement an OSM
API client in Drupal to fetch data for the History page etc., which is a
more architecturally pure approach.

One of my next tests is to run an OpenID server from the drupal site -
the theory being one just registers into the drupal site and then
re-uses that login on the rails API. Then we get the "holy grail" of
single password across all CommonMap offerings.

> although using C might be a better option for high load or
> even just PHP + the facebook C conversion tool might still ok.
>

Well Tom Hughes preferred optimisation in C(++?), although when I was
troubleshooting the slow uploader, a lot of the problem was in a whole
lot of little calls being made of the database. So it's a bit early to
call in the proprietary facebook stuff.

Brendan

John Smith

unread,

Aug 15, 2010, 8:59:39 AM8/15/10

to osm-...@googlegroups.com

On 15 August 2010 22:45, Brendan Morley <mo...@beagle.com.au> wrote:
> Well Tom Hughes preferred optimisation in C(++?), although when I was
> troubleshooting the slow uploader, a lot of the problem was in a whole lot
> of little calls being made of the database. So it's a bit early to call in
> the proprietary facebook stuff.

I wouldn't exactly call the facebook stuff propriatary, since it just
converts php to C and then you use any C compiler, like gcc, to
compile it into a stand alone binary.

Anthony DiPierro

unread,

Aug 15, 2010, 10:20:38 AM8/15/10

to osm-...@googlegroups.com

I don't like rails port at all. From what I've seen it looks like the
database was designed around a rails paradigm rather than the software
being designed around the database.

That's not to say that it was the wrong choice. Just getting
something out there to attract users might have been more important in
the beginning stages of the project.

Rewriting the code which implements the API would be a top priority
for me, *if* I thought I had the time (or the time plus assistance) to
complete it. Right now I don't though, and the rails port seems to be
adequate for the time being.

Something else I'd like to do if I had the time would be to support
more quick editing outside of the standard graphical map-based
editors. People should be able to, for example, correct road name
spellings, without firing up JOSM or Potlatch. An editor just
dedicated to adding lots of POIs quickly would be another good
addition. Provide it with a few tags (or one, say amenity=library)
and a list of addresses, and it pops up an aerial/satellite map with
the approximate location, you click at the proper position, maybe type
the name of the library (or maybe that could be in the spreadsheet you
upload at the start), and then it loads up the next one on the list.
Maybe this is a bit US-centric: it'd be a great way to convert the
TIGER approximate geolocation into really accurate POIs.

John Smith

unread,

Aug 15, 2010, 10:31:03 AM8/15/10

to osm-...@googlegroups.com

On 16 August 2010 00:20, Anthony DiPierro <dipi...@gmail.com> wrote:
> I don't like rails port at all. From what I've seen it looks like the
> database was designed around a rails paradigm rather than the software
> being designed around the database.

Well is a drupal frame work in your opinion a better option?

If so we could follow Brenden down that path, all we'd need to do then
is fix up the API code and we should be able to ditch the rails stuff
completely.

> Something else I'd like to do if I had the time would be to support
> more quick editing outside of the standard graphical map-based
> editors. People should be able to, for example, correct road name
> spellings, without firing up JOSM or Potlatch. An editor just
> dedicated to adding lots of POIs quickly would be another good
> addition. Provide it with a few tags (or one, say amenity=library)
> and a list of addresses, and it pops up an aerial/satellite map with
> the approximate location, you click at the proper position, maybe type
> the name of the library (or maybe that could be in the spreadsheet you
> upload at the start), and then it loads up the next one on the list.
> Maybe this is a bit US-centric: it'd be a great way to convert the
> TIGER approximate geolocation into really accurate POIs.

Maybe we could do something in conjunction with Nearmap? They're
already working on some kind of simplified editor.

Anthony DiPierro

unread,

Aug 15, 2010, 11:00:30 AM8/15/10

to osm-...@googlegroups.com

On Sun, Aug 15, 2010 at 10:31 AM, John Smith <deltafo...@gmail.com> wrote:
> On 16 August 2010 00:20, Anthony DiPierro <dipi...@gmail.com> wrote:
>> I don't like rails port at all. From what I've seen it looks like the
>> database was designed around a rails paradigm rather than the software
>> being designed around the database.
>
> Well is a drupal frame work in your opinion a better option?

I don't really know much about drupal. My understanding is that it's
not being used for the API part anyway, though.

> If so we could follow Brenden down that path, all we'd need to do then
> is fix up the API code and we should be able to ditch the rails stuff
> completely.

My understanding is that he's planning on implementing everything
*but* the API in drupal. If that happens, it'd probably be reasonable
to then reimplement the API without rails. One catch is that there
are a lot of little irregularities which popped up accidentally due to
the use of rails and xml. There is invalid data in the database,
which can't be added any more but which was added previously. The
UTF8 string lengths are not being treated consistently - rails is
counting bytes and the db is counting characters (or is it vice versa?
I forget).

And then Potlatch has its own sort of "private API" and is making
direct queries against the database without going through rails (I
suspect this is how some of the "invalid data" got into the database).
Without a flash expert I think we'd have to ditch Potlatch in order
to ditch rails. Or move to Potlatch 2, which I'm skeptical about the
completion of.

I guess that wouldn't stop us from reimplementing the API outside of
rails without touching the database schema. But there'd likely be
little benefit to doing so until the Potlatch problem is resolved.

And I don't think a public fork would be smart to ditch Potlatch.

Anthony DiPierro

unread,

Aug 15, 2010, 11:03:40 AM8/15/10

to osm-...@googlegroups.com

On Sun, Aug 15, 2010 at 11:00 AM, Anthony DiPierro <dipi...@gmail.com> wrote:
> And I don't think a public fork would be smart to ditch Potlatch.

(Which is why I wrote on the OSM mailing list that I was planning on
doing a *private* fork...)

John Smith

unread,

Aug 15, 2010, 11:23:06 AM8/15/10

to osm-...@googlegroups.com

On 16 August 2010 01:00, Anthony DiPierro <dipi...@gmail.com> wrote:
> My understanding is that he's planning on implementing everything
> *but* the API in drupal. If that happens, it'd probably be reasonable
> to then reimplement the API without rails. One catch is that there

It doesn't make much sense to keep rails just for the API.

> are a lot of little irregularities which popped up accidentally due to
> the use of rails and xml. There is invalid data in the database,
> which can't be added any more but which was added previously. The
> UTF8 string lengths are not being treated consistently - rails is
> counting bytes and the db is counting characters (or is it vice versa?
> I forget).

We can work around that, the API is documented, although probably
poorly, but I don't think it would be hard to learn enough ruby/rails
to convert the logic into PHP, or at least find someone that does.

> And then Potlatch has its own sort of "private API" and is making
> direct queries against the database without going through rails (I
> suspect this is how some of the "invalid data" got into the database).
> Without a flash expert I think we'd have to ditch Potlatch in order
> to ditch rails. Or move to Potlatch 2, which I'm skeptical about the
> completion of.

I'm aware of the private API Richard coded, but it shouldn't pose that
much of a problem, at least no more than converting any of the other
code into PHP, all the API code does is provide a layer between
external software and the database, and they probably threw some logic
in for sanity checking and dealing with errors from the database etc.

It seems Brenden already some experience with shifting the logic so
perhaps he is more qualified to comment, in any case, Brenden is there
a repository somewhere with the existing PHP code and/or details on
setting it up?

John Smith

unread,

Aug 15, 2010, 11:26:47 AM8/15/10

to osm-...@googlegroups.com

I find it curious that people seem to be thinking about doing their
own private forks, you aren't the only one that has stated this,
although perhaps the only one that did on the public mailing lists. To
run a fork it will take a LOT of effort for very little benefit.

If it were me I'd just have OSM files with the data you generate
and/or the data from OSM you wanted to keep, similar to how the
Canadian's are dealing with the GeoBase/Canvec import where they split
the data up into "tiles" using some kind of grid reference numbering.

Anthony DiPierro

unread,

Aug 15, 2010, 11:31:50 AM8/15/10

to osm-...@googlegroups.com

On Sun, Aug 15, 2010 at 11:26 AM, John Smith <deltafo...@gmail.com> wrote:
> On 16 August 2010 01:03, Anthony DiPierro <dipi...@gmail.com> wrote:
>> On Sun, Aug 15, 2010 at 11:00 AM, Anthony DiPierro <dipi...@gmail.com> wrote:
>>> And I don't think a public fork would be smart to ditch Potlatch.
>>
>> (Which is why I wrote on the OSM mailing list that I was planning on
>> doing a *private* fork...)
>>
>
> I find it curious that people seem to be thinking about doing their
> own private forks, you aren't the only one that has stated this,
> although perhaps the only one that did on the public mailing lists. To
> run a fork it will take a LOT of effort for very little benefit.

I'd say in my case the effort is the benefit.

I do taxes for my day job. Coding is where I get to let out my creative side.

John Smith

unread,

Aug 15, 2010, 11:34:25 AM8/15/10

to osm-...@googlegroups.com

On 16 August 2010 01:31, Anthony DiPierro <dipi...@gmail.com> wrote:
> I'd say in my case the effort is the benefit.
>
> I do taxes for my day job. Coding is where I get to let out my creative side.

Coding has almost nothing to do with a fork unless you were planning
to extend the current APIs or other parts of the code base :)

Besides if you want to do something constructive, there is plenty
wrong with JOSM that could be fixed :)

Brendan Morley

unread,

Aug 15, 2010, 5:45:14 PM8/15/10

to osm-...@googlegroups.com

On 16/08/2010 1:23 AM, John Smith wrote:
> On 16 August 2010 01:00, Anthony DiPierro<dipi...@gmail.com> wrote:
>
>> My understanding is that he's planning on implementing everything
>> *but* the API in drupal. If that happens, it'd probably be reasonable
>> to then reimplement the API without rails. One catch is that there
>>
> It doesn't make much sense to keep rails just for the API.
>

Well, the rails API works, and my effort *isn't* because I was
completely unsatisfied with the rails API. If anyone's thinking of
rewriting it then they should at least reach out to the OSM community
for a joint effort.

> It seems Brenden already some experience with shifting the logic so
> perhaps he is more qualified to comment, in any case, Brenden is there
> a repository somewhere with the existing PHP code and/or details on
> setting it up?
>

Some of our messages may have crossed in transit.

I intend to reuse the existing OSM code for stuff that is relatively
unique to OSM (e.g. getting map data in and out of the database.)
I intend to use Drupal for the Web 2.0 stuff.
Drupal happens to be coded in php, but I've done very little php changes
so far. (Mostly to apply bugfixes.)
I may do (or ask someone to do) some Drupal code to glue it to the OSM API.

In fact given the choice I'd rather concentrate on imports as there's
better value there - at least JOSM can end up being the API interface to
start with.

Brendan

John Smith

unread,

Aug 15, 2010, 5:58:17 PM8/15/10

to OSM Fork

On 16 August 2010 07:45, Brendan Morley <mo...@beagle.com.au> wrote:
> Well, the rails API works, and my effort *isn't* because I was completely
> unsatisfied with the rails API. If anyone's thinking of rewriting it then
> they should at least reach out to the OSM community for a joint effort.

I rarely get much response about this sort of thing through the dev
list or the bug tracker so I tend to assume that they don't care or
aren't interested in anything other than ruby/rails...

> In fact given the choice I'd rather concentrate on imports as there's better value there

It's all sort of tied in with forking, to be able to keep up with
updates from OSM osmosis is pretty inefficient, maybe it's because it
was written in java, I don't know, but just coding that in C would
push things along. By the time you do that, you kind of have a start
on the rest of the API stuff, so it's just a matter of expanding from
there.

John Smith

unread,

Aug 19, 2010, 12:25:56 AM8/19/10

to osm-...@googlegroups.com

On 15 August 2010 10:57, John Smith <deltafo...@gmail.com> wrote:
> The rails port seems horribly inefficient if they need 5 servers just
> to run 1 website with the current userbase, so from a technically
> point of view this seems very poorly written or a very poor choice of
> language. I think Brenden has re-written a lot/all of this in
> drupal/php, although using C might be a better option for high load or
> even just PHP + the facebook C conversion tool might still ok.

It seems I had a few details wrong, the hiphop stuff converts PHP code
to C++ and uses g++ to compile it, and even simple scripts take a few
minutes to compile, the conversion only taks a few tenths of a second.

Also the conversion process for somethings is particularly bad, eg bcmod.

I made a simple loop script that counted to 100mill, this took about
6.7s to run in php, but only took 3.6s when compiled. However when I
tweaked it to only go up to 10 mill, but do a bcmod() function on each
loop, the php script ran in about 27s, the compiled copy took over
90s.

I'm currently waiting for phc (http://www.phpcompiler.org/) to build
to see if it does a better job.

John Smith

unread,

Aug 19, 2010, 3:18:13 PM8/19/10

to osm-...@googlegroups.com

On 19 August 2010 14:25, John Smith <deltafo...@gmail.com> wrote:
> Also the conversion process for somethings is particularly bad, eg bcmod.
>
> I made a simple loop script that counted to 100mill, this took about
> 6.7s to run in php, but only took 3.6s when compiled. However when I
> tweaked it to only go up to 10 mill, but do a bcmod() function on each
> loop, the php script ran in about 27s, the compiled copy took over
> 90s.
>
> I'm currently waiting for phc (http://www.phpcompiler.org/) to build
> to see if it does a better job.

phc binary performance was almost the complete opposite, a simple loop
to 100m took twice as long as php at 13.9s, but for a loop to 10m with
a bcmod operation it took a touch over half the time php did at
15.9s...

Can anyone think of any other benchmarks to try?

Brendan Morley

unread,

Aug 19, 2010, 5:54:04 PM8/19/10

to osm-...@googlegroups.com

Are you saying that there are bottlenecks in the php side of things?
Rather than the database engine? Otherwise I'm not sure why you're
going to the trouble of benchmarking the various compilation engines...

Brendan

John Smith

unread,

Aug 19, 2010, 5:58:14 PM8/19/10

to osm-...@googlegroups.com

On 20 August 2010 07:54, Brendan Morley <mo...@beagle.com.au> wrote:
> Are you saying that there are bottlenecks in the php side of things? Rather
> than the database engine? Otherwise I'm not sure why you're going to the
> trouble of benchmarking the various compilation engines...

I'm saying it is pointless to run the ruby code, unless you have 6 or
7 spare machines to throw at the problem, but you already know this or
you wouldn't be bypassing the API to upload data :)

Brendan Morley

unread,

Aug 19, 2010, 6:02:48 PM8/19/10

to osm-...@googlegroups.com

Of course (-:

I think though that, say, *any* php (or in my case, perl) is enough.

Brendan

Sam Vekemans

unread,

Aug 19, 2010, 6:06:04 PM8/19/10

to osm-...@googlegroups.com

it can be croud-sourced.
how many people are already available to help?

have you made a list of all the datasets available and the order they
need to ne imported?

there are lots of people available to help (everyone that attempted
importing in osm) could be able to help, since they already know the
data & exactly what linitations it has.

since this is a 'data warehouse' ... those who imported their local
data could be the same people to be updating the data for their
'warehouse rack space'.

cheers,
sam

--
Twitter: @Acrosscanada
Blogs: http://acrosscanadatrails.posterous.com/
http://Acrosscanadatrails.blogspot.com
Facebook: http://www.facebook.com/sam.vekemans
Skype: samvekemans
IRC: irc://irc.oftc.net #osm-ca Canadian OSM channel (an open chat room)
@Acrosscanadatrails

John Smith

unread,

Aug 19, 2010, 6:07:22 PM8/19/10

to osm-...@googlegroups.com

On 20 August 2010 08:02, Brendan Morley <mo...@beagle.com.au> wrote:
> I think though that, say, *any* php (or in my case, perl) is enough.

I was hoping a PHP compiler would be efficient enough to save me from
recoding the API in native C, but I think the conversion/optimisation
of the compilers is sub-par and will probably end up needing to do it
in C.

Brendan Morley

unread,

Aug 19, 2010, 6:10:00 PM8/19/10

to osm-...@googlegroups.com

On 20/08/2010 8:06 AM, Sam Vekemans wrote:
> there are lots of people available to help (everyone that attempted
> importing in osm) could be able to help, since they already know the

> data& exactly what linitations it has.
>

Yes, I am trying to get that same understanding of "knowing the data...
and limitations".

Brendan

Sam Vekemans

unread,

Aug 19, 2010, 6:24:26 PM8/19/10

to osm-...@googlegroups.com

cool,

because (for the last 2 years) i have been 'dealing' with bulk data, i
already know who these people are :)

so, i'd like to compile the list, and just use the 'code name'
OpenImportsMap' .... as essentially, this is what it is. .... this
database can be assembled and available in no time :)

..... weather or not the osm community wants to use it, is out of our hands.

cheers,
sam

On 8/19/10, Brendan Morley <mo...@beagle.com.au> wrote:

Anthony DiPierro

unread,

Aug 19, 2010, 6:52:13 PM8/19/10

to osm-...@googlegroups.com

For what it's worth, I agree with this. There are tons of improvement which can be made to Rails Port with any rewrite, regardless of the use of hiphop.

However, if you enjoy learning hiphop...

-- sent from cell phone

On Aug 19, 2010 6:02 PM, "Brendan Morley" <mo...@beagle.com.au> wrote:

On 20/08/2010 7:58 AM, John Smith wrote:
>

> On 20 August 2010 07:54, Brendan Morley<mo...@beagle.com...

Brendan Morley

unread,

Aug 19, 2010, 7:58:12 PM8/19/10

to osm-...@googlegroups.com

I would recommend against a C implementation:

Have you considered?:
* C will likely reduce agility/flexibility and the number of people in the contributor developer pool
* Why did OSM go Rails and what alternatives did they throw out?
* The main bottleneck is in the way Rails divides requests to the database for operations involving many rows
* The total demand of the OSM userbase - it was certainly much higher than I first assumed
* The data is stored in a canonical topological form, not a OGC SFS form -
PostGIS may not actually be of assistance.
* How does Oracle Spatial and ArcSDE deal with similar problems
* OSM is an I/O driven website, not a computationally driven website (i.e. we barely deal with datum transformation, distance buffering, point-in-polygon etc)

Brendan

---
Sent from my Nokia E63's tiny keypad.
Please forgive any fat fingered spelling mistakes.

-original message-
Subject: Re: [OSM Fork] Re: Friendly Forks
From: John Smith <deltafo...@gmail.com>
Date: 20/08/2010 8:07 am

On 20 August 2010 08:02, Brendan Morley <mo...@beagle.com.au> wrote:

> I think though that, say, *any* php (or in my case, perl) is enough.

I was hoping a PHP compiler would be efficient enough to save me from

John Smith

unread,

Aug 19, 2010, 8:17:41 PM8/19/10

to osm-...@googlegroups.com

On 20 August 2010 09:58, Brendan Morley <mo...@beagle.com.au> wrote:
> * C will likely reduce agility/flexibility and the number of people in the contributor developer pool

The same could be said of any language, I don't know Ruby so you could
say the status quo would, in theory at least, have the same effect.

> * Why did OSM go Rails

I'm guessing for the same reason most people choose PHP or Perl or
...., that's what the coders knew, it's quick to code in and so on.

> and what alternatives did they throw out?

No idea...

> * The main bottleneck is in the way Rails divides requests to the database for operations involving many rows

Rails, like most frame works, suffer this problem, this is the same
reason I don't bother with PEAR and just use PHP for most website
stuff I have to code.

> * OSM is an I/O driven website, not a computationally driven website (i.e. we barely deal with datum transformation, distance buffering, point-in-polygon etc)

I assume you are talking about database IO ? That may or may not be a
problem depending how much RAM you throw at the DB and how well you
can tweak the database settings to get as much of it in RAM as
possible, there will always be some IO issues simply due to the amount
of writes that need to occur, but you can minimise the read IO issues
with more RAM, although a few SSDs that are big enough might get
around IO problems completely...

Anthony DiPierro

unread,

Aug 19, 2010, 8:35:02 PM8/19/10

to osm-...@googlegroups.com

I actually think C, for the core API, would be the best choice.
Mainly because it is the most portable. Most languages, including
perl, PHP, and Ruby, support C modules (or extensions or libraries or
whatever they call them). Most languages do not support PHP modules.

In my opinion the core API should be as language neutral as possible.

That said, I don't know many people interested in writing the core API
in C. And by not many I mean I don't know anyone other than myself
interested in doing it (and unfortunately I don't have the spare time
to do it all by myself).

So, yeah, I think PHP is fine. I'd say even sticking with Ruby/Rails
is fine for now. Performance can come later, when you have enough in
donations to hire me full time :).

Sam Vekemans

unread,

Aug 19, 2010, 9:37:30 PM8/19/10

to osm-...@googlegroups.com

i think it depends on which way you/we want the project to go.

if we want the project to not accept individual user contributions
(say adding in steps) .. and only permit the import of the government
data. ... then the datbase needs to be usable by those who actually
work with the government data directly. (gis pros .

so perhaps inviting those in the osgeo-community who have experience
with this? as long as they understand that its a 'public domain / open
access / bulk_import database' ...

oh, so this database design needs to easily work with the perl script
bulk_upload.pl and not have the roadblocks that osm has, with the
limitations. so the contributors can have (maybe it needs to be a
schedual?) api access time.

if its dealing with imports only, the database can be updated monthly,
with panet-import.osm files being made monthly.

and it needs to able to produce shp files as its core function (3d
models) and handle all of the various bulk datasets available.
ideas,
sam

John Smith

unread,

Aug 19, 2010, 9:45:03 PM8/19/10

to osm-...@googlegroups.com

On 20 August 2010 11:37, Sam Vekemans <acrosscan...@gmail.com> wrote:
> oh, so this database design needs to easily work with the perl script
> bulk_upload.pl and not have the roadblocks that osm has, with the
> limitations. so the contributors can have (maybe it needs to be a
> schedual?) api access time.

bulk_upload is slow, probably in part due to the APIs, it's much much
quicker to inject the data directly into the database, so if you are
thinking about a specific community of doing bulk imports of data only
there are better ways to do this.

Sam Vekemans

unread,

Aug 19, 2010, 9:45:55 PM8/19/10

to osm-...@googlegroups.com

i mean, leave the user-contributed adding in stairs, to the
openstreetmap project.
and focus on providing openstreetmap (and naby other systems) the
simple datawarehouse that can be used as an underlay, and focus on
proviging the various map renderings of the data, so that more users
can use i as an overlay..

..... ie, if this database can host all contours and render a complete
basemap for cyclemap, then the cyclemap only needs to render
transparent overlays for the route relations, and select poi/features
(that isnt available from this project).

..... i think that rendering the panet as a transparent overlay with
only the small time user edits, would be easier for osm.... so this
project should be focused on handleing the 'bulk_data'.

again fat fingers sorry,
cheers,
sam

John Smith

unread,

Aug 19, 2010, 9:55:58 PM8/19/10

to osm-...@googlegroups.com

On 20 August 2010 11:45, Sam Vekemans <acrosscan...@gmail.com> wrote:
> i mean, leave the user-contributed adding in stairs, to the
> openstreetmap project.
> and focus on providing openstreetmap (and naby other systems) the
> simple datawarehouse that can be used as an underlay, and focus on
> proviging the various map renderings of the data, so that more users
> can use i as an overlay..

Then my point in my first reply still stands, bulk_upload/APIs suck,
you are better of directly importing into a DB and/or coming up with
your own APIs or at the very least recoding the APIs in a faster
language.

Sam Vekemans

unread,

Aug 19, 2010, 9:57:05 PM8/19/10

to osm-...@googlegroups.com

yup,
'injecting data directly' thats what i was trying to say :)

so having the database setup so its only those people who knohow to
'bulk_inject' and it gets done 1 dataset at a time. .... then we can
work it that the datasets that are available can 'get inline' to be
uploaded.

i messaged dave hanson, i think he was given direct api access to
inject TIGER into the database, perhaps he can help with better ideas.

... and a google docs chart can handle the oganization of these
datasets that are 'in line', (so non-tech users like me, and see it
and work to direct traffic in this warehouse.

.... now i know why i liked working in physical warehouses back when i
1st started working :-)
... organizing skids :)

cheers,
sam

Brendan Morley

unread,

Aug 20, 2010, 12:21:28 AM8/20/10

to osm-...@googlegroups.com

Well just to reconfirm,

I am planning "direct injection".

I am even planning multiple threads - e.g. Sam you might be given all
the IDs where (ID % 8 == 6), that is, 6, 14, 22, 30, 38 etc, John gets
(ID % 8 == 7), and so on.

Importers would need to be (a) trustworthy (since we're bypassing the
API) - and (b) be able to handle recent versions of the JRE, osmosis,
PostgreSQL+PostGIS, ogr2ogr, and perl. (Optionally Merkaartor for
sanity checking of changesets)

My perl scripts are home built.

Trusted importers would upload their psql "COPY" files to a mutually
trusted location - say, Amazon S3 - and the CM server would poll on a
first come first served basis.

Just some thoughts for now.

Brendan

John Smith

unread,

Aug 20, 2010, 12:23:21 AM8/20/10

to osm-...@googlegroups.com

On 20 August 2010 14:21, Brendan Morley <mo...@beagle.com.au> wrote:
> Importers would need to be (a) trustworthy (since we're bypassing the API) -
> and (b) be able to handle recent versions of the JRE, osmosis,
> PostgreSQL+PostGIS, ogr2ogr, and perl. (Optionally Merkaartor for sanity
> checking of changesets)

I'm guessing you could bypass most, if not all of that and generate
some scripts that convert from OSM or SHP format and convert any data
into suitable SQL format.

Sam Vekemans

unread,

Aug 20, 2010, 12:37:03 AM8/20/10

to osm-...@googlegroups.com

well,
in canada, on the #osm-ca irc chat on oftc,net we have 6 users who are
capable of doing this.

anyway, everyone is welcome to join the talk

Also, please announce to the talk-ca mailing list your intentions.
.... all of us who are working in canada would rather see the data
injected. .... even if its not in osm, this can surve as a model for
others and work as a 'soft entry' to acceptance.

cheers,
sam

On 8/19/10, Brendan Morley <mo...@beagle.com.au> wrote:

Reply all

Reply to author

Forward