It's trivial to be friendly to each other, just don't be nasty :)
> I'm thinking here about such things as:
> * Continuity of OSM accounts (is there a way an OSM user could be
> re-associated with the same user Id in a forked database?)
This would only be an issue for real forks, so the CC-BY and PD
"forks" that really are forks could just add new accounts.
If they want to ensure that the same userid refers to the same person
between them, they could hook into the oauth stuff, regardless of
whether they are a fork or not.
> * Ensuring tagging schemes do not diverge
This is bound to happen, if for no other reason than differing
personal opinions, who gets there first, etc.
One solution might be to set up a translation matrix, where a DB keeps
tabs on the various tagging schemes; OSM might end up going this way
in any case. Alternatively the OSM wiki could be deemed authoritative,
and those wanting to expand tags could keep working within current and
future OSM tagging structures/schemes...
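To make the translation matrix idea concrete, here's a rough, untested
sketch of what the lookup might look like in PHP - the schemes and tags
are all made up for illustration:

<?php
// Hypothetical "translation matrix": a lookup table mapping one
// project's tagging scheme onto another's. All entries invented.
$tagMatrix = array(
    // 'fork_key=fork_value' => 'osm_key=osm_value'
    'highway=minor'      => 'highway=unclassified',
    'amenity=bookshop'   => 'shop=books',
    'power_source=solar' => 'generator:source=solar',
);

function translateTag($key, $value, array $matrix) {
    $pair = "$key=$value";
    // Fall back to the original tag if no mapping is known.
    return isset($matrix[$pair]) ? $matrix[$pair] : $pair;
}

echo translateTag('amenity', 'bookshop', $tagMatrix); // shop=books

The hard part is of course maintaining the table, not the lookup.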
> * Where licenses and usage permits, enabling data from different OSM forks
> to be combined (unique ranges for element ID's?)
Unique IDs are an option. There are also some scripts used with the
French data that try to match attributes from existing data, although
once OSM switches to ODbL that may no longer be an option, just as it
might be difficult to shift data from a CC database into a PD
database.
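As a sketch of the unique ID ranges idea, it could be as dumb as a
per-source offset - the offsets here are invented and nothing has been
agreed between any of the projects:

<?php
// Avoid element ID collisions when merging data from two databases by
// giving each source its own offset (assumes 64-bit PHP integers).
define('SOURCE_A_OFFSET', 0);            // e.g. OSM keeps native IDs
define('SOURCE_B_OFFSET', 10000000000);  // fork remapped above 10^10

function remapId($sourceId, $offset) {
    return $sourceId + $offset;
}

echo remapId(123456, SOURCE_B_OFFSET); // 10000123456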
Another option might be to do something in editors, where multiple
layers are downloaded from each system and then anything new is
uploaded into 1 or more databases that are compatible with the user's
preference. That would bypass the issue of licenses since the author
is responsible for where their data ends up.
oauth seems like the best way to do that, so long as OSMF cooperates
or at least doesn't hinder things.
However, there's a potential snafu
> * Ensuring tagging schemes do not diverge
Presumably the fork and OSM will continue to use the same
editors/renderers. In my opinion this provides an optimum balance
between ensuring tagging schemes do not diverge too much and allowing
flexibility in both projects. If a fork wants to try out a new
tagging scheme for a while, they're free to do so, and if it turns out
to be a really good idea maybe the editors/renderers will adopt it and
it'll wind up being adopted by OSM as well.
Ensuring tagging schemes do not diverge too much between OSM and the
fork is basically the same problem as ensuring that they don't diverge
too much between Europe and North America. Document and communicate.
> oauth seems like the best way to do that, so long as OSMF cooperates
> or at least doesn't hinder things.
In theory. I haven't actually used oauth with OSM yet so I'm not sure
of the exact details. It looks like you have to "register your
application as an oauth consumer", though, and that requires an OSM
account, so there is a lot of room for OSMF to prevent us from using
oauth if they decide they want to.
(By the way, I think I had decided that "potential snafu" I was
talking about wasn't actually a problem, and just forgot to delete
that sentence. In any case, I forget the details of it.)
> I have a Steve's OSM account: 80n. The status-quo fork would have all my
> contributions under the same account Id. When I register as 80n in the
> fork, how could I verifiably claim to be the same 80n and thus establish
> ownership of all those contributions?
My understanding is that our website would provide a link to OSM, the
user would log in to their OSM account, and then they'd return to our
website with a token which verifies that they are the user they say
they are.
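I haven't run this against the live site, but with PHP's pecl OAuth
extension the dance would presumably look something like the following.
The endpoint URLs and the /user/details call are my guesses from the
published API docs, and I've glossed over the oauth_verifier handling
on the callback:

<?php
// Sketch of verifying an OSM identity via OAuth. Untested.
$base = 'http://www.openstreetmap.org';
$oauth = new OAuth('CONSUMER_KEY', 'CONSUMER_SECRET');

// Step 1: get a request token and send the user to OSM to log in.
$req = $oauth->getRequestToken("$base/oauth/request_token");
$loginUrl = "$base/oauth/authorize?oauth_token=" . $req['oauth_token'];
// ... redirect the user to $loginUrl ...

// Step 2 (on callback): swap the authorised request token for an
// access token, then ask OSM who the user actually is.
$oauth->setToken($req['oauth_token'], $req['oauth_token_secret']);
$acc = $oauth->getAccessToken("$base/oauth/access_token");
$oauth->setToken($acc['oauth_token'], $acc['oauth_token_secret']);
$oauth->fetch("$base/api/0.6/user/details");
echo $oauth->getLastResponse(); // XML with the OSM user id and name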
> In some ways it doesn't matter at all. Everyone could be given new
> accounts. But establishing continuity would help with things like
> contributors stats (these are important to some people) and make the
> transition less alien for established OSM users.
>
> I don't know enough about oauth to know if it can do this? Would existing
> OSM users have to invest time in setting up an oauth account? Is that easy
> to do?
>
> If we want people to switch to the fork it has to be easy to do.
I would suggest that the fork maintain its own user accounts, and
allow people to link their OSM accounts. Linking would be a one-time
deal, and we'd just maintain a table of links locally. Probably no
reason not to let people link multiple OSM accounts if they want.
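The table itself would be trivial - a sketch, with table and column
names invented, assuming postgres and PDO:

<?php
// The fork keeps its own accounts; this records which OSM account(s)
// each one has verified. Connection details and ids are placeholders.
$db = new PDO('pgsql:dbname=fork', 'user', 'pass');
$db->exec("
    CREATE TABLE osm_account_links (
        fork_user_id integer   NOT NULL,
        osm_user_id  integer   NOT NULL,
        osm_username text      NOT NULL,
        linked_at    timestamp NOT NULL DEFAULT now(),
        PRIMARY KEY (fork_user_id, osm_user_id) -- multiple links OK
    )");

// Called once the OAuth dance has confirmed the OSM identity.
$stmt = $db->prepare('INSERT INTO osm_account_links
                      (fork_user_id, osm_user_id, osm_username)
                      VALUES (?, ?, ?)');
$stmt->execute(array(42, 10001, '80n')); // placeholder ids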
I'm not sure exactly how easy it is. My understanding is that it's
similar to buying something with Paypal. But until I get a chance to
actually implement a test case, I don't know.
Since all we have is a userid and/or username we can only authenticate
against a known database, in this case OSM's.
> In some ways it doesn't matter at all. Everyone could be given new
> accounts. But establishing continuity would help with things like
> contributors stats (these are important to some people) and make the
> transition less alien for established OSM users.
>
> I don't know enough about oauth to know if it can do this? Would existing
> OSM users have to invest time in setting up an oauth account? Is that easy
> to do?
What would be easier would be just to send an email through the OSM
system; to set up oauth automatically we would have to know their
password. I guess leave the option up to the end user, whether they
want to use oauth or password authentication, as emails would probably
be blocked sooner rather than later.
What can be done to make the various forks that are likely to appear more friendly, both to each other and to the "Steve Coast controlled OSM" (TM)?
* Continuity of OSM accounts (is there a way an OSM user could be re-associated with the same user Id in a forked database?)
* Ensuring tagging schemes do not diverge
* Where licenses and usage permit, enabling data from different OSM forks to be combined (unique ranges for element IDs?)
The rails port seems horribly inefficient if they need 5 servers just
to run 1 website with the current userbase, so from a technical point
of view this seems very poorly written, or a very poor choice of
language. I think Brendan has re-written a lot/all of this in
drupal/php, although using C might be a better option for high load,
or even just PHP + the facebook C conversion tool might still be ok.
I'm not quite sure why SteveC went rails, but since the main business
case for rails is essentially, "more hardware is less expensive than
more developer hours", I can see his point. Maybe for them, less
developer hours = more time out there surveying.
> I think Brendan has re-written a lot/all of this in
> drupal/php,
The API remains as rails-backed code. I also intend to run an
additional bulk upload method (preprocessing psql data on the client
side and direct upload to the core database).
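In rough outline the preprocessing is just writing tab-separated files
that psql's COPY can swallow - this is a simplification, the real
column list is longer and I haven't finalised it:

<?php
// Turn source data into a COPY file, bypassing the API entirely.
// Column layout is a cut-down guess at a nodes table; the rails port
// stores lat/lon as integers scaled by 1e7, if I remember right.
$nodes = array(
    array('id' => 1, 'lat' => -27.4698, 'lon' => 153.0251, 'version' => 1),
);

$fh = fopen('nodes.copy', 'w');
foreach ($nodes as $n) {
    fwrite($fh, implode("\t", array(
        $n['id'],
        (int)round($n['lat'] * 10000000),
        (int)round($n['lon'] * 10000000),
        $n['version'],
    )) . "\n");
}
fclose($fh);
// then something like:
//   psql -c "COPY current_nodes (id, latitude, longitude, version)
//            FROM 'nodes.copy'"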
The front end website I intend to migrate over to drupal-backed code
where possible. The User Diaries function in particular seems to be
better off on a drupal base where it can benefit from a larger developer
pool. Drupal would then also handle forums, wiki, mailing list, issue
list - therefore building a strong, highly interlinked community site.
I'd possibly also use it to host source code management and GPS trace
management - drupal modules exist to do this but I haven't checked them
out much yet.
The main hassle I have with drupal is it's not obvious how to do
cross-database calls (I may not have found the right switch yet) - not a
showstopper though. But it would help reimplement, say, the History
page in a drupal skin. Another approach might be to implement an OSM
API client in Drupal to fetch data for the History page etc., which is a
more architecturally pure approach.
One of my next tests is to run an OpenID server from the drupal site -
the theory being one just registers into the drupal site and then
re-uses that login on the rails API. Then we get the "holy grail" of
single password across all CommonMap offerings.
> although using C might be a better option for high load, or
> even just PHP + the facebook C conversion tool might still be ok.
>
Well, Tom Hughes preferred optimisation in C(++?), although when I was
troubleshooting the slow uploader, a lot of the problem was in a whole
lot of little calls being made to the database. So it's a bit early to
call in the proprietary facebook stuff.
Brendan
I wouldn't exactly call the facebook stuff proprietary, since it just
converts php to C and then you use any C compiler, like gcc, to
compile it into a stand-alone binary.
I don't like the rails port at all. From what I've seen it looks like
the database was designed around a rails paradigm rather than the
software being designed around the database.
That's not to say that it was the wrong choice. Just getting
something out there to attract users might have been more important in
the beginning stages of the project.
Rewriting the code which implements the API would be a top priority
for me, *if* I thought I had the time (or the time plus assistance) to
complete it. Right now I don't though, and the rails port seems to be
adequate for the time being.
Something else I'd like to do if I had the time would be to support
more quick editing outside of the standard graphical map-based
editors. People should be able to, for example, correct road name
spellings, without firing up JOSM or Potlatch. An editor just
dedicated to adding lots of POIs quickly would be another good
addition. Provide it with a few tags (or one, say amenity=library)
and a list of addresses, and it pops up an aerial/satellite map with
the approximate location, you click at the proper position, maybe type
the name of the library (or maybe that could be in the spreadsheet you
upload at the start), and then it loads up the next one on the list.
Maybe this is a bit US-centric: it'd be a great way to convert the
TIGER approximate geolocation into really accurate POIs.
Well, is a drupal framework, in your opinion, a better option?
If so we could follow Brendan down that path; all we'd need to do then
is fix up the API code and we should be able to ditch the rails stuff
completely.
> Something else I'd like to do if I had the time would be to support
> more quick editing outside of the standard graphical map-based
> editors. People should be able to, for example, correct road name
> spellings, without firing up JOSM or Potlatch. An editor just
> dedicated to adding lots of POIs quickly would be another good
> addition. Provide it with a few tags (or one, say amenity=library)
> and a list of addresses, and it pops up an aerial/satellite map with
> the approximate location, you click at the proper position, maybe type
> the name of the library (or maybe that could be in the spreadsheet you
> upload at the start), and then it loads up the next one on the list.
> Maybe this is a bit US-centric: it'd be a great way to convert the
> TIGER approximate geolocation into really accurate POIs.
Maybe we could do something in conjunction with Nearmap? They're
already working on some kind of simplified editor.
I don't really know much about drupal. My understanding is that it's
not being used for the API part anyway, though.
> If so we could follow Brendan down that path; all we'd need to do then
> is fix up the API code and we should be able to ditch the rails stuff
> completely.
My understanding is that he's planning on implementing everything
*but* the API in drupal. If that happens, it'd probably be reasonable
to then reimplement the API without rails. One catch is that there
are a lot of little irregularities which popped up accidentally due to
the use of rails and xml. There is invalid data in the database,
which can't be added any more but which was added previously. The
UTF8 string lengths are not being treated consistently - rails is
counting bytes and the db is counting characters (or is it vice versa?
I forget).
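A two-line illustration of the mismatch (the string is arbitrary):

<?php
// Byte count vs character count for the same UTF-8 string.
$name = "Čierny Váh";                 // 10 characters, 12 bytes
var_dump(strlen($name));              // int(12) - bytes
var_dump(mb_strlen($name, 'UTF-8'));  // int(10) - characters
// A 255 "character" limit enforced on bytes truncates multibyte text
// early; enforced on characters it can overflow a byte-sized column.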
And then Potlatch has its own sort of "private API" and is making
direct queries against the database without going through rails (I
suspect this is how some of the "invalid data" got into the database).
Without a flash expert I think we'd have to ditch Potlatch in order
to ditch rails. Or move to Potlatch 2, which I'm skeptical about the
completion of.
I guess that wouldn't stop us from reimplementing the API outside of
rails without touching the database schema. But there'd likely be
little benefit to doing so until the Potlatch problem is resolved.
And I don't think a public fork would be smart to ditch Potlatch.
(Which is why I wrote on the OSM mailing list that I was planning on
doing a *private* fork...)
It doesn't make much sense to keep rails just for the API.
> are a lot of little irregularities which popped up accidentally due to
> the use of rails and xml. There is invalid data in the database,
> which can't be added any more but which was added previously. The
> UTF8 string lengths are not being treated consistently - rails is
> counting bytes and the db is counting characters (or is it vice versa?
> I forget).
We can work around that. The API is documented, although probably
poorly, but I don't think it would be hard to learn enough ruby/rails
to convert the logic into PHP, or at least to find someone who can.
> And then Potlatch has its own sort of "private API" and is making
> direct queries against the database without going through rails (I
> suspect this is how some of the "invalid data" got into the database).
> Without a flash expert I think we'd have to ditch Potlatch in order
> to ditch rails. Or move to Potlatch 2, which I'm skeptical about the
> completion of.
I'm aware of the private API Richard coded, but it shouldn't pose that
much of a problem, at least no more than converting any of the other
code into PHP; all the API code does is provide a layer between
external software and the database, and they probably threw some logic
in for sanity checking and dealing with errors from the database etc.
It seems Brendan already has some experience with shifting the logic,
so perhaps he is more qualified to comment. In any case, Brendan, is
there a repository somewhere with the existing PHP code and/or details
on setting it up?
I find it curious that people seem to be thinking about doing their
own private forks; you aren't the only one that has stated this,
although perhaps the only one that did on the public mailing lists. To
run a fork will take a LOT of effort for very little benefit.
If it were me I'd just have OSM files with the data you generate
and/or the data from OSM you wanted to keep, similar to how the
Canadians are dealing with the GeoBase/Canvec import, where they split
the data up into "tiles" using some kind of grid reference numbering.
I'd say in my case the effort is the benefit.
I do taxes for my day job. Coding is where I get to let out my creative side.
Coding has almost nothing to do with a fork unless you were planning
to extend the current APIs or other parts of the code base :)
Besides if you want to do something constructive, there is plenty
wrong with JOSM that could be fixed :)
> It seems Brendan already has some experience with shifting the logic,
> so perhaps he is more qualified to comment. In any case, Brendan, is
> there a repository somewhere with the existing PHP code and/or details
> on setting it up?
>
Some of our messages may have crossed in transit.
I intend to reuse the existing OSM code for stuff that is relatively
unique to OSM (e.g. getting map data in and out of the database.)
I intend to use Drupal for the Web 2.0 stuff.
Drupal happens to be coded in php, but I've made very few php changes
so far. (Mostly to apply bugfixes.)
I may do (or ask someone to do) some Drupal code to glue it to the OSM API.
In fact given the choice I'd rather concentrate on imports as there's
better value there - at least JOSM can end up being the API interface to
start with.
Brendan
I rarely get much response about this sort of thing through the dev
list or the bug tracker so I tend to assume that they don't care or
aren't interested in anything other than ruby/rails...
> In fact given the choice I'd rather concentrate on imports as there's better value there
It's all sort of tied in with forking: to keep up with updates from
OSM, osmosis is pretty inefficient - maybe it's because it was written
in java, I don't know - but just coding that in C would push things
along. By the time you do that, you kind of have a start on the rest
of the API stuff, so it's just a matter of expanding from there.
It seems I had a few details wrong: the hiphop stuff converts PHP code
to C++ and uses g++ to compile it, and even simple scripts take a few
minutes to compile; the conversion only takes a few tenths of a second.
Also the conversion process for some things is particularly bad, e.g. bcmod.
I made a simple loop script that counted to 100 million; this took
about 6.7s to run in php, but only 3.6s when compiled. However when I
tweaked it to only go up to 10 million, but do a bcmod() call on each
loop, the php script ran in about 27s while the compiled copy took
over 90s.
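For anyone who wants to reproduce the numbers, the scripts were
essentially just this (reconstructed from memory, so treat the details
as approximate; needs the bcmath extension):

<?php
// Benchmark 1: a bare counting loop to 100 million.
$t = microtime(true);
for ($i = 0; $i < 100000000; $i++) {
    // nothing, just count
}
printf("plain loop: %.1fs\n", microtime(true) - $t);

// Benchmark 2: 10 million iterations, calling bcmod() each time to
// exercise how the compilers handle a built-in function.
$t = microtime(true);
for ($i = 0; $i < 10000000; $i++) {
    bcmod((string)$i, '7'); // arbitrary divisor
}
printf("bcmod loop: %.1fs\n", microtime(true) - $t);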
I'm currently waiting for phc (http://www.phpcompiler.org/) to build
to see if it does a better job.
phc binary performance was almost the complete opposite: a simple loop
to 100m took twice as long as php at 13.9s, but for a loop to 10m with
a bcmod operation it took a touch over half the time php did, at
15.9s...
Can anyone think of any other benchmarks to try?
Brendan
I'm saying it is pointless to run the ruby code, unless you have 6 or
7 spare machines to throw at the problem, but you already know this or
you wouldn't be bypassing the API to upload data :)
I think though that, say, *any* php (or in my case, perl) is enough.
Brendan
have you made a list of all the datasets available and the order they
need to be imported?
there are lots of people available to help - everyone that attempted
an import in osm could help, since they already know the data &
exactly what limitations it has.
since this is a 'data warehouse'... those who imported their local
data could be the same people to be updating the data for their
'warehouse rack space'.
cheers,
sam
--
Twitter: @Acrosscanada
Blogs: http://acrosscanadatrails.posterous.com/
http://Acrosscanadatrails.blogspot.com
Facebook: http://www.facebook.com/sam.vekemans
Skype: samvekemans
IRC: irc://irc.oftc.net #osm-ca Canadian OSM channel (an open chat room)
@Acrosscanadatrails
I was hoping a PHP compiler would be efficient enough to save me from
recoding the API in native C, but I think the conversion/optimisation
of the compilers is sub-par, and I will probably end up needing to do
it in C.
Yes, I am trying to get that same understanding of "knowing the data...
and limitations".
Brendan
because (for the last 2 years) i have been 'dealing' with bulk data, i
already know who these people are :)
so, i'd like to compile the list, and just use the code name
'OpenImportsMap'.... as essentially, this is what it is..... this
database can be assembled and available in no time :)
..... whether or not the osm community wants to use it is out of our hands.
cheers,
sam
For what it's worth, I agree with this. There are tons of improvements which can be made to the Rails Port with any rewrite, regardless of the use of hiphop.
-- sent from cell phone
Have you considered:
* C will likely reduce agility/flexibility and the number of people in the contributor/developer pool
* Why did OSM go Rails, and what alternatives did they throw out?
* The main bottleneck is in the way Rails divides requests to the database for operations involving many rows
* The total demand of the OSM userbase - it was certainly much higher than I first assumed
* The data is stored in a canonical topological form, not an OGC SFS form - PostGIS may not actually be of assistance
* How do Oracle Spatial and ArcSDE deal with similar problems?
* OSM is an I/O driven website, not a computationally driven website (i.e. we barely deal with datum transformation, distance buffering, point-in-polygon etc)
Brendan
---
Sent from my Nokia E63's tiny keypad.
Please forgive any fat fingered spelling mistakes.
-original message-
Subject: Re: [OSM Fork] Re: Friendly Forks
From: John Smith <deltafo...@gmail.com>
Date: 20/08/2010 8:07 am
On 20 August 2010 08:02, Brendan Morley <mo...@beagle.com.au> wrote:
> I think though that, say, *any* php (or in my case, perl) is enough.
> I was hoping a PHP compiler would be efficient enough to save me from
> recoding the API in native C
The same could be said of any language, I don't know Ruby so you could
say the status quo would, in theory at least, have the same effect.
> * Why did OSM go Rails
I'm guessing for the same reason most people choose PHP or Perl or
....: that's what the coders knew, it's quick to code in, and so on.
> and what alternatives did they throw out?
No idea...
> * The main bottleneck is in the way Rails divides requests to the database for operations involving many rows
Rails, like most frameworks, suffers from this problem; this is the
same reason I don't bother with PEAR and just use PHP for most website
stuff I have to code.
> * OSM is an I/O driven website, not a computationally driven website (i.e. we barely deal with datum transformation, distance buffering, point-in-polygon etc)
I assume you are talking about database IO? That may or may not be a
problem, depending on how much RAM you throw at the DB and how well
you can tweak the database settings to get as much of it in RAM as
possible. There will always be some IO issues simply due to the amount
of writes that need to occur, but you can minimise the read IO issues
with more RAM, although a few SSDs that are big enough might get
around IO problems completely...
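e.g. the usual postgresql.conf suspects - the values below are
placeholders only, the right numbers depend entirely on the box:

# Illustrative settings for keeping the working set in RAM.
shared_buffers = 4GB          # main DB cache
effective_cache_size = 12GB   # planner hint: OS cache + shared_buffers
checkpoint_segments = 32      # spread out write bursts
wal_buffers = 16MB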
In my opinion the core API should be as language neutral as possible.
That said, I don't know many people interested in writing the core API
in C. And by not many I mean I don't know anyone other than myself
interested in doing it (and unfortunately I don't have the spare time
to do it all by myself).
So, yeah, I think PHP is fine. I'd say even sticking with Ruby/Rails
is fine for now. Performance can come later, when you have enough in
donations to hire me full time :).
if we want the project to not accept individual user contributions
(say, adding in steps).. and only permit the import of the government
data... then the database needs to be usable by those who actually
work with the government data directly (gis pros).
so perhaps inviting those in the osgeo-community who have experience
with this? as long as they understand that it's a 'public domain /
open access / bulk_import database'...
oh, so this database design needs to easily work with the perl script
bulk_upload.pl and not have the roadblocks that osm has, with the
limitations. so the contributors can have api access time (maybe it
needs to be a schedule?).
if it's dealing with imports only, the database can be updated
monthly, with planet-import.osm files being made monthly.
and it needs to be able to produce shp files as its core function (3d
models) and handle all of the various bulk datasets available.
ideas,
sam
bulk_upload is slow, probably in part due to the APIs; it's much,
much quicker to inject the data directly into the database. So if you
are thinking about a specific community doing bulk imports of data
only, there are better ways to do this.
..... ie, if this database can host all contours and render a
complete basemap for cyclemap, then the cyclemap only needs to render
transparent overlays for the route relations, and select poi/features
(that aren't available from this project).
..... i think that rendering the planet as a transparent overlay with
only the small-time user edits would be easier for osm.... so this
project should be focused on handling the 'bulk_data'.
again fat fingers sorry,
cheers,
sam
Then my point in my first reply still stands: bulk_upload/APIs suck,
you are better off directly importing into a DB and/or coming up with
your own APIs, or at the very least recoding the APIs in a faster
language.
so having the database set up so it's only those people who know how
to 'bulk_inject', and it gets done 1 dataset at a time.... then we
can work it so that the datasets that are available can 'get in line'
to be uploaded.
i messaged dave hanson, i think he was given direct api access to
inject TIGER into the database, perhaps he can help with better ideas.
... and a google docs chart can handle the organization of these
datasets that are 'in line' (so non-tech users like me can see it and
work to direct traffic in this warehouse).
.... now i know why i liked working in physical warehouses back when i
1st started working :-)
... organizing skids :)
cheers,
sam
I am planning "direct injection".
I am even planning multiple threads - e.g. Sam you might be given all
the IDs where (ID % 8 == 6), that is, 6, 14, 22, 30, 38 etc, John gets
(ID % 8 == 7), and so on.
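Expressed as code the split is trivial - my actual scripts are perl,
but in PHP terms for this list (the shard assignments are of course
placeholders):

<?php
// Each importer takes only the IDs in their assigned residue class,
// so parallel threads never touch the same row.
function myShare(array $ids, $shard, $nShards = 8) {
    return array_filter($ids, function ($id) use ($shard, $nShards) {
        return $id % $nShards == $shard;
    });
}

print_r(myShare(range(1, 40), 6)); // 6, 14, 22, 30, 38 - "Sam's" IDs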
Importers would need to be (a) trustworthy (since we're bypassing the
API) and (b) able to handle recent versions of the JRE, osmosis,
PostgreSQL+PostGIS, ogr2ogr, and perl. (Optionally Merkaartor for
sanity checking of changesets.)
My perl scripts are home built.
Trusted importers would upload their psql "COPY" files to a mutually
trusted location - say, Amazon S3 - and the CM server would poll on a
first come first served basis.
Just some thoughts for now.
Brendan
I'm guessing you could bypass most, if not all, of that and write
some scripts that convert data from OSM or SHP format into a suitable
SQL format.
anyway, everyone is welcome to join the talk
Also, please announce your intentions to the talk-ca mailing list.
.... all of us who are working in canada would rather see the data
injected.... even if it's not in osm, this can serve as a model for
others and work as a 'soft entry' to acceptance.
cheers,
sam