wikipedia-iphone on OLPC


Chris Ball

May 1, 2008, 12:11:50 AM
to wikipedi...@googlegroups.com
Hi Patrick/all,

As surprising as it might sound, I think wikipedia-iphone is a pretty
good fit for distributing Wikipedia snapshots on the One Laptop Per
Child laptop. :) I have the Ruby/Mongrel server running under Linux,
so it's almost there, and I have a few changes in mind:

* Port the Ruby/Mongrel code to Python/BaseHTTPServer so that it can
run as a standalone "activity" on the XO, which comes with Python
and its standard library preinstalled.

* Serve up unparsed text combined with one of the JavaScript wiki->HTML
parsers rather than using the parser inside wikipedia-iphone.

* Some method for selecting articles to include in the snapshot. I'm
most interested in making a Spanish snapshot first, because the largest
OLPC deployment countries at the moment are Peru and Uruguay, both
Spanish-speaking. The Spanish dump of articles without history is a
400M .bz2, which is larger than we can afford to put on every laptop --
100M or so would be ideal (the laptop has 1G of flash). We should add
a subset of images, too.

* For OLPC, I think we can cut the snapshot down to popular/linked
articles by choosing what goes into the .xml.bz2 (perhaps there are
already tools for doing this? I don't know much about it), as long as
we deal with the link-breaking that results. This should be simple
enough: when we're rendering a wiki link, look it up in the index; if
it's there, link to the internal page, and if it isn't, link to the
external site or drop the link entirely (see the sketch below).
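
Roughly like this, as a sketch (article_index would be the set of
titles we actually ship, and the helper name is made up):

    import urllib

    def render_wiki_link(title, text, article_index,
                         external_base='http://es.wikipedia.org/wiki/'):
        # If the article made it into the snapshot, link to the local
        # copy; otherwise fall back to the live site (or return plain
        # text here to drop the link entirely).
        href = urllib.quote(title.replace(' ', '_'))
        if title in article_index:
            return '<a href="/wiki/%s">%s</a>' % (href, text)
        return '<a href="%s%s">%s</a>' % (external_base, href, text)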

Of these, I think some are probably interesting to iPhone users too; in
particular, checking for internal link validity instead of generating
links that fail, and using one of the JavaScript parsers instead of the
current regexps. If people are interested, maybe we could split up some
of the work?

Thanks very much; I'd love to hear any ideas on what else to look at,

- Chris.
--
Chris Ball <c...@laptop.org>

Patrick Collison

May 1, 2008, 6:31:22 AM
to wikipedi...@googlegroups.com
Hey Chris,

On Thu, May 1, 2008 at 5:11 AM, Chris Ball <c...@laptop.org> wrote:
> As surprising as it might sound, I think wikipedia-iphone is a pretty
> good fit for distributing Wikipedia snapshots on the One Laptop Per
> Child laptop. :)

Great, this sounds very interesting, and I'm happy to help out.

> * Port the Ruby/Mongrel code to Python/BaseHTTPServer so that it can
> run as a standalone "activity" on the XO, which comes with Python
> and its standard library preinstalled.

That should be quite trivial.
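
Roughly something like this, as a minimal sketch; lookup_article() is
just a placeholder for however the port ends up reading wikitext out
of the snapshot, and only the standard library is used:

    import BaseHTTPServer
    import urllib

    def lookup_article(title):
        # Placeholder: the real version would read wikitext out of the
        # compressed snapshot. Hard-coded so the sketch runs on its own.
        articles = {'Portada': u'Texto de ejemplo'}
        return articles.get(title)

    class WikiHandler(BaseHTTPServer.BaseHTTPRequestHandler):
        def do_GET(self):
            title = urllib.unquote(self.path.lstrip('/'))
            text = lookup_article(title)
            if text is None:
                self.send_error(404, "Article not in snapshot")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(text.encode('utf-8'))

    if __name__ == '__main__':
        BaseHTTPServer.HTTPServer(('', 8000), WikiHandler).serve_forever()

It could hand back either pre-rendered HTML or the raw wikitext,
depending on what we decide about parsing.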

> * Serve up unparsed text combined with one of the Javascript wiki->HTML
> parsers rather than using the parser inside wikipedia-iphone.

Hmm. I can imagine this introducing a significant rendering lag,
especially for large pages. Has anyone you're aware of already tried
this approach on the XO?

> * Some method for selecting articles to include in the snapshot. At the
> moment I'm most interested in making a Spanish snapshot, because the
> largest OLPC deployment countries at the moment are Peru and Uruguay,
> both Spanish-speaking. The dump of articles without history for
> Spanish is 400M .bz2, which is larger than we can afford to put on
> every laptop -- 100M or so would be ideal (the laptop has 1G flash).
> We should add a subset of images, too.

We can sort the articles by number of inlinks, or something similar,
which seems on the face of it a reasonable way of picking the top
quartile. I also wrote a quick PageRank calculator for MediaWiki a
while ago (http://collison.ie/code/pagerank.rb) -- we could use that.
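
As a first cut, something like this sketch would do; it assumes we
already have (title, wikitext) pairs pulled out of the .xml.bz2 by
some other tool, and the cutoff is arbitrary:

    import re
    from collections import defaultdict

    LINK_RE = re.compile(r'\[\[([^\]|#]+)')  # target of [[Target|label]]

    def top_articles(pages, keep=50000):
        # Count how many distinct articles link to each target, then
        # keep the most-linked targets for the snapshot.
        inlinks = defaultdict(int)
        for title, wikitext in pages:
            for target in set(LINK_RE.findall(wikitext)):
                inlinks[target.strip()] += 1
        return sorted(inlinks, key=inlinks.get, reverse=True)[:keep]

PageRank would weight links from well-linked pages more heavily, but
raw inlink counts are probably good enough for a first pass.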

> * For OLPC, I think we can accomplish cutting our snapshot down to
> popular/linked articles by choosing what goes into the .xml.bz2
> (perhaps there are already tools for doing this? I don't know much
> about it), as long as we deal with the link-breaking we do as a
> result. This should be simple enough: when we're rendering a wiki
> link, look it up in the index, and link to the internal page if it
> is, and either the external site or no link at all if it isn't.

Yep.

> Of these, I think some are probably interesting to iPhone users too; in
> particular, checking for internal link validity instead of generating
> links that fail, and using one of the JavaScript parsers instead of the
> current regexps. If people are interested, maybe we could split up some
> of the work?

Well, checking link validity at runtime is actually a fairly expensive
operation... we could make it cheaper, but it would also require more
space as a result. We could push some of this work to compile time,
though.
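
For example, we could dump the set of included titles when the
snapshot is built, so the runtime check becomes a single set lookup
(the file name and helpers here are made up):

    import cPickle

    def write_title_index(titles, path='titles.idx'):
        # Run at snapshot-build time: persist the set of included titles.
        f = open(path, 'wb')
        cPickle.dump(set(titles), f, cPickle.HIGHEST_PROTOCOL)
        f.close()

    def load_title_index(path='titles.idx'):
        # Run once at startup; after that, "title in index" is cheap.
        f = open(path, 'rb')
        index = cPickle.load(f)
        f.close()
        return index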

Cheers,

Patrick
