Once you have the source, the steps are explained in the README file:
==Get started==
cd c
./configure
make
cd ..
Get a Wikipedia XML dump (e.g. enwiki-20071018-pages-articles.xml.bz2), and
place it in the root (wp) directory.
cd sh
./process ../<dump>
cd ..
The processing stage will take several hours (8 hours on a 2.16 GHz MacBook
Pro). (If someone wants to speed it up, implement xmlprocess.rb in C.) Once
this is done, you can delete the original dump. If you get sick of waiting,
use a dump of the Simple English Wikipedia, which is several orders of
magnitude smaller than the standard English dumps.
--snip--
To the above, I'll add that you need to run "sh bootstrap.sh" before
running configure in the wp/c directory.
Try following those instructions, and let me know where you have issues.
On 06/02/2008, Oskar_ Vikholm <snobb...@hotmail.com> wrote:
>
> Okay... But I have no idea how to do it... so it would be great if you
> could give me some instructions and explain how you did it with the
> English and German dumps!
>
> Thanks man
>
> > Date: Tue, 5 Feb 2008 17:59:00 -0800
> > From: pat...@collison.ie
> > To: snobb...@hotmail.com
> > Subject: Re: Language
>
> >
> > I can't -- but I'm happy to assist with you doing so :).
> >
> > Post to the group --
> > http://groups.google.com/group/wikipedia-iphone/topics -- with any
> > questions, and we'll help out.
> >
> > On 05/02/2008, Oskar_ Vikholm <snobb...@hotmail.com> wrote:
> > >
> > > Hello. Could you make a dump for Swedish? (http://sv.wikipedia.org)
> > > Would be great, man! Thanks for the English dump!
This is easily worked around -- edit Makefile.am, and on the first
line, remove "searcher", and then run bootstrap.sh, configure and make
again.
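With GNU sed this can be scripted; the Makefile.am contents below are a guess at the shape of the real file (a target list on the first line), so check yours before running it:

```shell
cd "$(mktemp -d)"
# Stand-in Makefile.am; the real first line lists the build targets.
printf 'bin_PROGRAMS = wiki2touch searcher\n' > Makefile.am
# Remove "searcher" from the target list in place.
sed -i 's/ searcher//' Makefile.am
cat Makefile.am
```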
> Then after creating zhwiki-20080114-pages-articles.xml.bz2.processed I
> realized I didn't have RubyGems, so I went over to RubyForge to
> download and install it.
>
> After that there's this error "in `gem_original_require': no such file
> to load -- inline (LoadError)" which I cannot resolve..
gem install RubyInline.
> and mklocatedb is a Mac OS X script... need to find a replacement for
> Linux..
Hmm. I'm not exactly sure how to work around this one. If you can get
things working up to this point, though, I'm happy to help figure out
some sort of workaround.
It would be great if you can get this working on Linux -- I'll then
integrate the changes into the main distribution.
You can safely take indexer out of Makefile.am, too. (I looked into
this, and it turns out that using the ternary operator as an lvalue was
disallowed in GCC 4, as you suspected. Although I'm using GCC 4, I'm
using Apple's version, which has kept support.)
> After installing RubyInline, things started working, so I'm leaving it
> to run for a while...
> So I guess I will need to find an mklocatedb alternative.
Yeah. Mklocatedb is just a BSD-licensed shell script, so it's possible
that it could be coaxed into functioning on Linux without too much work.
We could probably add a modified version to the source distribution.
Strange. As it happens, those source files aren't actually used any
more, so I'll just remove them in the next release.
Cool, this is the same version as mine.
> and extracting the executables "code" and "bigram" from the findutils
> RPM at
> http://rpm2html.osmirror.nl/redhat-archive/updates/5.0/en/os/i386/findutils-4.1-24.i386.html
> http://rpm.pbone.net/index.php3?stat=3&search=findutils-locate&srodzaj=3&dist[]=46
>
> I can get mklocatedb.sh to create _mklocatedb[randomtext].list, but
> _mklocatedb[randomtext].bigram is empty, resulting in an empty
> locatedb...
> I've tried both the x86_64 and i386 binaries with the same result..
Can you use the built-in locate commands? In mklocatedb.sh I see:
: ${bigram:=locate.bigram}
: ${code:=locate.code}
: ${sort:=sort}
What happens if you point locate.bigram at /usr/lib/locate/bigram, and
locate.code at /usr/lib/locate/code? (I believe that's the default
location for those binaries on Debian-based systems.)
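Those `: ${var:=default}` lines are standard shell parameter expansion: each one assigns the default only if the variable is unset, so you can override them from the environment without editing the script. A quick illustration:

```shell
unset bigram
: ${bigram:=locate.bigram}        # unset, so the default is taken
echo "$bigram"                    # locate.bigram

bigram=/usr/lib/locate/bigram     # pre-set in the environment...
: ${bigram:=locate.bigram}        # ...so the default is ignored
echo "$bigram"                    # /usr/lib/locate/bigram
```

So something like `bigram=/usr/lib/locate/bigram code=/usr/lib/locate/code sh mklocatedb.sh` should pick up the system binaries, assuming they live at those paths on your distribution.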
Patrick
Awesome! So, the last step is to remove the original dump name from all of
those files. E.g., zhwiki-20080114-pages-articles.xml.bz2.processed
becomes just "processed". The folder structure on the iPhone should
look like this (there's no need to tar it; you can just scp -r the
directory):
wp/
  processed
  locate.db
  locate.prefixdb
  blocks.db
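A sketch of the renaming step (the dump prefix is the one from this thread, and I'm assuming the other output files carry the same prefix as the .processed file):

```shell
cd "$(mktemp -d)"                 # stand-in for the wp/ output directory
prefix=zhwiki-20080114-pages-articles.xml.bz2
for suffix in processed locate.db locate.prefixdb blocks.db; do
    touch "$prefix.$suffix"       # placeholder files for this demo
    mv "$prefix.$suffix" "$suffix"
done
ls
```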
Although scp works fine, I normally set up a local webserver and curl
the processed file directly -- it's about 2x as fast on my iPhone,
since there's no unnecessary encryption overhead.
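A minimal local demonstration of that workflow (python3's built-in server stands in for whatever webserver you use, and the port is arbitrary; on the phone you would hit the desktop's LAN IP rather than 127.0.0.1):

```shell
cd "$(mktemp -d)"
echo demo-content > processed               # stand-in for the real file
# Serve the current directory in the background.
python3 -m http.server 8123 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
sleep 1                                     # give the server a moment to start
# Fetch over plain HTTP -- no encryption overhead, unlike scp.
curl -s -o fetched http://127.0.0.1:8123/processed
kill "$SRV"
cat fetched
```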
Would you mind documenting the steps you undertook somewhere, so that
I can add them to the source distribution? Also, I'm happy to integrate
whatever suggestions you have to make it easier for future Linux users
-- e.g. including the mklocatedb.sh shell script, etc. (A Linux
version of sh/process would be nice to have; the existing one can then
become sh/process.osx or something.)
Patrick
Sweet!
> The zh wiki is actually the Chinese Wikipedia, and most of the entries
> have multibyte Unicode titles; it seems that your program's
> capitalizing procedure destroys the first Unicode character in the
> title.
> For example, navigating to the page "美鐵" (redirected from the page
> Amtrak) results in the program accessing the page "é鐵"
>
>
> 2008-02-07 16:16:49.423 Wikipedia[465:d03] key: WebActionButtonKey,
> value: 1
> 2008-02-07 16:16:49.426 Wikipedia[465:d03] key:
> WebActionModifierFlagsKey, value: 0
> 2008-02-07 16:16:49.429 Wikipedia[465:d03] key:
> WebActionNavigationTypeKey, value: 0
> 2008-02-07 16:16:49.432 Wikipedia[465:d03] key:
> WebActionOriginalURLKey, value: wp://localhost/%E7%BE%8E%E9%90%B5
> 2008-02-07 16:16:49.434 Wikipedia[465:d03] loading article with url
> wp://localhost/%E7%BE%8E%E9%90%B5
> 2008-02-07 16:16:49.436 Wikipedia[465:d03] url path: /美鐵
> 2008-02-07 16:16:49.438 Wikipedia[465:d03] cap: é鐵
>
> Secondly, the search box in the application does not match Unicode
> characters.
>
> Thirdly, the rendered Unicode text is garbled (but the links to other
> pages are fine), as you can see in this screenshot:
> http://img114.imageshack.us/img114/8131/img0029da6.jpg
Yes, Unicode handling is broken (many German users have also
complained...). I'd like to fix it, but unfortunately I have very
little time to work on this at the moment...
> I think this program has lots of potential! With this framework we can
> even try to import dictionary files into the system (e.g. stardict)
> and the result would be much better than weDict!
> Keep up the good work!
>
> I will create a wiki page on my personal site to document what I have
> done to convert the database in a short while. Or do you prefer
> creating a new Google Code / SourceForge project so we can have a
> centralized repository for related information?
I'm creating a Google Code project right now to centralise this
info... will ping again when it's set up.
Ok, done. I've also committed the current source code to the SVN repo.
We're good to go!