not able to generate PDF from MediaWiki Collection extension

Kunal

Jul 14, 2008, 2:01:15 PM
to mwlib
I have installed mwlib and mwlib.rl and other dependencies as well.

I am starting my server as

$ mw-serve --protocol=http

And when I try to create a PDF, it just refreshes the wikipedia page
countless times without producing a PDF file.

I am using the SVN-based code and the latest MediaWiki version.

Please help.

Johannes Beigel

Jul 14, 2008, 2:12:01 PM
to mw...@googlegroups.com
On 14.07.2008 at 20:01, Kunal wrote:
> I am starting my server as
>
> $mw-serve --protocol=http
>
> And when try to create a PDF it just refreshes the wikipedia page
> countless number of times without producing any relevant PDF file.

We need a tiny bit more information to help you with your problem.

What exactly do you mean by "wikipedia page" (what is on that page,
what is its URL)?

What is the output of mw-serve?

Maybe there is some relevant output in the errorlog of your webserver?

Regards,
Johannes Beigel

Kunal

Jul 14, 2008, 2:16:55 PM
to mwlib
Sorry for saying Wikipedia. It's my local MediaWiki server that is
doing this.

The output of mw-serve is below, and my MediaWiki page keeps saying
rendering at 0%.

[root@kunal-sun kunal]# mw-serve --protocol=http
mw-serve.info >> serving http on 0.0.0.0:8899
mwlib.serve.info >> render 8da31a74d1eb7303 rl
mwlib.serve.info >> using existing ZIP file to render '/var/cache/mw-serve/8da31a74d1eb7303/output.rl'
mwlib.wsgi.info >> request took 0.083316 s
kunal-sun.com - - [14/Jul/2008 23:45:56] "POST / HTTP/1.1" 200 53
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.002370 s
kunal-sun.com - - [14/Jul/2008 23:45:57] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.002353 s
kunal-sun.com - - [14/Jul/2008 23:46:01] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.002329 s
kunal-sun.com - - [14/Jul/2008 23:46:03] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.003945 s
kunal-sun.com - - [14/Jul/2008 23:46:07] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.002331 s
kunal-sun.com - - [14/Jul/2008 23:46:11] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.002391 s
kunal-sun.com - - [14/Jul/2008 23:46:17] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl
mwlib.wsgi.info >> request took 0.003873 s
kunal-sun.com - - [14/Jul/2008 23:46:21] "POST / HTTP/1.1" 200 101
mwlib.serve.info >> render_status 8da31a74d1eb7303 rl



Johannes Beigel

Jul 14, 2008, 2:25:02 PM
to mw...@googlegroups.com
On 14.07.2008 at 20:16, Kunal wrote:
> [root@kunal-sun kunal]# mw-serve --protocol=http
> mw-serve.info >> serving http on 0.0.0.0:8899
> mwlib.serve.info >> render 8da31a74d1eb7303 rl
> mwlib.serve.info >> using existing ZIP file to render '/var/cache/mw-serve/8da31a74d1eb7303/output.rl'

OK, so the ZIP file /var/cache/mw-serve/8da31a74d1eb7303/
collection.zip should have been generated successfully. Could you
invoke this on the command line to try to produce the PDF directly
from this ZIP file?

$ mw-render -w rl -o test.pdf -c /var/cache/mw-serve/8da31a74d1eb7303/
collection.zip

If everything works, there should be a file test.pdf afterwards.
Depending on the size of the article collection, this could take a
while. If there's no test.pdf file, could you post the output of the
command?

If the above does work, there must be some problem with the
communication from MediaWiki to mw-serve or from mw-serve to mw-
render. The output you posted didn't contain any error though: It's
perfectly ok, if there are several render_status requests.

Kunal

Jul 14, 2008, 2:30:57 PM
to mwlib
Thanks a million for responding so quickly.

I tried that and ended up with this error. Also, I am able to generate
ODF files successfully; it is just the PDF that fails.



[root@kunal-sun kunal]# mw-render -w rl -o test.pdf -c /var/cache/mw-serve/8da31a74d1eb7303/
Traceback (most recent call last):
  File "/usr/bin/mw-render", line 8, in <module>
    load_entry_point('mwlib==0.8.0.dev', 'console_scripts', 'mw-render')()
  File "/usr/lib/python2.5/site-packages/mwlib-0.8.0.dev-py2.5-linux-i686.egg/mwlib/apps.py", line 249, in render
    options, args = parser.parse_args()
  File "/usr/lib/python2.5/site-packages/mwlib-0.8.0.dev-py2.5-linux-i686.egg/mwlib/options.py", line 97, in parse_args
    self.env = self.makewiki()
  File "/usr/lib/python2.5/site-packages/mwlib-0.8.0.dev-py2.5-linux-i686.egg/mwlib/options.py", line 101, in makewiki
    env = wiki.makewiki(self.options.config, metabook=self.metabook)
  File "/usr/lib/python2.5/site-packages/mwlib-0.8.0.dev-py2.5-linux-i686.egg/mwlib/wiki.py", line 245, in makewiki
    res = _makewiki(conf, metabook)
  File "/usr/lib/python2.5/site-packages/mwlib-0.8.0.dev-py2.5-linux-i686.egg/mwlib/wiki.py", line 222, in _makewiki
    raise RuntimeError("could not read config file %r" % (conf,))
RuntimeError: could not read config file '/var/cache/mw-serve/8da31a74d1eb7303/'


Johannes Beigel

Jul 14, 2008, 2:36:03 PM
to mw...@googlegroups.com
On 14.07.2008 at 20:30, Kunal wrote:
> [root@kunal-sun kunal]# mw-render -w rl -o test.pdf -c /var/cache/mw-serve/8da31a74d1eb7303/

There's the "collection.zip" part missing :-)

The ZIP file – the argument for the -c option of mw-render – should be
named (I hope that Google Groups doesn't break the line again):

/var/cache/mw-serve/8da31a74d1eb7303/collection.zip

This file should exist and be a valid ZIP file.
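
Both conditions (the file exists, and it is a valid ZIP) can be checked before invoking mw-render. The following is only a sketch using Python's standard zipfile module; the function name check_collection_zip is made up for illustration and is not part of mwlib.

```python
import zipfile


def check_collection_zip(path):
    """Return (ok, message) describing whether `path` is a readable ZIP file.

    Hypothetical helper, not part of mwlib.
    """
    # is_zipfile() returns False for missing files and for directories,
    # which covers the case of passing only the cache directory.
    if not zipfile.is_zipfile(path):
        return False, "missing or not a ZIP file: %s" % path
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            return False, "corrupt member %r in %s" % (bad, path)
        return True, "%d members OK" % len(zf.namelist())


# Example call (path taken from the log output earlier in this thread):
# print(check_collection_zip("/var/cache/mw-serve/8da31a74d1eb7303/collection.zip"))
```

Passing the bare directory, as in the command quoted above, would be reported as "missing or not a ZIP file".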

Kunal

Jul 14, 2008, 2:46:12 PM
to mwlib
output is

Could not load writer 'rl': you need to have the svn version of
reportlab installed

I have mwlib.rl installed via easy_install and reportlab version 2.1
from my distribution (Mandriva 2008.1 Powerpack).

What do you say?



Johannes Beigel

Jul 14, 2008, 2:49:08 PM
to mw...@googlegroups.com
On 14.07.2008 at 20:46, Kunal wrote:
> output is
>
> Could not load writer 'rl': you need to have the svn version of
> reportlab installed
>
> As far as I have mwlib.rl installed from easy_install and reportlab
> ver 2.1 from my distribution (mandriva 2008.1 powerpack).
>
> What do you say ?

I'd say you need the svn version of reportlab. Citing README.txt of
mwlib.rl:

*reportlab*
the svn version is needed currently

svn co http://www.reportlab.co.uk/svn/public/reportlab/trunk
python setup.py install

Kunal

Jul 14, 2008, 2:52:44 PM
to mwlib
OK. Thanks a lot for the reply. I will check and let you know.


Kunal

Jul 14, 2008, 2:54:28 PM
to mwlib
Meanwhile, I wanted to know if there is any way to customize the PDF
branding (formatting, colors, etc.).

thanks

Johannes Beigel

Jul 14, 2008, 2:58:16 PM
to mw...@googlegroups.com
On 14.07.2008 at 20:54, Kunal wrote:
> Meanwhile I wanted to know if there is any way customize pdf branding
> (formatting colors etc).

That's currently undocumented, but you can adjust the file
mwlib/rl/pdfstyles.py or – better – create a file customconfig.py, put it
somewhere in your PYTHONPATH and just override the settings made in
pdfstyles.py you want to change (this file is imported at the bottom
of pdfstyles.py).

Kunal

Jul 14, 2008, 3:05:10 PM
to mwlib
Thanks for the tip.

Regarding the PDF rendering: it is now able to render the PDF, but I am
not getting a .pdf file for download, and the browser displays what I
believe is raw PDF code.


%PDF-1.3 %“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
% 'BasicFonts': class PDFDictionary 1 0 obj % The standard fonts
dictionary << /F1 2 0 R /F2+0 13 0 R /F3+0 17 0 R /F4+0 21 0 R /F5+0
25 0 R >> endobj % 'F1': class PDFType1Font 2 0 obj % Font Helvetica
<< /BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /
Type1 /Type /Font >> endobj % 'Page1': class PDFPage 3 0 obj % Page
dictionary << /Contents 29 0 R /MediaBox [ 0 0 595.2756 841.8898 ] /
Parent 28 0 R /Resources << /Font 1 0 R /ProcSet [ /PDF /Text /ImageB /
ImageC /ImageI ] >> /Rotate 0 /Trans << >> /Type /Page >> endobj %
'Page2': class PDFPage 4 0 obj % Page dictionary << /Contents 30 0 R /
MediaBox [ 0 0 595.2756 841.8898 ] /Parent 28 0 R /Resources << /Font
1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> /Rotate 0 /
Trans << >> /Type /Page >> endobj %
'FormXob.e14987a4f4460425f2e7af3e02f190c5': class PDFImageXObject 5 0
obj << /BitsPerComponent 8 /ColorSpace /DeviceRGB /Filter [ /
ASCII85Decode /FlateDecode ] /Height 527 /Length 163115 /SMask 6 0 R /
Subtype /Image /Type /XObject /Width 800 >> stream


Johannes Beigel

Jul 14, 2008, 3:13:22 PM
to mw...@googlegroups.com

On 14.07.2008 at 21:05, Kunal wrote:
> Regarding the PDF rendering . It is now able to render pdf but I am
> not getting .pdf file for download. and browser displays I belive PDF
> code.

That's strange. Could you copy the URL of the "Click here" link and
output the HTTP headers, e.g. with curl:

$ curl -D - -o test.pdf http://the-copied-url

There should be (among others) a header "Content-Type:
application/pdf". If there is no such header, the information got lost
when being transmitted from mw-serve to MediaWiki to your browser. (If
everything works, curl should have downloaded the PDF as test.pdf.)

Does right-clicking on the "Click here" link and saving produce a
valid PDF document?
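
The same header check can be scripted instead of read off the curl output. This is a rough sketch using only Python's standard library; fetch_headers and diagnose are hypothetical helper names, and the hint texts are assumptions about what each Content-Type most likely means, not messages produced by mwlib or the Collection extension.

```python
from urllib.request import urlopen


def fetch_headers(url, timeout=10):
    """Return (content_type, content_length) as reported by the server."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.headers.get("Content-Type"), resp.headers.get("Content-Length")


def diagnose(content_type):
    """Map a Content-Type header value to a short hint (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    kind = (content_type or "").split(";")[0].strip().lower()
    if kind == "application/pdf":
        return "ok: served as PDF"
    if kind == "application/json":
        return "error payload: read the body, it is probably a JSON error message"
    return "wrong or missing type (%r): a browser may display the PDF as text" % kind


# Example call; replace the placeholder with the copied "Click here" URL:
# content_type, length = fetch_headers("http://the-copied-url")
# print(diagnose(content_type), length)
```

Unlike a shell command, a URL passed as a Python string needs no extra quoting for characters such as "&".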

Kunal

Jul 14, 2008, 3:25:15 PM
to mwlib
I can save it as a PDF and open it with Adobe Reader.

Command output is
[root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl
[1] 8442
[root@kunal-sun a08c5fcf5cc8cd93]# HTTP/1.1 200 OK
Date: Mon, 14 Jul 2008 19:23:57 GMT
Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
X-Powered-By: PHP/5.2.5
Content-Length: 132
Content-Type: application/json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   132  100   132    0     0    639      0 --:--:-- --:--:-- --:--:--     0


On Jul 15, 12:13 am, Johannes Beigel <johannes.bei...@brainbot.com>
wrote:
> Am 14.07.2008 um 21:05 schrieb Kunal:
>
> > Regarding the PDF rendering . It is now able to render pdf but I am
> > not getting .pdf file for download. and browser displays I belive PDF
> > code.
>
> That's strange. Could you copy the URL of the "Click here" link and  
> output the HTTP headers, e.g. with curl:
>
> $ curl -D - -o test.pdf http://the-copied-url

Kunal

Jul 14, 2008, 3:39:49 PM
to mwlib
Just checked from a Windows XP machine (IE6). There it asks to download
a PDF file, but on Linux it is still the same.

On Jul 15, 12:25 am, Kunal <kunal...@gmail.com> wrote:
> I can save it as pdf and open it with adobe reader.
>
> Command output is
> [root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?c...

Johannes Beigel

Jul 14, 2008, 3:54:33 PM
to mw...@googlegroups.com
On 14.07.2008 at 21:25, Kunal wrote:
> I can save it as pdf and open it with adobe reader.

OK.

> Command output is
> [root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl
> [1] 8442
> [root@kunal-sun a08c5fcf5cc8cd93]# HTTP/1.1 200 OK
> Date: Mon, 14 Jul 2008 19:23:57 GMT
> Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
> X-Powered-By: PHP/5.2.5
> Content-Length: 132
> Content-Type: application/json
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   132  100   132    0     0    639      0 --:--:-- --:--:-- --:--:--     0

Huh, that looks completely wrong and shouldn't have produced a valid
PDF file either.
Only 132 bytes should have been downloaded. Could you check the
size of the output file with "ls -l test.pdf"? And see what "file
test.pdf" says? Maybe post the contents of the file (if it's JSON it
should be readable)?

Are you sure that this is the same link that works, when you right
click + save from the browser? How big is the PDF file when downloaded
via right click? It surely must be larger than 132 Bytes?
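
The checks requested above (size, file type, and contents if it is JSON) can be combined in a few lines of Python. This is a sketch only; sniff_download is a hypothetical helper, and it relies on nothing more than the fact that PDF files start with the bytes "%PDF-".

```python
import json


def sniff_download(path):
    """Classify a downloaded file (hypothetical helper).

    Returns ('pdf', size_in_bytes), ('json', parsed_value),
    or ('unknown', size_in_bytes).
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"%PDF-"):  # every PDF file begins with this marker
        return "pdf", len(data)
    try:
        # A small JSON body here usually means an error message, not a document.
        return "json", json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "unknown", len(data)


# Example call:
# print(sniff_download("test.pdf"))
```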

Kunal

Jul 14, 2008, 4:06:49 PM
to mwlib
The file size is 259.3 KB when I use "Save As".

Also, why does it work correctly in IE?


output
[1] 10163
[root@kunal-sun a08c5fcf5cc8cd93]# HTTP/1.1 200 OK
Date: Mon, 14 Jul 2008 20:04:41 GMT
Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
X-Powered-By: PHP/5.2.5
Content-Length: 132
Content-Type: application/json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   132  100   132    0     0    579      0 --:--:-- --:--:-- --:--:--     0


Kunal

Jul 14, 2008, 4:13:01 PM
to mwlib


Content of the test.pdf file:

{"error": "error executing command 'download': [Errno 2] No such file or directory: '/var/cache/mw-serve/a08c5fcf5cc8cd93/output.'"}

Johannes Beigel

Jul 14, 2008, 4:11:45 PM
to mw...@googlegroups.com
On 14.07.2008 at 22:06, Kunal wrote:

Ah ok: There should be single quotes '...' around the URL (the shell
gets confused by some characters in that URL). Could you post the
output with this change?

Johannes Beigel

Jul 14, 2008, 4:14:22 PM
to mw...@googlegroups.com
On 14.07.2008 at 22:06, Kunal wrote:
> Also why in ie its coming correctly ?

I don't know. If everything else is ok (that means curl should give
the correct Content-Type and Content-Length and fetch the PDF file),
there could be some problem with the other browser you are using. Are
there any settings for applications to open downloaded files with? Can
you download PDF files from other sites without problems?

Johannes Beigel

Jul 14, 2008, 4:18:20 PM
to mw...@googlegroups.com

OK, that should be fixed by putting the URL in single quotes when
passing it to curl on the command line (the "&writer=rl" part has been
cut off by the shell).
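
To see why the unquoted "&" matters: the shell treats it as "run in the background", so curl only receives the part of the URL before it and the writer parameter never reaches the server. A small Python sketch of the effect follows; the host name wiki.example and the ID abc123 are placeholders, not values from this thread.

```python
import shlex
from urllib.parse import urlsplit, parse_qs

# Placeholder URL with the same shape as the download link.
url = ("http://wiki.example/index.php/Special:Collection/download/"
       "?collection_id=abc123&writer=rl")

# Unquoted on the command line, the shell stops the curl command at '&',
# so curl effectively sees only this:
truncated = url.split("&", 1)[0]

params_full = parse_qs(urlsplit(url).query)
params_truncated = parse_qs(urlsplit(truncated).query)

# shlex.quote() wraps the URL in single quotes, making it safe to paste.
quoted = shlex.quote(url)

print(params_full)       # both collection_id and writer are present
print(params_truncated)  # writer is gone
print("curl -D - -o test.pdf " + quoted)
```

The missing writer parameter would also explain the file name ending in "output." in the JSON error message.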

Johannes Beigel

Jul 14, 2008, 4:22:24 PM
to mw...@googlegroups.com
On 14.07.2008 at 22:13, Kunal wrote:

BTW: It's a bug of the Collection extension to respond with JSON in
this case. It should return an HTTP error instead. I've opened a ticket:

http://code.pediapress.com/wiki/ticket/225

Kunal

Jul 14, 2008, 4:29:03 PM
to mwlib

>
> Ah ok: There should be single quotes '...' around the URL (the shell  
> gets confused by some characters in that URL). Could you post the  
> output with this change?

Output is
[root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf 'http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl'
HTTP/1.1 200 OK
Date: Mon, 14 Jul 2008 20:27:57 GMT
Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
X-Powered-By: PHP/5.2.5
Content-Length: 265505
Content-Type: text/html

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  259k  100  259k    0     0  1009k      0 --:--:-- --:--:-- --:--:-- 5346k

Johannes Beigel

Jul 14, 2008, 4:40:54 PM
to mw...@googlegroups.com

On 14.07.2008 at 22:29, Kunal wrote:
> Output is
> [root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf 'http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl'
> HTTP/1.1 200 OK
> Date: Mon, 14 Jul 2008 20:27:57 GMT
> Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
> X-Powered-By: PHP/5.2.5
> Content-Length: 265505
> Content-Type: text/html

Uh. That's bad. The Content-Type should be "application/pdf", not
"text/html". Could there be anything in your PHP/Apache setup that
changes the Content-Type after it gets returned by MediaWiki? Is the
saved file a valid PDF file? Could you replace these three lines (lines
615-617) in Collection.body.php

if ( isset( $headers['content-type'] ) ) {
    header( 'Content-Type: ' . $headers['content-type']);
}

with these two lines

header( 'Content-Type: application/pdf' );
header( 'X-Foo: bar' );

and check if the output from curl changes?

Are you using the latest release version of MediaWiki or the svn
version? And is the Collection extension the latest svn version?

(BTW: This means that your browser on Linux does the Right Thing and
actually IE shouldn't download the file as PDF, but that's of no real
importance right here :-)

Kunal

Jul 14, 2008, 4:57:43 PM
to mwlib
Oh wow. You are great. :)

Now I am able to download pdf file.

MediaWiki: latest release
Collection: "1.0pre"

New output:
[root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf 'http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl'
HTTP/1.1 200 OK
Date: Mon, 14 Jul 2008 20:55:09 GMT
Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
X-Powered-By: PHP/5.2.5
X-Foo: bar
Content-Length: 265505
Content-Type: application/pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  259k  100  259k    0     0  1050k      0 --:--:-- --:--:-- --:--:-- 5982k

Kunal

Jul 14, 2008, 5:12:47 PM
to mwlib
My god, I am so impressed. I once read on a blog that the developers
of the Collection extension are very responsive and helpful. And I am so
happy to have your help.

Thank you so much.

BTW: I am from Bangalore, India, and it's 2 am for me. My wife is
shouting at me :). Do you have anything else to add? What about my
software versions?

Johannes Beigel

Jul 14, 2008, 5:15:17 PM
to mw...@googlegroups.com
On 14.07.2008 at 22:57, Kunal wrote:
> Oh wow. You are great. :)

Thanks! :-)

But actually: not really... :-/

This was not really a fix, just some "debugging" code to test where
the error might occur. Most probably the correct download of ODF files
has stopped working because of setting a fixed Content-Type of
application/pdf.

Could you insert the following debugging lines

var_dump($headers);
return;

before the line (626)

wfResetOutputBuffers();

This should disable the download, but insert some raw output from PHP
at the top of the page that is returned when you click on the "Click
here" link. Could you copy and paste this output? (It's easiest to copy
& paste when you do "View Page Source" in your browser; the text
should be at the very beginning of the page source.)


Kunal

Jul 14, 2008, 5:22:41 PM
to mwlib
This is the output I've got



array(3) {
  ["date"]=>
  string(29) "Mon, 14 Jul 2008 21:21:27 GMT"
  ["server"]=>
  string(27) "WSGIServer/0.1 Python/2.5.2"
  ["content-length"]=>
  string(6) "265505"
}

Kunal

Jul 14, 2008, 5:26:24 PM
to mwlib
By the way, ODF is still working fine.

Johannes Beigel

Jul 14, 2008, 5:54:24 PM
to mw...@googlegroups.com

Oh, could you execute the following command

$ mw-render --writer-info rl

and post the output? The output *should* contain a Content-Type line
with application/pdf. If it does not, most probably, the fix for the
download problem is to update mwlib and especially mwlib.rl to the
latest Mercurial versions.

Kunal

Jul 14, 2008, 10:02:32 PM
to mwlib
the output is

[kunal@kunal-sun ~]$ mw-render --writer-info rl
Description: PDF documents (using ReportLab)


Kunal

Jul 14, 2008, 10:17:57 PM
to mwlib
I have updated both mwlib and mwlib.rl from the Mercurial repositories,
and it seems to be working fine now.

Thanks

Johannes Beigel

Jul 15, 2008, 4:11:37 AM
to mw...@googlegroups.com
On 15.07.2008 at 04:17, Kunal wrote:
> I have updated mwlib and mwlib.rl both with mercurial repositories and
> it seems to be working fine now.

I'm glad to hear this. So we'll update mwlib and mwlib.rl on the
Python Package Index soon.

Regards,
Johannes Beigel


Kunal

Jul 16, 2008, 3:32:36 AM
to mwlib
Yes, that will help. I was actually trusting the Python repository, and
a lot of people still do.

BTW: I was wondering, now that you have XML also in place, whether you
have any plans for formatted output using XSL style sheets (I
know that DocBook is supported). And how can I help?

And again, thanks a lot; you are very helpful.

Regards
Kunal


Volker Haas

Jul 16, 2008, 4:03:59 AM
to mw...@googlegroups.com
Hi Kunal


harrydeo2006 wrote:
> Yes that will help. I was actually trusting the python repository. and
> lot of people still does.
>

We are aware of that and we'll try to make sure this is not a problem
anymore in the future.


> BTW: I was wondering now that you have XML also in place if you guys
> have any plan to plan with formatted output using XSL style sheet (I
> know that docbook is supported). And how can I help ?
>
>

Any help is always appreciated, and I am sure Heiko or Jojo can give you
some details on that. Since both are out of office, attending the
Wikimania, you might not get a definitive reply from them until early
next week.


> And again thanks a lot again, you are very helpful.
>

Great to hear ;)

Best,

Volker

--
volker haas brainbot technologies ag
fon +49 6131 2116394 boppstraße 64
fax +49 6131 2116392 55118 mainz
volke...@brainbot.com http://www.brainbot.com/

Kunal

Jul 16, 2008, 1:19:59 PM
to mwlib

>
> Any help is always appreciated, and I am sure Heiko or Jojo can give you
> some details on that. Since both are out of office, attending the
> Wikimania, you might not get a definitive reply from them until early
> next week.

That's nice to hear. I would be more than happy to help this project.

Thanks
Kunal

Heiko Hees

Jul 24, 2008, 8:22:34 AM
to mw...@googlegroups.com
Kunal wrote:
> BTW: I was wondering now that you have XML also in place if you guys
> have any plan to plan with formatted output using XSL style sheet (I
> know that docbook is supported). And how can I help ?

DocBook support is currently little more than a proof-of-concept
implementation. There are some open issues like image sizes, MathML, DTD
enforcement, etc. Any help to work on the open issues is appreciated.

Nonetheless, basic XSLTs using jade works:
docbook2pdf -l /usr/share/sgml/declaration/xml.dcl -e no-valid t.xml
(t.xml being a file generated by mw-render with the -w docbook option)

If you have a certain application in mind let us know. I'd happily
assist you with that.

Heiko

Kunal

Aug 1, 2008, 4:11:01 AM
to mwlib
> Nonetheless, basic XSLTs using jade works:
> docbook2pdf -l /usr/share/sgml/declaration/xml.dcl -e no-valid t.xml
> (t.xml being a file generated by mw-render with the -w docbook option)
>
> If you have a certain application in mind let us know. I'd happily
> assist you with that.

Actually, I was thinking of a way to map elements from the wiki to DocBook
or any other DTD (XSD) and then submit it to any XSLT processor for
customised output.
How about integrating a DITA Toolkit-like interface here
(http://dita-ot.sourceforge.net)?


Heiko Hees

Aug 1, 2008, 10:13:06 AM
to mw...@googlegroups.com
Hi Kunal,

>> Nonetheless, basic XSLTs using jade works:
>> docbook2pdf -l /usr/share/sgml/declaration/xml.dcl -e no-valid t.xml
>> (t.xml being a file generated by mw-render with the -w docbook option)
>>
>> If you have a certain application in mind let us know. I'd happily
>> assist you with that.
>
> Actually I was thinking of a way to map elements from wiki to DocBook
> or any other DTD (XSD) and then submit it to any XSLT processor for
> cutomised output.

My above example uses jade with DSSSL (not XSLT as I claimed), but using
XSLT with the docbook export should also work.

> How about integrating a DITA toolkit like interface here
> http://dita-ot.sourceforge.net.

I'd personally prefer to see some work on the DocBook writer, since
there are many DocBook related open source tools available
(http://www.dpawson.co.uk/docbook/tools.html), while DITA is not too
popular yet.

Heiko

Kunal

Aug 4, 2008, 2:14:10 AM
to mwlib
> I'd personally prefer to see some work on the DocBook writer, since
> there are many DocBook related open source tools available
> (http://www.dpawson.co.uk/docbook/tools.html), while DITA is not too
> popular yet.
>

Thanks for the reply.
I was not particularly stressing any DTD, whether it is DocBook or DITA. I
meant the implementation of a generic XSD and XSLT framework.

I have the following in mind:
1. Output the wiki to XML (any valid XML; currently it would be
DocBook). Also, I have heard that MediaWiki's dumpbackup.php dumps
into an XML file (not sure of which schema).
2. Then use an open source XML converter to convert it to a popular
XML format (or multiple), even docx.
3. From the generated XML we can now apply various XSLs to produce the
kind of output we want.

What do you say ?

Kunal

Heiko Hees

Aug 5, 2008, 11:54:41 AM
to mw...@googlegroups.com
Hi Kunal,

thanks for your proposal.

Kunal wrote:
> I have following in my mind:
> 1. Out put wiki to xml (Any valid XML, currently it would be
> docbook). Also, I have heard that MediaWiki's dumpbackup.php dumpes
> into an XML file (not sure of which schema).

By wiki, do you mean all articles of a wiki site? This should be
possible using the cdbwiki with the docbook-writer.

dumpbackup.php exports XML but the articles are still represented as
wikitext, not XML.

> 2. Then use an open source XML converter to convert it to a popular
> XML format (or multiple), even docx.

DocBook is popular ;) regarding docx, you may want to have a look at the
ODFExport : http://www.mediawiki.org/wiki/Extension:OpenDocument_Export

> 3. Form the generated XML we can now apply various XSLs to produce the
> kind of output we want.

Yes, it should already be possible to derive various output using XSLs
with the DocBook-writer output.

> What do you say ?

It would be great if someone would test and document some
mwlib-DocBook-XSLTs tool chains. I'd be happy to assist in this endeavor.

Heiko

Keith Fahlgren

Aug 5, 2008, 12:44:10 PM
to mw...@googlegroups.com
On Tue, Aug 5, 2008 at 8:54 AM, Heiko Hees <heiko...@pediapress.com> wrote:
>> 2. Then use an open source XML converter to convert it to a popular
>> XML format (or multiple), even docx.
>
> DocBook is popular ;) regarding docx, you may want to have a look at the
> ODFExport : http://www.mediawiki.org/wiki/Extension:OpenDocument_Export

There are now alpha-quality DocBook->ODF stylesheets in the DocBook-XSL project.

>> 3. Form the generated XML we can now apply various XSLs to produce the
>> kind of output we want.
>
> Yes, it should already be possible to derive various output using XSLs
> with the DocBook-writer output.

The challenge here will be ensuring valid DocBook output, which is
often much more difficult than it first appears, especially with
complex input.

Keith

Heiko Hees

Aug 5, 2008, 1:11:27 PM
to mw...@googlegroups.com
Keith Fahlgren wrote:
>> DocBook is popular ;) regarding docx, you may want to have a look at the
>> ODFExport : http://www.mediawiki.org/wiki/Extension:OpenDocument_Export
>
> There's now alpha-quality DocBook->ODF stylesheets in the DocBook-XSL project.

We also considered this option. But after some investigation and talking
to the people from sun, we decided to directly generate ODF.

>> Yes, it should already be possible to derive various output using XSLs
>> with the DocBook-writer output.
>
> The challenge here will be ensuring valid DocBook output, which is
> often much more difficult than it first appears, especially with
> complex input.

There is a related ticket:
http://code.pediapress.com/wiki/ticket/219

Since we handled the DTD issue for PDF and ODF, I am pretty sure that
this will also be possible for DocBook.

Heiko
