We need a tiny bit more information to help you with your problem.
What exactly do you mean by "wikipedia page" (what is on that page,
what is its URL)?
What is the output of mw-serve?
Maybe there is some relevant output in the errorlog of your webserver?
Regards,
Johannes Beigel
OK, so the ZIP file /var/cache/mw-serve/8da31a74d1eb7303/collection.zip
should have been generated successfully. Could you invoke this on the
command line to try to produce the PDF directly from this ZIP file?
$ mw-render -w rl -o test.pdf -c /var/cache/mw-serve/8da31a74d1eb7303/collection.zip
If everything works, there should be a file test.pdf afterwards.
Depending on the size of the article collection, this could take a
while. If there's no test.pdf file, could you post the output of the
command?
If the above does work, there must be some problem with the
communication from MediaWiki to mw-serve or from mw-serve to mw-
render. The output you posted didn't contain any errors, though: it's
perfectly OK if there are several render_status requests.
The "collection.zip" part is missing :-)
The ZIP file – the argument for the -c option of mw-render – should be
named (I hope that Google Groups doesn't break the line again):
/var/cache/mw-serve/8da31a74d1eb7303/collection.zip
This file should exist and be a valid ZIP file.
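If you want a quick programmatic check, here is a small sketch using Python's standard zipfile module (the check_collection_zip helper is my own name, not part of mwlib):

```python
import zipfile

def check_collection_zip(path):
    """Return True if `path` is a readable, well-formed ZIP archive."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member,
            # or None if all members pass their CRC checks.
            return zf.testzip() is None
    except (OSError, zipfile.BadZipFile):
        return False
```

Calling check_collection_zip("/var/cache/mw-serve/8da31a74d1eb7303/collection.zip") should then return True if the file is fine.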
I'd say you need the svn version of reportlab. Citing README.txt of
mwlib.rl:

*reportlab*
    the svn version is needed currently
    svn co http://www.reportlab.co.uk/svn/public/reportlab/trunk
    python setup.py install
That's currently undocumented, but you can adjust the file
mwlib/rl/pdfstyles.py or – better – create a file customconfig.py, put
it somewhere in your PYTHONPATH, and override the settings from
pdfstyles.py that you want to change (customconfig.py is imported at
the bottom of pdfstyles.py).
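For illustration, a customconfig.py might look like the following sketch. The setting names here are hypothetical examples, not the real ones — check pdfstyles.py for the actual variable names and their defaults:

```python
# customconfig.py -- put this somewhere on your PYTHONPATH.
# NOTE: these names are made-up examples; use the real settings
# found in mwlib/rl/pdfstyles.py.
page_width = 21.0        # hypothetical page geometry override
show_title_page = False  # hypothetical boolean override
```

Since this module is star-imported at the bottom of pdfstyles.py, any name you define here shadows the default of the same name.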
That's strange. Could you copy the URL of the "Click here" link and
output the HTTP headers, e.g. with curl:
$ curl -D - -o test.pdf http://the-copied-url
There should be (among others) a header "Content-Type:
application/pdf". If there is no such header, the information got lost
on the way from mw-serve to MediaWiki to your browser. (If everything
works, curl should have downloaded the PDF as test.pdf.)
Does right-clicking on the "Click here" link and saving produce a
valid PDF document?
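If you want to script that header check, here is a small sketch (the content_type helper is mine, not part of mwlib) that pulls the Content-Type out of a raw header block like the one "curl -D -" prints:

```python
def content_type(raw_headers):
    """Extract the Content-Type value from a raw HTTP header block."""
    for line in raw_headers.splitlines():
        if line.lower().startswith("content-type:"):
            return line.split(":", 1)[1].strip()
    return None

# Example header block as curl -D - would print it:
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/pdf\r\n"
    "Content-Length: 132\r\n"
)
print(content_type(sample))  # -> application/pdf
```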
OK.
> Command output is
> [root@kunal-sun a08c5fcf5cc8cd93]# curl -D - -o test.pdf
> http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/?collection_id=a08c5fcf5cc8cd93&writer=rl
> [1] 8442
> [root@kunal-sun a08c5fcf5cc8cd93]# HTTP/1.1 200 OK
> Date: Mon, 14 Jul 2008 19:23:57 GMT
> Server: Apache/2.2.8 (Mandriva Linux/PREFORK-6mdv2008.1)
> X-Powered-By: PHP/5.2.5
> Content-Length: 132
> Content-Type: application/json
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   132  100   132    0     0    639      0 --:--:-- --:--:-- --:--:--     0
Huh, that looks completely wrong and shouldn't have produced a valid
PDF file either.
Only 132 bytes should have been downloaded. Could you check the size
of the output file with "ls -l test.pdf"? And see what "file
test.pdf" says? Maybe post the contents of the file (if it's JSON, it
should be readable)?
Are you sure that this is the same link that works when you
right-click + save from the browser? How big is the PDF file when
downloaded via right click? It surely must be larger than 132 bytes?
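To take some guesswork out of the "ls -l" / "file" checks, here is a small sketch (the classify helper and its labels are mine) that distinguishes a real PDF, which starts with the bytes "%PDF", from a JSON error body:

```python
import json

def classify(data):
    """Roughly classify downloaded bytes as 'pdf', 'json', or 'unknown'."""
    if data.startswith(b"%PDF"):
        return "pdf"
    try:
        json.loads(data.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, ValueError):
        return "unknown"
```

For example, classify(open("test.pdf", "rb").read()) tells you at a glance whether the download was a PDF at all.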
Ah ok: There should be single quotes '...' around the URL (the shell
gets confused by some characters in that URL). Could you post the
output with this change?
I don't know. If everything else is ok (that means curl should give
the correct Content-Type and Content-Length and fetch the PDF file),
there could be some problem with the other browser you are using. Are
there any settings for applications to open downloaded files with? Can
you download PDF files from other sites without problems?
OK, that should be fixed by putting the URL in single quotes when
passing it to curl on the command line (the "&writer=rl" part has been
cut off by the shell).
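The effect of the missing quotes is easy to reproduce: the shell sees the unquoted '&', starts curl in the background, and curl only ever receives the URL up to (but not including) the '&'. A Python sketch of what the server then sees:

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://kunal-sun.com/kunwiki/index.php/Special:Collection/download/"
       "?collection_id=a08c5fcf5cc8cd93&writer=rl")

# Single-quoted in the shell: the full query string reaches the server,
# with both collection_id and writer present.
print(parse_qs(urlsplit(url).query))

# Unquoted: the shell cuts the command line at '&', so the writer
# parameter never reaches curl at all.
truncated = url.split("&", 1)[0]
print(parse_qs(urlsplit(truncated).query))
```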
BTW: It's a bug in the Collection extension to respond with JSON in
this case. It should return an HTTP error instead. I've opened a ticket:
Uh. That's bad. The Content-Type should be "application/pdf", not
"text/html". Could there be anything in your PHP/Apache setup that
changes the Content-Type after it is returned by MediaWiki? Is the
saved file a valid PDF file? Could you replace these three lines
(lines 615-617) in Collection.body.php
if ( isset( $headers['content-type'] ) ) {
    header( 'Content-Type: ' . $headers['content-type'] );
}
with these two lines
header( 'Content-Type: application/pdf' );
header( 'X-Foo: bar' );
and check if the output from curl changes?
Are you using the latest release version of MediaWiki or the svn
version? And is the Collection extension the latest svn version?
(BTW: This means that your browser on Linux does the Right Thing and
actually IE shouldn't download the file as PDF, but that's of no real
importance right here :-)
Thanks! :-)
But actually: not really... :-/
This was not really a fix, just some "debugging" code to test where
the error might occur. Most probably the correct download of ODF files
has stopped working because of setting a fixed Content-Type of
application/pdf.
Could you insert the following debugging lines
var_dump($headers);
return;
before line 626, i.e. before
wfResetOutputBuffers();
This should disable the download, but insert some raw output from PHP
at the top of the page that is returned when you click on the "Click
here" link. Could you copy and paste this output? (It's easiest to
copy & paste when you do "View Page Source" in your browser; the text
should be at the very beginning of the page source.)
Oh, could you execute the following command
$ mw-render --writer-info rl
and post the output? The output *should* contain a Content-Type line
with application/pdf. If it does not, most probably, the fix for the
download problem is to update mwlib and especially mwlib.rl to the
latest Mercurial versions.
I'm glad to hear this. So we'll update mwlib and mwlib.rl on the
Python Package Index soon.
Regards,
Johannes Beigel
harrydeo2006 wrote:
> Yes, that will help. I was actually trusting the Python repository,
> and a lot of people still do.
>
We are aware of that and we'll try to make sure this is not a problem
anymore in the future.
> BTW: I was wondering, now that you have XML in place, if you guys
> have any plans to play with formatted output using XSL stylesheets
> (I know that DocBook is supported). And how can I help?
>
>
Any help is always appreciated, and I am sure Heiko or Jojo can give you
some details on that. Since both are out of office attending
Wikimania, you might not get a definitive reply from them until early
next week.
> And again thanks a lot again, you are very helpful.
>
Great to hear ;)
Best,
Volker
--
volker haas brainbot technologies ag
fon +49 6131 2116394 boppstraße 64
fax +49 6131 2116392 55118 mainz
volke...@brainbot.com http://www.brainbot.com/
DocBook support is currently little more than a proof-of-concept
implementation. There are some open issues like image sizes, MathML, DTD
enforcement, etc. Any help to work on the open issues is appreciated.
Nonetheless, basic XSLTs using jade works:
docbook2pdf -l /usr/share/sgml/declaration/xml.dcl -e no-valid t.xml
(t.xml being a file generated by mw-render with the -w docbook option)
If you have a certain application in mind let us know. I'd happily
assist you with that.
Heiko
>> Nonetheless, basic XSLTs using jade works:
>> docbook2pdf -l /usr/share/sgml/declaration/xml.dcl -e no-valid t.xml
>> (t.xml being a file generated by mw-render with the -w docbook option)
>>
>> If you have a certain application in mind let us know. I'd happily
>> assist you with that.
>
> Actually I was thinking of a way to map elements from wiki to DocBook
> or any other DTD (XSD) and then submit it to any XSLT processor for
> customised output.
My above example uses jade with DSSSL (not XSLT as I claimed), but using
XSLT with the DocBook export should also work.
> How about integrating a DITA toolkit like interface here
> http://dita-ot.sourceforge.net.
I'd personally prefer to see some work on the DocBook writer, since
there are many DocBook related open source tools available
(http://www.dpawson.co.uk/docbook/tools.html), while DITA is not too
popular yet.
Heiko
Thanks for your proposal.
Kunal wrote:
> I have the following in mind:
> 1. Output wiki to XML (any valid XML; currently it would be
> DocBook). Also, I have heard that MediaWiki's dumpbackup.php dumps
> into an XML file (not sure of which schema).
By wiki, do you mean all articles of a wiki site? This should be
possible using the cdbwiki with the docbook-writer.
dumpbackup.php exports XML, but the articles are still represented as
wikitext, not XML.
> 2. Then use an open source XML converter to convert it to a popular
> XML format (or multiple), even docx.
DocBook is popular ;) Regarding docx, you may want to have a look at the
OpenDocument Export extension:
http://www.mediawiki.org/wiki/Extension:OpenDocument_Export
> 3. From the generated XML we can now apply various XSLs to produce the
> kind of output we want.
Yes, it should already be possible to derive various output using XSLs
with the DocBook-writer output.
> What do you say ?
It would be great if someone would test and document some
mwlib-DocBook-XSLTs tool chains. I'd be happy to assist in this endeavor.
Heiko
There are now alpha-quality DocBook->ODF stylesheets in the DocBook-XSL project.
>> 3. From the generated XML we can now apply various XSLs to produce the
>> kind of output we want.
>
> Yes, it should already be possible to derive various output using XSLs
> with the DocBook-writer output.
The challenge here will be ensuring valid DocBook output, which is
often much more difficult than it first appears, especially with
complex input.
Keith
We also considered this option. But after some investigation and talking
to the people from Sun, we decided to generate ODF directly.
>> Yes, it should already be possible to derive various output using XSLs
>> with the DocBook-writer output.
>
> The challenge here will be ensuring valid DocBook output, which is
> often much more difficult than it first appears, especially with
> complex input.
There is a related ticket:
http://code.pediapress.com/wiki/ticket/219
Since we handled the DTD issue for PDF and ODF, I am pretty sure that
this will also be possible for DocBook.
Heiko