
Re: Using python to convert PDF document to MSWord documents


Timothy Grant

Sep 28, 2004, 12:31:04 PM
to JEET, pytho...@python.org
----- Original Message -----
From: JEET <hjee...@yahoo.com>
Date: Tue, 28 Sep 2004 17:13:17 +0100 (BST)
Subject: Using python to convert PDF document to MSWord documents
To: pytho...@python.org




Hello All,

Can anyone please tell me whether there are any Python modules available
to convert PDF documents to MS Word documents? If not, can you please
suggest how I can achieve this?

Many thanks in advance,

Regards
Deb

======

What you ask is quite difficult. My understanding is that PDF files
are simply PostScript files with some special wrapping. Depending on
the nature of the PDF (is it encrypted, are there other special
provisions?) you may be able to strip the raw text from the file and
create an RTF file from it. However, you will lose all formatting in
this case. If the formatting is "standard" across all the PDFs, you may
be able to infer from the text something that will allow you to
restore some or all of it.
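
As a rough illustration of the "raw text to RTF" step, here is a minimal
sketch. It assumes the text has already been extracted from the PDF by some
other means; the function names are made up for this example, and the RTF
header is about the smallest one that Word is likely to accept. No formatting
is carried over.

```python
import struct


def _rtf_escape(ch):
    """Escape one character for use in an RTF body (hypothetical helper)."""
    if ch in "\\{}":
        return "\\" + ch              # backslash and braces are RTF syntax
    if ch == "\n":
        return "\\par\n"              # newline becomes a paragraph break
    if ord(ch) < 128:
        return ch                     # plain ASCII passes through
    # Non-ASCII: emit \uN per UTF-16 code unit, N as a signed 16-bit
    # decimal, followed by "?" as the fallback for old readers.
    units = ch.encode("utf-16-le")
    return "".join(
        "\\u%d?" % struct.unpack("<h", units[i:i + 2])[0]
        for i in range(0, len(units), 2)
    )


def text_to_rtf(text):
    """Wrap plain text in a minimal RTF document; all layout is lost."""
    body = "".join(_rtf_escape(ch) for ch in text)
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + body + "}"
```

The result is pure ASCII, so it can be written to a .rtf file with any
ASCII-compatible encoding and then opened in Word.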

--
Stand Fast,
tjg.

Daniel Dittmar

Sep 28, 2004, 1:18:19 PM
> From: JEET <hjee...@yahoo.com>

> Can anyone please tell me whether there are any Python modules available
> to convert PDF documents to MS Word documents? If not, can you please
> suggest how I can achieve this?

No Python modules, but:
- feeding the subject line to Google brings up some sponsored links that
claim to solve your problem
- http://www.quiss.org/swftools/ has a tool to convert PDF to Flash, so
there must be some code in it to detect text, fonts, etc.

Daniel

Cameron Laird

Sep 28, 2004, 2:08:03 PM
In article <mailman.4018.1096389...@python.org>,
Timothy Grant <timoth...@gmail.com> wrote:
.
.
.

>Can anyone please tell me whether there are any Python modules available
>to convert PDF documents to MS Word documents? If not, can you please
>suggest how I can achieve this?
.
.
.
<URL: http://phaseit.net/claird/comp.text.pdf/PDF_converters.html >

Ksenia Marasanova

Sep 28, 2004, 5:30:23 PM
to pytho...@python.org

Pdf2swf is based on xpdf (http://www.foolabs.com/xpdf).
Another tool that is also based on xpdf is pdftohtml
(http://pdftohtml.sourceforge.net/). It can convert PDF to HTML (using
absolute CSS positioning) or to XML. I don't know if there are any RTF
or Word writers in Python, but in my previous VB life I programmed a
simple Word macro that would open an HTML page and save it as a .doc
document. It was the easiest way to get all images embedded and the
formatting done correctly. I don't know, however, how it will handle
absolute positioning.
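
For the pdftohtml step, a small wrapper might look like the sketch below. The
flags (-c for the absolutely positioned "complex" output, -noframes for a
single HTML file, -xml for XML output) are the ones I remember from the
pdftohtml help text, so check them against `pdftohtml -h` on your version.
The function names are only for illustration.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, out_base, xml=False):
    """Build the argument list for pdftohtml without running it."""
    cmd = ["pdftohtml"]
    if xml:
        cmd.append("-xml")            # XML output with per-fragment coordinates
    else:
        cmd += ["-c", "-noframes"]    # positioned HTML, no frameset
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path, out_base, xml=False):
    """Run pdftohtml if it is installed; raise a clear error otherwise."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found on PATH")
    subprocess.run(pdftohtml_command(pdf_path, out_base, xml), check=True)
```

Keeping the command construction separate from the call makes it easy to
print and inspect the command line before running it on a batch of files.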

Another possible option is to convert the PDF to PS format, and then use
pstoedit (http://www.pstoedit.net/pstoedit) with the shareware RTF plugin
mentioned on that page. I don't have any experience with this option.

Ksenia.

Jan Gregor

Oct 2, 2004, 2:04:53 PM
> Can anyone please tell me whether there are any Python modules available
> to convert PDF documents to MS Word documents? If not, can you please
> suggest how I can achieve this?

I think that there's no specification of the .doc format. PDF and .doc are
also different classes of format. So you can extract the text (with the
Ghostscript frontend ps2ascii, hoping for the right encoding) and the
pictures. Typesetting the Word document is then your work.
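
The "hope for the right encoding" part can be made a bit less of a gamble.
The sketch below assumes ps2ascii is on the PATH and writes the text to
standard output when given only an input file; the list of encodings to try
is a guess, not something ps2ascii reports, and the function names are made
up for this example.

```python
import shutil
import subprocess


def decode_best_effort(raw, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn; fall back to Latin-1, which never fails."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def extract_text(path):
    """Run ps2ascii on a PS or PDF file and return the decoded text."""
    if shutil.which("ps2ascii") is None:
        raise RuntimeError("ps2ascii (part of Ghostscript) not found on PATH")
    result = subprocess.run(["ps2ascii", path], stdout=subprocess.PIPE, check=True)
    return decode_best_effort(result.stdout)
```

A successful decode does not prove the encoding was the right one, so it is
still worth checking a few accented characters by eye.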

Maybe converting the PDF to HTML and importing the HTML into Word is a
better way, but again, you go from a stronger language to a weaker one.


Jan

Steve Holden

Oct 2, 2004, 10:24:01 PM
Timothy Grant wrote:

> ----- Original Message -----
> From: JEET <hjee...@yahoo.com>
> Date: Tue, 28 Sep 2004 17:13:17 +0100 (BST)
> Subject: Using python to convert PDF document to MSWord documents
> To: pytho...@python.org
>
>
>
>
> Hello All,
>
> Can anyone please tell me whether there are any Python modules available
> to convert PDF documents to MS Word documents? If not, can you please
> suggest how I can achieve this?
>
> Many thanks in advance,
>

One of the problems with such a module would be that PDF is primarily a
display format, and so the structure of the file doesn't necessarily
conform with the structure of the document.
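
To make that concrete: a converter typically gets positioned text fragments
in whatever order the file stores them, and has to guess the lines itself. A
minimal sketch, assuming hypothetical (x, y, text) tuples with y growing
downward and a made-up tolerance for "same line":

```python
def fragments_to_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into lines of text in reading order."""
    lines = []  # each entry: [y of the line's first fragment, [(x, text), ...]]
    # Sort top to bottom, then left to right, so file order no longer matters.
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        if lines and abs(y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append((x, text))    # close enough: same line
        else:
            lines.append([y, [(x, text)]])    # start a new line
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]


fragments = [
    (120, 100.5, "world"),
    (10, 300, "Second"),
    (10, 100, "Hello"),
    (80, 300, "line"),
]
print(fragments_to_lines(fragments))  # ['Hello world', 'Second line']
```

Even this only recovers lines. Paragraphs, columns, headings and tables all
need further guesswork, which is why the results are rarely clean.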

regards
Steve

sup...@convertzone.com

Jan 4, 2005, 11:22:08 AM
You can use "pdf to word". It can batch convert PDF files to Word or
text in one pass, keeping the source layout. It is standalone software:
MS Word, Adobe Acrobat and Reader are NOT required. You can get more
information from
http://www.convertzone.com/net/cz-PDF%20to%20Word-1-1.htm.

ConvertZone Support team
ConvertZone Software Co,.ltd
http://www.convertzone.com
sup...@convertzone.com

************************************************************
ConvertZone provides office(PDF, Word, Excel, PowerPoint, AutoCAD etc),
video(DVD, VCD, SVCD etc), audio(MP3, WAV, MIDI etc), image(JPG, GIF,
TIF, BMP etc) file converter.
************************************************************

zbin...@gmail.com

Nov 6, 2015, 2:08:36 AM
I'm not a developer; I always use this free online PDF to Word converter, http://www.online-code.net/pdf-to-word.html, to convert PDF to MS Word documents online.