How do I convert a Word document to reStructuredText?

Martin Hans

Oct 28, 2008, 12:35:19 PM
to sphin...@googlegroups.com
Hi,
Currently our product's user manual is written in Word, but we'd like to move all our documentation into Sphinx. Does anybody have an idea how I can do this at least partially automatically?
 
Thanks.
 
Martin

Tim Golden

Oct 28, 2008, 1:04:33 PM
to sphin...@googlegroups.com

(Sorry, very hurried reply.) Have a look at this thread:

http://mail.python.org/pipermail/python-list/2008-August/504299.html

or look at tools like antiword, which dump out the text of a Word
document. Possibly combine that with an html2rst module (I believe
a few exist).

TJG

Christophe de VIENNE

Oct 28, 2008, 2:24:29 PM
to sphin...@googlegroups.com
Another hurried answer:

I would try converting to OpenDocument first using OpenOffice, then
use sxw2rest, which is mentioned here:
http://docutils.sourceforge.net/docs/user/links.html#import
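The first step can be scripted. This sketch assumes a current LibreOffice install with `soffice` on the PATH (its `--headless --convert-to` switches postdate this thread); the helper names are hypothetical, and it is worth checking which input format your chosen converter actually expects before relying on the .odt output.

```python
import subprocess
from pathlib import Path


def soffice_convert_command(doc_path, outdir, fmt="odt"):
    """Build a headless LibreOffice conversion command (assumes modern `soffice`)."""
    return [
        "soffice",
        "--headless",
        "--convert-to", fmt,
        "--outdir", str(outdir),
        str(doc_path),
    ]


def convert_to_opendocument(doc_path, outdir="converted"):
    """Convert one .doc file to .odt and return the expected output path."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(soffice_convert_command(doc_path, outdir), check=True)
    return Path(outdir) / (Path(doc_path).stem + ".odt")
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to print or log the exact command before running a batch of documents.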

My 2 cents,

Christophe

2008/10/28 Martin Hans <mh...@gmx.net>

Bruce Eckel

Oct 28, 2008, 2:34:11 PM
to sphin...@googlegroups.com
Well, I've just converted my Word book to Sphinx, and "partial automation" is the key. Actually, you can automate a lot of it, but it's a cost-benefit analysis each time. For example, if I had had a lot more headers it would have made sense to write a VBA macro to put the reST underlines on the headers, but I just did it by hand.
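A rough Python equivalent of that header-underlining macro, run on the exported text rather than inside Word, might look like this. It assumes you can already tell which lines are headings and at what level: the `heading_levels` mapping is hypothetical (for instance built from Word's heading styles), and the underline characters are an arbitrary but consistent choice, since reST works out section levels from the order in which adornment styles first appear.

```python
# Hypothetical level -> adornment mapping (level 0 = top-level heading).
ADORNMENTS = ["=", "-", "~", "^"]


def rst_heading(title, level):
    """Return `title` underlined for heading `level`; raises IndexError past level 3."""
    title = title.strip()
    return title + "\n" + ADORNMENTS[level] * len(title)


def underline_headings(text, heading_levels):
    """Underline every line whose stripped text appears in `heading_levels`.

    `heading_levels` maps heading text to a level; building it is left to the caller.
    """
    out = []
    for line in text.splitlines():
        level = heading_levels.get(line.strip())
        if level is None:
            out.append(line)
        else:
            # Surround each title with blank lines so reST does not read it as body text.
            out.extend(["", rst_heading(line, level), ""])
    return "\n".join(out).strip("\n") + "\n"
```

With `{"Intro": 0, "Details": 1}`, "Intro" gets a `=` underline and "Details" a `-` underline, each separated from the surrounding paragraphs by blank lines.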

What I ended up doing was creating a build system that takes the Word doc, converts it to ASCII, and then does some post-processing (breaking it up into individual files, modifying the contents, etc.). But I kept the master in Word and just kept running the build until I was sure I had gotten out everything I wanted (and I jumped the gun on a few things, like stripping out some of the indexing; it seemed like a good idea at the time, but now I wish I hadn't). Until then I made all my edits in Word and reran the converter; eventually I felt I didn't need the Word doc anymore and just started editing in Sphinx.
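One piece of that post-processing, splitting the ASCII dump into one file per chapter, could be sketched as below. It assumes each chapter begins with a line like `Chapter 3 ...`; that marker, the `chapNN.rst` file naming, and the function names are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical chapter marker; adjust to whatever the exported text really uses.
CHAPTER_RE = re.compile(r"^Chapter\s+\d+", re.MULTILINE)


def split_chapters(text):
    """Split exported text into chunks that each start at a chapter marker.

    Any text before the first marker (title page, preface) becomes the first chunk.
    """
    starts = [m.start() for m in CHAPTER_RE.finditer(text)]
    bounds = [0] + starts + [len(text)]
    chunks = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c.strip()]  # drop empty leading chunk


def write_chapters(text, outdir):
    """Write each chunk to outdir/chapNN.rst and return the written paths."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_chapters(text)):
        path = out / f"chap{i:02d}.rst"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Because each run simply overwrites the chapter files, it fits the loop of editing in Word, re-exporting, and rebuilding until the Word master can be retired.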

To get you started, here's the part that automatically exports the Word doc to an ASCII text file:

import os

here = os.getcwd()
doc_path = os.path.join(here, "TIPython.doc")
txt_path = os.path.join(here, "TIPython.txt")

# Only re-export when the Word file is newer than the existing text dump.
if not os.path.exists(txt_path) or os.path.getmtime(doc_path) > os.path.getmtime(txt_path):
    import win32com.client
    from win32com.client import constants
    # EnsureDispatch generates the makepy wrappers, so constants.wdFormatText is defined.
    word = win32com.client.gencache.EnsureDispatch("Word.Application")
    word.Visible = False
    doc = word.Documents.Open(doc_path)
    doc.AcceptAllRevisions()  # bake tracked changes into the text
    doc.SaveAs(FileName=txt_path, FileFormat=constants.wdFormatText)
    doc.Close(False)  # SaveAs already wrote the file; don't save again
    word.Quit()       # don't leave an invisible Word instance running
    print("(TIPython.doc saved as TIPython.txt)")
--
Bruce Eckel

Yarko T

Oct 28, 2008, 9:23:17 PM
to sphin...@googlegroups.com
Bruce's approach sounds good. There's also a Python interface to OpenOffice, PyUNO <http://wiki.services.openoffice.org/wiki/Python> (some sample scripts there too).

Regards,
Yarko

Georg Brandl

Oct 29, 2008, 2:52:58 AM
to sphin...@googlegroups.com
Yarko T wrote:

> Bruce's approach sounds good. There's also a Python interface to Open
> Office. PyUNO (some sample scripts there too)
> <http://wiki.services.openoffice.org/wiki/Python>

Though I've heard you must have quite a high pain resistance to use it :)

Georg
