
How to convert .doc file to .txt in Python


subhabrat...@gmail.com

Apr 9, 2015, 6:26:09 AM
Dear Group,

I am trying to convert a .doc file to a .txt file.

I came across python-docx and zipfile, but they do not seem to help me much.

Could you kindly suggest how to convert from .doc to .docx/.html/.pdf/.rtf? I am able to convert any of those to .txt.

I would be grateful if any of the Python experts could help.

Regards,
Subhabrata Banerjee.

subhabrat...@gmail.com

Apr 9, 2015, 6:50:32 AM
It seems I could get one approach running with:
>>> import win32com.client as win32
>>> word = win32.Dispatch("Word.Application")
>>> word.Visible = 0
>>> word.Documents.Open("/python27/Document1.doc")
<COMObject Open>
>>> doc = word.ActiveDocument

This seems to work. Please suggest a better approach if there is one.

Tim Golden

Apr 9, 2015, 6:53:55 AM
to pytho...@python.org
There are several approaches, but this one will work (assuming you are
on Windows and have the pywin32 package installed):

<code>
import os
import win32com.client

DOC_FILEPATH = "c:/temp/something.docx"
doc = win32com.client.GetObject(DOC_FILEPATH)
text = doc.Range().Text

#
# do something with the text...
#
with open("something.txt", "wb") as f:
    f.write(text.encode("utf-8"))

os.startfile("something.txt")

</code>
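
A side note on python-docx and zipfile, mentioned at the start of the thread: both only understand .docx, which is a ZIP archive of XML parts. A legacy .doc is a binary format, so neither can open it directly. Once a document is available as .docx, though, the text can be pulled out with the standard library alone. What follows is only a minimal sketch under some assumptions: `docx_to_text` is a name made up for this example, it reads just the main document part (`word/document.xml`), and it ignores headers, footers, footnotes and any formatting.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration; accepts a path or a file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every text run (w:t) that belongs to this paragraph.
        runs = (node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling `docx_to_text("something.docx")` and writing the result to a file opened with `encoding="utf-8"` would then give the .txt version.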

TJG

Tim Chase

Apr 9, 2015, 7:46:15 AM
to pytho...@python.org
On 2015-04-09 03:25, subhabrat...@gmail.com wrote:
> You may kindly suggest how to convert from .doc
> to .docx/.html/.pdf/.rtf as from them I am being able to convert
> to .txt.

Use an external tool such as "wv", "antiword", or "catdoc" that has
already done the hard work for you.
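
Driving one of those tools from Python could look roughly like the sketch below. It assumes Python 3.7 or later, that "antiword" or "catdoc" is installed and on the PATH, and that the tool prints the extracted text to standard output when given the file name as its only argument. The helper names `find_converter` and `doc_to_text` are made up for this example, and the output encoding depends on how the tool is configured, so the decode step is a guess rather than a guarantee.

```python
import shutil
import subprocess

# Command-line converters to look for, tried in this order.
CANDIDATE_TOOLS = ("antiword", "catdoc")


def find_converter(candidates=CANDIDATE_TOOLS, which=shutil.which):
    """Return the path of the first candidate tool found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


def doc_to_text(doc_path, converter=None):
    """Run an external converter on doc_path and return what it prints."""
    converter = converter or find_converter()
    if converter is None:
        raise RuntimeError("no .doc converter (antiword/catdoc) found on PATH")
    # check=True raises CalledProcessError if the tool exits with an error.
    result = subprocess.run([converter, doc_path], capture_output=True, check=True)
    # Assumption: the tool emits UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

The text returned by `doc_to_text("Document1.doc")` can then be written to a .txt file in the usual way.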

-tkc




subhabrat...@gmail.com

Apr 9, 2015, 10:40:15 AM
On Thursday, April 9, 2015 at 4:23:55 PM UTC+5:30, Tim Golden wrote:
Thanks, Tim. It is slightly better than my solution.