[Python-il] Python docx package with Hebrew support


גל וין gal vine

May 5, 2013, 5:24:14 AM
to pyth...@hamakor.org.il

Hey all,

 

I'm trying to use the docx package to parse docx files that contain Hebrew, without much success.

I also need to work with templates and/or bookmarks in those documents, but the package is poorly documented, and I need some help.

 

Has any of you had the pleasure of using it, or of working (successfully) with win32com.client to parse Word documents?

 

Thanks in advance,

Gal

Israel Nature and Parks Authority

 

 

גל וין

Management Plans

Science Division

gal...@npa.org.il

gal...@gmail.com

0522-250403

 

Amit Aronovitch

May 5, 2013, 6:26:44 AM
to pyth...@hamakor.org.il
On Sun, May 5, 2013 at 12:24 PM, גל וין gal vine <gal...@npa.org.il> wrote:

> Hey all,
>
> I'm trying to use the docx package to parse docx files that contain Hebrew, without much success.
>
> I also need to work with templates and/or bookmarks in those documents, but the package is poorly documented, and I need some help.

 



Nope, never done that. Just some general comments:

The project seems active on GitHub. If something breaks when you use Hebrew but works otherwise, you can try opening an issue:
https://github.com/mikemaccana/python-docx/issues
(Also let us know: there's a local forum for promoting Hebrew/bidi-related issues in free software.)
 

> Has any of you had the pleasure of using it, or of working (successfully) with win32com.client to parse Word documents?

 

 
Haven't done that either (I did use win32com to communicate with Outlook, but that was a long time ago).

Another idea you might try, in case the Python ODF tools are more mature:
convert to ODF (using the LibreOffice/OpenOffice docx filter, which is quite good), then try ezodf/odfpy...


Just my 2c,

    AA
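
For anyone trying the ODF route suggested above, here is a rough sketch of how it might look from Python. It assumes LibreOffice is installed and that its command-line binary is reachable as `soffice` (on some systems it is `libreoffice` instead), and that odfpy is installed for the reading step. The function names are made up for illustration, and none of this has been checked against Hebrew documents specifically.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Return the argv for a headless LibreOffice conversion of docx_path to .odt.

    The binary name "soffice" is an assumption; adjust it for your system.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "odt",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_odt(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the .odt is expected."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".odt")


def read_odt_paragraphs(odt_path):
    """Return the text of each paragraph element, using odfpy.

    odfpy is imported here so the rest of the sketch works without it.
    """
    from odf import teletype, text
    from odf.opendocument import load

    document = load(str(odt_path))
    return [teletype.extractText(p) for p in document.getElementsByType(text.P)]
```

Typical use would be `read_odt_paragraphs(convert_to_odt("report.docx", "out"))`, where `report.docx` and `out` are placeholder names.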

asaf greenberg

May 5, 2013, 3:27:53 PM
to pyth...@hamakor.org.il

As a last resort, if you don't mind the structure too much, you can always extract the DOCX file (it's a ZIP file) and retrieve your particular item using lxml, BeautifulSoup, plain html2text, or even regular expressions.
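
The unzip-and-parse approach above could be sketched roughly like this, using only the standard library (`zipfile` plus `xml.etree.ElementTree`, swapped in here for lxml). It assumes the main body lives in `word/document.xml`, which is the usual location in a DOCX archive. The helper names `read_docx` and `make_sample_docx` are hypothetical, and the sample document is a stripped-down stand-in, not a file Word would necessarily open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def read_docx(source):
    """Return (paragraphs, bookmark_names) from a .docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    # Parsing bytes lets the parser honour the file's own encoding declaration,
    # so Hebrew text comes back as ordinary Python strings.
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text:
            paragraphs.append(text)

    # Bookmarks are marked by <w:bookmarkStart w:name="..."> elements.
    bookmarks = [b.get(W + "name") for b in root.iter(W + "bookmarkStart")]
    return paragraphs, bookmarks


def make_sample_docx():
    """Build a tiny in-memory archive holding only word/document.xml (demo data)."""
    document_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        '<w:p><w:bookmarkStart w:id="0" w:name="greeting"/>'
        '<w:r><w:t>שלום</w:t></w:r><w:r><w:t xml:space="preserve"> עולם</w:t></w:r>'
        '<w:bookmarkEnd w:id="0"/></w:p>'
        '<w:p><w:r><w:t>Hello</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml.encode("utf-8"))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    paragraphs, bookmarks = read_docx(make_sample_docx())
    print(paragraphs)
    print(bookmarks)
```

This only reads the logical text and bookmark names. Right-to-left display is left to whatever renders the strings, and text inside headers, footers, or footnotes lives in other parts of the archive.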