node module for docx to html

Arvind T

Nov 18, 2014, 12:44:47 AM
to nod...@googlegroups.com
Hi,
Is there any Node module for converting docx to HTML without losing the formatting?
I have tried mammoth, but the formatting is getting lost.

Matt

Nov 18, 2014, 10:03:27 AM
to nod...@googlegroups.com
I ended up controlling OpenOffice via the node-java module to do this. It worked, used gobs and gobs of memory, and wasn't particularly fast, but hey it worked.

Unfortunately I no longer have the code available to me.

Matt.
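
For anyone wanting to try the office-suite route today, here is a rough sketch. It is not the node-java code described above (that code is not available); it swaps in a different technique, shelling out to a headless office binary. It assumes a LibreOffice-style `soffice` executable on the PATH that accepts `--headless --convert-to html --outdir`; the helper names `sofficeArgs`, `expectedOutput` and `convertDocxToHtml` are made up for illustration.

```javascript
const { execFile } = require('child_process');
const path = require('path');

// Argument list for a headless conversion (assumes LibreOffice-style flags).
function sofficeArgs(inputPath, outDir) {
  return ['--headless', '--convert-to', 'html', '--outdir', outDir, inputPath];
}

// Where the converted file is expected to land: same base name, .html extension.
function expectedOutput(inputPath, outDir) {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(outDir, base + '.html');
}

// Run the conversion; calls back with (err, htmlPath).
// Assumption: an executable named "soffice" is installed and on the PATH.
function convertDocxToHtml(inputPath, outDir, callback) {
  execFile('soffice', sofficeArgs(inputPath, outDir), (err) => {
    callback(err, err ? null : expectedOutput(inputPath, outDir));
  });
}
```

Something like `convertDocxToHtml('report.docx', '/tmp/out', (err, htmlPath) => { ... })` would then be the entry point. Expect the same trade-off mentioned above: it starts a whole office suite per conversion, so it is heavy on memory and not fast.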

Ken

Nov 19, 2014, 3:18:21 PM
to nod...@googlegroups.com
I did some work on this using XSL a few years ago, which luckily I seem to have saved. This transform was designed to replace the XSL at the heart of SharePoint's Word-to-HTML converter. That converter handled extracting the XML from the docx (which is really just a zip file), so you'd have to build a bit of infrastructure around it, but it could get you started:
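
Separate from the XSL itself: as a rough sketch of the "docx is really just a zip file" part, the main document XML can be pulled out with nothing but Node's built-in `fs` and `zlib`. This is a minimal reader under stated assumptions (no ZIP64, no encryption, only stored or deflated entries), not a replacement for a proper zip library, and `readZipEntry` / `readDocumentXml` are names made up here. The entry name `word/document.xml` is where a docx normally keeps its body.

```javascript
const fs = require('fs');
const zlib = require('zlib');

// Read one entry out of a zip archive held in a Buffer.
// Minimal: handles only stored (0) and deflated (8) entries.
function readZipEntry(zip, wantedName) {
  // The end-of-central-directory record sits near the tail of the file.
  const eocd = zip.lastIndexOf(Buffer.from([0x50, 0x4b, 0x05, 0x06]));
  if (eocd < 0) throw new Error('not a zip archive');
  const entryCount = zip.readUInt16LE(eocd + 10);
  let pos = zip.readUInt32LE(eocd + 16); // start of the central directory

  for (let i = 0; i < entryCount; i++) {
    if (zip.readUInt32LE(pos) !== 0x02014b50) throw new Error('bad central directory');
    const method = zip.readUInt16LE(pos + 10);
    const compressedSize = zip.readUInt32LE(pos + 20);
    const nameLen = zip.readUInt16LE(pos + 28);
    const extraLen = zip.readUInt16LE(pos + 30);
    const commentLen = zip.readUInt16LE(pos + 32);
    const localOffset = zip.readUInt32LE(pos + 42);
    const name = zip.toString('utf8', pos + 46, pos + 46 + nameLen);

    if (name === wantedName) {
      // The local header carries its own name/extra lengths before the data.
      const dataStart = localOffset + 30 +
        zip.readUInt16LE(localOffset + 26) + zip.readUInt16LE(localOffset + 28);
      const data = zip.subarray(dataStart, dataStart + compressedSize);
      if (method === 0) return Buffer.from(data);
      if (method === 8) return zlib.inflateRawSync(data);
      throw new Error('unsupported compression method ' + method);
    }
    pos += 46 + nameLen + extraLen + commentLen;
  }
  throw new Error(wantedName + ' not found');
}

// Convenience wrapper: return the main document XML of a .docx as a string.
function readDocumentXml(docxPath) {
  return readZipEntry(fs.readFileSync(docxPath), 'word/document.xml').toString('utf8');
}
```

The string returned by `readDocumentXml('some.docx')` is what you would then feed to an XSL processor or your own parser.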


Note that this was designed for a particular use case: allowing business users to update website content by uploading Word docs. We wanted the HTML to be semantic (e.g. a bulleted list in Word became a <UL> in HTML and a paragraph became a <P>, whereas the original DocX2HTML just used DIVs for everything). It doesn't try to preserve all inline formatting; instead it relies on mapping Word style names to CSS classes (the CSS stylesheets were created by hand).
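
To illustrate the style-name-to-CSS-class idea in Node terms (this is not the XSL, just a sketch of the same mapping), something like the following could work. The input shape (`style`, `text`, `listItem`), the `STYLE_TO_CLASS` table and its class names are all hypothetical; a real version would get the paragraphs by parsing the document XML.

```javascript
// Hypothetical table: Word style name -> CSS class (stylesheet written by hand).
const STYLE_TO_CLASS = new Map([
  ['Heading 1', 'page-title'],
  ['Quote', 'pull-quote'],
]);

function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// paragraphs: [{ style: 'Normal', text: '...', listItem: false }, ...]
// Consecutive list items are wrapped in a single <ul>; everything else is a <p>.
function toSemanticHtml(paragraphs) {
  const out = [];
  let inList = false;
  for (const p of paragraphs) {
    if (p.listItem && !inList) { out.push('<ul>'); inList = true; }
    if (!p.listItem && inList) { out.push('</ul>'); inList = false; }
    const cls = STYLE_TO_CLASS.get(p.style);
    const attr = cls ? ` class="${cls}"` : '';
    const tag = p.listItem ? 'li' : 'p';
    out.push(`<${tag}${attr}>${escapeHtml(p.text)}</${tag}>`);
  }
  if (inList) out.push('</ul>');
  return out.join('\n');
}
```

Styles with no entry in the table simply get no class, so inline formatting is dropped on purpose and the look of the page is left to the stylesheet.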

--Ken