[kramdown-users] Word docx to kramdown converter

Michael Franzl

Dec 30, 2013, 7:43:18 AM

to kramdow...@rubyforge.org

Hi,

I have written a docx-to-kramdown converter and have just released it
publicly here:

http://rubygems.org/gems/docx_converter
https://github.com/michaelfranzl/docx_converter

Improvement suggestions are welcome,
Michael
_______________________________________________
kramdown-users mailing list
kramdow...@rubyforge.org
http://rubyforge.org/mailman/listinfo/kramdown-users

Thomas Leitner

Dec 30, 2013, 12:26:03 PM

to kramdow...@rubyforge.org

Hi Michael,

On 2013-12-30 13:43 +0100 Michael Franzl wrote:
> I have written a docx to kramdown converter, and just released it
> here for the public:
>
> http://rubygems.org/gems/docx_converter
> https://github.com/michaelfranzl/docx_converter
>
> Improvement suggestions are welcome,

thanks for sharing!

I have had a quick look through the source code and saw that you create
a kramdown text file in the parsing code. Just curious: Why did you do
it this way instead of directly creating a kramdown document tree?

Happy new year!

-- Thomas

Michael Franzl

Dec 30, 2013, 3:28:51 PM

to kramdow...@rubyforge.org

On 12/30/2013 06:26 PM, Thomas Leitner wrote:
> I have had a quick look through the source code and saw that you create
> a kramdown text file in the parsing code. Just curious: Why did you do
> it this way instead of directly creating a kramdown document tree?

Chiefly because Word's XML is not strictly recursive. It is more like a
linear state machine, where the first child of a node determines the
format for all the remaining children. This doesn't mix well with
recursively building a kramdown tree.

However, it mixes well with outputting the kramdown ASCII syntax, which
in itself is like a state machine when parsed from left to right. I've
taken advantage of this circumstance and the code is much simpler this
way. I've still chosen a recursive algorithm, but only to avoid loops
with many .xpath selectors.
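
To illustrate the idea, here is a minimal sketch. It is not the actual
docx_converter code: the `n` helper, the hash-based node model and the
`HEADING_PREFIX` table are assumptions made up for this example, and only
a tiny subset of Word's elements (w:p, w:pPr, w:pStyle, w:r, w:rPr, w:b,
w:i, w:t) is handled. It walks the children from left to right, lets a
leading properties child set the state, and emits kramdown text for the
siblings that follow:

```ruby
# Hypothetical stand-in for a parsed Word XML node (illustration only).
def n(name, children = [], text: nil, val: nil)
  { name: name, children: children, text: text, val: val }
end

# Assumed mapping from Word paragraph style names to kramdown prefixes.
HEADING_PREFIX = { "Heading1" => "# ", "Heading2" => "## " }.freeze

# A run (w:r): the leading w:rPr child sets the emphasis marker that
# applies to every w:t sibling that follows it.
def run_to_kramdown(run)
  marker = ""
  out = +""
  run[:children].each do |child|
    case child[:name]
    when "w:rPr"
      names = child[:children].map { |c| c[:name] }
      marker = "**" if names.include?("w:b")
      marker = "*" if names.include?("w:i") && marker.empty?
    when "w:t"
      out << marker << child[:text].to_s << marker
    end
  end
  out
end

# A paragraph (w:p): the leading w:pPr child decides the line prefix,
# then the runs are appended in document order.
def paragraph_to_kramdown(par)
  prefix = ""
  out = +""
  par[:children].each do |child|
    case child[:name]
    when "w:pPr"
      style = child[:children].find { |c| c[:name] == "w:pStyle" }
      prefix = HEADING_PREFIX.fetch(style[:val], "") if style
    when "w:r"
      out << run_to_kramdown(child)
    end
  end
  "#{prefix}#{out}\n\n"
end

def document_to_kramdown(paragraphs)
  paragraphs.map { |par| paragraph_to_kramdown(par) }.join
end

doc = [
  n("w:p", [
    n("w:pPr", [n("w:pStyle", val: "Heading1")]),
    n("w:r", [n("w:t", text: "Title")])
  ]),
  n("w:p", [
    n("w:r", [n("w:t", text: "Plain and ")]),
    n("w:r", [n("w:rPr", [n("w:b")]), n("w:t", text: "bold")])
  ])
]

puts document_to_kramdown(doc)
```

The output is plain kramdown text (a heading line followed by a paragraph
with one bold word), so no tree has to be assembled while walking the
Word nodes.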

I've actually tried generating kramdown nodes but eventually had to give
up -- not because of kramdown but because of the nature of Word's XML
structure.

And, when you parse the generated kramdown syntax with kramdown, you'll
end up with the same kramdown tree :)

> Happy new year!

Thanks, to you too, and keep up the good work :)

Michael
