ODT to sphinx?

Chris Withers

Apr 2, 2009, 5:55:03 AM
to sphin...@googlegroups.com
Hi All,

I have an OpenOffice.org doc that contains a load of text and tables,
all styled up (little hand formatting) from my tutorial at PyCon this
year. I'm interested in turning it into documentation for the packages I
covered and I was wondering if anyone knew of a way of automating this?

I'm interested in the options for doing this as a one-off export but
also interested if anyone has found a way to do this as an ongoing thing?

Hmmm, I guess what I'm really interested in is a WYSIWYG ReST editor. Is
one of those in existence anywhere?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

James Rowe

Apr 5, 2009, 10:51:26 PM
to sphin...@googlegroups.com
* Chris Withers (ch...@simplistix.co.uk) wrote:
> I have an OpenOffice.org doc that contains a load of text and tables,
> all styled up (little hand formatting) from my tutorial at PyCon this
> year. I'm interested in turning it into documentation for the packages I
> covered and I was wondering if anyone knew of a way of automating this?

I've had some luck converting XHTML to reST with well... xhtml2rest[1]
in the past. You probably would have to push OOo's output through tidy
first to make it work, unless its output has become much better since
I last used it. How successful this is will depend on how much styling
you wish to keep from the OOo file.
Also, don't forget that ODT is only a bunch of XML files in a zip, you
could probably hack a quick generator with XSLT at a push.
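
To illustrate the "bunch of XML files in a zip" point, here is a minimal
sketch using only the Python standard library. The helper names
describe_odt and make_sample_odt are made up for this example, and the
sample archive is a tiny stand-in rather than a complete ODT file.

```python
import io
import zipfile
from xml.etree import ElementTree as ET


def describe_odt(source):
    """Return (member names, root tag of content.xml) for an ODT archive.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as odt:
        names = odt.namelist()
        with odt.open("content.xml") as fh:
            root = ET.parse(fh).getroot()
    return names, root.tag


def make_sample_odt():
    """Build a tiny stand-in archive so the sketch runs without a real file."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as odt:
        odt.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        odt.writestr("content.xml", content)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    names, root_tag = describe_odt(make_sample_odt())
    print(names)
    print(root_tag)
```

Pointing describe_odt at a real .odt path should list more members
(styles, metadata and so on), but content.xml is where the text lives.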

And I've just found odt2rst[2] in The Google by guessing the search
term, but I've never used it.

Thanks,

James
1. http://docutils.sourceforge.net/sandbox/xhtml2rest/
2. http://code.google.com/p/odt2rst/

Tim Michelsen

Apr 5, 2009, 7:35:34 PM
to sphin...@googlegroups.com
Chris Withers wrote:

> Hi All,
>
> I have an OpenOffice.org doc that contains a load of text and tables,
> all styled up (little hand formatting) from my tutorial at PyCon this
> year. I'm interested in turning it into documentation for the packages I
> covered and I was wondering if anyone knew of a way of automating this?
AFAIK, Sphinx/docutils do not have this.

I only know of converters that may get you started. It would be a
terrific addition to docutils/Sphinx:
* PyODConverter: http://www.artofsolving.com/opensource/pyodconverter
  => requires an OpenOffice installation
  => I used it here: "Nautilus script for converting OpenOffice
     documents": http://forum.ubuntuusers.de/post/850846/
     (in German, but you can read the bash script)
* I also found this on the net:
  http://opendocumentfellowship.com/development/projects/odfpy

> Hmmm, I guess what I'm really interested in is a WYSIWYG ReST editor. Is
> one of those in existence anywhere?

* Gedit has a good reST scheme:
  http://textmethod.com/wiki/ReStructuredTextToolsForGedit
  Unfortunately, PyDev doesn't support it yet.

Georg Brandl

Apr 7, 2009, 4:09:51 PM
to sphin...@googlegroups.com
Chris Withers wrote:

> Hi All,
>
> I have an OpenOffice.org doc that contains a load of text and tables,
> all styled up (little hand formatting) from my tutorial at PyCon this
> year. I'm interested in turning it into documentation for the packages I
> covered and I was wondering if anyone knew of a way of automating this?
>
> I'm interested in the options for doing this as a one-off export but
> also interested if anyone has found a way to do this as an ongoing thing?

At PyCon, I've spoken with people who'd also like to have Sphinx output ODT.
It seems that someone has to do something about that situation ;)

> Hmmm, I guess what I'm really interested in is a WYSIWYG ReST editor. Is
> one of those in existence anywhere?

I don't know of one. It seems to me that while it is hard to get right, it
is also not as useful for reST as for many other heavy markup languages.

Georg

Chris Withers

Apr 7, 2009, 7:07:12 PM
to sphin...@googlegroups.com
Georg Brandl wrote:
>> I'm interested in the options for doing this as a one-off export but
>> also interested if anyone has found a way to do this as an ongoing thing?
>
> At PyCon, I've spoken with people who'd also like to have Sphinx output ODT.
> It seems that someone has to do something about that situation ;)

Well, it's kind of the opposite... The one thing that does my head in
with Sphinx is that it feels pretty insane to me, when we have good GUI
tools like <insert favourite word processor here>, that we're going back
to manually hacking plain text and including the nastiness that ReST
requires by hand...

>> Hmmm, I guess what I'm really interested in is a WYSIWYG ReST editor. Is
>> one of those in existence anywhere?
>
> I don't know of one. It seems to me that while it is hard to get right, it
> is also not as useful for reST as for many other heavy markup languages.

ReST *is* a heavy markup language ;-)

Chris Withers

Apr 11, 2009, 7:03:53 AM
to sphin...@googlegroups.com
Georg Brandl wrote:
>> I'm interested in the options for doing this as a one-off export but
>> also interested if anyone has found a way to do this as an ongoing thing?
>
> At PyCon, I've spoken with people who'd also like to have Sphinx output ODT.
> It seems that someone has to do something about that situation ;)

Hmm, take 2 on a response, after having a bit more of a think...

So, wouldn't it be cool if you could

- check out some ReST documentation

- run rest2odt to turn it into a .odt

- edit in OOo

- run odt2rest to turn it back into ReST

- check in the docs

...where "rest" might be Sphinx-specific?

That way we get ReST goodness and all the tools Sphinx supplies and a
nice WYSIWYG editing environment...

I guess I'd need to get to know Sphinx much better before I attempted
implementing any of this, but do people feel this would be a "good
thing" and/or possible/easy to implement?

cheers,

Chris

PS: The following libraries look relevant:
http://pypi.python.org/pypi/appy.pod/0.2.1
http://pypi.python.org/pypi/relatorio/0.5.0
http://opendocumentfellowship.com/development/projects/odfpy
http://www.rexx.com/~dkuhlman/odtwriter.html

Has anyone used any of these?
The other side seems harder... Has anyone found anything in python for
parsing a .odt back into python? Once it's back in "python", are there
libraries for writing sphinx-ish ReST?
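
On the "writing ReST from Python" side, plain section titles and simple
tables can be produced with ordinary string formatting. The sketch below
is only an illustration of that, not an existing library; the function
names heading and simple_table are invented here, and it assumes every
cell is a non-empty, single-line string.

```python
def heading(title, char="="):
    """Return a reST section title underlined with `char`."""
    return "%s\n%s\n" % (title, char * len(title))


def simple_table(header, rows):
    """Return a reST "simple table" for a header row and body rows of strings.

    Assumes non-empty, single-line cells; reST treats an empty first
    column as a continuation line, which this sketch does not handle.
    """
    table = [list(header)] + [list(row) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    border = "  ".join("=" * width for width in widths)

    def fmt(row):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        return "  ".join(cells).rstrip()

    lines = [border, fmt(table[0]), border]
    lines += [fmt(row) for row in table[1:]]
    lines.append(border)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(heading("Packages"))
    print(simple_table(["Name", "Use"], [["spam", "read"], ["eggs", "write"]]))
```

Sphinx-specific directives would need more than this, but the output of
both helpers is ordinary reST that docutils should accept.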

chris h

Apr 11, 2009, 8:02:39 AM
to sphin...@googlegroups.com
On Saturday 11 April 2009 07:03:53 Chris Withers wrote:

One thing to consider, Chris, is that once the document has been edited it
needs to be processed by Sphinx. I'm sure this can be handled by a simple
script that either polls a directory at a set interval or is called as a
subprocess, being able to tell whether the output is one of three.
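
A rough sketch of the polling variant, assuming the sources are .rst
files and that the sphinx-build command is on the PATH. The function
names, the five second interval and the HTML builder are choices made
for this example only.

```python
import os
import subprocess
import time


def snapshot(source_dir, suffix=".rst"):
    """Map each matching file under `source_dir` to its modification time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.path.getmtime(path)
    return mtimes


def rebuild(source_dir, output_dir):
    """Run sphinx-build as a subprocess; return True if it exited cleanly.

    Raises FileNotFoundError if sphinx-build is not installed.
    """
    result = subprocess.run(["sphinx-build", "-b", "html", source_dir, output_dir])
    return result.returncode == 0


def watch(source_dir, output_dir, interval=5.0):
    """Poll `source_dir` forever and rebuild whenever a source file changes."""
    seen = snapshot(source_dir)
    while True:
        time.sleep(interval)
        current = snapshot(source_dir)
        if current != seen:
            seen = current
            if not rebuild(source_dir, output_dir):
                print("sphinx-build reported an error")
```

Calling watch("docs", "/path/to/docroot") would then keep the served
copy up to date, permissions allowing; both paths are placeholders.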

Secondly, since the Sphinx docs in my setup are served from an Apache
document root directory, there are permissions to consider as well.

Thanks for your work on Squishdot. Enjoyed that while it lasted. I no
longer play with Zope/Plone and friends as they moved way past my
requirements. Sphinx is nice and light, relatively simple and very easy to
learn; there are a few niggles to sort out, but with time I'm sure it will
get as close to perfection as needed. Hope it stays light, however.

OOo is a very, very heavy editor, not to discourage you. Check out
http://code.google.com/p/ulipad/ as it's a very nice, fully functional GUI
for reST documents. All that is missing is a post-editing processor, i.e. a
few scripts to compile docs and move them to a predetermined directory.

Best regards and best of luck

--

/ch

James Rowe

Apr 11, 2009, 8:41:36 AM
to sphin...@googlegroups.com
* Chris Withers (ch...@simplistix.co.uk) wrote:
>
> Georg Brandl wrote:
> Hmm, take 2 on a response, after having a bit more of a think...
>
> So, wouldn't it be cool if you could
>
> - check out some ReST documentation
>
> - run rest2odt to turn it into a .odt
>
> - edit in OOo
>
> - run odt2rest to turn it back into ReST
>
> - check in the docs
>
> ...where "rest" might be Sphinx-specific?
>
> That way we get ReST goodness and all the tools Sphinx supplies and a
> nice WYSIWYG editing environment...

Personally I fail to see how it would add anything to editing other
than confusion; some examples follow.

What should happen when users have selected styling options? Say
a user has selected Arial 24 to use in a heading instead of using the
semantic option and selecting headline2 (or whatever your word processor
uses): how would you cope with that during export? If a user has chosen
to make half a line green, how should that be treated? How should
typeface changes be treated? What happens when a word is double
underlined in a paragraph?

Should non-semantic styling just be dropped on the floor during
conversion, or would you shove stylistic attributes into a reST comment
so the transform could be two-way? I'm not trying to pile on the stop
motion with my comments here, just thinking about what you suggested.

My editor shows graphically bold, underline, headings, and such. It
allows me to jump back and forth between link text and link definitions
with a keystroke, etc. It tells me when I've made a reST formatting
error, and takes me to it. That is definitely good enough for me, but
yeah I can imagine some people would like more mouse oriented options.

I'd hazard a guess that many of those people who aren't satisfied with
their current tools could be satisfied with a web based editor as has
been discussed here before, and it wouldn't need all the hassle of
training users not to use half the functionality of their word processor
that can't be expressed in reST (and I'd argue thankfully so).

> PS: The following libraries look relevant:

> Has anyone used any of these?
> The other side seems harder... Has anyone found anything in python for
> parsing a .odt back into python? Once it's back in "python", are there
> libraries for writing sphinx-ish ReST?

It's only XML, so basically any XML tools will do. ElementTree if you're
going to munge it with Python, XSLT if you just want to push it through
a filter into some other format.
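
As a small illustration of the ElementTree route, the sketch below pulls
headings out of a content.xml string. It assumes the document uses real
heading elements (text:h with a text:outline-level attribute) rather than
hand-styled paragraphs; the headings function and the SAMPLE document are
invented for this example.

```python
from xml.etree import ElementTree as ET

# Prefix-to-URI map so the path expressions can use short prefixes.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
OUTLINE_LEVEL = "{%s}outline-level" % NS["text"]


def headings(content_xml):
    """Yield (level, title) for each text:h element in a content.xml string."""
    root = ET.fromstring(content_xml)
    for h in root.iterfind(".//office:text/text:h", NS):
        level = int(h.get(OUTLINE_LEVEL, "1"))
        # itertext() also collects text nested in child elements such as spans.
        yield level, "".join(h.itertext())


SAMPLE = """\
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Introduction</text:h>
      <text:p>Some body text.</text:p>
      <text:h text:outline-level="2">Reading <text:span>files</text:span></text:h>
    </office:text>
  </office:body>
</office:document-content>
"""

if __name__ == "__main__":
    for level, title in headings(SAMPLE):
        print(level, title)
```

A document whose author faked headings with font sizes would yield
nothing here, which is the styling problem discussed above.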


Thanks,

James

Chris Withers

May 1, 2009, 10:59:55 AM
to sphin...@googlegroups.com
chris h wrote:
> OOo is a very very heavy editor not to discourage you.

I've kinda swung round to thinking about OOo again...

Yes, it's a heavy editor, that's why I want to use it. I *want* spell
checking, I want a UI that helps me rather than having to do everything
by hand.

I *don't* expect any conversion script I wrote would work with
everything-you-can-stick-in-an-odt. It would handle what ReST is capable
of handling, at best, and barf on other stuff.

Maybe one day ;-)

Chris

Chris Withers

May 1, 2009, 11:03:45 AM
to sphin...@googlegroups.com
James Rowe wrote:
> What should happen when users have selected styling options? Say
> a user has selected Arial 24 to use in a heading instead of using the
> semantic option and selecting headline2 (or whatever your word processor
> uses) how would you cope with that during export?

Ignore it, maybe issuing a warning.

> If a user has chosen
> to make half a line green how should that be treated?

Ignore it, issue a warning.

> How should
> typeface changes be treated?

Ignore it, issue a warning.

> What happens when a word is double
> underlined in a paragraph?

Ignore it, issue a warning.

> Should non-semantic styling just be dropped on the floor during
> conversion, or would you shove stylistic attributes in to a reST comment
> so the transform could be two-way?

I would ignore it and issue a warning when going from ODT->ReST. I suspect
ReST is a small subset of what an ODT can handle, so I doubt going the
other way would be a problem, although special consideration would likely
be needed for ReST- and Sphinx-specific stuff like auto-indexes, etc.
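
The "ignore it, issue a warning" policy could look roughly like the
sketch below for inline formatting. It is an illustration under
assumptions, not a working converter: only bold and italic are mapped,
every other text property triggers a warning and is dropped, and all
function names and the SAMPLE document are made up here.

```python
import warnings
from xml.etree import ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# The only formatting properties this sketch can express in reST.
SUPPORTED = {
    ("{%s}font-weight" % FO, "bold"): "**",
    ("{%s}font-style" % FO, "italic"): "*",
}


def load_text_properties(root):
    """Map each style name to the attributes of its style:text-properties."""
    props = {}
    for style in root.iter("{%s}style" % STYLE):
        tp = style.find("{%s}text-properties" % STYLE)
        props[style.get("{%s}name" % STYLE)] = dict(tp.attrib) if tp is not None else {}
    return props


def span_to_rest(span, text_props):
    """Wrap a span in reST markup, warning about anything unsupported."""
    text = "".join(span.itertext())
    markup = ""
    for attr, value in text_props.items():
        marker = SUPPORTED.get((attr, value))
        if marker is None:
            warnings.warn("ignoring %s=%r on %r" % (attr, value, text))
        elif not markup:
            markup = marker  # reST inline markup cannot nest; keep the first
    return markup + text + markup


def paragraph_to_rest(p, props):
    """Convert one text:p element to a line of reST."""
    parts = [p.text or ""]
    for child in p:
        if child.tag == "{%s}span" % TEXT:
            name = child.get("{%s}style-name" % TEXT)
            parts.append(span_to_rest(child, props.get(name, {})))
        else:
            warnings.warn("ignoring element %s" % child.tag)
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


SAMPLE = """\
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties fo:color="#00ff00"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p>Use <text:span text:style-name="T1">care</text:span> with <text:span text:style-name="T2">colour</text:span>.</text:p>
  </office:text></office:body>
</office:document-content>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    paragraph = root.find(".//{%s}p" % TEXT)
    print(paragraph_to_rest(paragraph, load_text_properties(root)))
```

Real files tend to carry extra attributes on each style, so a usable
version would probably need a list of properties that are safe to drop
silently, otherwise the warnings would swamp the output.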

> My editor shows graphically bold, underline, headings, and such. It
> allows me to jump back and forth between link text and link definitions
> with a keystroke, etc. It tells me when I've made a reST formatting
> error, and takes me to it. That is definitely good enough for me, but
> yeah I can imagine some people would like more mouse oriented options.

Which editor do you use?

> I'd hazard a guess that many of those people who aren't satisfied with
> their current tools could be satisfied with a web based editor as has
> been discussed here before, and it wouldn't need all the hassle of
> training users not to use half the functionality of their word processor
> that can't be expressed in reST (and I'd argue thankfully so).

Training is pretty simple when the ODT->ReST script ignores what it
safely can while issuing warnings, and plain barfs on the rest.

> It's only XML so basically any XML tools.

"only XML" - you're a funny guy ;-)

Chris

James Rowe

May 1, 2009, 4:26:37 PM
to sphin...@googlegroups.com
[Again, this isn't supposed to be stop motion; I'm actually interested in
ways to make it easier for people to contribute to documentation for
projects I'm involved with. It's one of the main reasons why I'm using
Sphinx/reST.]

* Chris Withers (ch...@simplistix.co.uk) wrote:

> Ignore it, maybe issuing a warning.

> Ignore it, issue a warning.

> Ignore it, issue a warning.

> Ignore it, issue a warning.

That is why I replied initially: the only way I could see to do this
conversion was ignoring much of the user's settings and issuing tonnes
of warnings. The outcome of which is that the user either:

a) now has to look at Sphinx output with no headings (therefore no TOCs
   and broken intra-document links), no text styling, and none of the
   various other ignored properties, or
b) has to fire up the word processor and switch all the WYSIWYG options
   they've set to the required WYDefineIWYG options.

> > My editor shows graphically bold, underline, headings, and such. It
> > allows me to jump back and forth between link text and link definitions
> > with a keystroke, etc. It tells me when I've made a reST formatting
> > error, and takes me to it. That is definitely good enough for me, but
> > yeah I can imagine some people would like more mouse oriented options.
>
> Which editor do you use?

vim most of the time, but I'd expect similar functionality in any
other editor really. To your other email, using vim as the example
again, spell checking is standard and a toolbar button for bold would be
added with:

:imenu icon=<some.png> Toolbar.bold <command>

and ":vmenu" if you want to support mouse highlighted text.

> > It's only XML so basically any XML tools.
>
> "only XML" - you're a funny guy ;-)

I'm not sure I see the problem really, in the case of ODT it is well
defined and nicely namespaced XML. A glance at a local ODT here shows
you can parse headings and paragraphs out with a f-ugly 2 minute script:

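    import sys
    import textwrap
    import zipfile

    from xml.etree import ElementTree as ET

    # Build a namespaced name such as "{urn:...:text:1.0}p".
    def ns_elem(e, s="text"):
        return "{urn:oasis:names:tc:opendocument:xmlns:%s:1.0}%s" % (s, e)

    # Underline characters for heading levels 1-3; index 0 is unused.
    HEADERS = (None, "=", "-", "'")

    # An ODT file is a zip archive; the document body is in content.xml.
    with zipfile.ZipFile(sys.argv[1]) as odt:
        doc = ET.parse(odt.open("content.xml"))
    body = doc.find(".//" + ns_elem("text", "office"))

    for p in body.findall(ns_elem("p")):
        style = p.get(ns_elem("style-name"))

        if p.text:
            # In this particular file, styles P1-P3 happen to mark headings.
            if style in ("P1", "P2", "P3"):
                print(p.text)
                print(HEADERS[int(style[1])] * len(p.text))
                print()
            else:
                print(textwrap.fill(p.text))
        else:
            print()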

Of course, it breaks down on other files because their authors have set
different styling options (like using P2 for paragraphs), but I expected
that. My point is the parsing[1] is simple, especially because of the
XML, but you can't trust the styling. I will add the caveat that you're
going to need a real XML parser, because of the extensive namespace
usage.

Thanks,

James

1. And if I was doing this properly I'd write a parser not a hack like
above, but it worked in IPython and that's what matters :)

Chris Withers

May 9, 2009, 11:28:31 AM
to sphin...@googlegroups.com
James Rowe wrote:
> That is why I replied initially, the only way I could see to do this
> conversion was ignoring much of the user's settings and issuing tonnes
> of warnings.

I don't agree. If you're roundtripping ReST, then whatever tool produces
the ODT will give you an ODT that will re-import.

If you're starting from scratch, I imagine a ReST template in ODT
format that helps you "do the right thing". Suitable styles, etc,
should make it easy.

> a) has to now look at Sphinx output with no headings therefore no TOCs
> and broken intra-document links, no text styling, or all the
> various other ignored properties

Who said anything about this? The above all sound like things that could
be roundtripped...

> b) fire up the word processor and switch all the WYSIWYG options they've
> set to the required WYDefineIWYG options.

I don't understand what you're saying here...

> vim most of the time, but I'd expect similar functionality in any
> other editor really. To your other email, using vim as the example
> again, spell checking is standard and a toolbar button for bold would be
> added with:

Is there a ReST mode for emacs?

>>> It's only XML so basically any XML tools.
>> "only XML" - you're a funny guy ;-)
>
> I'm not sure I see the problem really, in the case of ODT it is well
> defined and nicely namespaced XML. A glance at a local ODT here shows
> you can parse headings and paragraphs out with a f-ugly 2 minute script:

...yes, fugly. "Only" when applied to anything as complex as ReST or ODT
is a bit of a joke...

James Rowe

May 9, 2009, 12:29:50 PM
to sphin...@googlegroups.com
* Chris Withers (ch...@simplistix.co.uk) wrote:
> James Rowe wrote:
> Is there a ReST mode for emacs?

Yes, rst.el; it comes with emacs.

Thanks,

James

Guenter Milde

May 11, 2009, 2:54:54 AM
to sphin...@googlegroups.com
On 2009-05-09, Chris Withers wrote:

> Is there a ReST mode for emacs?

Not only for emacs...

Docutils maintains a list of ReST supporting editors (and other relevant
links) at the `Docutils Link List`__.

__ http://docutils.sourceforge.net/docs/user/links.html

Günter
