I am trying to compile the pandas docs, but I am unable to get Sphinx to build a document that contains some unicode. Is there a flag I need to specify to let Sphinx correctly build documents with unicode in them? In this case, I don't want Sphinx to decode the text.

.. _io.unicode:

Dealing with Unicode Data
~~~~~~~~~~~~~~~~~~~~~~~~~
The ``encoding`` argument should be used for encoded unicode data, which will
result in byte strings being decoded to unicode in the result:

.. ipython:: python

   data = 'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'
   df = pd.read_csv(StringIO(data), encoding='latin-1')
   df
   df['word'][1]

Some formats which encode all characters as multiple bytes, like UTF-16, won't
parse correctly at all without specifying the encoding.

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/sphinx/cmdline.py", line 247, in main
    app.build(force_all, filenames)
  File "/usr/local/lib/python2.7/dist-packages/sphinx/application.py", line 211, in build
    self.builder.build_update()
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 211, in build_update
    'out of date' % len(to_build))
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 231, in build
    purple, length):
  File "/usr/local/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 131, in status_iterator
    for item in iterable:
  File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py", line 458, in update_generator
    self.read_doc(docname, app=app)
  File "/usr/local/lib/python2.7/dist-packages/sphinx/environment.py", line 609, in read_doc
    raise SphinxError(str(err))
SphinxError: 'utf8' codec can't decode byte 0xe4 in position 36: invalid continuation byte
> /usr/local/lib/python2.7/dist-packages/sphinx/environment.py(609)read_doc()
-> raise SphinxError(str(err))
(Pdb)
You received this message because you are subscribed to the Google Groups "sphinx-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sphinx-users...@googlegroups.com.
To post to this group, send email to sphinx...@googlegroups.com.
Visit this group at http://groups.google.com/group/sphinx-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
I'm not a specialist on this, but seeing your HTML, I would think that you want the four characters \xe4 to appear literally in your HTML document, whereas Sphinx treats this as the UTF-8 form of one (non-existent) unicode character.
If that is correct, I would try some escaping so that this sequence is not parsed as UTF-8.
Lothar
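
For anyone reading along, here is a minimal sketch of what the error message describes and of the kind of escaping suggested above. It assumes Python 3 (the traceback in the question is from Python 2.7), and the variable names are only illustrative. It does not call Sphinx, and whether escaping actually fixes the pandas docs build is not confirmed in this thread.

```python
# Latin-1 encoded bytes for the sample CSV from the question.
raw = b'word,length\nTr\xe4umen,7\nGr\xfc\xdfe,5'

# Decoding as UTF-8 fails: 0xe4 announces a multi-byte sequence,
# but the following byte ('u') is not a valid continuation byte.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode('latin-1')

# Escaped form: every non-ASCII character becomes a literal
# backslash sequence such as \xe4, so the result is pure ASCII
# and is therefore also valid UTF-8.
escaped = text.encode('unicode_escape').decode('ascii')

print(utf8_error)
print(text)
print(escaped)
```

The escaped string contains only ASCII characters, so a file holding it can be read as UTF-8 without hitting the "invalid continuation byte" error.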