Sphinx 1.6 em/en dash conversion change?


sit...@gmail.com

Jun 26, 2017, 2:39:56 AM
to sphinx-dev
Hi,

From what I can see, Sphinx 1.6 has moved over to converting double dashes (--) to em dashes and triple dashes (---) to en dashes. Unfortunately, in Sphinx 1.5 and below it looks like double dashes went to *en* dashes and triple dashes went to *em* dashes.

This change means it's not possible to have a project that typesets the same way in both Sphinx 1.5 and 1.6, as older versions have no concept of configuring the dash format. Would it be possible to change the dash formats back to match the 1.5 behaviour, and have new projects write a configuration option that explicitly asks for double dashes to become em dashes and triple dashes to become en dashes?

jfbu

Jun 26, 2017, 6:27:00 AM
to sphin...@googlegroups.com
Hi,

There was no such intentional change. What is your Docutils version?
And are you using Sphinx 1.6.1 or 1.6.2?

Maybe some extension is interfering? It would be interesting to know.

With docutils 0.13.1 I consistently get

<p>Two hyphens –</p>
<p>Three hyphens —</p>

in HTML output from ``--`` and ``---`` respectively in an rst source file,
with Sphinx 1.6.1, 1.6.2, and current HEAD.

Since Sphinx 1.6, this conversion is handled by Docutils.

http://docutils.sourceforge.net/docs/user/config.html#smart-quotes

The Docutils source code can also convert ``--`` to an EM dash
and ``---`` to an EN dash, but in versions < 0.14 the smart quotes
transform action is hard-coded,
and it maps ``--`` to an EN dash and ``---`` to an EM dash.
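A minimal sketch of the two mappings, for checking them without installing Docutils. `dashes_old_school` and `dashes_backwards` are hypothetical stand-ins written with `re`, assumed to mirror the behaviour of Docutils' `educateDashesOldSchool` and `educateDashes` as described in this thread; they are not imports from Docutils. Note that `---` has to be substituted before `--`, otherwise each triple dash would be consumed as a double dash followed by a stray hyphen.

```python
import re

EN_DASH = "\u2013"  # the shorter dash
EM_DASH = "\u2014"  # the longer dash


def dashes_old_school(text: str) -> str:
    """Hypothetical sketch: "--" -> en dash, "---" -> em dash."""
    # Replace the longer run first so "---" is not split into "--" + "-".
    text = re.sub(r"---", EM_DASH, text)
    text = re.sub(r"--", EN_DASH, text)
    return text


def dashes_backwards(text: str) -> str:
    """Hypothetical sketch: "--" -> em dash, "---" -> en dash."""
    text = re.sub(r"---", EN_DASH, text)
    text = re.sub(r"--", EM_DASH, text)
    return text


if __name__ == "__main__":
    sample = "Two hyphens -- three hyphens ---"
    print(dashes_old_school(sample))  # Two hyphens – three hyphens —
    print(dashes_backwards(sample))   # Two hyphens — three hyphens –
```

The old-school variant reproduces the HTML output shown above for Docutils 0.13.1.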

When Docutils >= 0.14 is used, Sphinx patches nothing but uses a derived
class for various reasons, and it could take advantage of the class attribute
``smartquotes_action`` added in Docutils 0.14.

With Docutils < 0.14, Sphinx needs to override more, and it could
use this opportunity to achieve the same effect as ``smartquotes_action``.

Thus, Sphinx could provide a user configuration setting to control this
``smartquotes_action``, but so far it does not.

As far as I can tell, Docutils has no user interface for its 0.14
`smartquotes_action`;
it is only customizable at the developer level.
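Customization "at developer level" essentially means a subclass overriding a class attribute. The sketch below imitates that pattern with stand-in classes only: `SmartQuotesLike` and `InvertedDashesTransform` are hypothetical names, not Docutils or Sphinx classes, and only the dash letters are modelled, assuming the SmartyPants conventions `D` (old-school dashes) and `i` (inverted old-school dashes) behind codes such as `qDe`.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


class SmartQuotesLike:
    """Hypothetical stand-in for a smart-quotes transform (dashes only)."""

    smartquotes_action = "qDe"  # assumed default: quotes, old-school dashes, ellipses

    def convert_dashes(self, text: str) -> str:
        action = self.smartquotes_action
        if "i" in action:    # inverted old school: "--" -> em, "---" -> en
            triple, double = EN_DASH, EM_DASH
        elif "D" in action:  # old school: "--" -> en, "---" -> em
            triple, double = EM_DASH, EN_DASH
        else:
            return text      # no dash processing requested
        # Longer run first, so "---" is not read as "--" followed by "-".
        return text.replace("---", triple).replace("--", double)


class InvertedDashesTransform(SmartQuotesLike):
    """Hypothetical subclass: only the class attribute changes."""

    smartquotes_action = "qie"


if __name__ == "__main__":
    for cls in (SmartQuotesLike, InvertedDashesTransform):
        print(cls.__name__, cls().convert_dashes("a -- b --- c"))
```

The subclass changes nothing but the attribute string, which is why such a setting would be cheap to expose as a user option.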


Jean-François




Sitsofe Wheeler

Jun 26, 2017, 12:57:03 PM
to sphin...@googlegroups.com
Hi,

On 26 June 2017 at 11:26, jfbu <jf...@free.fr> wrote:
> Le 26/06/2017 à 08:39, sit...@gmail.com a écrit :
>>
>> Hi,
>>
>> From what I can see Sphinx 1.6 has moved over to converting double dashes
>> (--) to em dashes and triple dashes (---) to en dashes. Unfortunately in
>> Sphinx 1.5 and below it looks like double dashes went to *en* dashes and
>> triple dashes went to *em* dashes.
>>
>> This change means it's not possible to have a project that will typeset the
>> same way in both 1.5 and 1.6 versions of Sphinx as older versions have no
>> concept of setting the dash format used. Would it be possible to change the
>> dash formats back to match what happened in 1.5 but make all new projects
>> write a configuration option that explicitly asks for double dash to go to
>> em and triple dash to go to en?
>>
>>
>
> There was no such intentional change. What is your docutils version?
> and are you using Sphinx 1.6.1 or 1.6.2 ?

1.6.2

> maybe some extension is interfering? it would be interesting to know.
>
> With docutils 0.13.1 I consistently get
>
> <p>Two hyphens –</p>
> <p>Three hyphens —</p>
>
> in html output from ``--``, respectively ``---`` in rst source file,
> with Sphinx 1.6.1, 1.6.2, and current HEAD.

Hmm. My Docutils version is 0.13.1 and I'm trying to convert the fio
documentation into HTML.
From https://github.com/sphinx-doc/sphinx/blob/7ffd6ccee8b0c6316159c4295e2f44f8c57b90d6/sphinx/util/smartypants.py
def educate_tokens(text_tokens, attr='1', language='en'):
    [...]
    # Parse attributes:
    # 0 : do nothing
    # 1 : set all
    [...]
    elif attr == "1": # Do everything, turn all options on.
        do_quotes = True
        do_backticks = 1
        do_dashes = 1
    [...]
        if do_dashes == 1:
            text = smartquotes.educateDashes(text)

Looking at https://sourceforge.net/p/docutils/code/HEAD/tree/tags/docutils-0.13.1/docutils/utils/smartquotes.py
I see this:
[...]
default_smartypants_attr = "1"
[...]
def educateDashes(text):
    """
    Parameter: String (unicode or bytes).
    Returns: The `text`, with each instance of "--" translated to
             an em-dash character.
    """

    text = re.sub(r"""---""", smartchars.endash, text) # en (yes, backwards)
    text = re.sub(r"""--""", smartchars.emdash, text) # em (yes, backwards)
    return text

So double dash goes to em (which is the longer dash).

> Since Sphinx 1.6 such conversion is handled by docutils.
>
> http://docutils.sourceforge.net/docs/user/config.html#smart-quotes
>
> Docutils source code has potential for converting ``--`` rather to EM dash,
> and ``---`` to EN dash, but for versions < 0.14, the smart quotes
> transform action is hard-coded
> and it maps ``--`` to EN dash and ``---`` to EM dash.
>
> When Docutils>=0.14 is used, Sphinx patches nothing, but uses a derived
> class for some reasons, and it could benefit from the class attribute
> added at Docutils 0.14 called ``smartquotes_action``.
>
> With Docutils<0.14 Sphinx needs to over-write more, and it could
> take this opportunity to achieve same as ``smartquotes_action``.

This would be the situation I'm in.

> Thus, Sphinx could provide a user config setting to influence this
> ``smartquotes_action``. But it does not so far.
>
> As far as I can tell Docutils has no user interface for its 0.14
> `smartquotes_action`.
> It is only customizable at developer level.

--
Sitsofe | http://sucs.org/~sits/

jfbu

Jun 26, 2017, 1:25:48 PM
to sphin...@googlegroups.com
>         do_dashes = 1
> [...]
>         if do_dashes == 1:
>             text = smartquotes.educateDashes(text)
>


Yes, but the method is actually called with "2", not "1" (at Sphinx 1.6.2;
it is slightly different in the upcoming 1.6.3 regarding backticks).

This is decided by the apply() method of the SmartQuotes class from
docutils.transforms.universal.

> Looking at https://sourceforge.net/p/docutils/code/HEAD/tree/tags/docutils-0.13.1/docutils/utils/smartquotes.py
> see this:
> [...]
> default_smartypants_attr = "1"
> [...]
> def educateDashes(text):
>     """
>     Parameter: String (unicode or bytes).
>     Returns: The `text`, with each instance of "--" translated to
>              an em-dash character.
>     """
>
>     text = re.sub(r"""---""", smartchars.endash, text) # en (yes, backwards)
>     text = re.sub(r"""--""", smartchars.emdash, text) # em (yes, backwards)
>     return text
>
> So double dash goes to em (which is the longer dash).


But the actual Docutils method used is smartquotes.educateDashesOldSchool(text).


Have you tested it?
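One quick way to test is to count the code points actually written to the built HTML, since en and em dashes are easy to confuse visually. A minimal stdlib-only sketch; the helper names are hypothetical, and it assumes the HTML contains literal UTF-8 characters rather than numeric entities such as `&#8211;`, which it does not decode.

```python
import unicodedata
from collections import Counter
from pathlib import Path

DASHES = {"\u2013", "\u2014"}  # EN DASH, EM DASH


def count_dashes(text: str) -> Counter:
    """Count en/em dash characters in text, keyed by their Unicode name.

    Numeric entities such as "&#8211;" are not decoded, so they are not counted.
    """
    return Counter(unicodedata.name(ch) for ch in text if ch in DASHES)


def count_dashes_in_file(path: Path) -> Counter:
    """Read a built HTML file as UTF-8 and count its en/em dashes."""
    return count_dashes(path.read_text(encoding="utf-8"))
```

For example, `print(count_dashes_in_file(Path("_build/html/index.html")))`, where the path is only an assumed build location; a page containing one `--` and one `---` should report one of each name.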

Note: in Sphinx 1.6.3, the "2" is overridden with "qDe",
which backports the Docutils 0.14 default to earlier
versions of Docutils. The difference is that it skips
backtick processing, which matches Sphinx 1.5.6
behaviour. Backticks should not usually occur anyway,
as they are mainly used in reST markup, but for
efficiency it is better not to touch them at all
when they do end up in text nodes.
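Which dash mapping applies depends on how the attribute string is parsed. The following simplified re-creation models only the dash-related part, based on the `educate_tokens` snippet quoted earlier and on the SmartyPants attribute conventions as I understand them (presets "1", "2", "3" for regular, old-school and inverted old-school dashes; letters `d`, `D`, `i` for the same three). `dash_mode` is a hypothetical helper; quotes, backticks, ellipses and the special "-1" mode are ignored.

```python
def dash_mode(attr: str) -> int:
    """Return 0 (no dashes), 1 (regular), 2 (old school) or 3 (inverted).

    Simplified sketch of SmartyPants-style attribute parsing (dashes only).
    """
    presets = {"0": 0, "1": 1, "2": 2, "3": 3}
    if attr in presets:
        return presets[attr]
    mode = 0
    # Independent checks: if several letters are present, the last one wins.
    if "d" in attr:
        mode = 1
    if "D" in attr:
        mode = 2
    if "i" in attr:
        mode = 3
    return mode


if __name__ == "__main__":
    for attr in ("1", "2", "qDe"):
        print(attr, dash_mode(attr))
```

Under these assumptions, "2" and "qDe" both select old-school dashes, so the 1.6.3 change would affect backticks but not dashes; only "1" leads to the "backwards" `educateDashes` mapping.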


>
>> Since Sphinx 1.6 such conversion is handled by docutils.
>>
>> http://docutils.sourceforge.net/docs/user/config.html#smart-quotes
>>
>> Docutils source code has potential for converting ``--`` rather to EM dash,
>> and ``---`` to EN dash, but for versions < 0.14, the smart quotes
>> transform action is hard-coded
>> and it maps ``--`` to EN dash and ``---`` to EM dash.
>>
>> When Docutils>=0.14 is used, Sphinx patches nothing, but uses a derived
>> class for some reasons, and it could benefit from the class attribute
>> added at Docutils 0.14 called ``smartquotes_action``.
>>
>> With Docutils<0.14 Sphinx needs to over-write more, and it could
>> take this opportunity to achieve same as ``smartquotes_action``.
>
> This would be the situation I'm in.


But nothing needs to be changed, a priori.

Please send an example if you find one that demonstrates the contrary.

>
>> Thus, Sphinx could provide a user config setting to influence this
>> ``smartquotes_action``. But it does not so far.
>>
>> As far as I can tell Docutils has no user interface for its 0.14
>> `smartquotes_action`.
>> It is only customizable at developer level.
>

Side note: I will soon be offline. Please don't worry if I don't reply.

I wrote the above from memory; if I realize I am wrong, I will
correct it before leaving.

Jean-François


Sitsofe Wheeler

Jun 26, 2017, 8:00:25 PM
to sphin...@googlegroups.com
On 26 June 2017 at 18:25, jfbu <jf...@free.fr> wrote:
> Le 26/06/2017 à 18:56, Sitsofe Wheeler a écrit :
>>
>> Looking at
>> https://sourceforge.net/p/docutils/code/HEAD/tree/tags/docutils-0.13.1/docutils/utils/smartquotes.py
>> see this:
>> [...]
>> default_smartypants_attr = "1"
>> [...]
>> def educateDashes(text):
>>     """
>>     Parameter: String (unicode or bytes).
>>     Returns: The `text`, with each instance of "--" translated to
>>              an em-dash character.
>>     """
>>
>>     text = re.sub(r"""---""", smartchars.endash, text) # en (yes, backwards)
>>     text = re.sub(r"""--""", smartchars.emdash, text) # em (yes, backwards)
>>     return text
>>
>> So double dash goes to em (which is the longer dash).
>
> But the actual docutils method used is
> smartquotes.educateDashesOldSchool(text)
>
> Have you tested ?

I thought I had, but retrying just now shows exactly the behaviour you
described, so I was wrong. Thanks for your patience and sorry for the
noise!

--
Sitsofe | http://sucs.org/~sits/