Docutils smart quotes


jfbu

May 27, 2017, 10:15:16 AM
to sphin...@googlegroups.com
(apologies if this message makes it twice to the list; I erred with google groups)

Hi Sphinx developers

I am moving to this list a discussion about Docutils smart quotes
which I started at
https://sourceforge.net/p/docutils/mailman/message/35861133/

It is related to https://github.com/sphinx-doc/sphinx/issues/3788
and https://github.com/sphinx-doc/sphinx/issues/3806
and https://github.com/sphinx-doc/sphinx/pull/3808
and https://github.com/sphinx-doc/sphinx/pull/3811

About the Docutils warnings initially reported at #3788:

1. the first one is due to settings['language_code'] being
set to a value for which Docutils has no localization available;

2. the numerous other ones, about smart quotes, should be emitted only
once per document; this will be fixed upstream in Docutils.

https://sourceforge.net/p/docutils/mailman/message/35862092/

The rest of this message is a quote from Günter, which describes
the whole context well, with one reply of mine to one paragraph.

Dear Jean-François,

how about moving this discussion to the sphinx-devel mail list? (Feel
free to quote me there. No need to send a copy, I am subscribed there.)

On 26.05.17, jfbu wrote:

... and only a little with Sphinx ;-) By the way, as far as I understand,
with Sphinx before 1.6.1, calls to Docutils ``get_language()``
were always made with ``'en'``. So the issue had not arisen so far.

This would be a true bug.

Your tests revealed that it is rather an "implementation detail":
Sphinx bypasses the Docutils language support mechanism.

I don't know why it is done this way; maybe it seemed easier than fixing
Docutils or moving the Docutils language support to "*.po" files.
In any case, it has some disadvantages.

...

on a project with language Turkish containing
a ``.. caution::`` directive. The HTML output contains "Uyarı", which
I assume is Turkish for Caution ;-)

OK, so Sphinx expanded the language support.

...

According to your earlier explanations this might be explained if
Sphinx relies entirely on its own localization, not on Docutils's one.

Yes, this seems to be the case.


How about localized directive names?

With Docutils, I can write ::

   .. astuce::

      Some directives can be given in the document's language, too!

translate with ``rst2html5 --language=fr`` and I get::

   <div class="admonition tip">
   <p class="admonition-title">Astuce</p>
   <p>Some directives can be given in the document’s language, too!</p>
   </div>

The localisations are in "docutils/parsers/rst/languages/fr.py".


This does not work with Sphinx::

   make -e SPHINXOPTS="-D language='fr'" html

   3788smartquotes/index.rst:21: ERROR: Unknown directive type "astuce".

   .. astuce::

      c'est du français
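To make the difference concrete, here is a minimal Python sketch of the lookup involved. It assumes, as a simplification, one table per language that maps localized directive names to canonical ones; the table contents and the `resolve_directive()` helper are hypothetical and are not the Docutils API. A parser that knows the document is French accepts `astuce`, while one that is only ever given `'en'` (as Sphinx effectively was) does not.

```python
# Hypothetical, heavily trimmed stand-ins for the ``directives`` tables in
# docutils/parsers/rst/languages/en.py and fr.py (the real ones are larger).
LOCALIZED_DIRECTIVES = {
    "en": {"tip": "tip", "caution": "caution", "warning": "warning"},
    "fr": {"astuce": "tip"},
}


def resolve_directive(name, language_code):
    """Map a directive name as written in the source to its canonical name.

    Returns None when the name is unknown, which is the situation that
    produces an 'Unknown directive type' error.
    """
    key = name.lower()
    canonical = LOCALIZED_DIRECTIVES.get(language_code, {}).get(key)
    if canonical is None:
        # Fall back to the English names, so ``.. tip::`` works everywhere.
        canonical = LOCALIZED_DIRECTIVES["en"].get(key)
    return canonical


print(resolve_directive("astuce", "fr"))  # tip
print(resolve_directive("astuce", "en"))  # None
```

Trying the document language first and English second reflects that English directive names keep working in a French document; whether Docutils uses exactly this order internally is an assumption of the sketch.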


Also, the header says::

   <!DOCTYPE html>
   <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">

Is the correct language given in Sphinx-exported documents? (I think so,
but it would be nice to know for sure.)


for html::

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

   <html xmlns="http://www.w3.org/1999/xhtml" lang="fr">
   <head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

By the way, Firefox displays the DOCTYPE tag in red; it complains about it.


This is with Sphinx 1.5.6. I get the same with 1.6.1 + PR #3811.

The correct lang="fr" appears in the HTML despite Docutils having been
used with ``language_code`` set to ``'en'`` in the document settings.



And testing with French, I see that Caution is translated into "Prudence"
("Avertissement" is used for Warning, according to the fr/sphinx.po
file), despite Docutils' get_language having been called
only with 'en'.

This means that the same French rST document will be converted differently
by Docutils vs. Sphinx:

docutils/languages/fr.py contains the translations::

   u'caution': u'Avertissement!',
   ...
   u'warning': u'Avis',

It would be good to "harmonize" the translations between Docutils and Sphinx.
However, this goal may clash with compatibility considerations :-(

The Sphinx message catalogs are under

https://github.com/sphinx-doc/sphinx/tree/master/sphinx/locale

But translations are managed at Transifex and merged into master prior
to major releases (as far as I understand...)


...


I tend to deduce from your precise explanations and the experiment
reported above that it should indeed be possible to access the Docutils
smart quotes facilities without restricting the language to those
for which translations have already been contributed to Docutils.

I think so.

Assumptions:

* Old Sphinx bypassed the Docutils language framework by passing "en" to
"docutils.languages.get_language()".

* Now, in order to get "localized" smart-quotes, the document language
must be known to Docutils.

Currently this has the side-effect that it is also passed to
"docutils.languages.get_language()"
which leads to "missing language support" warnings for languages not fully
supported by Docutils.
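A simplified Python model of the mechanism behind these warnings, assuming (as described above) that `get_language()` falls back to English and only warns when it is handed a reporter. `ListReporter`, `SUPPORTED` and the warning wording are hypothetical stand-ins, not Docutils code.

```python
class ListReporter:
    """Hypothetical stand-in for a Docutils reporter: collects warnings."""

    def __init__(self):
        self.messages = []

    def warning(self, message):
        self.messages.append(message)


# Hypothetical subset of languages that have a Docutils localization module.
SUPPORTED = {"en", "de", "fr"}


def get_language(language_code, reporter=None):
    """Simplified model of docutils.languages.get_language().

    Returns the code of the localization actually used. Unsupported
    languages fall back to English; a warning is issued only when a
    reporter is passed in.
    """
    if language_code in SUPPORTED:
        return language_code
    if reporter is not None:
        reporter.warning(
            f"language {language_code!r} not supported: "
            "Docutils-generated text will be in English."
        )
    return "en"


reporter = ListReporter()
get_language("en", reporter)   # old Sphinx: always "en", never a warning
get_language("tr", reporter)   # document language passed through: warns
print(len(reporter.messages))  # 1
```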

Currently, the fix mentioned above drops smart quotes if get_language()
reports that the language has no Docutils-provided translations.

Alternatives:

a) silence all warnings by setting the "report-level" Docutils setting.

+1 fast and easy workaround for affected users
-1 also suppresses warnings the user had better see :-(

b) define and use smart-quotes for a supported "mock language" ("en", say).

-1 dirty hack

c) add full support for the respective languages to Docutils

+1 helps Docutils evolve
-1 takes time before it helps the Sphinx end user

d) let Sphinx overwrite docutils.languages.get_language() to always
return the "en" module regardless of input and without warning.

+1 functionally equivalent to the previous behaviour (passing "en" to
get_language).
+1 selective silencing of a warning that in Sphinx is a false positive
+1 Docutils still has the correct document language setting.

-1 deepens the split between Docutils and Sphinx language
support.
-1 no incentive to provide translations to Docutils.
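Alternative d) can be sketched as a thin wrapper around the original function. The module objects and `original_get_language()` below are hypothetical stand-ins for `docutils.languages`; only the shape of the wrapper matters. The second usage line illustrates one of the "-1" points: even a language Docutils does support gets the English module.

```python
import types

# Hypothetical stand-ins for docutils.languages modules (en is the fallback).
en_module = types.SimpleNamespace(code="en")
fr_module = types.SimpleNamespace(code="fr")
MODULES = {"en": en_module, "fr": fr_module}
emitted = []


def original_get_language(language_code, reporter=None):
    """Simplified model of docutils.languages.get_language()."""
    if language_code in MODULES:
        return MODULES[language_code]
    if reporter is not None:
        emitted.append(f"language {language_code!r} not supported")
    return en_module


def get_language_always_en(language_code, reporter=None):
    """Alternative d): ignore the requested code and never warn.

    Behaves like the old Sphinx, which always asked for "en"; the
    document's own language_code setting is not touched.
    """
    return original_get_language("en", reporter)


print(get_language_always_en("tr", object()).code)  # en (no warning)
print(get_language_always_en("fr").code)            # en, although fr exists
print(emitted)                                      # []
```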


Günter

Thanks again Günter for the time spent on your explanations

best,

Jean-François

jfbu

May 27, 2017, 10:23:55 AM
to sphin...@googlegroups.com
ah well, all the quotation formatting was removed.

Trying again, sorry...


The rest of this message is a quote from Günter which [who] describes
well the whole context with one [or three] reply of mine on one paragraph
[edit: the html header indeed looks sub-optimal]

jfbu

May 27, 2017, 11:59:49 AM
to sphin...@googlegroups.com
On 27/05/2017 at 16:23, jfbu wrote:
> Alternatives:
>
> a) silence all warnings by setting the "report-level" Docutils setting.
>
> +1 fast and easy workaround for affected users
> -1 suppresses also warnings the user might better see :-(
>
> b) define and use smart-quotes for a supported "mock language" ("en", say).
>
> -1 dirty hack
>
> c) add full support for the respective languages to Docutils
>
> +1 helps Docutils evolve
> -1 takes time to help the Sphinx end user
>
> d) let Sphinx overwrite docutils.languages.get_language() to always
> return the "en" module regardless of input and without warning.
>
> +1 functionally equivalent to the previous behaviour (passing "en" to
> get_language).
> +1 selective silencing of a warning that in Sphinx is a false positive
> +1 Docutils still has the correct document language setting.
>
> -1 deepens the split between Docutils and Sphinx language
> support.
> -1 no incentive to provide translations to Docutils.
>

at https://github.com/sphinx-doc/sphinx/pull/3813 I propose

e) let Sphinx overwrite docutils.languages.get_language() to always
treat its optional second argument as if it were None, so that
no Warning is emitted if the first argument is not among the
languages supported by Docutils for localization

In addition, Sphinx checks a priori whether the language has
smart quotes available via Docutils, and sets the smart_quotes
setting to True only in that case.

The user's optional docutils.conf is processed after that, so it
can force the use of smart quotes (or disable it)
if, for some unknown reason, the Sphinx test for the availability of
Docutils smart quotes erroneously deactivated or, on the contrary,
erroneously activated the feature.
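The a priori check and the later `docutils.conf` override could look roughly like this. `QUOTES` is a trimmed, hypothetical copy of the Docutils smart quotes table, `smart_quotes_available()` and `initial_settings()` are illustrative helpers rather than Sphinx functions, and the user configuration is modelled as a plain dict.

```python
# Hypothetical, trimmed copy of the smart quotes table:
# language tag -> open/close double quote, open/close single quote.
QUOTES = {
    "en": "\u201c\u201d\u2018\u2019",  # “”‘’
    "de": "\u201e\u201c\u201a\u2018",  # „“‚‘
}


def smart_quotes_available(language_code):
    """A priori check: does the table know this language (or its base tag)?"""
    tag = language_code.lower().replace("_", "-")
    return tag in QUOTES or tag.split("-")[0] in QUOTES


def initial_settings(language_code, user_overrides=None):
    """Hypothetical helper following proposal e).

    smart_quotes is switched on only if the check succeeds; the user's
    docutils.conf, modelled here as a dict, is applied afterwards and so
    can force the feature on or off.
    """
    settings = {
        "language_code": language_code,
        "smart_quotes": smart_quotes_available(language_code),
    }
    settings.update(user_overrides or {})
    return settings


print(initial_settings("de")["smart_quotes"])                          # True
print(initial_settings("tr")["smart_quotes"])                          # False
print(initial_settings("tr", {"smart_quotes": True})["smart_quotes"])  # True
```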

Jean-François


Guenter Milde

May 27, 2017, 2:25:40 PM
to sphin...@googlegroups.com
On 2017-05-27, jfbu wrote:
> On 27/05/2017 at 16:23, jfbu wrote:
>> Alternatives:

>> a) ...

...


> at https://github.com/sphinx-doc/sphinx/pull/3813 I propose

> e) let Sphinx overwrite docutils.languages.get_language() to always
> treat its optional second argument as if it were None, so that
> no Warning is emitted if the first argument is not among the
> languages supported by Docutils for localization

This can be made even simpler. Instead of overwriting
docutils.languages.get_language(), just call it without second argument
(or with "reporter=None") to drop the warnings.


> And an a priori check is done by Sphinx if the language has
> smart quotes available via Docutils and it sets the smart_quotes
> setting to True only then

Instead of this, ensure that there are pre-defined smart quotes for all
Sphinx-supported languages. (Less work at the end and better for the
users.)

As Sphinx currently uses a smartquotes "monkey patch", additions to the
dictionary of pre-defined smartquotes will be available as fast as the
proposed a priori language check. When providing the additions to
Docutils, they will become part of the upstream with the next release...

Günter

jfbu

May 27, 2017, 3:44:45 PM
to sphin...@googlegroups.com
>
> This can be made even simpler. Instead of overwriting
> docutils.languages.get_language(), just call it without second argument
> (or with "reporter=None") to drop the warnings.
>
>

AFAIK, and to the extent I can be trusted, there is no call to
get_language() from Sphinx itself; all of them are issued by Docutils'
own code.

OK, I confirmed this on a small project using traceback.print_stack(limit=2)
in the monkey-patched get_language. It could be that the arguments
originate in Sphinx, but I don't get that feeling: Sphinx hands over
document processing to Docutils, and the get_language calls originate there.
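The stack check described above can be reproduced with the standard library alone. The wrapped `get_language()` and its caller `some_transform()` are hypothetical stand-ins; `traceback.print_stack()` and `traceback.extract_stack()` are the real standard-library functions, and `limit=2` keeps just the caller and the wrapper.

```python
import sys
import traceback

call_sites = []


def get_language(language_code, reporter=None):
    """Hypothetical stand-in for the function under investigation."""
    return language_code


_original_get_language = get_language


def get_language_traced(language_code, reporter=None):
    """Monkey-patch candidate that reports who called it."""
    # Print the two innermost frames (the caller, then this wrapper).
    traceback.print_stack(limit=2, file=sys.stderr)
    # Also record the caller's function name for later inspection.
    caller = traceback.extract_stack(limit=2)[0]
    call_sites.append(caller.name)
    return _original_get_language(language_code, reporter)


def some_transform():
    """Hypothetical caller standing in for Docutils' own code."""
    return get_language_traced("fr")


some_transform()
print(call_sites)  # ['some_transform']
```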

>> And an a priori check is done by Sphinx if the language has
>> smart quotes available via Docutils and it sets the smart_quotes
>> setting to True only then
>
> Instead of this, ensure that there are pre-defined smart quotes for all
> Sphinx-supported languages. (Less work at the end and better for the
> users.)

I didn't even think of that... I haven't even checked so far which
Sphinx-supported languages are actually lacking smart quotes support
in 0.13.1 :-( (apart from Turkish...)

As I noticed 0.14 got quite a few additions, I was hoping not to have
to handle this in detail...

>
> As Sphinx currently uses a smartquotes "monkey patch", additions to the

The monkey patch is applied only if Docutils version is < 0.13.2, in order
to fix some issues of 0.13.1

> dictionary of pre-defined smartquotes will be available as fast as the

but indeed it is always possible to add to smartquotes.smartchars.quotes

Currently it is done for French, but this will become obsolete with
0.14. It will remain in conditional Sphinx code as long as 0.13.1
is supported by Sphinx.

> proposed a priori language check. When providing the additions to
> Docutils, they will become part of the upstream with the next release...
>

you mean there is work for us left to do ;-) I was hoping all the
complicated smart quotes business would resolve itself auto-magically
via mystical powers at Docutils upstream ;-) for all languages...

Best, JF


jfbu

May 27, 2017, 4:07:44 PM
to sphin...@googlegroups.com
On 27/05/2017 at 21:44, jfbu wrote:
>>
>> This can be made even simpler. Instead of overwriting
>> docutils.languages.get_language(), just call it without second argument
>> (or with "reporter=None") to drop the warnings.
>>
>>

Reading again, I think you actually meant exactly what has
now been merged into Sphinx:

https://github.com/sphinx-doc/sphinx/pull/3813/files

My use of "overwriting" did not convey that what you suggest
is actually what we have done

Best, JF

Guenter Milde

May 28, 2017, 5:26:13 AM
to sphin...@googlegroups.com
On 2017-05-27, jfbu wrote:

>> This can be made even simpler. Instead of overwriting
>> docutils.languages.get_language(), just call it without second argument
>> (or with "reporter=None") to drop the warnings.

> afaik and to the extent I can be trusted, there is no call to
> get_language() from Sphinx itself, they all get issued by Docutils
> own code

I see, then your overwriting is the way to go...

(I'd be a bit more verbose in the comment, e.g.::

   +# monkey patch docutils get_language to suppress warnings:
   +def docutils_get_language_dont_warn(language_code, reporter=None):  # NOQA
   +    return docutils_get_language(language_code, None)
   +
   +docutils.languages.get_language = docutils_get_language_dont_warn

)
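For trying the patch in isolation, here is a self-contained sketch. A throwaway module object replaces the real `docutils.languages` (an assumption made so the example runs without Docutils installed), and `transform()` is a hypothetical caller standing in for Docutils' own code.

```python
import types

# Hypothetical stand-in for the real docutils.languages module.
languages = types.ModuleType("languages")
emitted = []


def _get_language(language_code, reporter=None):
    """Simplified model: unsupported codes fall back to "en" and warn."""
    if language_code in ("en", "fr"):
        return language_code
    if reporter is not None:
        emitted.append(language_code)
    return "en"


languages.get_language = _get_language

# Keep a reference to the original before patching it.
docutils_get_language = languages.get_language


# monkey patch docutils get_language to suppress warnings:
def docutils_get_language_dont_warn(language_code, reporter=None):
    """Accept a reporter, but never forward it, so no warning is emitted."""
    return docutils_get_language(language_code, None)


languages.get_language = docutils_get_language_dont_warn


def transform(document_language, reporter):
    """Hypothetical caller that looks the function up at call time."""
    return languages.get_language(document_language, reporter)


print(transform("tr", object()))  # en
print(emitted)                    # []
```

The patch only reaches callers that look the function up on the module at call time, like `languages.get_language(...)` above; a module that had already done `from docutils.languages import get_language` before the patch would keep the unpatched function.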

...


>>> And an a priori check is done by Sphinx if the language has
>>> smart quotes available via Docutils and it sets the smart_quotes
>>> setting to True only then

>> Instead of this, ensure that there are pre-defined smart quotes for all
>> Sphinx-supported languages. (Less work at the end and better for the
>> users.)

> I didn't even think of that... I haven't even checked so far
> which Sphinx-supported
> languages are actually lacking smart quotes support in 0.13.1 :-(
> (apart from Turkish...)

> as I noticed 0.14 got quite some additions I was hoping not to have
> to handle this in details...

These are the missing quote-locales in my local version (which adds
mk, nb, and no to 0.14rc1)::

   + 'mk': u'„“‚‘', # Macedonian, https://mk.wikipedia.org/wiki/Правопис_и_правоговор_на_македонскиот_јазик
   + 'nb': u'«»’’', # Norsk bokmål (canonical form 'no')
   + 'nn': u'«»’’', # Nynorsk [10]
   + 'no': u'«»’’', # Norsk bokmål [10]
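Each entry packs four characters. Reading them as opening double, closing double, opening single and closing single quote is an assumption based on the entries shown. The sketch adds the four locales to a local stand-in dict; on a real installation the same `update()` would presumably target `smartquotes.smartchars.quotes`, the table mentioned earlier in the thread.

```python
# Hypothetical local stand-in for the smart quotes table, with one
# existing entry for comparison.
quotes = {
    "de": "\u201e\u201c\u201a\u2018",  # „“‚‘
}

# The four additions listed above.
quotes.update({
    "mk": "\u201e\u201c\u201a\u2018",  # „“‚‘ Macedonian
    "nb": "\u00ab\u00bb\u2019\u2019",  # «»’’ Norsk bokmål
    "nn": "\u00ab\u00bb\u2019\u2019",  # «»’’ Nynorsk
    "no": "\u00ab\u00bb\u2019\u2019",  # «»’’ Norsk bokmål
})


def quote_chars(language):
    """Split a 4-character entry into its four quote roles."""
    open_double, close_double, open_single, close_single = quotes[language]
    return open_double, close_double, open_single, close_single


print(quote_chars("mk"))
```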



>> As Sphinx currently uses a smartquotes "monkey patch", additions to the

> The monkey patch is applied only if Docutils version is < 0.13.2, in order
> to fix some issues of 0.13.1

>> dictionary of pre-defined smartquotes will be available as fast as the

> but indeed it is always possible to add to smartquotes.smartchars.quotes

> currently it is done for French, but this will get obsoleted with
> 0.14. It will remain in conditional Sphinx code as long as 0.13.1
> is supported by Sphinx

I see. Thank you for the clarification.


>> proposed a priori language check. When providing the additions to
>> Docutils, they will become part of the upstream with the next release...


> you mean there is work for us left to do ;-) I was hoping all the
> complicated smart quotes business would resolve itself auto-magically
> via mystical powers at Docutils upstream ;-) for all languages...


Well, to quote myself (in the posting justifying the "smartquotes-locales"
setting): "The code is complicated because the task is complicated."

Also, Docutils has smart quotes as "opt-in" for a reason: the code is
fragile and error-prone (the automatism may be 95% right, but this is not
enough for "always on"). This is especially true for languages with heavy
use of the apostrophe (such as Dutch).
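A deliberately naive rule shows why apostrophes are the weak spot. The function below is hypothetical and much simpler than the Docutils smart quotes code: it treats a `'` at the start of a word as an opening quote and every other `'` as a closing quote or apostrophe. That works for `zo'n` but turns the Dutch elision in `'s avonds` into an opening quote.

```python
import re


def naive_single_quotes(text):
    """Hypothetical, deliberately naive single-quote educator."""
    # A ' at the start of the text or after whitespace is taken as opening.
    text = re.sub(r"(?<!\S)'", "\u2018", text)
    # Any remaining ' is taken as a closing quote or an apostrophe.
    return text.replace("'", "\u2019")


print(naive_single_quotes("he said 'hallo'"))  # he said ‘hallo’
print(naive_single_quotes("zo'n boek"))        # zo’n boek (correct)
print(naive_single_quotes("'s avonds"))        # ‘s avonds (wrong: should be ’s)
```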

Günter
