Can a Builder alter the Reader?

Freek Dijkstra

Mar 1, 2021, 6:40:08 PM
to sphin...@googlegroups.com

Hi,

I created an extension (https://github.com/sphinx-contrib/restbuilder) which recently gained some traction. It is typically used in this workflow:

  1. Documentation in reST with Sphinx-specific directives
  2. --> Sphinx --> reStructuredText Builder/Writer
  3. --> Documentation in reST with only doctree/"vanilla" directives
  4. --> Publish on e.g. GitHub (which understands reST, but not Sphinx-directives).

While refining the extension, I bumped into an issue with substitutions.

To create valid reStructuredText, I need to know the doctree before the `references.Substitutions` Transform is applied. In other words, I would like to disable this Transform.

Is there a way my Builder can do so?

Unfortunately, SphinxBaseReader always seems to use all transforms that are listed by readers.standalone.Reader, which includes references.Substitutions. Should I subclass SphinxBaseReader, and if so, how can my builder specify the desired reader?

Are there perhaps easier approaches to take?

Any advice is much appreciated.

Regards,
Freek

Komiya Takeshi

Mar 2, 2021, 7:44:26 AM
to sphin...@googlegroups.com
Hi,

Unfortunately, there is no way to install a custom reader, so there is
also no way to disable an individual transform.
As a workaround, you can override `sphinx.io.SphinxStandaloneReader`
from your extension.

Note: The doctrees are cached between builders. So an extension that
disables the substitution feature also affects the other builders.

Thanks,
Takeshi KOMIYA

Freek Dijkstra

Mar 3, 2021, 7:57:57 AM
to sphin...@googlegroups.com
Hi Takeshi-san,

Thanks for taking the time to reply!

I ended up overriding SphinxStandaloneReader, but had to replicate
all the code in read_doc(). So be it.

Thanks for the warning on the caching. I'll consider removing the cache
or giving a warning (and advising users to pass -E in case another
builder is used).

Do you know whether this functionality is requested often? If so, I can
file a feature request on GitHub. It would be nice to make read_doc()
use a registry to get the reader class (like it does with the parser),
or, even better, have some callback and make get_transforms() use
self.unused_transforms instead of a fixed variable unused_transforms.
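
The second idea could look roughly like the sketch below. It uses
stand-in classes, not Sphinx's real ones: `BaseReader`, `RestReader`,
and the empty transform classes are hypothetical names chosen for
illustration. Only the pattern is the point: get_transforms() consults
an overridable `unused_transforms` attribute rather than a fixed local
list.

```python
class Transform:
    """Stand-in for a docutils transform class (hypothetical)."""


class Substitutions(Transform):
    pass


class DanglingReferences(Transform):
    pass


class SmartQuotes(Transform):
    pass


class BaseReader:
    """Toy reader; NOT the real SphinxBaseReader."""

    # Transforms this toy reader would apply by default.
    default_transforms = [Substitutions, DanglingReferences, SmartQuotes]
    # Overridable attribute instead of a variable fixed inside the method.
    unused_transforms = [DanglingReferences]

    def get_transforms(self):
        # Look the list up on self so subclasses can extend it.
        return [t for t in self.default_transforms
                if t not in self.unused_transforms]


class RestReader(BaseReader):
    """Reader a reST builder might want: also skips Substitutions."""

    unused_transforms = BaseReader.unused_transforms + [Substitutions]


def transform_names(reader):
    return [t.__name__ for t in reader.get_transforms()]


print(transform_names(BaseReader()))  # ['Substitutions', 'SmartQuotes']
print(transform_names(RestReader()))  # ['SmartQuotes']
```

With such a hook, a subclass would only need to set one attribute
instead of copying the body of read_doc().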

Regards,
Freek Dijkstra

Komiya Takeshi

Mar 5, 2021, 8:19:16 AM
to sphin...@googlegroups.com
Hi,

>Do you know if this functionality is more often requested?

If my memory is correct, this is the first request. Nobody wants to
install a custom reader.
But I think it would be better to provide such an API (I feel hacking
read_doc() is annoying!).

So could you file an issue, please? Then, I'll add it in the future.

Thanks,
Takeshi KOMIYA

Freek Dijkstra

Mar 7, 2021, 5:10:42 PM
to sphin...@googlegroups.com
Hi,

My approach of overriding SphinxStandaloneReader did work, but I think
it is a troublesome approach: I have to replicate a significant amount
of code, and caching of doctrees may give unexpected results.

My current thought is to add a postprocessing transformation to my
builder, which works after the (cached) doctree is fetched, and only for
my builders (not for other builders).

This transform would re-add substitution_references and
substitution_definitions (only) where needed.

Would this be possible? I've seen add_post_transform(), but can't figure
out how to ensure that my builder, and ONLY my builder uses this.
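
The restriction asked about here can be modelled as a transform that
checks the active builder's name before doing any work. The sketch
below uses stand-in classes only; `Builder`, `BuilderScopedTransform`,
and `RestoreSubstitutions` are hypothetical names, and the doctree is a
plain list. Sphinx's `SphinxPostTransform` base class has, as far as I
am aware, a `builders` attribute intended for this kind of gating, but
please check the documentation of your Sphinx version before relying
on it.

```python
class Builder:
    """Stand-in for a Sphinx builder; only the name matters here."""

    def __init__(self, name):
        self.name = name


class BuilderScopedTransform:
    """Toy post-transform that runs only for the builders it lists."""

    builders = ()  # an empty tuple means "every builder"

    def __init__(self, builder):
        self.builder = builder

    def is_supported(self):
        return not self.builders or self.builder.name in self.builders

    def apply(self, doctree):
        # Skip silently when the active builder is not listed.
        if self.is_supported():
            self.run(doctree)
        return doctree

    def run(self, doctree):
        raise NotImplementedError


class RestoreSubstitutions(BuilderScopedTransform):
    """Hypothetical transform that re-adds substitution references."""

    builders = ("rst",)

    def run(self, doctree):
        doctree.append("substitution_reference")


for name in ("rst", "html"):
    doctree = ["paragraph"]
    RestoreSubstitutions(Builder(name)).apply(doctree)
    print(name, doctree)
```

In this toy model only the "rst" run changes the doctree; the "html"
run leaves it as it was.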

Regards,
Freek


Komiya Takeshi

Mar 8, 2021, 8:42:31 AM
to sphin...@googlegroups.com
> Would this be possible? I've seen add_post_transform(), but can't figure
> out how to ensure that my builder, and ONLY my builder uses this.

Yes and no. Technically, it is not difficult to move the substitution
transform to the post-transform phase.
But it would cause a lot of trouble for the existing components, because
they expect all substitutions to have been processed in the parsing
step. In practice, it's difficult.

This means your extension conflicts with other builders. I don't have an
idea for resolving that yet.

Takeshi KOMIYA

Freek Dijkstra

Mar 9, 2021, 3:40:11 AM
to sphin...@googlegroups.com
I asked about post transforms:

>> My current thought is to add a postprocessing transformation to my
>> builder, which works after the (cached) doctree is fetched, and only
>> for my builders (not for other builders).

>> Would this be possible? I've seen add_post_transform(), but can't figure
>> out how to ensure that my builder, and ONLY my builder uses this.

Takeshi KOMIYA-san replied:

> Yes and no. Technically, it is not difficult to move the substitution
> transform to the post-transform phase.
> But it would cause a lot of trouble for the existing components, because
> they expect all substitutions to have been processed in the parsing
> step. In practice, it's difficult.

I meant something else: do not move a transform, but keep it as-is,
and add a transform later.

So my thought was:

* Normal reading and parsing
* Normal transform, including resolving substitutions
* Cache doctree on disk

For rst builder:
* Read doctree from disk
* Add post-transform, to re-add resolved substitutions
* Translate from doctree to rst
* Write output to disk

It sounds silly to first let the SphinxReader resolve substitutions,
and later re-add these substitutions (if needed, in case of nested markup).
But at least it would not break the caching process, or interfere with
other builders.
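
The workflow above can be sketched with the standard library alone. It
is a toy model, not Sphinx code: the "doctree" is a dict, substitution
handling is plain string replacement, and `read_and_transform` and
`restore_substitutions` are hypothetical helpers. It also assumes the
cached doctree keeps enough information to tell which text came from a
substitution.

```python
import pickle


def read_and_transform(source, substitutions):
    """Shared phase: parse and resolve substitutions (toy version)."""
    resolved = source
    for name, value in substitutions.items():
        resolved = resolved.replace(f"|{name}|", value)
    # Record what was resolved so a builder can re-add it later.
    return {"text": resolved, "resolved": dict(substitutions)}


def restore_substitutions(doctree):
    """Builder-specific step: put |name| references back into a copy."""
    text = doctree["text"]
    for name, value in doctree["resolved"].items():
        # Naive reverse replacement; good enough for this illustration.
        text = text.replace(value, f"|{name}|")
    return {**doctree, "text": text}


# Normal reading, parsing, and transforms, then cache the doctree.
cache = pickle.dumps(
    read_and_transform("Welcome to |project|.", {"project": "Sphinx"})
)

# rst builder: read the cached doctree, then post-process its own copy.
rst_doctree = restore_substitutions(pickle.loads(cache))

# Any other builder: read the same cache, which was never modified.
html_doctree = pickle.loads(cache)

print(rst_doctree["text"])   # Welcome to |project|.
print(html_doctree["text"])  # Welcome to Sphinx.
```

Because the extra step works on the unpickled copy, the cached bytes
stay identical for every other builder.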

I have one question. When writing my previous message, I assumed this
order of processing:

1. Reading and parsing
2. Apply transforms
3. Cache doctree on disk
4. Apply post-transforms

However, I now get the impression that the order is:

1. Reading and parsing
2. Apply transforms
3. Apply post-transforms
4. Cache doctree on disk

Could you confirm which order is used?

(If post-transforms are applied before caching, I can still use transforms;
I just need to avoid add_post_transform() and instead add my own
post_post_transform methods ;) )

Regards,
Freek

Freek Dijkstra

Mar 9, 2021, 4:56:52 AM
to sphin...@googlegroups.com
Hi,

I did a short test, and fear that the doctree caching is already broken. :(

Given the following index.rst:

    This is a line with a quote, isn't it?

When running `sphinx-build -b xml input output`, the resulting index.xml
is as expected:

    <paragraph>This is a line with a quote, isn’t it?</paragraph>

However, when running this as `sphinx-build -b text input output ;
sphinx-build -b xml input output`, the resulting index.xml is different:

    <paragraph>This is a line with a quote, isn't it?</paragraph>

In the first case, the straight quote (') is converted to a smart quote (’),
as expected for XML files. However, the SmartQuotes transform is disabled for
text files (and man pages), so the resulting doctree is different.
Since the doctree is cached between runs, the XML builder is fed the
wrong doctree.

This is of course a minor issue, but it does prove that the caching
mechanism for doctree is already broken.
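
The effect can be reproduced in a toy model with a cache keyed only by
document name. Nothing below is Sphinx code: `smart_quotes`, `build`,
and the `USES_SMART_QUOTES` table are hypothetical stand-ins that
mirror the observation above. The `key_by_builder` option shows one
conceivable remedy and is purely an assumption, not something Sphinx
is known to do.

```python
def smart_quotes(text):
    """Toy stand-in for the SmartQuotes transform."""
    return text.replace("'", "\u2019")


# Hypothetical per-builder settings, mirroring the test described above.
USES_SMART_QUOTES = {"xml": True, "text": False}


def build(builder, source, cache, key_by_builder=False):
    """Return the doctree a builder sees, reading through a cache."""
    key = ("index", builder) if key_by_builder else "index"
    if key not in cache:
        doctree = source
        if USES_SMART_QUOTES[builder]:
            doctree = smart_quotes(doctree)
        cache[key] = doctree
    return cache[key]


SOURCE = "This is a line with a quote, isn't it?"

# XML alone: the quote is converted, as expected.
print(build("xml", SOURCE, {}))

# text first, then XML with a shared cache: XML gets the stale doctree.
shared = {}
build("text", SOURCE, shared)
print(build("xml", SOURCE, shared))

# Assumed remedy: make the builder part of the cache key.
separate = {}
build("text", SOURCE, separate, key_by_builder=True)
print(build("xml", SOURCE, separate, key_by_builder=True))
```

The first and third calls print the smart quote; the second prints the
straight quote because the text builder filled the cache first.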

Regards,
Freek

Freek Dijkstra

Mar 9, 2021, 5:39:57 AM
to sphin...@googlegroups.com
FYI,

I filed issues for both my initial request and the problem of sharing
doctree caches between builders.

See:

https://github.com/sphinx-doc/sphinx/issues/8975 - Allow builders to
specify the Transforms

https://github.com/sphinx-doc/sphinx/issues/8974 - Doctree caching is
broken when mixing two builders

Regards,
Freek
