How are we supposed to generate our own documentation that uses BeautifulSoup types?

nbro

Mar 15, 2023, 6:23:05 AM
to beautifulsoup
My question is in the title. In particular, I'm using these dependencies

sphinx                        6.1.3      Python documentation generator
sphinx-autodoc-typehints      1.22       Type hints (PEP 484) support for the Sphinx autodoc extension
sphinx-rtd-theme              1.2.0      Read the Docs theme for Sphinx
requests                      2.28.2     Python HTTP for Humans.
beautifulsoup4                4.11.2     Screen-scraping library

I'm using the sphinx extensions

"sphinx.ext.autodoc",
"sphinx.ext.napoleon",
"sphinx.ext.autosectionlabel",
"sphinx_autodoc_typehints",
"sphinx.ext.intersphinx",


And I have these configurations in the conf.py

# autosectionlabel
autosectionlabel_prefix_document = True
autosectionlabel_maxdepth = 2

# autodoc
autoclass_content = "both"
autodoc_member_order = "bysource"
autodoc_default_options = {"members": True, "inherited-members": True}

# napoleon
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = False

# sphinx_autodoc_typehints
set_type_checking_flag = True

intersphinx_mapping = {
    "python": (
        f"https://docs.python.org/{sys.version_info.major}.{sys.version_info.minor}",
        None,
    ),
    # THIS IS WHAT I TRIED, BUT IT DOESN'T WORK
    "beautifulsoup4": ("https://beautiful-soup-4.readthedocs.io/en/latest/", None),
}

But when I compile my sphinx docs with the option -n, I get errors like

WARNING: py:class reference target not found: bs4.BeautifulSoup

I suppose that it's possible to include bs4 docs in our own docs, given that bs4 docs are written to be compiled with sphinx.

I also tried these intersphinx mappings:


None of them work, although the file https://www.crummy.com/software/BeautifulSoup/bs4/doc/objects.inv exists and can be downloaded. So, maybe it's the name that is wrong?

According to this page https://www.crummy.com/software/BeautifulSoup/, the latest release of bs4 was in January of 2023, so I suppose this tool is still in development and it supports what I need. If not, I will need to switch to another tool, I am afraid.

Any help? How do I include bs4 types in my docs?
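
To check whether "the name is wrong", it helps to look at which targets an objects.inv file actually exposes. Sphinx ships a command for this (`python -m sphinx.ext.intersphinx <url-or-path-to-objects.inv>`). The same check can be sketched with the standard library only, assuming the file uses the version 2 inventory layout (four plain-text header lines followed by a zlib-compressed body). The function name `parse_inventory` is made up for this illustration.

```python
import re
import zlib
from typing import Dict

# One entry per line: name, domain:role, priority, uri, display name.
_ENTRY = re.compile(r"(.+?)\s+(\S+)\s+(-?\d+)\s+?(\S*)\s+(.*)")


def parse_inventory(data: bytes) -> Dict[str, Dict[str, str]]:
    """Return {"domain:role": {target name: relative uri}} from raw objects.inv bytes."""
    header, _, rest = data.partition(b"\n")
    if header.strip() != b"# Sphinx inventory version 2":
        raise ValueError("not a version 2 Sphinx inventory")
    # Skip the project, version and "compressed using zlib" header lines.
    for _ in range(3):
        _, _, rest = rest.partition(b"\n")
    targets: Dict[str, Dict[str, str]] = {}
    for line in zlib.decompress(rest).decode("utf-8").splitlines():
        match = _ENTRY.match(line.rstrip())
        if match is None:
            continue
        name, role, _priority, uri, _display = match.groups()
        if uri.endswith("$"):
            # A trailing "$" is shorthand for the target name itself.
            uri = uri[:-1] + name
        targets.setdefault(role, {})[name] = uri
    return targets
```

Usage would be to download the objects.inv file, pass its bytes to `parse_inventory`, and look at the `"py:class"` entry. If `bs4.BeautifulSoup` is not listed there, no intersphinx mapping name can make the reference resolve.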

nbro

Mar 15, 2023, 7:03:56 AM
to beautifulsoup
Apparently, this issue has existed since at least 2015: https://bugs.launchpad.net/beautifulsoup/+bug/1453370. I don't want to believe that it has not been fixed yet and that I need to switch to a tool that supports the inclusion of the documentation.

leonardr

Mar 15, 2023, 7:42:40 AM
to beautifulsoup
This isn't an area of development I've focused on, so the issue indeed has not been fixed. If you can provide a piece of the documentation you're trying to write, I can use it as an example and see what it would take to get this working.

My general recommendation for people looking to stop using Beautiful Soup (for whatever reason) is to use lxml instead. However, I don't know what your project is, so lxml might not be appropriate. Instead, you might try Minestrone, which was specifically designed to simplify the Beautiful Soup interface and documentation.

Leonard

nbro

Mar 15, 2023, 7:55:26 AM
to beautifulsoup
In my code, I use type hints everywhere. I'm also using the package types-beautifulsoup4 (https://pypi.org/project/types-beautifulsoup4/). So, the issue that I reported occurs even if I just use type hints. But, if you don't want to use type hints, you can simply write :class:`bs4.BeautifulSoup` in the docstring of a public method, and you will get the error. I really don't know if it's easy to solve this issue, but I believe it should be solved. My impression is that people still use Beautiful Soup, which is why I adopted it too.
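
A minimal sketch of the kind of code that triggers the warning, assuming Google-style docstrings as configured above. The function `page_title` is a hypothetical example, not taken from the project in question. It has both a type hint and an explicit :class: reference, so either one is enough for `sphinx-build -n` to report the missing target.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only needed by type checkers and by tools that resolve annotations.
    from bs4 import BeautifulSoup


def page_title(soup: "BeautifulSoup") -> Optional[str]:
    """Return the text of the ``<title>`` element, if any.

    Args:
        soup: A parsed document, i.e. a :class:`bs4.BeautifulSoup` instance.

    Returns:
        The title text, or ``None`` when the document has no ``<title>``.
    """
    title = soup.title
    return title.string if title is not None else None
```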

nbro

Mar 15, 2023, 11:57:34 AM
to beautifulsoup
I forgot to say that I have a CI/CD pipeline that also builds the docs and, right now, I need to ignore the errors, but it would be great if my documentation had the links to the BeautifulSoup documentation. That's what's missing from BeautifulSoup and what I kindly ask you to fix. It should be too difficult, I suppose.
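
For reference, one way to ignore only these errors in a nitpicky build is Sphinx's `nitpick_ignore` setting in conf.py, rather than switching off `-n` for the whole pipeline. This is a sketch of a conf.py fragment, not the configuration actually used in the project above. The regex variant assumes a Sphinx version that has `nitpick_ignore_regex` (added in the 4.x series, as far as I know).

```python
# conf.py (fragment)

# Silence "reference target not found" for specific (type, target) pairs.
nitpick_ignore = [
    ("py:class", "bs4.BeautifulSoup"),
]

# Broader alternative: ignore every py:class target under the bs4 package.
nitpick_ignore_regex = [
    (r"py:class", r"bs4\..*"),
]
```

The drawback is that the references stay unlinked, so this only keeps the build green until the upstream documentation provides the targets.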

nbro

Mar 15, 2023, 11:58:04 AM
to beautifulsoup
*It shouldn't be too difficult...

leonardr

Mar 16, 2023, 7:58:06 AM
to beautifulsoup
I investigated the problem and summarized my findings in the 2015 issue.

The use of intersphinx to check references implies the existence of targets for any reference you might make--basically, API reference documentation--and that's not how the Beautiful Soup documentation is written. Sphinx syntax does not allow me to just drop in targets where appropriate. These targets become section headings rendered with special formatting, and adding them requires changes to the text and structure of the documentation.

I was able to rewrite the documentation such that it contains intersphinx targets for the classes most likely to be mentioned in other projects' documentation. The branch for that is here. It reads a little awkwardly, but it might be worth using in the short term. In the long term, the documentation needs to be rearchitected with the two parts that are now traditional for open source software projects: a narrative guide and a structured set of API documentation automatically generated from docstrings.

Leonard

nbro

Mar 20, 2023, 6:08:05 AM
to beautifulsoup
Hi. Thanks for starting to address the issue. Will you release these changes?

leonardr

Mar 20, 2023, 7:12:54 AM
to beautifulsoup
I don't think this hybrid document reads as well for new users, but I'm willing to try it out and see what happens. I'll put out a release today that includes the rewritten documentation.

Leonard
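
For anyone following along, a sketch of the intersphinx entry that would point at the released documentation. The base URL is assumed from the objects.inv location mentioned earlier in the thread, and it only helps if the published inventory carries the new targets. The key "bs4" is an arbitrary label.

```python
# conf.py (fragment)
extensions = [
    "sphinx.ext.intersphinx",
]

intersphinx_mapping = {
    # Label is arbitrary; Sphinx fetches <base URL>/objects.inv when the
    # second element is None.
    "bs4": ("https://www.crummy.com/software/BeautifulSoup/bs4/doc/", None),
}
```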

nbro

Mar 27, 2023, 10:28:47 AM
to beautifulsoup
Thanks. It seems that I can now refer to bs4.BeautifulSoup, but not bs4.Tag. It would be great if you could also add the targets for this class and maybe other classes that may be referenced.

nbro

Mar 27, 2023, 10:29:36 AM
to beautifulsoup
*but not bs4.element.Tag

leonardr

Apr 5, 2023, 11:26:47 AM
to beautifulsoup
On Monday, March 27, 2023 at 10:29:36 AM UTC-4 nbro wrote:
*but not bs4.element.Tag

On Monday, March 27, 2023 at 4:28:47 PM UTC+2 nbro wrote:
Thanks. It seems that I can now refer to bs4.BeautifulSoup, but not bs4.Tag. It would be great if you could also add the targets for this class and maybe other classes that may be referenced.


Unfortunately, your typo points to the problem. You can refer to bs4.Tag, but the original definition of the class is in the bs4.element module, which is an implementation detail that beginners don't need to know. There's no endpoint for bs4.element.Tag, and I can't create one without exposing information I want to keep out of the human-readable documentation. The same goes for Comment, Doctype, and the other classes. If you need to refer to bs4.element.Tag, that will need to wait for the existence of traditional API documentation, which will expose the underlying module structure.

Now that I have updated Beautiful Soup's test and build frameworks, documentation improvements are next on my list, but it's going to take a little while.

Leonard