Documentation about the built in search?

Martin Bless

Dec 11, 2021, 3:30:47 PM

to sphinx...@googlegroups.com

Hello friends,

is anybody aware of an article, blog post or other documentation that
helps explain how the built-in search works?

Regards, Martin

--

See our Sphinx made docs at https://docs.typo3.org/

Charles Bouchard-Légaré

Apr 8, 2022, 12:29:39 AM

to sphinx-users

I have also been looking into it lately and the only thing I found was... reading the source code.

I was looking into improving the search widget, trying to get closer to what is available with MkDocs (modal results as you type, etc.; see the screenshot below).

Here is what I found about Sphinx' search:

The JavaScript code for doing the actual search against the index is at sphinx/themes/basic/static/searchtools.js
The generation of the index is done at build time and the code is in sphinx/search/__init__.py
Sphinx does not only do «full text search»
- First, it looks into «objects» (like Python functions and such)
  - The displayed result is built from the domain's display name and the object type's localized name
  - No excerpt is provided here (which is kind of sad)
- Then, try full text search
  - An excerpt built from text around the result is displayed
  - I am no expert in full-text search, but it looks both simple and pretty standard, with more priority given to terms from titles. There is a good stemmer for several languages.
Sphinx' search is clearly not an API
- Sphinx' search is not very configurable and does not seem to be part of a public API for users or extension developers to build on.
When writing a new Domain, objects are provided by the get_objects method (which your implementation must provide)
- It returns an iterable of «objects», each a 6-tuple
- The last item, priority, determines how important an object is for search
- The URL built for a search result depends on the first, second and fifth items
  - fullname «Fully qualified name.»
  - dispname «Name to display when searching/linking.»
  - anchor «The anchor name for the object.»
- In my custom Domain, search-generated URLs don't target the actual documented object. I still need to investigate how the Directive implementation, these three «object tuple» attributes and the search work together. There still seem to be a few Python-specifics in there.
As part of WebSupport, Sphinx provides a few utilities to enable server-side search. Personally, this is not interesting to me at the moment.
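
To make the 6-tuple described above concrete, here is a minimal sketch in plain Python, with no Sphinx import. The field order (name, dispname, type, docname, anchor, priority) follows the documentation of Domain.get_objects, where the first item is the one quoted above as «Fully qualified name.» The ObjectEntry class, the "recipe" object type and the sample data are hypothetical names used only for illustration; a real Domain subclass would yield plain tuples in this order.

```python
from typing import Iterator, List, NamedTuple


class ObjectEntry(NamedTuple):
    """Hypothetical helper mirroring the 6-tuple yielded by Domain.get_objects."""
    name: str       # fully qualified name
    dispname: str   # name to display when searching/linking
    type: str       # object type, a key of the domain's object_types
    docname: str    # document where the object is described
    anchor: str     # anchor name for the object
    priority: int   # search priority; -1 hides the object from search


# Made-up data standing in for whatever a custom domain collected while reading.
RECIPES = {
    "recipe.pancakes": ("pancakes", "cooking/breakfast", 1),
    "recipe.internal-notes": ("internal-notes", "cooking/notes", -1),
}


def get_objects() -> Iterator[ObjectEntry]:
    """Yield one entry per documented object, as a Domain.get_objects would."""
    for fullname, (dispname, docname, priority) in RECIPES.items():
        # Assumption: the anchor is simply the fully qualified name.
        yield ObjectEntry(fullname, dispname, "recipe", docname, fullname, priority)


def searchable(entries) -> List[ObjectEntry]:
    """Keep only the entries that should appear in search results."""
    return [entry for entry in entries if entry.priority >= 0]
```

Calling searchable(get_objects()) on this data keeps only the pancakes entry, since the other one has priority -1.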

For comparison, here is what I found about MkDocs

Unless specific plugins are used, only full text search is done.
It uses lunr.js
- The documents MkDocs registers to lunr.js are, from my understanding
  - All pages
  - All sub sections, recursively
  - Which means some text is added multiple times. I suspect subsections are prioritized in the results
  - Each item provides two "fields": title and text, somewhat like Sphinx.
- By default, they used to use lunr.py for pregenerating the index. This pregeneration is configurable.
  - This is deprecated now because lunr.py has binary transitive dependencies for non-English languages and this makes MkDocs harder to use for Alpine Docker image users.
  - They now offer to run lunr.js in a subprocess with Node.js
  - The index can also be generated by Web workers
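
The way pages and subsections end up as separate search documents, as I understand it above, can be sketched in plain Python. This is not MkDocs' code: the Section class, the to_documents function and the "location" key are hypothetical names, and only the "title" and "text" fields come from the description above. The sketch assumes that each entry's text includes the text of its subsections, which is the reading under which some text is added multiple times.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Section:
    """Hypothetical page or (sub)section: a title, its own text, nested sections."""
    title: str
    text: str
    anchor: str = ""
    children: List["Section"] = field(default_factory=list)

    def full_text(self) -> str:
        # Own text followed by the text of every nested section.
        return " ".join([self.text] + [child.full_text() for child in self.children])


def to_documents(page_url: str, node: Section) -> List[Dict[str, str]]:
    """Return one search document for the node and one per nested section."""
    location = page_url + ("#" + node.anchor if node.anchor else "")
    docs = [{"location": location, "title": node.title, "text": node.full_text()}]
    for child in node.children:
        docs.extend(to_documents(page_url, child))
    return docs


page = Section(
    "Install",
    "How to install.",
    children=[Section("Linux", "Use pip.", anchor="linux")],
)
documents = to_documents("install/", page)
```

Here the sentence "Use pip." is indexed twice: once as part of the page and once as its own subsection document.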

Other info I found:

ReadTheDocs has quite interesting search features
Someone did make a lunr.js extension for Sphinx, but only indexing "objects" in a separate custom search widget. Not actively maintained.
I've looked into trying lunr in Sphinx for full text. Building an index would be quite simple with an EnvironmentCollector, but leveraging incremental builds would not yield all the optimization one could want, because lunr dropped editable indices. Here is an untested stub, which would still need to be integrated with Sphinx's APIs, to give an idea:
- from docutils.nodes import Node, document, section, title
  from lunr import get_default_builder
  from sphinx.environment import BuildEnvironment


  class Search:
      def __init__(self, env: BuildEnvironment):
          self._env = env
          # lunr.py builder with one reference field and two searchable fields.
          self._builder = get_default_builder()
          self._builder.ref("id")
          self._builder.field("title")
          self._builder.field("text")

      def index_document(self, node: document):
          # Index the whole page first, then each (sub)section on its own.
          self._builder.add(self.extract_search_document(node, is_section=False))
          for element in node.findall(section):
              self._builder.add(self.extract_search_document(element))

      def extract_search_document(self, node: Node, is_section=True):
          # The first title found under the node is used as the entry's title.
          title_node = next(node.findall(title))
          uri = self._env.app.builder.get_target_uri(self._env.docname)
          if is_section:
              # docutils stores the ids on the section node, not on its title.
              uri += "#" + node["ids"][0]

          return {
              "id": uri,
              "title": title_node.astext(),
              "text": node.astext(),
          }
All in all, I am not sure it is worth investing much in a 3rd-party search engine such as lunr. I cannot yet prove that it would provide much of an improvement over Sphinx' search. Adding such dependencies to Sphinx would probably not be acceptable anyway. Even as a separate extension, I don't clearly see an improvement here.
I see a lot of improvements that could be made by themes. I am not sure whether Sphinx' client-side search JavaScript code could be used efficiently for queries «as you type», but having an overlay or modal result display would be great in my opinion. Sadly, while my Python skills are quite good and I can play with JS a bit, web development is something I never invested time or focus in. Working on this would thus require tens of hours of unpleasantness, which is quite daunting, I must admit.

All in all, I would really like to help improve the search experience with Sphinx, especially on static websites outside of ReadTheDocs. I feel that the best early improvements have to be made in themes (improving the UI), and this is something I don't feel I can help much with. I would gladly team up with anybody with web development skills to do something about it!

MkDocs Search Screenshot

bradley...@gmail.com

Mar 16, 2024, 9:20:36 AM

to sphinx-users

I have a Sphinx wrapper and wanted to configure the Sphinx search to use all the words in the index, but I could not figure out how to get Sphinx to do this. (For the Sphinx wrapper, the words in the index are the words in the headings, titles, and page names.)

I used raw HTML to get the search I wanted. For example, here is the search for the Sphinx wrapper:
https://xrst.readthedocs.io/latest/xrst_search.html

It would be nice if there were a way to get the Sphinx search to function like the example above.
