Doc pages search engine difficulties


Dexter Lagan

Aug 17, 2021, 10:34:43 AM
to Racket Users
Hello there,

  I'm trying to teach one of my coworkers how to search for function names in the Racket manual, and I'm having serious problems finding very simple things, such as the function to display the file browser dialog (get-file). I know most function names by heart but my coworker isn't that fortunate. I see I'm not the only one, as per these Stack Overflow answers.

  Does anybody know why the search engine doesn't seem to index page contents other than exact function names and titles?

  I understand search engines aren't trivial matters, but I'm thinking that finding very basic Racket functions would be crucial for beginners. If somebody could point me to the search engine repo, I'd love to have a look.

Thanks in advance!

Dexter

Sorawee Porncharoenwase

Aug 17, 2021, 11:27:34 AM
to Dexter Lagan, Racket Users

FWIW, you can use Google to do that. The search query

site:docs.racket-lang.org file browser dialog

shows https://docs.racket-lang.org/mrlib/Path_Dialog.html as the first search result. The page also has a link to get-file and put-file.




Sorawee Porncharoenwase

Aug 17, 2021, 12:44:59 PM
to Dexter Lagan, Racket list

But content search is also unsatisfactory in many situations. As an example, take a look at Python's documentation, which has content search. I searched for "list" (https://docs.python.org/3/search.html?q=list&check_keywords=yes&area=default), hoping to find the documentation for the list function, and the result that I wanted was the 20th one. Earlier search results include the readline library, because it contains the phrase "the history list", and a code example, reprlib.repr(set('supercalifragilisticexpialidocious')). This problem virtually doesn't exist with the indexed term search in Racket.

I'm not saying that there’s nothing to improve. I think making it possible to do a content search, perhaps via a query like content:"file browser dialog", might be a good idea. But I definitely think we should not take the current search functionality away.
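To make the idea concrete, here is a minimal sketch of how a search box could tell the two kinds of query apart. It is only an illustration: the function name parse_query and the "term"/"content" labels are made up for this example and are not part of the Racket search code.

```python
def parse_query(raw):
    """Split a raw search string into (mode, text).

    Hypothetical convention: a leading `content:` switches to full-text
    search; anything else stays an indexed-term search, so the current
    behaviour remains the default.
    """
    raw = raw.strip()
    prefix = "content:"
    if raw.startswith(prefix):
        # Drop the prefix and any surrounding double quotes.
        phrase = raw[len(prefix):].strip().strip('"')
        return ("content", phrase)
    return ("term", raw)
```

With this convention, content:"file browser dialog" would be routed to a full-text search, while a plain get-file would still go through the existing term index.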





On Tue, Aug 17, 2021 at 8:52 AM Dexter Lagan <dexte...@gmail.com> wrote:
  True, I’m forced to do this when the search won’t do. It’s too bad the default search doesn’t work as well. Would it be shameful to use Google as the main search engine, as many others do? I’m sure most wouldn’t appreciate having to depend on Google for this. I’ll look at the code just so I understand.

Dex


Jens Axel Søgaard

Aug 17, 2021, 3:40:17 PM
to Dexter Lagan, Racket Users

Full-text search could be added, but it would require some changes to the current setup.

One goal of the documentation system is that you can use the documentation on your own computer without any internet access.
When you enter a search term, a piece of JavaScript runs directly on your computer without any queries sent to a server.
In order for this to work there needs to be a prebuilt index in which the search term can be looked up.
This index is built by `raco setup`. For each function/macro definition in the documentation an entry is added to the index
with associated information (which module it is defined in, etc.).
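
As a rough illustration of what a lookup in such a prebuilt index amounts to, here is a small Python sketch. The entry layout (a term plus the module that provides it) and the ranking are assumptions made for the example; this is not the actual plt-index.js format or the real search code.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    term: str    # indexed identifier or keyword
    module: str  # module that provides it

# Tiny hand-written stand-in for the prebuilt index (hypothetical layout).
INDEX = [
    Entry("get-file", "racket/gui/base"),
    Entry("put-file", "racket/gui/base"),
    Entry("file-exists?", "racket/base"),
]

def search(query, index=INDEX):
    """Return entries whose term matches the query.

    Exact matches rank first, then prefix matches, then substring matches.
    Only the indexed terms are consulted, never the page text.
    """
    q = query.lower()

    def rank(entry):
        t = entry.term.lower()
        if t == q:
            return 0
        if t.startswith(q):
            return 1
        if q in t:
            return 2
        return None  # no match

    hits = [(rank(e), e) for e in index]
    hits = [(r, e) for r, e in hits if r is not None]
    hits.sort(key=lambda pair: (pair[0], pair[1].term))
    return [e for _, e in hits]
```

Searching for get-file finds the entry immediately, while a descriptive phrase like "file browser dialog" returns nothing, because no indexed term contains it. That is the behaviour described at the start of this thread.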

If you use the search page at docs.racket-lang.org, the index is downloaded to your computer first.
Currently the index (the file name is plt-index.js) is around 12 MB.

If full-text search were added, the index would be considerably larger, too large to download in practice.
The actual search would then need to take place on the web server.

Could a standard full-text indexer be used? Maybe, but most likely it would be necessary to build a custom one.
Standard indexers stem words ("apple" and "apples" are indexed as the same word). They also filter out
punctuation such as - . ? / etc. This in turn makes it difficult to search for identifiers.
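
A small Python sketch shows the effect. The stemmer below is a deliberately crude stand-in (it only strips a trailing "s"), and both tokenizer functions are made up for this example; real indexers are more sophisticated, but the problem for identifiers is the same.

```python
import re

def naive_stem(word):
    # Crude plural stripping, standing in for a real stemming algorithm.
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word

def standard_tokens(text):
    # Typical full-text behaviour: lowercase, split on anything that is
    # not a letter or digit, then stem each word.
    words = re.split(r"[^a-z0-9]+", text.lower())
    return [naive_stem(w) for w in words if w]

def identifier_tokens(text):
    # Identifier-friendly alternative: split on whitespace only, so the
    # punctuation inside names like get-file or file-exists? survives.
    return text.split()
```

With standard_tokens, "apple" and "apples" both become "apple", which is what you want for prose. But get-file turns into the two words "get" and "file", and string->list becomes "string" and "list", so a query for the identifier can no longer be distinguished from a query for the plain words.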

The compromise today is that we have precise search among the identifiers and keywords marked as
important by the documentation writers. If full-text search is needed, then Google is the place to look.

Back in the day, before the current documentation search, it was possible to make a custom Google-backed
search page that only indexed a certain set of sites. It worked "okay", but not great.
The current system works much better.

That said, I understand that the amount of documentation has grown so large that it can
be difficult for newcomers to navigate. Maybe "cheat sheets" such as

     https://docs.racket-lang.org/racket-cheat/index.html

could be added to some sections in order to give an overview of what's available?

/Jens Axel

Dexter Lagan

Aug 17, 2021, 4:16:17 PM
to Jens Axel Søgaard, Racket Users
  Thank you very much for the thorough explanation. It makes sense. I’ll be sharing the cheat sheet with my coworkers. It sure is a good starting point.

Dex
