Searching over multiple Jupyter notebooks


Tony Hirst

May 8, 2018, 9:07:02 AM
to Project Jupyter
Hi

I'm working in an edu context, with notebooks being used to deliver interactive teaching materials, and one of the things we know students do is search over reference/resource materials.

I was wondering if anyone has looked at simple solutions for searching over Jupyter notebooks, e.g. by dropping them into a lunr.js index using lunr.py, or adding them to SQLite (in which case, what sort of schema did you use?).

In the first instance, I was thinking of just indexing the markdown cells in each notebook, with a reference back to the original filepath. (I think effective code search may be a whole other issue.) There are also questions around whether search results should link back into a complete notebook, and whether they should point to nbconverted HTML notebooks or to live running notebooks.
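To make the SQLite idea concrete, here is a rough sketch of what I have in mind, not a finished tool. The table layout (`path`, `cell_index`, `source`) and the function names `index_markdown_cells` and `search_cells` are my own placeholders. It reads each `.ipynb` file as plain JSON and uses a simple `LIKE` match rather than a proper full-text index.

```python
import json
import sqlite3
from pathlib import Path


def index_markdown_cells(root, db_path=":memory:"):
    """Index every markdown cell under `root` into a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        "path TEXT, cell_index INTEGER, source TEXT)"
    )
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        # Skip the autosave copies Jupyter keeps alongside each notebook.
        if ".ipynb_checkpoints" in nb_path.parts:
            continue
        try:
            nb = json.loads(nb_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable file or invalid JSON
        if not isinstance(nb, dict):
            continue
        for i, cell in enumerate(nb.get("cells", [])):
            if cell.get("cell_type") != "markdown":
                continue
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            conn.execute(
                "INSERT INTO cells VALUES (?, ?, ?)",
                (str(nb_path), i, source),
            )
    conn.commit()
    return conn


def search_cells(conn, query):
    """Return (path, cell_index, source) rows whose text contains `query`."""
    # LIKE is case-insensitive for ASCII by default; % and _ in the
    # query act as wildcards, which is fine for a first experiment.
    return conn.execute(
        "SELECT path, cell_index, source FROM cells "
        "WHERE source LIKE ? ORDER BY path, cell_index",
        (f"%{query}%",),
    ).fetchall()
```

Calling `search_cells(index_markdown_cells("."), "pandas")` would then return one row per matching markdown cell, each carrying the filepath it came from.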

These are very first steps in my thinking; I just wondered whether this is already a work in progress somewhere?


--tony

Pavel Vasev

May 8, 2018, 11:39:09 AM
to jup...@googlegroups.com
I like the idea of generating HTML files from notebooks, or from their markdown cells, and using known tools to search within the HTML.

Tue, May 8, 2018, 18:07 Tony Hirst <tony....@gmail.com>:

Grant Nestor

May 9, 2018, 11:42:32 AM
to Project Jupyter
A simple solution is to open a terminal in JupyterLab/Jupyter Notebook and run the following:

grep --include='*.ipynb' --exclude-dir='.ipynb_checkpoints' -rliw . -e 'search query'

This will search your Jupyter server root recursively for files that contain the whole word (case-insensitive) "search query" and only return the file names of matches.

Tony Hirst

May 10, 2018, 1:40:46 PM
to Project Jupyter
Thanks for that. I also started dabbling with a simple lunr.js solution - initial notes here: https://blog.ouseful.info/2018/05/10/initial-sketch-searching-jupyter-notebooks-using-lunr/

Comments welcome... I need to walk the dog and ponder the actual usefulness - or otherwise - of this now. The minimal working demo throws up all sorts of issues. The counterpoint of the grep solution is also really useful. A third point of comparison would be a sqlite/datasette or sqlite/scriptedForm search tool.

--tony

Grant Nestor

May 11, 2018, 11:04:01 AM
to Project Jupyter
You can also create a notebook to do this!

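```py
import glob

pattern = './**/*.ipynb'  # every notebook below the current directory
query = 'vdom'

for filepath in glob.iglob(pattern, recursive=True):
    # Notebooks are JSON files stored as UTF-8
    with open(filepath, encoding='utf-8') as file:
        s = file.read()
    # Plain, case-sensitive substring match against the raw notebook JSON
    if query in s:
        print(filepath)
```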

It gets the job done, it's flexible, and you're already using it!

Tony Hirst

May 13, 2018, 3:40:12 PM
to Project Jupyter
Ooh, that's neat... but it doesn't give search-engine-like output? What I really need is a results listing that shows some context for each search hit; I struck on the cell as a convenient proxy for that.
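Something along these lines is roughly what I mean, as an untested-in-anger sketch rather than a proposal. The function name `search_notebooks` and the `width` parameter are made up for illustration. It parses each notebook as JSON, treats each cell as the unit of a hit, and reports a short snippet of text around the first match in that cell.

```python
import json
import re
from pathlib import Path


def search_notebooks(root, query, width=40):
    """Yield (path, cell_index, cell_type, snippet) for each matching cell."""
    # Escape the query so it is matched literally, ignoring case.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        # Ignore Jupyter's autosave copies.
        if ".ipynb_checkpoints" in nb_path.parts:
            continue
        try:
            nb = json.loads(nb_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable file or invalid JSON
        if not isinstance(nb, dict):
            continue
        for i, cell in enumerate(nb.get("cells", [])):
            source = cell.get("source", "")
            # Cell source may be a single string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            match = pattern.search(source)
            if match is None:
                continue
            # Keep up to `width` characters either side of the first hit.
            start = max(0, match.start() - width)
            end = min(len(source), match.end() + width)
            # Collapse newlines and runs of spaces for a one-line snippet.
            snippet = " ".join(source[start:end].split())
            yield str(nb_path), i, cell.get("cell_type"), snippet
```

Looping over `search_notebooks(".", "vdom")` and printing each tuple gives one line per matching cell, with the filepath, the cell number, the cell type and the surrounding text.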

--tony