Spider website files to create JSON file for Lunrjs


Greg Raven

Jan 18, 2021, 9:24:22 AM
to BBEdit Talk
This may not be the proper forum for this, but I'm trying to incorporate Lunrjs search into my static sites. 

Years ago, Christopher Stone provided me with a wonderful little script that scans project files and generates an HTML list of the titles, with href links.

This is a bit more ambitious because instead of looking for single-line content, I want to grab everything that is inside the <article> tags -- along with the title tag content and the URL -- and create a JSON file that I can then process through Lunrjs for site search.

Here's what Christopher provided me for the previous utility.

```
# generate a TOC of the site files
# thank you, Christopher Stone
cd /Users/greg/Sites/site/ || exit
grep --include="*.html" -Eir "<title>.+</title>" . \
| sed -E '
s!^\.!<li><a href="!
s!:[[:space:]]*<title>[[:space:]]?!">!
s!</title>!</a></li>!
' \
| sort \
| bbedit
```

The final file would have the format:

```
{
    "docs": [
        {
            "location": "#welcome",
            "text": "Full documentation.",
            "title": "Welcome to my docs"
        },
        {
            "location": "#about-this-site",
            "text": "This is to augment my other site.",
            "title": "About this site"
        }
    ]
}
```
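One possible way to produce a file in this shape is sketched below in Python, using only the standard library. This is a rough sketch under several assumptions, not a tested solution for any particular site: the names `PageParser`, `page_to_doc`, `build_index`, and `write_index` are made up for illustration, the "location" value is assumed to be the site-relative path of each file (rather than an anchor like "#welcome"), and files are assumed to be UTF-8. It does not filter out inline script or style content inside the article.

```python
import json
from html.parser import HTMLParser
from pathlib import Path

# Tags after which a space is inserted so adjacent blocks do not run together.
BLOCK_TAGS = {"p", "div", "li", "br", "tr", "td", "th", "section",
              "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "pre"}


class PageParser(HTMLParser):
    """Collect the <title> text and the text found inside <article>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._article_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "article":
            self._article_depth += 1
        elif tag in BLOCK_TAGS and self._article_depth > 0:
            self.text_parts.append(" ")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "article" and self._article_depth > 0:
            self._article_depth -= 1
        elif tag in BLOCK_TAGS and self._article_depth > 0:
            self.text_parts.append(" ")

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._article_depth > 0:
            self.text_parts.append(data)


def page_to_doc(html, location):
    """Turn one page's HTML into a {"location", "text", "title"} record."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {
        "location": location,
        # Collapse runs of whitespace (including newlines) to single spaces.
        "text": " ".join("".join(parser.text_parts).split()),
        "title": " ".join("".join(parser.title_parts).split()),
    }


def build_index(site_root):
    """Walk site_root for *.html files and build the {"docs": [...]} dict."""
    root = Path(site_root)
    docs = []
    for path in sorted(root.rglob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: the URL is the file's path relative to the site root.
        location = "/" + path.relative_to(root).as_posix()
        docs.append(page_to_doc(html, location))
    return {"docs": docs}


def write_index(site_root, out_path):
    """Write the index as JSON; json.dumps handles quoting and escaping."""
    index = build_index(site_root)
    Path(out_path).write_text(
        json.dumps(index, indent=4, ensure_ascii=False), encoding="utf-8")
```

Calling something like `write_index("/Users/greg/Sites/site/", "search_index.json")` would then write the file. The output file name here is only a placeholder. Letting the `json` module serialize the records avoids having to escape quotes and newlines by hand, which is the part that tends to go wrong with a grep/sed pipeline.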

Any thoughts or suggestions are appreciated.