Spider website files to create JSON file for Lunrjs

Greg Raven

Jan 18, 2021, 9:24:22 AM

to BBEdit Talk

This may not be the right forum for this question, but I'm trying to add Lunr.js search to my static sites.

Years ago, Christopher Stone provided me with a wonderful little script that scans project files and generates an HTML list of the titles, with href links.

This project is a bit more ambitious: instead of matching single-line content, I want to grab everything inside the <article> tags, along with the <title> content and the page URL, and write it all to a JSON file that Lunr.js can then index for site search.

Here's what Christopher provided me for the previous utility.

```
# generate a TOC of the site files
# thank you, Christopher Stone

cd /Users/greg/Sites/site/ || exit

grep --include="*.html" -Eir "<title>.+</title>" . \
| sed -E '
s!^\.!<li><a href="!
s!:[[:space:]]*<title>[[:space:]]?!">!
s!</title>!</a></li>!
' \
| sort \
| bbedit
```

The final file would have the format:

```
{
  "docs": [
    {
      "location": "#welcome",
      "text": "Full documentation.",
      "title": "Welcome to my docs"
    },
    {
      "location": "#about-this-site",
      "text": "This is to augment my other site.",
      "title": "About this site"
    }
  ]
}
```
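
A minimal sketch of one way to produce a file in this shape, assuming Python 3 is available and using only its standard library (`html.parser`, `json`, `pathlib`). The names `PageParser`, `squeeze`, and `build_index` are hypothetical, as is the choice to store each page's site-relative path in `location`; none of this comes from Christopher's script, and it has not been run against a real site.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path


class PageParser(HTMLParser):
    """Collect the <title> text and the text inside <article> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._in_skipped = False   # inside <script> or <style>
        self._article_depth = 0    # tolerate nested <article> tags

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._in_skipped = True
        elif tag == "article":
            self._article_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style"):
            self._in_skipped = False
        elif tag == "article" and self._article_depth > 0:
            self._article_depth -= 1

    def handle_data(self, data):
        if self._in_skipped:
            return
        if self._in_title:
            self.title_parts.append(data)
        elif self._article_depth > 0:
            self.text_parts.append(data)


def squeeze(parts):
    """Join text fragments and collapse runs of whitespace to one space."""
    return re.sub(r"\s+", " ", " ".join(parts)).strip()


def build_index(site_root):
    """Return a {"docs": [...]} dict covering every .html file under site_root."""
    root = Path(site_root)
    docs = []
    for path in sorted(root.rglob("*.html")):
        parser = PageParser()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        text = squeeze(parser.text_parts)
        if not text:
            continue  # skip pages with no <article> content
        docs.append({
            # Assumption: the URL is the file's path relative to the site root.
            "location": "/" + path.relative_to(root).as_posix(),
            "text": text,
            "title": squeeze(parser.title_parts),
        })
    return {"docs": docs}


# Example use (the output filename is a placeholder):
# index = build_index("/Users/greg/Sites/site")
# Path("search_index.json").write_text(json.dumps(index, indent=2), encoding="utf-8")
```

Because fragments are joined with spaces, text split by inline tags can pick up an extra space before punctuation. That should not matter for a search index, but it would for display.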

Any thoughts or suggestions are appreciated.
