Tuxedo
Dec 13, 2022, 12:30:47 PM

I'm searching for an HTML keyword indexing solution that can search a set of
ordinary (internationalised) HTML pages and return an abstract for each
matching page.

It's not for indexing external sites, not for a huge number of pages
(100-200 or so) and not for an extremely busy site, so each search could be
performed live rather than against a pre-built index. The results would be
returned in the usual search-results style: a linked <title>Title page</title>
followed by a description taken either from the meta description (if present)
or from the text surrounding the keyword(s) found in the HTML body.

What might the Perl module repository (CPAN) have in store? Any recommendations?

Many thanks,
Tuxedo
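The live search described in the question can be sketched with HTML::TokeParser (the module suggested in the reply below) plus HTML::Entities. This is a minimal sketch under stated assumptions, not an existing CPAN solution: the helper names `page_info`, `search_page`, `search_dir` and `format_hit` are hypothetical, the pages are assumed to be UTF-8 encoded `*.html` files in a single directory, the match is a plain case-insensitive substring search, and the 40-character snippet window is an arbitrary default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: pull the title, the meta description (if any) and the
# visible text out of one HTML document held in a (decoded) Perl string.
sub page_info {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse HTML\n";
    my %page = (title => '', description => undef, text => '');
    my @text;
    my $skip = 0;    # > 0 while inside <script> or <style>
    while (my $tok = $p->get_token) {
        my $type = $tok->[0];
        if ($type eq 'S') {
            my ($tag, $attr) = @{$tok}[1, 2];
            if ($tag eq 'title') {
                $page{title} = $p->get_trimmed_text('/title');
            }
            elsif ($tag eq 'meta' && lc($attr->{name} // '') eq 'description') {
                $page{description} = $attr->{content};
            }
            elsif ($tag eq 'script' || $tag eq 'style') {
                $skip++;
            }
        }
        elsif ($type eq 'E') {
            $skip-- if $skip && ($tok->[1] eq 'script' || $tok->[1] eq 'style');
        }
        elsif ($type eq 'T' && !$skip) {
            push @text, decode_entities($tok->[1]);    # raw text: decode &amp; etc.
        }
    }
    # Collapse whitespace so snippets read as plain running text.
    ($page{text} = join ' ', @text) =~ s/\s+/ /g;
    $page{text} =~ s/^ | $//g;
    return %page;
}

# Hypothetical helper: return (title, abstract) if $keyword occurs in the page,
# else an empty list. The abstract is the meta description when present,
# otherwise up to $width characters of body text either side of the first match.
sub search_page {
    my ($html, $keyword, $width) = @_;
    $width //= 40;
    my %page = page_info($html);
    my $desc = $page{description};
    my $all  = join ' ', $page{title}, $desc // '', $page{text};
    return unless $all =~ /\Q$keyword\E/i;    # keyword absent: no hit
    return ($page{title}, $desc) if defined $desc && $desc =~ /\S/;
    return ($page{title}, '') unless $page{text} =~ /\Q$keyword\E/i;
    my ($from, $to) = ($-[0], $+[0]);         # offsets of the body match
    my $start   = $from > $width ? $from - $width : 0;
    my $end     = $to + $width;
    my $snippet = substr $page{text}, $start, $end - $start;
    $snippet = "...$snippet" if $start > 0;
    $snippet .= '...' if $end < length $page{text};
    return ($page{title}, $snippet);
}

# Hypothetical live search: re-read and re-parse every page on each query.
# Assumes UTF-8 encoded *.html files directly inside $dir.
sub search_dir {
    my ($dir, $keyword) = @_;
    my @hits;
    for my $file (glob "$dir/*.html") {
        open my $fh, '<:encoding(UTF-8)', $file or next;
        my $html = do { local $/; <$fh> };
        close $fh;
        my ($title, $abstract) = search_page($html, $keyword) or next;
        push @hits, { file => $file, title => $title, abstract => $abstract };
    }
    return @hits;
}

# Render one hit as a linked title followed by its abstract (HTML-escaped).
sub format_hit {
    my ($hit) = @_;
    return sprintf qq{<p><a href="%s">%s</a><br>%s</p>\n},
        map { encode_entities($_ // '') } @{$hit}{qw(file title abstract)};
}
```

Usage would be along the lines of `print format_hit($_) for search_dir('/path/to/pages', 'keyword');` (the path is a placeholder). Every query re-parses every page, which is the "live" approach the question describes; if that turns out to be too slow, caching the `page_info` result per file would be the natural next step.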

Jim Gibson
Dec 16, 2022, 1:03:19 AM

I use the LWP::UserAgent, HTTP::Request, and URI::URL modules to fetch web
pages from servers. I then use HTML::TokeParser to parse the fetched pages and
break each page down into tags, text, and comments. This works pretty well,
but it is low-level stuff. I would guess you are going to have to program a
lot of logic yourself to accomplish the indexing.

--
Jim Gibson
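A rough sketch of the fetch-and-tokenize approach described in the reply, assuming a plain HTTP GET is enough. The sub names `fetch_html` and `tokenize` are hypothetical; HTTP::Request and URI::URL are not used explicitly here because LWP::UserAgent's `get` method builds the request itself. LWP::UserAgent is loaded with `require` only when a page is actually fetched, so the tokenizer can be tried on a string without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: fetch one page over HTTP and return its body as Perl
# characters, decoded according to the response's declared charset.
sub fetch_html {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded only when a fetch is requested
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;
    return $res->decoded_content;
}

# Hypothetical helper: split a document into the token classes that
# HTML::TokeParser reports. Start tags (S) and end tags (E) are collected as
# tag names, text (T) as raw, entity-encoded strings, comments (C) verbatim;
# declarations (D) and processing instructions (PI) are ignored here.
sub tokenize {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse HTML\n";
    my (@tags, @text, @comments);
    while (my $tok = $p->get_token) {
        my $type = $tok->[0];
        if    ($type eq 'S') { push @tags, $tok->[1] }
        elsif ($type eq 'E') { push @tags, "/$tok->[1]" }
        elsif ($type eq 'T') { push @text, $tok->[1] if $tok->[1] =~ /\S/ }
        elsif ($type eq 'C') { push @comments, $tok->[1] }
    }
    return { tags => \@tags, text => \@text, comments => \@comments };
}
```

Something like `my $parts = tokenize(fetch_html($url));` would then give the raw pieces; everything above that (deciding which text is the body, matching keywords, building abstracts) is the "lot of logic" the reply warns about.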