Extracting Specific Fields (Title, Meta Description if set, H1, H2 etc)


GuyMark

Mar 13, 2024, 12:32:24 PM
to Common Crawl
Hello

I am trying to find a way to populate a small experimental search engine with some specific fields.

Before I spend hours trying to work out how to do it, I wondered whether any "simple" utilities already exist where you can just "click" what you want and extract the data.

For example, I might want to export the Title, Description, H1 text, H2 text and the first 1000 bytes of the main text to a file called "rawData".

I realise no such utility may exist, but I am guessing there are other folks who have had a similar need, so if anyone can point me in the right direction that would be awesome.

My programming skills are VERY limited, but I will learn what I must if that's the only way. Just hoping there might be a simple little "data extractor" out there which "Joe Public" could work out how to use, not just folks who are well versed in programming languages.

All comments welcome - especially helpful ones :)
Thank you.
Guy
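
For illustration only, the extraction described in this post (title, meta description, H1 text, H2 text and the first 1000 bytes of the main text) could be sketched in Python with nothing but the standard library. This is a rough sketch, not an existing utility: the names `FieldExtractor` and `extract_fields` are made up for this example, it assumes you already have one page's HTML as a string, and it treats "main text" as all visible text outside `<script>`, `<style>` and `<title>`, which is a simplifying assumption.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Hypothetical helper: collects title, meta description, h1/h2 and visible text."""

    SKIP = {"script", "style", "noscript"}  # tags whose content is not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {"title": "", "description": "", "h1": [], "h2": []}
        self.text_chunks = []   # visible text, in document order
        self._current = None    # "title", "h1" or "h2" while inside one
        self._buf = []          # text collected for the current tag
        self._skip_depth = 0    # > 0 while inside a SKIP tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.fields["description"] = (attrs.get("content") or "").strip()
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag in ("title", "h1", "h2"):
            self._current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == self._current:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if tag == "title":
                self.fields["title"] = text
            else:
                self.fields[tag].append(text)
            self._current = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._current:
            self._buf.append(data)
        if self._current != "title":
            self.text_chunks.append(data)


def extract_fields(html, max_bytes=1000):
    """Return a dict of the requested fields for one HTML page."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.text_chunks).split())
    fields = dict(parser.fields)
    # Cut to max_bytes of UTF-8, dropping any character split by the cut.
    fields["text"] = text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
    return fields
```

Each page's result is a plain dictionary, so it could be written out one line per page (for example as JSON) to a file such as "rawData". Real-world pages are often malformed, so a more forgiving HTML library may turn out to be a better fit than the standard-library parser used here.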



