Crawl pages of sitemap.xml using Abot


24ni...@gmail.com

Aug 13, 2015, 9:17:45 AM
to Abot Web Crawler
Hello,

I want to use the Abot crawler to crawl the pages listed in the <loc> tags of a sitemap.xml.

Would that be possible?

For reference, here is an example sitemap:

http://www.milestoneinternet.com/sitemap.xml

Your help would be appreciated.

Thank you in advance.

sjdi...@gmail.com

Aug 13, 2015, 11:25:24 AM
to Nikunj Soni, Abot Web Crawler
Hi,

You would need a custom implementation of IHyperlinkParser, passed into the PoliteWebCrawler constructor. Abot hands that implementation each page's source so it can extract all the links from it. The first few lines should check whether the page is an XML file: if so, parse the <loc> values; if not, return an empty collection of Uris.
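To make the parsing logic concrete, here is a minimal, language-neutral sketch in Python. It is not Abot code: Abot is a C# library, and its IHyperlinkParser interface and Uri return type are not reproduced here. The function name `extract_sitemap_links` is hypothetical. The sketch covers only the decision described above: if the page source parses as XML, collect the text of every <loc> element (ignoring the sitemaps.org namespace prefix); otherwise return an empty list.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_sitemap_links(page_source: str) -> List[str]:
    """Hypothetical helper: return <loc> URLs from a sitemap, else []."""
    text = page_source.lstrip()
    # Cheap first check: an XML sitemap starts with a declaration or root tag.
    if not text.startswith("<"):
        return []
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Typical HTML (not well-formed XML) lands here: treat as "no links".
        return []

    links = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so namespaced and plain <loc> both match.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and elem.text and elem.text.strip():
            links.append(elem.text.strip())
    return links


if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""
    print(extract_sitemap_links(sample))  # ['https://example.com/', 'https://example.com/about']
    print(extract_sitemap_links("<html><body><p>not a sitemap</body></html>"))  # []
```

Because a sitemap index file also lists its child sitemaps in <loc> elements, the same check returns those URLs too, so a crawler that follows them would reach the nested sitemaps as well.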

Hope that helps
Steven 


