Random thought - would this be better implemented as a Tika parser?
For XML-based formats, it would be easy enough to detect that it's a
sitemap - and that appears to be the dominant use case, for index and
regular sitemap files.
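
A rough sketch of what that detection could look like, using only the
JDK's StAX reader rather than any Tika detector API. The class and
method names (SitemapSniffer, sniff, Kind) are made up for
illustration; the namespace URI and the urlset / sitemapindex root
element names come from the sitemaps.org 0.9 protocol.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: classifies an XML document by its root element.
final class SitemapSniffer {
    // Namespace defined by the sitemaps.org protocol, version 0.9.
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    enum Kind { URLSET, SITEMAP_INDEX, NOT_A_SITEMAP }

    static Kind sniff(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Don't process DTDs or external entities from untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // The first start element is the root; nothing after it is read.
                        if (!SITEMAP_NS.equals(reader.getNamespaceURI())) {
                            return Kind.NOT_A_SITEMAP;
                        }
                        String name = reader.getLocalName();
                        if ("urlset".equals(name)) {
                            return Kind.URLSET;
                        } else if ("sitemapindex".equals(name)) {
                            return Kind.SITEMAP_INDEX;
                        }
                        return Kind.NOT_A_SITEMAP;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Malformed XML is treated as "not a sitemap" in this sketch.
            return Kind.NOT_A_SITEMAP;
        }
        return Kind.NOT_A_SITEMAP;
    }
}
```

Only the root element is read, so the check stays cheap even for very
large sitemap files. Sitemaps that omit the namespace declaration would
be rejected by this version; whether to be lenient there is a separate
decision.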
With this approach, there's the issue of how to communicate per-URL
metadata such as lastmod, changefreq and priority in an XHTML
1.0-compatible format. Maybe via a custom namespace (xmlns)?
Side note - for the plain text version, I'm thinking it would be a
useful extension to modify the TXTParser to auto-detect and extract
URLs; then the plain text sitemap case would work fine.
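
To make the idea concrete, here is a minimal sketch of URL extraction
from plain text, assuming the one-URL-per-line layout used by text
sitemaps. It's standalone JDK code, not a patch against TXTParser, and
TextUrlExtractor / extractUrls are made-up names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: pulls http(s) URLs out of line-oriented text.
final class TextUrlExtractor {
    static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        // \R matches any line break (LF, CRLF, CR, ...).
        for (String line : text.split("\\R")) {
            String candidate = line.trim();
            if (candidate.isEmpty()) {
                continue;
            }
            try {
                URI uri = new URI(candidate);
                String scheme = uri.getScheme();
                boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
                // Keep only absolute http(s) URLs that have a host component.
                if (web && uri.getHost() != null) {
                    urls.add(candidate);
                }
            } catch (URISyntaxException e) {
                // Lines that don't parse as a URI (e.g. ordinary prose) are skipped.
            }
        }
        return urls;
    }
}
```

A real parser change would presumably also need to decide how to
surface these URLs to the caller, and whether to look for URLs embedded
in the middle of a line.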
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g