I took a quick look at it. Here's info from the README:
Java Sitemap Parser
Licensed under the Apache License, Version 2.0.
This project was created by Dr. Frank McCown's Search Engine Development students
in the Spring 2009 semester (Harding University, Searcy, AR, USA). We hope this
Sitemap Parser will enable other open source web crawlers to enhance their
crawling abilities.
The code has been tested on a number of websites that use Sitemaps, including
and can be easily substituted for other HTTP libraries.
The actual sitemap processing is more complex than I was expecting. I didn't know about all of the different options (plain, XML, Atom, RSS, index).
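To make the list of formats concrete, here is a rough sketch of how one might tell them apart by looking at the root element. This is not code from the parser itself. The class name SitemapTypeSniffer, the SitemapType enum and the sniff method are all made up for illustration, and it assumes the content has already been fetched and decoded into a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the Java Sitemap Parser project.
public class SitemapTypeSniffer {

    // The five flavors mentioned above (names are illustrative).
    public enum SitemapType { TEXT, XML, INDEX, ATOM, RSS }

    // Guess the sitemap flavor from already-downloaded content.
    public static SitemapType sniff(String content) {
        String trimmed = content.trim();
        // Assumption: anything that does not start with '<' is treated
        // as a plain-text sitemap, i.e. one URL per line.
        if (!trimmed.startsWith("<")) {
            return SitemapType.TEXT;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(trimmed.getBytes(StandardCharsets.UTF_8)));
            // The root element's local name identifies the format.
            String root = doc.getDocumentElement().getLocalName();
            switch (root) {
                case "urlset":       return SitemapType.XML;
                case "sitemapindex": return SitemapType.INDEX;
                case "feed":         return SitemapType.ATOM;
                case "rss":          return SitemapType.RSS;
                default:
                    throw new IllegalArgumentException("Unknown root element: " + root);
            }
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Malformed XML, I/O problems, parser configuration errors.
            throw new IllegalArgumentException("Could not parse sitemap content", e);
        }
    }
}
```

A real implementation would also need to deal with gzipped sitemaps, character encodings and size limits, none of which this sketch attempts.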
The code is reasonable, though it could use some cleanup.
Since it's under the Apache license, it seems like we could pull it in as a starting point. Where "we" means somebody like you, hopefully :)
-- Ken