I am developing an open source application in which I need to
automatically find RSS feeds in pages and parse the feeds' content to
get the title, description, author, etc.
Ultimately, it would be great to also parse HTML in order to get only
the text (without HTML tags).
Because Firefox already does this job, I would like to use the
Firefox/Mozilla code. But I can't understand how it works: the code
base is really big and I don't know anything about "Firefox inside" :-).
I found these source files, which look interesting to me:
This page looks like documentation (Feed content access API)
But I can't really work out what to do with it...
What I need, precisely, is:
- a tool to discover feed URLs in the HTML source of a page (discover
- a tool to parse the RSS/Atom feed (if possible, all versions) and
get some arrays with information like author, title, description,
publication date, etc.
- a tool to parse the HTML in the <description> tag: I would like to
extract only the text (I don't need <img src>, <a href>, etc.)
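
To make these three needs concrete, here is a rough sketch in Python of
the behaviour I am after, using only the standard library. It is not
Firefox code. The names discover_feeds, parse_feed and strip_tags are my
own illustration, and I am assuming that handling RSS 2.0 and Atom 1.0
with their most common elements is enough for a first version (RSS 1.0
and other variants are not covered).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# MIME types that pages commonly use to advertise a feed (assumed list).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 XML namespace


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that points to a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        if "alternate" in rel and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the URL of the page.
            self.feeds.append(urljoin(self.base_url, a["href"]))


class TextExtractor(HTMLParser):
    """Keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in the HTML source of a page."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds


def strip_tags(html):
    """Return the text of an HTML fragment, with whitespace collapsed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


def parse_feed(xml_text):
    """Return one dict per entry of an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            items.append({
                "title": entry.findtext(ATOM + "title", ""),
                "description": strip_tags(entry.findtext(ATOM + "summary", "")),
                "author": entry.findtext(ATOM + "author/" + ATOM + "name", ""),
                "date": entry.findtext(ATOM + "updated", ""),
            })
    else:
        # Assume RSS 2.0, where <item> elements carry no namespace.
        for item in root.iter("item"):
            items.append({
                "title": item.findtext("title", ""),
                "description": strip_tags(item.findtext("description", "")),
                "author": item.findtext("author", ""),
                "date": item.findtext("pubDate", ""),
            })
    return items
```

So for a page I would call discover_feeds(html, page_url), download each
URL it returns, and pass the XML to parse_feed to get the list of
entries with the description already reduced to plain text. What I am
hoping to reuse from Firefox is a more complete and more tolerant
version of this.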
Can Firefox help me? If yes, how should I proceed? If not, do you have
any other ideas for me?
I am waiting for your help, thank you very much :-)
Developer of the GPL web statistics software http://www.phpmyvisites.us/
PS: Am I in the right place for this kind of question?