--
For more details about this list
http://datameet.org/discussions/
Hi,
That problem is common to whatever crawler you use. Nutch will extract links that appear in JavaScript source, but that's it: it does not execute the scripts, so anything generated at runtime is missed. I would use Rhino to run the JavaScript, together with a custom HtmlParser plugin for Nutch. This is admittedly non-trivial, but I know of no open-source tool that already does this.
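
To illustrate the limitation, here is a minimal Python sketch of static link extraction from JavaScript source. It is not Nutch's actual code: the pattern, the function name `extract_links_from_js`, and the sample script are all made up for illustration. It only shows why scanning script text for URL-like strings cannot recover a URL that is assembled at runtime.

```python
import re

# Hypothetical pattern (an assumption, not Nutch's): a quoted string literal
# that starts with http://, https://, or a leading slash.
URL_IN_STRING = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


def extract_links_from_js(js_source):
    """Return URL-like string literals found in the script text, in order."""
    return URL_IN_STRING.findall(js_source)


# Made-up sample script: two literal URLs and one built by concatenation.
SAMPLE_JS = '''
var about = "http://example.org/about.html";
window.location = '/contact.html';
var page = "/items/" + itemId + ".html";  // built at runtime
'''

links = extract_links_from_js(SAMPLE_JS)
print(links)
```

The two literal URLs are found, but the third link comes back only as the fragment `/items/`. The full URL exists only once the script runs, which is why an embedded JavaScript engine such as Rhino would be needed.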
Regards,
Gora