I have made a few important fixes and improvements (including “redirect with session cookies”, which avoids session IDs in URLs, and true Keep-Alive), but it seems no one is interested :(
https://github.com/FuadEfendi/crawler-commons
Everything (5 issues) was initially tracked at http://code.google.com/p/crawler-commons/issues/list, but unfortunately this mailing list doesn’t receive any notifications…
I have no idea about Maven… you need to download the project, and “mvn clean install” will put it into your local repository for future use…
--
Fuad Efendi
--
You received this message because you are subscribed to the Google Groups "crawler-commons" group.
Visit this group at http://groups.google.com/group/crawler-commons?hl=en-US.
Hi All,
Is v0.2 actually released?
The trunk CHANGES.txt file [0] indicates that it has been released; however, I don't see an announcement anywhere, and it is certainly not uploaded to Maven Central…
[0] http://crawler-commons.googlecode.com/svn/trunk/CHANGES.txt
Thanks Ken,
Just a late thought: “follow redirects” is not always good, because thousands of pages can redirect to a logon screen; but without auto-redirect we will have session IDs in URLs. The Nutch approach is “classic”, but sometimes we may need auto-redirect… I want to put this on a wiki somewhere… “follow, but keep track of where you haven’t been yet” :)
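A rough sketch of the “follow, but keep track” idea, in Java. This is only an illustration under assumptions: the class name RedirectTracker, the method names, and the per-target limit are hypothetical and are not part of crawler-commons or Nutch. It counts how often fetches are redirected to the same target, so a crawler could stop following once thousands of pages point at one logon screen.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not a crawler-commons class. */
class RedirectTracker {
    // Number of times each redirect target has been seen so far.
    private final Map<String, Integer> hitsPerTarget = new HashMap<>();
    // Assumed policy knob: how many redirects to one target we tolerate.
    private final int maxHitsPerTarget;

    RedirectTracker(int maxHitsPerTarget) {
        this.maxHitsPerTarget = maxHitsPerTarget;
    }

    /** Records a redirect to targetUrl; returns false once the limit is exceeded. */
    boolean shouldFollow(String targetUrl) {
        int hits = hitsPerTarget.merge(targetUrl, 1, Integer::sum);
        return hits <= maxHitsPerTarget;
    }

    /** Returns how many redirects to targetUrl have been recorded. */
    int hits(String targetUrl) {
        return hitsPerTarget.getOrDefault(targetUrl, 0);
    }

    public static void main(String[] args) {
        RedirectTracker tracker = new RedirectTracker(2);
        String login = "http://example.com/login"; // placeholder URL
        for (int i = 1; i <= 3; i++) {
            System.out.println("redirect " + i + " follow=" + tracker.shouldFollow(login));
        }
    }
}
```

The redirect is still recorded even when it is not followed, so the skipped source pages can be revisited later with a session cookie.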
-Fuad
I am just learning proper "usage patterns", and I found it interesting that the huge Liferay community uses SVN together with GitHub to manage patched versions… I also know that many Lucene/Solr committers use GitHub to manage their contributions, but I am new to this.
I'll submit a patch shortly…
Hi Julien,
Thanks for the link, very interesting… crawler-commons is not there yet:
>>…crawler-commons isn't an Apache Software Foundation project, so it shouldn't be in the list of repos under https://github.com/apache
Guys, it's not kindergarten here (sorry); of course I know that,
but it would be nice to have an automated mirror on GitHub; next time I'll create the "mirror" myself, and I'll fork it and branch it per issue as per http://wiki.apache.org/hadoop/GitAndHadoop