DigitalPebble
1–30 of 170
This user group is about the various open source projects developed by DigitalPebble Ltd: StormCrawler, Behemoth, text-classification API, etc.
See https://github.com/DigitalPebble
Julien Nioche
28 Mar
StormCrawler move to the ASF
Hi, Just in case you have missed it, StormCrawler has moved into incubation at the Apache Software
Maulin
26/07/2021
1_Thank you for choosing us.......!
Dear digita...@googlegroups.com, We heartedly welcom you as a premium member of Norton . Your
Emily
22/06/2021
Your plan expired soon
Dear digita...@googlegroups.com, Greetings from MSR Insure System. Amount received, thanks for
Steven Zhu, Julien Nioche (3)
27/05/2021
REDIRECTION question
Hi Steven, Comments inlined below I just finished https security access elasticsearch from SC.
Steven Zhu, Julien Nioche (3)
20/05/2021
Best practice to overwrite ElasticSearchConnection class
Hi Julien, Thank you for the quick response. I have a bug to fix. Once I fix, I will create an issue
Lary
28/04/2021
#In_Voice #Number: LODA-420/564......
Dear digita...@googlegroups.com, Thank You For Choosing #N0RT0N# Life Lock Renewal Program. We
Steven Zhu, DigitalPebble (3)
01/04/2021
Build failed at storm-crawler-elasticsearch due to test error
Yes, this is from master branch. But I didn't install Docker. Thanks Steven On Thu, Apr 1,
Aaron Gray, DigitalPebble (6)
12/12/2020
Catching 404's in a separate Elasticsearch index
On Sat, 12 Dec 2020 at 12:25, DigitalPebble <jul...@digitalpebble.com> wrote: Hi Aaron,
Aaron Gray, DigitalPebble (2)
05/12/2020
StormCrawler architectural overview ?
Hi Aaron, Not as such, but the overall picture is pretty straightforward. Understanding Storm would
Erik Graf, DigitalPebble (2)
27/11/2020
Optimal Hardware
Hi Erik, That sounds very interesting. Are you planning to share the resulting dataset in one way or
Yuxin Zhu, … Sebastian Nagel (6)
01/10/2020
News-Crawler: Is there an option to use HTTP/1.0 or 1.1 for the Warc File
Update and correction: - recent Java 8 JDK packages may support HTTP/2 if they include the ALPN
ravis...@gmail.com, DigitalPebble (6)
18/03/2020
Crawling Sitemap with Custom Metadata
You're welcome. Glad you got it to work On Wed, 18 Mar 2020 at 13:51, <ravis...@gmail.com
ravis...@gmail.com, DigitalPebble (4)
03/03/2020
how to pass cookie from curl response to outlinks
Thanks a lot Julien for sending the details. http basic authentication does work. I am testing on set
gcr, DigitalPebble (2)
19/02/2020
First time StormCrawler no output
Hi, Looks like you submitted your topology to a Storm cluster in deployed mode, which is great, but
dans...@gmail.com, DigitalPebble (2)
20/01/2020
incremental crawling and/or recrawl
Hi Please use StackOverflow for questions like these, you'll get a wider audience. Does
dgdesi...@gmail.com, DigitalPebble (2)
20/11/2019
Async worker died! ... clojure.lang.PersistentVector cannot be cast to class java.lang.String
Hi Replying on https://stackoverflow.com/questions/58960271/async-worker-died-clojure-lang-
madhavave...@gmail.com
04/11/2019
Stormcrawler error when updating details to elastic search
Good morning, I am trying to do poc on stormcrawler. I have ES6.5.0, downloaded storm 1.2.3 and
jbri...@gmail.com, DigitalPebble (3)
06/09/2019
StormCrawler StdOutIndexer shows content with a count but no content - text- is visible
Hi Julien, Thanks for the response. Next time I will post questions like this on StackOverflow. I
yashchaud...@gmail.com, DigitalPebble (2)
11/03/2019
Content not getting indexed in elasticsearch
Hi Please use StackOverflow with the tag stormcrawler to ask questions like this. The kibana
VP, DigitalPebble (3)
28/01/2019
Crawler not crawling a few pages, crawls everything else
Sure, I will post this on SO. Can you please let me know what details would you need from the setup?
dennis...@googlemail.com, DigitalPebble (4)
17/10/2018
Question about Design Consideration in AbstractSpout (regarding es.status.ttl.purgatory)
Hi Dennis You are welcome. Feel free to give any feedback and / or tell us what you are using SC for.
anve...@gmail.com, DigitalPebble (2)
12/10/2018
Pdf, MS Documents are not Crawling
https://stackoverflow.com/questions/tagged/stormcrawler On Fri, 12 Oct 2018 at 16:30, <anveshdd@
Julien Nioche
06/10/2018
Announcement: Free 1-day workshop on web crawling with StormCrawler and Elasticsearch
Hi, In case you haven't seen the announcements on other channels, I'll be running a free 1-
woloszy...@gmail.com, Julien Nioche (3)
11/09/2018
Injecting new URL to crawl without restarting the topology
hi Rafał would you mind asking the question on StackOverflow with the tag storm-crawler? Could you
pankaj....@gmail.com
20/08/2018
Storm crawler (news crawler not working..)
I am trying one example based on storm crawler in github https://github.com/commoncrawl/news-crawl. I
fear...@gmail.com, DigitalPebble (5)
09/06/2018
[StormCrawler] [ElasticSearch] Configuration Documentation / Technical Questions
Hi Richard That's great to hear, thanks for the feedback and looking forward to having you as an
aiguz...@gmail.com, DigitalPebble (2)
01/06/2018
[Stormcrawler] URL content to HdfsBolt
Hi Artur, Please use stack overflow so that more people get the answer. Thanks Julien On Fri, 1 Jun
ch...@allthemoocs.com, DigitalPebble (2)
17/03/2018
elasticsearch and stormcrawler with injector flux file
Hi Chris See https://github.com/DigitalPebble/storm-crawler/issues/526, the ERROR is not a real
Suman Mallela, … DigitalPebble (26)
15/03/2018
Solr and Stormcrawler - [WARN] Found data point value of class class java.util.HashMap
Suman Instead of posting every 10 minutes, why don't you try to work things out by yourself a bit
Suman Mallela, DigitalPebble (2)
12/02/2018
FetcherThread Null Errors - Elastic Search and Storm crawler Integration
Suman, I am happy to help people use StormCrawler but am finding quite tedious when my comments and