DigitalPebble
This user group is about the various open source projects developed by DigitalPebble Ltd: StormCrawler, Behemoth, the text-classification API, etc. See https://github.com/DigitalPebble
Julien Nioche · Mar 28
StormCrawler move to the ASF
Hi, Just in case you have missed it, StormCrawler has moved into incubation at the Apache Software
Maulin · 7/26/21
1_Thank you for choosing us.......!
Dear digita...@googlegroups.com, We heartedly welcom you as a premium member of Norton . Your
Emily · 6/22/21
Your plan expired soon
Dear digita...@googlegroups.com, Greetings from MSR Insure System. Amount received, thanks for
Steven Zhu, Julien Nioche (3) · 5/27/21
REDIRECTION question
Hi Steven, Comments inlined below. I just finished https security access elasticsearch from SC.
Steven Zhu, Julien Nioche (3) · 5/20/21
Best practice to overwrite ElasticSearchConnection class
Hi Julien, Thank you for the quick response. I have a bug to fix. Once I fix it, I will create an issue
Lary · 4/28/21
#In_Voice #Number: LODA-420/564......
Dear digita...@googlegroups.com, Thank You For Choosing #N0RT0N# Life Lock Renewal Program. We
Steven Zhu, DigitalPebble (3) · 4/1/21
Build failed at storm-crawler-elasticsearch due to test error
Yes, this is from the master branch. But I didn't install Docker. Thanks Steven On Thu, Apr 1,
Aaron Gray, DigitalPebble (6) · 12/12/20
Catching 404's in a separate Elasticsearch index
On Sat, 12 Dec 2020 at 12:25, DigitalPebble <jul...@digitalpebble.com> wrote: Hi Aaron,
Aaron Gray, DigitalPebble (2) · 12/5/20
StormCrawler architectural overview?
Hi Aaron, Not as such, but the overall picture is pretty straightforward. Understanding Storm would
Erik Graf, DigitalPebble (2) · 11/27/20
Optimal Hardware
Hi Erik, That sounds very interesting. Are you planning to share the resulting dataset in one way or
Yuxin Zhu, …, Sebastian Nagel (6) · 10/1/20
News-Crawler: Is there an option to use HTTP/1.0 or 1.1 for the WARC file?
Update and correction: - recent Java 8 JDK packages may support HTTP/2 if they include the ALPN
ravis...@gmail.com, DigitalPebble (6) · 3/18/20
Crawling Sitemap with Custom Metadata
You're welcome. Glad you got it to work On Wed, 18 Mar 2020 at 13:51, <ravis...@gmail.com
ravis...@gmail.com, DigitalPebble (4) · 3/3/20
How to pass a cookie from a curl response to outlinks
Thanks a lot Julien for sending the details. HTTP basic authentication does work. I am testing on set
gcr, DigitalPebble (2) · 2/19/20
First time StormCrawler no output
Hi, Looks like you submitted your topology to a Storm cluster in deployed mode, which is great, but
dans...@gmail.com, DigitalPebble (2) · 1/20/20
incremental crawling and/or recrawl
Hi Please use StackOverflow for questions like these, you'll get a wider audience. Does
dgdesi...@gmail.com, DigitalPebble (2) · 11/20/19
Async worker died! ... clojure.lang.PersistentVector cannot be cast to class java.lang.String
Hi Replying on https://stackoverflow.com/questions/58960271/async-worker-died-clojure-lang-
madhavave...@gmail.com · 11/4/19
StormCrawler error when updating details to Elasticsearch
Good morning, I am trying to do a PoC on StormCrawler. I have ES 6.5.0, downloaded Storm 1.2.3 and
jbri...@gmail.com, DigitalPebble (3) · 9/6/19
StormCrawler StdOutIndexer shows content with a count, but no content (text) is visible
Hi Julien, Thanks for the response. Next time I will post questions like this on StackOverflow. I
yashchaud...@gmail.com, DigitalPebble (2) · 3/11/19
Content not getting indexed in Elasticsearch
Hi Please use StackOverflow with the tag stormcrawler to ask questions like this. The kibana
VP, DigitalPebble (3) · 1/28/19
Crawler not crawling a few pages, crawls everything else
Sure, I will post this on SO. Can you please let me know what details you would need from the setup?
dennis...@googlemail.com, DigitalPebble (4) · 10/17/18
Question about Design Consideration in AbstractSpout (regarding es.status.ttl.purgatory)
Hi Dennis You are welcome. Feel free to give any feedback and / or tell us what you are using SC for.
anve...@gmail.com, DigitalPebble (2) · 10/12/18
PDF and MS documents are not being crawled
https://stackoverflow.com/questions/tagged/stormcrawler On Fri, 12 Oct 2018 at 16:30, <anveshdd@
Julien Nioche · 10/6/18
Announcement
Free 1-day workshop on web crawling with StormCrawler and Elasticsearch
Hi, In case you haven't seen the announcements on other channels, I'll be running a free 1-
woloszy...@gmail.com, Julien Nioche (3) · 9/11/18
Injecting new URL to crawl without restarting the topology
hi Rafał would you mind asking the question on StackOverflow with the tag storm-crawler? Could you
pankaj....@gmail.com · 8/20/18
StormCrawler (news crawler not working)
I am trying one example based on storm crawler in github https://github.com/commoncrawl/news-crawl. I
fear...@gmail.com, DigitalPebble (5) · 6/9/18
[StormCrawler] [ElasticSearch] Configuration Documentation / Technical Questions
Hi Richard That's great to hear, thanks for the feedback and looking forward to having you as an
aiguz...@gmail.com, DigitalPebble (2) · 6/1/18
[Stormcrawler] URL content to HdfsBolt
Hi Artur, Please use Stack Overflow so that more people get the answer. Thanks Julien On Fri, 1 Jun
ch...@allthemoocs.com, DigitalPebble (2) · 3/17/18
Elasticsearch and StormCrawler with injector Flux file
Hi Chris See https://github.com/DigitalPebble/storm-crawler/issues/526, the ERROR is not a real
Suman Mallela, …, DigitalPebble (26) · 3/15/18
Solr and Stormcrawler - [WARN] Found data point value of class class java.util.HashMap
Suman Instead of posting every 10 minutes, why don't you try to work things out by yourself a bit
Suman Mallela, DigitalPebble (2) · 2/12/18
FetcherThread Null Errors - Elasticsearch and StormCrawler Integration
Suman, I am happy to help people use StormCrawler but am finding it quite tedious when my comments and