News Crawler Bug?


jaypa...@gmail.com

Jun 7, 2021, 6:16:39 AM
to Common Crawl
Hi,

It seems the news datasets are not currently updating. The last file uploaded has this timestamp:

2021-06-06 20:35:03 1072722205 crawl-data/CC-NEWS/2021/06/CC-NEWS-20210605195038-00266.warc.gz

Is there an error with the news crawler, or is some routine maintenance delaying the upload of the news datasets?

Thanks,

Jay M. Patel.

Sebastian Nagel

Jun 7, 2021, 6:20:49 AM
to common...@googlegroups.com
Hi Jay,

thanks for the notice, see also
https://github.com/commoncrawl/news-crawl/issues/46

I'm working on a solution...

Best,
Sebastian
> --
> You received this message because you are subscribed to the Google Groups "Common Crawl" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to common-crawl...@googlegroups.com
> <mailto:common-crawl...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/common-crawl/7168b669-08d8-4ded-acf1-121813992505n%40googlegroups.com
> <https://groups.google.com/d/msgid/common-crawl/7168b669-08d8-4ded-acf1-121813992505n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Jay Patel

Jun 7, 2021, 6:23:14 AM
to common...@googlegroups.com
Hi Sebastian,

My bad, I didn't look at the GitHub issues before mentioning it on the mailing list. Thanks for already looking into it though!

Jay. 

--
Jay M. Patel
Cofounder and Principal Data Scientist
Specrom Analytics
+1-678-740-7834 (US)
+91-9323998105 (India)

Sebastian Nagel

Jun 7, 2021, 11:05:54 AM
to common...@googlegroups.com
Hi Jay,

it's better to have an error reported twice than not at all. Thanks!

The issue is fixed and the news crawl is back to normal.

Best,
Sebastian

