Catching 404s in a separate Elasticsearch index


Aaron Gray

Dec 4, 2020, 8:16:33 PM
to DigitalPebble
Hi,

Is there an easy way to capture 404s in a separate Elasticsearch index without doing any coding?

Regards,

Aaron

DigitalPebble

Dec 5, 2020, 8:36:16 AM
to DigitalPebble
Hi again,

Can you post questions like this one to StackOverflow? You'll get a wider audience.

You could simply copy, within ES, all the docs in the status index that have a 404 code into a separate index using the Reindex API; see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html.
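As a minimal sketch of the reindex route, the Python snippet below only builds and prints a `POST _reindex` request body; it does not talk to Elasticsearch. It assumes a StormCrawler status index named `status`, a keyword field `status` holding values such as `ERROR`, and a persisted status code stored under `metadata.fetch%2EstatusCode` (as far as I know, StormCrawler's Elasticsearch module escapes dots in metadata keys as `%2E`, but check your own mapping). The destination index `status_404` and the helper `build_reindex_body` are hypothetical names.

```python
import json

# Assumptions (check against your own mapping): the status index is "status",
# the crawl status is in the "status" field, and the persisted fetch status
# code sits under "metadata.fetch%2EstatusCode" (dots escaped as %2E).
STATUS_INDEX = "status"
STATUS_CODE_FIELD = "metadata.fetch%2EstatusCode"


def build_reindex_body(source_index: str, dest_index: str,
                       code_field: str, code: str) -> dict:
    """Return a Reindex API body that copies ERROR docs with the given code."""
    return {
        "source": {
            "index": source_index,
            "query": {
                "bool": {
                    "filter": [
                        # only URLs whose status has become ERROR
                        {"term": {"status": "ERROR"}},
                        # ...and whose persisted fetch status code matches
                        {"term": {code_field: code}},
                    ]
                }
            },
        },
        # hypothetical target index for the copies
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    body = build_reindex_body(STATUS_INDEX, "status_404", STATUS_CODE_FIELD, "404")
    # Print a request that could be pasted into Kibana Dev Tools
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```

Since reindexing is a one-off copy, it would need re-running (or scheduling) to pick up URLs that reach ERROR later.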
Alternatively, you could use an ingest pipeline to handle these as they are indexed.
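For the ingest route, one possible shape is a pipeline with a single conditional `set` processor that rewrites `_index`, so matching documents are written to another index instead of the status index. The Python snippet below only builds the pipeline definition. It makes the same field-name assumptions as a reindex approach (`status`, and `metadata` with the escaped key `fetch%2EstatusCode`); the pipeline id `status-404-router`, the target index `status_404` and the helper `build_404_pipeline` are hypothetical. Conditional `if` on processors needs Elasticsearch 6.5 or later, and the pipeline would still have to be attached to the status index, for example through the `index.default_pipeline` setting.

```python
import json

# Assumption: the persisted status code is stored in the "metadata" object
# under the escaped key "fetch%2EstatusCode" (dots replaced by %2E).
CODE_KEY = "fetch%2EstatusCode"


def build_404_pipeline(target_index: str, code: str = "404") -> dict:
    """Return an ingest pipeline that reroutes ERROR docs with `code`."""
    # Painless condition; contains() works whether the stored value is a
    # list of strings or a single string.
    condition = (
        "ctx.status == 'ERROR' && ctx.metadata != null"
        f" && ctx.metadata['{CODE_KEY}'] != null"
        f" && ctx.metadata['{CODE_KEY}'].contains('{code}')"
    )
    return {
        "description": f"Route URLs in ERROR with code {code} to {target_index}",
        "processors": [
            # Rewriting _index sends this write to target_index instead
            {"set": {"if": condition, "field": "_index", "value": target_index}}
        ],
    }


if __name__ == "__main__":
    print("PUT _ingest/pipeline/status-404-router")
    print(json.dumps(build_404_pipeline("status_404"), indent=2))
```

One caveat with this design: rerouting only affects the write being processed. An earlier version of the same URL (for instance with status `FETCH_ERROR`) stays in the status index, so this would need testing against how the crawler updates its documents before relying on it.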

Note that you'd need to configure the status code to be persisted, and you would want to make sure their status is ERROR, i.e. that the URLs have returned 404s more than once.

Hope it helps

Julien


--
You received this message because you are subscribed to the Google Groups "DigitalPebble" group.
To unsubscribe from this group and stop receiving emails from it, send an email to digitalpebbl...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/digitalpebble/a7a43299-f2c5-4d6c-87fd-2c742e5edbb2n%40googlegroups.com.



Aaron Gray

Dec 5, 2020, 1:09:22 PM
to DigitalPebble
I don't use Stack Overflow on principle: its points system stops people from asking questions after a while, it appears at the top of Google searches above other information sources that are often better, and it's part of the cut'n'paste culture.

Aaron Gray

Dec 5, 2020, 1:24:08 PM
to DigitalPebble
Hi again,

Sorry, StackOverflow really gets my goat :)

I will read up properly on the Elasticsearch and StormCrawler docs, look at writing a patch for StormCrawler, and possibly use an ingest pipeline.

Thanks,

Aaron

DigitalPebble

Dec 12, 2020, 7:24:58 AM
to DigitalPebble
Hi Aaron, 

Especially for that caprine of yours => https://github.com/DigitalPebble/storm-crawler/discussions
Have a goat week end

Julien

Aaron Gray

Dec 12, 2020, 1:52:36 PM
to digita...@googlegroups.com
On Sat, 12 Dec 2020 at 12:25, DigitalPebble <jul...@digitalpebble.com> wrote:
> Hi Aaron,
>
> Especially for that caprine of yours => https://github.com/DigitalPebble/storm-crawler/discussions
> Have a goat week end

Oh, that was five minutes of constant laughter; I have never had anyone email me something so funny! And I love goats too, wonderful :)

Great, thanks for the discussion space.

Aaron

 


--
Aaron Gray

Independent Open Source Software Engineer, Computer Language Researcher, Information Theorist, and amateur computer scientist.
