Human 404s vs Bot 404s


Nat Dunn

Jun 1, 2016, 9:39:27 AM
to FusionReactor
In FusionReactor > Requests > Response Codes, we can see the 404 and 500 errors, but most of the errors are generated by bots. While that's useful to know, it'd be nice to filter out the bots to see whether actual people are getting any error pages.

Is there a way to do that?

Thanks,

Nat

eleni_grosdouli

Jun 1, 2016, 11:24:25 AM
to FusionReactor
Hey Nat,

I will speak with the development team about this issue and I will let you know!

Regards,
Eleni

Nat Dunn

Jun 1, 2016, 10:52:51 PM
to FusionReactor
Thanks, Eleni!

Nat

eleni_grosdouli

Jun 2, 2016, 6:23:38 AM
to FusionReactor
Hi Nat,

Could you please tell me how you identify that a request is coming from a bot?

Eleni

Nat Dunn

Jun 2, 2016, 8:02:48 AM
to fusion...@googlegroups.com
Eleni,

Sure. Look at the Transaction Details on the Headers tab, and then look for the user-agent header. For bots, you'll see something like:

user-agent Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Nat
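To make the check Nat describes concrete, here is a minimal sketch (plain Python, not anything built into FusionReactor) of classifying a request by its user-agent string. The substring list is an assumption based on common crawler names, not an exhaustive or official one:

```python
import re

# Hypothetical list of substrings commonly seen in crawler user agents.
# Adjust to taste; this is generic request-log logic, not an FR feature.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

def is_bot(user_agent):
    """Return True if the User-Agent header looks like an automated client."""
    if not user_agent:
        # Many automated clients send no User-Agent header at all.
        return True
    return bool(BOT_PATTERN.search(user_agent))
```

For example, `is_bot("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)")` returns `True`, while a typical desktop browser string does not match.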

--
You received this message because you are subscribed to a topic in the Google Groups "FusionReactor" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/fusionreactor/_vJuzx5znTM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to fusionreacto...@googlegroups.com.
To post to this group, send email to fusion...@googlegroups.com.
Visit this group at https://groups.google.com/group/fusionreactor.
For more options, visit https://groups.google.com/d/optout.



eleni_grosdouli

Jun 2, 2016, 8:58:17 AM
to FusionReactor
Hi Nat,

Unfortunately, FusionReactor is not able to filter out the bots. However, I am going to create a ticket for this issue in order to implement this functionality in future versions of FusionReactor.

Thank you for your input and support.

Eleni

Charlie Arehart

Jun 2, 2016, 10:49:54 AM
to fusion...@googlegroups.com

Thanks, Nat, for raising the topic of having FR optionally filter request traffic from spiders and bots. I’d like to go a bit beyond what you’ve requested, if you don’t mind me piggy-backing on it. :-)

A) The impact of spider/bot traffic is a far more important issue than many realize (judging from my daily work helping people troubleshoot web app servers using FR, whether they’re running CF, Lucee, Railo, BD, or Java app servers like Tomcat, JBoss, and so on). Unexpected traffic from spiders and bots is a VERY common cause of various problems, and sometimes it accounts for as much as 80% of a site’s traffic as tracked by FR!

B) Beyond Nat’s request for an option to filter out bots on the requests>response codes display, I hope the FR team might consider broader improvements related to (optional) spider/bot tracking:

- I would propose that there could be value in such filtering in many places beyond that one page, such as users>sessions (as of FR6; before that it was in metrics>custom series), as well as perhaps the other requests pages (history, slow/long requests), and perhaps even the graphs (like metrics>web metrics and the requests>graph pages)

- besides Nat’s request to “filter them out” in that display, I’d also propose there could be value in the option to a) not show them, b) show only them, or c) show both them and other user agents (because sometimes it does not matter which type they are, and for now that’s how most people view things anyway)

- of course, since user agent strings are quite fluid, there should probably be a mechanism (a settings page) to control which strings are regarded as spiders and bots. For instance, Yahoo’s spider uses neither term, identifying itself as “Slurp” instead, and some may even want scheduled tasks or cfhttp calls (or the equivalent in other platforms that may call into an FR-monitored instance) to count as spiders; they each have their own user agent strings

- I’d think this is best done not as a permanent setting (like the FR>restrictions page, which once enabled affects all pages) but rather as a toggle available on all such screens, since it’s something one may want to “turn on and off” to see the impact of spiders

- indeed, I’d love it if an option could somehow highlight the lines of requests in the CP alert which show such spider user agents. Fortunately, in FR6 and above the CP alert emails do now show the user agent in the detail lines for running requests at the top (rather than at the bottom, as in FR5 and before). Since those emails are also now HTML-formatted, it would not be difficult to offer an option to shade or color the request lines whose user agent this feature identified as a spider/bot, because it’s often quite surprising for people to realize that most of the running requests in an alert may well be from spiders, bots, and other automated traffic (load balancer pings, internal and external uptime or request monitors, RSS feed readers, scheduled tasks, and much more)
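The configurable signature list and the show/hide/only modes suggested above could be sketched roughly as follows. Everything here is hypothetical (FR has no such settings page today); the default signatures and helper names are illustrative:

```python
# Hypothetical defaults a settings page might expose; "slurp" covers
# Yahoo's spider, which uses neither "bot" nor "spider" in its string.
DEFAULT_SIGNATURES = ["bot", "crawl", "spider", "slurp"]

def classify(user_agent, signatures=None):
    """Return 'bot' or 'human' based on configurable substring signatures."""
    sigs = signatures if signatures is not None else DEFAULT_SIGNATURES
    ua = (user_agent or "").lower()
    return "bot" if any(s in ua for s in sigs) else "human"

def filter_requests(requests, mode="all", signatures=None):
    """mode: 'humans' (hide bots), 'bots' (show bots only), 'all' (show both).

    Each request is modeled as a dict with a 'user_agent' key.
    """
    if mode == "all":
        return list(requests)
    wanted = "bot" if mode == "bots" else "human"
    return [r for r in requests
            if classify(r.get("user_agent"), signatures) == wanted]
```

A per-screen toggle would then just switch `mode` rather than persist a global setting, matching the "turn on and off" usage Charlie describes.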

C) I know that’s taking things well beyond what Nat asked for, but I hope it’s considered, because if you’re going to step a toe into the water of evaluating some pages for spider/bot impact, it seems you may as well do it in a way that can be useful on most pages. :-)

D) One might even go so far as to propose that user agent strings alone are not really sufficient for detecting such automated requests, because of course a client can lie in its user-agent header. I’ve found many cases of clearly automated requests arriving with user agent strings that “looked like” regular browser user agents, but whose volume and pace were clearly not possible from human users. An equally useful indicator is whether the request carries any incoming cookies (in the HTTP Cookie header): automated requests tend to have none, even when the user agent looks “legit”.

It could be useful if FR could filter on the presence of any cookies. Of course, legitimate first-time visitors to a site will also have no cookies, and if a site doesn’t use sessions or client variables (a CFML concept) or otherwise use cookies, then no requests would carry them. But in my experience at least 99% of sites DO use cookies in some fashion, so being able to highlight request activity without such cookies could be just as useful as filtering on user agents (though I realize this takes somewhat advanced understanding to leverage properly).
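The cookie heuristic described above, combined with the user-agent check, can be sketched like this. The header-dict shape, lower-cased header names, and the way the two signals are combined are all illustrative assumptions, not FR behavior:

```python
# Same illustrative signature list as the user-agent heuristic.
BOT_SIGNATURES = ("bot", "crawl", "spider", "slurp")

def looks_automated(headers):
    """Flag requests carrying no Cookie header: on a cookie-using site,
    nearly all human browser traffic sends one. Header names are assumed
    to be lower-cased in this dict."""
    return "cookie" not in headers

def suspicious(headers):
    """Combine both signals: bot-like User-Agent OR no incoming cookies.

    Note the caveat from the discussion: legitimate first-time visitors
    also arrive cookie-less, so this flags candidates, not certainties.
    """
    ua = headers.get("user-agent", "")
    bot_ua = any(s in ua.lower() for s in BOT_SIGNATURES)
    return bot_ua or looks_automated(headers)
```

A request with a browser-like user agent but no cookies would be flagged for a closer look, which is exactly the "lying user agent" case Charlie mentions.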

E) And I realize that adding all of the above could seem to some to add overhead to FR’s processing (I doubt it myself, but I grant the concern), so I’d not be surprised if the FR folks propose that this is better done in the coming “FR in the cloud” offering, which some here may have seen demonstrated at recent conferences. There, all the analysis and UI presentation would be done in a cloud-based implementation of FR: your local FR within CF (or Lucee or Tomcat or whatever) would simply push the “live” data from your local environment up to the cloud, where it would be stored in a DB managed by FR. Adding capability there would not add “weight” to the Java agent FR runs within your JVM, and I suspect many powerful ideas might be better implemented when that new platform becomes a reality.

And so, again, I don’t want to derail Nat’s simpler request, if that might be done in the current “local” implementation of FR. :-) Feel free to defer my ideas to the cloud, if you feel you must. But as long as the topic was raised, I wanted to share these thoughts for any who may appreciate a deeper discussion.

/charlie

 


Nat Dunn

Jun 3, 2016, 6:17:29 PM
to fusion...@googlegroups.com
Hi Charlie,

Thanks for thinking so much on this! We're just getting started with FusionReactor, but I can see that being able to differentiate between bot traffic and human traffic through FusionReactor would be super useful. We don't care about bot traffic except when we do. And when we do, we'd rather see it separate from the human traffic. Thanks again!

Nat


eleni_grosdouli

Jun 6, 2016, 4:38:40 AM
to FusionReactor
Hey Nat,

If you have further questions regarding FusionReactor, please do not hesitate to ask.

Eleni

Nat Dunn

Jun 6, 2016, 6:24:03 AM
to fusion...@googlegroups.com
Thanks Eleni!

Nat