Robots.txt in a Rails app


Chris McCann

Mar 13, 2014, 2:17:29 PM
to sdr...@googlegroups.com
I've had a rash of exception notifications come through from one of my Rails apps lately.

A ActionController::MethodNotAllowed occurred in application#index:
  Only put and delete requests are allowed.

The culprit appears to be this:

HTTP_USER_AGENT  : Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html) 

You can click the link to see what Genieo is (tl;dr: widely panned as crapware).  It looks to me like one of my users has this crapware on their computer, had it running when they interacted with my app at a URL that should take a PUT, and now Genieo keeps hitting that link with a GET, triggering the error.

Looking for ways to prevent this, I figured that putting a disallow for Genieo in robots.txt would solve it.  Here's my robots.txt file:

# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
User-Agent: Genieo
Disallow: /

After redeploying I'm still getting the hits and don't understand why.  I've put in a request to their support to figure this out, but I'm curious if anyone else here has dealt with a problem like this in a similar fashion.

Cheers,

Chris

Adam Grant

Mar 13, 2014, 2:24:38 PM
to sdr...@googlegroups.com
Question: Does the crapware modify your user's browser request, or is it making a second request unbeknownst to your user?

Crapware doesn't usually respect robots.txt. You might need to put a middleware in your Rails app to filter out that user agent, or something at the Apache/Nginx level. Not ideal, but at least it won't trigger a routing error.
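
A minimal sketch of the middleware idea, in case it helps. The class name BlockUserAgent, the 403 response, and the substring match on "genieo" are all assumptions for illustration, not something from Genieo's docs or from Chris's app. It short-circuits matching requests before they reach the router, so the GET never triggers the MethodNotAllowed exception:

```ruby
# Hypothetical Rack middleware: rejects requests whose User-Agent contains
# a blocked token (case-insensitive) before they reach the Rails router.
class BlockUserAgent
  def initialize(app, blocked_token = "genieo")
    @app = app
    @blocked_token = blocked_token.downcase
  end

  def call(env)
    # The header may be absent, so coerce nil to an empty string first.
    user_agent = env["HTTP_USER_AGENT"].to_s.downcase

    if user_agent.include?(@blocked_token)
      [403, { "Content-Type" => "text/plain" }, ["Forbidden\n"]]
    else
      @app.call(env)
    end
  end
end

# Stand-in for the Rails app, just to exercise the middleware.
app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["OK\n"]] }
stack = BlockUserAgent.new(app)

genieo_ua = "Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)"
blocked_status = stack.call("HTTP_USER_AGENT" => genieo_ua).first
allowed_status = stack.call("HTTP_USER_AGENT" => "Mozilla/5.0 (X11; Linux x86_64)").first
missing_status = stack.call({}).first
```

In a Rails app this would typically be registered with config.middleware.use BlockUserAgent in config/application.rb, assuming the class is loaded by then.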

Regards,
- Adam


SD Ruby mailing list
sdr...@googlegroups.com
http://groups.google.com/group/sdruby
---
You received this message because you are subscribed to the Google Groups "SD Ruby" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sdruby+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris McCann

Mar 13, 2014, 2:28:31 PM
to sdr...@googlegroups.com
Thanks, Adam.  According to the documentation Genieo provides, it sounds like their crapware is making additional requests on behalf of the user:

Genieo is designed to automatically retrieve and filter information items from across the web, based on the user's specific individual interests, and display them on a personal Homepage.
Genieo studies the user's preferred individual interests and sources at a high resolution, by analyzing their browsing routine. Our desktop configuration runs on the user's computer to increase privacy. Genieo then continuously explores the internet for specific information items which are mostly related to these interests, and presents them on the personal Homepage.
The Genieo homepage is styled as a newspaper front page with selected items, consisting of title, short snippet, media, link to article, and share button

My initial request to them for help indicated that their software DID respect robots.txt, which is why I went this route.

I am also looking at a Rack-based solution to simply redirect any requests with that user agent.

Chris 



Adam Grant

Mar 13, 2014, 2:36:35 PM
to sdr...@googlegroups.com
The only other thing I can think of is case sensitivity: http://www.genieo.com/robots.txt

They use lower case "g" in "genieo". Might just double check in your web server logs that the user agent isn't lowercased, even though their site says it's uppercased.

Good luck!
- Adam

Chris McCann

Mar 13, 2014, 2:38:39 PM
to sdr...@googlegroups.com
Yeah, the user agent I'm getting is what I posted in the original post, uppercase "Genieo".  I could add another disallow for the lower case to see if it makes a difference.

James Miller

Mar 13, 2014, 2:54:12 PM
to sdr...@googlegroups.com
Chris,

I throw rack-rewrite (https://github.com/jtrupiano/rack-rewrite) in most apps for various purposes, this being one of them. You can use an if: ->(rack_env) { rack_env["HTTP_USER_AGENT"].to_s.downcase.include?("genieo") } or something like it to handle your case.
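
One thing to watch with a predicate like this: the header Chris posted is the full "Mozilla/5.0 (compatible; Genieo/1.0 ...)" string, so comparing the whole value to "genieo" would never be true, and the header can be missing entirely. A rough sketch of the check on its own, with genieo_request as a made-up name, before wiring it into a rewrite rule:

```ruby
# Hypothetical predicate (the name is illustrative): true when the request's
# User-Agent mentions Genieo, regardless of case, and false when the header
# is absent.
genieo_request = lambda do |rack_env|
  rack_env["HTTP_USER_AGENT"].to_s.downcase.include?("genieo")
end

# User-Agent string taken from the exception notification in this thread.
ua = "Mozilla/5.0 (compatible; Genieo/1.0 http://www.genieo.com/webfilter.html)"

exact_match     = ua.downcase == "genieo"                      # whole-string comparison
substring_match = genieo_request.call("HTTP_USER_AGENT" => ua) # substring comparison
no_header       = genieo_request.call({})                      # header missing
```

The lambda takes the Rack env hash, so it should be usable as the if: condition on a rule, but check the rack-rewrite README for the exact rule syntax.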

James