robots.txt issue on orkut.com.br - blocking adsense crawler


anatoly

Nov 17, 2011, 3:20:54 PM
to opensoci...@googlegroups.com
To orkut team:
 
 
Since last week, an issue in the robots.txt on orkut.com.br has been preventing the Google AdSense crawler from reaching application pages on the site.
This results in less targeted ads and less revenue for developers and for Google.
 
This is the email we got from AdSense:
 
We noticed that our AdSense ad crawler is having some difficulty crawling some parts of your site on orkut.com.br. Specifically, we've detected 696,147 failed crawl requests over a 4-day period last week, which is caused by an issue within your robots.txt file. Essentially, this file is blocking certain sections of your site from our AdSense ad crawler. Because of this, your AdSense ads are less targeted and are generating less revenue on average.
 
Fortunately, you can fix this issue today by editing your robots.txt file and giving our AdSense ad crawler the ability to view your site. To do this, find your robots.txt file, located at orkut.com.br/robots.txt and add the following two lines at the very top:

User-agent: Mediapartners-Google
Disallow:
 
 
Please add the above lines to robots.txt.
Please note that this change will not affect search; it applies only to the AdSense crawler.
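A minimal JavaScript sketch of why those two lines leave search crawlers untouched, assuming a simplified robots.txt model: user-agent names are compared exactly (case-insensitively), Disallow values are plain path prefixes, and Allow lines and wildcards are ignored. A crawler that finds a group naming it follows only that group and skips the "*" group, and an empty Disallow value disallows nothing. The robots text is a hypothetical two-group excerpt, not orkut's full file; parseGroups and isAllowed are illustrative names.

```javascript
// Simplified robots.txt evaluator (sketch). Assumptions: exact,
// case-insensitive user-agent names; prefix-only Disallow rules;
// Allow lines and wildcards are not handled.
function parseGroups(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    if (line === "") continue;
    const idx = line.indexOf(":");
    if (idx < 0) continue;
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      // An empty Disallow value means "nothing is disallowed".
      if (current && key === "disallow" && value !== "") {
        current.disallow.push(value);
      }
    }
  }
  return groups;
}

function isAllowed(robotsTxt, userAgent, path) {
  const groups = parseGroups(robotsTxt);
  const ua = userAgent.toLowerCase();
  // A group naming this crawler wins; the "*" group is then ignored.
  const group =
    groups.find((g) => g.agents.includes(ua)) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true;
  return !group.disallow.some((prefix) => path.startsWith(prefix));
}

// Hypothetical excerpt: the suggested group on top of the existing rule.
const robots = [
  "User-agent: Mediapartners-Google",
  "Disallow:",
  "",
  "User-agent: *",
  "Disallow: /Application.aspx",
].join("\n");

console.log(isAllowed(robots, "Mediapartners-Google", "/Application.aspx"));
console.log(isAllowed(robots, "Googlebot", "/Application.aspx"));
```

In this model the first call prints true and the second prints false: the AdSense crawler gets its own unrestricted group, while every other crawler still falls back to the "*" rules.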
 
thanks a lot
 

anatoly

Nov 18, 2011, 7:23:47 AM
to opensoci...@googlegroups.com
Is anyone from the orkut team here able to respond?
 
There is a line in orkut.com.br/robots.txt
that blocks AdSense access:
 
Disallow: /Application.aspx
 
 
thanks

Bruno Oliveira (Google)

Nov 21, 2011, 6:48:11 AM
to opensoci...@googlegroups.com
We're looking into it. Thanks for reporting!

anatoly

Nov 21, 2011, 6:59:15 AM
to opensoci...@googlegroups.com
Bruno
 
 
thanks a lot for the reply
 
The issue affects orkut.com/robots.txt as well as orkut.com.br/robots.txt.
 
The specific lines that block the AdSense crawler from app pages are:
Disallow: /Application
Disallow: /Application.aspx
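robots.txt Disallow values are matched as path prefixes, so "Disallow: /Application" on its own already covers /Application.aspx and any /Application?... URL. The short JavaScript sketch below lists which of the two quoted rules match each path; the matchingRules helper and the sample paths are illustrative, not taken from orkut.

```javascript
// The two Disallow values quoted above.
const disallowRules = ["/Application", "/Application.aspx"];

// Return every rule whose prefix matches the path (prefix matching only;
// robots.txt wildcards such as * and $ are not modeled here).
function matchingRules(path, rules) {
  return rules.filter((prefix) => path.startsWith(prefix));
}

for (const path of ["/Application", "/Application.aspx", "/Application?appId=1234", "/PublicAppInfo"]) {
  console.log(path, "->", matchingRules(path, disallowRules));
}
```

In this model, removing only the /Application.aspx line would not unblock anything, because /Application still matches; both lines would have to go, or be overridden by a crawler-specific group.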
 
 
Per the AdSense team's suggestion, adding
User-agent: Mediapartners-Google
Disallow:
 
should unblock only the AdSense crawler and shouldn't have any effect on Googlebot or other search bots.

anatoly

Nov 23, 2011, 6:46:24 PM
to opensoci...@googlegroups.com
any updates?
 
thanks

anatoly

Nov 27, 2011, 4:46:46 PM
to opensoci...@googlegroups.com
any updates?
 
Lately it seems like this forum has been abandoned by orkut :( Hopefully the devs at orkut are just busy...

Bruno Oliveira (Google)

Nov 29, 2011, 9:13:19 AM
to opensoci...@googlegroups.com
Hello,

Sorry for our delay.
We're still looking into the issue.

Thank you!

anatoly

Dec 10, 2011, 6:31:18 AM
to opensoci...@googlegroups.com
any updates? thanks

anatoly

Dec 26, 2011, 3:07:47 PM
to opensoci...@googlegroups.com
Any updates regarding the AdSense crawler issue?
 
thanks

Bruno Oliveira (Google)

Jan 2, 2012, 1:12:17 PM
to opensoci...@googlegroups.com
Hello,

We're still looking into this! Sorry for the delay.

Thank you!

Bruno Oliveira (Google)

Jan 2, 2012, 3:02:36 PM
to opensoci...@googlegroups.com
Hello,

Just found a possible reason. Are you using PublicAppInfo or AppInfo as a landing page for your ad? The correct one to use is PublicAppInfo. If you are using AppInfo, then it's normal to see this error, because AppInfo is not open to the crawler, but PublicAppInfo is.

Can you check what your landing page is please?

Thanks!

anatoly

Jan 2, 2012, 3:24:26 PM
to opensoci...@googlegroups.com
Hello Bruno
 
 
I'm using PublicAppInfo everywhere; however, I'm also
using requestNavigateTo, and this method navigates to "Application.aspx" (which is closed to crawlers).
 
 

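// Ask the container to switch this gadget to its canvas view.
// On orkut this navigation currently lands on Application.aspx.
var canvas_view = new gadgets.views.View("canvas");
var params = {}; // app-specific view parameters (placeholder)

gadgets.views.requestNavigateTo(canvas_view, params);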

Can you check this, please? Based on your info, it sounds like the bug is in the implementation of requestNavigateTo.

Does it make sense?

Bruno Oliveira (Google)

Jan 3, 2012, 8:29:48 AM
to opensoci...@googlegroups.com
But why would your call to requestNavigateTo() affect the crawler?

Sorry if I'm asking something obvious... I don't really know how the Adsense crawler works, as that's not my area :-)

-
Bruno Oliveira (Google)

anatoly

Jan 3, 2012, 9:01:48 AM
to opensoci...@googlegroups.com
I'm not sure it is related solely to requestNavigateTo.
 
 
there are several app urls
 
1) orkut links (for example from profile page)
 
2) PublicAppInfo
which redirects to
(which is similar to #1)
 
3) orkut's implementation of the OpenSocial method requestNavigateTo
goes to
which briefly redirects to
 
 
The AdSense crawler reports errors on
and
 
because /Application.aspx and /Application are blocked by robots.txt.
These are the real URLs before URL rewriting, so the problem might be related to URL rewriting on orkut.
 
 
 
In any case, you can solve it simply by adding
User-agent: Mediapartners-Google
Disallow:
 
to robots.txt (or just allow /Application.aspx and /Application for Mediapartners-Google), which will not affect other crawlers.
Either way, it will benefit Google (and developers).
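One caveat applies to the narrower option of allowing only the application paths for Mediapartners-Google: Google's crawlers follow only the most specific User-agent group that matches them and do not merge in the rules of the "*" group. A Mediapartners-Google group therefore has to repeat any "*" rules that should still apply to it. A hypothetical sketch of such a group (the comment placeholder stands for orkut's other rules, which are not quoted in this thread):

```
# Hypothetical group for the AdSense crawler only.
User-agent: Mediapartners-Google
# Copy here any Disallow lines from the "*" group that should still apply
# to the AdSense crawler, leaving out /Application and /Application.aspx.
Allow: /Application
```

Because rules match by prefix, the single Allow line covers /Application.aspx as well.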
 
 
 
 
 
 
 

anatoly

Jan 3, 2012, 9:05:18 AM
to opensoci...@googlegroups.com
Separately from the crawler errors, you could unify
/Main#Application.aspx?appId=1234&appParams=bla
 
with
/Main#Application?appId=1234&appParams=bla
 
as
/Main#Application
 
is the standard scheme on orkut, and only requestNavigateTo uses Application.aspx.
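The unification suggested above amounts to a simple rewrite of the fragment. The real fix would live in orkut's requestNavigateTo implementation, which app developers cannot change, so this JavaScript sketch only illustrates the mapping; normalizeAppUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: rewrite the Application.aspx form of an orkut app
// URL into the standard /Main#Application form. Query parameters such as
// appId and appParams are left unchanged.
function normalizeAppUrl(url) {
  // Only rewrite "#Application.aspx" when followed by a query or the end.
  return url.replace(/#Application\.aspx(?=\?|$)/, "#Application");
}

console.log(normalizeAppUrl("/Main#Application.aspx?appId=1234&appParams=bla"));
// "/Main#Application?appId=1234&appParams=bla"
```

URLs already in the /Main#Application form pass through unchanged, so the rewrite is safe to apply to every navigation.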
 

Bruno Oliveira (Google)

Jan 3, 2012, 3:08:35 PM
to opensoci...@googlegroups.com
We do not plan to modify robots.txt at the moment... but as far as I understand, the crawler would not need to crawl over /Application, because /PublicAppInfo does not automatically redirect to /Application unless the user is logged in and has the app installed.

Since the crawler is not an Orkut user, this redirection would not happen and it would never see /Application or any other page. At least that's how it should happen :-) I've tested /PublicAppInfo myself while not logged in (which mimics the way the adsense crawler would see it) and indeed it does not perform any redirections that I could tell...

Thanks!

-
Bruno Oliveira (Google)

anatoly

Jan 3, 2012, 3:19:53 PM
to opensoci...@googlegroups.com
Hello Bruno
The Google AdSense crawler crawls the page where the ad tag is placed, which here is the application page.
On orkut that page is an iframe, and when the crawler sees that the page is an iframe, it attempts to crawl its parent page
(in our case /Application or /Application.aspx), which is blocked by robots.txt.
 
Somehow it detects that the page is /Application and not /Main#Application.
I suspect the issue might be with URL rewriting from /Application to /Main#Application.
 
I suggest at least changing the requestNavigateTo implementation to redirect to
/Main#Application instead of /Main#Application.aspx, to make it standard.
 
Otherwise, .aspx looks like a redundant orkut URL format for pages.

anatoly

Mar 7, 2012, 4:32:44 PM
to opensoci...@googlegroups.com
Any update on the issue?
The minimum the orkut team should do is fix the bug in the requestNavigateTo implementation.
 
Currently requestNavigateTo redirects to /Application.aspx instead of /Application, and this blocks the AdSense crawler.
I'd suggest working closely with the AdSense team; you're the same company, and I can't coordinate your internal cooperation.
Just ask the AdSense team how many crawler errors they get on google.com.br; I bet there are billions.
 
 
Please fix this and post an update; otherwise, using AdSense on apps becomes counterproductive.
 
 
thanks

Bruno Oliveira (Google)

Mar 8, 2012, 12:04:08 PM
to opensoci...@googlegroups.com
Hello,

Thanks for the explanation, it clarified the finer points of how the adsense crawler works which I wasn't aware of. I assumed it crawled the landing page only, not the page on which the ad appears. We'll look into it. Thanks!

-
Bruno Oliveira (Google)

anatoly

Mar 8, 2012, 12:39:22 PM
to opensoci...@googlegroups.com
Hi Bruno
 
 
thanks for the update
 
Please update this thread if you make any decisions or changes about this.
I think it crawls the page where the ad markup appears, plus sometimes the parent page if the former is an iframe.
AdSense crawler errors are reported to the publisher, so I can see where it fails. If you need any info, let me know.
 
 
thanks

Bruno Oliveira (Google)

Mar 12, 2012, 5:42:35 PM
to opensoci...@googlegroups.com
Our team is looking into the issue with the additional information you provided! I'll update the thread as soon as I hear news.

Thanks!

-
Bruno Oliveira (Google)

Bruno Oliveira (Google)

Mar 15, 2012, 4:37:56 PM
to opensoci...@googlegroups.com
Hello,

So, I've been discussing this with the engineers here. Adsense crawlers don't crawl over authenticated pages (right?). So, even if /Application.aspx were not blocked, it would not be able to crawl it. So, isn't this warning spurious? That is, the fact that Application.aspx is blocked shouldn't alter your ads quality at all...

Or am I mistaken?

Thanks!

-
Bruno Oliveira (Google)

anatoly

Mar 15, 2012, 7:43:01 PM
to opensoci...@googlegroups.com
Hi Bruno
 
 
There are several issues:
1) Users should not land on /Application.aspx at all. There is an issue with URL rewriting in the implementation of

gadgets.views.requestNavigateTo(canvas_view, params);

This call lands on /Application.aspx
rather than on /Application.
 
This should be fixed; I think it affects the crawler.
 
2) I'm not sure whether /Application.aspx or /Application is behind the login. Should they be? It is up to the app to check the viewer, and if the viewer is null,
it should display a link to install. I think that is the flow the orkut team has explained to us over the years (i.e. there should be no automatic redirect to the install page).
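The flow described in point 2 can be sketched as a small decision function: the app page renders for every visitor and only switches to an install link when there is no viewer, rather than redirecting. This is a hypothetical JavaScript sketch; chooseAppView is an illustrative name, and the viewer argument is a plain object standing in for the viewer data the container returns (null for anonymous visitors), not the actual OpenSocial person object.

```javascript
// Hypothetical sketch: decide what an app page shows, without redirecting.
function chooseAppView(viewer) {
  // No viewer (anonymous visitor, or app not installed): show an install
  // link on the same page instead of redirecting to an install page.
  if (viewer == null) {
    return { kind: "install-link" };
  }
  // Known viewer: render the normal app content for that person.
  return { kind: "app-content", viewerId: viewer.id };
}

console.log(chooseAppView(null).kind);
console.log(chooseAppView({ id: "42" }).viewerId);
```

Because neither branch redirects, a crawler fetching the page without a login still receives a page it can read.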
 
3) I'm also getting a lot of AdSense crawler errors saying the content is behind a login, on URLs of the type:
 
etc.
 
The AdSense crawler can crawl authenticated pages if it can land on them, i.e. if there is no redirect, etc.
 
 
I get thousands of AdSense crawler errors to fix, although they should go to you rather than to orkut developers, since they are entirely out of the developers' control.
Since AdSense and orkut are the same company, I think resolving this and improving AdSense quality depends on cooperation between your teams. At the end of the day, Google would earn more if there were more relevant AdSense ads.
 
 
thanks

anatoly

Apr 12, 2012, 3:54:30 PM
to opensoci...@googlegroups.com
Hi Bruno
 
 
Following the discussion in this thread, let me clarify the main issue:
 
 
URLs are behind a login, so the AdSense crawler can't access these pages when it tries to crawl AdSense ads on app pages.
If it can't crawl them, AdSense ads on app pages are not effective.
 
Is there any workaround?
It is impossible to define a crawler login for Google-hosted URLs in the AdSense configuration.
 
 
thanks

davew

Jul 11, 2012, 2:59:35 PM
to opensoci...@googlegroups.com
Hi Bruno,

Will the Orkut team work on resolving the AdSense crawler problems? In order for app developers to support Orkut, we need to make money on the platform. To make money, we need targeted ads. If Google's AdSense crawler can't crawl pages, those ads have a very low RPM.

At one point, I know, the Google AdSense for Games team was using Orkut user profile data to target ads on application pages. I know because we use AdSense for Games on BuddyPoke. But I'm guessing, based on the drop in RPM, that Orkut is now completely blocking AdSense's crawlers, including AdSense for Games? Why can't Google AdSense servers access Google Orkut pages?

Please help?

Thanks

Dave @ BuddyPoke

anatoly

Jul 11, 2012, 3:12:58 PM
to opensoci...@googlegroups.com
+1

Bruno Oliveira (Google)

Jul 12, 2012, 4:37:30 PM
to opensoci...@googlegroups.com
Hello,

We are looking into the issue but we don't have a position to share on how/when this issue will be addressed yet. The Orkut team has debated it a lot with the AdSense team since the beginning of this thread and we'll keep you updated on the progress.

Thanks for your input and continued support!

-
Bruno Oliveira (Google)