My app uses the Google Accounts integration to provide login-required
home pages for my users. The text on their pages is the same, but
they see their personal statistics. The AdSense ads on these pages
are extremely poorly targeted. AdSense has a site authentication
feature, but I can't see how to use it with the Google Accounts
integration in GAE.
How can I provide a simple way for the AdSense crawler to log in without
losing the simplicity of Google Accounts for my user services?
Hi Steve,
The easiest way to do this would be as follows:
Continue using the Users API to sign in your normal users, but create
an authentication URL specifically for the AdSense bot (e.g.,
www.mysite.com/adsense_authenticate). Configure the handler for that
page to check the supplied username and password (or secret token, or
whatever system you wish to use) and, if they are valid, issue a session
cookie to the bot. Then, on your normal pages, allow access if the
user is signed in with the Users API or has a valid session
cookie.
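A minimal sketch of the cookie approach, in case it helps. It uses only the Python standard library, so the request handling itself is left out. Everything named here (SECRET_KEY, COOKIE_LIFETIME_SECONDS, the function names) is made up for illustration. It also swaps in a signed, self-expiring cookie value instead of a stored session, which is one possible design and not necessarily what was meant above. The signed_in_user argument stands in for whatever users.get_current_user() returns.

```python
import hashlib
import hmac
import time

# Hypothetical values: pick your own secret and lifetime.
SECRET_KEY = b"replace-with-a-long-random-secret"
COOKIE_LIFETIME_SECONDS = 24 * 60 * 60


def _sign(expires):
    """Return a hex HMAC-SHA256 signature over the expiry timestamp."""
    message = str(expires).encode("ascii")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_crawler_cookie(now=None):
    """Build the cookie value the crawler login handler would issue."""
    now = int(time.time()) if now is None else int(now)
    expires = now + COOKIE_LIFETIME_SECONDS
    return "%d:%s" % (expires, _sign(expires))


def is_valid_crawler_cookie(value, now=None):
    """Check the signature and expiry of a cookie value sent by a client."""
    now = int(time.time()) if now is None else int(now)
    try:
        expires_text, signature = value.split(":", 1)
        expires = int(expires_text)
    except (AttributeError, ValueError):
        # Missing cookie, no separator, or a non-numeric expiry.
        return False
    expected = _sign(expires).encode("ascii")
    if not hmac.compare_digest(signature.encode("utf-8"), expected):
        return False
    return expires > now


def may_view_page(signed_in_user, cookie_value):
    """Allow a signed-in user, or a client holding a valid crawler cookie.

    signed_in_user is None when nobody is signed in.
    """
    return signed_in_user is not None or is_valid_crawler_cookie(cookie_value)
```

The handler behind the authenticate URL would call make_crawler_cookie() after checking the crawler's credentials, and every protected page would call may_view_page() before deciding whether to redirect to the login page.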
You can also use HTTP authentication, in which case the crawler will
send an Authorization: header with each request, and you can verify
its credentials each time, without having to store a cookie.
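For the HTTP authentication route, here is a rough sketch of checking Basic credentials on each request. It assumes the crawler sends a standard "Authorization: Basic ..." request header. CRAWLER_USERNAME, CRAWLER_PASSWORD and both function names are hypothetical, and how you read the header depends on your framework.

```python
import base64
import binascii
import hmac

# Hypothetical credentials: use whatever you entered in the AdSense settings.
CRAWLER_USERNAME = "adsense-crawler"
CRAWLER_PASSWORD = "replace-with-a-long-random-password"


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, ValueError):
        # Not valid base64, or not valid UTF-8 once decoded.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


def is_adsense_crawler(header_value):
    """True if the header carries exactly the credentials given to the crawler."""
    credentials = parse_basic_auth(header_value)
    if credentials is None:
        return False
    username, password = credentials
    # compare_digest avoids leaking how many leading characters matched.
    user_ok = hmac.compare_digest(
        username.encode("utf-8"), CRAWLER_USERNAME.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), CRAWLER_PASSWORD.encode("utf-8"))
    return user_ok and password_ok
```

A protected page would then allow the request if the user is signed in through the Users API or if is_adsense_crawler() returns True for the request's Authorization header.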
On Wed, May 13, 2009 at 11:27 PM, Steve <unetright.thebas...@xoxy.net> wrote:
> Hi,
> My app uses the Google Accounts integration to provide login required
> home pages for my users. The text on their pages is the same, but
> they see their personal statistics. The AdSense ads on these pages
> are extremely poorly targeted. Adsense has a site authentication
> feature, but I can't see how to use it with the google accounts
> integration in GAE.
> How can I provide a simple way for the AdSense crawler to log in without
> losing the simplicity of Google Accounts for my user services?
On Thu, May 14, 2009 at 8:03 AM, Nick Johnson (Google) wrote:
> You can also use HTTP authentication, in which case the crawler will
> send an Authorization: header with each request, and you can verify
> its credentials each time, without having to store a cookie.
This is the easiest solution. Check for HTTP Basic Auth requests and grant access to Google's crawler when the credentials match.
Thank you very much for your reply. Could you please elaborate on how
the HTTP authentication method would avoid the need to send the bot a
cookie?
I have implemented the first suggestion of using a special AdSense
authenticate URL which sets a cookie, and this works ~90% of the
time. However, when I look in the logs I can see the AdSense bot post
to the authenticate URL (which sets the cookie) and then, about 60ms
later, query the protected URL. This query gets redirected to the
login page, which tells me the cookie was not sent along. Subsequent
queries to the restricted URL _do_ get the cookie and are passed through.
I noticed that the post to the authenticate URL and the subsequent
query to the restricted URL come from different Mediapartners IP
addresses. Since only the first restricted query fails to get through
with the cookie, I suspect the AdSense bot is not able to replicate
the cookie to the nodes doing the restricted indexing fast enough for
that first request 60ms later.
I would like to use the HTTP authentication method to get around this
problem, but I don't understand how it will get me away from the need
for cookies. When I select that method, the AdSense configuration
still has a place for an authenticate and a restricted URL. That
leads me to believe it is only going to use the Authorization: header
when contacting the authenticate URL. So I would still need to store
a cookie for the restricted URLs. If it sent the Authorization:
header to the restricted URLs themselves, I would not need the cookie,
but there would be no reason to have the authenticate URL.
Your first suggestion got me 90% of the way there. Can you help me
understand how to get that last 10%?
Thanks!
Steve
On Thu, May 14, 2009 at 6:24 PM, Steve <unetright.thebas...@xoxy.net> wrote:
> Hi Nick,
> Thank you very much for your reply. Could you please elaborate on how
> the HTTP authentication method would avoid the need to send the bot a
> cookie?
> I have implemented the first suggestion of using a special AdSense
> authenticate URL which sets a cookie, and this works ~90% of the
> time. However, when I look in the logs I can see the AdSense bot post
> to the authenticate URL (which sets the cookie) and then, about 60ms
> later, query the protected URL. This query gets redirected to the
> login page, which tells me the cookie was not sent along. Subsequent
> queries to the restricted URL _do_ get the cookie and are passed through.
> I noticed that the post to the authenticate URL and the subsequent
> query to the restricted URL come from different Mediapartners IP
> addresses. Since only the first restricted query fails to get through
> with the cookie, I suspect the AdSense bot is not able to replicate
> the cookie to the nodes doing the restricted indexing fast enough for
> that first request 60ms later.
That seems like a reasonable hypothesis, but these questions would be
better posed in one of the AdSense groups - I'm not an expert on the
subject. :)
> I would like to use the HTTP authentication method to get around this
> problem, but I don't understand how it will get me away from the need
> for cookies. When I select that method, the AdSense configuration
> still has a place for an authenticate and a restricted URL. That
> leads me to believe it is only going to use the Authorization: header
> when contacting the authenticate URL. So I would still need to store
> a cookie for the restricted URLs. If it sent the Authorization:
> header to the restricted URLs themselves, I would not need the cookie,
> but there would be no reason to have the authenticate URL.
You're quite right. I suspect that the AdSense bot will still make an
initial request to your authenticate URL in case you're using digest
authentication (in which case it needs a server-supplied nonce, and
can't rely on getting a 401 from your normal URL). If you use basic
auth, though, it'll send its username and password along in the clear
for subsequent requests, so you can trivially check if it's the
AdSense bot with the credentials you gave it. Thus, using HTTP
authentication will eliminate the need for you to use cookies (and
potentially track valid ones), but it may not help with the issue
you're experiencing.
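To make the challenge step concrete, here is a small WSGI sketch of what the authenticate URL could do with basic auth: answer 401 with a WWW-Authenticate header when credentials are missing or wrong, and 200 when they match. How the AdSense bot actually behaves is not something this sketch claims to know. The credentials, REALM and function names are hypothetical, and comparing the base64 token directly is a simplification that assumes the client encodes "user:password" in the usual way.

```python
import base64
import hmac

# Hypothetical crawler credentials ("user:password") and realm name.
EXPECTED = base64.b64encode(b"adsense-crawler:replace-with-a-long-random-password")
REALM = "adsense"


def crawler_credentials_ok(authorization):
    """True if the header is 'Basic <base64 of the expected user:password>'."""
    scheme, _, token = (authorization or "").partition(" ")
    if scheme.lower() != "basic":
        return False
    return hmac.compare_digest(token.strip().encode("utf-8"), EXPECTED)


def authenticate_app(environ, start_response):
    """WSGI app for the authenticate URL: 200 if valid, else a 401 challenge."""
    if crawler_credentials_ok(environ.get("HTTP_AUTHORIZATION")):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    start_response("401 Unauthorized", [
        ("Content-Type", "text/plain"),
        # Tells the client which scheme and realm to authenticate with.
        ("WWW-Authenticate", 'Basic realm="%s"' % REALM),
    ])
    return [b"authentication required"]
```

The same crawler_credentials_ok() check could run on the restricted pages, so that nothing has to be stored between requests.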
Again, I'm not an expert on AdSense, so these sorts of in-depth
questions would probably be better posed on the support forums for
AdSense.
One other consideration is whether or not you need to care: Presumably
the adsense bot will only have to authenticate very rarely, so this
shouldn't measurably affect your users.
> If you use basic auth, though, it'll send its username and password
> along in the clear for subsequent requests, so you can trivially check
> if it's the adsense bot with the credentials you gave it. Thus, using
> HTTP authentication will eliminate the need for you to use cookies
Great, I'll give it a try. Falling back to checking an Authorization
header is much simpler than what I had to do to shoehorn the cookie
in.
> Again, I'm not an expert on AdSense, so these sorts of in-depth
> questions would probably be better posed on the support forums for
> AdSense.
Agreed, but thankfully you are here and, unlike on the AdSense forums,
we do get answers.
> One other consideration is whether or not you need to care: Presumably
> the adsense bot will only have to authenticate very rarely, so this
> shouldn't measurably affect your users.
Unfortunately, that cookie timing bug I mentioned earlier means the
crawler has never been able to index pages like /home and /charts.
It's only indexed the things it tried later, like
/chart/month/200904. This means that the places my users spend 99% of
their time (/home) get really poorly targeted ads based on the text of
the Google login page.
So you're right, I don't need them crawled frequently, but I do need
them crawled successfully at least once to get away from the email
hosting ad hell that those pages are trapped in now.
Thank you so much, you've been extremely helpful. And thanks to
Rodrigo Moraes as well who also answered my query.
> Great, I'll give it a try. Falling back to checking an Authorization
> header is much simpler than what I had to do to shoehorn the cookie
> in.
If you're using Django, you can try installing this Django app into
your project: