StatelessRequests per user agent?

Alex Black

Aug 19, 2011, 11:24:26 AM
to Lift
Google crawls our sites, a LOT :) I'm pretty sure a new session gets
generated for every request from google (or any bot/search engine).

Is there any way to mark requests as stateless by user agent or some
other arbitrary logic? Or could the existing statelessRequest rules be
extended to support something like this?

- Alex

David Pollak

Aug 19, 2011, 11:58:46 AM
to lif...@googlegroups.com
Alex,

The way I address this issue on http://demo.liftweb.net is to expire sessions that meet a particular criterion. See:
https://github.com/lift/examples/blob/master/combo/example/src/main/scala/net/liftweb/example/lib/SessionChecker.scala
and SessionInfoDumper in:
https://github.com/lift/examples/blob/master/combo/example/src/main/scala/bootstrap/liftweb/Boot.scala

It's possible to delete all sessions created by the Google user agent every 20 seconds.
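A minimal, self-contained sketch of that kind of timed purge, assuming a simple in-memory session table. `SessionInfo`, `BotSessionPurger`, and the crawler marker strings are hypothetical names for illustration and are not Lift's session API; the linked SessionChecker code is what the demo site actually uses.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, ScheduledExecutorService, TimeUnit}
import scala.jdk.CollectionConverters._
import scala.util.control.NonFatal

// Hypothetical stand-in for a server-side session record (not Lift's API).
final case class SessionInfo(id: String, userAgent: String)

object BotSessionPurger {
  // Assumed crawler markers, matched case-insensitively against the User-Agent.
  private val crawlerMarkers = List("googlebot", "bingbot")

  def isCrawler(userAgent: String): Boolean = {
    val ua = userAgent.toLowerCase
    crawlerMarkers.exists(marker => ua.contains(marker))
  }

  // Removes every crawler session from the table and returns the purged ids.
  def purgeCrawlers(sessions: ConcurrentHashMap[String, SessionInfo]): List[String] = {
    val victims = sessions.values().asScala.filter(s => isCrawler(s.userAgent)).map(_.id).toList
    victims.foreach(id => sessions.remove(id))
    victims
  }

  // Runs the purge every `periodSeconds`; no System.gc call is involved.
  def schedule(sessions: ConcurrentHashMap[String, SessionInfo], periodSeconds: Long): ScheduledExecutorService = {
    val exec = Executors.newSingleThreadScheduledExecutor()
    val task: Runnable = () => {
      // An exception escaping the task would cancel all later runs, so log it instead.
      try purgeCrawlers(sessions)
      catch { case NonFatal(e) => e.printStackTrace() }
      ()
    }
    exec.scheduleAtFixedRate(task, periodSeconds, periodSeconds, TimeUnit.SECONDS)
    exec
  }
}
```

For example, `BotSessionPurger.schedule(table, 20)` would purge crawler sessions every 20 seconds. Nothing here forces a garbage collection; the purge is driven purely by user agent on a timer.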

The Foursquare guys do some custom Stateless testing... I don't know how.

More broadly, the current Stateless testing could be enhanced in the following ways:
  • Having a stateless test that includes the HTTPRequest so that the user agent can be extracted
  • Having a Loc-based Stateless test that allows for per-request statelessness testing
  • Having a trait that can be mixed into snippets that will provide behavior if the snippet is invoked from a stateless session (this will allow automagic separation of snippets on pages that can be either stateful or stateless)
  • Having a default behavior for CometActors invoked from a stateless page
If you or others have thoughts on the above, please chime in.  If you'd be so kind as to open a ticket for the above issues and reference this thread, I'd appreciate it.
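The first suggestion (a stateless test that can see the request, not just the path) could look roughly like the following. This is a hypothetical sketch: `ReqInfo`, `StatelessRules`, and the rule list are illustrative names, not Lift's `LiftRules` API, and matching on the substring `googlebot` is an assumed crawler check.

```scala
// Hypothetical request view: path segments plus the User-Agent header, i.e. the
// information an HTTPRequest-aware stateless test would need.
final case class ReqInfo(path: List[String], userAgent: Option[String])

object StatelessRules {
  type StatelessRule = PartialFunction[ReqInfo, Boolean]

  // Rules are tried in order; the first rule defined for a request decides.
  val rules: List[StatelessRule] = List[StatelessRule](
    // Crawlers: serve pages without keeping a session.
    { case ReqInfo(_, Some(ua)) if ua.toLowerCase.contains("googlebot") => true },
    // Assumed example of an ordinary path-based rule.
    { case ReqInfo("api" :: _, _) => true }
  )

  // Requests that no rule matches stay stateful.
  def isStateless(req: ReqInfo): Boolean =
    rules.find(_.isDefinedAt(req)).map(rule => rule(req)).getOrElse(false)
}
```

Expressing the rules as ordered partial functions lets a specific rule sit ahead of a general one, and anything unmatched falls through to normal stateful handling.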

Thanks,

David



--
You received this message because you are subscribed to the Google Groups "Lift" group.
To post to this group, send email to lif...@googlegroups.com.
To unsubscribe from this group, send email to liftweb+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/liftweb?hl=en.




--
Lift, the simply functional web framework http://liftweb.net

Alex Black

Aug 19, 2011, 12:30:30 PM
to Lift
Thanks, my co-worker has actually already implemented your session
killing logic, and in our load tests it increases scalability a LOT
(when there are lots of requests each creating their own session). One
problem we hit was that the code calls rt.gc, forcing a garbage
collection. We didn't notice that at first; it happens every 10s (?)
and caused big delays on our servers (probably due to our large heaps).
It took us a couple of days to notice and fix.

I think there would still be value in marking requests stateless for
Google, and either never keeping the session in the first place or
purging it immediately.

Your suggestion of including the request in the stateless test sounds
good. Would (or did) you consider not throwing exceptions when
something attempts to store state? The last thing I'd want is to
accidentally throw exceptions as Google crawls the site. I see the
value in being strict; perhaps there could be three modes: Stateless,
TransientStateful, and Stateful. In TransientStateful you allow state
to be set, but you discard it.
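A rough sketch of how the three proposed modes could treat writes to session state, assuming a simple per-request store. `StateMode`, `ModalSessionStore`, and `endRequest` are hypothetical; only the mode names come from the proposal above.

```scala
import scala.collection.mutable

// The three proposed modes as a closed set of values.
sealed trait StateMode
case object Stateless extends StateMode
case object TransientStateful extends StateMode
case object Stateful extends StateMode

// Hypothetical per-request session store showing how each mode treats writes.
final class ModalSessionStore(mode: StateMode) {
  private val values = mutable.Map.empty[String, Any]

  def set(key: String, value: Any): Unit = mode match {
    case Stateless =>
      // Strict: storing state during a stateless request is an error.
      throw new IllegalStateException(s"Attempt to set '$key' during a stateless request")
    case TransientStateful | Stateful =>
      values(key) = value
  }

  def get(key: String): Option[Any] = values.get(key)

  // Called when the request completes: transient state is thrown away,
  // while stateful values are kept for later requests.
  def endRequest(): Unit =
    if (mode == TransientStateful) values.clear()
}
```

Here Stateless stays strict and throws, while TransientStateful lets code read back what it wrote during the request and then forgets it once the request ends.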

David Pollak

Aug 19, 2011, 12:38:12 PM
to lif...@googlegroups.com
On Fri, Aug 19, 2011 at 9:30 AM, Alex Black <al...@alexblack.ca> wrote:
> Thanks, my co-worker has actually already implemented your session
> killing logic, and in our load tests it increases scalability a LOT
> (when there are lots of requests each creating their own session). One
> problem we hit was that the code calls rt.gc, forcing a garbage
> collect, which we didn't notice, this happens every 10s (?) and caused
> big delays (probably due to our large heaps) on our servers, took us a
> couple days to notice and fix that.
>
> I think there would still be value in marking requests stateless for
> google, and either never keeping the session in the first place or
> purging it immediately.
>
> Your suggestion of including the request in the stateless test sounds
> good. Would (or did) you consider not throwing exceptions when
> something attempts to store state?

That's a possibility... one that I'm not keen on. ;-)

I was thinking that if snippets that touch state can be marked with a trait meaning "return a default value if this snippet is accessed in stateless mode", then you can just mark all the snippets that access state with that trait and no exceptions will be thrown.
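A minimal sketch of the marker-trait idea, assuming snippets render to a `String` (Lift snippets normally work with `NodeSeq`; `String` just keeps the example dependency-free). `StatelessDefault`, `Snippet`, `SnippetRunner`, and `CartSummary` are hypothetical names.

```scala
// Hypothetical marker trait: a snippet that touches session state declares
// what to render when it is invoked in a stateless request.
trait StatelessDefault {
  def statelessDefault: String
}

// Minimal snippet abstraction (String output instead of NodeSeq).
trait Snippet {
  def render(): String
}

object SnippetRunner {
  // Marked snippets fall back to their default when the request is stateless;
  // everything else renders as usual.
  def run(snippet: Snippet, stateless: Boolean): String = snippet match {
    case s: StatelessDefault if stateless => s.statelessDefault
    case s                                => s.render()
  }
}

// Example: a cart summary reads session state, so it is marked with a default.
object CartSummary extends Snippet with StatelessDefault {
  def render(): String = "<span>3 items in cart</span>"
  val statelessDefault: String = "<span></span>"
}
```

Because each state-touching snippet opts in explicitly, the developer still decides what a crawler sees rather than having writes silently discarded.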
 
> The last thing I'd want is to
> accidentally throw exceptions as google crawls the site. I see the
> value in being strict, perhaps there are three modes: Stateless,
> TransientStateful, and Stateful. In TransientStateful you allow state
> to be set, but you discard it.

The concern that I have is that if stuff is silently thrown away, it will lead to really suboptimal experiences... a user doesn't understand why a button doesn't work, etc. I strongly prefer that the developer know about the issue and make a decision about it rather than having the issue swept into "silently discarded land."

I suspect that I'm in the minority here. ;-)

Alex Black

Aug 19, 2011, 2:42:12 PM
to Lift
I'm with you on requiring the developer to make an explicit decision.

Marking the snippets sounds good. I think the other suggestion can be
made explicit too: e.g. instead of the StatelessTest returning true or
false, it could return a tri-state value, allowing for a new kind of
stateless mode that silently drops changes to the session (rather than
throwing an exception).
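A sketch of the tri-state variant, assuming the test receives the path and the User-Agent header. `Statefulness`, its three values, and `TriStateTest.classify` are hypothetical names, and the routing policy inside `classify` is only an example.

```scala
// Hypothetical tri-state result that could replace a Boolean stateless test.
sealed trait Statefulness
object Statefulness {
  case object Stateful extends Statefulness            // normal session handling
  case object StrictStateless extends Statefulness     // storing state throws
  case object DiscardingStateless extends Statefulness // writes are silently dropped
}

object TriStateTest {
  import Statefulness._

  // Assumed policy: explicitly stateless paths stay strict, crawlers get a mode
  // that drops session writes, and everything else keeps a normal session.
  def classify(path: List[String], userAgent: Option[String]): Statefulness =
    (path, userAgent.map(_.toLowerCase)) match {
      case ("static" :: _, _)                        => StrictStateless
      case (_, Some(ua)) if ua.contains("googlebot") => DiscardingStateless
      case _                                         => Stateful
    }
}
```

Since the discarding mode is an explicit return value, choosing it stays a deliberate decision in the rule rather than a global default.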

Alex Black

Aug 22, 2011, 10:37:33 AM
to Lift
So at the moment the code to expire sessions more aggressively is not
working for us. It seems that without a call to System.GC it cannot
reliably determine whether it's running low on memory, and we can't have
it calling System.GC every 10s; that just kills performance.

We'll look at that code more closely and see if we can make it work
without the call to System.GC.
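One way to get a memory-pressure signal without forcing a collection is to read the post-GC usage the JVM already records for each heap pool through the standard `java.lang.management` API. This is a swapped-in technique, not what the code discussed in this thread does (that code calls rt.gc). A minimal sketch; `HeapPressure` and `postGcHeapFraction` are hypothetical names, and the result is only an approximation summed across heap pools.

```scala
import java.lang.management.{ManagementFactory, MemoryType}
import scala.jdk.CollectionConverters._

object HeapPressure {
  // Approximate fraction of the heap in use right after each pool's most recent
  // collection. getCollectionUsage only reports what the JVM recorded during its
  // normal GC activity; it never requests a collection itself.
  def postGcHeapFraction(): Option[Double] = {
    val usages = ManagementFactory.getMemoryPoolMXBeans().asScala.toList
      .filter(pool => pool.getType() == MemoryType.HEAP)
      .flatMap(pool => Option(pool.getCollectionUsage())) // null when unsupported
      .filter(usage => usage.getMax() > 0)                 // getMax is -1 when undefined
    val max = usages.map(_.getMax()).sum
    if (max == 0L) None
    else Some(usages.map(_.getUsed()).sum.toDouble / max)
  }
}
```

A purger could compare this fraction against a threshold of its choosing to decide how aggressively to expire sessions.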

Should I file a ticket for functionality to support being able to mark
requests as stateless by user agent, so we could avoid storing
sessions for crawlers for example?

- Alex

David Pollak

Aug 22, 2011, 12:18:57 PM
to lif...@googlegroups.com
On Mon, Aug 22, 2011 at 7:37 AM, Alex Black <al...@alexblack.ca> wrote:
> So at the moment the code to expire sessions more aggressively is not
> working for us, it seems that without a call to System.GC it cannot
> reliably determine if its running low on memory, and we can't have it
> calling System.GC every 10s, that just kills performance.

It's not necessary to call System.GC every 10 seconds. That's only used for measuring how aggressive to get with the session purging. You can set something up to automatically purge all sessions with a given user agent every 10 seconds or 90 seconds or whatever.

> We'll look at that code more closely and see if we can make it work
> without the call to System.GC.
>
> Should I file a ticket for functionality to support being able to mark
> requests as stateless by user agent, so we could avoid storing
> sessions for crawlers for example?

Yes... please open a ticket that references this thread. There are a number of features I'm planning to implement, outlined in my first reply on this thread, that should help out a lot.

Alex Black

Aug 22, 2011, 3:10:11 PM
to Lift
I created a ticket. I left a few fields blank since I wasn't sure what
to put in them.

https://www.assembla.com/spaces/liftweb/tickets/1094-stateless-requests-conditionally
