API proposal for a persistent storage for net/http/cookiejar

Volker Dobler

Jun 25, 2013, 8:19:47 AM
to golan...@googlegroups.com
Currently package cookiejar has no way to persist cookies;
Jar is in-memory only.  This was a deliberate decision
because all proposed solutions for persistent storage of a
cookiejar had some drawbacks.  Please see Nigel's excellent
writeup [1] for details and the open questions.

The following will use "disk" as a synonym for any kind
of persistent storage.

I did a (highly unscientific) experiment to see which kinds
of actions happen on a cookie jar and how often.  It seems that
"normal web browsing" (some work in different web
applications, some information retrieval, procrastinating)
generates many more updates to the LastAccess field than
actual cookie mutations.  Among the mutations, creation of a
cookie and updates of its value dominate; deletions are very
rare.  Thus a distribution like 100 : 10 : 1 for
(update LastAccess) : (create or modify cookie) : (delete cookie)
seems reasonable.  Other use cases, e.g. interaction with
a handful of web services, might produce different ratios,
but I think deletions will still be rare.

One of the main issues with storing cookies to disk is
reporting errors:  Cookie handling happens opaquely inside package
http; saving might happen e.g. while following redirects,
and it is unclear how to report or handle a "disk full"
error there.  Any scheme where the Jar itself writes to disk
will face these types of problems.  I'd thus like to propose
a different way in which the user of Jar is responsible for
initiating the dump to disk.  While a bit more complicated
for the user of Jar than the previously proposed solutions
(see references in [1]), this is very flexible and doesn't
suffer from the lack of a way to report write errors.

The user of Jar may save to disk anytime she wants, e.g.:
 - After each http request.
 - Periodically, maybe every second.
 - Only when the disk or overall system is idle.
 - ...
Errors are reported to the user, and she may retry saving
the unsaved data or handle the error in any other appropriate
way.

In code this is realized by enumerating all the modifications
to Jar: Each cookie modification (create, change, update
LastAccess and delete) gets a serial number which acts as a
kind of time stamp.  The user may use these time stamps
in the form of type Marker to specify which changes to
a Jar should be dumped to disk.  This way of persisting
cookies should work pretty well as the number of deleted
cookies seems fairly low.
The only drawback/pitfall: Saving twice with the same marker
but to two different persistent storages might not delete
cookies in the second write.
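
A minimal sketch of the serial-number idea, to make the Marker
semantics concrete.  The changeLog type and its record and since
methods are hypothetical names invented for this illustration; they
are not part of net/http/cookiejar or of the proposed API.  The
sketch assumes that only the latest change per cookie key needs
to be remembered.

```go
package main

import "fmt"

// Marker is a serial number identifying a point in the jar's history.
type Marker uint64

// change records the latest modification of one cookie.
type change struct {
	Serial  Marker
	Key     string
	Deleted bool // true if the modification was a deletion
}

// changeLog is a hypothetical helper that stamps modifications
// with increasing serial numbers.
type changeLog struct {
	next   Marker
	latest map[string]change // last change per cookie key
}

func newChangeLog() *changeLog {
	return &changeLog{next: 1, latest: make(map[string]change)}
}

// record stamps a modification of key with the next serial number.
func (l *changeLog) record(key string, deleted bool) {
	l.latest[key] = change{Serial: l.next, Key: key, Deleted: deleted}
	l.next++
}

// since returns the changes stamped at or after from, plus the
// marker to pass to the following call.
func (l *changeLog) since(from Marker) (changed []change, next Marker) {
	for _, c := range l.latest {
		if c.Serial >= from {
			changed = append(changed, c)
		}
	}
	return changed, l.next
}

func main() {
	cl := newChangeLog()
	cl.record("example.com;sid;/", false)
	cl.record("example.com;lang;/", false)
	_, m := cl.since(0) // first save: everything
	cl.record("example.com;lang;/", true)
	changed, _ := cl.since(m) // second save: only the deletion
	fmt.Println(len(changed), changed[0].Deleted)
}
```

Deletions have to stay in the log as tombstones until they are
saved, which is why a low deletion rate keeps this scheme cheap.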

Two more changes:
First, the LastAccess field is used for limiting the number of
cookies in a Jar (by deleting the least used ones).  I suggest
exposing this feature, again completely user controlled.
Second, handling of session cookies: RFC 6265 section 5.3 requires
that the user agent MUST remove non-persistent cookies at session
end but does not define what a session end is. I'd like to propose
that Jar gets an additional method to end a session and delete
all the non-persistent cookies, while allowing the Save method to
store session cookies to disk.  This would allow an interrupted
session to be resumed.
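
To illustrate what "ending a session" would amount to, here is a
small sketch that filters a cookie list.  It assumes a cookie counts
as persistent when it has a positive MaxAge or a non-zero Expires;
the helper names isPersistent, endSession and sampleCookies are made
up for this example and are not part of the proposal.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// isPersistent reports whether c carries an expiry.  Treating
// MaxAge > 0 or a set Expires as "persistent" is an assumption.
func isPersistent(c *http.Cookie) bool {
	return c.MaxAge > 0 || !c.Expires.IsZero()
}

// endSession returns only the persistent cookies, dropping all
// non-persistent (session) cookies.
func endSession(cookies []*http.Cookie) []*http.Cookie {
	var kept []*http.Cookie
	for _, c := range cookies {
		if isPersistent(c) {
			kept = append(kept, c)
		}
	}
	return kept
}

// sampleCookies builds one session cookie and two persistent ones.
func sampleCookies() []*http.Cookie {
	return []*http.Cookie{
		{Name: "sid", Value: "abc"}, // no Expires, no Max-Age
		{Name: "lang", Value: "en", Expires: time.Now().Add(24 * time.Hour)},
		{Name: "theme", Value: "dark", MaxAge: 3600},
	}
}

func main() {
	for _, c := range endSession(sampleCookies()) {
		fmt.Println(c.Name)
	}
}
```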

Any comments welcome.

V.


API proposal:

// Marker represents a certain time in the lifetime of a Jar.
type Marker uint64

// Save persists to storage all accumulated changes (new, modified and
// deleted cookies) made to jar since the marker from.  A zero value for
// from means to save all cookies.
// persistentOnly controls handling of non-persistent/session cookies.
// next marks the start of the still unsaved changes.  If err is non-nil
// then none or only some of the accumulated changes have been saved.
func (jar *Jar) Save(from Marker, storage *Storage, persistentOnly bool) (next Marker, err error)

// Load merges the content from storage to jar's current content.
// The jar might drop cookies with domains which are not allowed 
// according to its public suffix list.  Also expired cookies won't
// be loaded.
// The returned next marker is the savepoint for upcoming changes.
// TODO: Explanation of next is incomprehensible.
func (jar *Jar) Load(storage *Storage) (next Marker, err error)

// CookieData is used to transfer cookies between a Jar and a Storage.
type CookieData struct {
    Key  string // Key is the ID of this cookie in the form "Domain;Name;Path".
    Data string // Data is the opaque payload data of the cookie.
}

// Storage is a persistent storage for cookies.
type Storage interface {
    // Save writes cookies to the persistent storage.
    // An empty Data in a cookie indicates to delete the cookie identified by Key.
    // Save returns the number of successfully written or deleted cookies,
    // which might be less than len(cookies), in which case err contains
    // the reason.
    Save(cookies []CookieData) (nWritten int, err error)
    // ReadAll calls callback for each persisted cookie.
    ReadAll(callback func(cookie CookieData)) error
}

// Limit deletes the least used cookies from jar until jar
// contains no more than n cookies.  The number of deleted cookies
// is returned.
// Calling Limit(0) is the most expensive way to empty a jar.
func (jar *Jar) Limit(n int) int

// EndSession deletes all non-persistent (session) cookies from jar.
func (jar *Jar) EndSession()




Volker Dobler

Jun 26, 2013, 4:05:02 AM
to golan...@googlegroups.com
Having slept on this problem for another night, I now think that
exposing the timestamp marker to the user is useless, complicated
and error-prone.

Thus a second version of the API: The complicated Marker stuff
is gone; it is now Save and SaveAll, with one additional method
in the Storage interface: Clear.

Sorry for the noise.


A simple application might work like:

jar.Load(storage)
// some http requests here
jar.SaveAll(storage, true)
// done


A complicated, long-running application that needs to
resume work at the last point might look like this:

jar.Load(storage)
for {
    // a http request here
    jar.Save(storage, false)
}
// done


A browser like application which wants to provide session
restart functionality might look like:

jar.Load(storage)
ticker := time.NewTicker(1 * time.Second)
done := make(chan struct{})
go func() {
    for {
        select {
        case <-ticker.C:
            jar.Save(storage, false) // include session cookies
        case <-done:
            return // Stop does not close ticker.C, so exit explicitly
        }
    }
}()
for browserOpen {
    // do web browsing
}
ticker.Stop()
close(done)
jar.SaveAll(storage, true) // only persistent cookies
// done



API proposal version 2:

// Save persists to storage all accumulated changes (new, modified and
// deleted cookies) made to jar since the last call to Save (or since
// the creation of jar).
// persistentOnly controls handling of non-persistent/session cookies.
func (jar *Jar) Save(storage *Storage, persistentOnly bool) error

// SaveAll persists the current content of jar to storage.
// PersistentOnly controls handling of non-persistent/session cookies.
func (jar *Jar) SaveAll(storage *Storage, persistentOnly bool) error

// Load merges the content from storage to jar's current content.
// The jar might drop cookies with domains which are not allowed 
// according to its public suffix list.  Also expired cookies won't
// be loaded.
func (jar *Jar) Load(storage *Storage) error

// CookieData is used to transfer cookies between a Jar and a Storage.
type CookieData struct {
    Key  string // Key is the ID of this cookie in the form "Domain;Name;Path".
    Data string // Data is the opaque payload data of the cookie.
}

// Storage is a persistent storage for cookies.
type Storage interface {
    // Save writes cookies to the persistent storage.
    // An empty Data in a cookie indicates to delete the cookie identified by Key.
    // Save returns the number of successfully written or deleted cookies,
    // which might be less than len(cookies), in which case err contains
    // the reason.
    Save(cookies []CookieData) (nWritten int, err error)

    // ReadAll calls callback for each persisted cookie.  If callback
    // returns a non-nil error the process stops.
    ReadAll(callback func(cookie CookieData) error) error

    // Clear removes all stored cookies from the storage.
    Clear() error
}
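
As an illustration of how small a Storage implementation could be,
here is a sketch of a map-backed store with the three methods from
version 2.  MemStorage and NewMemStorage are hypothetical names, and
treating an empty Data as "delete this key" is an assumption made
for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// CookieData mirrors the type from the proposal.
type CookieData struct {
	Key  string
	Data string
}

// MemStorage is a hypothetical in-memory store offering the
// Save, ReadAll and Clear methods of the proposed interface.
type MemStorage struct {
	cookies map[string]string
}

func NewMemStorage() *MemStorage {
	return &MemStorage{cookies: make(map[string]string)}
}

// Save stores each cookie; an empty Data deletes the key (assumption).
func (m *MemStorage) Save(cookies []CookieData) (nWritten int, err error) {
	for _, c := range cookies {
		if c.Key == "" {
			return nWritten, errors.New("memstorage: empty key")
		}
		if c.Data == "" {
			delete(m.cookies, c.Key)
		} else {
			m.cookies[c.Key] = c.Data
		}
		nWritten++
	}
	return nWritten, nil
}

// ReadAll calls callback for each cookie and stops at the first error.
func (m *MemStorage) ReadAll(callback func(cookie CookieData) error) error {
	for k, d := range m.cookies {
		if err := callback(CookieData{Key: k, Data: d}); err != nil {
			return err
		}
	}
	return nil
}

// Clear removes all stored cookies.
func (m *MemStorage) Clear() error {
	m.cookies = make(map[string]string)
	return nil
}

func main() {
	s := NewMemStorage()
	n, err := s.Save([]CookieData{
		{Key: "example.com;sid;/", Data: "abc"},
		{Key: "example.com;lang;/", Data: "en"},
	})
	fmt.Println(n, err)
	s.Save([]CookieData{{Key: "example.com;sid;/"}}) // empty Data: delete
	s.ReadAll(func(c CookieData) error {
		fmt.Println(c.Key, c.Data)
		return nil
	})
}
```

A disk-backed variant would differ mainly in Save, which is where
partial writes and the nWritten count start to matter.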

Nigel Tao

Jun 27, 2013, 7:36:35 AM
to Volker Dobler, golang-dev
On Wed, Jun 26, 2013 at 6:05 PM, Volker Dobler
<dr.volke...@gmail.com> wrote:
> API proposal version 2:

So, a key design decision is that a Jar still has to fit all of its
Cookies in memory? In other words, a Jar with N cookies will still
require O(N) memory? A Storage provides a persistence mechanism but a
Storage is not consulted per se inside the Jar.Cookies and
Jar.SetCookies implementations?

Also, LastAccess is part of the opaque Data field, so updating a
cookie's LastAccess is no cheaper than changing its Value?

If a Jar controls what CookieData values it passes on to the Storage,
and can choose to not pass on non-persistent cookies, why does it need
an explicit EndSession method?

Volker Dobler

Jun 27, 2013, 8:43:34 AM
to golan...@googlegroups.com, Volker Dobler
On Thursday, 27 June 2013 13:36:35 UTC+2, Nigel Tao wrote:
> On Wed, Jun 26, 2013 at 6:05 PM, Volker Dobler
> <dr.volke...@gmail.com> wrote:
> > API proposal version 2:
>
> So, a key design decision is that a Jar still has to fit all of its
> Cookies in memory? In other words, a Jar with N cookies will still
> require O(N) memory? A Storage provides a persistence mechanism but a
> Storage is not consulted per se inside the Jar.Cookies and
> Jar.SetCookies implementations?

Yes, exactly. Chromium and Firefox, for example, handle cookies
the same way. (At least last time I checked.)

 
> Also, LastAccess is part of the opaque Data field, so updating a
> cookie's LastAccess is no cheaper than changing its Value?

Yes. This basically prevents a user from calling Save after each
and every request if she has to handle lots of requests per second
and has slow storage.

There are two ways to solve this:
 - Don't Save on every call, but e.g. only every n requests or
   periodically every m seconds.
   This can be done with the proposed API: the jar will act as a buffer
   and collect several updates to a cookie's LastAccess field before
   dumping the latest value.
 - Add one more method to Jar:
       // SaveModifications stores the cookies in jar which are
       // new, deleted or changed since the last call to Save or
       // SaveModifications to s. A cookie is considered "changed"
       // if it has a new value, expiration time or different Secure or
       // HttpOnly flag (LastAccess is not considered).
       // TODO: Could be merged into Save with an extra flag.
       func (jar *Jar) SaveModifications(s *Storage) error
   In "normal" web browsing this SaveModifications could
   possibly be called after each request, because most of the time
   it will be a no-op.
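
The "changed" test that SaveModifications would rely on could be
sketched as follows.  The entry struct is a simplified, hypothetical
stand-in for the jar's internal cookie record, and changed and touch
are names invented for this example.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a simplified stand-in for a jar's internal cookie record.
type entry struct {
	Value      string
	Expires    time.Time
	Secure     bool
	HttpOnly   bool
	LastAccess time.Time
}

// changed reports whether b differs from a in anything other
// than LastAccess.
func changed(a, b entry) bool {
	return a.Value != b.Value ||
		!a.Expires.Equal(b.Expires) ||
		a.Secure != b.Secure ||
		a.HttpOnly != b.HttpOnly
}

// touch returns a copy of e with only LastAccess advanced, as
// happens when a cookie is merely sent with a request.
func touch(e entry) entry {
	e.LastAccess = e.LastAccess.Add(time.Minute)
	return e
}

func main() {
	now := time.Now()
	old := entry{Value: "abc", Expires: now.Add(time.Hour), LastAccess: now}
	touched := touch(old)
	updated := touched
	updated.Value = "xyz"
	fmt.Println(changed(old, touched), changed(old, updated))
}
```

With the 100 : 10 : 1 distribution estimated earlier, most requests
would fall into the first case and write nothing.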
 
> If a Jar controls what CookieData values it passes on to the Storage,
> and can choose to not pass on non-persistent cookies, why does it need
> an explicit EndSession method?

Right, EndSession is redundant and should be dropped. 


I thought about three different use cases:
 A) The trivial: Log into some site, retrieve some information, done.
 B) Kind of web application: Talk to a handful of different applications
       through stateful HTTP.
 C) A Go based browser, 50 tabs open, four different incognito
      windows, heavy browsing, open for weeks.

For A neither memory consumption nor the LastAccess is a problem.

For B the memory consumption seems negligible too. LastAccess
might impose additional work, e.g. calling Save only after reception
of a Set-Cookie header and not generally. Or one could code some
BufferedStorage which buffers in memory before dumping to disk (and
reports errors in an out-of-band fashion). Whatever is the proper
solution for the actual requirements.
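
A rough sketch of such a buffering storage, under the assumption
that it wraps anything with the proposed Save method.  All names
here (saver, BufferedStorage, Flush, Errs, fakeDisk) are invented
for the example, and the sketch is not safe for concurrent use
without extra locking.

```go
package main

import (
	"errors"
	"fmt"
)

type CookieData struct{ Key, Data string }

// saver is the part of the proposed Storage interface used here.
type saver interface {
	Save(cookies []CookieData) (nWritten int, err error)
}

// BufferedStorage buffers writes in memory; Flush dumps them.
type BufferedStorage struct {
	backend saver
	pending map[string]CookieData // latest pending write per key
	Errs    chan error            // out-of-band error reports
}

func NewBufferedStorage(b saver) *BufferedStorage {
	return &BufferedStorage{
		backend: b,
		pending: make(map[string]CookieData),
		Errs:    make(chan error, 16),
	}
}

// Save only records the cookies; a later write to the same key
// replaces the earlier one, so LastAccess noise collapses.
func (b *BufferedStorage) Save(cookies []CookieData) (int, error) {
	for _, c := range cookies {
		b.pending[c.Key] = c
	}
	return len(cookies), nil
}

// Flush writes the buffer to the backend.  On failure the error
// goes to Errs and the buffer is kept so a later Flush can retry.
func (b *BufferedStorage) Flush() {
	if len(b.pending) == 0 {
		return
	}
	batch := make([]CookieData, 0, len(b.pending))
	for _, c := range b.pending {
		batch = append(batch, c)
	}
	if _, err := b.backend.Save(batch); err != nil {
		select {
		case b.Errs <- err:
		default: // nobody is reading reports; drop this one
		}
		return
	}
	b.pending = make(map[string]CookieData)
}

// fakeDisk counts written cookies and can simulate "disk full".
type fakeDisk struct {
	writes int
	full   bool
}

func (d *fakeDisk) Save(cookies []CookieData) (int, error) {
	if d.full {
		return 0, errors.New("disk full")
	}
	d.writes += len(cookies)
	return len(cookies), nil
}

func main() {
	disk := &fakeDisk{}
	buf := NewBufferedStorage(disk)
	for _, v := range []string{"t1", "t2", "t3"} {
		buf.Save([]CookieData{{Key: "example.com;sid;/", Data: v}})
	}
	buf.Flush()
	fmt.Println("cookies written to disk:", disk.writes)
}
```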

For C the LastAccess "noise" prohibits saving after each request, but
how bad is it to lose a few cookies? Saving the handful of jars every
5 to 10 seconds (and on browser close) should be perfectly fine.
Memory: RFC 6265 asks user agents to support at least 3000 cookies.
Let's assume 5 jars (one normal browsing + 4 incognito windows) with
10k cookies each at 5k memory consumption each (4k req. by RFC 6265
plus internal overhead): 5 * 10k * 5k = 250M. This is not trivial, but
I do not think that this will prevent a Go based browser, as this
really can be considered an upper limit: Keeping 3000 cookies in one
jar at a realistic 2k bytes is just 6M or 5 high-res photos.
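
The two estimates can be checked with a few lines of Go.  The
function name jarBytes is made up for this example, and "k" and "M"
are taken as decimal 1000 and 1000000 to match the round numbers
used here.

```go
package main

import "fmt"

// jarBytes returns the memory needed for the given number of jars,
// cookies per jar and bytes per cookie.
func jarBytes(jars, cookiesPerJar, bytesPerCookie int) int {
	return jars * cookiesPerJar * bytesPerCookie
}

func main() {
	const k = 1000
	worst := jarBytes(5, 10*k, 5*k)     // 5 jars, 10k cookies, 5k bytes each
	realistic := jarBytes(1, 3000, 2*k) // 1 jar, 3000 cookies, 2k bytes each
	fmt.Printf("upper limit: %d M\n", worst/(k*k))     // 250 M
	fmt.Printf("realistic:   %d M\n", realistic/(k*k)) // 6 M
}
```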

V.

Nigel Tao

Jun 28, 2013, 4:33:32 AM
to Volker Dobler, golang-dev
On Thu, Jun 27, 2013 at 10:43 PM, Volker Dobler
<dr.volke...@gmail.com> wrote:
> Yes exactly. Chromium and FF e.g. handle cookies the same.
> (At least last time I checked.)

Huh, I'm surprised by that. I would have thought that e.g. with an
sqlite backend, they'd have some sort of in-memory MRU cache but
otherwise keep most state on disk. Let me check...


> For C LastAccess "noise" prohibits Saving after each request, but
> how dramatic are some lost cookies? Saving the handful jars every
> 5 to 10 seconds (and on browser close) should be perfectly fine.
> Memory: RFC 6265 requires at least 3000 cookies. Lets assume
> 5 jars (one normal browsing + 4 incognito windows) with 10k
> cookies each at 5k memory consumption each (4k req. by RFC 6265 plus
> internal overhead): 5 * 10k * 5k = 250M. This is not trivial but I do not
> think that this will prevent a Go based browser as this really can
> be considered a upper limit: Keeping 3000 cookies in one jar at a
> realistic 2k bytes is just 6M or 5 highres photos.

Doing both I/O every 5-10 seconds and keeping 5-10M pinned in memory
isn't trivially cheap on e.g. mobile. There may be a better way. Give
me some more time to think about it.


BTW if you want to write some code, you are free to write some code.
As we discovered in the previous cookiejar design discussions, there
may not be one 'perfect API' that has no trade-offs. It may actually
be worth trying (and sharing on github etc.) multiple API designs, and
running production traffic through them, before trying to bless one
for the standard library.

Volker Dobler

Jun 28, 2013, 5:58:18 AM
to Nigel Tao, golang-dev
On Fri, Jun 28, 2013 at 10:33 AM, Nigel Tao <nige...@golang.org> wrote:
> On Thu, Jun 27, 2013 at 10:43 PM, Volker Dobler
> <dr.volke...@gmail.com> wrote:
> > Yes exactly. Chromium and FF e.g. handle cookies the same.
> > (At least last time I checked.)
>
> Huh, I'm surprised by that. I would have thought that e.g. with an
> sqlite backend, they'd have some sort of in-memory MRU cache but
> otherwise keep most state on disk. Let me check...
>
> > For C LastAccess "noise" prohibits Saving after each request, but
> > how dramatic are some lost cookies? Saving the handful jars every
> > 5 to 10 seconds (and on browser close) should be perfectly fine.
> > Memory: RFC 6265 requires at least 3000 cookies. Lets assume
> > 5 jars (one normal browsing + 4 incognito windows) with 10k
> > cookies each at 5k memory consumption each (4k req. by RFC 6265 plus
> > internal overhead): 5 * 10k * 5k = 250M. This is not trivial but I do not
> > think that this will prevent a Go based browser as this really can
> > be considered a upper limit: Keeping 3000 cookies in one jar at a
> > realistic 2k bytes is just 6M or 5 highres photos.
>
> Doing both I/O every 5-10 seconds and keeping 5-10M pinned in memory
> isn't trivially cheap on e.g. mobile. There may be a better way. Give
> me some more time to think about it.

Oh, mobile. Sorry, I never think about such stuff as I'm still happy
with an old Nokia...

For mobile, SaveModifications (which ignores LastAccess)
could be an option: It should result in a no-op very often.
I'll do some proper statistics on cookie sizes in the next few days.
Additionally: The entry struct can be compacted a lot, which
should help especially for the tiny cookies (it reduces the overhead
compared to the cookie value).

> BTW if you want to write some code, you are free to write some code.
> As we discovered in the previous cookiejar design discussions, there
> may not be one 'perfect API' that has no trade-offs. It may actually
> be worth trying (and sharing on github etc.) multiple API designs, and
> running production traffic through them, before trying to bless one
> for the standard library.

I'll give it a try once I find some time :-)

V.

Volker Dobler

Jul 1, 2013, 8:32:39 AM
to Nigel Tao, golang-dev
Some statistics on cookies.

Experimental setup: Surf the web, local intranet, web-applications,
do shopping (load carts), mail, etc. with 99 open tabs in Chrome.
Export all cookies and measure their data length as len(Value)
and len(Domain+Name+Path+Value).

Histograms here: http://imgur.com/a/vsHck

Result: 1349 cookies (both session and persistent cookies)
with an average length of ~50 bytes (Value only) and 75 bytes
(all user controlled data).

Only the following sites store cookies with len(Value)>500
A really large surfing session produces 100 kByte of 
cookie data.
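
For anyone who wants to repeat the measurement, the two lengths can
be computed as sketched below.  The helper names sizes and sample
and the two sample cookies are invented for this example; the real
input was the cookie export from Chrome.  The last line merely
checks that 1349 cookies at 75 bytes each give roughly the 100 kByte
quoted.

```go
package main

import (
	"fmt"
	"net/http"
)

// sizes returns the summed len(Value) and the summed
// len(Domain+Name+Path+Value) over all cookies.
func sizes(cookies []*http.Cookie) (valueBytes, allBytes int) {
	for _, c := range cookies {
		valueBytes += len(c.Value)
		allBytes += len(c.Domain) + len(c.Name) + len(c.Path) + len(c.Value)
	}
	return valueBytes, allBytes
}

// sample returns two made-up cookies standing in for an export.
func sample() []*http.Cookie {
	return []*http.Cookie{
		{Domain: "example.com", Name: "sid", Path: "/", Value: "abcdef"},
		{Domain: "example.org", Name: "lang", Path: "/", Value: "en"},
	}
}

func main() {
	cookies := sample()
	v, a := sizes(cookies)
	n := len(cookies)
	fmt.Printf("avg len(Value): %d, avg len(all): %d\n", v/n, a/n)
	fmt.Println("1349 cookies * 75 bytes =", 1349*75, "bytes")
}
```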

My conclusion: Although RFC 6265 asks for room for 3000 cookies
at 4k each, this does not reflect average-case
storage requirements.

V.

Volker Dobler

Jul 30, 2013, 2:24:17 AM
to Nigel Tao, golang-dev
Gentle ping.

Any suggestions or further insights on persisting cookies?

V.
--
Dr. Volker Dobler

Nigel Tao

Jul 30, 2013, 3:17:17 AM
to Volker Dobler, golang-dev
On Tue, Jul 30, 2013 at 4:24 PM, Volker Dobler
<dr.volke...@gmail.com> wrote:
> Any suggestions or further insights on persisting cookies?

No, I haven't thought much about it. Like I said, I think the best
course is to experiment outside of the standard library for
now.