
[Places] History expiration rewrite project. Looking for feedback.

Marco Bonardo

Oct 16, 2009, 10:11:26 AM
I'm starting to reimplement Places expiration as a separate component
that should mostly act asynchronously.

I'm starting this discussion to get feedback and ideas about the current
implementation, its issues, and the ideas expressed here or on the wiki
project page.

Most of the problems and ideas are already laid out and explained on the
project wiki page:
https://wiki.mozilla.org/Firefox/Projects/Places_async_expiration
So please start by reading that page, since I may forget here some of
the thoughts I put there.

The basic problems we are trying to solve are:
- expiration should be completely async (no UI locking)
- it should not have a significant cost on shutdown
- it should live separately from the history and Sync (DBFlush) services
- it should not rely too much on idle (both because of battery problems
and because browsing habits on mobile devices may not leave much room
for idle)

The idea is to run small expiration steps in the background and do a
single larger step on idle (just one). Ideally, by the time we hit
shutdown most of the work should already be done, but we can't
guarantee that.
Step size should adapt automatically to the situation: if history is
quite dirty, expiration chunks will be larger; if it's clean, they will
be smaller.
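
To make the adaptive step size concrete, here is a minimal Python sketch.
The function name, the default chunk sizes and the linear scaling rule
are all assumptions chosen for illustration; none of them come from the
Places code or the wiki page.

```python
def chunk_size(expirable_pages: int,
               base_chunk: int = 6,
               max_chunk: int = 100,
               dirty_threshold: int = 1000) -> int:
    """Return how many pages to expire in the next background step.

    Hypothetical rule: grow the chunk linearly with how "dirty" history
    is (how many pages are currently eligible for expiration), clamped
    to the range [base_chunk, max_chunk].
    """
    if expirable_pages <= 0:
        return 0  # history is clean, nothing to do
    scaled = base_chunk * expirable_pages // dirty_threshold
    clamped = max(base_chunk, min(max_chunk, scaled))
    # Never ask for more pages than are actually expirable.
    return min(expirable_pages, clamped)
```

With these made-up defaults a nearly clean history is expired a few
pages at a time, while a very dirty one is expired in steps of at most
`max_chunk` pages.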

At the same time we should do a small cleanup of notifications, since
we currently provide confusing (and not always complete) notifications.
onDeleteURI should always be notified unless we are clearing history; in
that case we will only provide onClearHistory (it is clear that all
history pages have been removed then).
onPageExpired is currently a confusing hybrid notification. I'd replace
it with onDeleteVisits(aURI), just to notify that we have removed visits
but not the page (otherwise we would just notify onDeleteURI).
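
The proposed notification rules boil down to a small decision, sketched
below in Python. The notification names are the ones used above; the
enum, the function and its two boolean parameters are hypothetical and
only illustrate the rule, not any real observer API.

```python
from enum import Enum


class Notification(Enum):
    ON_CLEAR_HISTORY = "onClearHistory"
    ON_DELETE_URI = "onDeleteURI"
    ON_DELETE_VISITS = "onDeleteVisits"


def notification_for(clearing_history: bool, page_removed: bool) -> Notification:
    """Pick the single notification to send for a removal.

    clearing_history: the whole history is being cleared.
    page_removed: the page itself was removed, not just some of its visits.
    """
    if clearing_history:
        # One notification covers every page; no per-URI notifications.
        return Notification.ON_CLEAR_HISTORY
    if page_removed:
        return Notification.ON_DELETE_URI
    # Visits were removed but the page is still in history.
    return Notification.ON_DELETE_VISITS
```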


The current lazy expiration algorithm deserves a separate discussion.
We currently have 2 time limits, a lazy limit (90 days) and a hard
limit (180 days), plus a pages limit.
If a page falls within the lazy limit it won't be removed; if it falls
outside the hard limit it will always be removed; if it's in between, it
will be removed only if we are over the pages limit, starting from the
oldest page.
Even if this has not shown "problems" so far, it has a couple of
disadvantages:
- it needs more queries, and some of them are slower (multiple conditions)
- it's hard for users to understand
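
For reference, the lazy/hard/pages-limit policy described above can be
sketched in a few lines of Python. This is only an illustration working
on a list of page ages in days: the function name and the in-memory list
are assumptions, and the real implementation runs SQL queries against
the Places database. The default values are the ones mentioned in this
message.

```python
def pages_to_expire(page_ages_days, lazy_days=90, hard_days=180,
                    max_pages=40000):
    """Return the ages (in days) of the pages this policy would remove."""
    ages = sorted(page_ages_days, reverse=True)  # oldest first

    # Pages outside the hard limit are always removed.
    expired = [age for age in ages if age > hard_days]
    remaining = len(ages) - len(expired)

    # Pages between the two limits are removed, oldest first, only
    # while we are still over the pages limit.
    for age in ages[len(expired):]:
        if remaining <= max_pages or age <= lazy_days:
            break  # under the cap, or reached pages inside the lazy limit
        expired.append(age)
        remaining -= 1
    return expired
```

For example, `pages_to_expire([200, 150, 100, 50, 10], max_pages=3)`
removes the 200-day-old page because of the hard limit and the
150-day-old page because of the pages limit, then stops.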

This is not set in stone, but since we have Places Stats data (please
provide your data at https://places-stats.mozilla.com/ if you have not
done so yet) we can probably evaluate this better.

Currently the avg number of pages is 230000 and our page limit is 40000;
users that increase the number of days usually also increase the pages
limit. So we hardly ever hit the limit, and for most users we retain
180 days.
We have proved that we can handle 180 days of history quite well (UI
performance is getting better in 3.6 and 3.7, and a lot of devs have
history set to 1 year).
So I'm wondering whether we should go back to a simpler approach that is
more understandable for the user. There are 2 possibilities:

1. Retain pages for AT LAST X days.
This was FX2 behavior, quite clear, discard any page older than that.
We added the pages limit to avoid overhead, but in practice users who
increase the day limit also increase the pages limit... we could maybe
just set a large non-tweakable "just-to-be-safe" limit like 150,000 to
200,000 pages; we can measure and work out a real maximum value beyond
which our performance is awful.

2. Retain a maximum of X pages for AT LAST Y days.
This is a clear 2 limits behavior, discard pages that don't satisfy both
conditions.

Both of these would require fewer and simpler (faster) queries, and
would give the user a cleaner preference.
I'm not too concerned about history explosion for advanced users: they
tend to set really large limits in prefs, so our safe limits are just
an annoyance for them, while casual users won't ever hit the limit.
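
A rough Python sketch of the two options, to show how much simpler the
rules are. It rests on an assumed reading of the text: in option 1 any
page older than X days is discarded, and in option 2 a page is kept only
if it is both among the newest X pages and no older than Y days. The
function names and the list-of-ages representation are hypothetical.

```python
def expire_option_1(page_ages_days, max_days):
    """Option 1: discard any page older than max_days."""
    return [age for age in page_ages_days if age > max_days]


def expire_option_2(page_ages_days, max_pages, max_days):
    """Option 2: discard pages that fail either condition."""
    newest_first = sorted(page_ages_days)
    within_cap = newest_first[:max_pages]
    over_cap = newest_first[max_pages:]  # beyond the pages limit
    too_old = [age for age in within_cap if age > max_days]
    return too_old + over_cap
```

Each rule is a single condition (or two independent ones), which is what
would translate into fewer and simpler queries.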

Feedback on these thoughts is more than appreciated, please comment!

Marco

Robert Kaiser

Oct 16, 2009, 10:23:08 AM
Marco Bonardo wrote:
> 1. Retain pages for AT LAST X days.
> This was FX2 behavior, quite clear, discard any page older than that.

I think if we are reasonably sure that non-dev users will not run up to
a number of entries that is unbearable, we should go for such a simple
one-entry time-based expiration again. We did that for Mork history all
the way and it went fine as far as we can tell (the expiration itself,
we know all the other problems that implementation had, of course).

Robert Kaiser

Shawn Wilsher

Oct 16, 2009, 1:29:02 PM
to dev-apps...@lists.mozilla.org
On 10/16/09 7:23 AM, Robert Kaiser wrote:
> I think if we are reasonably sure that non-dev users will not run up to
> a number of entries that is unbearable, we should go for such a simple
> one-entry time-based expiration again. We did that for Mork history all
> the way and it went fine as far as we can tell (the expiration itself,
> we know all the other problems that implementation had, of course).
Places should be able to scale to any reasonable value I'd suspect, and
if it can't, bugs should be filed.

/sdwilsh

Mike Connor

Oct 17, 2009, 5:40:35 PM
to dev-apps...@lists.mozilla.org


First, a little history:

* From a user perspective, (we'll ignore privacy for the moment, but
I'll come back in a moment) uncapped history is the ideal. This is
increasingly part of how people use email (i.e. gmail's "never delete
email" concept). I may want to selectively prune pieces, but there
isn't a clear use case where users want to deterministically expire
old data. Except, of course, for performance reasons.
** Privacy, when we're dealing in months of history, is a red herring
that keeps coming up. If I have your entire visit history for the
last six months, expiring old visits is not an effective privacy
protection.

* Our original goal was to make the pref 365, and later 180 days,
based on the old system. (after we concluded that never expire was
not a feasible starting point.) For various reasons, that wasn't
ideal, so we ended up with the "at least N days" concept, with
limit/day counts to provide safety valves for performance reasons. The
ideal is that we really wanted more history, but we wanted to make
sure even the heaviest users could guarantee some sort of minimum
time-before-uselessness.

Now, details:

Option 1:

Destroying data has a cost (I can't find the thing I looked up in
March) and a benefit (freed up disk/memory). I'd rather not have a
max-age cap, unless it's something really really high. I think we
should only pay the cost when the benefit is real enough to matter.

Option 2:

Other than the (hidden) max age preference, this is basically what we
have now, correct? I strongly prefer this option, of the two, but it
essentially means that, short of major _increases_ in the value of
this pref, min-age is mostly an ineffective knob. It is now, as you
note.


Ultimately, I don't like either of these models, except for
familiarity. Thus, I want us to go in another direction...

A Modest Proposal:

Key Rationale:

* At this point, we can determine where most/all of our pain points
are, and we can make evidence-based decisions on acceptable resource
usage on various platforms.
* The only clear reason for deleting months-old history is the
tradeoff between usefulness and performance.
* Expiration of old data has two requirements: it has to happen fast
enough to keep pace with influx once we get to saturation, but this
needs to be on an equilibrium basis, not a hard line.

Plan:

* Obsolete _all_ time-based prefs for history.
* Caps will be driven solely on appropriate resource-based values.
Memory-constrained mobile devices get small caps, big fat workstations
get dramatically bigger caps. We do stuff like this for bfcache,
memory cache limits on sqlite DBs, and the like already. So rather
than one-size-fits-all, let's use what we know now, and only delete
data when we have to.
* If we're over the cap, just keep expiring incrementally until we're
not. Shutdown happens, and failing to get caught up shouldn't be a
cause for concern.
* Have a bailout switch at 125% of the max-size, which triggers
heavily-aggressive expiration to get us back under the cap if passive
expiration fails to keep pace.
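
A minimal Python sketch of how a cap-driven step could behave under this
plan. Only the 125% bailout ratio comes from the list above; the
function name, both chunk sizes and the idea of passing the cap in as a
plain number are assumptions, and choosing the cap for a given device is
left out entirely.

```python
def expiration_step(page_count, cap, normal_chunk=6,
                    aggressive_chunk=1000, bailout_ratio=1.25):
    """Return how many pages to expire in this step."""
    if page_count <= cap:
        return 0  # under the cap: keep everything
    over = page_count - cap
    if page_count > cap * bailout_ratio:
        # Bailout: passive expiration has fallen too far behind.
        chunk = aggressive_chunk
    else:
        # Normal case: expire incrementally until we are under the cap.
        chunk = normal_chunk
    return min(over, chunk)
```

Calling this repeatedly brings the page count back to the cap; if
shutdown interrupts it, the next session simply continues from wherever
the count is.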

UI is hard with this, so maybe we hide the knobs in about:config if
people really want to cap this more aggressively than necessary for
their hardware?

This basically means "keep stuff as long as we can" which shouldn't
hurt us if we set the right values with the dynamic caps.

So, thoughts?

-- Mike


Shawn Wilsher

Oct 18, 2009, 4:26:37 AM
to dev-apps...@lists.mozilla.org
On 10/17/09 2:40 PM, Mike Connor wrote:
> So, thoughts?
+1

/sdwilsh

Marco Bonardo

Oct 18, 2009, 6:28:23 AM
On 17/10/2009 23:40, Mike Connor wrote:
> UI is hard with this, so maybe we hide the knobs in about:config if
> people really want to cap this more aggressively than necessary for
> their hardware?

Yeah, I was mostly concerned about privacy-conscious users who want to
avoid having more than 2 or 3 days of history.
So for those I think it makes sense to retain 1 (hidden?) pref to expire
on max days.

I'll post more thoughts on the proposal after a bit more thinking.

Marco

Nickolay Ponomarev

Oct 18, 2009, 6:28:22 AM
to dev-apps...@lists.mozilla.org
On Sun, Oct 18, 2009 at 1:40 AM, Mike Connor <mco...@mozilla.com> wrote:

> * Caps will be driven solely on appropriate resource-based values.
> Memory-constrained mobile devices get small caps, big fat workstations get
> dramatically bigger caps. We do stuff like this for bfcache, memory cache
> limits on sqlite DBs, and the like already. So rather than
> one-size-fits-all, let's use what we know now, and only delete data when we
> have to.
>

There's a difference between cache sizes and the cap on persistent data like
history though, keep it in mind so that, for example, synchronizing Fennec
with desktop doesn't expire desktop history.

Nickolay

David McRitchie

Oct 18, 2009, 10:17:49 AM
"Marco Bonardo"
> Feedback on these thoughts is more than appreciated, please comment!

I think the following should be added to the topic "Things to expire" within
https://wiki.mozilla.org/Firefox/Projects/Places_async_expiration
* Searches older than nn days (like 7 days or less, for myself)
* 404 Not Found

(referenced in topic "Deleting history items")
http://kb.mozillazine.org/Viewing_the_browsing_history_-_Firefox#Deleting_history_items

Mike Connor

Oct 18, 2009, 9:03:47 PM
to Marco Bonardo, dev-apps...@lists.mozilla.org

That's a weird use-case. We've repeatedly said we're not trying to
provide "expire sites I stop going to" as a meaningful use-case for
privacy, so I'm not really concerned with this. Manually setting a
visit cap would do the trick as well, without complicating queries or
codepaths, really.

-- Mike
