There's another aspect that I'd like to discuss, mostly because I have
direct expertise in the subject: how to better use the /cache
partition. This isn't an entirely new idea: it's been in the back of
my mind for a while, and others have suggested doing something more
intelligent with that space (credit goes to "Disconnect" here, who's
been picking my brain about it on IRC).
A bit of background:
Android on the G1 is designed around three "large" partitions:
-/system, read-only, contains the system image. That's the only one of
the "big 3" to be preserved through a factory reset. 67.5 MiB on the
G1. Can't do anything about that one: if we store any more or any less
on it, factory reset doesn't work, and it's pretty much full so
shrinking it isn't really possible and wouldn't buy us much.
-/data, shared between applications, the one in which users tend to
run out of space. 74.75 MiB on the G1.
-/cache, private partition reserved for the download manager, which is
used to temporarily store DRM files downloaded by the Browser as well
as applications downloaded by Market, to cache attachment previews in
Gmail, and (most importantly) as storage for OTA upgrades while
they're being downloaded. 67.5 MiB on the G1.
While what happens on /system and /data is beyond my first-hand
expertise, I'm very familiar with what happens on /cache, having
written the download manager.
The /cache partition sits mostly empty most of the time. There's no
requirement for it to actually be empty at any given time; the only
hard requirement in that domain is that the download manager must be
able to free up enough space to download a system image that could be
about as large as /cache itself.
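That hard requirement can be illustrated with a small sketch in plain
Java (this is not the actual download manager code; the class, names
and numbers are all made up): to admit a new download, the cache picks
purgeable entries, oldest first, until enough space is free.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "free up enough space for a new download" on a
// purgeable cache: discard entries oldest-first until the request fits.
class PurgeSketch {
    // A purgeable cache entry: its size in bytes and last-access time.
    static class Entry {
        final String name;
        final long size;
        final long lastUsed;
        Entry(String name, long size, long lastUsed) {
            this.name = name; this.size = size; this.lastUsed = lastUsed;
        }
    }

    // Returns the names of entries to purge so that `needed` bytes fit
    // within `capacity`, or null if the request can never fit.
    static List<String> planPurge(long capacity, long needed, List<Entry> entries) {
        if (needed > capacity) return null; // larger than the whole partition
        long used = 0;
        for (Entry e : entries) used += e.size;
        List<Entry> byAge = new ArrayList<>(entries);
        byAge.sort(Comparator.comparingLong(e -> e.lastUsed)); // oldest first
        List<String> victims = new ArrayList<>();
        for (Entry e : byAge) {
            if (capacity - used >= needed) break; // enough room already
            used -= e.size;
            victims.add(e.name);
        }
        return victims;
    }
}
```

Since an OTA image can be almost as large as /cache itself, the plan
may legitimately end up purging every cached entry.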
I'd like to find ways to make /cache more available to other applications.
In a first phase, I'd like to investigate ways for other system
applications to cooperate with the download manager to store their
cached data in /cache. This would probably be useful for the browser,
market, maps, gmail, and email.
Restricting it to system apps for now has the advantage that those
apps can usually be trusted to not be malicious (though they might
still be buggy, and they might be running libraries that can be used
as attack vectors, e.g. WebKit). It also has the advantage that it
helps 100% of the users of a given device. That being said, I still
prefer to design for a future where all apps can access it, i.e. I'd
rather avoid design decisions that would knowingly not scale to
3rd-party apps.
My best guess would be to allow applications to create entries in the
download manager's database that don't trigger actual HTTP downloads,
but rather end up creating the appropriate purgeable files in /cache
(which can then be accessed back through the regular download manager
API).
I'm facing the following challenges, and I'd like some design
contributions on these specific aspects:
-I'd prefer to not let any app other than the download manager (and
the system updater) have writable file descriptors on /cache, to
prevent situations where the download manager has to make space but an
application is growing a file by writing to it. My preferred approach
would be for the download manager to read a content:// URI provided by
the app, but that raises a pair of security issues: the app won't want
to give access to that content:// URI to all other apps, and the
download manager must be sure that the content:// URI in question does
belong to the app (otherwise a malicious app could access another
app's data).
-I'd like to design enough safeguards in the system to prevent
thrashing: the amount of space available to cache data will vary over
time by more than an order of magnitude, and I definitely want to
avoid situations where multiple apps would collectively try to store
more data in the cache than is available (and where they would end up
pushing each other's data out of the cache, re-downloading it over the
network, which would push more data out of the cache, in a neverending
loop of network usage, energy consumption and flash wear).
-The internal structures of the download manager's provider don't
scale well, and definitely won't scale well enough to support hundreds
or even thousands of individual entries. The reason, primarily, is
that the download manager's provider keeps a copy of its database in
RAM, so that it can easily spot database changes when notified through
its own ContentObserver. This is pretty gross (and has the added cost
of making the code fragile and hard to maintain as the in-RAM copy is
also used by the active download threads). Does anyone have experience
writing a content provider that takes explicit action in its insert,
update and delete functions based on the provided ContentValues and
the state of an underlying database? (You can assume that the query
function was modified to return read-only cursors).
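For the first challenge, here's a platform-free sketch of the
ownership check. On the device the authority-to-UID mapping would come
from the package manager and the caller's UID from the binder; here
both are injected explicitly, and all names are hypothetical:

```java
import java.util.Map;

// Hypothetical sketch: before the download manager reads from a
// content:// URI handed to it, it checks that the app owning the URI's
// authority is the same app that made the insert call. On Android the
// lookup would come from PackageManager.resolveContentProvider() and
// the caller's UID from Binder.getCallingUid(); here they're parameters.
class UriOwnership {
    static boolean callerOwnsAuthority(Map<String, Integer> authorityToUid,
                                       String authority, int callingUid) {
        Integer owner = authorityToUid.get(authority);
        return owner != null && owner == callingUid;
    }
}
```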
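For the thrashing concern, one possible safeguard is admission
control: refuse a caching request up front when it would push an app
past its share of the currently available space, rather than letting
apps silently evict each other's data. A minimal sketch (the
equal-share policy is purely an assumption for illustration; a real
policy would likely be smarter):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-app admission control for cache space: a request
// that would exceed the app's share of the available space is refused,
// so apps can't collectively overcommit the cache and thrash.
class CacheBudget {
    private final long available;   // bytes currently usable for caching
    private final int appCount;     // apps sharing the cache
    private final Map<Integer, Long> usedByUid = new HashMap<>();

    CacheBudget(long available, int appCount) {
        this.available = available;
        this.appCount = appCount;
    }

    // Grant the request only if it keeps the app within its fair share.
    synchronized boolean tryReserve(int uid, long bytes) {
        long share = available / appCount;
        long used = usedByUid.getOrDefault(uid, 0L);
        if (used + bytes > share) return false; // refuse instead of evicting
        usedByUid.put(uid, used + bytes);
        return true;
    }
}
```

The key property is that a refusal is explicit and cheap: the app can
fall back to not caching, instead of triggering the eviction /
re-download loop described above.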
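For the last question, the general shape would be a provider that acts
directly inside insert/delete based on the incoming values and the
table state, with no in-RAM mirror to diff. A sketch with plain maps
standing in for ContentValues and the SQLite table (everything here is
hypothetical, not the download manager's actual code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a provider that takes explicit action inside insert/delete,
// driven by the incoming values and the current table state, instead of
// keeping an in-RAM copy and spotting changes via a ContentObserver.
class ActionOnWriteProvider {
    private final Map<Long, Map<String, Object>> table = new HashMap<>();
    private long nextId = 1;
    final List<String> actions = new ArrayList<>(); // work queued by writes

    long insert(Map<String, Object> values) {
        long id = nextId++;
        table.put(id, new HashMap<>(values));
        // Decide what to do right here, from the values themselves.
        if (Boolean.TRUE.equals(values.get("no_download"))) {
            actions.add("allocate-cache-file:" + id); // cache-only entry
        } else {
            actions.add("start-download:" + id);      // regular download
        }
        return id;
    }

    void delete(long id) {
        if (table.remove(id) != null) {
            actions.add("purge-file:" + id);
        }
    }
}
```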
Disclaimers:
-I don't currently have much time to do active development in that
area. Even if those discussions reach conclusions, I can't promise
that I'll be able to move to an implementation phase any time soon.
I'll gladly accept code contributions (and I'll be looking forward to
working with contributors on the details of the design in order to
minimize the review churn and to maximize the chance of having those
changes approved).
-The code as it currently lives in the public repository, like a lot
of the public repository, is a fair bit out of date. We're working on
pushing a more recent version, in which the download manager has more
than half a dozen new features that aren't visible on the public git
repository, as well as some fairly heavy changes in the implementation
of the provider itself. So, as a warning, don't bother preparing code
patches against the current version, as those'd cause some merge
headaches down the line.
JBQ
Ummmmm...
Maybe I'm just being thick, but, to me, a content:// URI implies that
the content is already on the device. Why, then, would we be wanting to
cache it on the device, if it's already there?
There might be a few specialized cases where an application has a
content:// ContentProvider that on-the-fly fetches material via non-HTTP
protocols over "teh Intarwebs" (e.g., IMAP attachments), but that can't
possibly be the primary use case.
Shouldn't the download manager accept more URI formats than content://
(e.g., http://)?
And please forgive me if, indeed, I'm a moron.
--
Mark Murphy (a Commons Guy)
http://commonsware.com
Android Training on the Ranch! -- Mar 16-20, 2009
http://www.bignerdranch.com/schedule.shtml
I tend to see ContentProviders as not much more than an IPC channel,
addressed by URIs, which exchanges ContentValues, Cursors, and
ParcelFileDescriptors. It's true that ContentProviders are often used
for data that is shared "anonymously" (i.e. where one app makes data
available to many other apps) and that lives for a long time, but none
of that is fundamentally in the API.
The reason why I put a ContentProvider in here is that I think that it
is currently the easiest way to share ParcelFileDescriptors.
Here's the flow that I have in mind (let's take e.g. the example of an
IMAP client, though it might turn out to not be appropriate for that
use case):
-Email app declares a content provider in its manifest.
-Email app downloads a message over IMAP, stores meta-information
"locally" in whatever format it wants (including anything that would
be necessary for an overall display, and information to re-retrieve
the data from the server if necessary).
-Email app then calls the download manager's insert function, where
one of the columns (possibly the existing URI column) is a content://
URI recognized by the Email app.
-download manager calls openFileDescriptor on the content URI passed
by the Email app.
-when the download manager is done copying the data, it calls delete
on that content URI, and the Email app can discard the actual data
(since it can go get it back through the download manager).
Notice how that doesn't require the Email app to implement insert,
update or query in the content provider that it exposes to the
download manager, it just has to deal with openFile and delete (which
are much simpler, API-wise).
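The flow above can be sketched with plain streams standing in for
ParcelFileDescriptors (the interface and names are invented for
illustration; the app side only needs the two operations mentioned):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the handshake: the app exposes something the download
// manager can open and delete; the download manager copies the bytes
// into its own storage, then tells the app to discard the original.
class CacheHandshake {
    // Minimal stand-in for the two provider calls the app implements.
    interface AppProvider {
        InputStream openFile() throws IOException;
        void delete();
    }

    // The download-manager side: copy the data, then release the source.
    static byte[] copyIntoCache(AppProvider source) {
        ByteArrayOutputStream cached = new ByteArrayOutputStream();
        try (InputStream in = source.openFile()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) cached.write(buf, 0, n);
        } catch (IOException e) {
            return null; // sketch only; a real implementation reports the error
        }
        source.delete(); // the app may now drop its local copy
        return cached.toByteArray();
    }
}
```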
Admittedly, there'd be another option, and in fact I had talked about
it in a hallway discussion with some other Google employees a while
back: it'd be quite elegant for that very specific use to allow
passing a ParcelFileDescriptor in ContentValues. At a general level,
that would make it easy for an app to insert large files into a
content provider, but I don't know whether that'd actually be useful enough
to justify putting it in the core API (since we still only have a
single use case in mind).
(BTW, right now the download manager does accept http and https, the
problem here being that it doesn't accept anything else).
JBQ
If a ContentProvider had a way of declaring itself to be returning an
immutable dataset, where any given content:// URL will only ever return
the same data, then it ought to be possible to cache all this invisibly.
The theoretical IMAP app would be a candidate for this; it would expose
such an immutable ContentProvider that would fetch messages by
Message-ID, which are globally unique. So, the first time the app asked
for content://com.fnord.imap/messagedata/123...@thingy.net, its
ContentProvider would be called to download the data, and then the
framework would cache it. The second time, the cached data would be
returned.
This has the advantage that from the app side the API change would be
tiny, and from the framework side it would be possible to produce a
conformant implementation that was simply a noop. It also means that
control of the cache is left in the hands of the ContentProvider
framework, which can do whatever it likes with it, including having
smart heuristics to figure out what data to expire or simply nuking the
lot if the space is needed for something else. The only way an app would
ever know is that its ContentProvider gets called more often.
This would allow caching of any expensive data, even if it didn't use
the network. content://com.cowlark.crypto/mersenneprime/47, for example...
(It might even be possible to cache mutable data this way, but making
sure that all the correct change-notification calls are made makes it a
much harder problem.)
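The memoization described here could look something like the sketch
below, with a plain function standing in for the provider's expensive
fetch (names invented; a real framework implementation would also be
free to evict entries whenever it needs the space, which this sketch
omits):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of transparent caching over an immutable dataset: a given URI
// can only ever map to one value, so the framework may memoize it and
// the provider simply gets called less often.
class ImmutableUriCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Function<String, byte[]> fetcher;
    int fetchCount = 0; // how many times the provider actually ran

    ImmutableUriCache(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    synchronized byte[] get(String uri) {
        return cache.computeIfAbsent(uri, u -> {
            fetchCount++;           // a miss: the provider does real work
            return fetcher.apply(u);
        });
    }
}
```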
--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ ⍎'⎕',∊N⍴⊂S←'←⎕←(3=T)⋎M⋏2=T←⊃+/(V⌽"⊂M),(V⊝"M),(V,⌽V)⌽"(V,V←1⎺1)⊝"⊂M)'
│ --- Conway's Game Of Life, in one line of APL
JBQ
Independently from 3rd-party access, I still think that allowing apps
to cache data on /cache is worth doing. It can be used for a good
number of existing system apps, so it benefits all users. What's more,
I've heard of some interest from people in investigating the browser,
and there are already several people looking at Email.
About the security issues: the binder makes it possible to know the
UID of the caller of a given transaction through the content resolver,
and there might be APIs to query the UID that's behind a content URI,
so that might do the trick to deal with security. The download manager does
something along those lines to check the UID of a broadcast receiver,
so I'm thinking along the same lines, but using plain obscurity sounds
like it should work too. I'll keep that in mind if we can't find a
strong UID validation.
As for letting the app retrieve its data later: the download manager
exposes a content provider, so when the app inserts its entry into the
download manager it gets back a content:// URI that represents that
entry, and it can access the data again through that same URI.
JBQ
Ah, but this way nothing *is* stored on /data --- the application's
ContentProvider downloads/computes the data and then just returns it.
Then the ContentProvider framework is at liberty to stash a copy in
/cache somewhere if it thinks it's worth caching.
Of course, this approach is only useful for data that can usefully be
stored in memory and returned as a ContentProvider row --- icons yes,
MP3 files no --- but I do think that it would be fairly easy to
implement as it's an isolated change in just one place, involves very
little API change for applications, and could be really useful.
(Yes, I do have an application that could benefit from this. If I could
flag one of my ContentProvider URLs as 'cache this, please', then I
could throw away quite a lot of code that manually caches stuff in /data.)
--
David Given
d...@cowlark.com
Furthermore, I think that some of the apps that cause a lot of storage
pressure on /data do it through actual files, not through meta-data
(browser, market, maps) *and* don't go through content providers at
all, and modifying that in the short term could be a fair bit of work.
JBQ
Here's the complete set of columns that the download manager currently
keeps for each download:
http://android.git.kernel.org/?p=platform/frameworks/base.git;a=blob;f=core/java/android/provider/Downloads.java;h=42e9d95a2ad741cca20bea80bae34aa9520d7e2f;hb=HEAD
Why did I look at expanding the download manager for this?
-Because it's my code, so it's much easier for me to do something about it.
-Because it's the part of the system that has near-exclusive ownership
of a huge chunk of storage on the device (at least on the G1).
-Because it already has mechanisms in place to use that storage as a
purgeable cache, so I might actually be able to find time to make it
happen in less than a year.
-Because it already exposes that storage through content providers,
with access controls to take care of security, and probably enough
columns to expose most of what some applications would want to
remember about their cached data.
Could there be other options?
Yes, another option would be to somewhat decouple the actual
management of cached files on /cache from the download manager, such
that the download manager would sit on top of the cache manager.
That's more work for now, and nothing would prevent moving to that
model later (especially as long as the download manager remains a
system-only facility).
JBQ
Is it possible to give up OTA updates and have the cache at a more
reasonable size?
The asynchronous aspect is a good question. Ultimately, it's hard to
imagine that it's going to be entirely transparent: after all, copying
data on /cache will take time, and that means that there'll always be
a window between the point when an app requests to insert data in the
cache and the point when that data is available.
I think that there are really two ways to deal with it:
-slow synchronous insertion: when the app asks to cache data, the IPC
blocks until the data is actually copied. The advantage is that the
app can't even attempt to access cached data before it's fully copied,
the drawback is that the app can't insert from its UI thread, which
means that in fact it's not as synchronous as it seems.
-fast asynchronous insertion: when the app asks to cache the data, the
IPC returns very quickly, but the data isn't actually copied at that
point. When the data is copied, the app gets notified back about it
(most probably through a ContentObserver) and it can then access it.
I can't quite get myself to decide which of the two designs is best.
The former has fewer IPCs (but keeps some long-lived IPCs around),
while the latter has a better chance of allowing single-threaded apps.
Single-threaded apps seem like a net win to me, though like you've
correctly pointed out the interval of time between the two IPCs is a
delicate one for the app to handle. Blocking on get has its own
problems, because get is usually fast (even though it can never be
guaranteed to be, a simple app might normally assume that it's always
fast and get away with it); opening the possibility that get might be
very slow even while the system isn't under load would break that
unwritten assumption.
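The asynchronous variant can be sketched like this, with a plain
callback standing in for the ContentObserver notification (class and
method names are made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of "fast asynchronous insertion": insert() returns immediately
// with a ticket, the copy happens on a worker thread, and the app is
// notified when the data is ready to be opened.
class AsyncCacheInsert {
    interface ReadyListener { void onReady(long id); }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private long nextId = 1;

    synchronized long insert(Runnable copyData, ReadyListener listener) {
        long id = nextId++;
        worker.execute(() -> {
            copyData.run();       // the actual (slow) copy into /cache
            listener.onReady(id); // app can now open the cached entry
        });
        return id; // returns before the copy has happened
    }

    // Drain the worker (for tests / shutdown); a real service would not
    // block like this.
    void shutdownAndWait() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The window between the two IPCs is exactly the delicate interval
discussed above: between insert() returning and onReady() firing, the
entry exists but its data can't be opened yet.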
JBQ
I was thinking more of an option to enable/disable. Even something
simple like a secret code similar to OTANOT, maybe OTANUL. I realise
this wouldn't sit well with the carrier overlords but it would be
something nice for those who aren't enslaved. I'll certainly be
hacking my Dev Phone to size the cache partition to something
reasonable.
JBQ
(PS: this isn't really the topic of this discussion).
--
Jean-Baptiste M. Queru
Android Engineer, Google.