Storage in Gecko

Gregory Szorc

Apr 26, 2013, 2:17:24 PM

to dev-platform

I'd like to start a discussion about the state of storage in Gecko.

Currently when you are writing a feature that needs to store data, you
have roughly 3 choices:

1) Preferences
2) SQLite
3) Manual file I/O

Preferences are arguably the easiest. However, they have a number of
setbacks:

a) Poor durability guarantees. See bugs 864537 and 849947 for real-life
issues. tl;dr writes get dropped!
b) Integers limited to 32 bit (JS dates overflow b/c milliseconds since
Unix epoch).
c) I/O is synchronous.
d) The whole method for saving them to disk is kind of weird.
e) The API is awkward. See Preferences.jsm for what I'd consider a
better API.
f) Doesn't scale for non-trivial data sets.
g) Clutters about:config (not all preferences are config options).
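
A minimal sketch of point (b), in plain JavaScript with no Gecko APIs involved: a millisecond timestamp is far larger than a signed 32-bit integer can hold. Storing whole seconds is shown only as one possible workaround (an assumption for illustration, not something the preferences service does for you).

```javascript
// Largest value a signed 32-bit integer preference can hold.
const INT32_MAX = 2 ** 31 - 1; // 2147483647

// A fixed millisecond timestamp (April 26, 2013, UTC) for repeatability.
const timestampMs = Date.UTC(2013, 3, 26, 18, 17, 24);

// Milliseconds since the epoch are far outside the 32-bit range.
const fitsAsMs = timestampMs <= INT32_MAX;

// Coercing to int32 (what `| 0` does) silently yields a different number.
const wrapped = timestampMs | 0;

// Possible workaround (assumption): store whole seconds, which fit until 2038.
const timestampSec = Math.floor(timestampMs / 1000);
const fitsAsSec = timestampSec <= INT32_MAX;

console.log({ fitsAsMs, fitsAsSec, roundTrips: wrapped === timestampMs });
// → { fitsAsMs: false, fitsAsSec: true, roundTrips: false }
```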

We have SQLite. You want durability: it's your answer. However, it too
has setbacks:

a) It eats I/O operations for breakfast. Multiple threads. Lots of
overhead compared to prefs. (But hard to lose data.)
b) By default it's not configured for optimal performance (you need to
enable the WAL, muck around with other PRAGMA).
c) Poor schemas can lead to poor performance.
d) It's often overkill.
e) Storage API has many footguns (use Sqlite.jsm to protect yourself).
f) Lots of effort to do right. From auditing code in 3rd party extensions
that use SQLite, many of them aren't doing it right.

And if one of those pre-built solutions doesn't offer what you need, you
can roll your own with file I/O. But that also has setbacks:

a) You need to roll your own. (How often do I flush? Do I use many small
files or fewer large files? Different considerations for mobile (slow
I/O) vs desktop?)
b) You need to roll your own. (Listing it twice because it's *really*
annoying, especially for casual developers that just want to implement
features - think add-on developers.)
c) Easy to do wrong (excessive flushing/fsyncing, too many I/O
operations, inefficient appends, poor choices for mobile, etc).
d) Wheel reinvention. Atomic operations/transactions. Data marshaling. etc.

I believe there is a massive gap between the
easy-but-not-ready-for-prime-time preferences and
the-massive-hammer-solving-the-problem-you-don't-have-and-introducing-many-new-ones
SQLite. Because this gap is full of unknowns, I'm arguing that
developers tend to avoid it and use one of the extremes instead. And,
the result is features that have poor durability and/or poor
performance. Not good. What's worse is many developers (including
myself) are ignorant of many of these pitfalls. Yes, we have code review
for core features. But code review isn't perfect and add-ons likely
aren't subjected to the same level of scrutiny. The end result is the
same: Firefox isn't as awesome as it could be.

I think there is an opportunity for Gecko to step in and provide a
storage subsystem that is easy to use, somewhere between preferences and
SQLite in terms of durability and performance, and "just works." I don't
think it matters how it is implemented under the hood. If this were to
be built on top of SQLite, I think that would be fine. But, please don't
make consumers worry about things like SQL, schema design, and PRAGMA
statements. So, maybe I'm advocating a generic key-value store. Maybe
something like DOM Storage? Maybe SQLite 4 (which is emphasizing
key-value storage and speed)? Just... something. Please.

Anyway, I just wanted to see if others have thought about this. Do
others feel it is a concern? If so, can we formulate a plan to address
it? Who would own this?

Gregory

Kyle Huey

Apr 26, 2013, 2:26:11 PM

to Gregory Szorc, dev-platform

Have you explored using IndexedDB?

- Kyle

Andreas Gal

Apr 26, 2013, 2:36:09 PM

to Gregory Szorc, dev-platform

Preferences are as the name implies intended for preferences. There is no sane use case for storing data in preferences. I would give any patch I come across doing that an automatic sr- for poor taste and general insanity.

SQLite is definitely not cheap, and we should look at more suitable backends for our storage needs, but done right, off the main thread, it's definitely a saner way to go than (1).

While (2) is a foot-gun, (3) is a guaranteed foot-nuke. While it's easy to use sqlite wrong, it's almost guaranteed that you will get your own atomic file storage wrong, across our N platforms.

Chrome is working on replacing sqlite with leveldb for indexeddb and most of their storage needs. Last time we looked it wasn't ready for prime time. Maybe it is now. This might be the best option.

Andreas

Gregory Szorc

Apr 26, 2013, 2:43:58 PM

to Kyle Huey, dev-platform

On 4/26/2013 11:26 AM, Kyle Huey wrote:

> Have you explored using IndexedDB?

Not seriously. The "this is an experimental technology" warning on MDN
is off-putting.

Is IndexedDB ready for use by internal Gecko consumers, including
add-ons? Can you use it from the main thread (for cases where a
dedicated worker is too heavy weight)? What are current risks associated
with this "experimental technology?"

Aside from those concerns, I reckon IndexedDB could fill the gap I
described. I want to learn more.

Gavin Sharp

Apr 26, 2013, 2:50:25 PM

to Andreas Gal, dev-platform, Gregory Szorc

On Fri, Apr 26, 2013 at 11:36 AM, Andreas Gal <g...@mozilla.com> wrote:
> Preferences are as the name implies intended for preferences. There is no sane use case for storing data in preferences. I would give any patch I come across doing that an automatic sr- for poor taste and general insanity.

As Greg suggests, that ship has kind of already sailed. In practice
preferences often end up being the "best" choice for storing some
small amounts of data. Which is a sad state of affairs, to be sure -
so I'm glad we have this thread to explore alternatives.

> While (2) is a foot-gun, (3) is a guaranteed foot-nuke. While it's easy to use sqlite wrong, it's almost guaranteed that you will get your own atomic file storage wrong, across our N platforms.

OS.File helps somewhat. Hand-rolled JSON-on-disk storage backends are
starting to crop up in a lot of places. Also perhaps not a great state
of affairs.

> Chrome is working on replacing sqlite with leveldb for indexeddb and most of their storage needs. Last time we looked it wasn't ready for prime time. Maybe it is now. This might be the best option.

I have little experience actually trying to use indexedDB, so grain of
salt etc., but my impression is that it's somewhat overkill for use
cases currently addressed by preferences or custom JSON (e.g. a simple
key-value store).

Gavin

Kyle Huey

Apr 26, 2013, 2:52:40 PM

to Gregory Szorc, dev-platform

On Fri, Apr 26, 2013 at 11:43 AM, Gregory Szorc <g...@mozilla.com> wrote:

> On 4/26/2013 11:26 AM, Kyle Huey wrote:
>

Yeah that's not accurate. It's pretty solid now. It's the storage backend
for everything in b2g for instance ... and it's not going to see any
changes that aren't backwards compatible.

> Is IndexedDB ready for use by internal Gecko consumers, including add-ons?
>

Yes.

> Can you use it from the main thread (for cases where a dedicated worker is
> too heavy weight)?
>

You can only use it from the main thread right now. Worker support is
coming (Bug 798875).

> What are current risks associated with this "experimental technology?"
>

It's not experimental.

> Aside from those concerns, I reckon IndexedDB could fill the gap I
> described. I want to learn more.
>

Always happy to answer questions.

- Kyle

Ryan VanderMeulen

Apr 26, 2013, 2:57:31 PM

On 4/26/2013 2:52 PM, Kyle Huey wrote:
>
> Yeah that's not accurate. It's pretty solid now. It's the storage backend
> for everything in b2g for instance ... and it's not going to see any
> changes that aren't backwards compatible.
>
>
>> Is IndexedDB ready for use by internal Gecko consumers, including add-ons?
>>
>
> Yes.
>
>> Can you use it from the main thread (for cases where a dedicated worker is
>> too heavy weight)?
>>
>
> You can only use it from the main thread right now. Worker support is
> coming (Bug 798875).
>
>
>> What are current risks associated with this "experimental technology?"
>>
>
> It's not experimental.
>
>> Aside from those concerns, I reckon IndexedDB could fill the gap I
>> described. I want to learn more.
>>
>
> Always happy to answer questions.
>
> - Kyle
>

The current level of flakiness in the IndexedDB test suite (especially
on OSX) makes me concerned about what to expect if it starts getting
heavier use across the various platforms.

Justin Lebar

Apr 26, 2013, 3:07:51 PM

to Ryan VanderMeulen, dev-pl...@lists.mozilla.org

> The current level of flakiness in the IndexedDB test suite (especially on
> OSX) makes me concerned about what to expect if it starts getting heavier
> use across the various platforms.

Is that just in the OOP tests, or everywhere?

Benjamin Smedberg

Apr 26, 2013, 3:10:05 PM

to Gavin Sharp, Andreas Gal, dev-platform, Gregory Szorc

On 4/26/2013 2:50 PM, Gavin Sharp wrote:
> On Fri, Apr 26, 2013 at 11:36 AM, Andreas Gal <g...@mozilla.com> wrote:
>> Preferences are as the name implies intended for preferences. There is no sane use case for storing data in preferences. I would give any patch I come across doing that an automatic sr- for poor taste and general insanity.
> As Greg suggests, that ship has kind of already sailed. In practice
> preferences often end up being the "best" choice for storing some
> small amounts of data. Which is a sad state of affairs, to be sure -
> so I'm glad we have this thread to explore alternatives.

The key problem with expanding this is that the pref API is designed to
be synchronous because it controls a bunch of behavior early in startup.
Our implementation is therefore to read all the prefs in (synchronously)
and operate on them in-memory. That strategy only continues to work as
long as the set of data in prefs is tightly constrained.

I really hope the outcome of this discussion is that we end up storing
everything that isn't a true preference in some other datastore, and
that is an async-by-default datastore ;-)

> I have little experience actually trying to use indexedDB, so grain of
> salt etc., but my impression is that it's somewhat overkill for use
> cases currently addressed by preferences or custom JSON (e.g. a simple
> key-value store).

With a pretty simple JSM wrapper, indexeddb could be a very good
solution for saving JSON or JSON-like things (you don't even need JSON,
because indexeddb does structured cloning). It can of course be used for
more complex things as well, but if we want a durable key-value store,
it could be as simple as:

ChromeData.get('key', function(value) {
  // null if unset
});

ChromeData.set('key', value [, function()]); // asynchronous

Or maybe there's a better syntax using promises, but in any case it
could probably be this simple.
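
A minimal sketch of what a promise-based variant might look like. Everything here is hypothetical: `ChromeDataSketch` is an illustrative name, and the backend is an in-memory Map standing in for IndexedDB so the example can run anywhere. The JSON round-trip only approximates the structured cloning a real IndexedDB backend would do.

```javascript
// Hypothetical promise-based take on the ChromeData idea.
// The in-memory Map is a stand-in for an IndexedDB object store.
class ChromeDataSketch {
  constructor() {
    this._store = new Map();
  }

  // Resolves with a copy of the stored value, or null if the key is unset.
  get(key) {
    const raw = this._store.get(key);
    return Promise.resolve(raw === undefined ? null : JSON.parse(raw));
  }

  // Stores a serialized copy, so later mutation of `value` has no effect.
  set(key, value) {
    this._store.set(key, JSON.stringify(value));
    return Promise.resolve();
  }
}

const data = new ChromeDataSketch();
const done = data
  .set("lastSync", { when: 1367000244, ok: true })
  .then(() => Promise.all([data.get("lastSync"), data.get("missing")]))
  .then(([found, missing]) => {
    console.log(found.ok, missing); // true null
    return { found, missing };
  });
```

Returning a promise from `set` lets callers wait for durability when they care and ignore it when they don't, which matches the optional callback in the signature above.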

Does anyone use indexeddb in chrome right now?

--BDS

Kyle Huey

Apr 26, 2013, 3:11:04 PM

to Gregory Szorc, dev-platform

Resending to list.

On Fri, Apr 26, 2013 at 12:02 PM, Gregory Szorc <g...@mozilla.com> wrote:

> On 4/26/2013 11:52 AM, Kyle Huey wrote:
>
> Could you please point me at a "good" implementation of a Gecko consumer
> of IndexedDB? If you don't know which are good, an MXR search URL will
> suffice :)
>

I haven't looked at any of them closely but there are lots of uses at
http://mxr.mozilla.org/gaia/search?string=indexeddb

> I'm looking at
>
> https://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/indexed-db.js
> .
> Is all that principal magic necessary? Is there an MDN page documenting
> all this?
>

I think that's just necessary for separating jetpack addons from one
another. If you're ok with it being possible for some other piece of
chrome to access your database, and you just use a unique name for it, it's
unnecessary, AIUI.

- Kyle

Kyle Huey

Apr 26, 2013, 3:13:58 PM

to Ryan VanderMeulen, dev-pl...@lists.mozilla.org

On Fri, Apr 26, 2013 at 11:57 AM, Ryan VanderMeulen <rya...@gmail.com> wrote:

> The current level of flakiness in the IndexedDB test suite (especially on
> OSX) makes me concerned about what to expect if it starts getting heavier
> use across the various platforms.

Of the 24 open intermittent failure bugs in the IndexedDB component at
least 9 are IPC related. Another 6 or 7 are all the same bug and are being
dealt with. And those two account for the high frequency oranges. The
remainder are all pretty low frequency from what I can tell.

- Kyle

Ryan VanderMeulen

Apr 26, 2013, 3:18:43 PM

Mostly IPC.

Dirkjan Ochtman

Apr 26, 2013, 3:21:49 PM

to Gregory Szorc, taras....@glek.net, dev-platform

On Fri, Apr 26, 2013 at 8:17 PM, Gregory Szorc <g...@mozilla.com> wrote:
> Anyway, I just wanted to see if others have thought about this. Do
> others feel it is a concern? If so, can we formulate a plan to address
> it? Who would own this?

AIUI the Performance team is experimenting with moving things into
JSON files, it would likely make sense to coordinate with them (and
their findings WRT the downsides of SQLite).

Also, I wonder if SQLite 4 (which is more like a key-value store)
could alleviate some of the issues here (although it might be
insufficiently mature, or, more specifically, less mature than
LevelDB).

Cheers,

Dirkjan

Gavin Sharp

Apr 26, 2013, 3:27:48 PM

to Benjamin Smedberg, Andreas Gal, dev-platform, Gregory Szorc

On Fri, Apr 26, 2013 at 12:10 PM, Benjamin Smedberg
<benj...@smedbergs.us> wrote:
> I really hope the outcome of this discussion is that we end up storing
> everything that isn't a true preference in some other datastore, and that is
> an async-by-default datastore ;-)

> With a pretty simple JSM wrapper, indexeddb could be a very good solution
> for saving JSON or JSON-like things (you don't even need JSON, because
> indexeddb does structured cloning). It can of course be used for more
> complex things as well, but if we want a durable key-value store, it could
> be as simple as:
>
> ChromeData.get('key', function(value) {
> // null if unset
> });
>
> ChromeData.set('key', value [, function()]); // asynchronous
>
> Or maybe there's a better syntax using promises, but in any case it could
> probably be this simple.

OK, sounds like we should do this. I filed
https://bugzilla.mozilla.org/show_bug.cgi?id=866238.

> Does anyone use indexeddb in chrome right now?

The patch in bug 789348 does (though that's actually running in
content). I don't know of any existing users in code that runs on
desktop (metro seems to use it, some core b2g-related code might).

Gavin

Gregory Szorc

Apr 26, 2013, 3:30:52 PM

to Benjamin Smedberg, Gavin Sharp, Andreas Gal, dev-platform

On 4/26/2013 12:10 PM, Benjamin Smedberg wrote:
> On 4/26/2013 2:50 PM, Gavin Sharp wrote:
>> On Fri, Apr 26, 2013 at 11:36 AM, Andreas Gal <g...@mozilla.com> wrote:
>>> Preferences are as the name implies intended for preferences. There
>>> is no sane use case for storing data in preferences. I would give
>>> any patch I come across doing that an automatic sr- for poor taste
>>> and general insanity.
>> As Greg suggests, that ship has kind of already sailed. In practice
>> preferences often end up being the "best" choice for storing some
>> small amounts of data. Which is a sad state of affairs, to be sure -
>> so I'm glad we have this thread to explore alternatives.
> The key problem with expanding this is that the pref API is designed
> to be synchronous because it controls a bunch of behavior early in
> startup. Our implementation is therefore to read all the prefs in
> (synchronously) and operate on them in-memory. That strategy only
> continues to work as long as the set of data in prefs is tightly
> constrained.

Perhaps this should be advertised more, especially to the add-on
community. Looking at about:config of my main profile, about 2/3 of my
preferences are user set. There are hundreds of preferences apparently
being used for key-value storage by add-ons (not to pick on one, but
HTTPS Everywhere has a few hundred prefs).

This shouldn't be surprising: Preferences quacks like a generic
key-value store. In the absence of something similar and just as easy to
use, people will use (and abuse) it for storage needs.

IMO we can't just say "don't use Preferences for that" without offering
something equivalent. If we do, we'll have SQLite/raw I/O and we're no
better off.

> With a pretty simple JSM wrapper, indexeddb could be a very good
> solution for saving JSON or JSON-like things (you don't even need
> JSON, because indexeddb does structured cloning). It can of course be
> used for more complex things as well, but if we want a durable
> key-value store, it could be as simple as:
>
> ChromeData.get('key', function(value) {
> // null if unset
> });
>
> ChromeData.set('key', value [, function()]); // asynchronous
>
> Or maybe there's a better syntax using promises, but in any case it
> could probably be this simple.

I strongly believe a simple wrapper would go a long way. The current
pitfalls of storage have bitten me enough times that I'm tentatively
volunteering to add one to Toolkit.

However, before that happens, I'd like some consensus that IndexedDB is
the best solution here. I'd especially like to hear what Performance
thinks: I don't want to start creating a "preferred" storage solution
without their blessing. If they have suggestions for specific ways we
should use IndexedDB (or some other solution) to minimize perf impact,
we should try to enforce these through the preferred/wrapper API.

Reuben Morais

Apr 26, 2013, 4:09:00 PM

to dev-platform

We use IndexedDB extensively in a lot of the WebAPIs, see Contacts, Settings, SMS, MMS, Push, NetworkStats…

Right now there's a lot of boilerplate[1] involved in setting up IndexedDB, and people end up duplicating a lot of the boilerplate code. It'd be great to see a more polished wrapper around it. The callback chains of death involved in writing IDB code are also not very pleasant to read and write, so bonus points if we could have a syntax like Task.jsm, where you can do |result = yield objStore.get("foo");|
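
A minimal sketch of how that `yield`-based style works: a small driver runs a generator and resumes it each time a yielded promise settles. This is written in the spirit of Task.jsm but is not its actual code; `spawn` and the Map-backed `objStore` are hypothetical stand-ins for a real IDB wrapper.

```javascript
// Minimal coroutine driver: runs a generator, resuming it with the result of
// each yielded promise, and rejecting if the generator throws.
function spawn(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg); // gen.next(value) or gen.throw(error)
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        (value) => step("next", value),
        (err) => step("throw", err)
      );
    }
    step("next", undefined);
  });
}

// Hypothetical promise-returning object store backed by a Map.
const objStore = {
  _rows: new Map([["foo", { count: 1 }]]),
  get(key) {
    return Promise.resolve(this._rows.get(key));
  },
  put(key, value) {
    this._rows.set(key, value);
    return Promise.resolve();
  },
};

// Reads, updates, and re-reads a record without nested callbacks.
const finished = spawn(function* () {
  const result = yield objStore.get("foo");
  yield objStore.put("foo", { count: result.count + 1 });
  return (yield objStore.get("foo")).count;
});

finished.then((count) => console.log(count)); // 2
```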

I don't know how much of this overlaps with the work to expose a simpler KV-store like API for saving snippets of data, but I figured I'd mention that this is also a problem for consumers who need all the functionality of IDB.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/base/IndexedDBHelper.jsm

-- reuben

Andrew Sutherland

Apr 26, 2013, 4:18:20 PM

to dev-pl...@lists.mozilla.org

On 04/26/2013 03:21 PM, Dirkjan Ochtman wrote:
> Also, I wonder if SQLite 4 (which is more like a key-value store)

SQLite 4 is not actually more like a key-value store. The underlying
storage model used by the SQL-interface-that-is-the-interface changed
from being a page-centric btree structure to a key-value store that is
more akin to a log-structured merge implementation, but which will still
seem very familiar to anyone familiar with the page-centric vfs
implementation that preceded it. Specifically, it does not look like
IndexedDB's model; it still does a lot of fsync's in order to maintain
the requisite SQL ACID semantics.

Unless we exposed that low level key-value store, SQLite 4 would look
exactly the same to consumers. The main difference would be that
because records would actually be stored in their (lexicographic)
PRIMARY KEY order, performance should improve in general, especially on
traditional (non-SSD) hard drives. Our IndexedDB implementation, for
one, could probably see a good performance boost from a switch to SQLite4.

Andrew

Andrew Sutherland

Apr 26, 2013, 4:33:37 PM

to dev-pl...@lists.mozilla.org

On 04/26/2013 03:30 PM, Gregory Szorc wrote:
> However, before that happens, I'd like some consensus that IndexedDB is
> the best solution here. I'd especially like to hear what Performance
> thinks: I don't want to start creating a "preferred" storage solution
> without their blessing. If they have suggestions for specific ways we
> should use IndexedDB (or some other solution) to minimize perf impact,
> we should try to enforce these through the preferred/wrapper API.

I'm not on the performance team, but I've done some extensive
investigation into SQLite performance[1] and a lot of thinking about how
to efficiently do disk I/O for various workloads from my work with
mozStorage for Thunderbird's global database.

I would say that IndexedDB has a very good API[2] that allows
for very efficient back-end implementations. Our existing
implementation could do a lot of things to go faster, especially on
non-SSDs. But that can be done as an enhancement and does not need to
happen yet. I think LevelDB broadly has the right idea, although
Chrome's IndexedDB implementation has some surprising limitations (no
File Blob storage) that suggest it's not there yet.

The API can indeed be a bit heavy-weight for simple needs; shims over
IndexedDB like gaia's asyncStorage helper are the way to go:
https://github.com/mozilla-b2g/gaia/blob/master/shared/js/async_storage.js

Andrew

1:
http://www.visophyte.org/blog/2010/04/06/performance-annotated-sqlite-explaination-visualizations-using-systemtap/

2: The only enhancement I would like is non-binding hinting of desired
batches so that IndexedDB could pre-fetch data that the consumer knows
it is going to want anyways in order to avoid ping-ponging fetch
requests back and forth to the async thread and the main thread. (Right
now mozGetAll can be used to accomplish similar, if non-transparent and
dangerously foot-gunny results.)

Justin Dolske

Apr 26, 2013, 5:33:47 PM

On 4/26/13 11:17 AM, Gregory Szorc wrote:

> But, please don't
> make consumers worry about things like SQL, schema design, and PRAGMA
> statements.

Ideally, yes. But I suspect there will never be a one-size-fits-all
solution, and so we should probably be clear about what it's
appropriate/intended for (see: prefs today!).

> Anyway, I just wanted to see if others have thought about this. Do
> others feel it is a concern? If so, can we formulate a plan to address
> it? Who would own this?

I'd really like to see a simple, standard way to mirror a JS object
to/from disk. There would barely be any API, it Just Works. EG,
something like:

Cu.import("jsonator2000.jsm");
jsonator.init("myfile.json", onready);
var o;
function onready(aVerySpecialProxyObject) {
  o = aVerySpecialProxyObject;
}
...
o.foo = 42;
if (o.flagFromLastSession) { ... }

With a backend that takes care of file creation, error handling,
flushing to disk, doing writes OMT/async, handling writes-on-shutdown,
etc. Maybe toss in optional compression, checksumming, or other goodies.
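
One way such a proxy object could track changes, as a rough sketch only: `createMirror` is a hypothetical name, and the `save` callback stands in for the backend that would do the asynchronous disk write, error handling, and write-on-shutdown. No file I/O happens here.

```javascript
// Hypothetical helper: wraps a plain object in a Proxy that records when it
// changes, and hands the serialized state to a caller-supplied `save`.
function createMirror(initial, save) {
  let dirty = false;
  const target = Object.assign({}, initial);
  const proxy = new Proxy(target, {
    set(obj, prop, value) {
      obj[prop] = value;
      dirty = true;
      return true;
    },
    deleteProperty(obj, prop) {
      delete obj[prop];
      dirty = true;
      return true;
    },
  });
  // Serializes only when something changed, so repeated flushes are cheap.
  function flush() {
    if (!dirty) return false;
    save(JSON.stringify(target));
    dirty = false;
    return true;
  }
  return { proxy, flush };
}

// Usage: `save` just records the JSON; a real backend would write it to disk.
const written = [];
const { proxy: o, flush } = createMirror({ flagFromLastSession: true }, (json) =>
  written.push(json)
);
o.foo = 42;
o.bar = "baz";
flush(); // one write covering both assignments
flush(); // no-op, nothing changed since the last flush
console.log(written.length); // 1
```

Coalescing several assignments into one serialized write is the point of the dirty flag; when and how often to call `flush` is exactly the policy decision the backend would own.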

It's been on my maybe-some-weekend list for a while. :)

Justin

Mike de Boer

Apr 26, 2013, 6:48:51 PM

to Justin Dolske, dev-pl...@lists.mozilla.org

I have to admit, I've been thinking about this as well… and considering the complexities involved with developing algorithms to deal with caches, fsyncs and concurrency I tend to lean toward NOT rolling your own, but instead looking at what's out there.

Even though it's oriented towards server usage, I became quite the fan of Redis[1] over the years. I've been pondering this for a while now: how bad would it be to have (something like) Redis as the backing datastore for anything key/value persistence related? Many areas, not only prefs, but more importantly session history et al would benefit from a datastore that is more performant than the raw file I/O that Gecko APIs provide (I mean the roll-your-own scenario there).

Perhaps I'm thinking a tad too far out of the box this time, but think about it!

Mike.

[1] http://redis.io/

Mounir Lamouri

Apr 26, 2013, 7:03:57 PM

to dev-pl...@lists.mozilla.org

On 26/04/13 11:17, Gregory Szorc wrote:
> Anyway, I just wanted to see if others have thought about this. Do
> others feel it is a concern? If so, can we formulate a plan to address
> it? Who would own this?

As others, I believe that we should use IndexedDB for Gecko internal
storage. I opened a bug regarding this quite a while ago:
https://bugzilla.mozilla.org/show_bug.cgi?id=766057

We could easily imagine an XPCOM component that would expose a simple
key/value storage available from JS or C++ using IndexedDB in the backend.

--
Mounir

Andreas Gal

Apr 26, 2013, 7:26:33 PM

to Mounir Lamouri, dev-pl...@lists.mozilla.org

We filed a bug for this and I am working on the patch.

Andreas

Sent from Mobile.

Neil

Apr 26, 2013, 7:30:15 PM

Gregory Szorc wrote:

>c) I/O is synchronous.
>
>
To be fair, the pref API is mostly reading and writing a big hashtable;
few functions actually do any I/O.

>e) The API is awkward.
>

Well, XPCOM was all the rage at the time. (Then again, so was RDF, and
its dynamic bulk read API is still better than anything else we have.)

Anyway, the problem with files or sqlite is that you have to roll your
own code, or worse, copy someone else who rolled their own code
incorrectly. At least preferences don't have that problem.

I understand Session Store now uses asynchronous JSON storage. If we
could separate that out into its own module, it could prove very popular.

--
Warning: May contain traces of nuts.

Matt Brubeck

Apr 26, 2013, 8:51:54 PM

On 4/26/2013 11:43 AM, Gregory Szorc wrote:
>> Have you explored using IndexedDB?
>
> Not seriously. The "this is an experimental technology" warning on MDN
> is off-putting.

The largest audience for MDN is web developers, so we put that warning
on anything that's not ready for widespread use on the public web,
including most things that are prefixed in current browsers.

Here are some other things with the same "experimental technology"
warning on their MDN pages:

* JavaScript "for...of" loops
* CSS transform, transition, animation
* WebSocket
* Set, Map, WeakMap

Obviously we have no qualms about using these ourselves. When an
experimental technology is one that *we* are promoting as part of the
development platform *we* are building, then of course we should be using
it in our own code. In fact we should be early adopters, because if
there are issues that prevent us from using our own APIs, then they will
often affect other developers on our platform, so we need to know about
those and fix them.

bent

Apr 26, 2013, 11:42:34 PM

IndexedDB is our answer for this for JS... C++ folks are still pretty
much on their own!

IndexedDB handles indexing (hence the rather awkward name),
transactions with abort/rollback, object-graph serialization (not just
JSON), usage from multiple tabs/windows/components/processes
simultaneously, data integrity guarantees, and easy single-copy file/
blob support. It's also completely asynchronous (and read-only
transactions run in parallel!), and one day soon it will be available
in Workers too. We're using it extensively in B2G (it's the only
storage option, really) and it's easily usable from chrome too (with a
two line bit of Cc/Ci init code). IE and Chrome both implement it and
we all have it available without a prefix because the API (v1) is
pretty much frozen.

What we've heard from a lot of JS developers (gaia folks included),
though, is that this feature set is more than some want or need. They
don't want to worry about indexes or transactions or serializing
complex objects. Luckily we anticipated this! Our aim was to provide a
sufficiently powerful tool such that we could build complex apps (e.g.
B2G's email app, ping asuth for details!) as well as simple key-value
stores (e.g. gaia's async_storage, mostly written by dflanagan I
think). Someone even implemented an early version of Chrome's
filesystem API on IndexedDB...

Nevertheless, I (and others) think it's clear that the big thing we
screwed up on is that we didn't release a "simple storage" wrapper
alongside IndexedDB. I think we expected this sort of thing to appear
on its own, but so far it hasn't. Sorry :(

So now we're working on wrappers. https://github.com/mounirlamouri/storage.js
is one in-progress example. I think there is another as well but the
link escapes me at the moment. In any case it should be trivial to do
something very similar as a JSM. (We already have an
IndexedDBHelper.jsm for more complicated databases that do actually
want control over tables and indexes.)

Hopefully IndexedDB meets most people's needs... That's what we tried
to build here, after all. The need to sprinkle some sugar here and
there is completely expected and very much encouraged. Please feel
free to ping me over irc or email or here on the list if anything is
unclear or difficult. (Of course there are definitely other folks that
are involved here but I don't want to volunteer them without their
consent!)

-bent

P.S. The "experimental" mark in MDN is outdated, and very unfortunate.
We should remove that ASAP.

Philip Chee

Apr 27, 2013, 8:46:33 AM

On 27/04/2013 02:17, Gregory Szorc wrote:
> I'd like to start a discussion about the state of storage in Gecko.
>
> Currently when you are writing a feature that needs to store data, you
> have roughly 3 choices:
>
> 1) Preferences
> 2) SQLite
> 3) Manual file I/O

....

> I think there is an opportunity for Gecko to step in and provide a
> storage subsystem that is easy to use, somewhere between preferences and
> SQLite in terms of durability and performance, and "just works." I don't
> think it matters how it is implemented under the hood. If this were to
> be built on top of SQLite, I think that would be fine. But, please don't
> make consumers worry about things like SQL, schema design, and PRAGMA
> statements. So, maybe I'm advocating a generic key-value store. Maybe
> something like DOM Storage? Maybe SQLite 4 (which is emphasizing
> key-value storage and speed)? Just... something. Please.

Has anyone suggested RDF yet?

Phil

--
Philip Chee <phi...@aleytys.pc.my>, <phili...@gmail.com>
http://flashblock.mozdev.org/ http://xsidebar.mozdev.org
Guard us from the she-wolf and the wolf, and guard us from the thief,
oh Night, and so be good for us to pass.

Jonathan Protzenko

Apr 27, 2013, 11:58:41 AM

to dev-pl...@lists.mozilla.org

Hi,

I once met a similar need: a simple key/value storage API for my addon. I
ended up writing a "SimpleStorage" module that uses an underlying SQLite
database. I'm pretty sure I fell into most of the pitfalls of using
SQLite without being a guru, but here's the link for posterity:
- sample usage
<https://github.com/protz/thunderbird-stdlib/blob/master/tests/test_SimpleStorage.js>
(line 50-65)
- implementation
<https://github.com/protz/thunderbird-stdlib/blob/master/SimpleStorage.js>

The library exposes a has/set/get API, and uses iterators with a lot of
yield's to allow the client to write their code in a style that looks
synchronous but that actually isn't (see sample usage). I assume the new
hotness now is using a promises-style API instead of ES6
iterators/generators, but I'm sharing this in case anyone wants to check
it out. It's being used heavily by the Thunderbird Conversations addon
(~100 000 ADU).

Speaking as an addon author, I would definitely embrace anything that
offers such a dead-simple get/set/has API, even more so if it's
advertised as "doing things right". If someone were to bundle a JSM
somewhere in Gecko that exposes a similar API on top of IndexedDB, that
would be great relief for addon authors.

jonathan

Mounir Lamouri

Apr 27, 2013, 12:37:13 PM

to dev-pl...@lists.mozilla.org

On 26/04/13 20:42, bent wrote:
> IndexedDB is our answer for this for JS... C++ folks are still pretty
> much on their own!

Why? Wouldn't the idea of such a component be to make sure it is usable
from C++?

--
Mounir

bent

April 29, 2013, 12:31:00 PM

On Apr 27, 9:37 am, Mounir Lamouri <mou...@lamouri.fr> wrote:
> Why? Wouldn't be the idea of such component to make sure it is usable
> from C++?

Perhaps some day, but IndexedDB was always designed with JS in mind.
To use it you pass special JS dictionaries for options, clone things
to/from JS objects, etc. Using it from C++ is not a pleasant
experience and requires lots of JSAPI.

We could implement a C++ API that reuses all the transactions and
threading and such but so far no one has been breaking down our door
asking for it.

-bent

Taras Glek

April 29, 2013, 1:51:03 PM

To: Gregory Szorc, David Rajchenbach-Teller

So there is no general 'good for performance' way of doing IO.

However I think most people who need this need to write small bits of
data and there is a good way to do that.

Gregory Szorc wrote:
> I'd like to start a discussion about the state of storage in Gecko.
>
> Currently when you are writing a feature that needs to store data, you
> have roughly 3 choices:
>
> 1) Preferences
> 2) SQLite
> 3) Manual file I/O

* How to robustly write/update small datasets?

#3 above is it for small datasets. The correct way to do this is to
write blobs of JSON to disk. End of discussion.

Writes of data <= ~64K should just be implemented as atomic whole-file
read/write operations. Those are almost always single blocks on disk.

Writing a whole file at once eliminates risk of data corruption.
Incremental updates are what makes sqlite do the WAL/fsync/etc dance
that causes much of the slowness.

We invested a year's worth of engineering effort into a pure-JS IO library
to facilitate efficient application-level IO. See the OS.File docs, e.g.
https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread

As you can see from the examples there, manual IO is not scary.

If one is into convenience APIs, one can create arbitrary JSON-storage
abstractions in ~10 lines of code.
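
To make the "whole-file JSON, written atomically" idea concrete, here is a rough sketch in Python rather than OS.File. The helper names `write_json_atomic` and `read_json` are made up for illustration, and the write-temporary-then-rename approach is one common way to get an all-or-nothing file update, not a description of how OS.File is implemented.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Replace the file at `path` with `data` serialized as JSON, all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # swap the new file in, in a single step
    except BaseException:
        # Leave the old file untouched and clean up the partial temporary.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_json(path, default=None):
    """Return the parsed file, or `default` if it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "state.json")
    state = read_json(path, default={})
    state["lastSync"] = 1367258400000
    write_json_atomic(path, state)
    print(read_json(path))
```

A reader of the file sees either the previous contents or the new contents, never a half-written mix, which is what makes incremental-update machinery unnecessary for small datasets.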

* What about writes > 64K?
Compression gives you 5-10x reduction of json.
https://bugzilla.mozilla.org/show_bug.cgi?id=846410
Compression also means that your read-throughput is up to 5x better too.

* What about fsync-less writes?
Many log-type performance-sensitive data-storage operations are ok with
lossy appends. By lossy I mean "data will be lost if there is a power
outage within a few seconds/minutes of write"; consistency is still
important. For this one should create a directory and write out log
entries as checksummed individual files... but one should really use
compression (and get checksums for free).
https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about
facilitating such an API.
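
As an illustration of the "directory of compressed entries" idea, here is a hedged Python sketch. It is not the API proposed in bug 846410; the function names and the `.json.gz` naming are assumptions. It relies on the fact that the gzip format stores a CRC-32 of the uncompressed data, so a torn or truncated entry is detected when it is read back.

```python
import gzip
import json
import os
import tempfile
import zlib


def append_entry(log_dir, name, record):
    """Write one log entry as its own gzip file; no fsync, so it may be lost."""
    os.makedirs(log_dir, exist_ok=True)
    payload = json.dumps(record).encode("utf-8")
    with open(os.path.join(log_dir, name + ".json.gz"), "wb") as f:
        f.write(gzip.compress(payload))


def read_entries(log_dir):
    """Return (records, corrupt_names), skipping entries that fail validation."""
    records, corrupt = [], []
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name), "rb") as f:
            blob = f.read()
        try:
            records.append(json.loads(gzip.decompress(blob)))
        except (OSError, EOFError, zlib.error, ValueError):
            # Truncated stream, bad checksum, or unparsable JSON.
            corrupt.append(name)
    return records, corrupt


with tempfile.TemporaryDirectory() as log_dir:
    append_entry(log_dir, "0001", {"session": 1, "uptime": 42})
    append_entry(log_dir, "0002", {"session": 2, "uptime": 7})
    # Simulate a power cut that left the second file half written.
    damaged = os.path.join(log_dir, "0002.json.gz")
    with open(damaged, "rb") as f:
        blob = f.read()
    with open(damaged, "wb") as f:
        f.write(blob[: len(blob) // 2])
    print(read_entries(log_dir))
```

Because each entry is its own file, losing one to a power outage does not affect the others, which is the consistency property the lossy-append case still needs.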

Use-cases here: telemetry saved-sessions, FHR session-statistics.

* What about large datasets?
These should be decided on a case-by-case basis. Universal solutions
will always perform poorly in some dimension.

* What about indexeddb?
IDB is overkill for simple storage needs. It is a restrictive wrapper
over an SQLite schema. Perhaps some large dataset (eg an addressbook) is
a good fit for it. IDB supports filehandles to do raw IO, but that still
requires sqlite to bootstrap, doesn't support compression, etc.
IDB also makes sense as a transitional API for web due to the need to
move away from DOM Local Storage...

* Why isn't there a convenience API for all of the above recommendations?
Because speculatively landing APIs that anticipate future consumers is
risky, results in over-engineering and unpleasant surprises... So give us
use-cases and we (i.e. Yoric) will make them efficient.

Taras

Andrew McCreight

April 29, 2013, 1:57:57 PM

To: dev-pl...@lists.mozilla.org

----- Original Message -----
> On Apr 27, 9:37 am, Mounir Lamouri <mou...@lamouri.fr> wrote:
> > Why? Wouldn't be the idea of such component to make sure it is
> > usable
> > from C++?
>
> Perhaps some day, but IndexedDB was always designed with JS in mind.
> To use it you pass special JS dictionaries for options, clone things
> to/from JS objects, etc. Using it from C++ is not a pleasant
> experience and requires lots of JSAPI.

A WebIDL callback interface thing could probably be set up to make calling from C++ into JS less awful, if somebody has a need.

Andrew

>
> We could implement a C++ API that reuses all the transactions and
> threading and such but so far no one has been breaking down our door
> asking for it.
>
> -bent

Taras Glek

April 29, 2013, 2:10:38 PM

To: Andreas Gal, Gregory Szorc

Andreas Gal wrote:
> Preferences are as the name implies intended for preferences. There is no sane use case for storing data in preferences. I would give any patch I come across doing that an automatic sr- for poor taste and general insanity.
>

> SQLite is definitely not cheap, and we should look at more suitable backends for our storage needs, but done right off the main thread, it's definitely the saner way to go than (1).
>
> While (2) is a foot-gun, (3) is a guaranteed foot-nuke. While it's easy to use sqlite wrong, it's almost guaranteed that you get your own atomic storage file use wrong, across our N platforms.

3) is not a footnuke. Atomic file IO is how we do prefs and other things.
We provide a writeAtomic API in OS.File. So long as your 'io
transactions' are within a single file, writeAtomic is your friend. If
one needs io transactions across files, then one is in trouble indeed,
but that is not the case for 99% of our code.

The big advantage of 3 is that one does not pay the abstraction penalty of
heavier sqlite or indexeddb solutions. The cost of a write+fsync + followup
read is easy to reason about.

One can also layer compression/checksums on top of atomic file IO
easily. This is hard with a more complex storage layer (e.g. we can't
compress our sqlite places database even though that'd be a nice win).

>
> Chrome is working on replacing sqlite with leveldb for indexeddb and most their storage needs. Last time we looked it wasn't ready for prime time. Maybe it is now. This might be the best option.

leveldb sounds nice, but the level of complexity there is overkill for
most of our usecases. Filesystems work well, complex abstractions on top
of them tend to be flakey.

Taras

ps. sorry for chiming in late. I was away doing outdoorsy vacation stuff
Thurs-Sun.

>
> Andreas


Boris Zbarsky

April 29, 2013, 2:10:40 PM

On 4/29/13 1:57 PM, Andrew McCreight wrote:
> A WebIDL callback interface thing could probably be set up to make calling from C++ into JS less awful, if somebody has a need.

Sort of. IndexedDB keys are done on raw JS values ("any" in the IDL)
because they want to tell apart Arrays from other objects, want to tell
apart the number 5 and the string "5", etc.

-Boris

Joshua Cranmer 🐧

April 29, 2013, 2:36:58 PM

On 4/26/2013 1:17 PM, Gregory Szorc wrote:
> I'd like to start a discussion about the state of storage in Gecko.
>
> Currently when you are writing a feature that needs to store data, you
> have roughly 3 choices:
>
> 1) Preferences
> 2) SQLite
> 3) Manual file I/O

One of the ongoing tasks I dabbled in was replacing the message folder
cache in Thunderbird with a sane database backend (bug 418551). It's
currently implemented in mork, but it has a very simple database
structure, basically a map of folder URLs -> property name -> string or
integer. It's also a potential hot path in startup, so I actually took
the time to run it through some tests, using traces of actual execution
for my profile to benchmark. When I first compared SQLite to mork, I
never got satisfactory results, so I didn't try, but a LevelDB
implementation (it was the hotness when I decided to run a test) was
whooped soundly by mork--factor of 8 or so. It's really telling that
LevelDB -O3 was even beaten by a mork -O0 (factor of 2). I didn't try
IndexedDB because the API to access the cache in Thunderbird is
inherently synchronous [1] and it's much more pain than it's worth to make
it asynchronous.

I've come to the conclusion that any database-y solution for a
comparatively small key-value or key-key-value store (my test is
basically a 900 x 10 key store, with around 9000 calls to get/set) is a
performance nightmare. This is the sort of thing that basically needs to
be prefetched into memory and stay resident there forever. I haven't
profiled Taras's suggestion of just using a JSON written atomically (the
synchronous API notwithstanding, it's probably safe to allow for async
write attempts); since I have all the test data and scripts still
around, I might just try that.
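
A rough sketch of what "prefetched into memory and resident forever" could look like, written in Python for illustration only. The `FolderCache` class and its methods are hypothetical, not Thunderbird code, and the flush policy (whole-file rewrite when dirty) is an assumption based on the JSON suggestion in this thread.

```python
import json
import os
import tempfile


class FolderCache:
    """Folder URL -> property name -> string or integer, kept resident in memory."""

    def __init__(self, path):
        self._path = path
        self._dirty = False
        try:
            with open(path, encoding="utf-8") as f:
                self._data = json.load(f)  # the only read, done once at startup
        except FileNotFoundError:
            self._data = {}

    def get(self, folder_url, prop, default=None):
        # Pure dictionary lookup: synchronous and free of I/O.
        return self._data.get(folder_url, {}).get(prop, default)

    def set(self, folder_url, prop, value):
        self._data.setdefault(folder_url, {})[prop] = value
        self._dirty = True  # defer the write; callers never wait on disk

    def flush(self):
        """Write the whole cache back if anything changed; return True if written."""
        if not self._dirty:
            return False
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp_path, self._path)
        self._dirty = False
        return True


with tempfile.TemporaryDirectory() as d:
    cache = FolderCache(os.path.join(d, "folders.json"))
    cache.set("imap://example/INBOX", "totalMessages", 9000)
    print(cache.get("imap://example/INBOX", "totalMessages"), cache.flush())
```

The consumer-facing `get`/`set` stay synchronous, as the existing API requires, while `flush` is the only step that touches the disk and could be scheduled off the hot path.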

[1] From the point of view of consumers, the API boils down to "get this
value [to immediately display in the UI]" and "set this value". Since
mork is pretty much a giant hashtable in memory, it has very fast access
and making everything go async would both greatly increase complexity
and probably also slow down a lot of code.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

Gregory Szorc

April 29, 2013, 4:55:35 PM

To: Taras Glek, dev-platform, David Rajchenbach-Teller

Great post, Taras!

Per IRC conversations, we'd like to move subsequent discussion of
actions into a meeting so we can more quickly arrive at a resolution.

Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April 30.
That's 2200 UTC. Apologies to the European and east coast crowds. If
you'll miss it because it's too late, let me know and I'll consider
moving it.

https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S

Ehsan Akhgari

April 30, 2013, 1:33:06 AM

To: Taras Glek, David Rajchenbach-Teller, dev-pl...@lists.mozilla.org

On 2013-04-29 1:51 PM, Taras Glek wrote:
> * How to robustly write/update small datasets?
>
> #3 above is it for small datasets. The correct way to do this is to
> write blobs of JSON to disk. End of discussion.

For an API that is meant to be used by add-on authors, I'm afraid the
situation is not as easy as this. For example, for a "simple" key/value
store which should be used for small datasets, one cannot enforce the
implicit requirement of this solution (the data fitting in a single
block on the disk, for example) at the API boundary without creating a
crappy API which would "fail" some of the time if the value to be
written violates those assumptions. In practice it's not very easy for
the consumer of the API to guarantee the size of the data written to
disk if the data is coming from the user, the network, etc.

> Writes of data <= ~64K should just be implemented as atomic whole-file
> read/write operations. Those are almost always single blocks on disk.
>
> Writing a whole file at once eliminates risk of data corruption.
> Incremental updates are what makes sqlite do the WAL/fsync/etc dance
> that causes much of the slowness.

Is that true even if the file is written to more than one physical block
on the disk, across all of the filesystems that Firefox can run on?

> As you can see from above examples, manual IO is not scary

Only if you trust the consumer of the API to know the trade-offs of what
they're doing. That is not the right assumption for a generic key/value
store API.

> * What about fsync-less writes?
> Many log-type performance-sensitive data-storage operations are ok with
> lossy appends. By lossy I mean "data will be lost if there is a power
> outage within a few seconds/minutes of write", consistency is still
> important. For this one should create a directory and write out log
> entries as checksummed individual files...but one should really use
> compression(and get checksums for free).
> https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about
> facilitating such an API.
>
> Use-cases here: telemetry saved-sessions, FHR session-statistics.

This is an interesting use case indeed, but I don't think that it falls
under the umbrella of the API being discussed here.

> * What about large datasets?
> These should be decided on a case-by-case basis. Universal solutions
> will always perform poorly in some dimension.
>
> * What about indexeddb?
> IDB is overkill for simple storage needs. It is a restrictive wrapper
> over an SQLite schema. Perhaps some large dataset (eg an addressbook) is
> a good fit for it. IDB supports filehandles to do raw IO, but that still
> requires sqlite to bootstrap, doesn't support compression, etc.
> IDB also makes sense as a transitional API for web due to the need to
> move away from DOM Local Storage...

Indexed DB is not a wrapper around SQLite. The fact that our current
implementation uses SQLite is an implementation detail which might
change. (And it's not true on the web across different browser engines.)

I'm sure that if somebody can provide testcases on bad IndexedDB
performance scenarios we can work on fixing them, and that would benefit
the web, and Firefox OS as well.

> * Why isn't there a convenience API for all of the above recommendations?
> Because speculatively landing APIs that anticipate future consumers is
> risky, results in over-engineering and unpleasant surprises...So give us
> use-cases and we(ie Yoric) will make them efficient.

The use case being discussed here is a simple key/value data store,
hopefully with asynchronous operations, and safety guarantees against
data loss. I do not see the current discussion as speculative at all.

Cheers,
Ehsan

Joshua Cranmer 🐧

April 30, 2013, 11:37:53 AM

On 4/30/2013 12:33 AM, Ehsan Akhgari wrote:
> On 2013-04-29 1:51 PM, Taras Glek wrote:
>> * How to robustly write/update small datasets?
>>
>> #3 above is it for small datasets. The correct way to do this is to
>> write blobs of JSON to disk. End of discussion.
>
> For an API that is meant to be used by add-on authors, I'm afraid the
> situation is not as easy as this. For example, for a "simple"
> key/value store which should be used for small datasets, one cannot
> enforce the implicit requirement of this solution (the data fitting in
> a single block on the disk. for example) at the API boundary without
> creating a crappy API which would "fail" some of the times if the
> value to be written violates those assumptions. In practice it's not
> very easy for the consumer of the API to guarantee the size of the
> data written to disk if the data is coming from the user, the network,
> etc.

The "implicit requirement" of which you speak is not a hard "this will
break if you violate this" requirement but rather a "performance is
optimized for this use-case" requirement.

>
>> Writes of data <= ~64K should just be implemented as atomic whole-file
>> read/write operations. Those are almost always single blocks on disk.
>>
>> Writing a whole file at once eliminates risk of data corruption.
>> Incremental updates are what makes sqlite do the WAL/fsync/etc dance
>> that causes much of the slowness.
>
> Is that true even if the file is written to more than one physical
> block on the disk, across all of the filesystems that Firefox can run on?

OS.File.writeAtomic works correctly regardless of file size (it's
basically "write to temporary, move temporary to real filename"), so
long as the move is not a cross-filesystem move.

Taras Glek

April 30, 2013, 1:01:36 PM

To: Ehsan Akhgari, David Rajchenbach-Teller, dev-pl...@lists.mozilla.org

> Ehsan Akhgari <mailto:ehsan....@gmail.com>
> Monday, April 29, 2013 22:33

> On 2013-04-29 1:51 PM, Taras Glek wrote:

>> * How to robustly write/update small datasets?
>>
>> #3 above is it for small datasets. The correct way to do this is to
>> write blobs of JSON to disk. End of discussion.
>

> For an API that is meant to be used by add-on authors, I'm afraid the
> situation is not as easy as this. For example, for a "simple"
> key/value store which should be used for small datasets, one cannot
> enforce the implicit requirement of this solution (the data fitting in
> a single block on the disk. for example) at the API boundary without
> creating a crappy API which would "fail" some of the times if the
> value to be written violates those assumptions. In practice it's not
> very easy for the consumer of the API to guarantee the size of the
> data written to disk if the data is coming from the user, the network,
> etc.

I'm not saying that the json has to fit a single filesystem block. I'm
saying that if it's a few blocks, it's more efficient to rewrite the
data every time.

prefs are an example of an overused key/value store that is usually well
under the threshold of a few blocks.

I think if you look at the kinds of data extensions store, it's small
enough, especially when compressed.

>
>> Writes of data <= ~64K should just be implemented as atomic whole-file
>> read/write operations. Those are almost always single blocks on disk.
>>
>> Writing a whole file at once eliminates risk of data corruption.
>> Incremental updates are what makes sqlite do the WAL/fsync/etc dance
>> that causes much of the slowness.
>

> Is that true even if the file is written to more than one physical
> block on the disk, across all of the filesystems that Firefox can run on?

yes.

>
>> As you can see from above examples, manual IO is not scary
>

> Only if you trust the consumer of the API to know the trade-offs of
> what they're doing. That is not the right assumption for a generic
> key/value store API.

We can add a warning to the API when it crosses some magical boundary. I
think small datasets are the most common, so we should focus on that
usecase.
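
One possible shape for such a warning, sketched in Python. The function name and the 64 KiB threshold are assumptions (the number is borrowed from the "~64K" figure earlier in the thread); the write still succeeds, the caller just gets told that the payload has outgrown the small-dataset case.

```python
import json
import warnings

SOFT_LIMIT_BYTES = 64 * 1024  # hypothetical threshold, from the "~64K" figure


def serialize_with_size_check(data, limit=SOFT_LIMIT_BYTES):
    """Serialize `data` to JSON bytes, warning (not failing) when it is large."""
    blob = json.dumps(data).encode("utf-8")
    if len(blob) > limit:
        warnings.warn(
            f"payload is {len(blob)} bytes, above the {limit}-byte soft limit; "
            "whole-file rewrites may no longer be the best fit",
            stacklevel=2,
        )
    return blob


small = serialize_with_size_check({"lastSync": 1367258400000})
large = serialize_with_size_check({"history": ["x" * 100] * 1000})
print(len(small), len(large))
```

Warning instead of raising avoids the API that "fails" some of the time, which was the objection to enforcing a hard size limit.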

>> * What about fsync-less writes?
>> Many log-type performance-sensitive data-storage operations are ok with
>> lossy appends. By lossy I mean "data will be lost if there is a power
>> outage within a few seconds/minutes of write", consistency is still
>> important. For this one should create a directory and write out log
>> entries as checksummed individual files...but one should really use
>> compression(and get checksums for free).
>> https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about
>> facilitating such an API.
>>
>> Use-cases here: telemetry saved-sessions, FHR session-statistics.
>

> This is an interesting use case indeed, but I don't think that it
> falls under the umbrella of the API being discussed here.

I'm still not sure what API needs are being discussed. Hopefully we'll
narrow down the scope of this in the meeting today.

>
>> * What about large datasets?
>> These should be decided on a case-by-case basis. Universal solutions
>> will always perform poorly in some dimension.
>>
>> * What about indexeddb?
>> IDB is overkill for simple storage needs. It is a restrictive wrapper
>> over an SQLite schema. Perhaps some large dataset (eg an addressbook) is
>> a good fit for it. IDB supports filehandles to do raw IO, but that still
>> requires sqlite to bootstrap, doesn't support compression, etc.
>> IDB also makes sense as a transitional API for web due to the need to
>> move away from DOM Local Storage...
>

> Indexed DB is not a wrapper around SQLite. The fact that our current
> implementation uses SQLite is an implementation detail which might
> change. (And it's not true on the web across different browser engines.)
>
> I'm sure that if somebody can provide testcases on bad IndexedDB
> performance scenarios we can work on fixing them, and that would
> benefit the web, and Firefox OS as well.

I like solutions that are well-suited to the problem being solved.
IndexedDB is not a natural fit; making it fit sounds like more work than
doing a natural fs-based solution.

>> * Why isn't there a convenience API for all of the above
>> recommendations?
>> Because speculatively landing APIs that anticipate future consumers is
>> risky, results in over-engineering and unpleasant surprises...So give us
>> use-cases and we(ie Yoric) will make them efficient.
>

> The use case being discussed here is a simple key/value data store,
> hopefully with asynchronous operations, and safety guarantees against
> dataloss. I do not see the current discussion as speculative at all.

It's speculative until we define concrete consumers of such an api. gps'
original email said 'maybe' a key/value store is the way to go. I'm
making a case that something lower level is simpler + one can layer
key/value on top.

Taras

Dave Townsend

April 30, 2013, 2:31:54 PM

Just don't do it too frequently:
https://bugzilla.mozilla.org/show_bug.cgi?id=438316

Marco Bonardo

April 30, 2013, 2:52:18 PM

On 26/04/2013 22:18, Andrew Sutherland wrote:
> Specifically, it does not look like
> IndexedDB's model; it still does a lot of fsync's in order to maintain
> the requisite SQL ACID semantics.

Right, we can't expect miracles just by moving from SQLite3 to SQLite4,
though it still uses an enhanced WAL mode that has a better impact on
fsyncs, and also improves a lot on database fragmentation, which is very
often one of the worst offenders in SQLite3.

> Unless we exposed that low level key-value store, SQLite 4 would look
> exactly the same to consumers.

It does allow us to use the low-level store directly. We should indeed
have a more classical Storage-like API wrapping the SQLite API (possibly
with single-threaded connections and far fewer mutexes), but also make a
very simple key/value API using the low-level store API.

> Our IndexedDB implementation, for
> one, could probably see a good performance boost from a switch to SQLite4.

I agree, SQLite 4 won't be a solution, since, as Taras said, the
solution must be built with the use-case in mind and many solutions can
be better than a database engine.
Though, it can surely be part of the solution. The idea would be to
write very clear documentation stating which storage is best to use
for specific needs. Guiding consumers to the right choice is much
better than trying to follow a nonexistent general solution.

I think it would be very good to start investigating a new Storage based
on SQLite 4 this year. Even if that project has not yet released a
stable branch, we would be ready by that time and would probably have
measurements to make informed decisions.

PS: while I love indexedDB and I think very good work has been done
there, I'm a bit scared we are evaluating a wrapper around a wrapper
(Storage) around a relational ACID database engine as a general solution.

-m

Lawrence Mandel

May 2, 2013, 7:13:00 PM

To: Gregory Szorc, Taras Glek, David Rajchenbach-Teller, dev-platform

----- Original Message -----
> Great post, Taras!
>
> Per IRC conversations, we'd like to move subsequent discussion of
> actions into a meeting so we can more quickly arrive at a resolution.
>
> Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April
> 30.
> That's 2200 UTC. Apologies to the European and east coast crowds. If
> you'll miss it because it's too late, let me know and I'll consider
> moving it.
>
> https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S

Did someone post a summary of this meeting? Is there a link to share?

Lawrence


Gregory Szorc

May 2, 2013, 7:36:15 PM

To: Lawrence Mandel, Taras Glek, David Rajchenbach-Teller, dev-platform

On 5/2/2013 4:13 PM, Lawrence Mandel wrote:
>
> ----- Original Message -----
>> Great post, Taras!
>>
>> Per IRC conversations, we'd like to move subsequent discussion of
>> actions into a meeting so we can more quickly arrive at a resolution.
>>
>> Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April
>> 30.
>> That's 2200 UTC. Apologies to the European and east coast crowds. If
>> you'll miss it because it's too late, let me know and I'll consider
>> moving it.
>>
>> https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S
> Did someone post a summary of this meeting? Is there a link to share?

Notes at https://etherpad.mozilla.org/storage-in-gecko

We seemed to converge on a (presumably C++-based) storage service that
has named branches/buckets with specific consistency, flushing, etc
guarantees. Clients would obtain a handle on a "branch," and perform
basic I/O operations, including transactions. Branches could be created
ad-hoc at run-time. So add-ons could obtain their own storage namespace
with the storage guarantees of their choosing. Under the hood storage
would be isolated so failures in one component wouldn't affect everybody.
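
To give a feel for the "named branches" idea, here is a speculative sketch in Python. Nothing here was agreed in the meeting beyond the paragraph above: the class names, the `durable` flag, and the one-JSON-file-per-branch layout are all assumptions, and the real service was presumed to be C++. Branch names are assumed to be plain identifiers that are safe to use as file names.

```python
import json
import os
import tempfile


class Branch:
    """Handle on one named bucket, stored in its own file."""

    def __init__(self, path, durable):
        self._path = path
        self._durable = durable  # True: fsync on every commit
        self._data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value  # staged in memory until commit()

    def commit(self):
        """Persist all staged changes together, as a crude transaction."""
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
            if self._durable:
                f.flush()
                os.fsync(f.fileno())
        os.replace(tmp_path, self._path)


class StorageService:
    """Hands out branches; one file per branch keeps failures isolated."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def branch(self, name, durable=True):
        return Branch(os.path.join(self._root, name + ".json"), durable)


with tempfile.TemporaryDirectory() as root:
    service = StorageService(root)
    addon = service.branch("my-addon", durable=False)
    addon.set("lastRun", 1367535375000)
    addon.commit()
    print(service.branch("my-addon").get("lastRun"))
```

Each caller picks the guarantee it wants when it asks for a branch, and a corrupt file only affects the branch that owns it.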

We didn't have enough time to get into prototyping or figuring out who
would implement it.

Going forward, I'm not sure who should own this initiative on a
technical level. In classical Mozilla fashion the person who brings it
up is responsible. That would be me. However, I haven't written a single
line of C++ for Firefox and I have serious doubts I'd be effective.
Perhaps we should talk about it at the next Platform meeting.

Kyle Huey

May 2, 2013, 7:40:28 PM

To: Gregory Szorc, Taras Glek, David Rajchenbach-Teller, dev-platform, Lawrence Mandel

On Thu, May 2, 2013 at 4:36 PM, Gregory Szorc <g...@mozilla.com> wrote:

> We seemed to converge on a (presumably C++-based) storage service that has
> named branches/buckets with specific consistency, flushing, etc guarantees.
> Clients would obtain a handle on a "branch," and perform basic I/O
> operations, including transactions. Branches could be created ad-hoc at
> run-time. So add-ons could obtain their own storage namespace with the
> storage guarantees of their choosing. Under the hood storage would be
> isolated so failures in one component wouldn't affect everybody.
>

So this is basically prefs on steroids?

- Kyle

Gregory Szorc

May 2, 2013, 7:43:14 PM

To: Kyle Huey, Taras Glek, David Rajchenbach-Teller, dev-platform, Lawrence Mandel

Sure. I'd say it's key-value storage done right with a sane API and
consistency expectations and without the association with
"preferences"/about:config.

David Teller

May 3, 2013, 1:51:47 AM

To: Gregory Szorc, Taras Glek, dev-platform, Lawrence Mandel

Whatever you do, please, please, please make sure that everything is worker-friendly.
If we can't write (or at least read) contents to that Key-Value store from a worker, we will need to reimplement everything in a few months.

Cheers,
David

----- Original Message -----
From: "Gregory Szorc" <g...@mozilla.com>
To: "Lawrence Mandel" <lma...@mozilla.com>
Cc: "David Rajchenbach-Teller" <dte...@mozilla.com>, "Taras Glek" <tg...@mozilla.com>, "dev-platform" <dev-pl...@lists.mozilla.org>
Sent: Friday, May 3, 2013 1:36:15 AM
Subject: Re: Storage in Gecko

On 5/2/2013 4:13 PM, Lawrence Mandel wrote:
>
> ----- Original Message -----
>> Great post, Taras!
>>
>> Per IRC conversations, we'd like to move subsequent discussion of
>> actions into a meeting so we can more quickly arrive at a resolution.
>>
>> Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April
>> 30.
>> That's 2200 UTC. Apologies to the European and east coast crowds. If
>> you'll miss it because it's too late, let me know and I'll consider
>> moving it.
>>
>> https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S
> Did someone post a summary of this meeting? Is there a link to share?

Notes at https://etherpad.mozilla.org/storage-in-gecko

We seemed to converge on a (presumably C++-based) storage service that
has named branches/buckets with specific consistency, flushing, etc
guarantees. Clients would obtain a handle on a "branch," and perform
basic I/O operations, including transactions. Branches could be created
ad-hoc at run-time. So add-ons could obtain their own storage namespace
with the storage guarantees of their choosing. Under the hood storage
would be isolated so failures in one component wouldn't affect everybody.
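
As a rough illustration of the design summarized above (not code from the meeting notes or from Gecko), a branch-based service could look something like the following Python sketch. Every name in it (StorageService, Branch, Durability) is hypothetical, the store is in-memory only, and the durability value is merely recorded rather than enforced.

```python
from contextlib import contextmanager
from enum import Enum


class Durability(Enum):
    """Hypothetical flush policies a branch could be created with."""
    RELAXED = "relaxed"  # batch writes, flush lazily
    STRICT = "strict"    # flush on every commit


class Branch:
    """One isolated key-value namespace with its own durability policy."""

    def __init__(self, name, durability):
        self.name = name
        self.durability = durability
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

    @contextmanager
    def transaction(self):
        """Keep a group of writes together: all of them, or none on error."""
        snapshot = dict(self._data)
        try:
            yield self
        except Exception:
            self._data = snapshot  # roll back every write in the group
            raise


class StorageService:
    """Hands out branches by name, creating them on first use."""

    def __init__(self):
        self._branches = {}

    def branch(self, name, durability=Durability.RELAXED):
        if name not in self._branches:
            self._branches[name] = Branch(name, durability)
        return self._branches[name]
```

An add-on would then ask for its own namespace, for example `service.branch("addon.example", Durability.STRICT)`, and wrap related writes in `with branch.transaction():`. Because each branch owns its own data, a failed transaction in one branch leaves the others untouched.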

Ehsan Akhgari

May 3, 2013, 2:51:24 AM

To: Gregory Szorc, Taras Glek, David Rajchenbach-Teller, dev-platform, Lawrence Mandel

On 2013-05-02 7:36 PM, Gregory Szorc wrote:
> On 5/2/2013 4:13 PM, Lawrence Mandel wrote:
>>
>> ----- Original Message -----
>>> Great post, Taras!
>>>
>>> Per IRC conversations, we'd like to move subsequent discussion of
>>> actions into a meeting so we can more quickly arrive at a resolution.
>>>
>>> Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April
>>> 30.
>>> That's 2200 UTC. Apologies to the European and east coast crowds. If
>>> you'll miss it because it's too late, let me know and I'll consider
>>> moving it.
>>>
>>> https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S
>> Did someone post a summary of this meeting? Is there a link to share?
>
> Notes at https://etherpad.mozilla.org/storage-in-gecko
>
> We seemed to converge on a (presumably C++-based) storage service that
> has named branches/buckets with specific consistency, flushing, etc
> guarantees. Clients would obtain a handle on a "branch," and perform
> basic I/O operations, including transactions. Branches could be created
> ad-hoc at run-time. So add-ons could obtain their own storage namespace
> with the storage guarantees of their choosing. Under the hood storage
> would be isolated so failures in one component wouldn't affect everybody.

It would be nice to come up with an implementation plan. During the
meeting, a number of issues raised questions in my mind, such as how
we're going to implement a flush-free writeAtomic function, and what
we're going to do if a writeAtomic call fails (our current
implementation seems to ignore such failures:
https://bugzilla.mozilla.org/show_bug.cgi?id=867406).

Does anybody know how we can implement the above?

Ehsan
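
For readers unfamiliar with the pattern behind the writeAtomic question above, here is a minimal Python sketch of write-to-temporary-then-rename. It is not Gecko's writeAtomic; the function name, the `flush` parameter, and the temp-file prefix are assumptions made for illustration. It shows the two points raised: the fsync can be made optional, and failures are raised to the caller instead of being ignored.

```python
import os
import tempfile


def write_atomic(path, data, flush=True):
    """Replace the file at `path` with `data` (bytes), all or nothing.

    Hypothetical helper: errors are raised to the caller, never swallowed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            if flush:
                tmp.flush()
                os.fsync(tmp.fileno())  # force data to disk before the rename
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        # Clean up the temporary file, then let the caller see the failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With `flush=False` the rename still keeps readers from observing a half-written file, but the new contents may not survive a crash or power loss, which is the trade-off a "flush-free" variant would have to accept.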

Axel Hecht

May 3, 2013, 5:15:18 AM

Did you guys talk about build-time configuration stuff we're storing in
prefs?

I'm asking because I just added some prefs that really no user should
tamper with. browser.searchorder might be another example. Arguably, we
could use this storage to switch off features, too.

IOW, can we bootstrap the data in storage from build-time data?

Axel

Benjamin Smedberg

May 3, 2013, 9:29:08 AM

To: Gregory Szorc, Taras Glek, Kyle Huey, dev-platform, David Rajchenbach-Teller, Lawrence Mandel

On 5/2/2013 7:43 PM, Gregory Szorc wrote:
> On 5/2/2013 4:40 PM, Kyle Huey wrote:
>>
>>
>> On Thu, May 2, 2013 at 4:36 PM, Gregory Szorc <g...@mozilla.com
>> <mailto:g...@mozilla.com>> wrote:
>>

>> We seemed to converge on a (presumably C++-based) storage service
>> that has named branches/buckets with specific consistency,
>> flushing, etc guarantees. Clients would obtain a handle on a
>> "branch," and perform basic I/O operations, including
>> transactions. Branches could be created ad-hoc at run-time. So
>> add-ons could obtain their own storage namespace with the storage
>> guarantees of their choosing. Under the hood storage would be
>> isolated so failures in one component wouldn't affect everybody.
>>
>>

>> So this is basically prefs on steroids?
>
> Sure. I'd say it's key-value storage done right with a sane API and
> consistency expectations and without the association with
> "preferences"/about:config.

Will it also be an asynchronous API from the main thread and a
potentially synchronous API from workers?

Transactions already sound like too much API for most clients; perhaps
we need them, but can we also have an optimized API for "store this blob
of data in key 'foo.bar.baz'" ?

--BDS
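
To make the two API shapes asked about above concrete, here is a small Python sketch. It assumes a hypothetical BlobStore with blocking methods (what a worker might call) and async wrappers (what the main thread might call); none of these names come from Gecko, and Python's thread pool only stands in for whatever I/O thread a real implementation would use.

```python
import asyncio


class BlobStore:
    """Hypothetical store: blocking calls for workers, async for the main thread."""

    def __init__(self):
        self._blobs = {}

    # Blocking variants, suitable for a worker thread.
    def put_sync(self, key, blob):
        self._blobs[key] = bytes(blob)

    def get_sync(self, key):
        return self._blobs.get(key)

    # Async variants: the blocking work runs off the calling thread.
    async def put(self, key, blob):
        await asyncio.to_thread(self.put_sync, key, blob)

    async def get(self, key):
        return await asyncio.to_thread(self.get_sync, key)


async def main():
    store = BlobStore()
    # The "store this blob of data in key 'foo.bar.baz'" case, no transaction.
    await store.put("foo.bar.baz", b"some blob")
    return await store.get("foo.bar.baz")
```

Running `asyncio.run(main())` returns the stored bytes. The point of the sketch is only that the simple put/get path needs no transaction object at all.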

Ehsan Akhgari

May 3, 2013, 11:54:56 AM

To: Axel Hecht, dev-pl...@lists.mozilla.org

We're not going to change how the existing preferences work in this project.

Cheers,
Ehsan

Honza Bambas

May 3, 2013, 6:37:47 PM

To: dev-pl...@lists.mozilla.org

If you guys don't need transactions and only need simple key+value
storage, the new localstorage code could be used. The only thing needed
to make it work in a completely non-blocking way is to expose an API
telling the consumer that localstorage has loaded from disk and is fully
cached in memory, and thus access to it won't block. All writes are async.

-hb-
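
The "tell the consumer when the cache is loaded" idea above can be sketched as follows. This is not the localstorage code; CachedStore, preload, and ready are hypothetical names, and the disk load is passed in as a coroutine function so the example stays self-contained.

```python
import asyncio


class CachedStore:
    """Hypothetical key-value cache that signals when its disk load is done."""

    def __init__(self, load):
        self._load = load            # coroutine function returning a dict
        self._cache = {}
        self._loaded = asyncio.Event()
        self._pending = []           # writes queued for a later async flush

    async def preload(self):
        data = dict(await self._load())
        data.update(self._cache)     # writes made before the load finished win
        self._cache = data
        self._loaded.set()

    async def ready(self):
        """Wait until the cache is fully populated."""
        await self._loaded.wait()

    def get(self, key):
        # Refuse rather than block if the consumer did not wait for ready().
        if not self._loaded.is_set():
            raise RuntimeError("store not loaded yet")
        return self._cache.get(key)

    def set(self, key, value):
        # Update memory now; persisting the queue is left to a background task.
        self._cache[key] = value
        self._pending.append((key, value))
```

A consumer would create the store inside a running event loop, start `preload()`, and `await store.ready()` once; after that every `get` is a plain in-memory lookup.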

David Dahl

May 6, 2013, 12:41:08 PM

To: Gregory Szorc, Taras Glek, David Rajchenbach-Teller, dev-platform, Lawrence Mandel, Kyle Huey

KyotoCabinet might make a good backend for a new storage API:

http://fallabs.com/kyotocabinet/

There is also a companion indexing engine: http://fallabs.com/kyototycoon/

Regards,

David

----- Original Message -----
From: "Gregory Szorc" <g...@mozilla.com>
To: "Kyle Huey" <m...@kylehuey.com>
Cc: "Taras Glek" <tg...@mozilla.com>, "David Rajchenbach-Teller" <dte...@mozilla.com>, "dev-platform" <dev-pl...@lists.mozilla.org>, "Lawrence Mandel" <lma...@mozilla.com>
Sent: Thursday, May 2, 2013 6:43:14 PM
Subject: Re: Storage in Gecko

On 5/2/2013 4:40 PM, Kyle Huey wrote:
>
>
> On Thu, May 2, 2013 at 4:36 PM, Gregory Szorc <g...@mozilla.com
> <mailto:g...@mozilla.com>> wrote:
>

> We seemed to converge on a (presumably C++-based) storage service
> that has named branches/buckets with specific consistency,
> flushing, etc guarantees. Clients would obtain a handle on a
> "branch," and perform basic I/O operations, including
> transactions. Branches could be created ad-hoc at run-time. So
> add-ons could obtain their own storage namespace with the storage
> guarantees of their choosing. Under the hood storage would be
> isolated so failures in one component wouldn't affect everybody.
>
>

> So this is basically prefs on steroids?

Sure. I'd say it's key-value storage done right with a sane API and
consistency expectations and without the association with
"preferences"/about:config.

Jed Davis

May 6, 2013, 2:34:54 PM

To: David Dahl, Taras Glek, David Rajchenbach-Teller, Gregory Szorc, Lawrence Mandel, Kyle Huey, dev-platform

On Mon, May 06, 2013 at 09:41:08AM -0700, David Dahl wrote:
> KyotoCabinet might make a good backend for a new storage API:
>
> http://fallabs.com/kyotocabinet/

It's released under the GPL, so it's MPL-incompatible, if I understand
correctly. As for the "Kyoto Products Specific FOSS Library Linking
Exception", at http://fallabs.com/license/linkexception.txt -- it
currently lists exactly one library (not us) and seems to indicate that,
even if Gecko were so listed, a "Specific Library" that re-exports Kyoto
Cabinet's functionality to other applications would not be allowed.

--Jed (not a lawyer)

David Dahl

May 6, 2013, 3:12:54 PM

To: Jed Davis, Taras Glek, David Rajchenbach-Teller, Gregory Szorc, Lawrence Mandel, Kyle Huey, dev-platform

That is unfortunate. The Kyoto-* tools are FAST and easy to use. I wonder if the author would be willing to issue Mozilla a license that is compatible with MPL?

Cheers,

David

----- Original Message -----
From: "Jed Davis" <j...@mozilla.com>
To: "David Dahl" <dd...@mozilla.com>
Cc: "Gregory Szorc" <g...@mozilla.com>, "Taras Glek" <tg...@mozilla.com>, "David Rajchenbach-Teller" <dte...@mozilla.com>, "dev-platform" <dev-pl...@lists.mozilla.org>, "Lawrence Mandel" <lma...@mozilla.com>, "Kyle Huey" <m...@kylehuey.com>
Sent: Monday, May 6, 2013 1:34:54 PM
Subject: Re: Storage in Gecko

Gervase Markham

May 7, 2013, 7:04:05 AM

To: David Dahl, Jed Davis, Taras Glek, David Rajchenbach-Teller, Gregory Szorc, Lawrence Mandel, Kyle Huey, dev-platform

On 06/05/13 20:12, David Dahl wrote:
> That is unfortunate. The Kyoto-* tools are FAST and easy to use. I
> wonder if the author would be willing to issue Mozilla a license that
> is compatible with MPL?

That would be the functional equivalent of relicensing under the MPL,
which is a weaker copyleft than he is using. Given that they have
thought about their licensing hard enough to have a FLOSS linking
exception etc. etc., I suspect that would not go down well. But if you
want the licensing team to look into it, file a bug :-)

Gerv

Marco Bonardo

May 7, 2013, 1:46:36 PM

On 06/05/2013 18:41, David Dahl wrote:
> KyotoCabinet might make a good backend for a new storage API:
>
> http://fallabs.com/kyotocabinet/
>
> There is also a companion indexing engine: http://fallabs.com/kyototycoon/
>
> Regards,
>
> David

SQLite4 implements something very similar (a log-structured merge
database) and, so far, is reported to be faster than or on par with
KyotoCabinet. It's also already compatible with our license and we are
already working with them, which means we basically have a red carpet
for using it. Though, as previously expressed, a database implementation
is only a possible part of the solution, not the solution.

-m

Robert Kaiser

May 9, 2013, 6:47:59 PM

Gregory Szorc schrieb:
> Perhaps this should be advertised more, especially to the add-on
> community. Looking at about:config of my main profile, about 2/3 of my
> preferences are user set. There are hundreds of preferences apparently
> being used for key-value storage by add-ons (not to pick on one, but
> HTTPS Everywhere has a few hundred prefs).

FWIW, we had a pretty high-ranking topcrash in
https://bugzilla.mozilla.org/show_bug.cgi?id=836263 where an add-on
stored strings up to at least 128MB in prefs, and Nightly now limits
prefs to 1MB max, and that add-on switched to indexedDB instead. This
should happen more, for sure. I'd think that anything larger than a few
KB probably doesn't belong in a pref.
We couldn't easily set that limit mentioned above as low as we'd like
because up to now we had no limit at all and some add-ons definitely
store large stuff in prefs. Given that we probably read prefs in startup
and before we can do anything useful with the launched Firefox, I wonder
how much of a perf problem those large prefs tend to be, actually.

Robert Kaiser

David Rajchenbach-Teller

May 10, 2013, 6:07:46 AM

To: Robert Kaiser, dev-pl...@lists.mozilla.org

I'd even go as far as limiting it to 16kb.
(possibly with a transition phase during which going above 16kb only
prints warnings)

On 5/10/13 12:47 AM, Robert Kaiser wrote:
> FWIW, we had a pretty high-ranking topcrash in
> https://bugzilla.mozilla.org/show_bug.cgi?id=836263 where an add-on
> stored strings up to at least 128MB in prefs, and Nightly now limits
> prefs to 1MB max, and that add-on switched to indexedDB instead. This
> should happen more, for sure. I'd think that anything larger than a few
> KB probably doesn't belong in a pref.
> We couldn't easily set that limit mentioned above as low as we'd like
> because up to now we had no limit at all and some add-ons definitely
> store large stuff in prefs. Given that we probably read prefs in startup
> and before we can do anything useful with the launched Firefox, I wonder
> how much of a perf problem those large prefs tend to be, actually.
>
> Robert Kaiser

--
David Rajchenbach-Teller, PhD
Performance Team, Mozilla

Robert Kaiser

May 15, 2013, 8:26:38 PM

David Rajchenbach-Teller schrieb:

> I'd even go as far as limiting it to 16kb.
> (possibly with a transition phase during which going above 16kb only
> prints warnings)

I think most of us agree, but the problem is that apparently a number of
add-ons rely on large prefs atm, so for now we set the limit to 1MB.

Adding a warning for everything over 10KB or 16KB or something and
targeting to move the limit down to that at some point would surely be a
good idea, and I'd be happy about someone filing a bug and patch about this.

For now, the important thing was to prohibit the case that caused
crashes, but we surely can do better and reduce that kind of "prefs
mis-use" in favor of indexedDB, etc.

Robert Kaiser
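
The two thresholds discussed above (a hard cap plus a lower warning level during a transition phase) can be sketched in a few lines of Python. This is not the prefs code: the function name is hypothetical, and measuring size as UTF-8 bytes is an assumption, since the thread does not say how the limit is counted.

```python
import warnings

HARD_LIMIT = 1024 * 1024   # the 1MB cap mentioned for Nightly
WARN_LIMIT = 16 * 1024     # the 16KB threshold proposed in the thread


def check_pref_size(name, value):
    """Return the encoded size; warn above WARN_LIMIT, reject above HARD_LIMIT."""
    size = len(value.encode("utf-8"))  # assumption: size is counted in UTF-8 bytes
    if size > HARD_LIMIT:
        raise ValueError(f"pref {name!r} is {size} bytes; limit is {HARD_LIMIT}")
    if size > WARN_LIMIT:
        warnings.warn(f"pref {name!r} is {size} bytes; consider indexedDB instead")
    return size
```

Lowering the cap later would then only mean changing HARD_LIMIT to the old warning level, once the warnings have given add-on authors time to migrate.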

David Rajchenbach-Teller

May 16, 2013, 6:27:11 AM

To: Robert Kaiser, dev-pl...@lists.mozilla.org

On 5/16/13 2:26 AM, Robert Kaiser wrote:
> David Rajchenbach-Teller schrieb:
>> I'd even go as far as limiting it to 16kb.
>> (possibly with a transition phase during which going above 16kb only
>> prints warnings)
>
> I think most of us agree, but the problem is that apparently a number of
> add-ons rely on large prefs atm, so for now we set the limit to 1MB.
>
> Adding a warning for everything over 10KB or 16KB or something and
> targeting to move the limit down to that at some point would surely be a
> good idea, and I'd be happy about someone filing a bug and patch about
> this.

Filed:
https://bugzilla.mozilla.org/show_bug.cgi?id=872980
https://bugzilla.mozilla.org/show_bug.cgi?id=872981

Cheers,
David