Meeting with Mozilla about the FileSystem API

Eric Uhrhane

Apr 26, 2013, 7:46:37 PM
to stora...@chromium.org
[once again from the right address]

These are pretty rough; feel free to ask about anything that you'd
like to know more about.

The basic summary is that Mozilla has come around to the idea of
supporting a FileSystem API, but doesn't like the one we've been
working on in the WebApps working group for the past few years, so
they're proposing a new one:
http://lists.w3.org/Archives/Public/public-webapps/2013AprJun/0382.html

There was more discussion of this at the WebApps WG face-to-face
yesterday as well:
http://www.w3.org/2013/04/25-webapps-minutes.html#item13

2013-04-24

Attendees:

Eric Uhrhane - Google - FileSystem

Brian Stell - Google - Internationalization

Steve VanDeBogart - Google - Media Galleries

Jonas Sicking - Mozilla - Web API standards, especially IDB and File

Jan Varga - Mozilla - storage API implementer

Ben Turner - Mozilla - IDB implementer


Eric

No strict agenda

Want to hear thoughts on new filesystem api

Brian is a FileSystem api user

Steve works on media gallery


Jonas

based on blog post, we think we need an FS API

persistent urls

people understand it

don’t want to use IDB as a FileSystem

not married to current api, but want the capabilities

met with Apple; they had similar concerns: the API is too complicated

came up with similar proposal to theirs

want file locking

at meeting with Apple about 3 weeks ago (before blink fork), came up
with joint proposal between Maciej’s and Jonas’s specs

will publish spec later today

Is apple really interested in implementing this?

not opposed to it, perhaps, but that was before the blink fork


Discussion of the API proposal below.

potential security issues with “give me a writable form of this File”,
given that other APIs that vend Files will assume they’re read-only

still thinking about directory enumeration; future does not have cursor yet

Tab Atkins has some thoughts on this

get file -> future

also readAsText, readAsArrayBuffer

maintains locks

thinking move these to file object

file object can interact with other apis

awkward to add locking, awkward to use

The FileReader is like XHR because people asked for XHR, since everyone knows it

discussion on the public-script-coord list

why not like other JS apis

keep it consistent with DOM apis

better to create a good api rather than consistent with DOM apis

Steve: to validate as media file before writing, write locally first, then move

wants atomic move to a different filesystem

can future have moveTo/copyTo

want simple put method

here is name, here is blob, done

works as a file copy, but can’t be used for easy directory copy

put could be a small amount of JS

common use case

recursive copy would be more complicated and messy

put got renamed as create

rename is an alternate to move

prefer move

this API has only one filesystem

Same as Google API; the name field is only decorative

temporary/persistent split means we can still have multiple
filesystems in either API, which means we can use this for media
galleries

apple not interested in access to media

no strong opinion on recursive copy support

if api gets too big apple will likely push back

api is nice locking

differences vs. Google FS API

do not have file locking

do not have atomic append

do not have flush

persistent URLs

what the URLs would be

persistent URLs in IDB

G not looking for this

not necessary if we have a filesystem

not sure; if we have persistent URLs in the filesystem,

do we need them in IDB?

annoying to register with each data

perhaps navigation controller do this

performance, latency

apple have their own particular problems

blink

what would IDB need to make a polyfill

persistent URL

file locking

cramming the filesystem in IDB will make it bloated

only put in IDB if

filesystem is failing

?

palatable vs potentially acceptable

only if other things fail

is apple considering IDB

could be made to work in webkit with not too much work

apple is not against IDB

apple asked if these features could be handled in IDB

microsoft, only if 3 players

could do as a polyfill with some extensions to IDB

not expected to share outside apps

very complicated to make work with code outside the browser

path name length issues

feedback on blog post was confused about who can see the files

scary to use real filename / filename extensions

make security issues worse

real files give speed

do we want the directory abstraction?

had top level directory

what do people understand

directory abstraction is useful

helps with disjoint storage area

treat directory as a capability, pass directory object to someone else

share between domains

get parent

need to consider security

no get(“..”)?

can one get file path to find parent?

using ‘./’

directory level locking

not needed

what happens if one is writing a file into a directory and the other is removing it

for more serious locking use IDB

define what happens when one is writing and another is removing

don't lock by creating a file

want common file behavior


how many browsers

chrome has filesystem api

in use

no future

want a filesystem api

polyfill

missing locking

if there are 2 other browsers then chrome will implement

adrian - we have no interest


Brian’s use case:

He works on internationalization. They’re experimenting with web
fonts to get consistency across platforms. Chinese/Japanese fonts are
huge, and have terrible download latency.

Two use cases: drive-by web, and returning to a site previously visited.

On first visit, background-download the font for later. Add some CSS
to the filesystem as well for use. On second visit, use persistent
URL to CSS file in HEAD request to see if you can use the stored font
or not.


Latency in fonts is critical. If you draw in a fallback font and
later switch to the good font, you get a “Flash Of Unstyled Text”
[FOUT], which is unacceptable. So we need to decide one way or the
other before drawing at all, so there’s no time to do an async IDB
request. The decision is made in the HEAD.


Separate use case: In CJK pages, count #unique characters per page.

Simplified Chinese has about 21000 characters. To cover 50% of web
pages, you need more like 5000 characters. Looking at individual pages,
though, 98% of pages use fewer than 800 characters. So if you cache a
per-page subset, your download is much smaller, and the subset can be
computed server-side. You can’t do this in AppCache, but you can
incrementally load the font files and construct them on disk using the
filesystem API. So each new page gets the benefit of previous pages,
but downloads are smaller.
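
The per-page subsetting idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the in-memory `storedCoverage` set are hypothetical stand-ins for a font file kept in the filesystem, and no real font or FileSystem API is used.

```typescript
// Illustrative sketch only: every name below is hypothetical and not part
// of the FileSystem API or any font library.

/** Distinct Unicode code points used by a page's text. */
function uniqueCodePoints(text: string): Set<number> {
  const seen = new Set<number>();
  // Array.from splits a string by code point, so surrogate pairs stay whole.
  Array.from(text).forEach((ch) => {
    const cp = ch.codePointAt(0);
    if (cp !== undefined) seen.add(cp);
  });
  return seen;
}

/** Code points the page needs that the stored font does not cover yet. */
function missingCodePoints(page: Set<number>, stored: Set<number>): number[] {
  return Array.from(page)
    .filter((cp) => !stored.has(cp))
    .sort((a, b) => a - b);
}

// Stand-in for the glyph coverage of the font file kept on disk.
const storedCoverage = new Set<number>();

/** Simulate one page visit; returns how many glyphs had to be fetched. */
function visitPage(text: string): number {
  const toFetch = missingCodePoints(uniqueCodePoints(text), storedCoverage);
  // A real page would download these glyphs and append them to the stored font.
  toFetch.forEach((cp) => storedCoverage.add(cp));
  return toFetch.length;
}
```

Calling `visitPage` on successive pages shows the intended effect: later pages only fetch the characters that earlier pages did not already bring in.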


[discussion on speed effects of persistent URLs vs IDB; conclusion is
that URLs are faster, by some unknown amount, and are likely to stay
that way]


CORS

very tight sandbox

SHA1 ids

copy-on-write

would like a way for origins to cooperate


Media galleries

(some group) sys apps?

media galleries in phase 2

Darin Fisher

Apr 29, 2013, 7:57:24 PM
to Eric Uhrhane, stora...@chromium.org
It's not so clear to me that the FileSystem API should expose locking.  It is my recollection that some network-based filesystems may have trouble implementing that well.

That said, application level locking is important.  I think we could decouple it from the filesystem, and expose an API like the following:
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-September/022810.html

On a different topic, +1 for adding a "flush" method :-)

-Darin

Eric Uhrhane

Apr 29, 2013, 8:00:46 PM
to Darin Fisher, stora...@chromium.org
On Mon, Apr 29, 2013 at 4:57 PM, Darin Fisher <da...@chromium.org> wrote:
> It's not so clear to me that the FileSystem API should expose locking. It
> is my recollection that some network-based filesystems may have trouble
> implementing that well.

This has nothing to do with the OS-level file locks, if I understand
things correctly. It's just for access to the file through the
browser API, so it should be no problem to implement. This API
doesn't even guarantee [or even hint] that you're accessing real
files; they could be data in a database somewhere.

> That said, application level locking is important. I think we could
> decouple it from the filesystem, and expose an API like the following:
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-September/022810.html

Interesting, but orthogonal.

Jonas Sicking

Apr 29, 2013, 9:06:59 PM
to Eric Uhrhane, Darin Fisher, stora...@chromium.org
On Mon, Apr 29, 2013 at 5:00 PM, Eric Uhrhane <er...@chromium.org> wrote:
> On Mon, Apr 29, 2013 at 4:57 PM, Darin Fisher <da...@chromium.org> wrote:
>> It's not so clear to me that the FileSystem API should expose locking. It
>> is my recollection that some network-based filesystems may have trouble
>> implementing that well.
>
> This has nothing to do with the OS-level file locks, if I understand
> things correctly. It's just for access to the file through the
> browser API, so it should be no problem to implement. This API
> doesn't even guarantee [or even hint] that you're accessing real
> files; they could be data in a database somewhere.

I agree. The implementation of the locking doesn't have to depend on
OS-level locks at all. The implementation that we currently have does
all the locking management in the UA.

At the same time we do try to open the file using exclusive locks such
that users interacting with the file through other means are less
likely to see inconsistent data or stomp on half-written data.

But this is much less important in the sandboxed filesystem since it's
less likely that anyone other than the browser will be in there
messing around. So if the platform doesn't support that type of
locking then that's still unlikely to result in bugs.

/ Jonas

Darin Fisher

Apr 29, 2013, 11:34:03 PM
to Eric Uhrhane, stora...@chromium.org
Hmm, OK... I get that this doesn't have to be about real file locking.  (Although, it is curious that Jonas says Mozilla is going to do real file locking even though that is known to be flaky.  I'm not sure what the use cases are for needing real file locking.)

Stepping back, why do we need a file locking API?  I presume it is so that content from the same origin running in different contexts (e.g., a document context and a worker context) can coordinate access to a file.

I think the locking / synchronization problem is way more general.  It seems like a mistake to focus only on file locking.  People do all manner of ugly things to implement application level locks today.  The problem with those solutions is that they don't get automatically cleaned up when an owning context closes.

Why is it better to implement locking on files rather than an independent, named locks API such as the one I proposed?
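
To make the cleanup property concrete, here is a minimal TypeScript sketch of a named-lock table whose locks are dropped when the owning context closes. It is a toy model, not the API from the proposal linked earlier in this thread: `NamedLocks`, `tryAcquire`, `release`, and `contextClosed` are all hypothetical names, and contexts are represented by plain numeric ids.

```typescript
// Hypothetical model; not the API from the proposal linked in this thread.
class NamedLocks {
  // lock name -> id of the context (document or worker) that holds it
  private holders = new Map<string, number>();

  /** Try to take the lock; returns false if another context holds it. */
  tryAcquire(contextId: number, name: string): boolean {
    const holder = this.holders.get(name);
    if (holder !== undefined && holder !== contextId) return false;
    this.holders.set(name, contextId);
    return true;
  }

  /** Explicit release by the holder; ignored if the caller is not the holder. */
  release(contextId: number, name: string): void {
    if (this.holders.get(name) === contextId) this.holders.delete(name);
  }

  /** Invoked when a context closes: drop every lock it still held. */
  contextClosed(contextId: number): void {
    const stale: string[] = [];
    this.holders.forEach((holder, name) => {
      if (holder === contextId) stale.push(name);
    });
    stale.forEach((name) => this.holders.delete(name));
  }
}
```

The point of the sketch is `contextClosed`: because the table is owned by the browser rather than by page script, a crashed or closed tab cannot leave a lock stuck.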

-Darin

Eric Uhrhane

Apr 30, 2013, 12:26:36 AM
to Darin Fisher, stora...@chromium.org
On Mon, Apr 29, 2013 at 8:34 PM, Darin Fisher <da...@chromium.org> wrote:
> Hmm, OK... I get that this doesn't have to be about real file locking.
> (Although, it is curious that Jonas says Mozilla is going to do real file
> locking even though that is known to be flaky. I'm not sure what the use
> cases are for needing real file locking.)
>
> Stepping back, why do we need a file locking API? I presume it is so that
> content from the same origin running in different contexts (e.g., a document
> context and a worker context) can coordinate access to a file.

Can coordinate, or at least cannot easily self-stomp by accident. If
you just want to append to a log file, or do a quick
read-modify-write, granting the guarantee of atomicity seems simple
and helpful, and far less complicated than asking someone to use a
separate locking API.

Darin Fisher

Apr 30, 2013, 12:34:03 AM
to Eric Uhrhane, stora...@chromium.org
On Mon, Apr 29, 2013 at 9:26 PM, Eric Uhrhane <er...@chromium.org> wrote:
> On Mon, Apr 29, 2013 at 8:34 PM, Darin Fisher <da...@chromium.org> wrote:
>> Hmm, OK... I get that this doesn't have to be about real file locking.
>> (Although, it is curious that Jonas says Mozilla is going to do real file
>> locking even though that is known to be flaky.  I'm not sure what the use
>> cases are for needing real file locking.)
>>
>> Stepping back, why do we need a file locking API?  I presume it is so that
>> content from the same origin running in different contexts (e.g., a document
>> context and a worker context) can coordinate access to a file.
>
> Can coordinate, or at least cannot easily self-stomp by accident.  If
> you just want to append to a log file, or do a quick
> read-modify-write, granting the guarantee of atomicity seems simple
> and helpful, and far less complicated than asking someone to use a
> separate locking API.


I see.  It is simpler because it is just built-in semantics of the FileHandle and FileHandleWritable types?  The lifetime of those objects defines the lifetime of the locks.  Makes sense.

Hmm. it seems plausible to implement the locking API I mentioned in terms of FileHandles then.  The only downside to that approach is the somewhat gratuitous creation of files used to name the locks.  I know that's par for the course on some operating systems.  I was just hoping we could do better.

-Darin

David Barrett-Kahn

Apr 30, 2013, 12:08:04 PM
to Darin Fisher, Eric Uhrhane, stora...@chromium.org
2c from docs offline experience, we could certainly use a general purpose cross-browsing-context locking system.  There is no good way to do that right now.  If we didn't want to design anything purpose specific, there is a small change which could be made which would enable developers to write something themselves.

MessageChannel currently does not tell one holder of a MessagePort when the other one has gone away or discarded the port.  If it did, a reliable locking system could be built on top of that and a shared worker.  At one point disconnect events were specified, but they were removed over concerns that they would reveal the time at which the MessagePorts were garbage collected.  That didn't and doesn't seem like a good enough reason not to include the feature.

Regards,

-Dave

Darin Fisher

Apr 30, 2013, 12:18:40 PM
to David Barrett-Kahn, Eric Uhrhane, stora...@chromium.org
That said, shared workers seem like a really heavyweight solution to locking.  Granted, if you already have a shared worker, then adding such an API might be nice.  Perhaps using FileHandle as a lock would be sufficient?

One other concern I have about the proposed API:

Today, we leverage FileEntry when we need a way to refer to a writable file location.  Unlike FileHandleWritable, FileEntry does not hold a lock on the file.  As it is today, a File is just a readable file location.  I suppose if we added an API to get the Directory containing a File, then we'd be all set.

There are use cases with Chrome Apps to launch an app, passing it a writable file location.

Thoughts?
-Darin

David Barrett-Kahn

Apr 30, 2013, 12:24:36 PM
to Darin Fisher, Eric Uhrhane, stora...@chromium.org
Agreed shared workers are not a great answer unless you have one already.  I just wanted to point out an opportunity to add an easy to implement feature with non-obvious benefits.

-Dave

Jonas Sicking

Apr 30, 2013, 2:54:18 PM
to Darin Fisher, Eric Uhrhane, storage-dev
On Mon, Apr 29, 2013 at 9:34 PM, Darin Fisher <da...@chromium.org> wrote:
> On Mon, Apr 29, 2013 at 9:26 PM, Eric Uhrhane <er...@chromium.org> wrote:
>>
>> On Mon, Apr 29, 2013 at 8:34 PM, Darin Fisher <da...@chromium.org> wrote:
>> > Hmm, OK... I get that this doesn't have to be about real file locking.
>> > (Although, it is curious that Jonas says Mozilla is going to do real
>> > file
>> > locking even though that is known to be flaky. I'm not sure what the
>> > use
>> > cases are for needing real file locking.)
>> >
>> > Stepping back, why do we need a file locking API? I presume it is so
>> > that
>> > content from the same origin running in different contexts (e.g., a
>> > document
>> > context and a worker context) can coordinate access to a file.
>>
>> Can coordinate, or at least cannot easily self-stomp by accident. If
>> you just want to append to a log file, or do a quick
>> read-modify-write, granting the guarantee of atomicity seems simple
>> and helpful, and far less complicated than asking someone to use a
>> separate locking API.
>
> I see. It is simpler because it is just built-in semantics of the
> FileHandle and FileHandleWritable types?

Yeah. I would expect that most developers don't think about the fact
that there can be race conditions if the user has several tabs open.
So if the default behavior of the API can catch the most common
patterns then that's great.

> The lifetime of those objects defines the lifetime of the locks. Makes sense.

It's not actually tied to the lifetime of the objects. Instead it
works similar to WebSQL and IDB in that a lock is being held as long
as it's used. As soon as the last callback happens without new
requests being placed, the lock is released.
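
The release rule described above can be modeled with a small synchronous sketch. It assumes a simplified world where "callbacks" are just queued functions and the event loop is replaced by an explicit `drain` step; `LockedHandle`, `request`, and `withLock` are hypothetical names and not part of the proposed API.

```typescript
// Toy model of "the lock lives as long as requests keep being placed".
// All names are hypothetical.
type LockCallback = (handle: LockedHandle) => void;

class LockedHandle {
  private queue: LockCallback[] = [];
  released = false;
  log: string[] = [];

  /** Place a request; only legal while the lock is still held. */
  request(name: string, onDone?: LockCallback): void {
    if (this.released) throw new Error("lock already released");
    this.queue.push((h) => {
      h.log.push(name);
      if (onDone) onDone(h); // the callback may place further requests
    });
  }

  /** Run callbacks until none remain, then release the lock. */
  drain(): void {
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      next(this);
    }
    this.released = true;
  }
}

/** Open a locked handle, run the body, and release once the queue is empty. */
function withLock(body: LockCallback): LockedHandle {
  const handle = new LockedHandle();
  body(handle);
  handle.drain();
  return handle;
}
```

A request placed from inside a callback keeps the lock alive, while a request placed after the queue has drained fails, which mirrors the IDB-style behavior in this simplified form.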

> Hmm. it seems plausible to implement the locking API I mentioned in terms of
> FileHandles then. The only downside to that approach is the somewhat
> gratuitous creation of files used to name the locks. I know that's par for
> the course on some operating systems. I was just hoping we could do better.

I wouldn't mind investigating this as well. The current locking
mechanisms that we have release the lock fairly aggressively and
automatically, which doesn't always meet people's requirements. So
something like this could help bridge the gap.

There are lots of tricky questions here though. How much should we
worry about "asynchronous deadlocks", i.e. two pages both holding a
lock while waiting for a callback to grab another lock? Or even having
the same thing happening within a single page?

Should we hand-hold people here or simply put dire warnings about the
challenges of lock usage in the docs?

Should we forbid a page grabbing multiple locks one after another and
instead create an API which allows grabbing/requesting multiple locks
at the same time?

Should we create some sort of inherent order between the locks and
forbid grabbing them out-of-order?
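
One of the options raised above, requesting several locks in a single call, can be sketched as an all-or-nothing grant. This is an illustration only, with hypothetical names (`MultiLockTable`, `tryAcquireAll`, `releaseAll`, `isHeld`); it shows why a page never holds one lock while waiting for another under that design.

```typescript
// Hypothetical sketch of "grab multiple locks at the same time".
class MultiLockTable {
  private owners = new Map<string, string>(); // lock name -> owner label

  /** Grant every requested lock to `owner`, or none of them. */
  tryAcquireAll(owner: string, names: string[]): boolean {
    const wanted = Array.from(new Set(names)); // drop duplicate names
    const blocked = wanted.some((n) => {
      const current = this.owners.get(n);
      return current !== undefined && current !== owner;
    });
    // Nothing was taken on failure, so no lock is held while waiting.
    if (blocked) return false;
    wanted.forEach((n) => this.owners.set(n, owner));
    return true;
  }

  /** Release every lock held by `owner`. */
  releaseAll(owner: string): void {
    const held: string[] = [];
    this.owners.forEach((o, n) => {
      if (o === owner) held.push(n);
    });
    held.forEach((n) => this.owners.delete(n));
  }

  /** True if any owner currently holds the named lock. */
  isHeld(name: string): boolean {
    return this.owners.has(name);
  }
}
```

The trade-off is that a page must know its full lock set up front; acquiring locks one after another, as the other options would allow, needs an ordering rule or deadlock detection instead.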

/ Jonas

Jonas Sicking

Apr 30, 2013, 3:02:07 PM
to Darin Fisher, David Barrett-Kahn, Eric Uhrhane, stora...@chromium.org
On Tue, Apr 30, 2013 at 9:18 AM, Darin Fisher <da...@chromium.org> wrote:
> That said, shared workers seem like a really heavyweight solution to
> locking. Granted, if you already have a shared worker, then adding such an
> API might be nice. Perhaps using FileHandle as a lock would be sufficient?
>
> One other concern I have about the proposed API:
>
> Today, we leverage FileEntry when we need a way to refer to a writable file
> location. Unlike FileHandleWritable, FileEntry does not hold a lock on the
> file. As it is today, a File is just a readable file location. I suppose
> if we added an API to get the Directory containing a File, then we'd be all
> set.
>
> There are use cases with Chrome Apps to launch an app, passing it a writable
> file location.
>
> Thoughts?

We have a similar concept to FileEntry in Gecko. I.e. an unopened file
that can be opened and written to. I think that can be added to this
proposal. There wasn't a need yet so we didn't add it. But in Gecko I
think we'll rename our existing FileEntry equivalent to WritableFile
(or FileWritable or some such) and stick
openForWriting/openForAppending/openForReading on it.

I'm not sure that adding a way to get at the Directory solves the
problem in a particularly good way since that means that you can write
to *any* file in the directory.

Also, I quite like the fact that in the current API you can't ever
walk "up" the tree and get access to more objects than the ones you
are given a reference to. I.e. you can hand someone a reference to a
File without worrying that they can get to other Files. Or you can
hand someone a reference to a Directory without worrying that they can
get to parent Directories.
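
The capability property described above can be illustrated with a small in-memory model. It is a sketch under stated assumptions: `SandboxDirectory` and its methods are hypothetical, file contents are plain strings, and the only point is that a directory object has no parent reference and rejects names such as ".." that would walk upward.

```typescript
// Hypothetical in-memory model of a directory treated as a capability.
class SandboxDirectory {
  private files = new Map<string, string>();
  private children = new Map<string, SandboxDirectory>();
  // Deliberately no reference to a parent directory.

  /** Reject names that could escape this directory. */
  private static checkName(name: string): void {
    if (name === "" || name === "." || name === ".." || name.indexOf("/") !== -1) {
      throw new Error("invalid entry name: " + JSON.stringify(name));
    }
  }

  /** Create the child directory if needed and return it. */
  createDirectory(name: string): SandboxDirectory {
    SandboxDirectory.checkName(name);
    let dir = this.children.get(name);
    if (dir === undefined) {
      dir = new SandboxDirectory();
      this.children.set(name, dir);
    }
    return dir;
  }

  writeFile(name: string, contents: string): void {
    SandboxDirectory.checkName(name);
    this.files.set(name, contents);
  }

  readFile(name: string): string | undefined {
    SandboxDirectory.checkName(name);
    return this.files.get(name);
  }

  getDirectory(name: string): SandboxDirectory | undefined {
    SandboxDirectory.checkName(name);
    return this.children.get(name);
  }
}
```

Handing the `shared` subdirectory to other code gives it access to that subtree only; nothing reachable from the object leads back to the root.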

/ Jonas

Greg Billock

May 20, 2013, 5:55:03 PM
to Jonas Sicking, Darin Fisher, David Barrett-Kahn, Eric Uhrhane, stora...@chromium.org
Have you considered a filesystem API consisting of a hierarchical Device Storage or async localStorage look-alike? That is, suppose there were a writable/lockable storage API consisting of objects which could contain other objects. The roots of those trees would have metadata like "in-memory", "session", "local", "persistent", etc. but they'd all basically be trees of storage nodes with the same API. Which themselves would look like Directory(Entry)'s containing FileHandle/FileEntry type objects as well as more Directory(Entry)s.

The bit of additional complexity for something like localStorage is mostly going to come with the async API anyway, and something like this might make it clearer to developers what they get themselves into by using something like localStorage. Storage variants can be tied together in a common way to the quota API as well.

I think there's a good case for a qualitative difference between this and IndexedDB -- and this makes it a bit clearer: it's hierarchical and more natively File/Blob-like, rather than table-like and more string/bytes oriented. The use cases are clear to developers: database vs. filesystem-like storage.
