Layering a cache over FileReader provides unexpectedly impressive improvements

Pierre Lebeaupin

unread,
Jul 5, 2016, 1:47:29 AM7/5/16
to Chromium-discuss
Hi all,
The short version: I found reading of local files (via FileReader API) to be slow in Chrome, so I added a caching layer in JavaScript to my code, and lo and behold, I get a 10x (ten times, 1000%) performance improvement. How come this isn't cached at a lower layer?!

The longer version: I'm exploring the viability of developing and deploying desktop-like apps using the web, in particular the recent functionalities that bridge the gap: offline usage using AppCach… I mean Service Workers, high-performance JS engines, reading local files, locally saving files without going through a server, etc.

My first project applies binary patches (in a format you may or may not have heard of called IPS) and allows the user to save the result. I wrote code that correctly decodes the patch format in both Chrome and Firefox using File/Blob/FileReader/Uint8Array/etc. Web APIs, and after polishing it I published the app. However, despite some efforts Chrome remained massively slower than Firefox, and we're talking 50 seconds (Chrome desktop) against 7 seconds (Firefox desktop) to process a pair of files (attached here); it's not just "Chrome barely loses the benchmark race" here. I thought this was not something I could act on from within the confines of my web app, and moved on.

Later on, while developing support for another patch format, I had to process every single byte of the files (for CRC, if you want to know), which is best done in blocks of, say, 1024 bytes rather than reading from the file byte by byte. Given my prior experiences, I dreaded the performance penalty from having to (re-)visit every single byte of the file, but it turned out to perform surprisingly well. Why couldn't I get the same performance when processing IPS files?
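For reference, the block-wise loop is roughly the following; a minimal sketch, where crc32Update stands in for the actual CRC routine:

// Minimal sketch of block-wise processing with FileReader; crc32Update is a
// stand-in for the actual CRC routine, 1024 is the block size mentioned above.
function crcFile(file, blockSize, crc32Update, done) {
  var offset = 0;
  var crc = 0xFFFFFFFF;
  function readNext() {
    if (offset >= file.size) { done((crc ^ 0xFFFFFFFF) >>> 0); return; }
    var reader = new FileReader();
    reader.onload = function () {
      crc = crc32Update(crc, new Uint8Array(reader.result));
      offset += blockSize;
      readNext();  // one FileReader request per block, not per byte
    };
    reader.readAsArrayBuffer(file.slice(offset, offset + blockSize));
  }
  readNext();
}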

So as a proof of concept I started developing a layer that would read from the file in blocks of 4096 bytes, then serve read requests from the loaded data, entirely in JavaScript*. It is the dumbest file cache you could possibly imagine: there is only one cache bucket, and it can only be loaded from whole, block-aligned ranges in the file, with the result that a number of requests (e.g. those that cross block-aligned boundaries, or those that read from the remainder of the file that can't form a whole block) have to sidestep the cache and be served from the file directly.
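To make it concrete, the proof of concept boils down to something like the following; this is a simplified sketch rather than the actual code, with an illustrative name and a read(start, length, callback) interface:

function SimpleBlockCache(file, blockSize) {
  this.file = file;
  this.blockSize = blockSize || 4096;
  this.blockStart = -1;  // file offset of the cached block; -1 means empty
  this.block = null;     // Uint8Array holding the cached block
}

SimpleBlockCache.prototype.read = function (start, length, callback) {
  var end = start + length;
  var blockStart = start - (start % this.blockSize);
  var blockEnd = blockStart + this.blockSize;
  var self = this;

  // Requests that cross a block boundary, or that fall in the tail of the
  // file that cannot form a whole block, sidestep the cache entirely.
  if (end > blockEnd || blockEnd > this.file.size) {
    var direct = new FileReader();
    direct.onload = function () { callback(new Uint8Array(direct.result)); };
    direct.readAsArrayBuffer(this.file.slice(start, end));
    return;
  }

  if (blockStart === this.blockStart) {
    // Cache hit: serve from the block already in memory.
    callback(this.block.subarray(start - blockStart, end - blockStart));
    return;
  }

  // Cache miss: load the whole block-aligned range, then serve from it.
  var loader = new FileReader();
  loader.onload = function () {
    self.blockStart = blockStart;
    self.block = new Uint8Array(loader.result);
    callback(self.block.subarray(start - blockStart, end - blockStart));
  };
  loader.readAsArrayBuffer(this.file.slice(blockStart, blockEnd));
};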

And now I turn on the cache, and measure the performance improvement… and files that used to take 50 seconds to process now take 5 seconds! And the 10x factor is consistent, applying over various source files, often turning the processing time into "too short to measure", and over various platforms: the same files which took ~200 seconds on Android now take 20.** You can try it out yourself with the attached files: put base as the "original file" and patchlolv4 as the "IPS format patch" in both http://wanderingcoder.net/projects/JPS/ (currently deployed version, without cache at the time of this writing) and http://wanderingcoder.net/projects/JPS-dev/cache-demonstrator/ (demo with proof of concept caching added), and compare processing times.***

What the heck?!

See, operating systems already cache filesystem reads to an important degree, and even if I don't know exactly how much Windows and GNU/Linux do so, Mac OS X has aggressive filesystem caching both in the kernel and in userspace, so it is baffling that Chrome does not benefit from it (if it did benefit, I would not be able to extract such performance improvements by adding a file cache at such a high level). Either Chrome FileReader requests come with such an overhead that simply reducing how many are performed dramatically improves performance, or Chrome disables lower-level caches without having any caching of its own, which would be a mistake, at the very least. This is such a generally useful performance improvement that you'd think the browser would provide it out of the box.

So, can't you Chrome/Chromium guys do better?

Pierre Lebeaupin

(obligatory references: this is using Chrome 51.0.2704.106 (64-bit) on Mac OS X 10.11.5 (15F34) on an early 2009 single-proc Mac pro, the files are on a spinning hard drive (7200 RPM IIRC) using a 3Gb SATA link, Firefox is Firefox 47 running on the same OS, and by Android, it is more specifically a Galaxy S II running Android 4.1.2 and Chrome 46; yes, I now realize this is a bit outdated)

*I first wanted to try and use an existing one, but did not manage to find one. Perhaps I did not search hard enough, though.

**Interestingly, similar improvements can be observed in Firefox, a bit less than 10x but still impressive, so this doesn't let Chrome catch up with Firefox.

***My code's usage pattern of the file APIs is relatively straightforward: the file format is divided into chunks, the first 5 bytes of which are a header that has to be interpreted, since it contains the chunk's destination position and length, and so is loaded through FileReader; the remainder of the chunk is just sliced to be concatenated with chunks coming from the base file. The headers are small, but they are scattered through the file, sometimes only a few bytes apart; and even if a chunk can be about 2^16 bytes long, in practice (especially given that encoders are typically suboptimal) after a transient period they tend to be roughly 2^8 bytes apart in the most complex patches, i.e. those for which performance matters most. I cannot load multiple headers at the same time, as I can't know where a chunk starts without reading the header of the chunk immediately preceding it.
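In code, the record walk is roughly the following; again a simplified sketch (it ignores RLE records and handles the end-of-list marker naively), where readBytes(offset, length, callback) stands in for whichever read primitive is used, direct FileReader or the cache layer:

// Simplified sketch of walking IPS records.
function walkRecords(readBytes, patchOffset, onRecord, onDone) {
  readBytes(patchOffset, 5, function (header) {
    // The ASCII marker "EOF" (0x45 0x4F 0x46) in place of an offset
    // terminates the record list.
    if (header[0] === 0x45 && header[1] === 0x4F && header[2] === 0x46) {
      onDone();
      return;
    }
    // 3-byte destination offset followed by 2-byte payload length, big-endian.
    var dest = (header[0] << 16) | (header[1] << 8) | header[2];
    var length = (header[3] << 8) | header[4];
    onRecord(dest, patchOffset + 5, length);
    // The position of the next header is only known once this header has
    // been decoded, hence the strictly sequential access pattern.
    walkRecords(readBytes, patchOffset + 5 + length, onRecord, onDone);
  });
}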
Attachments: base, patchlolv4

PhistucK

unread,
Jul 5, 2016, 2:23:39 AM7/5/16
to wanderi...@sfr.fr, Chromium-discuss
You can search crbug.com for an existing issue and star it. If you cannot find one, file a new issue using the "New issue" link on the same page.
Please, do not add a "+1" or "Me too" or "Confirmed" (or similar) comment. It just wastes the time of Chrome engineers and sends unnecessary e-mails to all of the people who starred the issue.

You can reply here with a link to the issue you found or created, and it might get triaged (and fixed) faster.

Thank you.



PhistucK


Pierre Lebeaupin

unread,
Jul 5, 2016, 4:15:43 AM7/5/16
to Chromium-discuss, wanderi...@sfr.fr
I wasn't sure I should directly file this as a bug, so thanks for the indication. I found and starred https://bugs.chromium.org/p/chromium/issues/detail?id=349304 and added a comment telling of the improvements I got from the cache, which should help assess the severity.

Thanks again.

Pierre Lebeaupin

unread,
Jul 6, 2016, 4:56:51 PM7/6/16
to Chromium-discuss, wanderi...@sfr.fr
FYI, I'm going to go ahead and productize the cache (which is a proof of concept at this point) since I still don't know if or when this Chrome performance issue is going to get fixed.

Owen Campbell-Moore

unread,
Aug 4, 2016, 3:15:41 AM8/4/16
to Chromium-discuss, wanderi...@sfr.fr
Very interesting. Thanks a lot for the precise exposé of the issue; I wish every reported issue were written up like this!

I'll flag this issue to the storage team, hopefully they can chime in with more info.

Thanks again


Pierre Lebeaupin

unread,
Aug 7, 2016, 2:56:01 PM8/7/16
to Chromium-discuss, wanderi...@sfr.fr
You're welcome!

I have now productized the cache; it is hosted at https://bitbucket.org/Pierre_Lebeaupin/simplefilecachejs

One additional data point I found out during productization: due to the unreliability of the initial version, I replaced the initial to-do-list system with a systematic use of setTimeout(,0) (to simulate the async interface), but that version of the cache turned out to be even slower than not having the cache at all. I had to reinstate the to-do list, falling back to setTimeout(,0) only when the list is not currently being serviced, and then I again got good performance.

This suggests the slowness is simply the overhead of calling and getting called back by the web APIs (whether FileReader or setTimeout), which I estimate at a minimum of 2 ms per call-plus-callback round trip.
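For the curious, the dispatch I ended up with looks roughly like this; an illustrative sketch with made-up names, not the actual simplefilecachejs internals:

// Completions are queued on a to-do list and drained in one go; setTimeout
// is only used to kick off draining when nothing is currently servicing the
// list, so the per-call timer overhead is paid once per burst of requests.
var todoList = [];
var servicing = false;

function scheduleCallback(callback, data) {
  todoList.push({ callback: callback, data: data });
  if (!servicing) {
    servicing = true;
    setTimeout(serviceTodoList, 0);
  }
}

function serviceTodoList() {
  while (todoList.length > 0) {
    var item = todoList.shift();
    item.callback(item.data);  // may enqueue further items, drained in this loop
  }
  servicing = false;
}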

Pierre Lebeaupin

unread,
Aug 16, 2016, 5:12:49 PM8/16/16
to Chromium-discuss
Small update: I recently had the opportunity to test Chrome on Windows, and the performance improvement is the same, suggesting the inefficiency in Chrome is not tied to any particular platform.

Pierre Lebeaupin

unread,
Feb 16, 2017, 8:42:08 PM2/16/17
to Chromium-discuss
Important update for any interested party: the cache has been officially published as simple-file-cache on NPM, at https://www.npmjs.com/package/simple-file-cache . Now go forth and use it everywhere it can be useful.