I've now gotten far enough in local prototyping this that I can serialize numbers, strings, objects, arrays, etc. (DOM objects, array buffers, and a few other things are still missing.)Early perf results for the large JSON object from bug 148757: serialization time went from 1521 ms to 404 ms on my Z620 (I haven't done profiling yet to determine whether this can be further reduced, nor have I yet measured on other devices, though I expect similar results).I've responded to some comments on the document linked earlier, but no conclusions have been reached. I'm reluctant to prototype too far ahead of landing (especially without a thumbs-up on the approach).Are there additional concerns I should address, investigation I should do, or things I should explain/document?
+jsbellOn Fri, Aug 5, 2016 at 2:26 PM, Kentaro Hara <har...@chromium.org> wrote:HiBackground: Currently postMessage() is 10x slower in Chrome than in Firefox/Safari because V8 value serialization is slow. Jeremy started working on optimizing the serialization (design doc).On Fri, Aug 5, 2016 at 2:54 AM, Jeremy Roman <jbr...@google.com> wrote:I've now gotten far enough in local prototyping this that I can serialize numbers, strings, objects, arrays, etc. (DOM objects, array buffers, and a few other things are still missing.)Early perf results for the large JSON object from bug 148757: serialization time went from 1521 ms to 404 ms on my Z620 (I haven't done profiling yet to determine whether this can be further reduced, nor have I yet measured on other devices, though I expect similar results).I've responded to some comments on the document linked earlier, but no conclusions have been reached. I'm reluctant to prototype too far ahead of landing (especially without a thumbs-up on the approach).Are there additional concerns I should address, investigation I should do, or things I should explain/document?As far as I understand the comments in the document, I think we need to figure out how to deal with the following points:a) Whether the callback-style V8 API (SerializeHostObject() in the document) is good or not.b) Whether we should invent a new wire format or keep the existing wire format. If we invent a new wire format, how (or whether) to deprecate the existing format from the web.Regarding a), the key would be what the spec is saying. IIUC, the serialization order does matter to throw a DataCloneError per the spec when the serializer encounters something unexpected. If that is the case, we'll need the callback-style API.
I haven't yet found a DOM object whose [[Clone]] operation requires cloning a non-DOM JS object, so the API proposed in the doc would work if we're happy giving up this property. On the other hand, we could:1. Allow the SerializeHostObject to reinvoke the v8::ValueSerializer as part of its serialization work.This means that care must be taken to make the serializer re-entrant, so it somewhat complicates the interface.
2. Don't have Blink serialize directly into the buffer, but instead convert the DOM object into some value V8 can serialize.roughly: v8::MaybeLocal<v8::Value> SerializeHostObject(v8::Local<v8::Object> object);This gives a lot of nice properties: Blink doesn't have to deal with raw byte buffers and have its own logic for writing/reading integers etc. in a portable way, it gets reference deduplication "for free", and dealing with structured objects makes it more natural to add additional properties, etc. in the future without having to do much explicit work for this. e.g. an ImageData might get mapped to {width: w, height: h, data: d}
However, it would almost certainly be slower than having the DOM object's data serialized directly (because of the need to copy it into the V8 heap, and then from there to the serialized buffer (and similar for deserialization) and larger (due to having to write string keys or equivalent).The possible saving grace would be that I expect that #(DOM objects) << #(plain JS objects) in practice, so while there would be an overhead here it might be minor in practice. Clever but cautious use of an external ArrayBuffer could be used to avoid an extra copy of objects which contain a large block of data internally.
3. Some combination of the above (let Blink write its own data, _and_ optionally provide another V8 value to serialize).This would give some advantages of each, at the cost of a larger and more awkward API.Regarding b), Elliott seems to be fine with inventing a new wire format and worry about how to deprecate the existing format later. That sounds reasonable to me.--Kentaro Hara, Tokyo, Japan--Kentaro Hara, Tokyo, Japan
On Sat, Aug 6, 2016 at 12:51 AM, Jeremy Roman <jbr...@chromium.org> wrote:(Apologies; long email follows.)On Fri, Aug 5, 2016 at 1:51 AM, Jeremy Roman <jbr...@chromium.org> wrote:On Fri, Aug 5, 2016 at 1:27 AM, Kentaro Hara <har...@chromium.org> wrote:+jsbellOn Fri, Aug 5, 2016 at 2:26 PM, Kentaro Hara <har...@chromium.org> wrote:HiBackground: Currently postMessage() is 10x slower in Chrome than in Firefox/Safari because V8 value serialization is slow. Jeremy started working on optimizing the serialization (design doc).On Fri, Aug 5, 2016 at 2:54 AM, Jeremy Roman <jbr...@google.com> wrote:I've now gotten far enough in local prototyping this that I can serialize numbers, strings, objects, arrays, etc. (DOM objects, array buffers, and a few other things are still missing.)Early perf results for the large JSON object from bug 148757: serialization time went from 1521 ms to 404 ms on my Z620 (I haven't done profiling yet to determine whether this can be further reduced, nor have I yet measured on other devices, though I expect similar results).I've responded to some comments on the document linked earlier, but no conclusions have been reached. I'm reluctant to prototype too far ahead of landing (especially without a thumbs-up on the approach).Are there additional concerns I should address, investigation I should do, or things I should explain/document?As far as I understand the comments in the document, I think we need to figure out how to deal with the following points:a) Whether the callback-style V8 API (SerializeHostObject() in the document) is good or not.b) Whether we should invent a new wire format or keep the existing wire format. If we invent a new wire format, how (or whether) to deprecate the existing format from the web.Regarding a), the key would be what the spec is saying. IIUC, the serialization order does matter to throw a DataCloneError per the spec when the serializer encounters something unexpected. If that is the case, we'll need the callback-style API.Not only in the case of an exception, but also in the case where a getter modifies some object that has yet to be reached in the structured clone process.e.g. Consider the (silly, but nonetheless…) object{ imageData: new ImageData(1, 1), get x() { this.imageData.data[3] = 255; } }If the data from DOM (Blink) objects isn't serialized (or at least copied) at the correct time, then the x getter will make the pixel opaque black, whereas per spec (and our current implementation) it should be transparent black.On a related note, looking more closely at ImageData leads me to discover that it is a DOM object whose [[Clone]] operation recursively clones another JavaScript object (the Uint8ClampedArray it wraps).Another example is FileList, whose [[Clone]] recursively clones its contained File objects.
Interestingly, neither Chrome nor Firefox correctly handles these per spec. In particular, the spec demands that they be put into the common "memory" (object map) of the cloning operation, so that multiple references to the same object correspond to the same on the other side. However, cases like these:{ imageData: new ImageData(1, 1), get rawData() { return this.imageData.data; } }{ fileList: someFileList, firstFile: someFileList[0] }Produce objects which contain multiple copies of the wrapped objects (i.e. they don't compare equal with ===).Since it apparently doesn't work today, it's clearly not a big issue, but from an author's perspective I think it's reasonable to expect such references to be maintained correctly, just as ones between ordinary JS objects are.If our plan is to make a fundamental performance improvement by inventing a new wire format / APIs (instead of micro-optimizing the existing serializer), I don't think we should rush into a short-term solution; i.e., we should seriously consider how to make the existing broken implementation more correct when designing the new wire format & APIs.I haven't yet found a DOM object whose [[Clone]] operation requires cloning a non-DOM JS object, so the API proposed in the doc would work if we're happy giving up this property. On the other hand, we could:1. Allow the SerializeHostObject to reinvoke the v8::ValueSerializer as part of its serialization work.This means that care must be taken to make the serializer re-entrant, so it somewhat complicates the interface.The serializer must be re-entrant anyway, right? If some getter is overriden by user's script, any arbitrary code may run during the serializer (which may call the serializer again).2. Don't have Blink serialize directly into the buffer, but instead convert the DOM object into some value V8 can serialize.roughly: v8::MaybeLocal<v8::Value> SerializeHostObject(v8::Local<v8::Object> object);This gives a lot of nice properties: Blink doesn't have to deal with raw byte buffers and have its own logic for writing/reading integers etc. in a portable way, it gets reference deduplication "for free", and dealing with structured objects makes it more natural to add additional properties, etc. in the future without having to do much explicit work for this. e.g. an ImageData might get mapped to {width: w, height: h, data: d}I like the idea :) It will also prevent us from implementing a wire-format-dependent logic in both V8 and Blink.However, it would almost certainly be slower than having the DOM object's data serialized directly (because of the need to copy it into the V8 heap, and then from there to the serialized buffer (and similar for deserialization) and larger (due to having to write string keys or equivalent).The possible saving grace would be that I expect that #(DOM objects) << #(plain JS objects) in practice, so while there would be an overhead here it might be minor in practice. Clever but cautious use of an external ArrayBuffer could be used to avoid an extra copy of objects which contain a large block of data internally.jsbell@: Do you have any idea on this?
2. Don't have Blink serialize directly into the buffer, but instead convert the DOM object into some value V8 can serialize.roughly: v8::MaybeLocal<v8::Value> SerializeHostObject(v8::Local<v8::Object> object);This gives a lot of nice properties: Blink doesn't have to deal with raw byte buffers and have its own logic for writing/reading integers etc. in a portable way, it gets reference deduplication "for free", and dealing with structured objects makes it more natural to add additional properties, etc. in the future without having to do much explicit work for this. e.g. an ImageData might get mapped to {width: w, height: h, data: d}I like the idea :) It will also prevent us from implementing a wire-format-dependent logic in both V8 and Blink.However, it would almost certainly be slower than having the DOM object's data serialized directly (because of the need to copy it into the V8 heap, and then from there to the serialized buffer (and similar for deserialization) and larger (due to having to write string keys or equivalent).The possible saving grace would be that I expect that #(DOM objects) << #(plain JS objects) in practice, so while there would be an overhead here it might be minor in practice. Clever but cautious use of an external ArrayBuffer could be used to avoid an extra copy of objects which contain a large block of data internally.jsbell@: Do you have any idea on this?As noted in a comment in the doc: I agree with this intuition but don't have actual data. (We don't appear to have histograms on # records written vs. # files/blobs written for Indexed DB, but could collect that easily since we have to stash files/blobs on the side.)Anecdotally, high-performance apps using IDB use JSON.serialize to "work around" Chrome's slow serialization, and one can therefore assume they're mostly working with plain JS data. Also, at least for files/blobs (which I suspect are the most common DOM objects stored in IDB), the serialization time will be << the actual file I/O. (So #2 doesn't seem problematic)I don't have a sense of the use pattern for non-IDB usage, e.g. how common it is to postMessage ImageData (or other cloneable types) to/from a Worker for processing.3. Some combination of the above (let Blink write its own data, _and_ optionally provide another V8 value to serialize).This would give some advantages of each, at the cost of a larger and more awkward API.Regarding b), Elliott seems to be fine with inventing a new wire format and worry about how to deprecate the existing format later. That sounds reasonable to me.--Kentaro Hara, Tokyo, Japan--Kentaro Hara, Tokyo, Japan--Kentaro Hara, Tokyo, Japan
--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAD649j7DFg5RCAG7WuSG1J%2BxfHOvBQSMYoHYSc5U0cx6L6TFpg%40mail.gmail.com.
Anecdotally, high-performance apps using IDB use JSON.serialize to "work around" Chrome's slow serialization, and one can therefore assume they're mostly working with plain JS data. Also, at least for files/blobs (which I suspect are the most common DOM objects stored in IDB), the serialization time will be << the actual file I/O. (So #2 doesn't seem problematic)
I don't have a sense of the use pattern for non-IDB usage, e.g. how common it is to postMessage ImageData (or other cloneable types) to/from a Worker for processing.
Anecdotally, high-performance apps using IDB use JSON.serialize to "work around" Chrome's slow serialization, and one can therefore assume they're mostly working with plain JS data. Also, at least for files/blobs (which I suspect are the most common DOM objects stored in IDB), the serialization time will be << the actual file I/O. (So #2 doesn't seem problematic)I don't have a sense of the use pattern for non-IDB usage, e.g. how common it is to postMessage ImageData (or other cloneable types) to/from a Worker for processing.It would be important to clarify this point. What benchmarks (use cases) are you trying to optimize?As I mentioned in the other thread, postMessage() is 10x slower in Chrome than in Safari, but I don't know how problematic the slowness is in real-world scenarios.
On Tue, Aug 9, 2016 at 9:09 AM, Kentaro Hara <har...@chromium.org> wrote:Anecdotally, high-performance apps using IDB use JSON.serialize to "work around" Chrome's slow serialization, and one can therefore assume they're mostly working with plain JS data. Also, at least for files/blobs (which I suspect are the most common DOM objects stored in IDB), the serialization time will be << the actual file I/O. (So #2 doesn't seem problematic)I don't have a sense of the use pattern for non-IDB usage, e.g. how common it is to postMessage ImageData (or other cloneable types) to/from a Worker for processing.It would be important to clarify this point. What benchmarks (use cases) are you trying to optimize?As I mentioned in the other thread, postMessage() is 10x slower in Chrome than in Safari, but I don't know how problematic the slowness is in real-world scenarios.I raised this topic in an arch-leads meeting and Elliott made a good point: Even if postMessage() is not a bottleneck of IndexedDB operations, it's worth optimizing because the serialization is causing a big jank on the main thread.But I still think that it's important to clarify what benchmarks we should optimize.
--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAO9Q3i%2BWzGw4W9eo7pr00g7Hoo5h3r0%3DAGbPOxcT00_tyvxkcQ%40mail.gmail.com.
We have the existing Indexed DB perf tests in src/chrome/test/data/indexeddb/ - some of the subtests showed an improvement on the bots with peria@'s recent changes.Live test linked from this issue too: https://bugs.chromium.org/p/chromium/issues/detail?id=536620The real-world scenario where this matters is database initialization; not gmail loading 60 records, but when it gets 10000 or 100000 records from the server and dumps them all into the DB within one turn of the event loop, stalling the DOM. There's a mitigation - breaking up the insertion work to span multiple turns - but you still get stutter.On Mon, Aug 8, 2016 at 11:11 PM, Elliott Sprehn <esp...@chromium.org> wrote:On Mon, Aug 8, 2016 at 10:27 PM, Kentaro Hara <har...@chromium.org> wrote:On Tue, Aug 9, 2016 at 9:09 AM, Kentaro Hara <har...@chromium.org> wrote:Anecdotally, high-performance apps using IDB use JSON.serialize to "work around" Chrome's slow serialization, and one can therefore assume they're mostly working with plain JS data. Also, at least for files/blobs (which I suspect are the most common DOM objects stored in IDB), the serialization time will be << the actual file I/O. (So #2 doesn't seem problematic)I don't have a sense of the use pattern for non-IDB usage, e.g. how common it is to postMessage ImageData (or other cloneable types) to/from a Worker for processing.It would be important to clarify this point. What benchmarks (use cases) are you trying to optimize?As I mentioned in the other thread, postMessage() is 10x slower in Chrome than in Safari, but I don't know how problematic the slowness is in real-world scenarios.I raised this topic in an arch-leads meeting and Elliott made a good point: Even if postMessage() is not a bottleneck of IndexedDB operations, it's worth optimizing because the serialization is causing a big jank on the main thread.But I still think that it's important to clarify what benchmarks we should optimize.I'd say we should just build some simple benchmarks, for example:If we assume that an app like Gmail was saving and loading the inbox through IndexedDB or inside a Worker and postMessaging the data to the mail thread that'd be N rows of emails which are (star, date, array of recipients, array of labels, subject, reply count, ...). My Gmail has 60 rows visible on page load, so N=60 here. How fast can we do that? What about on a phone?
Remember that the main thread realistically has about 6ms to do all of the work to avoid jank during an A period in RAIL, so if that took 2ms and this patch makes it take 1ms, that's giving you 16% more time on the main thread to do other things which is pretty huge. :)fwiw there's a bunch of code health improvements here too, and the 3.5x is a very large improvement. I think the only question to really understand here is if we should make this backwards compatible with the old format or try to create the new one. Not having to reserve a bunch of enum values in the v8 API isn't very compelling. Is the new format faster? How much less space does it use? Using a new format is added risk so it'd be nice to understand what we get in return (though jsbell@ was happy about it, and his team is the primary one effected, so this is more about diligence than go/no-go :).
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsubsc...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
One suggestion would be to put the old format in v8 under some option like kWeirdAncientWebKitSerializationFormatProbablyNotWhatYouWant (literally something awful like that). That'll discourage anyone else using the v8 API from using it. Then I'd start implementing the new serialization format under something like kStandardFormat. Once it works we can try switching blink over to it and deleting the old mode.That separates the risk of rewriting the serializer/deserializer from the risk of creating a new format.I do still think we should try doing both, it doesn't seem like too much work here. I don't think we should be worried about some churn in the v8 API, we've done all kinds of things there before.
I'm liking this plan!One worry comes to mind: once the old serialization format appears in v8 (initially as the only format), are there likely to be non-Chrome users of v8 (e.g. some node.js binding to a database) that will make it more difficult to slim down/remove in the future as we wean Chrome off of it?We might mitigate that somewhat by surrounding the API with big scary warnings and ugly names (like Elliott suggests) even while it is the only supported format, until we have a format that we really are committing to maintaining indefinitely.
Another idea to throw out: at the point where we have a new wire format, we can immediately switch non-persistent uses in Blink over exclusively. Only the persistent uses (i.e. IndexedDB) need access to the legacy decode. We could move all of the legacy decode logic back into Blink; it would be slower (i.e. same perf as today) for values persisted before the switch but still fast for fresh data and it keeps v8 cleaner. Blink would need to maintain knowledge of the version signature.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To post to this group, send email to platform-arc...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAO9Q3i%2BWzGw4W9eo7pr00g7Hoo5h3r0%3DAGbPOxcT00_tyvxkcQ%40mail.gmail.com.
--Kentaro Hara, Tokyo, Japan
--Kentaro Hara, Tokyo, Japan
--Kentaro Hara, Tokyo, Japan
--Kentaro Hara, Tokyo, Japan
Camillo Bruni | | Software Engineer, V8 | | Google Germany GmbH | | Erika-Mann Str. 33, 80636 München |