Enabling window functions in third_party/sqlite

12 views
Skip to first unread message

Mikhail Khokhlov

unread,
Feb 26, 2020, 12:43:07 PM2/26/20
to stora...@chromium.org, Sami Kyostila, Eric Seckler, Stephen Nusko
Hi storage-dev team,

I work in the Chrometto team, focused on analyzing Chrome traces to improve the browser performance. We build and use a tool called trace_processor_shell that uses sqlite to provide SQL interface for computing metrics on traces. These metrics often require SQL features that are disabled in third_party/sqlite, like e.g. window functions. So we'd like to ask you for advice on how to make them available.

The simplest solution would be to enable them Chrome-wide, but this results in a binary size increase of 16 KB on Android (this is for window functions only; data obtained from this experimental CL [1]).

Another option is to create a separate sqlite target pointing to a separate source (amalgamated with different flags), that will have all the features enabled.

Which do you think is a better idea? Or is there any other way to do it?

Daniel Murphy

unread,
Feb 26, 2020, 12:56:24 PM2/26/20
to Mikhail Khokhlov, Victor Costan, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko
+Victor Costan might be the best person to talk to here.

--
To unsubscribe from this group and stop receiving emails from it, send an email to storage-dev...@chromium.org.

Victor Costan

unread,
Feb 26, 2020, 1:48:11 PM2/26/20
to Mikhail Khokhlov, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko
On Wed, Feb 26, 2020 at 9:43 AM 'Mikhail Khokhlov' via storage-dev <stora...@chromium.org> wrote:
Hi storage-dev team,

I work in the Chrometto team, focused on analyzing Chrome traces to improve the browser performance. We build and use a tool called trace_processor_shell that uses sqlite to provide SQL interface for computing metrics on traces. These metrics often require SQL features that are disabled in third_party/sqlite, like e.g. window functions. So we'd like to ask you for advice on how to make them available.

The simplest solution would be to enable them Chrome-wide, but this results in a binary size increase of 16 KB on Android (this is for window functions only; data obtained from this experimental CL [1]).

The binary increase isn't a concern if you have a good use case. I'm not opposed to enabling the features you need, once we work through them. Do you have a full list of the flags you'd like us to consider?

Here are some factors for reasoning through a decision.

1) Window functions were added in SQLite 3.25.0, and appear to be a source of bugs. Search the release notes for "window function".

I expect that once we'll enable the window functions in SQLite, our fuzzers will find problems. We can mitigate this by landing the CL enabling features earlier in a release cycle. We have ~2 weeks until M82 branches, so we should either do it this week, or wait until after the branch point.

2) SQLite is Web-exposed via WebSQL. Once we enable a feature, disabling it is really difficult or nearly impossible.

This isn't an issue if your use case is likely to be permanent. However, if you'll stop needing window functions in a year (for example), the project will still be paying the price to support them.

3) Two SQLite targets would require two copies of the amalgamation code in the tree. This increases maintenance cost, but could make things nicer for developers.

We have SQLite's amalgamation checked into git, because their build process requires Tcl, and Eng Review doesn't like the idea of adding yet another scripting language to Chromium checkouts.

Enabling window functions requires a different amalgamation build, because the generated parser code would change. This would add some ongoing maintenance cost.

At the same time, having a second developer-centered amalgamation build would be beneficial. For example, we disabled EXPLAIN and EXPLAIN QUERY PLAN, which would be useful while iterating on SQL queries.

A middle-of-the-road solution would be to enable the features you need in the amalgamation, but disable them at build time. I'm pretty sure this isn't supported, so we'd have to assess stability, aside from the impact on binary size. If SQLite compiles, doesn't crash while parsing disabled queries, and the binary overhead is OK, this would let us do two SQLite targets (and hopefully two shells) without an additional amalgamation.

4) Consider the option of using the system SQLite.

//third_party/sqlite and the //sql layer on top of it are designed for Chrome's production needs. Given that your tool targets developers, you may be able to get away with assuming that their systems have a recently up-to-date SQLite library installed, and build against that library.

This does create a bunch of work for your team, and may create some support burden. We should try to get some sense about the cost of this option, and balance it with the ongoing maintenance costs of the other alternatives.


I hope these general thoughts help. Happy to continue discussing here, in #storage on chromium's slack, or in a meeting! Please reach out on the google.com thread fork if you'd like to set up a time.

Thank you!
      Victor

PhistucK

unread,
Feb 26, 2020, 2:16:44 PM2/26/20
to Victor Costan, Mikhail Khokhlov, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko
2) SQLite is Web-exposed via WebSQL. Once we enable a feature, disabling it is really difficult or nearly impossible.

> This isn't an issue if your use case is likely to be permanent. However, if you'll stop needing window functions in a year (for example), the project will still be paying the price to support them.

Since the WebSQL API is unofficially deprecated, it would be good not to add even more features to it, no matter how much the project needs them. Unless you find a way to exclude them from WebSQL, adding them would be bad and send the wrong message to web developers, in my opinion.


PhistucK


--

Victor Costan

unread,
Feb 26, 2020, 3:43:30 PM2/26/20
to PhistucK, Mikhail Khokhlov, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko
Long message below. TL;DR: given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

On Wed, Feb 26, 2020 at 11:16 AM PhistucK <phis...@gmail.com> wrote:
2) SQLite is Web-exposed via WebSQL. Once we enable a feature, disabling it is really difficult or nearly impossible.

> This isn't an issue if your use case is likely to be permanent. However, if you'll stop needing window functions in a year (for example), the project will still be paying the price to support them.

Since the WebSQL API is unofficially deprecated, it would be good not to add even more features to it, no matter how much the project needs them. Unless you find a way to exclude them from WebSQL, adding them would be bad and send the wrong message to web developers, in my opinion.

This is a very good point, thank you for making it!

I agree that it'd be ideal to freeze the SQLite dialect exposed to WebSQL. Pragmatically though, we're not going to hold the project back to cater to WebSQL users.

For instance, we'll keep updating SQLite, to mitigate security vulnerabilities. Updates bring new features, some of which cannot be disabled at build time. One such feature is generated columns, introduced in 3.31, which is now exposed to WebSQL. Also, even if new features can be disabled at build time or via DatabaseAuthorizer, the accelerated timelines imposed by some security vulnerabilities might not give us enough time to identify and disable newly introduced features.

In my opinion, this is the main reason why WebSQL wasn't sustainable as a standard. Building a thin JS binding over SQLite was brilliantly pragmatic, but left the browsers stuck between the letter of the standard, which requires shipping SQLite 3.6.19, and the reality of the fact that SQLite needs to be constantly kept up to date for security reasons.

So, given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

As an aside, SQLite was not designed to execute untrusted queries, which is what we do when we allow Web applications to talk to our implementation in Blink. We are extremely fortunate and grateful that the SQLite authors agreed to support our use case, and that they're developing patches and in-depth defenses for this use case!

    Victor

PhistucK

unread,
Feb 26, 2020, 5:44:08 PM2/26/20
to Victor Costan, storage-dev, blink-api-owners-discuss
Adding blink-api-owners-discuss to the discussion, as this means that over the years, new features were introduced to the web platform, without intent threads or public service announcements.
(I understand the sensitivity, but I still prefer to bring the API owners into the public loop and get their opinion/ways to address this. Maybe WebSQL should just be removed sooner, for example)

PhistucK

Mikhail Khokhlov

unread,
Feb 27, 2020, 9:13:02 AM2/27/20
to Victor Costan, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko
Thanks Victor for the detailed answer!

From what I've learned, it seems that having two SQLite targets is the best approach here.

Our project is under active development, so we can't make a complete list of required SQL features yet. So far it's only window functions, but we may need other features later. Neither can we guarantee that we'll need them forever. So making an irreversible change doesn't sound good to me in this case.

We can't use system SQLite since our tool is also being built on Chromium CI.

So splitting targets is the only option left. How comfortable are you with the increased maintenance cost that goes with it?

Conrad Irwin

unread,
Feb 27, 2020, 9:48:14 AM2/27/20
to Mikhail Khokhlov, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko, Victor Costan
As a heavy user of the SQLite that ships with Chromium, I can say that the new behaviour of shipping upgraded versions is strictly better than the standardized "stick with version X". (And anecdotally, we've spent less time debugging SQLite related changes than many of the other Chrome changes that have broken Superhuman.)

It doesn't seem like an issue to me, to continue to tweak the exact flags that SQLite is compiled with (indeed it seems a lot less of an issue than two SQLite builds!), though it's unlikely that we would adopt new features until well after they've propagated to Brave and Electron.

Conrad

P.S.

I still maintain that having a standardized way to access a SQL database is good for the web. Cross-engine SQL support is a well solved problem in existing javascript frameworks, so I think that a new standard would not need to prescribe only SQLite, and definitely would not lock it to an old version.

I think this because SQL is powerful, widely adopted outside the web, existing SQL implementations are mature and fast, and all browsers already ship with one for internal use (We use WebSQL at Superhuman because it is orders of magnitude faster than IndexedDB for non-trivial queries).

I am excited for the WASM block-storage APIs, so that we could ship our own SQLite compiled to WASM. But in my mind that's a very clear second (slower and larger download size) to having access to this as a first class citizen.

P.S. Superhuman is hiring — referral bonus for Full Stack Engineers: $1,947.


On Thu, Feb 27, 2020 at 2:12 PM, Mikhail Khokhlov <khok...@google.com> wrote:
Thanks Victor for the detailed answer!

From what I've learned, it seems that having two SQLite targets is the best approach here.

Our project is under active development, so we can't make a complete list of required SQL features yet. So far it's only window functions, but we may need other features later. Neither can we guarantee that we'll need them forever. So making an irreversible change doesn't sound good to me in this case.

We can't use system SQLite since our tool is also being built on Chromium CI.

So splitting targets is the only option left. How comfortable are you with the increased maintenance cost that goes with it?

On Wed, Feb 26, 2020 at 8:43 PM Victor Costan <pwnall@chromium.org> wrote:
Long message below. TL;DR: given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

On Wed, Feb 26, 2020 at 11:16 AM PhistucK <phistuck@gmail.com> wrote:
2) SQLite is Web-exposed via WebSQL. Once we enable a feature, disabling it is really difficult or nearly impossible.

> This isn't an issue if your use case is likely to be permanent. However, if you'll stop needing window functions in a year (for example), the project will still be paying the price to support them.

Since the WebSQL API is unofficially deprecated, it would be good not to add even more features to it, no matter how much the project needs them. Unless you find a way to exclude them from WebSQL, adding them would be bad and send the wrong message to web developers, in my opinion.

This is a very good point, thank you for making it!

I agree that it'd be ideal to freeze the SQLite dialect exposed to WebSQL. Pragmatically though, we're not going to hold the project back to cater to WebSQL users.

For instance, we'll keep updating SQLite, to mitigate security vulnerabilities. Updates bring new features, some of which cannot be disabled at build time. One such feature is generated columns, introduced in 3.31, which is now exposed to WebSQL. Also, even if new features can be disabled at build time or via DatabaseAuthorizer, the accelerated timelines imposed by some security vulnerabilities might not give us enough time to identify and disable newly introduced features.

In my opinion, this is the main reason why WebSQL wasn't sustainable as a standard. Building a thin JS binding over SQLite was brilliantly pragmatic, but left the browsers stuck between the letter of the standard, which requires shipping SQLite 3.6.19, and the reality of the fact that SQLite needs to be constantly kept up to date for security reasons.

So, given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

As an aside, SQLite was not designed to execute untrusted queries, which is what we do when we allow Web applications to talk to our implementation in Blink. We are extremely fortunate and grateful that the SQLite authors agreed to support our use case, and that they're developing patches and in-depth defenses for this use case!

    Victor
 


PhistucK


On Wed, Feb 26, 2020 at 8:48 PM Victor Costan <pwnall@chromium.org> wrote:
To unsubscribe from this group and stop receiving emails from it, send an email to storage-dev+unsubscribe@chromium.org.

--
To unsubscribe from this group and stop receiving emails from it, send an email to storage-dev+unsubscribe@chromium.org.

Chris Mumford

unread,
Mar 5, 2020, 5:53:30 PM3/5/20
to Conrad Irwin, Mikhail Khokhlov, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko, Victor Costan
I had a few followup questions:
  1. Will the SQLite library which will be built with this new amalgamation be only for tools, or is the intention that this be shipped as part of Chrome?
  2. If the answer to #1 is "tools only", then is there any problem installing Tcl on those build/dev machines? Because if that's not a problem then it's easy to generate your own amalgamation at build time.

On Thu, Feb 27, 2020 at 6:48 AM 'Conrad Irwin' via storage-dev <stora...@chromium.org> wrote:
As a heavy user of the SQLite that ships with Chromium, I can say that the new behaviour of shipping upgraded versions is strictly better than the standardized "stick with version X". (And anecdotally, we've spent less time debugging SQLite related changes than many of the other Chrome changes that have broken Superhuman.)

It doesn't seem like an issue to me, to continue to tweak the exact flags that SQLite is compiled with (indeed it seems a lot less of an issue than two SQLite builds!), though it's unlikely that we would adopt new features until well after they've propagated to Brave and Electron.

Conrad

P.S.

I still maintain that having a standardized way to access a SQL database is good for the web. Cross-engine SQL support is a well solved problem in existing javascript frameworks, so I think that a new standard would not need to prescribe only SQLite, and definitely would not lock it to an old version.

I think this because SQL is powerful, widely adopted outside the web, existing SQL implementations are mature and fast, and all browsers already ship with one for internal use (We use WebSQL at Superhuman because it is orders of magnitude faster than IndexedDB for non-trivial queries).

I am excited for the WASM block-storage APIs, so that we could ship our own SQLite compiled to WASM. But in my mind that's a very clear second (slower and larger download size) to having access to this as a first class citizen.

P.S. Superhuman is hiring — referral bonus for Full Stack Engineers: $1,947.


On Thu, Feb 27, 2020 at 2:12 PM, Mikhail Khokhlov <khok...@google.com> wrote:
Thanks Victor for the detailed answer!

From what I've learned, it seems that having two SQLite targets is the best approach here.

Our project is under active development, so we can't make a complete list of required SQL features yet. So far it's only window functions, but we may need other features later. Neither can we guarantee that we'll need them forever. So making an irreversible change doesn't sound good to me in this case.

We can't use system SQLite since our tool is also being built on Chromium CI.

So splitting targets is the only option left. How comfortable are you with the increased maintenance cost that goes with it?

On Wed, Feb 26, 2020 at 8:43 PM Victor Costan <pwn...@chromium.org> wrote:
Long message below. TL;DR: given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

On Wed, Feb 26, 2020 at 11:16 AM PhistucK <phis...@gmail.com> wrote:
2) SQLite is Web-exposed via WebSQL. Once we enable a feature, disabling it is really difficult or nearly impossible.

> This isn't an issue if your use case is likely to be permanent. However, if you'll stop needing window functions in a year (for example), the project will still be paying the price to support them.

Since the WebSQL API is unofficially deprecated, it would be good not to add even more features to it, no matter how much the project needs them. Unless you find a way to exclude them from WebSQL, adding them would be bad and send the wrong message to web developers, in my opinion.

This is a very good point, thank you for making it!

I agree that it'd be ideal to freeze the SQLite dialect exposed to WebSQL. Pragmatically though, we're not going to hold the project back to cater to WebSQL users.

For instance, we'll keep updating SQLite, to mitigate security vulnerabilities. Updates bring new features, some of which cannot be disabled at build time. One such feature is generated columns, introduced in 3.31, which is now exposed to WebSQL. Also, even if new features can be disabled at build time or via DatabaseAuthorizer, the accelerated timelines imposed by some security vulnerabilities might not give us enough time to identify and disable newly introduced features.

In my opinion, this is the main reason why WebSQL wasn't sustainable as a standard. Building a thin JS binding over SQLite was brilliantly pragmatic, but left the browsers stuck between the letter of the standard, which requires shipping SQLite 3.6.19, and the reality of the fact that SQLite needs to be constantly kept up to date for security reasons.

So, given that we're already (unintentionally) exposing new SQLite features to WebSQL, I don't think we should let this concern block Chrometto's performance work.

As an aside, SQLite was not designed to execute untrusted queries, which is what we do when we allow Web applications to talk to our implementation in Blink. We are extremely fortunate and grateful that the SQLite authors agreed to support our use case, and that they're developing patches and in-depth defenses for this use case!

    Victor
 


PhistucK


On Wed, Feb 26, 2020 at 8:48 PM Victor Costan <pwn...@chromium.org> wrote:
To unsubscribe from this group and stop receiving emails from it, send an email to storage-dev...@chromium.org.

--
To unsubscribe from this group and stop receiving emails from it, send an email to storage-dev...@chromium.org.

Chris Mumford

unread,
Mar 10, 2020, 11:16:31 AM3/10/20
to Conrad Irwin, Mikhail Khokhlov, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko, Victor Costan
bumping this thread for answers to the two last questions.

Mikhail Khokhlov

unread,
Mar 11, 2020, 9:19:39 AM3/11/20
to Chris Mumford, Conrad Irwin, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko, Victor Costan
Hi Chris,

I'm currently exploring different options to enable window functions for our tools. The option that looks most reasonable to me, is to use the new amalgamation for tools only, without exposing new features in Chrome. So the answer to question 1 is most probably "tools only", although the storage team might want to comment further on this.

As for the second question, it's an interesting suggestion, thanks! The main problem that I can see so far is that the current amalgamation-generation machinery doesn't support Windows, but we need the tools in question on all CI bots, including Windows ones.

Chris Mumford

unread,
Mar 11, 2020, 12:06:46 PM3/11/20
to Mikhail Khokhlov, Conrad Irwin, PhistucK, storage-dev, Sami Kyostila, Eric Seckler, Stephen Nusko, Victor Costan
Thx Mikhail:

Regarding #2, if it needs to ship with Chrome then for sure the second amalgamation needs to live in the chromium sqlite repo. If it's tools only then one option is to generate it on Linux, and commit it to the tools repo - basically what we do today on Chrome.

As an aside, it may be possible to build the amalgamation on windows - if our build machines have the correct prerequisites. However I think this is moot as the main script needs Tcl and I got pretty strong pushback from Chrome Eng Review on adding that as a dependency.

Let me know when you're ready to move forward and I can help if any changes are needed to our sqlite repo.
Reply all
Reply to author
Forward
0 new messages