> Disclaimer: I am a Qwik core maintainer. We are focused on web performance and we're looking for better ways to measure the Qwik performance gains compared to other web frameworks.
For web apps relying on JavaScript for interactivity, a page can appear fully rendered and visually complete yet be unable to respond to user input until the code has finished downloading. This applies to all current reactive frameworks such as Angular, React, Vue, Svelte, Solid and even Qwik (although to a much lesser extent).
Interactions are frequently lost or delayed by a blocking download, but among the current Core Web Vitals metrics, only INP measures interactivity and it does not account for network delays. I believe it would be a great addition for Core Web Vitals and the web to have a metric that measures download delays in a stable and accurate way. I know this is a long shot so I'm hoping to get the ball rolling :).
This issue (1) describes the context we have from the Qwik framework side and what we are currently measuring, and (2) discusses the issues I see in the current and deprecated Google interactivity metrics regarding network delays (including INP, but also TBT, FID and TTI).
## Framework Context
### Qwik?
Qwik is a new reactive web framework similar to Angular, React, Vue, Svelte or Solid. Its main innovation is what we call "JavaScript streaming": the other reactive web frameworks must execute, and therefore download, all of the JavaScript code present or visible on a page in order to become interactive. Qwik instead has the unique ability to buffer JavaScript code bit by bit, prioritize preloading and executing code in response to user events, and continue preloading the rest of the code when idle.
You can think of it like how video streaming compares to downloading. Instead of having to wait for the download to complete before being able to press play, users can most often start the video right away, and they can jump to other parts of the video, which will resume instantly if the packets have already been buffered or after a short delay if not. The difference from video is that JavaScript streaming is about avoiding code execution, and therefore the associated downloads.
Historically, reactive web frameworks were designed for CSR (Client Side Rendering). Only later did they introduce SSR (Server Side Rendering) to show content to the user sooner. They achieved this through a process called hydration, where the server generates and sends the HTML to the client, and then the client uses JavaScript to regenerate the tree and attach event listeners so that a given page/section becomes interactive. The problem with this approach is that because the framework uses JavaScript to attach event listeners, it still has to execute all of the code for that page/section and therefore must also download all of it.
### Manual testing
In manual local tests between Qwik, React, Vue, Svelte and Solid apps, on Firefox with 3G throttling and the CPU calibrated to low-end device throttling, on fairly similar applications, we measure input delays of only ~3s in Qwik, vs ~10s to ~20s for the others (depending on the implementation, lazy-loading, etc.).
Those numbers already make for good demos, but mindful developers and lead engineers considering Qwik might be wary of such manual throttling measurements. For example, 3G throttling in Chrome adds a 2s latency penalty on each network request, which inflates the Qwik numbers to ~6s total because of its small-bundle streaming architecture, arguably more than is the case in reality. This is why we prefer to use Firefox's 3G throttling, as its 100ms latency yields more accurate results for Qwik in our experience.
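
To illustrate why a per-request latency penalty weighs so heavily on an architecture that streams small bundles, here is a back-of-the-envelope model. Only the 2s and 100ms latencies come from the paragraph above; the bandwidth, the number of sequential requests and the payload size are made-up assumptions, not measurements from Qwik or from either browser's throttling presets.

```typescript
// Rough model: every sequential round trip pays the added latency once,
// then the payload is transferred at the throttled bandwidth.
interface ThrottleProfile {
  latencyMs: number;      // latency added to each request
  bytesPerSecond: number; // throttled download bandwidth (assumption)
}

function estimateDelayMs(
  profile: ThrottleProfile,
  sequentialRequests: number, // requests that must complete one after another
  totalBytes: number,
): number {
  const latencyCost = sequentialRequests * profile.latencyMs;
  const transferCost = (totalBytes / profile.bytesPerSecond) * 1000;
  return latencyCost + transferCost;
}

// Hypothetical profiles: identical bandwidth, different per-request latency.
const highLatency: ThrottleProfile = { latencyMs: 2000, bytesPerSecond: 50000 };
const lowLatency: ThrottleProfile = { latencyMs: 100, bytesPerSecond: 50000 };

// Hypothetical workload: 2 sequential requests totalling 100 kB.
console.log(estimateDelayMs(highLatency, 2, 100000)); // 6000
console.log(estimateDelayMs(lowLatency, 2, 100000));  // 2200
```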
Because of the 2s per-request latency penalty of Chrome's 3G throttling, we spent a fair amount of time looking into inlining the Qwik preloading logic or putting it into a worker instead of keeping it as a separate module. This would make for better Chrome demos, but we now understand that it is unclear whether this would actually benefit end users. This is the kind of optimization decision we could only make with a stable and accurate field metric. The answer might be that it depends on the type of application, but a field metric would at least help engineers make the right choice.
## State of the art of network delays measurement
Among INP, FID, TBT and TTI, TTI is currently the best metric we have at our disposal to measure network delays. The problem is that it is not only deprecated, but also too sensitive to outlier network requests and long tasks to be reliably measured in the lab, let alone in the field.
Even though INP, TBT and FID measure blocking delays, none of them effectively takes network delays into account.
For testing the metrics in real world conditions, I use the following links:
- https://qwikui.com/docs/styled/accordion/
- https://ui.shadcn.com/docs/components/radix/accordion
- https://shadcn-vue.com/docs/components/accordion
- https://www.shadcn-svelte.com/docs/components/accordion
- https://www.solid-ui.com/docs/components/accordion

Although optimization strategies might differ, those 5 libraries are all heavily inspired by ui.shadcn.com and therefore are somewhat comparable to one another.
### INP
As a CWV, INP does a fairly good job of tracking CPU delays and appears to be quite stable across the board, but it is pretty much blind to network delays and also seems to mis-report certain CPU delays.
The issues:
- INP cannot meaningfully track user events until event listeners are attached, which only happens once the hydration process has completed. If a user clicks pre-hydration, INP simply reports a good ~10-20ms value, even though nothing meaningful happens from the user's perspective. If the user clicks the same element again post-hydration, the event listener is attached by then and INP records the real CPU delay of the now-downloaded code. Notice that the unresponsiveness of the first click is never reflected in INP, even though the user did experience it. This is easily reproducible in the performance tab with https://ui.shadcn.com/docs/components/radix/accordion.
- Even when event listeners are attached, INP can easily be fooled by in-flight network requests. In Qwik, while preloads are still ongoing, event listeners on the HTML trigger a few small scripts that preload the code for the user's events with priority and replay those events once the code is ready. Those are the 2-3s delays we experience on 3G throttling, but INP will report a delay of ~80ms. This is easily reproducible in the performance tab with https://qwikui.com/docs/styled/accordion/.
- When a user clicks during the hydration execution phase, which induces some long blocking tasks, INP will report a slower value as a result. On big apps it is not uncommon for this execution phase to last a few seconds, which users on good networks but low-end devices may very well encounter. This can be reproduced with https://ui.shadcn.com/docs/components/radix/accordion using low-end device calibration, comparing 3G throttling vs no network throttling and clicking repeatedly.
- In the performance tab, the hydration CPU execution phase might run for much longer than what INP reports, even when clicking repeatedly as soon as the element is visible. Here's a screenshot where hydration takes roughly 5s of scripting but INP only reports 2s:
Considering all those issues, I believe it is fair to say that INP is not representative of the real user experience regarding interactivity and responsiveness.
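
To make the gap described in the list above concrete, here is a minimal sketch that contrasts a next-paint-based measurement with what the user actually waits for. The `ClickTrace` shape and its field names are hypothetical and not part of any browser API; the sample values simply mirror the ~80ms reported vs 2-3s experienced case from the Qwik bullet.

```typescript
// Hypothetical trace of a single click, all timestamps in ms since navigation start.
interface ClickTrace {
  inputTime: number;           // when the user clicked
  nextPaintTime: number;       // first frame painted after the click
  responseVisibleTime: number; // frame where the UI actually reacts to the click
}

// What a next-paint-based measurement sees.
function reportedDelayMs(trace: ClickTrace): number {
  return trace.nextPaintTime - trace.inputTime;
}

// What the user waits for, including any download that had to finish first.
function perceivedDelayMs(trace: ClickTrace): number {
  return trace.responseVisibleTime - trace.inputTime;
}

// A click that lands while the handler's code is still downloading.
const clickDuringPreload: ClickTrace = {
  inputTime: 1000,
  nextPaintTime: 1080,
  responseVisibleTime: 3500,
};

console.log(reportedDelayMs(clickDuringPreload));  // 80
console.log(perceivedDelayMs(clickDuringPreload)); // 2500
```

The difference between the two values is the network delay that currently goes unmeasured.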
### FID
FID also required event listeners to be attached in order to be recorded, so I take it that the same issues as for INP apply.
### TBT
TBT only measures excess CPU delays over long tasks. It clearly does not measure network delays.
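
As a reminder of why that is, here is a simplified sketch of the TBT calculation: it sums the portion of each main-thread task that exceeds 50ms. The task list is made up, and the sketch assumes every task already falls inside the measurement window (TBT is normally computed between FCP and TTI).

```typescript
interface MainThreadTask {
  startMs: number;
  durationMs: number;
}

const LONG_TASK_THRESHOLD_MS = 50;

// Sum of the time each task spends beyond the 50ms long-task threshold.
function totalBlockingTimeMs(tasks: MainThreadTask[]): number {
  return tasks.reduce(
    (sum, task) => sum + Math.max(0, task.durationMs - LONG_TASK_THRESHOLD_MS),
    0,
  );
}

// Hypothetical trace with a 3s network wait between the second and third task.
const trace: MainThreadTask[] = [
  { startMs: 1000, durationMs: 30 },  // short task, contributes 0
  { startMs: 1100, durationMs: 120 }, // contributes 70
  // main thread idle here while a download is in flight: contributes nothing
  { startMs: 4220, durationMs: 250 }, // contributes 200
];

console.log(totalBlockingTimeMs(trace)); // 270
```

The idle gap while the download is in flight never enters the sum, no matter how long it lasts.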
### TTI
Because TTI does not rely on user inputs to measure interactivity delays, it **can** detect long delays caused by the first long task and network downloads that preceded it.
The issues:
- Outlier network requests can prevent it from working in the field.
- The 50ms long task threshold is arbitrary: there might be blocking downloads, but TTI will be as fast as FCP as long as no long tasks are recorded.
TTI is therefore not a stable metric and cannot be used in the field. It is nevertheless the best metric we currently have at our disposal to take network delays into account.
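
Both issues can be seen in a simplified sketch of the TTI definition as I understand it: starting at FCP, look for a 5s quiet window with no long tasks and at most two in-flight requests, then report the end of the last long task before that window (or FCP if there is none). The `traceEndMs` parameter and the sample traces are assumptions added for illustration; this is not Lighthouse's actual implementation.

```typescript
interface Interval {
  start: number; // ms
  end: number;   // ms
}

const QUIET_WINDOW_MS = 5000;
const MAX_IN_FLIGHT_REQUESTS = 2;

function inFlightAt(requests: Interval[], t: number): number {
  return requests.filter((r) => r.start <= t && t < r.end).length;
}

// A window is quiet if no long task overlaps it and concurrency never exceeds 2.
function isQuiet(start: number, longTasks: Interval[], requests: Interval[]): boolean {
  const end = start + QUIET_WINDOW_MS;
  if (longTasks.some((task) => task.start < end && task.end > start)) return false;
  // Concurrency can only rise when a request starts, so checking these points is enough.
  const checkpoints = [start, ...requests.map((r) => r.start).filter((t) => t > start && t < end)];
  return checkpoints.every((t) => inFlightAt(requests, t) <= MAX_IN_FLIGHT_REQUESTS);
}

// Returns null when no quiet window fits before the trace ends (e.g. the user left).
function estimateTti(
  fcp: number,
  longTasks: Interval[],
  requests: Interval[],
  traceEndMs: number,
): number | null {
  // The earliest quiet window starts at FCP or right when a blocker ends.
  const candidates = [fcp, ...longTasks.map((t) => t.end), ...requests.map((r) => r.end)]
    .filter((t) => t >= fcp && t + QUIET_WINDOW_MS <= traceEndMs)
    .sort((a, b) => a - b);
  const windowStart = candidates.find((t) => isQuiet(t, longTasks, requests));
  if (windowStart === undefined) return null;
  const taskEnds = longTasks.map((t) => t.end).filter((end) => end <= windowStart);
  return Math.max(fcp, ...taskEnds);
}

const download: Interval = { start: 1000, end: 4000 }; // hypothetical blocking download
const longTask: Interval = { start: 4000, end: 4300 }; // hypothetical hydration task
const polling: Interval = { start: 1000, end: 60000 }; // hypothetical outlier request

// Blocking download but no long task: TTI collapses to FCP.
console.log(estimateTti(1000, [], [download], 30000)); // 1000
// The long task after the download is what makes the delay visible.
console.log(estimateTti(1000, [longTask], [download], 30000)); // 4300
// Three outlier requests keep the network busy: no quiet window within the session.
console.log(estimateTti(1000, [longTask], [polling, polling, polling], 30000)); // null
```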
## What next?
While INP already gives useful insights regarding interactivity, it paints an incomplete picture of what users actually experience. It would be great to have an interactivity metric that would reliably and accurately account for network delays.
I am curious to hear what you folks think about this, whether this is desirable, whether this is doable, etc. I haven't spent years dealing with the problem space so I'm probably missing a lot of things/context.
I have some ideas but I'll open a new thread for those as I imagine it is better to keep this thread focused on feedback and understanding of the problem.
Thanks,
Maïeul