Measuring load-time interactivity network delays


Maïeul Chevalier

May 25, 2026, 4:12:05 AM
to web-vitals-feedback
> Disclaimer: I am a Qwik core maintainer. We are focused on web performance and we're looking for better ways to measure the Qwik performance gains compared to other web frameworks.

For web apps relying on javascript for interactivity, a page can be fully rendered and visually complete yet unable to respond to user input until the code has finished downloading. This applies to all current reactive frameworks like Angular, React, Vue, Svelte, Solid and even Qwik (although to a much lesser extent).

It is frequent for an interaction to be lost or delayed by a blocking download, but among the current Core Web Vitals metrics, only INP measures interactivity and it does not account for network delays. I believe it would be a great addition for core web vitals and the web to have a metric that measures download delays in a stable and accurate way. I know this is a long shot so I'm hoping to get the ball rolling :).

This post (1) describes the context we have from the Qwik framework side and what we are currently measuring, and (2) discusses the issues I see in the current and deprecated Google interactivity metrics regarding network delays (including INP, but also TBT, FID and TTI).

## Framework Context

### Qwik?

Qwik is a new reactive web framework similar to Angular, React, Vue, Svelte or Solid. Its main innovation is what we call "javascript streaming": other reactive web frameworks must execute, and therefore download, all of the javascript code present or visible on a page in order to become interactive, whereas Qwik has the unique ability to buffer javascript code bit by bit, prioritize preloading and executing code in response to user events, and continue preloading the rest of the code when idle.

You can think of it like how video streaming compares to downloading. Instead of having to wait for the download to complete before being able to press play, users can most often start playing the video right away, and they can jump to other parts of the video, which will resume instantly if the packets have already been buffered or after a short delay if not. The difference from video is that javascript streaming is about avoiding executing code, and therefore the associated downloads.

Historically, reactive web frameworks were designed for CSR (Client Side Rendering). It is only later that they introduced SSR (Server Side Rendering) to show content to the user sooner. The way they've achieved this is through a process called hydration, where the server generates and sends the html to the client, and then the client uses javascript to regenerate the tree and attach event listeners for a given page/section to be interactive. The problem of this approach is that because the framework uses javascript to attach event listeners, it still has to execute all of the code for that page/section and therefore must also download all of it.

### Manual testing

In manual local tests between Qwik, React, Vue, Svelte and Solid apps, on firefox under 3G and CPU calibrated to low-end device throttling, on fairly similar applications, we are measuring input delays of only ~3s in Qwik, vs ~10s to ~20s for the others (depending on the implementation, lazy-loading, etc.).

Those numbers already make for good demos but mindful developers and lead engineers considering Qwik might be wary of such manual throttling measurements. For example 3G throttling in chrome adds a 2s latency penalty on each network request, which inflates the Qwik numbers to ~6s total because of its small-bundle streaming architecture, arguably more than is the case in reality. This is why we prefer to use Firefox's 3G throttling, as its 100ms latency yields more accurate results for Qwik in our experience.
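
To illustrate why the per-request latency weighs so heavily on an architecture made of many small requests, here is a rough back-of-the-envelope model. It is only a sketch: the function name, the idea of "sequential rounds" and all input numbers are assumptions made for illustration, not measurements and not how either browser implements throttling.

```js
// Rough model: every sequential round of requests pays the latency once,
// then the bytes are transferred at the given bandwidth.
// All inputs below are hypothetical, not measured values.
function estimateDelayMs({ sequentialRounds, latencyMs, totalBytes, bytesPerSecond }) {
  const latencyCost = sequentialRounds * latencyMs;
  const transferCost = (totalBytes / bytesPerSecond) * 1000;
  return latencyCost + transferCost;
}

// Same hypothetical payload (2 sequential rounds, 50 kB at 50 kB/s);
// only the per-request latency changes.
const payload = { sequentialRounds: 2, totalBytes: 50_000, bytesPerSecond: 50_000 };
const withHighLatency = estimateDelayMs({ ...payload, latencyMs: 2000 }); // 5000
const withLowLatency = estimateDelayMs({ ...payload, latencyMs: 100 }); // 1200
console.log({ withHighLatency, withLowLatency });
```

In this model the transfer cost is identical in both cases, so the whole difference comes from the latency term, which grows with the number of sequential rounds.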

Because of the chrome 3G throttling 2s per-request latency penalty, we spent a fair amount of time looking into inlining the Qwik preloading logic or putting it into a worker instead of keeping it as a separate module. This would make for better chrome demos, but we now understand that it is unclear whether this would benefit end users in reality. This is the kind of optimization we could only produce with a stable and accurate field metric. The answer might be that it depends on the type of application, but a field metric would at least help engineers make the right choice.

## State of the art of network delays measurement

Among INP, FID, TBT and TTI, TTI is currently the best metric we have at our disposal to measure network delays. The problem is that it is not only deprecated, but also too sensitive to outlier network requests and long tasks to be reliably measured in the lab, let alone in the field.

Even though INP, TBT and FID measure blocking delays, none of them effectively takes network delays into account.

For testing the metrics in real world conditions, I use the following links:
- https://qwikui.com/docs/styled/accordion/
- https://ui.shadcn.com/docs/components/radix/accordion
- https://shadcn-vue.com/docs/components/accordion
- https://www.shadcn-svelte.com/docs/components/accordion
- https://www.solid-ui.com/docs/components/accordion

Although optimization strategies might differ, those 5 libraries are all heavily inspired by ui.shadcn.com and therefore are somewhat comparable to one another.


### INP
As a CWV, INP does a fairly good job of tracking CPU delays and appears to be quite stable across the board, but it is pretty much blind to network delays and also seems to mis-report certain CPU delays.

The issues:
- INP cannot track user events until event listeners are attached, which is only the case once the hydration process is completed. In the case a user clicks pre-hydration, INP will simply report a good ~10-20ms value, even though nothing meaningful happens from a user perspective. If the user re-clicks on the same element post-hydration, the event listener will be attached and the real CPU delay once the code has been downloaded will be recorded. Notice that the first click is not recorded by INP even though the user did experience unresponsiveness. This is easily reproducible in the performance tab with https://ui.shadcn.com/docs/components/radix/accordion.
- Even in the case event listeners are attached, INP can easily be fooled by in-flight network requests. In Qwik while preloads are still ongoing, event listeners on the html trigger a few small scripts to preload the user events code in priority and replay them once they're ready. Those are the 2-3s delays we experience on 3G throttling, but INP will report a delay of ~80ms. This is easily reproducible in the performance tab with https://qwikui.com/docs/styled/accordion/.
- When a user clicks during the hydration execution phase, which induces some long blocking tasks, INP will report a slower value as a result. On big apps it is not uncommon to have this execution phase last for a few seconds, which users on good networks but low-end devices may very well encounter. This can be reproduced in https://ui.shadcn.com/docs/components/radix/accordion using low-end device calibration with 3G throttling vs no network throttling and clicking repeatedly.
- In the performance tab, the hydration CPU execution phase might run for much longer than what INP reports, even when clicking repeatedly as soon as the element is visible. Here's a screenshot where hydration takes roughly 5s of scripting but INP only reports 2s: ![image](https://hackmd.io/_uploads/SJTo8rggMl.png)

Considering all those issues, I believe it is fair to say that INP is not representative of the real user experience regarding interactivity and responsiveness.

### FID
FID also required event listeners to be attached in order to be recorded, so I take it that the same INP issues apply.

### TBT
TBT only measures excess CPU delays over long tasks. It clearly does not measure network delays.
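
As a minimal sketch of why that is, TBT can be thought of as the sum of the portion of each main-thread task that exceeds 50ms. The function name and the sample durations are illustrative assumptions, and the sketch ignores the measurement window that the lab metric applies.

```js
const LONG_TASK_THRESHOLD_MS = 50;

// Sum only the part of each task that exceeds the long task threshold.
function totalBlockingTime(taskDurationsMs) {
  return taskDurationsMs.reduce(
    (sum, duration) => sum + Math.max(0, duration - LONG_TASK_THRESHOLD_MS),
    0
  );
}

// Hypothetical task durations: 30ms adds nothing, 120ms adds 70, 250ms adds 200.
console.log(totalBlockingTime([30, 120, 250])); // 270
```

A multi-second download during which the main thread stays idle contributes no tasks at all to this sum, so it adds nothing to the result.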

### TTI
Because TTI does not rely on user inputs to measure interactivity delays, it **can** detect long delays caused by the first long task and network downloads that preceded it.

The issues:
- Outlier network requests can prevent it from working in the field.
- The 50ms long task threshold is arbitrary: there might be blocking downloads, but TTI will be as fast as FCP as long as no blocking tasks are recorded.

TTI is therefore not a stable metric and cannot be used in the field. It is nevertheless the best metric we currently have at our disposal to take network delays into account.
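
Both issues can be seen in a simplified sketch of the TTI search, based on its documented definition (a 5s quiet window after FCP with no long tasks and at most two in-flight requests). The data shapes, the function names and the sample timelines are assumptions for illustration; the real lab implementation has more details than this.

```js
const QUIET_WINDOW_MS = 5000;
const MAX_IN_FLIGHT = 2;

const inFlightAt = (requests, t) =>
  requests.filter((r) => r.start <= t && t < r.end).length;

function isQuiet(windowStart, longTasks, requests) {
  const windowEnd = windowStart + QUIET_WINDOW_MS;
  if (longTasks.some((t) => t.start < windowEnd && t.end > windowStart)) return false;
  // The in-flight count only rises at a request start, so checking the window
  // start and every request start inside the window is enough.
  const checkpoints = [
    windowStart,
    ...requests.map((r) => r.start).filter((s) => s > windowStart && s < windowEnd),
  ];
  return checkpoints.every((t) => inFlightAt(requests, t) <= MAX_IN_FLIGHT);
}

// Returns the estimated TTI in ms, or null if no quiet window fits in the trace.
function estimateTti({ fcp, longTasks, requests, traceEnd }) {
  const candidates = [fcp, ...longTasks.map((t) => t.end), ...requests.map((r) => r.end)]
    .filter((t) => t >= fcp && t + QUIET_WINDOW_MS <= traceEnd)
    .sort((a, b) => a - b);
  const windowStart = candidates.find((t) => isQuiet(t, longTasks, requests));
  if (windowStart === undefined) return null;
  const taskEnds = longTasks.filter((t) => t.end <= windowStart).map((t) => t.end);
  return Math.max(fcp, ...taskEnds);
}

// Hypothetical timelines (ms).
const download = { start: 1000, end: 4000 };
const hydrationTask = { start: 4000, end: 4300 };
const polling = { start: 1000, end: 30000 };

// Download followed by a long task: the delay is detected.
const detected = estimateTti({ fcp: 1000, longTasks: [hydrationTask], requests: [download], traceEnd: 20000 });
// Long blocking download but no long task: TTI collapses to FCP.
const sameAsFcp = estimateTti({ fcp: 1000, longTasks: [], requests: [{ start: 1000, end: 9000 }], traceEnd: 20000 });
// Three long-lived outlier requests: no quiet window is ever found.
const neverQuiet = estimateTti({ fcp: 1000, longTasks: [hydrationTask], requests: [polling, polling, polling], traceEnd: 20000 });
console.log({ detected, sameAsFcp, neverQuiet });
```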

## What next?

While INP already gives useful insights regarding interactivity, it paints an incomplete picture of what users actually experience. It would be great to have an interactivity metric that would reliably and accurately account for network delays.

I am curious to hear what you folks think about this, whether this is desirable, whether this is doable, etc. I haven't spent years dealing with the problem space so I'm probably missing a lot of things/context.

I have some ideas but I'll open a new thread for those as I imagine it is better to keep this thread focused on feedback and understanding of the problem.

Thanks,
Maïeul

Amit

May 25, 2026, 4:41:00 AM
to web-vitals-feedback
Maïeul, 

It is not a sales pitch. It is not a place where you promote your framework.

Barry Pollard

May 25, 2026, 4:49:18 AM
to Amit, web-vitals-feedback
FWIW, while I appreciate it's easy to go over the line, I didn't find the original mail overly pitchy. It was honest up front and also gave good context on where this thinking was coming from. I think it raises interesting questions worthy of discussion, as will be shown by the larger response that I'm working on.

This is a moderated forum, so while we don't want it to be a promotion forum, we do encourage feedback based on people's experience. And sometimes (oftentimes?) that means including context of what you're working on. And that's fine as long as it doesn't take away from the intent of the post.

Also please do keep it respectful here.

Barry

P.S. I will say the markdown formatting is a little weird for an email (are we all AI now!?) and maybe suggests this could have been a blog post (and maybe was intended to be originally?).

--
You received this message because you are subscribed to the Google Groups "web-vitals-feedback" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web-vitals-feed...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/web-vitals-feedback/29312e64-80c5-48a5-b694-e969620b9a17n%40googlegroups.com.

Barry Pollard

May 25, 2026, 5:20:14 AM
to Maïeul Chevalier, web-vitals-feedback
Hi Maïeul,

I think you bring up a few key points, which I'll summarise as the following two if my understanding is correct:
  • INP is (and FID was) based around event handling measurement. With no event handlers attached, they cannot measure interactions before an app is "hydrated".
  • INP does not measure the full impact of an interaction, especially of network interactions.
Those are both fair points and basically come down to the limitations of trying to set a broad metric that can be applicable to the web as a whole.

To explain some of the evolution of the thinking here, FID was one of our first interactivity metrics and was perhaps closer to what you're considering with a pre-event handler metric, in that the thinking was it was important to measure that first delay time while the page was busy. There was also some concern that measuring more would create the wrong incentives. Despite its detractors now, FID had its usefulness when it first came about, and we did see good improvements in responsiveness, but it quickly showed its weaknesses in only measuring the delay and only in that first interaction.

INP was the next evolution and sought to measure all interactions AND more of the interaction. But yes it's true that it still does not cover the two missing points you raise. Its primary intent was to encourage quick initial responsiveness, and so keeping a healthy bit of breathing room on the main thread (so just exiting quickly, as we were concerned about with FID, would likely catch you in the end, or if done in a non-blocking manner would be a good thing).

We believe INP is a good broad measure of page responsiveness that is broadly comparable across sites, and that it encourages good practices benefiting users. These were some of the key aims of the Web Vitals initiative.

However, INP is not a full end-to-end measurement of when an interaction has been all (or even mostly) completed, and was not intended to be. This is more difficult to measure for a couple of reasons. For a start it's quite difficult for the browser to know what processing is important to the user and what is not (e.g. screen updates likely are important, but sending of analytics beacons is not). And secondly it's impossible to fairly measure two very different interactions across a page (a video upload is going to take more time, than opening a details/summary selector).

INP is intended to be a starting measure for site owners and for comparisons, rather than the end point. We encourage site owners to dig beyond this with custom metrics that can be hyper specific to their particular sites.

At the moment I'm not convinced measuring network delays is necessarily a good user-centric measure. Some of these will impact users, but many (the analytics beacon example) will not. And therefore I think we should ideally look to measure the impact, rather than the potential impact.

The new interaction-contentful-paint performance entry, being launched as part of the Soft Navigation API, allows more paints to be attributed to each interaction (at present only each larger paint, similar to LCP, but we are definitely thinking about expanding that to all paints). This allows measurement of the full time of an interaction until its largest paint, which goes a long way to solving the second point. Though as I say, unlike INP, this is not likely to be comparable across sites, or even interactions within a site, so is more of a custom metric.

For the first bullet, we don't yet have a good standard metric. A few RUM providers have experimented with measuring "rage clicks" (and Google has too btw!), which captures some of this, but also other things. I agree it would be good to do more thinking in this space...
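
As a rough illustration of what a "rage click" heuristic could look like, the sketch below counts bursts of repeated clicks on the same target within a short time window. The function name, the data shape and the thresholds (3 clicks within 1 second) are assumptions made for this example; definitions vary, and this is not how Google or any particular RUM provider measures it.

```js
// Counts bursts of `minClicks` or more clicks on the same target where every
// click falls within `windowMs` of the first click of the burst.
// `clicks` must be sorted by time; thresholds are hypothetical defaults.
function countRageClickBursts(clicks, { minClicks = 3, windowMs = 1000 } = {}) {
  let bursts = 0;
  let i = 0;
  while (i < clicks.length) {
    let j = i;
    while (
      j + 1 < clicks.length &&
      clicks[j + 1].target === clicks[i].target &&
      clicks[j + 1].time - clicks[i].time <= windowMs
    ) {
      j++;
    }
    if (j - i + 1 >= minClicks) {
      bursts++;
      i = j + 1; // skip past the burst
    } else {
      i++;
    }
  }
  return bursts;
}

// Hypothetical click log: three quick clicks on one element, then two isolated clicks.
const clickLog = [
  { time: 0, target: 'accordion' },
  { time: 200, target: 'accordion' },
  { time: 400, target: 'accordion' },
  { time: 5000, target: 'accordion' },
  { time: 9000, target: 'menu' },
];
console.log(countRageClickBursts(clickLog)); // 1
```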



Maïeul

May 26, 2026, 5:31:46 AM
to web-vitals-feedback
Hi Barry, thanks for taking the time to discuss this, the context and insights.

Sorry for the markdown, I actually originally asked Claude to summarize our long chat, but ended up basically rewriting everything by hand as the writing was really bad 😅. I thought the markdown wouldn't hurt so didn't do the effort to adapt the formatting...

The way I see it, is that INP is a good metric for measuring responsiveness of "all interactions", but only once the javascript code has been downloaded (since pre-hydration will report false-positives).

One thing I should have made more explicit is that the main bottleneck is blocking javascript downloads (for a user interaction) rather than general network delays (fonts, images, videos, beacons etc.). Indeed it would not be fair to give a lower score to an app full of high resolution images that take a long time to download on the critical path vs one that has only a few small low-priority images. On the other hand, ranking two web apps by their ability to load code sooner than the other is similarly fair to ranking by their ability to execute code sooner than the other. If I'm not mistaken the INP score basically depends on the CPU (low vs high end) and the amount/efficiency of blocking main thread operations triggered by the user input, which allows for fair comparisons across the web. A metric that would depend on the network speed (3G, 4G, etc.), latency, and the amount of blocking javascript downloads related to the user input sounds similarly fair and I'm thinking would make a great candidate for a user-centric CWV.

Imo, measuring blocking javascript downloads is at least as important as measuring CPU delays, since they are fundamental to the web and the resulting user experience. I don't think it is the kind of measurement that should be done through custom metrics, since blocking javascript downloads for interactivity likely apply to the overwhelming majority of internet traffic. Besides, I assume most application developers never go through the hassle of custom metrics, and many might not be aware of the impact of such blocking downloads in the first place.

On feasibility, if there is no technical way to measure such delays in a fair CWV ranking, I believe a pagespeed non CWV field metric would already be great, and at the very least the web would already benefit quite a lot from a lighthouse lab metric.

Since you brought up some leads towards a solution I'll share my high-level ideas in this thread. Tbh, I didn't know about the `interactionContentfulPaint` API, but it's pretty close to what I had in mind so I'll use it (thanks for sharing!).

I believe there are two approaches for handling the measurements:
  • (A) Experienced javascript blocking download delays
  • (B) Potential javascript blocking download delays
Both have their pros and cons and their set of challenges.

(A) Experienced javascript blocking downloads (EBD)
Mechanism: start from the user event until there is an interaction contentful paint.

Issues:
  • Like INP, this cannot record pre-hydration scenarios where there are no event handlers yet attached. So a complementary metric for unhandled user events or at least rage clicks would be necessary. Without a complementary metric, EBD would be unfair to non hydration setups and favor hydration since it would not report hydration blocking downloads which are usually the longest javascript downloads in such setups. So that complementary metric to EBD is sort of non-optional.
  • Just as INP records a longer value than usual when a user clicks while hydration is executing, if a user clicks while a download is ongoing, it is not the entire duration of that blocking download that will be recorded but only the part the user experienced. This can happen in Qwik, for example, if a blocking download takes 5s and the user clicks after the download has already started (e.g. at 2s, so the recorded value is only 3s). In practice however it is a rare occurrence, and the added noise is probably similar to INP's and shouldn't favor certain types of apps over others.
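
The 5s download with a click at 2s can be written out as a small worked example. The function names and the `{ start, end }` shape are hypothetical and only illustrate the arithmetic; they are not an existing API.

```js
// Time the user actually waited for the download after clicking.
function experiencedBlockingDelay(download, clickTime) {
  return Math.max(0, download.end - clickTime);
}

// Part of the download that happened before the click and is therefore not recorded.
function unobservedDownloadTime(download, clickTime) {
  return Math.max(0, Math.min(clickTime, download.end) - download.start);
}

// Hypothetical 5s blocking download, with the user clicking 2s in.
const blockingDownload = { start: 0, end: 5000 };
console.log(experiencedBlockingDelay(blockingDownload, 2000)); // 3000
console.log(unobservedDownloadTime(blockingDownload, 2000)); // 2000
```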

(B) Potential javascript blocking downloads (PBD)
Mechanism: record all "user input to interaction contentful paint" windows, see if javascript ran and if yes, retrieve the associated downloads, then measure the potential time it could have taken under the user's network conditions (bandwidth, latency, etc.).

The advantage would be that hydration blocking downloads would be taken into account (!). 

Issues:
  • A "user input to interaction contentful paint" window might also run unrelated javascript code (e.g. analytics or setInterval tasks) that doesn't lead to any sort of contentful paint. To measure PBD we need a way to measure causality between a user input and a contentful paint. This is likely the trickiest part here. I don't know if it is feasible.
  • Blocking downloads can get longer in the case of waterfalls and there is no way to measure those when extrapolating based on the network conditions. For example if bundle A and B both take 2s to download separately, then it would take 4s with a waterfall. On the other hand there is no fundamental reason that leads to waterfalls for js bundles and should probably be considered a bug if it happens, so the metric could measure the downloads as if they happened in parallel.
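
The waterfall example with bundles A and B can be sketched as two ways of combining per-bundle download estimates. The function names are hypothetical, and the parallel case is deliberately simplified: it assumes parallel downloads do not compete for bandwidth, which will not hold on a constrained connection.

```js
// Downloads assumed to run fully in parallel: the slowest one dominates.
function parallelEstimateMs(durationsMs) {
  return durationsMs.length ? Math.max(...durationsMs) : 0;
}

// Downloads chained one after another (a waterfall): the durations add up.
function waterfallEstimateMs(durationsMs) {
  return durationsMs.reduce((sum, duration) => sum + duration, 0);
}

// Bundles A and B each take 2s on their own.
const bundleDurations = [2000, 2000];
console.log(parallelEstimateMs(bundleDurations)); // 2000
console.log(waterfallEstimateMs(bundleDurations)); // 4000
```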

——— 

In the case both measurements are feasible, I'd say that they are both user-centric and would incentivize application developers to reduce such delays. Developers could improve their apps through lazy-loading, shipping less javascript to the client, etc. In the case of hydration they could look into selective hydration strategies or look for alternatives. Of course, with such a metric in place and given that it can measure pre-hydration (no event handlers) scenarios, javascript streaming would have quite an unfair advantage, but at the same time it would push the web towards adopting this innovation. Qwik is currently at the forefront, but I believe other frameworks/tools could adopt it too eventually.

I'm curious to hear your thoughts on the user-centricity of javascript blocking downloads, the feasibility of the experienced and potential metrics, and if there's any chance that this could eventually become a CWV or official field/lab metric.

Thanks,
Maïeul

Jay Stephenson

May 26, 2026, 5:31:58 AM
to web-vitals-feedback
This is not a 'sales pitch'. If you had ingested the contents of the entire email you would have understood immediately that the original poster is looking for feedback on the development platform he is using.

Michal Mocny

May 26, 2026, 10:26:14 AM
to Maïeul, web-vitals-feedback
In both your proposed metrics I think you start with "user input to interaction contentful paint", which is exactly how we propose to measure duration for this new api.  I'm glad to hear that is also the way you consider thinking about it!

I am not sure why users would benefit from measuring the "js blocking downloads" within that duration, but I think that if you consider that a useful diagnostic for developers using the Qwik framework, you can already get this from filtering the Resource Timing data within that ICP time range?  It might be a good diagnostic similar to LCP-subparts for hard-navs, not sure.
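
A minimal sketch of that filtering, assuming the interaction window (start and end, in ms) has already been derived from the ICP entry. The function names, the javascript-detection heuristic and the sample data are assumptions for illustration; in a browser the entries would come from `performance.getEntriesByType('resource')`.

```js
// Heuristic: treat an entry as javascript if it was initiated by a <script>
// or if its URL ends in .js / .mjs (e.g. a modulepreload link).
const isScriptEntry = (entry) =>
  entry.initiatorType === 'script' || /\.m?js(\?|#|$)/.test(entry.name);

// Keep only javascript downloads that overlap the [windowStart, windowEnd] range.
function scriptDownloadsInWindow(resourceEntries, windowStart, windowEnd) {
  return resourceEntries.filter(
    (entry) =>
      isScriptEntry(entry) && entry.startTime < windowEnd && entry.responseEnd > windowStart
  );
}

// Hypothetical entries shaped like PerformanceResourceTiming objects.
const sampleEntries = [
  { name: 'https://example.com/a.js', initiatorType: 'script', startTime: 100, responseEnd: 900 },
  { name: 'https://example.com/b.png', initiatorType: 'img', startTime: 600, responseEnd: 700 },
  { name: 'https://example.com/c.js', initiatorType: 'link', startTime: 1000, responseEnd: 2000 },
  { name: 'https://example.com/d.js', initiatorType: 'script', startTime: 3500, responseEnd: 4000 },
];

const overlapping = scriptDownloadsInWindow(sampleEntries, 500, 3000);
console.log(overlapping.map((entry) => entry.name));
```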


One concern you mention is early interactions before hydration leading to "no-op" clicks.  In my experience this was an issue with much older frameworks, but most will now capture even very early events (and then replay when ready), especially when the page is server rendered and the final UI is available from first paints.
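
The capture-and-replay idea can be sketched as a small queue that holds early events until the real handler has loaded. This is a simplified illustration with hypothetical names; it is not the implementation used by Qwik's qwikloader or by any other framework.

```js
// Holds events dispatched before the handler is ready, then replays them.
// `now` is injectable so that the waiting time can be tested deterministically.
function createReplayQueue(now = () => Date.now()) {
  const pending = [];
  let handler = null;
  return {
    dispatch(event) {
      if (handler) return handler(event, 0);
      pending.push({ event, capturedAt: now() });
    },
    ready(loadedHandler) {
      handler = loadedHandler;
      // Replay in order, passing how long each event had to wait.
      for (const { event, capturedAt } of pending.splice(0)) {
        handler(event, now() - capturedAt);
      }
    },
  };
}

// Usage with a fake clock: a click at 100ms is replayed once the code arrives at 3100ms.
let fakeTime = 100;
const handled = [];
const queue = createReplayQueue(() => fakeTime);
queue.dispatch('early-click');
fakeTime = 3100;
queue.ready((event, waitedMs) => handled.push([event, waitedMs]));
queue.dispatch('late-click');
console.log(handled);
```

The waiting time passed to the handler is the delay the user experienced, which is the part a metric would need to capture even though the click was not dropped.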

If you tested and found that this is not true, then perhaps it is just that the current InteractionContentfulPaint measurement mechanism needs improvement for certain complex scheduling cases (I have experienced that paint tracking on Qwik examples isn't perfect).

For example: ICP will "just work" if you call fetch() or setTimeout() from an event listener.  But if you have a custom js event listener loading mechanism that is already in the middle of downloading the required JS, and your framework just appends to an internal "workQueue = [...]" or something, and that work queue is just automatically processed periodically, we have no automatic way to know that.  There is an expectation that framework authors may need to help coordinate advanced user-land scheduling for such things.  (Today: via expressing the apis in terms of Promises, Future: explicit AsyncContext apis)

Please let us know how InteractionContentfulPaint works out for your examples!

Cheers!
-Michal

Maïeul

May 26, 2026, 2:04:56 PM
to web-vitals-feedback
Thanks Michal for joining the conversation. We also have a Michal in the Qwik team 😋

You seem to wonder whether measuring js blocking downloads is of any use for applications other than Qwik. I think it is useful to any kind of application that uses javascript (really!), since downloading javascript on page load is inherent to the web and the internet user experience. Event replay improves the user experience but it is not a silver bullet. In fact Qwik does it with the qwikloader and I believe Angular and React can also do it now. But an application that takes 20 seconds on 3G to become interactive due to blocking js downloads is not a good user experience, and event replay does nothing to help in those circumstances, as shown by rage clicks even being a thing 😄 (and I suspect most users just bounce in those situations). I'd even consider event replay a fallback for an undesirable situation. In Qwik we try to avoid it as much as possible.

As I said, lighthouse and pagespeed would be able to point to optimization strategies such as lazy-loading and shipping less js, but I forgot to mention that caching optimizations (CDN and browser) would also improve the score. Those are important optimizations that can lead to real UX improvements. The reactive frameworks have their own set of built-in optimizations. I mentioned Astro islands and React Server Components, but there is also Nuxt Server Islands, etc. Ironically measuring js blocking downloads is probably the least useful to Qwik apps since they are supposed to be constant in time. Yet even in Qwik there is room for optimization at the framework level like reducing js download waterfalls if they happen or using the html link fetchpriority API.

Feedback on ICP

I gave a shot at playing with ICP and it's really cool! It can measure Qwik js blocking downloads, and SPA navigations (!), although somewhat unreliably, and is indeed blind to SSR hydration or CSR "no-op" clicks (until the event handlers are attached).

Code I used to play with it on 3G throttling is pretty simple:
```js
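// Log every interaction-contentful-paint entry, including buffered ones.
new PerformanceObserver((list) => {
  for (const e of list.getEntries()) console.log('ICP', e.duration, e);
}).observe({ type: 'interaction-contentful-paint', buffered: true });

// Log the timestamp of each click (capture phase) to compare against the ICP entries.
addEventListener('click', () => console.log(`${performance.now().toFixed(0)}`), true);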
```

Just add the code and click ASAP on page load.

For https://qwikui.com/docs/styled/accordion/:
  • On single clicks the recorded value seems pretty stable and true to the real user experience.
    Screenshot 2026-05-26 at 19.06.45.png
  • When rage-clicking, the reported value is much smaller (somehow it picks a click in the middle and reports it as the ICP duration). I imagine that this should be considered a bug.
    Screenshot 2026-05-26 at 19.04.46.png


For the other hydration links, there is no ICP reported until hydration has attached the event handlers. When I tried SPA navigations with https://ui.shadcn.com/docs/components/radix/accordion the reported ICP value seemed correct, but with https://shadcn-vue.com/docs/components/accordion the experienced contentful paint (with my own eyes) happened 1s later than reported.

So I find the ICP API quite promising to add as a field metric to lighthouse or even CWVs eventually. The only problem is that the API itself is blind to SSR pre-hydration or CSR "no-op" clicks, and therefore as a metric on its own it would be unfair to Qwik apps, where events are already attached on the html during SSR.

This is why my proposal A on experienced js blocking downloads must include a complementary metric and proposal B on potential js blocking downloads would measure potentiality by measuring the blocking downloads retrospectively even after they happened (so even after hydration completed).

Michal Mocny

May 26, 2026, 2:46:03 PM
to Maïeul, web-vitals-feedback
Thanks for trying it out and giving feedback.  Glad to hear the positive feedback!

WRT your concerns, here are a few thoughts:
  1. I think there are a few implementation gaps / bugs that still remain and which affect qwik in particular.  I know that in my own testing our detection is not perfect.  You do some fancy scheduling stuff :). Over time, I think we can work to resolve such cases via a combination of spec / implementation changes + small framework fixes to "play well" with this feature.
  2. You make a point that when an interaction actually is a "no-op" then ICP will simply not report anything. If the event listener explicitly initiates and waits for e.g. a fetch response, then you will eventually get a large ICP value and capture this latency-- but if the site just silently drops the event and "is broken", you don't get that report.  Fair point.
  3. Specifically about some of the test pages you linked-- All paint timing APIs (LCP, element timing etc) are a bit weird for content that animates.  The accordion demos slowly render the content, so the ICP timestamps might represent e.g. the first pixel, while your eyes perceive the completed content.  Or, we may not measure until the animation is completed-- it all depends on whether the animation was compositor driven, whether the first paint was fully hidden, and other factors.  This area continually improves over time.
Net / net, I think we should separate "existing bugs" from "design deficiencies" when proposing new API extensions.

I think your suggestions are neat, but I'm still not convinced that this is a fundamental first class API vs just being a good diagnostic (perhaps framework specific diagnostic, perhaps generic) for cases where ICP is large.

-Michal

Maïeul

May 26, 2026, 3:27:04 PM
to web-vitals-feedback
Yes I don't think my proposed metrics should be a first class API either. Just like INP is not an API but a metric that uses the Event Timing API to report a value.