--
You received this message because you are subscribed to the Google Groups "input-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to input-dev+...@chromium.org.
Fixing domain mixing...

On Tue, Jul 24, 2018 at 12:31 PM Tim Dresser <tdre...@google.com> wrote:

Re Chrome OS Data: see the colab here. Some devices are definitely worse than others.

The problem is that we don't have a good way to distinguish between really broken cases and slightly broken cases. We can certainly correct timestamps which would result in negative durations, or timestamps which result in excessively large durations. However, we're likely to see some weird effects due to clamping excessively large durations. How do we pick what we mean by "excessively large"?

This does slightly reduce our accuracy, but historically, the delta between the hardware timestamp and hitting the browser thread has been tiny relative to overall event latency. We actually removed our metrics coverage of this because it was uninterestingly low and flat.
The uneven distribution across Android hardware classes is what makes me fairly confident that this is broken hardware timestamps.
--
You received this message because you are subscribed to the Google Groups "scheduler-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheduler-de...@chromium.org.
To post to this group, send email to schedu...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAHTsfZALqzQDLYaEjsfwTNgjY5hDKz-XP3YdseH5XHr7a-YGTQ%40mail.gmail.com.
It does seem like on Chrome OS we had introduced a bug in the touchpad timestamp generation, although Sean did not succeed in reproducing it. Note that the new timestamp generation work was done to move things in the opposite direction from the one you're proposing, because the kernel timestamps weren't good enough for velocity calculations for bluetooth devices (see b/117569252). That speaks directly to the desire to have the timestamp be the input sampling time as accurately as possible, not some arbitrary later time on the CPU.

The primary purpose and intent of input timestamps is to provide the timing of the input. It's about the physical event. Collecting histograms that break down latencies in the event processing pipeline is a secondary purpose.

On Chrome OS all input on the system passes through a ui::Event, including input that goes to the Android container. I would not be happy if we were to discard the raw timestamp in all ui::Events on that platform.

Michael

On Thu, Feb 7, 2019 at 3:32 PM Timothy Dresser wrote:
>
> We still have a non-trivial number of invalid reports on Android and it does appear to be more frequent on specific devices.
>
> I think that for web developers, the benefits of switching to the browser timestamp outweigh the cost - I believe we stopped reporting the delta between the hardware and browser timestamps, because we never saw a regression in that value (with the exception of when we were tweaking how we handled event interpolation).
>
> Even if we can filter invalid results out, it means we rely on web developers to do the same, which doesn't seem reasonable to me.
>
> Does that make sense to folks?
>
> On Thu, Jul 26, 2018 at 10:13 AM Timothy Dresser wrote:
>>
>> Based on feedback, before making any decision here we'll:
>>
>> See how big an improvement the clamping patch provides.
>> Disable reporting for synthetic input.
>> Figure out what's happening on Chrome OS.
>>
>> Tim
>>
>> On Wed, Jul 25, 2018 at 7:50 PM Michael Spang wrote:
>>>
>>> Thanks for updating the table. I filed http://crbug.com/867696. I'd bet the 3 hardware ids with bad data on Chrome OS are all related.
>>>
>>> Michael
>>>
>>> On Wed, Jul 25, 2018 at 9:43 AM Timothy Dresser wrote:
>>>>
>>>> Thanks Michael, some good thoughts in there.
>>>> Filtering to exclude devices with very few samples does help some, but it still looks to me like there's a fair bit of device bias here. Much less than I was previously claiming though. I've updated the colab (Internal only)
>>>>
>>>> The worst popular Chrome OS devices have ~40% of inputs being invalid, and the worst popular Android devices have just over 1% of inputs being invalid.
>>>>
>>>> Hmmm, I wonder how much of the problem would go away if we took the event timestamp on Android to be:
>>>> base::TimeTicks() - (uptimeMillis() - eventTimestamp)
>>>>
>>>> On Tue, Jul 24, 2018 at 6:59 PM Michael Spang wrote:
>>>>>
>>>>> On Tue, Jul 24, 2018 at 2:52 PM Timothy Dresser wrote:
>>>>>>
>>>>>> The uneven distribution across Android hardware classes is what makes me fairly confident that this is broken hardware timestamps.
>>>>>
>>>>> If you're getting 60% error rate on your "event age > 3 days" test on some Android device, that sounds like the timebase used by the system is not the one we expect.
>>>>>
>>>>> Our assumption that we can compare to base::TimeTicks appears to be based on the AOSP implementation, not the Android API documentation or CTS.
>>>>>
>>>>> The only thing we can do with Android input timestamps if we restrict ourselves to the documented behavior is to subtract them from one another and from the current time according to uptimeMillis().
>>>>>
>>>>> Michael
>>>>>
>>>>>>
>>>>>> I'm still collecting data on whether the fix I landed helps matters.
>>>>>>
>>>>>> Devtools injected input is another possible source. We wouldn't expect it to be biased by hardware class though. Unless maybe there are specific hardware classes that folks only use for development? I'll dig into whether we're reporting synthetic events independently.
>>>>>>
>>>>>> You're definitely right that there are other factors at play.
>>>>>>
>>>>>> I don't think your summary of the options is quite right. We can't actually tell the difference between valid and invalid timestamps. We can only guess, and we may introduce some strange biases when we believe a timestamp is valid, but it actually isn't.
>>>>>>
>>>>>> I think that with the threshold based approach, we either:
>>>>>>
>>>>>> Make the threshold low enough that we make it impossible for us to ever see interesting data here. OR
>>>>>> Make the threshold high enough that we see some skew from invalid event timestamps.
>>>>>>
>>>>>> On Tue, Jul 24, 2018 at 2:31 PM Majid Valipour wrote:
>>>>>>>
>>>>>>> One thing that will be nice to see is if the fix you landed actually affects the unexpected timestamp rate you are seeing. This will confirm that the issue is indeed in platform provided timestamp.
>>>>>>>
>>>>>>> BTW, devtools agents injected input can also be a source for unexpected timestamp? Devtools protocol allows its client to inject input with arbitrary unix epoch timestamp. We do a best effort translation of that but it is an arbitrary time! Devtools agent is popular for test automation but I am not sure if it is actually problematic. The one example I looked into was Puppeteer, which does not in fact send its own timestamp so it should be fine.
>>>>>>>
>>>>>>> On Tue, Jul 24, 2018 at 1:13 PM Timothy Dresser wrote:
>>>>>>>>
>>>>>>>> Fixing domain mixing...
>>>>>>>>
>>>>>>>> On Tue, Jul 24, 2018 at 12:31 PM Tim Dresser wrote:
>>>>>>>>>
>>>>>>>>> Re Chrome OS Data: see the colab here. Some devices are definitely worse than others.
>>>>>>>>>
>>>>>>>>> The problem is that we don't have a good way to distinguish between really broken cases and slightly broken cases. We can certainly correct timestamps which would result in negative durations, or timestamps which result in excessively large durations. However, we're likely to see some weird effects due to clamping excessively large durations. How do we pick what we mean by "excessively large"?
>>>>>>>
>>>>>>> So to summarize:
>>>>>>> Option A) Uniformly use browser timestamp for all input events
>>>>>>> Option B) Use hardware timestamp where it is "valid" but use browser timestamp otherwise.
>>>>>>>
>>>>>>> BTW, on Linux where we have always had timestamp correction logic you are still seeing 0.00001% of unexpected samples. So there are other factors at play it seems. And even with A in place the best you can get is this.
>>>>>>>
>>>>>>>>> This does slightly reduce our accuracy, but historically, the delta between the hardware timestamp and hitting the browser thread has been tiny relative to overall event latency. We actually removed our metrics coverage of this because it was uninterestingly low and flat.
>>>>>>>
>>>>>>> So if this is flat and low, it suggests a threshold based filter should work well without impacting your statistics. To be clear, I am suggesting that we look at the platform provided time when event is received on the browser and ```t = abs(now - t) < THRESHOLD ? t : now```.
>>>>>>>
>>>>>>> Majid
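
For readers following the two proposals quoted in this thread, here is a minimal sketch of Majid's threshold filter (`t = abs(now - t) < THRESHOLD ? t : now`) and of Tim's idea of rebasing an Android event time through `uptimeMillis()`. It is not Chromium code. Plain 64-bit millisecond counts stand in for `base::TimeTicks` and for Android uptime values, the function names `FilterTimestamp` and `RebaseToBrowserClock` are hypothetical, and the 5-second default threshold is an arbitrary placeholder, since the thread never settles on a value. The sketch also assumes that `base::TimeTicks()` in Tim's formula was meant as "the current browser time".

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds on some monotonic clock. An assumption for illustration:
// real code would use base::TimeTicks rather than raw integers.
using Millis = std::int64_t;

// Hypothetical placeholder; the thread does not agree on a threshold.
constexpr Millis kDefaultThresholdMs = 5000;

// Threshold filter: keep the platform timestamp when it lies within
// `threshold` of the browser's receipt time `now` (in either direction),
// otherwise fall back to `now`.
inline Millis FilterTimestamp(Millis platform_ts, Millis now,
                              Millis threshold = kDefaultThresholdMs) {
  assert(threshold > 0);
  const Millis age = now - platform_ts;
  const Millis distance = age < 0 ? -age : age;
  return distance < threshold ? platform_ts : now;
}

// Rebasing: measure the event's age purely in the Android uptime domain
// (uptime_now - event_ts), then subtract that age from the browser clock.
// The two clocks are never compared directly, which matches the point
// that only differences of Android timestamps are documented behavior.
inline Millis RebaseToBrowserClock(Millis browser_now, Millis uptime_now,
                                   Millis event_ts) {
  return browser_now - (uptime_now - event_ts);
}
```

A timestamp 10 ms older than `now` passes through `FilterTimestamp` unchanged, while one that appears to be three days old or 10 seconds in the future is replaced by `now`. This is exactly the trade-off raised in the thread: a small threshold hides real latency, and a large one lets some invalid timestamps skew the data.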