
autowebcompat and webcompat.com


Karl Dubost

Jan 31, 2018, 6:36:49 PM
to Marco Castelluccio, compatibility
Hi Marco,

This morning, through an issue [1] on the issue parser code, I discovered autowebcompat [2], which looks interesting and cool.

Can you tell us more about it? Its goals, context, and history.

Thanks.


[1]: https://github.com/webcompat/issue_parser/issues/10
[2]: https://github.com/marco-c/autowebcompat


--
Karl Dubost, mozilla 💡 Webcompat
http://www.la-grange.net/karl/moz





Karl Dubost

Feb 4, 2018, 6:29:57 PM
to Marco Castelluccio, compatibility
Marco,

Thanks for the context. A couple of thoughts below.

On Feb 3, 2018, at 07:00, Marco Castelluccio <mcaste...@mozilla.com> wrote:
> The idea is to automatically detect web compatibility issues (at least a
> subset) using a convolutional neural network (or something similar).

Yes, it's probably good to start with a subset of webcompat issues that are well defined.

There are some low-hanging fruits:
* Desktop version served instead of the mobile site for Firefox on Android.
* Forms with weird renderings.
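As a concrete illustration of the first item, here is a minimal sketch of one possible heuristic: a page served to a mobile browser without any viewport meta tag is a candidate "desktop site on mobile" issue. The function name and the check itself are my invention, not anything webcompat.com actually runs.

```python
import re

# Match any <meta name="viewport" ...> tag, case-insensitively.
VIEWPORT_RE = re.compile(r'<meta[^>]+name=["\']viewport["\'][^>]*>', re.IGNORECASE)

def looks_like_desktop_site(html: str) -> bool:
    """True if the markup has no viewport meta tag at all."""
    return VIEWPORT_RE.search(html) is None

print(looks_like_desktop_site("<html><head></head></html>"))  # True
print(looks_like_desktop_site(
    '<meta name="viewport" content="width=device-width">'))   # False
```

A real detector would need more signals (rendered page width, media queries, user-agent sniffing in the served markup), but even a crude flag like this could pre-filter candidates.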


# About screenshots

> I've thought of using a machine learning
> technique because a simple diff wouldn't work (as e.g. a small
> difference in the fonts used, or a different image in a carousel, etc.,
> should not be considered meaningful).

So here comes the difficult part. Some issues are about 1px of text being cut off because of font variations, and those really are webcompat issues. But there are issues where the text is readable even in different fonts, and those are not webcompat issues, unless Firefox was expected to have the right font. :)

It reminds me of something I posted a while ago about screenshots and variability.
https://groups.google.com/forum/#!topic/mozilla.compatibility/oU9eVcHSPng

For training a system (when it is possible), you might want to take plenty of screenshots of the same page over a couple of days, so you get a "visual reference template" of the page. You could do that in both browsers. Maybe the templates will be different. You could even diff the reference templates in the end.
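The reference-template idea above could be sketched like this, using toy grayscale grids in place of real screenshots: keep only the pixels that stay constant across captures of the same page, then compare the two browsers on the mutually stable pixels. All names are hypothetical.

```python
def stable_template(captures):
    """Map (x, y) -> value for pixels identical in every capture of a page."""
    first = captures[0]
    return {
        (i, j): first[i][j]
        for i in range(len(first))
        for j in range(len(first[0]))
        if all(c[i][j] == first[i][j] for c in captures)
    }

def template_diff(tmpl_a, tmpl_b):
    """Pixels stable in both browsers but with different values."""
    common = tmpl_a.keys() & tmpl_b.keys()
    return {p for p in common if tmpl_a[p] != tmpl_b[p]}

# Pixel (0, 1) varies between captures (think: a carousel), so it is
# ignored; pixel (1, 0) is stable in both browsers but differs between
# them, so it is a real candidate discrepancy.
firefox = [[[0, 5], [9, 2]], [[0, 7], [9, 2]]]
chrome  = [[[0, 3], [1, 2]], [[0, 8], [1, 2]]]
print(template_diff(stable_template(firefox), stable_template(chrome)))  # {(1, 0)}
```

With real screenshots you would do the same thing per pixel (or per block) on image arrays, which is exactly why a naive single-screenshot diff is too noisy.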

Another interesting source is the markup, with the same technique. Once you have sampled the same site multiple times, you get a fairly good idea of what the content template is. Once you have this content template, same thing: you could compare with another browser and see whether we receive the same markup.

Wild thoughts about learning automagically from a website through a neural network: some websites (most?) serve different JavaScript codepaths depending on the browser. Having a way to learn, for the same website, which routes have been taken depending on the browser could probably be useful. This is probably for BattleStar Webcompat Gallactica.



> I thought I could use the webcompat.com database to collect
> examples of incompatibilities.


So here is an issue for the project: the issues are not labeled well enough that you could learn programmatically from the issue itself. But if someone starts to label them, it will indeed increase the quality of the training data. That in itself would improve the project too.


> So I'm running a Selenium script that
> loads the websites from the webcompat.com issues in both Firefox and
> Chrome, then takes a screenshot, then tries to interact with some
> elements (e.g. by clicking on a button, writing in a text field, etc.)


So it would be useful for you to have the URL of the site.
What data would you like?
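For reference, the crawl loop Marco describes could be sketched like this. Everything here (function names, the choice of interaction) is a guess at the shape of the workflow, not the actual autowebcompat script; it needs selenium installed plus geckodriver and chromedriver on PATH.

```python
def screenshot_name(issue_id, browser, step="load"):
    """Pure helper: stable filename for an (issue, browser, step) capture."""
    return f"{issue_id}_{browser}_{step}.png"

def capture_pair(url, issue_id, out_dir="data"):
    """Load `url` in Firefox and Chrome, screenshot, try one interaction."""
    from selenium import webdriver  # imported lazily: drivers are heavy deps
    from selenium.webdriver.common.by import By

    for name, cls in (("firefox", webdriver.Firefox), ("chrome", webdriver.Chrome)):
        driver = cls()
        try:
            driver.get(url)
            driver.save_screenshot(f"{out_dir}/{screenshot_name(issue_id, name)}")
            # Interact with an element, e.g. click the first button, then
            # capture again so the pair reflects post-interaction state.
            buttons = driver.find_elements(By.TAG_NAME, "button")
            if buttons:
                buttons[0].click()
                driver.save_screenshot(
                    f"{out_dir}/{screenshot_name(issue_id, name, 'click')}")
        finally:
            driver.quit()
```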


> I've just noticed that webcompat.com issues list the preferences the
> user has modified, so that could be an extra bit of information we could
> use when trying to replicate.


Yeah, it's not entirely satisfying for now. We have a couple of prefs that we cherry-picked, but not the full set.

All of this is also in a semi-structured plain-text body, so it's not easily accessible. The templates and content structure have changed over time. And the issue parser might give wrong information; here again is an area where we can probably improve.
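To make the fragility concrete, here is a toy version of what the issue parser has to do: pull the URL and any listed prefs out of a semi-structured plain-text body. The field labels below are illustrative; the real webcompat.com templates have changed over time, which is exactly why regexes like these break.

```python
import re

# These patterns assume one particular (hypothetical) template layout.
URL_RE = re.compile(r"^\*\*URL\*\*:\s*(\S+)", re.MULTILINE)
PREF_RE = re.compile(r"^\s*-\s*`([\w.]+)`:\s*(\S+)", re.MULTILINE)

def parse_issue_body(body):
    url = URL_RE.search(body)
    return {
        "url": url.group(1) if url else None,
        "prefs": dict(PREF_RE.findall(body)),
    }

body = """**URL**: https://example.com/page
**Prefs**:
 - `gfx.webrender.all`: true
"""
print(parse_issue_body(body))
# {'url': 'https://example.com/page', 'prefs': {'gfx.webrender.all': 'true'}}
```

One template change, say dropping the backticks around pref names, and the prefs silently come back empty, which is the "wrong information" failure mode mentioned above.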


# Issues getting fixed

Also, some of the old issues have been fixed:

* because the site changed or libraries used by the site changed
* because the site made a change on our request
* because Gecko has implemented a fix


Plenty of challenges (which project doesn't have them?), but probably plenty of good things to explore and ways to improve the data quality of webcompat.com itself.

Marco Castelluccio

Feb 8, 2018, 7:45:52 AM
to Karl Dubost, compatibility
On 05/02/2018 00:29, Karl Dubost wrote:

> Marco,
>
> Thanks for the context. A couple of thoughts below.
>
> On Feb 3, 2018, at 07:00, Marco Castelluccio <mcaste...@mozilla.com> wrote:
>> The idea is to automatically detect web compatibility issues (at least a
>> subset) using a convolutional neural network (or something similar).
> Yes, it's probably good to start with a subset of webcompat issues that are well defined.
>
> There are some low-hanging fruits:
> * Desktop version served instead of the mobile site for Firefox on Android.
> * Forms with weird renderings.
Do we have a way to gather these issues automatically? Or do we have to
find them manually?
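One automatic avenue would be querying the GitHub issues API on webcompat/web-bugs filtered by label. A sketch; the label name below is a placeholder, not necessarily a label the triage team actually uses.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/repos/webcompat/web-bugs/issues"

def issues_url(label, state="closed", per_page=100):
    """Build the GitHub API URL for issues carrying a given label."""
    query = urllib.parse.urlencode(
        {"labels": label, "state": state, "per_page": per_page})
    return f"{API}?{query}"

def fetch_issues(label):
    """Fetch one page of matching issues (needs network access)."""
    with urllib.request.urlopen(issues_url(label)) as resp:
        return json.load(resp)

print(issues_url("type-mobile-vs-desktop"))
```

That only works to the extent that issues are labeled, which loops back to the labeling problem discussed below.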

> # About screenshots
>
>> I've thought of using a machine learning
>> technique because a simple diff wouldn't work (as e.g. a small
>> difference in the fonts used, or a different image in a carousel, etc.,
>> should not be considered meaningful).
> So here comes the difficult part. Some issues are about 1px of text being cut off because of font variations, and those really are webcompat issues. But there are issues where the text is readable even in different fonts, and those are not webcompat issues, unless Firefox was expected to have the right font. :)
>
> It reminds me of something I posted a while ago about screenshots and variability.
> https://groups.google.com/forum/#!topic/mozilla.compatibility/oU9eVcHSPng
>
> For training a system (when it is possible), you might want to take plenty of screenshots of the same page over a couple of days, so you get a "visual reference template" of the page. You could do that in both browsers. Maybe the templates will be different. You could even diff the reference templates in the end.
This is a great idea. We can increase the training set by a lot by doing
this. I've filed https://github.com/marco-c/autowebcompat/issues/43.

> Another interesting source is the markup, with the same technique. Once you have sampled the same site multiple times, you get a fairly good idea of what the content template is. Once you have this content template, same thing: you could compare with another browser and see whether we receive the same markup.
This is interesting too. We could use both the image data and the markup
data to train the network.
I've filed https://github.com/marco-c/autowebcompat/issues/44.

>
> Wild thoughts about learning automagically from a website through a neural network: some websites (most?) serve different JavaScript codepaths depending on the browser. Having a way to learn, for the same website, which routes have been taken depending on the browser could probably be useful. This is probably for BattleStar Webcompat Gallactica.
Yes, this sounds quite hard, but it would be awesome :P

>
>
>
>> I thought I could use the webcompat.com database to collect
>> examples of incompatibilities.
>
> So here is an issue for the project: the issues are not labeled well enough that you could learn programmatically from the issue itself. But if someone starts to label them, it will indeed increase the quality of the training data. That in itself would improve the project too.
So I guess the labeling that we will do for autowebcompat could then
somehow be integrated into webcompat.com? I know we won't be able to
replicate all issues with our crawler, so we can't just close the
webcompat.com issues where we don't find discrepancies between browsers,
but for issues we can reproduce we can add a comment on webcompat.com
and upload the screenshots.

>> So I'm running a Selenium script that
>> loads the websites from the webcompat.com issues in both Firefox and
>> Chrome, then takes a screenshot, then tries to interact with some
>> elements (e.g. by clicking on a button, writing in a text field, etc.)
>
> So it would be useful for you to have the URL of the site.
> What data would you like?
Ideally, the exact data we need in order to replicate the issue (e.g.
click on this button, insert this text into this field, etc.), but I
guess we don't have that in webcompat.com (or in any bug tracking system).
In practice, the more information we have to replicate the issue the
better. What else could we gather other than the website URL and some
preferences?

>
>
>> I've just noticed that webcompat.com issues list the preferences the
>> user has modified, so that could be an extra bit of information we could
>> use when trying to replicate.
>
> Yeah, it's not entirely satisfying for now. We have a couple of prefs that we cherry-picked, but not the full set.
>
> All of this is also in a semi-structured plain-text body, so it's not easily accessible. The templates and content structure have changed over time. And the issue parser might give wrong information; here again is an area where we can probably improve.
>
>
> # Issues getting fixed
>
> Also, some of the old issues have been fixed:
>
> * because the site changed or libraries used by the site changed
> * because the site made a change on our request
> * because Gecko has implemented a fix
>
>
> Plenty of challenges (which project doesn't have them?), but probably plenty of good things to explore and ways to improve the data quality of webcompat.com itself.
Yes, indeed. And I guess there are also issues that have been fixed but
then regressed again.
I'm not too concerned by this, as we will have to perform the labeling
anyway.

Thanks for the additional info and ideas,
Marco.
