Intent to Experiment: Shape Detection

owe...@chromium.org

Jun 29, 2017, 3:31:27 PM
to blink-dev

Contact emails

mca...@chromium.org, owe...@chromium.org


Spec

https://wicg.github.io/shape-detection-api/


Summary

OEMs have long provided hardware-accelerated detection of image features such as faces and QR codes, given the high computational complexity of detecting them. The Shape Detection API provides access to these hardware-accelerated detectors where available. (Note that it does _not_ provide any software fallback.)


Link to “Intent to Implement” blink-dev discussion


Goals for experimentation

We would like to collect feedback on:

  • Ergonomics and the amount of metadata provided by the API. For example, face detection results currently include eyes and mouth, but some APIs provide more; which ones would be interesting to surface?

  • Performance of the API in general (via developer feedback)

  • Relative performance of the different DOM sources (the API works with a number of data inputs, e.g. <canvas>, <video>, <img>)

  • Any impact on bug/crash rate due to hardware/software combinations

  • Whether there are use cases we didn’t already consider


UMA usage count: https://uma.googleplex.com/timeline_v2?sid=5febed43e9e109b411b4bee056314100

Bug category: Blink>ImageCapture && label:ShapeDetection


Experimental timeline

Enabled in Chrome 61, 62, 63


Any risks when the experiment finishes?

None


Ongoing technical constraints

None.


Will this feature be supported on all five Blink platforms supported by Origin Trials (Windows, Mac, Linux, Chrome OS, and Android)?

Android and Mac are supported initially. Support for Windows 10 is on its way and is expected to land during the trial.


ChromeOS support will need to rely on libraries; whether we pursue it depends on the demand we gauge for this platform during the trial.


OWP launch tracking bug

crbug.com/646035


Link to entry on the feature dashboard

https://www.chromestatus.com/features/4757990523535360

Chris Harrelson

Jun 30, 2017, 1:18:08 PM
to Owen, blink-dev
LGTM

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+unsubscribe@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/b49b26bc-0571-43de-9744-71a3052b6610%40chromium.org.

Rick Byers

Jun 30, 2017, 3:42:47 PM
to Chris Harrelson, Owen, blink-dev, Philip Jägenstedt
Looks cool!

What's your thinking for how we might write interop tests for this API?  Since this is relying on underlying platform APIs (even harder) I suppose we can't test the exact results of example test cases.  But maybe we could have tests with some kind of fuzzy matching (eg. don't test the exact bounding box, just that a box was found or not containing a given point) and land test cases known to pass in all major OS implementations?  How hard is it to find a set of test cases that pass in the Android, Mac and Windows implementations?
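
A rough sketch of the fuzzy matching suggested above, for illustration only. The names (`Box`, `Point`, `FuzzyExpectation`, `matchesFuzzily`) are hypothetical and come from neither the spec nor any existing test suite; the sketch only assumes that each detection result exposes a rectangular bounding box.

```typescript
// Hypothetical minimal shapes; real results expose a bounding box with similar fields.
interface Box { x: number; y: number; width: number; height: number; }
interface Point { x: number; y: number; }

// True when the point lies inside the box (edges included).
function boxContainsPoint(box: Box, p: Point): boolean {
  return p.x >= box.x && p.x <= box.x + box.width &&
         p.y >= box.y && p.y <= box.y + box.height;
}

// A test case names points that must each be covered by some detected box,
// plus the number of detections expected, instead of exact coordinates.
interface FuzzyExpectation { mustContain: Point[]; expectedCount: number; }

function matchesFuzzily(detected: Box[], expected: FuzzyExpectation): boolean {
  if (detected.length !== expected.expectedCount) return false;
  return expected.mustContain.every(p => detected.some(b => boxContainsPoint(b, p)));
}

// Two made-up platform outputs for the same image: the boxes differ,
// but both cover the point where the face is known to be.
const expectation: FuzzyExpectation = { mustContain: [{ x: 150, y: 150 }], expectedCount: 1 };
const platformA: Box[] = [{ x: 100, y: 80, width: 120, height: 140 }];
const platformB: Box[] = [{ x: 90, y: 95, width: 110, height: 110 }];
console.log(matchesFuzzily(platformA, expectation), matchesFuzzily(platformB, expectation));
```

A test written this way would pass on any implementation that finds the face at all, which is the property an interop suite would need here.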

sko...@chromium.org

Jun 30, 2017, 8:43:52 PM
to blink-dev
Looks cool!  This would be a great API to help bootstrap some AR-type scenarios.

A couple of questions:

  1. The spec doesn't seem to provide a straightforward way to use a live camera feed as an image source.  Is that intentional?
  2. The spec also doesn't seem to support object detection (a la Google Cloud Vision API, Amazon Rekognition, etc.).  Is that something you'd see coming?
Thanks,
Stephen

Miguel Casas

Jul 4, 2017, 3:00:50 AM
to Rick Byers, Chris Harrelson, Owen, blink-dev, Philip Jägenstedt
On 1 July 2017 at 04:42, Rick Byers <rby...@chromium.org> wrote:
Looks cool!

What's your thinking for how we might write interop tests for this API?  Since this is relying on underlying platform APIs (even harder) I suppose we can't test the exact results of example test cases.  But maybe we could have tests with some kind of fuzzy matching (eg. don't test the exact bounding box, just that a box was found or not containing a given point) and land test cases known to pass in all major OS implementations?  How hard is it to find a set of test cases that pass in the Android, Mac and Windows implementations?

It's true that every implementation is going to be a little different, but they should be comparable to within certain limits, i.e. GMS core and Android Face provide different bounding boxes (see e.g. the CL to sort this out w/ tests); IOW we expect to have repeatable tests enforcing there's a face in-or-around these coordinates, with eyes and mouth here and there, and similarly with barcode/qr codes etc.  

FTR there's a relevant discussion on the TAG review, which raised a similar point among others, and my reply was that current OS implementations are based on the same set of algorithms, so the results are quite comparable.

The biggest catch IMHO is not repeatability but accuracy: some implementations might detect faces better than others, e.g. in this codepen, not all implementations detect the faces w/ glasses. 
 

Miguel Casas

Jul 4, 2017, 3:06:28 AM
to sko...@chromium.org, blink-dev
On 1 July 2017 at 09:43, <sko...@chromium.org> wrote:
Looks cool!  This would be a great API to help bootstrap some AR-type scenarios.

A couple of questions:

  1. The spec doesn't seem to provide a straightforward way to use a live camera feed as an image source.  Is that intentional?
I took ImageBitmapSource as input because it was a convenient union with the security concerns already hashed out, but indeed we're missing MediaStreamTrack.  As a workaround, you can plug the MST into a <video> and use the latter as the source for detect(). (To use hardware capabilities, we can find the underlying video capture device through that same <video>.)
 
  2. The spec also doesn't seem to support object detection (a la Google Cloud Vision API, Amazon Rekognition, etc.).  Is that something you'd see coming?
In this API we're only interested in local detectors, so if those are cloud-based then it would be quite hard to convince the TAG to RS them.  If they're purely local detectors, one note is that the API surfaces capabilities that must have been stable for a few years in at least 3 out of the platforms detailed here (forgetting about barcode/qr in win for a second), and we're applying the same criteria to the detected features. 
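
For the live camera case, the <video> workaround described above might look roughly like the following. This is a sketch under assumptions: `VideoLike`, `DetectorLike` and `detectFromStream` are hypothetical names, and the two interfaces are structural stand-ins for the browser's <video> element and a detector object so that the snippet type-checks without DOM typings.

```typescript
// Stand-in for a <video> element: only the members the sketch touches.
interface VideoLike {
  srcObject: unknown;
  play(): Promise<void>;
}

// Stand-in for a detector exposing detect(), as the spec's detectors do.
interface DetectorLike<T> {
  detect(source: unknown): Promise<T[]>;
}

// Attach the stream to the video, wait for playback to start,
// then hand the video element itself to detect() as the image source.
async function detectFromStream<T>(
  video: VideoLike,
  stream: unknown,
  detector: DetectorLike<T>,
): Promise<T[]> {
  video.srcObject = stream;
  await video.play();
  return detector.detect(video);
}
```

In a page, `video` would be a real <video> element, `stream` a MediaStream (a bare MediaStreamTrack would first be wrapped with `new MediaStream([track])`), and `detector` an instance of one of the spec's detectors. Running detection on every frame would mean calling `detect(video)` repeatedly, which this sketch leaves out.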
 

Rick Byers

Jul 4, 2017, 3:39:56 PM
to Miguel Casas, Chris Harrelson, Owen, blink-dev, Philip Jägenstedt
On Tue, Jul 4, 2017 at 3:00 AM, Miguel Casas <mca...@google.com> wrote:


On 1 July 2017 at 04:42, Rick Byers <rby...@chromium.org> wrote:
Looks cool!

What's your thinking for how we might write interop tests for this API?  Since this is relying on underlying platform APIs (even harder) I suppose we can't test the exact results of example test cases.  But maybe we could have tests with some kind of fuzzy matching (eg. don't test the exact bounding box, just that a box was found or not containing a given point) and land test cases known to pass in all major OS implementations?  How hard is it to find a set of test cases that pass in the Android, Mac and Windows implementations?

It's true that every implementation is going to be a little different, but they should be comparable to within certain limits, i.e. GMS core and Android Face provide different bounding boxes (see e.g. the CL to sort this out w/ tests); IOW we expect to have repeatable tests enforcing there's a face in-or-around these coordinates, with eyes and mouth here and there, and similarly with barcode/qr codes etc.  

FTR there's a relevant discussion on the TAG review, which raised a similar point among others, and my reply was that current OS implementations are based on the same set of algorithms, so the results are quite comparable.

Great, thanks! 

The biggest catch IMHO is not repeatability but accuracy: some implementations might detect faces better than others, e.g. in this codepen, not all implementations detect the faces w/ glasses. 

Yeah so we'll just have to have only test-cases in web-platform-tests that all implementations can handle, and probably keep some additional platform-specific test-cases in the chromium tree.  You might get some argument from web-platform-test folks claiming the tests go beyond strictly validating the spec, but I'm optimistic that there will be pragmatism here (since the alternative probably is just not to have non-trivial web-platform-tests at all).

Miguel Casas-Sanchez

Jul 26, 2017, 2:22:36 AM
to Rick Byers, Chris Harrelson, Owen, blink-dev, Philip Jägenstedt
Just to update this i2e: there was a bit more discussion on the TAG review, but the outcomes are essentially either small fixes or compat concerns (centered on Text Detection, though).  Nothing major that would send us back to the whiteboard.

Reilly Grant

Aug 1, 2018, 8:15:07 PM
to blink-dev, Miguel Casas-Sanchez
Apologies to everyone who has been waiting for this Origin Trial to launch for the last year. The new plan is to run a trial from Chrome 70 to 72. Since the original Intent was sent, support for face and text detection on Windows has also landed, and detection accuracy on macOS has been improved through use of the newer Vision API where available (10.13+).

The visibility of the API will not be restricted by platform so that developers can test that their feature detection code reacts correctly on platforms where hardware acceleration is not available. 
Reilly Grant | Software Engineer | rei...@chromium.org | Google Chrome
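
As an illustration of the feature-detection handling mentioned above, a page might distinguish "interface not exposed" from "exposed but not usable on this platform" roughly as follows. This is a sketch, not a description of Chrome's behaviour: `tryDetect` and the `Outcome` kinds are hypothetical, and how an unsupported platform actually fails (the constructor throwing versus detect() rejecting) is an assumption, so the sketch treats both the same way.

```typescript
// Shape of a detector constructor as far as this sketch is concerned.
type DetectorCtor = new () => { detect(source: unknown): Promise<unknown[]> };

// Illustrative outcome kinds, not part of the API.
type Outcome =
  | { kind: "unsupported" }                   // interface is not exposed at all
  | { kind: "unavailable"; reason: string }   // exposed, but detection failed here
  | { kind: "ok"; results: unknown[] };

// Look the detector up by name on a global-like object and try one detection.
async function tryDetect(
  scope: { [name: string]: unknown },
  name: string,
  source: unknown,
): Promise<Outcome> {
  const ctor = scope[name];
  if (typeof ctor !== "function") return { kind: "unsupported" };
  try {
    const detector = new (ctor as DetectorCtor)();
    // Awaiting inside the try block also catches a rejected detect() promise.
    return { kind: "ok", results: await detector.detect(source) };
  } catch (err) {
    return { kind: "unavailable", reason: String(err) };
  }
}
```

In a page, `scope` would be the global object and `name` something like "FaceDetector"; the "unavailable" branch is where an application would switch to its own fallback or hide the feature.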

