ML Kit face detection easily crashes in real time scenario


Shai Ben-Tovim

May 17, 2018, 10:12:33 AM
to Firebase Google Group
Hi,

Trying out the new ML Kit face detection feature.
I'm using it with the device camera via an AVCaptureSession video stream (Xcode 9.3, iOS 11.3.1, tested on an iPhone 6).
I have everything set up as documented and it's working as expected - BUT - after several seconds the app crashes due to out-of-memory.
The Xcode console clearly shows chunks of memory being allocated and never released, and the culprit is somewhere in the face detector code.
When I reduce the face detection frame rate (say, run it every 10 or 15 frames instead of every frame at 30 fps), the memory leak slows down but the app still eventually crashes.

I am using the default config for the detector (+ tried with tracking enabled as well):

faceDetector = Vision.vision().faceDetector()


And this is my captureOutput() function:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

    let metadata = VisionImageMetadata()
    let orientation = exifOrientationFromDeviceOrientation()
    metadata.orientation = VisionDetectorImageOrientation(rawValue: UInt(orientation))!

    let image = VisionImage(buffer: sampleBuffer)
    image.metadata = metadata

    faceDetector.detect(in: image) { (faces, error) in
        ...
    }
}


Really unusable...too bad.



David East

May 17, 2018, 9:39:03 PM
to Firebase Google Group
Hey Shai!

What preset are you using? If you're using a high quality preset like .photo, then it will crash. I use .medium in the quickstart-ios sample.

I also recommend processing only certain frames. I use a function in captureOutput that only processes every 10th frame.
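That frame-skipping idea can be factored into a small helper. This is just an illustrative sketch (the `FrameSkipper` name and API are mine, not from the quickstart): a counter that signals `true` only for every Nth call, which you'd check at the top of `captureOutput` before handing the buffer to the detector.

```swift
// Illustrative helper: process only every Nth frame delivered to captureOutput.
final class FrameSkipper {
    private let interval: Int
    private var frameCount = 0

    init(every interval: Int) {
        self.interval = max(1, interval)
    }

    /// Returns true only on every `interval`-th call, false otherwise.
    func shouldProcess() -> Bool {
        frameCount += 1
        if frameCount % interval == 0 {
            frameCount = 0
            return true
        }
        return false
    }
}

// Usage inside captureOutput (sketch):
// guard frameSkipper.shouldProcess() else { return }
// ...run face detection on sampleBuffer...
```

Dropping frames this way reduces detector pressure, though as noted later in the thread it only slows a true leak rather than fixing it.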

Josh Leong

May 18, 2018, 11:07:56 AM
to Firebase Google Group
Wait, I figured it out: the VisionImage initializer needed to be passed a CMSampleBuffer.

Josh Leong

May 18, 2018, 11:07:56 AM
to Firebase Google Group
Hey David,

I've also had some issues with real-time face detection tonight. I can get it to work on a static imageView.image, but when I call it from captureOutput and pass the CMSampleBuffer, it reports that no faces are detected in the very same image that works when I read it off imageView.image. I put a new detectFaces function into the FrameProcessingViewController. It also doesn't seem to work when I pass it through takePhoto.

I've had success with using detectLabels however, and when I run them concurrently detectLabels will work but detectFaces will not.

Best,
Josh



Shai Ben-Tovim

May 21, 2018, 10:51:00 AM
to Firebase Google Group
Hi David,

I'm using .high as the session preset.
I tried .medium and it just leaks at a slower pace (~3 MB/s vs. ~10 MB/s).

I also tried different processing frame rates (ranging from 3 to 30 fps) - the only difference is how long the OOM crash takes to happen, since the leaked allocations accumulate more slowly or more quickly.

Shai

Sachin Kotwani

May 21, 2018, 1:22:20 PM
to Firebase Google Group
Hi Shai,

Thanks for flagging this issue. Our team has been able to repro it and is investigating. I don't have any timelines to share at the moment, but we hope to have a fix in one of the upcoming releases.

Thanks,

Sachin (Product Manager on the ML Kit team)

Shai Ben-Tovim

May 22, 2018, 10:14:04 AM
to Firebase Google Group
Thanks Sachin.

Looking forward to a working package :-)

Shai

Sachin Kotwani

Jun 21, 2018, 8:11:30 PM
to fireba...@googlegroups.com
Hi Shai,

The latest SDK with a fix for this issue is now available. Do you mind doing a pod update and giving it a try?

Thanks,

Sachin


Shai Ben-Tovim

Jun 27, 2018, 10:07:52 AM
to Firebase Google Group
Hi Sachin,

I've been busy, so it took some time to get to it:
I updated to the new pod (5.3.0) and memory seems to be under control. Max consumption was ~150 MB at a detection rate of 3 fps. I'll dig more into performance next week, as I did notice that even at 3 fps I'm "skipping" detection frames.

One more question: I'm a little confused about the coordinate system I get for the detection rects (VisionFace.frame).
The documentation states: "The rectangle that holds the discovered relative to the detected image in the view coordinate system."
What does this mean? What "view"? Does it mean that (0,0) is always top-left even if the image has a different orientation (and then requires adjusting for orientation)? Or does VisionFace.frame take orientation into account, so it's always the top-left of the image as a human would view it upright?

Thanks

Shai

Dong Chen

Jul 2, 2018, 10:52:53 AM
to Firebase Google Group
Hi Shai,

For real-time face tracking from a video feed, you will need to build the app in release/optimized mode (as opposed to debug mode). That makes a huge difference in performance.

ML Kit's quickstart-ios sample app shows how to do real-time face tracking: https://github.com/firebase/quickstart-ios/tree/master/mlvision

In particular, you can look into this class:


For continuous face tracking, please make sure to enable tracking:


You should also try using kCVPixelFormatType_32BGRA as the AVCaptureVideoDataOutput's pixel format:


If not already, please use AVCaptureSession.Preset.medium:


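Putting those recommendations together, the setup might look roughly like this. This is a sketch against the mid-2018 FirebaseMLVision API; treat the exact option and key names as assumptions and check them against the quickstart sample.

```swift
import AVFoundation
import FirebaseMLVision

// Sketch: capture session and detector configured per the recommendations above.
let session = AVCaptureSession()
session.sessionPreset = .medium  // medium preset: smaller frames, less memory pressure

let videoOutput = AVCaptureVideoDataOutput()
// BGRA pixel buffers, as suggested above.
videoOutput.videoSettings =
    [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
session.addOutput(videoOutput)

// Enable tracking so faces keep a stable trackingID across frames.
let options = VisionFaceDetectorOptions()
options.isTrackingEnabled = true
let faceDetector = Vision.vision().faceDetector(options: options)
```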
Regarding your question about the coordinate system: the output coordinates are in the same space as the input image. For a camera stream, the camera frame coordinate space and the preview coordinate space are not the same, so if you want to display the results on top of the preview, you need to translate between the two coordinate systems. The AVCaptureVideoPreviewLayer object provides coordinate-conversion methods for this (e.g. pointForCaptureDevicePointOfInterest, rectForMetadataOutputRectOfInterest, layerRectConverted, etc.). Here is how the above sample app does the translation:
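As a sketch of that translation (assuming the detected rect is in the video buffer's pixel coordinates; the function name and parameters here are illustrative, not from the sample app): normalize the rect into the [0, 1] metadata-output space, then let the preview layer map it into its own coordinates.

```swift
import AVFoundation

// Illustrative: map a face rect from buffer-pixel coordinates into the
// preview layer's coordinate space for on-screen overlay drawing.
func previewRect(forBufferRect rect: CGRect,
                 bufferSize: CGSize,
                 previewLayer: AVCaptureVideoPreviewLayer) -> CGRect {
    // Normalize into the [0, 1] x [0, 1] metadata-output space first...
    let normalized = CGRect(x: rect.origin.x / bufferSize.width,
                            y: rect.origin.y / bufferSize.height,
                            width: rect.size.width / bufferSize.width,
                            height: rect.size.height / bufferSize.height)
    // ...then let the layer convert into its own coordinate space,
    // accounting for the layer's videoGravity and bounds.
    return previewLayer.layerRectConverted(fromMetadataOutputRect: normalized)
}
```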


Hope this helps.

-Dong