I've attempted to update the proposal per all of the recent discussion. I've almost certainly missed something, so please let me know what.
The controversial part of this proposal may be that it does not allow persistent access to the camera from unauthenticated content, only from trusted apps. The reason is that, besides the UI challenges we are looking at, regular content has very different security properties from authenticated apps, and allowing full persisted access to the camera from an HTTP website has troubling consequences in a mobile computing environment.
Name of API: Camera API
References:
http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html ("Section 2 Scenarios") contains the use case
scenarios from the media capture task that is creating getUserMedia(), which is what this API is based on.
Brief purpose of API: Let content take photos and capture video and/or audio
Use cases: moved to their respective app categories below
Inherent threats: Steal or spy on user video/audio
Threat severity: High per https://wiki.mozilla.org/Security_Severity_Ratings
== Regular web content (unauthenticated) ==
Use cases:
*App allows user to take a picture for a profile
*App allows user to take a picture and record an audio clip
*App allows user to record a video with audio to send to someone else
*App allows user to record an audio clip to send to someone else
*App allows the user to start a podcast, open other tabs/apps while the recording continues (to look up and comment on
information, etc.) and then come back to the tab/original app to finish the podcast. Note: the user may continue to
record while opening or switching to other tabs/apps
*App allows foreground photo sharing with realtime preview and special effects. Needs a live video stream and the ability
to manipulate the stream on the fly (this one might be a bit of a stretch; it can work with the magic button or WebGL shader approach but requires some more research)
Authorization model for normal content: user-mediated OS UI
Authorization model for installed content: user-mediated OS UI
Potential mitigations: App can launch a user-mediated viewfinder UI to take a picture, record video, or use the
camera/mic feed, which the user approves before it is provided to the content. The viewfinder uses a <video>
tag (or some such) and is validated to have a non-collapsed extent, not be off-screen, and not be (mostly) obscured by other
content. Additionally (contingent upon addressing UX and clickjacking concerns), we could potentially use a "magic button" rendered by the OS with the app context. There is a persistent recording indicator (blinking red light?). App can continue recording if it loses focus. Only top-level content can request access. There is no "always allow" option in this app category.
TBD: Appropriate limitations to device fingerprinting
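To make the viewfinder validation in the mitigations above more concrete, here is a rough sketch of the three checks (non-collapsed extent, not off-screen, not mostly obscured). Everything in it is an assumption for illustration only: the Rect type, the function names, the rule that the whole element must lie within the screen, and the 50% cutoff for "mostly" are not part of the proposal or of any existing API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(max(self.left, other.left), max(self.top, other.top),
                    min(self.right, other.right), min(self.bottom, other.bottom))


def covered_area(target: Rect, occluders: List[Rect]) -> float:
    """Area of `target` hidden by the union of `occluders`.

    Uses coordinate compression so overlapping occluders are not counted twice.
    """
    clipped = [o.intersect(target) for o in occluders]
    clipped = [c for c in clipped if c.area() > 0]
    xs = {target.left, target.right}
    ys = {target.top, target.bottom}
    for c in clipped:
        xs.update((c.left, c.right))
        ys.update((c.top, c.bottom))
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    total = 0.0
    for x0, x1 in zip(xs_sorted, xs_sorted[1:]):
        for y0, y1 in zip(ys_sorted, ys_sorted[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            # A grid cell is either fully covered or fully uncovered.
            if any(c.left <= cx <= c.right and c.top <= cy <= c.bottom for c in clipped):
                total += (x1 - x0) * (y1 - y0)
    return total


# Assumed cutoff: the proposal only says "(mostly) obscured".
MAX_OBSCURED_FRACTION = 0.5


def viewfinder_is_valid(video: Rect, screen: Rect, occluders: List[Rect]) -> bool:
    # 1. Non-collapsed extent.
    if video.area() <= 0:
        return False
    # 2. Not off-screen (assumption: the whole element must be within the screen).
    if video.intersect(screen) != video:
        return False
    # 3. Not (mostly) obscured by other content.
    return covered_area(video, occluders) / video.area() <= MAX_OBSCURED_FRACTION
```

Whether a partially off-screen element should pass, and what fraction counts as "mostly", are exactly the kinds of details that would still need to be decided.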
== Trusted (authenticated by publisher) ==
Use cases:
*App allows users to record video from multiple webcams
*App allows video monitoring such as a baby monitor or security camera that can run for extended periods of time
Authorization model: explicit (at install, at runtime, with "always allow/deny" option)
Potential mitigations: Prompt for camera access; the app then retains access to the video/audio stream until exit. There is a persistent recording indicator (blinking red light?). App can continue recording if it loses focus.
== Certified (vouched for by trusted 3rd party) ==
Use cases:
*App starts recording video and/or audio in the background on some signal that the device has been stolen. Recordings
are uploaded.
Authorization model: implicit
Potential mitigations: Settings manager could enumerate which apps have implicit access to camera.
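As a rough illustration of the settings manager mitigation above, enumerating apps with implicit camera access could be a simple filter over the installed apps. The InstalledApp record, its field names, and the "camera" permission string are hypothetical and only serve this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class InstalledApp:
    """Hypothetical record for an installed app."""
    name: str
    trust: str                   # "regular", "trusted", or "certified"
    permissions: FrozenSet[str]  # e.g. frozenset({"camera"})


def apps_with_implicit_camera_access(apps: Iterable[InstalledApp]) -> List[str]:
    """Names of certified apps that hold the camera permission, sorted for display."""
    return sorted(
        app.name
        for app in apps
        if app.trust == "certified" and "camera" in app.permissions
    )
```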
Notes:
*Trusted & certified apps have access to the constraints/capabilities API
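To summarize the three authorization models in one place, here is a minimal sketch of how a camera request might be dispatched by app category. The type and function names are made up for illustration, and the treatment of a stored "always deny" as a silent denial is my assumption, not something the proposal spells out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trust(Enum):
    REGULAR = "regular web content (unauthenticated)"
    TRUSTED = "trusted (authenticated by publisher)"
    CERTIFIED = "certified (vouched for by trusted 3rd party)"


class Decision(Enum):
    DENY = "deny"
    USER_MEDIATED_UI = "launch user-mediated OS viewfinder UI"
    PROMPT = "prompt the user"
    ALLOW = "allow"


@dataclass(frozen=True)
class Request:
    trust: Trust
    is_top_level: bool = True
    # Stored "always allow" (True) or "always deny" (False); None if nothing stored.
    remembered: Optional[bool] = None


def authorize(req: Request) -> Decision:
    if req.trust is Trust.CERTIFIED:
        # Implicit authorization.
        return Decision.ALLOW
    if req.trust is Trust.TRUSTED:
        # Explicit, with an "always allow/deny" option.
        if req.remembered is True:
            return Decision.ALLOW
        if req.remembered is False:
            return Decision.DENY  # assumption: a stored denial is applied silently
        return Decision.PROMPT
    # Regular web content: only top-level content can request access, and there
    # is no "always allow", so any stored choice is ignored.
    if not req.is_top_level:
        return Decision.DENY
    return Decision.USER_MEDIATED_UI
```

For example, authorize(Request(Trust.REGULAR, remembered=True)) still returns Decision.USER_MEDIATED_UI, since regular content never gets persistent access.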
On Apr 10, 2012, at 5:49 PM, Lucas Adamski wrote:
> This discussion will be a bit more involved I think but I'd like to wrap this up by Tue 17th EOD PDT.
>
> Name of API: Camera API
>
> References:
>
> http://dvcs.w3.org/hg/dap/raw-file/tip/media-stream-capture/scenarios.html ("Section 2 Scenarios") are use case
> scenarios from the media capture task that is creating getUserMedia() which is what this API is based on.
>
> Brief purpose of API: Let content take photos and capture video and/or audio
>
> Use cases are the same for all content (regular web, trusted, certified):
> *App allows user to take a picture for a profile
> *App allows user to take a picture and record an audio clip
> *App allows user to record a video with audio to send to someone else
> *App allows user to record an audio clip to send to someone else
> *App allows users to record video from multiple webcams [JStraus: How is this using the Camera API?]
> *App allows foreground photo sharing with realtime preview and special effects. Needs live video stream and the ability
> to manipulate the stream on the fly.
> *App allows video monitoring such as a baby monitor or security camera that can run for extended periods of time [Lucas:
> Is this really a universal use case or an installed-only use case?]
> *App allows the user to start a podcast, open other tabs/apps while the recording continues (to look up and comment on
> information, etc) and then comes back to the tab/original app to finish the podcast. Note: the user may continue to
> record while opening or switching to other tabs/apps [Lucas: Is this really a universal use case or an installed-only
> use case?]
> *App starts recording video and/or audio in the background on some signal that the device has been stolen. Recordings
> are uploaded. [Lucas: Is this really a universal use case or a certified-only use case?]
>
> Inherent threats: Steal or spy on user video/audio
> Threat severity: High per
> https://wiki.mozilla.org/Security_Severity_Ratings
>
> == Regular web content (unauthenticated) ==
> Authorization model for normal content: explicit runtime
> Authorization model installed content: explicit runtime
> Potential mitigations: Prompt user to take a picture, record video, record an audio clip, or use the camera feed or
> microphone feed. If permitted, agent mediated viewfinder UI is launched to take a picture, record the video, or use the
> camera/mic feed which user approves prior to it being provided to the content. A/V stream only accessible while app has
> focus. Only top level content can request access.
> TBD: what gets shown when recording audio only?
> TBD: Is there a visible indicator that the camera and/or microphone is active (because this is currently mandated by the
> getUserMedia spec)? Is this indicator visible even if the browser window is partially or completed obscured? What if
> there is no browser window (like for Apps and B2G?)
> TBD: Appropriate limitations to device fingerprinting
> TBD: Should recording stop when content loses focus? If it doesn't, how do we resolve concurrent audio/video feed
> requests? How does the user determine which tabs are recording?
>
> == Trusted (authenticated by publisher) ==
> Authorization model: explicit [upfront|runtime]??
> Potential mitigations: Prompt for camera access, app then retains access to video/audio stream until exit. Uses <video>
> tag (or some such) and is validated to have a non-collapsed extent, not be off-screen, not be (mostly) obscured by other
> content. Note: Video stream may need to be accessible while focus is given to another app
>