JavaScript Screen Capture


In Libman

Aug 5, 2024, 2:29:14 PM

to quicongthecar

The Screen Capture API introduces additions to the existing Media Capture and Streams API to let the user select a screen or portion of a screen (such as a window) to capture as a media stream. This stream can then be recorded or shared with others over the network.

The Screen Capture API is relatively simple to use. Its main method is MediaDevices.getDisplayMedia(), whose job is to ask the user to select a screen or portion of a screen to capture in the form of a MediaStream.
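Since getDisplayMedia() is the API's entry point, a page can feature-detect it before offering screen sharing. This is a minimal sketch: the function name supportsScreenCapture is hypothetical, and it takes the MediaDevices object as a parameter (navigator.mediaDevices in a browser, which is undefined outside a secure context) so the check can also be exercised outside a browser.

```javascript
// Returns true when the given MediaDevices-like object exposes getDisplayMedia().
// In a page, call it as supportsScreenCapture(navigator.mediaDevices).
// (Hypothetical helper name, not part of the Screen Capture API.)
function supportsScreenCapture(mediaDevices) {
  return Boolean(mediaDevices) && typeof mediaDevices.getDisplayMedia === "function";
}
```

A page might, for example, disable its "Start Capture" button when this returns false.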

The CaptureController interface provides methods that can be used to further manipulate a capture session separately from its initiation via MediaDevices.getDisplayMedia(). A CaptureController object is associated with a capture session by passing it into a getDisplayMedia() call as the value of the options object's controller property.
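A minimal sketch of associating a CaptureController with a session, assuming a browser that exposes the CaptureController constructor. The helper name startControlledCapture is hypothetical; the MediaDevices object and the controller class are passed in only to make the dependencies explicit and testable.

```javascript
// Starts a capture session tied to a new CaptureController.
// mediaDevices: navigator.mediaDevices in a browser.
// ControllerClass: the global CaptureController in browsers that support it.
// (Hypothetical helper; both dependencies are parameters only for testability.)
async function startControlledCapture(mediaDevices, ControllerClass, options = { video: true }) {
  const controller = new ControllerClass();
  // The controller is associated with the session through the options
  // object's controller property; the caller's options object is not modified.
  const stream = await mediaDevices.getDisplayMedia({ ...options, controller });
  return { stream, controller };
}
```

In a page this might be called as startControlledCapture(navigator.mediaDevices, CaptureController), keeping the returned controller for later use with whatever CaptureController methods the browser provides.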

The getDisplayMedia() method is added to the MediaDevices interface. Similar to getUserMedia(), this method returns a promise that resolves with a MediaStream containing the display area selected by the user, in a format that matches the specified options.

The logicalSurface property indicates whether or not the video in the stream represents a logical display surface (that is, one which may not be entirely visible onscreen, or may be completely offscreen). A value of true indicates a logical display surface is to be captured.

The suppressLocalAudioPlayback property controls whether the audio playing in a tab will continue to be played out of a user's local speakers when the tab is captured, or whether it will be suppressed. A value of true indicates that it will be suppressed.

The cursor property is a string which indicates whether or not the display surface currently being captured includes the mouse cursor, and if so, whether it's only visible while the mouse is in motion or if it's always visible. The value is one of always, motion, or never.
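On a captured video track, values such as these are reported through getSettings(). The sketch below, with a hypothetical helper name, picks out the screen-capture-specific entries. Browser support for each setting varies; a browser that doesn't report one simply omits it, so the helper returns undefined for that entry rather than guessing.

```javascript
// Collects screen-capture-specific settings from a captured video track.
// track: a MediaStreamTrack from getDisplayMedia(), e.g. stream.getVideoTracks()[0].
// displaySurface is one of "browser", "window", or "monitor";
// cursor is one of "always", "motion", or "never".
// (Hypothetical helper name.)
function describeCaptureSettings(track) {
  const { displaySurface, logicalSurface, cursor } = track.getSettings();
  return { displaySurface, logicalSurface, cursor };
}
```

For example: console.log(describeCaptureSettings(stream.getVideoTracks()[0])).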

In user agents that support Permissions Policy, a site can declare its use of the Screen Capture API with the display-capture directive, either in the HTTP Permissions-Policy header or in an <iframe> element's allow attribute.
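The directive's value is an allowlist. As an illustrative sketch (both helper names are hypothetical), these functions build a Permissions-Policy header value for display-capture and the corresponding <iframe> markup; how the header is actually sent depends on your server.

```javascript
// Builds a Permissions-Policy header value for the display-capture directive.
// allowSelf: whether the document's own origin may use getDisplayMedia().
// origins: additional origins to allow; each is written as a quoted string.
// (Hypothetical helper name.)
function displayCapturePolicy(allowSelf, origins = []) {
  const allowlist = [];
  if (allowSelf) allowlist.push("self");
  for (const origin of origins) allowlist.push(`"${origin}"`);
  // An empty allowlist, display-capture=(), disables the feature.
  return `display-capture=(${allowlist.join(" ")})`;
}

// Markup for an iframe that is allowed to request display capture.
// src is assumed to be a trusted URL that needs no escaping.
// (Hypothetical helper name.)
function displayCaptureIframe(src) {
  return `<iframe allow="display-capture" src="${src}"></iframe>`;
}
```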

In this article, we will examine how to use the Screen Capture API and its getDisplayMedia() method to capture part or all of a screen for streaming, recording, or sharing during a WebRTC conference session.

Note: Recent versions of the WebRTC adapter.js shim include implementations of getDisplayMedia() to enable screen sharing on browsers that support it but do not implement the current standard API. This works with at least Chrome, Edge, and Firefox.

Capturing screen contents as a live MediaStream is initiated by calling navigator.mediaDevices.getDisplayMedia(), which returns a promise that resolves to a stream containing the live screen contents. The call accepts an options object, referred to in this article as displayMediaOptions, that describes the requested video and audio.
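A possible displayMediaOptions object is sketched below. Only video and audio are needed for a basic request; the other keys (preferCurrentTab, selfBrowserSurface, systemAudio, surfaceSwitching, monitorTypeSurfaces) are, to our knowledge, mainly honored by Chromium-based browsers, and browsers ignore dictionary members they don't recognize.

```javascript
// Options passed to getDisplayMedia(). Apart from video and audio, these
// keys are hints that some browsers honor and others ignore.
const displayMediaOptions = {
  video: {
    displaySurface: "browser", // suggest offering browser tabs first in the picker
  },
  audio: {
    suppressLocalAudioPlayback: false, // keep playing captured tab audio locally
  },
  preferCurrentTab: false,
  selfBrowserSurface: "exclude", // leave the current tab out of the picker
  systemAudio: "include",
  surfaceSwitching: "include",
  monitorTypeSurfaces: "include",
};
```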

Whether the call is written with await or as a promise chain, the user agent responds by presenting a user interface that prompts the user to choose the screen area to share, and the promise resolves to the MediaStream containing the captured display imagery.
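Both styles are sketched below with hypothetical names (captureDisplay and captureDisplayWithThen). Each takes the MediaDevices object and the options as parameters (navigator.mediaDevices and displayMediaOptions in a page), logs any error, such as the user dismissing the picker, and resolves to null in that case.

```javascript
// Variant 1: async/await. Resolves to the MediaStream, or to null on error.
async function captureDisplay(mediaDevices, displayMediaOptions) {
  try {
    return await mediaDevices.getDisplayMedia(displayMediaOptions);
  } catch (err) {
    console.error(`Error: ${err}`);
    return null;
  }
}

// Variant 2: the same flow written as a promise chain.
function captureDisplayWithThen(mediaDevices, displayMediaOptions) {
  return mediaDevices
    .getDisplayMedia(displayMediaOptions)
    .catch((err) => {
      console.error(`Error: ${err}`);
      return null;
    });
}
```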

For the purposes of the Screen Capture API, a display surface is any content object that can be selected by the API for sharing purposes. Sharing surfaces include the contents of a browser tab, a complete window, and a monitor (or group of monitors combined together into one surface).

A logical display surface is one that is partly or completely obscured, either by being overlapped by another object to some extent, or by being entirely hidden or offscreen. How these are handled by the Screen Capture API varies. Generally, the browser will provide an image which obscures the hidden portion of the logical display surface in some way, such as by blurring or replacing with a color or pattern. This is done for security reasons, as the content that cannot be seen by the user may contain data which they do not want to share.

A user agent might allow the capture of the entire content of an obscured window after gaining permission from the user to do so. In this case, the user agent may include the obscured content, either by getting the current contents of the hidden portion of the window or by presenting the most-recently-visible contents if the current contents are not available.

The video and audio objects passed into the options object can also hold additional constraints particular to those media tracks. See Properties of shared screen tracks for details about the additional constraints for configuring a screen-capture stream that are added to MediaTrackConstraints, MediaTrackSupportedConstraints, and MediaTrackSettings.

None of the constraints are applied in any way until after the content to capture has been selected. The constraints alter what you see in the resulting stream. For example, if you specify a width constraint for the video, it's applied by scaling the video after the user selects the area to share. It doesn't establish a restriction on the size of the source itself.
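To illustrate that constraints act on the result rather than on the choice of source, this sketch (helper name hypothetical) scales an existing capture track with applyConstraints() and reads back the width the browser reports. It uses a max constraint because getDisplayMedia() itself rejects min and exact constraints with a TypeError, so max and ideal values are the portable choice when the same constraints are also passed at request time.

```javascript
// Scales an already-selected screen capture so its width is at most maxWidth.
// track: the video MediaStreamTrack obtained from getDisplayMedia().
// Resolves to the width the browser reports after applying the constraint,
// which may differ from maxWidth (for example if the source is already smaller).
// (Hypothetical helper name.)
async function scaleCapture(track, maxWidth) {
  // applyConstraints() rejects if the browser cannot satisfy the constraint.
  await track.applyConstraints({ width: { max: maxWidth } });
  return track.getSettings().width;
}
```

For example, await scaleCapture(stream.getVideoTracks()[0], 1280). The same { max: ... } form can also go in the video object of the getDisplayMedia() options.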

Note: Constraints never cause changes to the list of sources available for capture by the Screen Capture API. This ensures that web applications can't force the user to share specific content by restricting the source list until only one item is left.

Note: For privacy and security reasons, screen sharing sources are not enumerable using enumerateDevices(). Related to this, the devicechange event is never sent when there are changes to the sources available for getDisplayMedia().

getDisplayMedia() is most commonly used to capture video of a user's screen (or parts thereof). However, user agents may allow the capture of audio along with the video content. The source of this audio might be the selected window, the entire computer's audio system, or the user's microphone (or a combination of all of the above).

Before starting a project that will require sharing of audio, be sure to check the browser compatibility for getDisplayMedia() to see whether the browsers you want to support can capture audio in screen streams.
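Because audio support varies with the browser and with what the user chooses to share, a page that asks for audio should check what it actually received. A minimal sketch, with a hypothetical function name, that requests both kinds of media and reports whether an audio track came back:

```javascript
// Requests video plus audio and reports whether an audio track was delivered.
// mediaDevices: navigator.mediaDevices in a browser.
// (Hypothetical helper name.)
async function captureWithOptionalAudio(mediaDevices) {
  const stream = await mediaDevices.getDisplayMedia({ video: true, audio: true });
  // Audio is not guaranteed, so inspect the stream instead of assuming.
  return { stream, hasAudio: stream.getAudioTracks().length > 0 };
}
```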

Requesting only { video: true, audio: true } allows the user total freedom to select whatever they want, within the limits of what the user agent supports. This could be refined further by specifying additional options and constraints inside the audio and video objects.

For instance, a request might indicate that the preferred display surface is a whole window, and ask for an audio track that ideally has noise suppression and echo cancellation enabled, an ideal sample rate of 44.1 kHz, and suppression of local audio playback.
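An options object matching that description might look like the following sketch (the name gdmOptions is arbitrary). Plain values in the audio constraints act as ideal values, so the browser may deliver audio that doesn't meet all of them.

```javascript
// Prefer offering whole windows in the picker, and ask for processed audio.
// Bare constraint values are treated as ideal, not required.
const gdmOptions = {
  video: {
    displaySurface: "window",
  },
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    sampleRate: 44100, // 44.1 kHz
    suppressLocalAudioPlayback: true,
  },
};
```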

The promise returned by getDisplayMedia() resolves to a MediaStream that contains at least one video track showing the screen or screen area, adjusted or filtered according to the constraints specified when getDisplayMedia() was called.

Privacy and security issues surrounding screen sharing are usually not overly serious, but they do exist. The largest potential issue is users inadvertently sharing content they did not wish to share.

For example, privacy and/or security violations can easily occur if the user is sharing their screen and a visible background window happens to contain personal information, or if their password manager is visible in the shared stream. This effect can be amplified when capturing logical display surfaces, which may contain content that the user doesn't know about at all, let alone see.

First, some constants are set up to reference the elements on the page to which we'll need access: the <video> element into which the captured screen contents will be streamed, a box into which logged output will be drawn, and the start and stop buttons that will turn capture of screen imagery on and off.
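A sketch of that setup. The element IDs video, log, start, and stop are assumptions about the page's markup, the function name getDemoElements is hypothetical, and the document object is passed in (document in a browser) so that a missing element produces a clear error.

```javascript
// Looks up the demo's page elements by ID (IDs assumed, not taken from the article).
// doc: the document object; pass `document` in a browser.
function getDemoElements(doc) {
  const elements = {
    videoElem: doc.getElementById("video"),
    logElem: doc.getElementById("log"),
    startElem: doc.getElementById("start"),
    stopElem: doc.getElementById("stop"),
  };
  // Fail early with a clear message if the markup doesn't match.
  for (const [name, el] of Object.entries(elements)) {
    if (!el) throw new Error(`Missing element for ${name}`);
  }
  return elements;
}
```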

The startCapture() method, below, starts the capture of a MediaStream whose contents are taken from a user-selected area of the screen. startCapture() is called when the "Start Capture" button is clicked.
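The button wiring could be sketched as follows. Both function names are hypothetical; stopCapture() shows one common way to end the capture, by stopping every track of the stream attached to the <video> element and then detaching it.

```javascript
// Connects the start and stop buttons to their click handlers.
function wireCaptureButtons(startElem, stopElem, onStart, onStop) {
  startElem.addEventListener("click", onStart, false);
  stopElem.addEventListener("click", onStop, false);
}

// Stops the capture shown in videoElem: stopping every track ends the
// screen share, then the stream is detached from the element.
function stopCapture(videoElem) {
  const stream = videoElem.srcObject;
  if (stream) {
    stream.getTracks().forEach((track) => track.stop());
  }
  videoElem.srcObject = null;
}
```

In the page, the start handler would call startCapture(), and the stop handler could be () => stopCapture(videoElem).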

After clearing the log to remove any leftover text from the previous attempt to connect, startCapture() calls getDisplayMedia(), passing in the constraints object defined by displayMediaOptions. Because of await, the following line of code does not run until the promise returned by getDisplayMedia() resolves. The promise resolves to a MediaStream, which will stream the contents of the screen, window, or other region selected by the user.
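The steps just described can be sketched as follows. This is not the article's exact code: mediaDevices, videoElem, logElem, and the options are passed in as parameters (navigator.mediaDevices, the page's <video> and log elements, and displayMediaOptions in the page), and the "Capture started." message is an invented placeholder. It assumes the <video> element has the autoplay attribute so the stream plays once attached.

```javascript
// Clears the log, waits for the user to pick a surface, then shows the stream.
async function startCapture(mediaDevices, videoElem, logElem, displayMediaOptions) {
  logElem.textContent = "";
  try {
    // Execution pauses here until the user picks a surface or declines.
    videoElem.srcObject = await mediaDevices.getDisplayMedia(displayMediaOptions);
    logElem.textContent = "Capture started."; // placeholder message
  } catch (err) {
    // Typically a NotAllowedError if the user dismisses the picker.
    logElem.textContent = `Error: ${err}`;
  }
}
```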