This document covers the language-specific requirements for US English. Please also review the General Requirements section and related guidelines for comprehensive instructions on timed text deliveries to Netflix.
I. Subtitles for the Deaf and Hard of Hearing (SDH)
This section applies to subtitles for the deaf and hard of hearing created for English language content (i.e. intralingual subtitles). For English subtitles for non-English language content, please see Section II.
Text in each line in a dual speaker subtitle must be a contained sentence and should not carry into the preceding or subsequent subtitle. Creating shorter sentences and timing appropriately helps to accommodate this.
II. English Subtitles
This section applies to English subtitles created for non-English language content (i.e. interlingual subtitles). For subtitles for the deaf and hard of hearing for English language content, please see Section I.
This specification defines WebVTT, the Web Video Text Tracks format. Its main use is for marking up external text track resources in connection with the HTML <track> element. WebVTT files provide captions or subtitles for video content, and also text video descriptions [MAUR], chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index.
For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification need to be documented in the implementation report. The implementation report is based on implementer-provided test results for the test suite. The Working Group does not require that implementations are publicly available but encourages them to be so.
Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
WebVTT files provide captions or subtitles for video content, and also text video descriptions [MAUR], chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.
The majority of the current version of this specification is dedicated to describing how to use WebVTT files for captioning or subtitling. There is minimal information about chapters and time-aligned metadata and nothing about video descriptions at this stage.
You can see that a WebVTT file in general consists of a sequence of text segments associated with a time-interval, called a cue. Beyond captioning and subtitling, WebVTT can be used for time-aligned metadata, typically in use for delivering name-value pairs in cues. WebVTT can also be used for delivering chapters, which helps with contextual navigation around an audio/video file. Finally, WebVTT can be used for the delivery of text video descriptions, which is text that describes the visual content of time-intervals and can be synthesized to speech to help vision-impaired users understand context.
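As a rough illustration of this cue structure, the Python sketch below serializes a list of cues into the text of a WebVTT file. The helper names format_timestamp and build_webvtt are assumptions made for this sketch and do not come from the specification. Only the "WEBVTT" header line, the "-->" separator between the two timestamps, and the blank line between cues reflect the file format itself.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Serialize (start_seconds, end_seconds, text) tuples into WebVTT text.

    Hypothetical helper: it handles plain cues only (no identifiers,
    settings, regions, or style blocks).
    """
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        # Cue text must not contain the timing separator itself.
        if "-->" in text:
            raise ValueError("cue text must not contain '-->'")
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

Calling build_webvtt with a single cue from 0 to 2 seconds yields the header, a blank line, the timing line, and the cue text, in that order.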
The first cue is simple; it will probably just display on one line. The second will take two lines, one for each speaker. The third will wrap to fit the width of the video, possibly taking multiple lines. For example, the three cues could look like this:
In this example, an HTML page has a CSS style sheet in a style element that styles all cues in the video with a gradient background and a text color, as well as changing the text color for all WebVTT Bold Objects in cues in the video.
The string "-->" cannot be used in the style sheet. If the style sheet is wrapped in "<!--" and "-->", then those strings can just be removed. If "-->" appears inside a CSS string, then it can use CSS escaping, e.g. "--\>".
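The restriction on "-->" can be checked and worked around mechanically. The Python sketch below is a minimal illustration, assuming the caller already knows which text is the content of a CSS string; both function names are hypothetical and the sketch does not parse CSS.

```python
def has_forbidden_arrow(style_sheet):
    """Return True if the style sheet text contains the substring '-->'."""
    return "-->" in style_sheet


def escape_arrow_in_css_string(value):
    """Escape '-->' inside the content of a CSS string as '--\\>'.

    Assumption: `value` is the content of a CSS string literal only.
    Applying this to text outside a string is not covered here.
    """
    return value.replace("-->", "--\\>")
```

The escaped form keeps the same meaning inside a CSS string because a backslash followed by ">" is read as a literal ">".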
Due to the syntax rules of CSS, some characters need to be escaped with CSS character escape sequences. For example, an ID that starts with a number 0-9 needs to be escaped. The ID 123 can be represented as "\31 23" (31 is the hexadecimal Unicode code point for "1", and the space ends the escape sequence). See Using character escapes in markup and CSS for more information on CSS escapes.
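The leading-digit case can be sketched in Python as follows. The function name escape_css_id is an assumption for this illustration, and the sketch covers only an ID that begins with an ASCII digit; other characters that need escaping in CSS are deliberately left alone.

```python
def escape_css_id(identifier):
    """Escape a leading ASCII digit so the ID can be used in a CSS selector.

    Hypothetical helper: only the leading-digit case is handled.
    """
    if identifier and identifier[0] in "0123456789":
        # Backslash, the code point in hexadecimal, then a space that
        # terminates the escape sequence.
        code_point = format(ord(identifier[0]), "x")
        return "\\" + code_point + " " + identifier[1:]
    return identifier
```

With this sketch, the ID 123 becomes \31 23, matching the representation given above, while an ID such as intro is returned unchanged.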
In this example, each cue says who is talking using voice spans. In the first cue, the span specifying the speaker is also annotated with two classes, "first" and "loud". In the third cue, there is also some italic text (not associated with a specific speaker). The last cue is annotated with just the class "loud".
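To make the relationship between a voice span, its speaker, and its classes concrete, the Python sketch below pulls the speaker name and class list out of voice start tags such as "<v.first.loud Esme>". The regular expression, the function name parse_voice_tags, and the speaker names used in the checks are assumptions for this illustration. This is a simplified reading of the tag syntax, not the cue text parsing algorithm.

```python
import re

# "<v", zero or more ".class" parts, whitespace, then the speaker
# annotation up to the closing ">".
VOICE_TAG = re.compile(r"<v((?:\.[^\s.<>]+)*)[ \t]+([^>]*)>")


def parse_voice_tags(cue_text):
    """Return a list of (speaker, classes) pairs for each voice start tag."""
    results = []
    for match in VOICE_TAG.finditer(cue_text):
        classes = [name for name in match.group(1).split(".") if name]
        speaker = match.group(2).strip()
        results.append((speaker, classes))
    return results
```

A tag with no classes gives an empty class list, and end tags such as "</v>" are ignored because they do not match the pattern.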
Since the cues in these examples are horizontal, the "position" setting refers to a percentage of the width of the video viewport. If the text were vertical, the "position" setting would refer to the height of the video viewport.
The "line-left" or "line-right" only refers to the physical side of the box to which the "position" setting applies, in a way which is agnostic regarding the horizontal or vertical direction of the cue. It does not affect or relate to the direction or position of the text itself within the box.
The second cue has its cue box right aligned at the 90% mark of the video viewport width ("right" aligned text right aligns the box). The same effect can be achieved with "position:55%,line-left", which explicitly positions the cue box. The third cue has center aligned text within the same positioned cue box as the first cue.
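The equivalence between the two settings for the second cue is simple arithmetic on the cue box edges. The Python sketch below assumes a horizontal cue and a cue box 35% wide; that width is inferred here from the two figures above (90% minus 55%) rather than stated in the text. The function name cue_box_left_edge is hypothetical.

```python
def cue_box_left_edge(position, size, position_alignment):
    """Return the left edge of a horizontal cue box, in percent of the width.

    `position` and `size` are percentages; `position_alignment` names the
    side of the box that the position refers to.
    """
    if position_alignment == "line-left":
        return position
    if position_alignment == "center":
        return position - size / 2
    if position_alignment == "line-right":
        return position - size
    raise ValueError(f"unknown position alignment: {position_alignment!r}")
```

Under these assumptions, a 35% box whose right side sits at 90% and a 35% box whose left side sits at 55% both start at 55% and end at 90%, so they occupy the same area.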
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. The key word "OPTIONALLY" in the normative parts of this document is to be interpreted with the same normative meaning as "MAY" and "OPTIONAL". For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
All processing requirements in this specification apply. The user agent must also be a conforming implementation of the IDL fragments in this specification, as described in the Web IDL specification. [WEBIDL-1]
All processing requirements in this specification apply, except parts of 6 Parsing that relate to stylesheets and CSS, and all of 7 Rendering and 8 CSS extensions. The user agent must instead only render the text inside WebVTT caption or subtitle cue text in an appropriate manner and specifically support the color classes defined in 5 Default classes for WebVTT Caption or Subtitle Cue Components. Any other styling instructions are optional.
All processing requirements in this specification apply, including the color classes defined in 5 Default classes for WebVTT Caption or Subtitle Cue Components. However, the user agent will need to apply the CSS related features in 6 Parsing, 7 Rendering and 8 CSS extensions in such a way that the rendered results are equivalent to what a full CSS supporting renderer produces.
All processing requirements in this specification apply. However, only a limited set of CSS styles is allowed because user agents that do not support a full HTML CSS engine will need to implement CSS functionality equivalents. User agents that support a full CSS engine must therefore limit the CSS styles they apply for WebVTT so as to enable identical rendering without bleeding in extra CSS styles that are beyond the WebVTT specification.
Conformance checkers must verify that a WebVTT file conforms to the applicable conformance criteria described in this specification. The term "validator" is equivalent to conformance checker for the purpose of this specification.
When an authoring tool is used to edit a non-conforming WebVTT file, it may preserve the conformance errors in sections of the file that were not edited during the editing session (i.e. an editing tool is allowed to round-trip erroneous content). However, an authoring tool must not claim that the output is conformant if errors have been so preserved.
Different kinds of data can be carried in WebVTT files. The HTML specification identifies captions, subtitles, chapters, audio descriptions and metadata as data kinds and specifies which one is being used in the text track kind attribute of the text track element [HTML51].
A boolean indicating whether the line is an integer number of lines (using the line dimensions of the first line of the cue), or whether it is a percentage of the dimension of the video. The flag is set to true when lines are counted, and false otherwise.
If the line is numeric, return the value of the WebVTT cue line and abort these steps. (Either the WebVTT cue snap-to-lines flag is true, so any value, not just those in the range 0..100, is valid, or the value is in the range 0..100 and is thus valid regardless of the value of that flag.)
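The parenthetical condition above can be written as a small predicate. The Python sketch below is only a restatement of that condition; the function name numeric_line_is_valid is an assumption, and non-numeric line values are not covered.

```python
def numeric_line_is_valid(line, snap_to_lines):
    """Check a numeric cue line against the snap-to-lines flag.

    When lines are counted (flag true), any number is acceptable.
    When the line is a percentage (flag false), it must lie in 0..100.
    """
    if snap_to_lines:
        return True
    return 0 <= line <= 100
```

For example, a line of -1 is acceptable when lines are counted, while 150 is not acceptable as a percentage.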
In this example, the second cue will have a right-to-left base direction, rendering as ".I think ,يلاع". (Note that the text below shows all characters left-to-right; a text editor would not necessarily have the same rendering.)