geryala otomars farina

Janeen Bahrke

Aug 2, 2024, 11:48:35 AM

to payrodreucar

This post describes how to formulate the optimal encoding ladder with VMAF. This analysis is excerpted from a lesson in the online course Streaming Media 101: Technical Onboarding for Streaming Media Professionals.

Netflix performs this analysis for each video they distribute, which makes sense when your files are viewed tens of millions of times (actually, Netflix has moved on to Dynamic Optimization, which uses a similar analysis for each shot). Most other producers should do this with 5-10 files per genre (sports, animation, talk shows, movies) to create an average ladder for that content.

Table 1 shows the VMAF scores at each resolution and data rate. The yellow rows are the rungs of the ladder, selected as described below. The Max column identifies the highest score for each data rate. The green box, applied via conditional formatting, marks the cell whose score equals the Max score, which is the resolution with the highest VMAF score at that data rate. More on this below.
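The selection that the Max column and green box perform can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout, and the scores are hypothetical and are not taken from Table 1.

```python
def best_resolution_per_bitrate(scores):
    """Pick the top-scoring resolution at each data rate.

    scores: {bitrate_kbps: {height: vmaf}} (hypothetical layout).
    Returns {bitrate_kbps: (height, vmaf)}.
    """
    ladder = {}
    for bitrate, by_height in scores.items():
        # Equivalent of the "Max" column: the height with the highest VMAF.
        height = max(by_height, key=by_height.get)
        ladder[bitrate] = (height, by_height[height])
    return ladder


# Made-up scores for illustration only.
example_scores = {
    4000: {1080: 95.0, 720: 92.5, 540: 88.0},
    1500: {1080: 84.0, 720: 86.5, 540: 83.0},
    500: {1080: 60.0, 720: 68.0, 540: 71.5},
}

for bitrate, (height, vmaf) in sorted(
    best_resolution_per_bitrate(example_scores).items(), reverse=True
):
    print(f"{bitrate} kbps -> {height}p (VMAF {vmaf})")
```

With these made-up numbers, the best resolution drops as the data rate drops, which is the pattern the convex hull analysis looks for.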

To achieve this, I multiply the data rate of each rung by 0.6 to get the next rung down, which gives a spacing of about 1.67x between rungs. A smaller spacing (say 1.5x) increases the number of rungs in your encoding ladder, along with your encoding and storage costs. A larger spacing (say 2.0x) does the reverse. For this analysis, I kept adding rungs until I had one rung at or under 300 kbps.
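As a rough sketch of that procedure, the Python below multiplies each data rate by 0.6 until a rung lands at or under 300 kbps. The 6000 kbps top rung and the rounding to whole kbps are assumptions made for the example, not values from the analysis.

```python
def build_rung_bitrates(top_kbps, factor=0.6, floor_kbps=300):
    """Return data rates from top_kbps downward, each `factor` times the last.

    Stops once a rung is at or under floor_kbps.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1")
    rungs = [top_kbps]
    while rungs[-1] > floor_kbps:
        # Round to whole kbps (an assumption; the source does not say).
        rungs.append(round(rungs[-1] * factor))
    return rungs


# Hypothetical 6000 kbps top rung.
print(build_rung_bitrates(6000))
# Wider spacing (factor 0.5, i.e. 2.0x between rungs) produces fewer rungs.
print(build_rung_bitrates(6000, factor=0.5))
```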

This analysis assumes full-screen playback for all clips, which is obviously a fair assumption for Netflix. If your website commonly uses player window sizes like 720p or 360p, I would make sure to have at least one rung at that resolution and might bunch several rungs there.

By way of background, I'm preparing a per-title encoding comparison. To derive the "ideal" ladder for x264 and x265, I'm using the Netflix Convex Hull analysis that encodes the same file at multiple resolutions and bitrates and chooses the highest-scoring resolution at each rung.

My interpretation of this is shown above. I've encoded at multiple resolutions and bitrates, and the 1080p file is the highest quality resolution all the way down to 200 kbps (this is a simple animated video). This means that the highest quality ladder would use 1080p all the way through. The question is, does a ladder like this exclude any relevant devices?

Bitmovin, who rated the highest in the analysis, produced a ladder for the same El Ultimo animation where the lowest resolution was 1024x576. Another largely screencam-based video had a low resolution of 1600x900.
One very large and very well-respected encoding shop produced a ladder for the same El Ultimo animation where the lowest quality rung was 640x360, with the screencam file bottoming out at 800x450. The lowest resolution file for most other files was 320x180.

So, the question is, does my ideal ladder need rungs at lower resolutions than 1080p if 1080p delivers higher quality? For this analysis, let's assume full-screen playback for all files. I'm assuming that if you have a 640x360 or 720p window for playback on your website, it makes sense to have rungs at those resolutions.

Jan Ozer develops training courses for streaming media professionals; provides encoding-related testing services to encoder developers; helps video producers perfect their encoding ladders and deploy new codecs. Jan blogs primarily at the Streaming Learning Center.

I modified this analysis in three ways. As discussed in the aforementioned article, I considered 99th percentile scores as a measure of quality variability. Specifically, I chose the lowest bitrate where the total VMAF score was 95 or higher, and 99% of all frames had a score of 89 or higher.
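The quality target described above can be expressed as a small filter over per-frame scores. The sketch below makes two assumptions: that the "total VMAF score" is the arithmetic mean of the per-frame scores, and that per-frame scores are available as plain lists. The function name and the test data are hypothetical.

```python
def lowest_bitrate_meeting_target(encodes, mean_target=95.0, frame_target=89.0):
    """Return the lowest bitrate whose encode meets both quality targets.

    encodes: list of (bitrate_kbps, per_frame_vmaf_scores).
    An encode passes when its mean score is at least mean_target and at
    least 99% of its frames score at least frame_target.
    Returns None if no encode passes.
    """
    for bitrate, frames in sorted(encodes, key=lambda e: e[0]):
        mean = sum(frames) / len(frames)
        passing = sum(1 for score in frames if score >= frame_target)
        # Integer comparison avoids floating-point trouble at exactly 99%.
        if mean >= mean_target and passing * 100 >= 99 * len(frames):
            return bitrate
    return None


# Made-up per-frame scores, 100 frames per encode.
example_encodes = [
    (2000, [90.0] * 100),                # mean too low
    (3000, [96.0] * 97 + [80.0] * 3),    # mean OK, too many weak frames
    (4000, [97.0] * 99 + [85.0]),        # passes both tests
]
print(lowest_bitrate_meeting_target(example_encodes))
```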

The final modification is related to the resolution of the lowest rung. Particularly with HEVC, the highest quality low rung was often 720p or higher. However, as you can read about here, digital rights management and other considerations dictated that the lowest rung be smaller than 720p, so I chose 640x360. Accordingly, even if the top quality resolution at 200 kbps was 720p, I used 360p.

Briefly, Figure 1 plots the vertical resolution and data rate of the average HEVC encoding ladders produced by the services that I analyzed. The purple ladder is the convex hull, or the theoretical perfect ladder. You see that it reaches close to 1080p by about 1200 kbps. One ladder, in green, is even more aggressive, reaching 1080p at around 600 kbps. The light blue ladder tracks the convex hull through about 800 kbps and then gets slightly more aggressive. The other three ladders are much more conservative, reaching 1080p at 2800 kbps to 3700 kbps.

All ladders use different codecs and/or different settings for the same codecs (like x265). But in general, the closer the ladder was to the convex hull, the better the service performed in overall quality and other measured criteria.

Animated content typically encodes more efficiently than entertainment content and benefits from higher resolutions lower in the encoding ladder. While the total bitrates are much lower than the entertainment clips, proving the first point, the results were mixed as to ladder steepness. The x264 convex hull hit 1080p much sooner than with the entertainment clips (2100 kbps as opposed to 4875), but both x265 clips deployed lower resolution rungs higher in the ladder.

Office content tends to be the easiest to encode, and includes a screencam, a PowerPoint-based video, and a simple talking head, plus other office-related content. The disruption in the ladders comes from the fact that several clips only had three rungs. Since the bottom rung was always 640x360, the resolution of rung 3 was actually lower than rung 4.

This article will introduce you to the concept of an encoding ladder, identifying what an encoding ladder is, what it does, and how to create one. It concludes with a look at the finer details of creating and deploying encoding ladders.

You create encoding ladders whenever you stream using HLS or DASH with Wowza Video or Wowza Streaming Engine. For VOD experiences, you create the ladder and upload that to your Wowza product. For live, you can either create the streams in your own encoder and deliver them to the server, or use Wowza Transcoder to create the encoding ladder from your source. You see this in Figure 2, where the Transcoder is ingesting a live stream and creating an encoding ladder with the source as the top rung.

The smaller the jump (say 1.5), the more rungs you produce, which increases encoding and storage costs but may increase the quality of experience slightly by delivering more higher quality rungs. The larger the jump (say 2.0), the lower the cost of encoding and storage, but the quality of experience may drop slightly.

You may want to consider the source when setting these resolutions. For example, sports content might look better at slightly smaller resolutions while animations will definitely look better at larger resolutions. This leads us to the next point about per-title encoding.

A bitrate ladder is a collection of video files encoded at varying bitrates, resolutions, and qualities. These files are carefully crafted to cater to different network conditions and device capabilities. The higher the bitrate, the more data is allocated to represent a video frame, resulting in higher quality but larger file sizes. By having multiple versions of a video at different bitrates, the streaming platform can dynamically adjust and deliver the most appropriate version based on the network conditions and device capabilities.
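To make the "most appropriate version" idea concrete, here is a minimal Python sketch of how a player might choose a rung. The 0.8 safety margin, the fallback to the lowest rung, and the ladder values are assumptions for illustration; real players use more elaborate heuristics.

```python
def pick_rung(ladder, bandwidth_kbps, max_height, safety=0.8):
    """Choose the highest-bitrate rung that fits the network and the screen.

    ladder: list of (bitrate_kbps, height) tuples.
    safety: fraction of measured bandwidth treated as usable (assumed value).
    Falls back to the lowest-bitrate rung if nothing fits.
    """
    usable = bandwidth_kbps * safety
    candidates = [
        rung for rung in ladder if rung[0] <= usable and rung[1] <= max_height
    ]
    if not candidates:
        return min(ladder, key=lambda rung: rung[0])
    return max(candidates, key=lambda rung: rung[0])


# Hypothetical ladder: (bitrate in kbps, vertical resolution).
example_ladder = [(4500, 1080), (2700, 720), (1600, 540), (1000, 360), (600, 270)]

print(pick_rung(example_ladder, bandwidth_kbps=3000, max_height=1080))
print(pick_rung(example_ladder, bandwidth_kbps=10000, max_height=720))
```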

Per-Title Encoding and Context-Aware Encoding are two different methodologies for constructing bitrate ladders for adaptive bitrate streaming (ABS). They take different approaches to optimizing video quality and viewer experience, and each has strengths suited to particular video content and distribution conditions.

On the other hand, Context-Aware Encoding factors in variables beyond the individual video, considering the distribution conditions and the specific devices on which the content will be viewed. As the name suggests, it maintains an awareness of the context, adjusting the encoding strategy based on network conditions and device capabilities. Context-aware encoding ensures a more holistic optimization by considering bandwidth, screen resolution, and battery life.

The JSON Web Encryption (JWE) key ladder uses the Web Crypto API unwrap function for key exchange. The keying material is specified within a JSON Web Key (JWK) that specifies algorithm, usage, and extractable attributes. The JWK is then wrapped into a JWE structure. This combination will be referred to as JWE+JWK. All keys are marked non-extractable.

When coupled with a pre-shared key Kpw or a model group key Kdw as the wrapping key, this scheme guarantees that key exchange is being performed for the requesting entity but does not provide perfect forward secrecy.

Unlike other key exchange schemes, the key ladder returns three keys: an AES-128-KeyWrap wrapping key Kwrap, an AES-128-CBC session encryption key Kenc, and an HMAC-SHA256 session HMAC key Khmac. Kwrap will be wrapped using AESWrap with Kpw, Kdw, or a previously issued Kwrap. The session keys Kenc and Khmac will be wrapped using AESWrap with Kwrap.

An intermediate wrapping key Kwrap is used to limit the use of a single wrapping key for all unwrap operations in much the same way as the session keys are time-limited to restrict their usage. Use of a previously issued Kwrap instead of Kpw or Kdw is likely to be more efficient as the Kpw and Kdw keys require a higher level of security due to their permanent nature.