This is often done to maximize the number of compatible playback devices, making the encoded data more accessible to a wider audience. In other words, this is the technology that makes it possible for you to watch your favorite Netflix show both on your television and on your smartphone.
When you encode raw media files, you compress and format those files to make them small enough to transfer over a network. Transcoding happens after data is already encoded: it is the process of decoding (decompressing) that data, changing it, and re-encoding it.
Transmuxing goes by many names, including trans-multiplexing, repackaging, and packetizing. Repackaging is possibly the most apt name, as this process involves repackaging data into a different file container or delivery format. For example, you could take the MPEG-TS containers used for an HTTP Live Streaming (HLS) stream and repackage them as fMP4 containers for Dynamic Adaptive Streaming over HTTP (MPEG-DASH). Because different playback devices work with different streaming protocols and containers, this is an effective method for making your media more widely accessible.
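As a minimal sketch of container repackaging, the following FFmpeg invocation rewraps an MPEG-TS file as fragmented MP4 without touching the encoded audio or video. The filenames are placeholders, and a local FFmpeg build is assumed:

```shell
# Transmux: change the container only. "-c copy" stream-copies the
# existing H.264/AAC bitstreams, so no decoding or re-encoding occurs.
ffmpeg -i input.ts -c copy -movflags +faststart output.mp4
```

Because no re-encoding happens, this runs at far greater than real time and preserves the source quality exactly.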
Using a transcoding software or service, you can simultaneously create a set of time-aligned video streams, each with a different bitrate and frame size, while converting the codecs and protocols to reach additional viewers. This set of internet-friendly streams can then be packaged into several adaptive streaming formats (e.g. HLS), allowing playback on almost any screen on the planet.
The biggest benefits of transcoding are best understood through the lens of ABR. This strategy builds out a variety of media file versions for streaming to a wide range of devices across different end user bandwidths. ABR is an excellent example of how various forms of transcoding, including transizing and transrating, work together to improve a stream.
Diversifying bitrates and resolutions makes it possible to not only reach more devices, but also stream more effectively (and at the highest possible quality) across those devices. Your stream (live or otherwise) will no longer be at the mercy of heavy buffering caused by unexpected dips in bandwidth. This is especially effective for live streaming, as an uninterrupted viewer experience is essential to its promise.
Ultimately how beneficial transcoding is for your stream will depend on your use case and target audience. However, the adaptive and scalable benefits of transcoding typically outweigh the added steps in your workflow.
Wowza provides robust live transcoding software capabilities to power any workflow. With Wowza Video, you get a one-stop integrated solution for transcoding, content management, delivery, playback, and end-to-end analytics. We also offer Wowza Streaming Engine media server software for organizations that need to deploy a software solution on premises or offline. Either solution could be right for you, depending on your unique needs.
This depends on the software you use, whether you are transcoding with lossy or lossless compression, and the size of the video files going in. In any case, transcoding does take time, which is part of what makes live transcoding so tricky.
Sydney works for Wowza as a content writer and Marketing Communications Specialist, leveraging roughly a decade of experience in copywriting, technical writing, and content development. When observed in the wild, she can be found gaming, reading, hiking, parenting, overspending at the Renaissance Festival, and leaving coffee cups around the house.
Special thanks go to Christopher Kennedy, Staff Video Engineer at Crunchyroll/Ellation, and John Nichols, Principal Software Engineer at Xilinx, for their information on FFmpeg and for reviewing this article.
Twitch, like many other live streaming services, receives live stream uploads in Real-Time Messaging Protocol (RTMP) from its broadcasters. RTMP is a protocol designed to stream video and audio on the Internet, and is mainly used for point-to-point communication. To then scale our live stream content to countless viewers, Twitch uses HTTP Live Streaming (HLS), an HTTP-based media streaming communications protocol that most video websites also use.
Within the live stream processing pipeline, the transcoder module is in charge of converting an incoming RTMP stream into the HLS format with multiple variants (e.g., 1080p, 720p, etc.). These variants have different bitrates so that viewers with different levels of download bandwidth are able to consume live video streams at the best possible quality for their connection.
FFmpeg is a popular open-source software project designed to record, process, and stream video and audio. It is widely deployed by cloud encoding services for file transcoding and can also be used for live stream transmuxing and transcoding.
Suppose we are receiving an RTMP stream using the most widely used video compression standard, H.264, at 6 Mbps and 1080p60 (a resolution of 1920×1080 with a frame rate of 60 frames per second). We want to generate 4 HLS variants of:
One solution is to run 4 independent instances of FFmpeg, each processing one variant. Here we set all of their Instantaneous Decoding Refresh (IDR) intervals to 2 seconds and turn off scene-change detection, so that the output HLS segments of all variants are perfectly time-aligned, as the HLS standard requires.
hls_list_size determines the maximum number of segments in the playlist (e.g., we can use 6 for live streaming, or set it to 0 to keep all segments in the playlist). The segment duration (the optional hls_time flag) will be the same as the IDR interval, which in our case is 2 seconds.
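One of the four independent 1-in-1-out instances might look like the sketch below. The RTMP URL, bitrate, and output names are placeholder assumptions; the key flags follow the text: at 60 fps, a GOP of 120 frames gives a 2-second IDR interval, and -sc_threshold 0 disables scene-change keyframes so segments stay aligned across variants:

```shell
# 1-in-1-out sketch for a hypothetical 720p60 variant.
ffmpeg -i rtmp://localhost/live/stream \
  -c:v libx264 -b:v 3000k -s 1280x720 -r 60 \
  -g 120 -keyint_min 120 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -f hls -hls_time 2 -hls_list_size 6 720p60.m3u8
```

The other three variants would be produced by near-identical commands, differing only in -b:v, -s, and -r.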
Since H.264 is a lossy compression standard, transcoding will inevitably cause video quality degradation. Moreover, encoding is a very computationally expensive process, particularly for high-resolution and high-frame-rate video. Due to these two constraints, we would ideally like to transmux rather than transcode the highest variant from the source RTMP, to save computational power and preserve video quality.
In the above example, if we want to transmux an input 1080p60 RTMP source to HLS, we can actually use the above commands without specifying a size or target FPS and specifying copy for the codecs (to avoid decoding and re-encoding the source):
Transmuxing the source bitstream is an effective technique, but it can cause the output HLS to lose spec compliance, making it unplayable on certain devices. We will explain the nature of the problem and its ramifications in the next section.
If we want to transmux rather than transcode the highest variant while transcoding the rest of the variants, we can replace the first output configuration with the previously specified copy codec:
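A single-instance sketch of this mixed setup (placeholder URL, bitrates, and names; only two outputs shown for brevity) stream-copies the top variant and transcodes the lower one:

```shell
# 1-in-N-out sketch: first output transmuxes, second transcodes.
ffmpeg -i rtmp://localhost/live/stream \
  -map 0:v -map 0:a -c:v copy -c:a copy \
    -f hls -hls_time 2 -hls_list_size 6 1080p60.m3u8 \
  -map 0:v -map 0:a \
    -c:v libx264 -b:v 3000k -s 1280x720 -r 60 -g 120 -sc_threshold 0 \
    -c:a aac -b:a 128k \
    -f hls -hls_time 2 -hls_list_size 6 720p60.m3u8
```

Each output gets its own -map and codec options, so copy and encode outputs can coexist in one FFmpeg process.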
An alternative to running these transcodes in a single FFmpeg instance is to run multiple instances in parallel, one for each desired output. However, the 1-in-N-out FFmpeg instance is the computationally cheaper approach, for reasons we will explain next.
However, in both the 1-in-1-out and the 1-in-N-out FFmpeg instances, the N encoders associated with the N output variants are independent. As explained above, the resulting IDRs will not be aligned (see Figure 4) unless all variants are transcoded (i.e., the highest variant is also transcoded, not transmuxed, from the source).
As discussed in Figure 2, our RTMP-to-HLS transcoder takes in 1 input stream and produces N output streams, where N is the number of HLS variants (e.g., N = 4 in Figure 5). The simplest way to achieve this is to create N independent 1-in-1-out transcoders, each generating 1 output stream. The FFmpeg solution described above uses this model and runs N FFmpeg instances.
Since the N decoders are identical, the transcoder should ideally eliminate the redundant N-1 decoders and feed decoded images from the only decoder to the N downstream scalers and encoders (see Figure 7).
The two transcoded variants 720p60 and 720p30 in this example can share one scaler. As our experiments show, scaling is a very computationally expensive step in the transcoding pipeline. Avoiding unnecessary repeated scaling can therefore significantly improve the performance of our transcoder. Figure 8 depicts a threading model that combines the scalers of the 720p60 and 720p30 variants.
Besides decoder and scaler sharing, a more important feature is multithreading. Since both the encoder and the scaler are very computationally expensive, it is critical to take advantage of modern multi-core CPU architectures and process the multiple variants simultaneously. From our experiments, we find multithreading very useful for achieving higher density, and it is critical for certain applications like 4K.
Our typical incoming bitstreams have 60fps (frames per second), and we transcode them to 30fps for lower bit-rate variants (e.g., 720p30, 480p30, etc.). On the other hand, since Twitch is a global platform, we often receive incoming streams of 50fps, most of which are from PAL countries. In this case, the lower bit-rate variant should be downsampled to 25fps, instead of 30fps.
Simply deleting every second frame is not a good solution here. Our downsampler needs to behave differently for the two different kinds of incoming bitstreams. One kind has constant frame rates less than 60fps, and the other has irregular frame dropping, which makes its average frame rates less than 60fps.
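One way to handle both cases in FFmpeg is its fps filter, which works on timestamps rather than frame counts, duplicating or dropping frames as needed to hit the target rate. The command below is a sketch with placeholder names; for a 50 fps PAL source the target would be fps=25 instead:

```shell
# Timestamp-based downsampling to 30 fps for a lower-bitrate variant.
# The fps filter copes with irregular input cadence better than
# naively deleting every second frame.
ffmpeg -i rtmp://localhost/live/stream \
  -vf "fps=30" -c:v libx264 -b:v 1500k -s 1280x720 \
  -g 60 -sc_threshold 0 \
  -c:a copy \
  -f hls -hls_time 2 -hls_list_size 6 720p30.m3u8
```

At 30 fps output, a GOP of 60 frames keeps the 2-second IDR interval, preserving segment alignment with the 60 fps variants.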
Certain information needs to be inserted into the HLS bitstreams to enhance the user experience on the player side. By building our own transcoder and player, Twitch can control the full end-to-end ingest-transcode-CDN-playback pipeline. This allows us to insert proprietary metadata structures into the transcoder output, which are eventually parsed by our player and utilized to produce effects unique to Twitch.