Video Transcoding Pipeline

Granville Turley

Aug 3, 2024, 4:00:36 PM

to sibornpupe

AWS Elemental MediaConvert is a new file-based video transcoding service that provides a comprehensive suite of advanced transcoding features, with on-demand rates starting at $0.0075/minute.

Adaptive streaming offers a better user experience by adjusting to network conditions and CPU utilization, automatically switching to higher or lower quality streams. Amazon Elastic Transcoder can create a set of segmented output renditions at different resolutions and bit rates, and a corresponding playlist or manifest file, all stored in Amazon S3. Amazon Elastic Transcoder supports several adaptive streaming implementations.
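To make the "renditions plus a playlist" idea concrete, here is a minimal Python sketch that writes an HLS-style master playlist for a handful of renditions. It only illustrates what such a manifest looks like; it is not how Elastic Transcoder generates its playlists. The `Rendition` class, the rendition names, and the `<name>/index.m3u8` layout are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str           # hypothetical label, also used as the sub-playlist directory
    width: int
    height: int
    bandwidth_bps: int  # peak bit rate advertised to the player


def master_playlist(renditions):
    """Return the text of an HLS master playlist listing every rendition."""
    lines = ["#EXTM3U"]
    # List the lowest bit rate first so the ordering is predictable.
    for r in sorted(renditions, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")  # assumed layout, one folder per rendition
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ladder = [
        Rendition("720p", 1280, 720, 3_000_000),
        Rendition("360p", 640, 360, 800_000),
    ]
    print(master_playlist(ladder), end="")
```

The player reads this file, picks one entry, and switches to another entry when its measured throughput changes.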

Amazon Elastic Transcoder has several default limits for the number of transcoding pipelines, custom transcoding presets and outputs per job. For details about these limits, please refer to the developer guide. If these limits are insufficient for your needs, please contact us. We will evaluate and respond to your request within two days.

Already using Amazon Elastic Transcoder? It's simple to migrate to MediaConvert. For more information, see this overview which includes valuable information about the migration process and links to additional resources.

Pipelines are queues that manage your transcoding jobs. When you create a job, you specify the pipeline to which you want to add the job. Elastic Transcoder starts processing the jobs in a pipeline in the order in which you added them.

If there are other jobs in a pipeline when you create a job, Elastic Transcoder starts processing the new job when resources are available. A pipeline can process more than one job simultaneously, and the time required to complete a job varies significantly based on the size of the file you're converting and the job specifications. As a result, jobs don't necessarily complete in the order in which you create them.
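The ordering behaviour described above (jobs start in submission order but may finish in a different order) can be shown with a small simulation. This is a toy model, not Elastic Transcoder's scheduler: the `simulate` function, the job durations, and the number of concurrent slots are all assumptions chosen for illustration.

```python
import heapq


def simulate(durations, slots):
    """Simulate a FIFO pipeline that runs up to `slots` jobs at once.

    Returns (start_order, finish_order) as lists of job indices.
    """
    running = []  # min-heap of (finish_time, job_index)
    now = 0.0
    start_order, finish_order = [], []
    for idx, duration in enumerate(durations):
        if len(running) == slots:
            # Every slot is busy: wait for the job that finishes first.
            now, done = heapq.heappop(running)
            finish_order.append(done)
        start_order.append(idx)
        heapq.heappush(running, (now + duration, idx))
    while running:
        _, done = heapq.heappop(running)
        finish_order.append(done)
    return start_order, finish_order


if __name__ == "__main__":
    # Job 0 is a long file; jobs 1 to 3 are short ones.
    started, finished = simulate([40, 5, 10, 3], slots=2)
    print("started: ", started)
    print("finished:", finished)
```

With two slots, the long first job is still running while the three short jobs behind it start and complete.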

This is a pretty simple and straightforward architecture that gives you the power to build and automate your very own "video encoding" workflow on Graviton 2 processors. It's a complete solution in the form of a CloudFormation template, and very little manual setup is required.

You can configure whatever you want according to your requirements. This setup is focused on Graviton 2 because, according to AWS, it offers up to 40% higher performance at 20% lower cost than its predecessors. Even Netflix uses these processors.

You can run this setup as is. Every time you want to trigger an encode, just upload a valid video file to the Input "folder" in the S3 bucket created by this template. S3 doesn't have a real concept of folders or directories; a "folder" is just a shared key prefix that helps you visualize the data.
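Because an S3 "folder" is only a key prefix, the decision "was this object uploaded to the Input folder?" comes down to a string check on the object key. The sketch below shows that check in Python. The `Input/` and `Output/` prefixes, the list of extensions, and both function names are assumptions for illustration; the actual template may name and filter things differently.

```python
from pathlib import PurePosixPath

# Assumed set of extensions that count as "a valid video file".
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".webm"}


def should_encode(key, input_prefix="Input/"):
    """Return True if an S3 object key looks like a video in the input "folder"."""
    if not key.startswith(input_prefix):
        return False
    if key.endswith("/"):
        # The console represents an empty "folder" as a zero-byte object
        # whose key ends with a slash; there is nothing to encode.
        return False
    return PurePosixPath(key).suffix.lower() in VIDEO_EXTENSIONS


def output_key(key, input_prefix="Input/", output_prefix="Output/"):
    """Map an input key to the key the encoded file would be written to."""
    return output_prefix + key[len(input_prefix):]


if __name__ == "__main__":
    for k in ["Input/holiday.MP4", "Input/", "Output/holiday.mp4"]:
        print(k, should_encode(k))
```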

A few years back, when I had very limited storage space, I had to run my precious videos through heavy encoding on my sluggish computer. It took at least 40 minutes to encode a 20-minute video, and that was at only 848x480 resolution. Running multiple videos through the same tool manually wasn't ideal either. I've always wanted to automate that process, as it can be handy not just for people like me but for businesses as well.

It's a completely automated system in the AWS cloud that takes videos and compresses them into smaller sizes with a bare minimum reduction in video quality. The user can control almost every aspect of the encoding settings without directly interacting with FFmpeg or any other command-line utility. The project is built entirely as a CloudFormation template, which makes it even easier for people and businesses to try out Graviton 2.
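One way to let users control encoding settings without touching FFmpeg directly is to translate a plain settings dictionary into an FFmpeg argument list. The sketch below shows that idea only; the `build_ffmpeg_args` function and the setting names (`video_codec`, `crf`, `preset`, `height`, `audio_codec`) are hypothetical and are not taken from the project's actual script.

```python
def build_ffmpeg_args(src, dst, settings):
    """Translate a settings dict into an ffmpeg argument list.

    Keys not present in `settings` are left out so ffmpeg uses its defaults.
    """
    args = ["ffmpeg", "-y", "-i", src]
    args += ["-c:v", settings.get("video_codec", "libx264")]
    if "crf" in settings:
        args += ["-crf", str(settings["crf"])]       # quality target
    if "preset" in settings:
        args += ["-preset", settings["preset"]]      # speed vs. compression
    if "height" in settings:
        # -2 keeps the aspect ratio and rounds the width to an even number.
        args += ["-vf", f"scale=-2:{settings['height']}"]
    args += ["-c:a", settings.get("audio_codec", "copy")]
    args.append(dst)
    return args


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_args(
        "in.mp4", "out.mp4", {"crf": 28, "preset": "slow", "height": 480}
    )))
```

The returned list can be passed to `subprocess.run(args, check=True)` on a machine where FFmpeg is installed.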

This project can serve as a way for businesses to test and port their existing workloads to Graviton 2 processors without the headache of setting up things manually. Just spin up a CloudFormation stack from this template, test your work and decide whether you're ready for the switch or not.

Initially I went through multiple iterations of how this whole thing should be built. I began with a very manual setup: creating resources like an EC2 instance and then setting up FFmpeg on it to encode videos. But it wasn't a handy solution and would have been very difficult for people to use. It took me a few weeks of trial and error to write a CloudFormation template that actually works in 2021.

First things first: this was my first time actually writing a CloudFormation template. Manually creating the services via the AWS Console is easy, but it isn't "transportable". I found some older CloudFormation templates for creating batch jobs; however, as expected, they were outdated, and now I have 5+ CloudFormation stacks that are stuck and won't delete.

I got stuck for a few days while building the "Compute Environment" in AWS Batch, and even Stack Overflow couldn't come to the rescue. So I tried joining the AWS Slack and Discord channels. As expected, no help there either. But a friend noticed the mistake and helped me out of the pickle. Here's the Stack Overflow question for interested folks: Can't perform Sts:AssumeRole

Last but not least, I worked on this whole thing alone (architecture design, implementation, documentation, 2 POC web apps, demo video, video editing, etc.). It's challenging to work on such a piece all alone, especially when you have a full-time job as an SWE. It's difficult to spend workdays and weekends doing the same thing.

It's a big one. Building everything into a single CloudFormation template was a pretty interesting idea, but midway through I had almost given up on it because of the issues I kept running into. Finally, though, I was able to pick everything I wanted and put it all into a working CloudFormation template.

Another thing I'm proud of is that, on top of building this AWS CloudFormation template, I was able to build 2 POCs to showcase how this pipeline could be used. Another feat: there's almost no up-to-date documentation on how to use aws-sdk with Angular, yet I was able to work it out, run those 2 web apps, and communicate with AWS services just fine.

I had some idea about Docker, but had never worked with it and didn't fully understand how it works. With this project, I can say that I have a much better understanding of Docker now. I was able to build multi-arch Docker images and was also able to use Amazon ECR and ECS.

When I started writing the video encoding Python script, I had many ideas in mind on how to make it much more flexible. I've implemented some of them, but I know a few areas that I can optimize further to make this whole architecture much more customizable and flexible.

Also, this project is much more of an architectural idea/implementation, and those 2 web apps are just POCs (proofs of concept) to show how this pipeline can be utilized. You can, however, go through the CloudFormation template and the code of both POCs on my GitHub.

I developed this idea of an automated video encoding pipeline. I was responsible for coming up with the architecture, implementing it, writing the CloudFormation template, and building 2 web-based POCs, along with all the documentation, the PPT, and the demo video.

x264enc by default has a latency higher than the default queue sizes causing your pipeline to stall. (It has to consume more data than it currently gets to actually create an output buffer. That way the pipeline will never finish pre-roll).
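Two common ways around this kind of stall are to lower the encoder's latency with `tune=zerolatency`, or to let the queue in the sibling branch grow until x264enc produces its first buffer. The Python sketch below only assembles the corresponding pipeline description strings; the tee layout, `videotestsrc`, `fakesink`, the output file name, and the function name are assumptions for illustration, not the original poster's pipeline. Note that an unbounded queue trades the stall for extra memory use.

```python
# Queue with all three limits disabled, so it never blocks the tee upstream.
UNBOUNDED_QUEUE = "queue max-size-buffers=0 max-size-time=0 max-size-bytes=0"


def tee_pipeline(zerolatency=True, location="out.mkv"):
    """Return a gst-launch-1.0 style description with one encode branch.

    zerolatency=True  -> keep default queues, make x264enc output immediately.
    zerolatency=False -> keep x264enc defaults, but unbound the sibling queue
                         so it can hold everything the encoder is still buffering.
    """
    if zerolatency:
        sibling_queue = "queue"
        encoder = "x264enc tune=zerolatency"
    else:
        sibling_queue = UNBOUNDED_QUEUE
        encoder = "x264enc"
    return (
        "videotestsrc num-buffers=300 ! tee name=t "
        f"t. ! {sibling_queue} ! fakesink "
        f"t. ! queue ! videoconvert ! {encoder} ! matroskamux "
        f"! filesink location={location}"
    )


if __name__ == "__main__":
    print(tee_pipeline(zerolatency=True))
    print(tee_pipeline(zerolatency=False))
```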

The first thing you have to handle is MKV container demuxing/muxing.
The second is a bug in nvv4l2decoder's H264 parsing in some L4T versions. The workaround is simply to remove h264parse before nvv4l2decoder.
So just try:

You would use multiple sub-pipelines, with a queue for each output of the demuxer (as you would do after a tee) and a queue for each input of the muxer. This avoids synchronization problems.
Assuming you have no transcoding for audio, you would use the following for transcoding first H264 stream into H265 while keeping the first audio track with same encoding:

Hi,
Once again, thanks for the help so far; it's been useful in my little home project. I do need some more help, as I am really struggling to understand GStreamer, its pipelines, pads, caps, etc. If you could provide an example, based on this one, that recodes the video from x264 to x265 and copies two audio tracks and two subtitle tracks, that would help me a bunch.

This will recode the video from H264 to H265 and then passthrough audio tracks using the track index. I do need to know how many tracks there are, and build the audio parts in a loop, but I can do this in a script quite happily.
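Building the audio parts in a loop, as described above, might look like the following Python sketch. It generates a pipeline description string with one passthrough branch per audio track. The element choices (`avdec_h264`, `x265enc`, software decoding instead of the Jetson elements discussed earlier) and the function name are assumptions for illustration; some audio codecs may also need a parser element between the demuxer and muxer, and file paths are assumed to contain no spaces.

```python
def build_pipeline(src, dst, audio_tracks):
    """Return a gst-launch-1.0 style description: H264 -> H265, audio copied.

    `audio_tracks` is the number of audio tracks in the source MKV, which
    has to be known in advance (for example from inspecting the file).
    """
    if audio_tracks < 0:
        raise ValueError("audio_tracks must not be negative")
    parts = [
        f"filesrc location={src} ! matroskademux name=demux",
        f"matroskamux name=mux ! filesink location={dst}",
        # Video branch: decode H264 and re-encode as H265 (software elements assumed).
        "demux.video_0 ! queue ! h264parse ! avdec_h264 ! videoconvert "
        "! x265enc ! h265parse ! queue ! mux.",
    ]
    for index in range(audio_tracks):
        # One passthrough branch per audio track, addressed by pad index.
        parts.append(f"demux.audio_{index} ! queue ! mux.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_pipeline("in.mkv", "out.mkv", audio_tracks=2))
```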

As it stands, with the two niggles above, I can work around it by dumping the TrueHD tracks to individual files using mkvextract along with timestamps, and then remuxing them back in using ffmpeg or mkvmerge; the same goes for subtitles. It's a shame, as it adds a fair bit of time to the process, so if you did have some insight into the subtitles, that would be fantastic. I tried doing subtitles like this:

If you want a GStreamer transcoding pipeline, it may be less generic depending on how many formats you have, but if you post the output of this command, it may show which items/formats are in your MKV container:

Special thanks go to: Christopher Kennedy, Staff Video Engineer at Crunchyroll/Ellation, and John Nichols, Principal Software Engineer at Xilinx, jni...@xilinx.com, for their information on FFmpeg and for reviewing this article.

How does FFmpeg programmatically deal with instances where a single input stream is required to generate multiple transcoded and/or transmuxed outputs? We went directly into the source code of the latest FFmpeg release, 3.3, in order to understand its threading model and transcoding pipeline.

Following the frame to the end of the pipeline, it enters process_input_packet() (line 2591), which decodes the frame and processes it through all the applicable filters. Timestamp correction and subtitle handling also occur in this function. Finally, prior to returning, the decoded frame is copied to each relevant output stream.

To determine if the TwitchTranscoder would perform better than FFmpeg on daily transcoding tasks, we performed a series of basic benchmark tests. For our tests, we fed both tools with a Twitch live stream as well as a 1080p60 video file using the same presets, profiles, bitrates, and other flags. Each source was transcoded to our typical stack of 720p60, 720p30, 480p30, 360p30, and 160p30.
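For reference, the rendition stack above can be expressed as a single FFmpeg invocation with one input and several outputs, which is the one-input, many-outputs case the source-code walkthrough describes. The Python sketch below only builds that argument list; the codec choices (`libx264`, `aac`), the output file names, and the function names are assumptions for illustration and are not the flags used in the benchmark.

```python
import re


def parse_rendition(label):
    """Split a label such as '720p60' into (height, frames_per_second)."""
    match = re.fullmatch(r"(\d+)p(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised rendition label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def ladder_args(src, labels):
    """Build one ffmpeg command that reads `src` once and writes every rendition."""
    args = ["ffmpeg", "-i", src]
    for label in labels:
        height, fps = parse_rendition(label)
        # Output options apply to the output file that follows them.
        args += [
            "-vf", f"scale=-2:{height}",
            "-r", str(fps),
            "-c:v", "libx264",
            "-c:a", "aac",
            f"out_{label}.mp4",  # hypothetical output name
        ]
    return args


if __name__ == "__main__":
    stack = ["720p60", "720p30", "480p30", "360p30", "160p30"]
    print(" ".join(ladder_args("source.mp4", stack)))
```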