How to glue the .mjr files received after recording from videocall participants into one video, taking into account the delay in connecting to the call?

John Doe

Jul 28, 2022, 8:54:45 AM

to meetecho-janus

I found this article, which describes how to assemble the recording of a single user only: https://ourcodeworld.com/articles/read/1198/how-to-join-the-audio-and-video-mjr-from-a-recorded-session-of-janus-gateway-in-ubuntu-18-04 I would appreciate it if someone could share how to produce one video that contains both participants of a video call.

Erwin Dee Junio Villejo

Jul 28, 2022, 10:23:23 AM

to meetecho-janus

Hi,

The `mjr` file contains a start timestamp which you can get through `janus-pp-rec --extended-json`. It is the `u` field in the returned JSON metadata. You can use this timestamp to offset the media files so that they align with each other.
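
To make the offset step concrete, here is a minimal sketch in Python. It assumes you have already read the `u` value of each recording and that it is expressed in microseconds; the file names and timestamp values below are hypothetical.

```python
def start_offsets_ms(first_frame_us):
    """Return each recording's delay in ms relative to the earliest recording.

    `first_frame_us` maps a recording name to its `u` value, assumed here
    to be a timestamp in microseconds.
    """
    earliest = min(first_frame_us.values())
    return {name: (u - earliest) // 1000 for name, u in first_frame_us.items()}


# Hypothetical `u` values for two participants of the same call.
offsets = start_offsets_ms({
    "alice-audio.mjr": 1659000000000000,
    "bob-audio.mjr": 1659000002500000,
})
print(offsets)
```

The earliest recording gets an offset of 0 and every other recording is delayed by the difference, which is the value to feed into the delay filters.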

Mixing audio is trivial. You use `adelay` filter to delay each audio recording according to its start offset. Then you pass them all to `amix` to mix them all into one audio file. There is some nuance regarding amix's loudness normalization (the output volume may be too low) so you might need to dynamically re-normalize the output, disable amix's input normalization, or both.
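
As a rough illustration of the audio step, the sketch below builds an FFmpeg `-filter_complex` string from a list of start offsets. It assumes one audio input per participant (input 0, 1, ...), stereo audio, and an FFmpeg build recent enough to support amix's `normalize` option; the helper name and the `[aout]` label are made up for this example.

```python
def audio_mix_filter(offsets_ms, channels=2):
    """Build a filter graph that delays each audio input and mixes them all.

    offsets_ms[i] is the start offset (in ms) of FFmpeg input number i.
    """
    chains, labels = [], []
    for i, delay in enumerate(offsets_ms):
        # adelay takes one delay per channel, separated by '|'.
        per_channel = "|".join([str(delay)] * channels)
        chains.append(f"[{i}:a]adelay={per_channel}[a{i}]")
        labels.append(f"[a{i}]")
    # Disable amix's input normalization, then re-normalize the mixed output.
    mix = (f"{''.join(labels)}amix=inputs={len(offsets_ms)}:normalize=0,"
           "dynaudnorm[aout]")
    return ";".join(chains + [mix])


print(audio_mix_filter([0, 2500]))
```

The resulting string would be passed to `ffmpeg` via `-filter_complex`, with `-map "[aout]"` selecting the mixed track. Whether to keep `normalize=0`, `dynaudnorm`, or both is a matter of listening to the result.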

As for compositing video, it is a bit trickier. The counterpart for `adelay` for video is the `tpad` filter. This will delay the video by adding black frames at the start.
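
For example, a small Python helper (the name is hypothetical) that turns a start offset in milliseconds into the matching `tpad` expression could look like this. It relies on `tpad`'s default behavior of adding black frames at the start.

```python
def video_delay_filter(offset_ms):
    """Return a tpad expression that prepends `offset_ms` of black frames."""
    # tpad expects the duration in seconds.
    return f"tpad=start_duration={offset_ms / 1000:.3f}"


print(video_delay_filter(2500))
```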

Now, the hard part. Each participant may have intervals where they have video and intervals where they do not (e.g., when they turn off their camera). For the intervals without video, you need at least a thumbnail with their name on it (ideally an avatar/badge, but that's another layer of complexity). The simple thumbnail can be built with the `color` filter to create a background color, then the `drawtext` filter to add text to that background. The participant's `tpad`-ded videos are then `overlay`-ed onto this thumbnail, using `enable` timeline editing so that the padded black frames are not overlaid on the thumbnail.
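
A minimal sketch of one participant tile, covering only the simplest case of a single video that starts late, might look like the following. The function name, labels, colors, and sizes are all assumptions for illustration. `drawtext` also needs a usable font on the system, and names containing quotes or colons would need escaping.

```python
def participant_tile(name, offset_s, duration_s, size="640x360", fps=25):
    """Build a filter graph: name thumbnail with the delayed camera on top.

    The camera is FFmpeg input 0 and becomes visible from `offset_s` onward.
    """
    width, height = size.split("x")
    # Background color with the participant's name centered on it.
    thumb = (
        f"color=c=0x202020:s={size}:r={fps}:d={duration_s},"
        f"drawtext=text='{name}':fontcolor=white:fontsize=36:"
        "x=(w-text_w)/2:y=(h-text_h)/2[thumb]"
    )
    # Delay the camera with black frames and scale it to the tile size.
    cam = f"[0:v]tpad=start_duration={offset_s},scale={width}:{height}[cam]"
    # Only draw the camera once the padded black frames are over.
    over = (f"[thumb][cam]overlay=enable='gte(t,{offset_s})':"
            "eof_action=pass[tile]")
    return ";".join([thumb, cam, over])


tile = participant_tile("Alice", 2.5, 60)
print(tile)
```

While the overlay is disabled the thumbnail passes through untouched, and `eof_action=pass` shows the thumbnail again once the camera recording ends. A participant who toggles the camera several times would need one such overlay per interval.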

After you have the participant's videos, you decide on the layout. Then use the `scale` filter on each video according to its position on the layout. For scaling cameras, I use the `force_original_aspect_ratio=increase` scaling option then pass it to the `crop` filter, so that 4:3 cameras are scaled to 16:9 (by cropping the top and bottom). But for scaling screen shares, it should be scaled with `force_original_aspect_ratio=decrease` and then passed to the `pad` filter, so that you don't crop any part of the screen share (which may contain important text!).
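
The two scaling strategies can be sketched as a small Python helper. The function name and the `kind` values are hypothetical; the filter options are the ones named above.

```python
def fit_filter(width, height, kind):
    """Return a scale chain that fits a video into a width x height tile."""
    if kind == "camera":
        # Fill the tile completely, then crop the overflow (centered).
        return (f"scale={width}:{height}:force_original_aspect_ratio=increase,"
                f"crop={width}:{height}")
    if kind == "screen":
        # Fit inside the tile, then pad the remainder so nothing is cut off.
        return (f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
                f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2")
    raise ValueError(f"unknown kind: {kind}")


print(fit_filter(640, 360, "camera"))
print(fit_filter(640, 360, "screen"))
```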

Finally, you can use the `xstack`, `vstack`, or `hstack` filters to stack the videos on a grid or the layout of your choice. If the number of videos is a perfect square (e.g., 2x2, 3x3, or 4x4 grids), `xstack` will do the trick. If not, you will have to use a combination of `vstack`, `hstack`, and `scale` filters.
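
For the perfect-square case, a sketch of generating the `xstack` layout string in Python is shown below. It assumes every tile has already been scaled to the same size and labeled `[v0]`, `[v1]`, and so on in row-major order; those labels and the `[vout]` output label are assumptions of this example.

```python
def xstack_grid(n):
    """Build an xstack expression for an n x n grid of equally sized tiles."""
    def pos(count, unit):
        # Position = `count` times the width/height of tile 0, e.g. "w0+w0".
        return "+".join([unit] * count) if count else "0"

    total = n * n
    cells = [f"{pos(i % n, 'w0')}_{pos(i // n, 'h0')}" for i in range(total)]
    inputs = "".join(f"[v{i}]" for i in range(total))
    return f"{inputs}xstack=inputs={total}:layout={'|'.join(cells)}[vout]"


print(xstack_grid(2))
```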

Of course, you combine the composite video and mixed audio in the end.
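
If the composite video and the mixed audio were rendered to separate files, one possible way to join them is a plain stream copy, sketched here as a Python helper that builds the `ffmpeg` argument list. The file names are hypothetical, and `-c copy` only works when both codecs are allowed in the output container.

```python
def mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that muxes one video and one audio stream."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0",   # video from the first input
        "-map", "1:a:0",   # audio from the second input
        "-c", "copy",      # no re-encoding, just remux
        output_path,
    ]


cmd = mux_command("composite.webm", "mixed.opus", "call.webm")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)`. Alternatively, the audio and video filter graphs can be rendered in a single `ffmpeg` invocation, which avoids the intermediate files.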

Br, Erwin

John Doe

Aug 3, 2022, 2:22:36 AM

to meetecho-janus

Thanks a lot.

Here's what I got:

Bash script for mixing audio: Script mixing Janus .mjr audio files (videocall recordings) (github.com)

Bash script for mixing video: Script mixing Janus .mjr video files (videocall recordings) (github.com)

Combining videos takes a very long time. Can someone tell me if it is possible to speed up the process somehow?

On Thursday, July 28, 2022 at 17:23:23 UTC+3, erwin....@truedigital.com wrote:

Lorenzo Miniero

Aug 3, 2022, 4:59:19 AM

to meetecho-janus

Video mixing and transcoding are CPU-intensive processes. Maybe you can speed things up with a hardware encoder, but I'm not sure.

L.

Erwin Dee Junio Villejo

Aug 3, 2022, 6:23:39 AM

to meetecho-janus

I wouldn't recommend the `h264_nvenc` hardware encoder. It is optimized for realtime encoding so while it is fast, it comes at the cost of quality (poorer) and file size (larger). I would only use it for live streaming (as an RTP forwarding target that transcodes for HLS/DASH). But for postprocessing recordings and producing VODs, you usually put more weight on quality and file size instead of just raw speed.

For encoding audio, `libopus` is unmatched. You also can't go wrong with `aac` as it is more widely supported.

As for video, it's a toss-up between `libvpx-vp9` and `libx264`. Do you care more about video quality and output file size at the cost of slow rendering time? Go VP9 (.webm container, together with Opus audio). Do you want faster renders (as much as 4x faster) while producing slightly less quality at 1.5-2x the output file size? Go H264 (.mp4 container with AAC audio).
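
The trade-off can be written down as a tiny Python helper that picks the container and encoder arguments. The CRF and preset values below are illustrative assumptions, not tuned recommendations, and this sketch is not taken from the tool linked below.

```python
def encoder_args(prefer_quality):
    """Return (file extension, ffmpeg encoder arguments) for the output."""
    if prefer_quality:
        # VP9 + Opus in WebM: smaller files, slower renders.
        return ".webm", ["-c:v", "libvpx-vp9", "-crf", "32", "-b:v", "0",
                         "-c:a", "libopus"]
    # H.264 + AAC in MP4: faster renders, larger files.
    return ".mp4", ["-c:v", "libx264", "-crf", "23", "-preset", "medium",
                    "-c:a", "aac"]


ext, args = encoder_args(prefer_quality=False)
print(ext, " ".join(args))
```

With `libx264`, a faster `-preset` trades file size for render speed, which may help if render time is the main concern.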

I created a Node.js FFmpeg tool which you can check for reference: https://github.com/erwinv/yaffu/blob/main/lib/codec.ts

Br, Erwin
