Hi,
The `mjr` file contains a start timestamp, which you can read with `janus-pp-rec --extended-json`: it is the `u` field in the returned JSON metadata. You can use these timestamps to offset each recording relative to the earliest one, so that all the media files align with each other.
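To make the offset step concrete, here is a minimal sketch that turns the start timestamps into per-recording delays. It assumes you have already collected the `u` value of each recording into a dict, and that those values are in microseconds (an assumption; please verify against your Janus version). The function name `start_offsets_ms` and the recording names are hypothetical.

```python
def start_offsets_ms(start_us: dict[str, int]) -> dict[str, int]:
    """Turn absolute start timestamps into delays relative to the earliest recording.

    Assumes each value is the `u` field reported by janus-pp-rec for that
    recording, in microseconds (verify this against your Janus version).
    """
    earliest = min(start_us.values())
    # Microseconds -> whole milliseconds, which is adelay's default unit.
    return {name: (u - earliest) // 1000 for name, u in start_us.items()}


if __name__ == "__main__":
    # Hypothetical timestamps for two recordings.
    starts = {"alice-audio": 1700000000000000, "bob-audio": 1700000002500000}
    print(start_offsets_ms(starts))  # {'alice-audio': 0, 'bob-audio': 2500}
```

The earliest recording gets a delay of 0 and every other recording is delayed by how much later it started.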
Mixing audio is trivial. Use the `adelay` filter to delay each audio recording by its start offset, then pass all of them to `amix` to mix them into one audio file. There is some nuance regarding amix's loudness normalization (the output volume may be too low), so you might need to dynamically re-normalize the output (e.g., with `dynaudnorm`), disable amix's input normalization (`normalize=0`), or both.
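A rough sketch of the audio graph, built as an FFmpeg `-filter_complex` string. It assumes audio recording `i` is FFmpeg input `i`, and it uses the "both" option (`normalize=0` plus `dynaudnorm`); the function name and the `[a0]`/`[aout]` labels are hypothetical. The `all` option of `adelay` and the `normalize` option of `amix` need a reasonably recent FFmpeg.

```python
def audio_mix_filter(offsets_ms: list[int]) -> str:
    """Build a filter_complex string that delays and mixes N audio inputs.

    Assumes audio recording i is FFmpeg input i (hypothetical ordering).
    """
    chains = []
    for i, delay in enumerate(offsets_ms):
        # adelay's delay is in milliseconds; all=1 applies it to every channel.
        chains.append(f"[{i}:a]adelay={delay}:all=1[a{i}]")
    labels = "".join(f"[a{i}]" for i in range(len(offsets_ms)))
    # normalize=0 stops amix from scaling down its inputs;
    # dynaudnorm then re-normalizes the mixed output dynamically.
    chains.append(f"{labels}amix=inputs={len(offsets_ms)}:normalize=0,dynaudnorm[aout]")
    return ";".join(chains)


if __name__ == "__main__":
    print(audio_mix_filter([0, 2500]))
```

With `normalize=0` alone, loud overlapping speakers can clip, which is one reason to keep a re-normalization step after the mix.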
Compositing video is a bit trickier. The video counterpart of `adelay` is the `tpad` filter, which delays the video by adding black frames at the start.
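A minimal sketch of the `tpad` step for one video recording, reusing a millisecond offset like the one used for `adelay`. It assumes the video recording is FFmpeg input `index`; the function name and the `[v<index>]` output label are hypothetical.

```python
def video_delay_filter(index: int, offset_ms: int) -> str:
    """Delay video input `index` by prepending black frames with tpad.

    Assumes the video recording is FFmpeg input `index` (hypothetical ordering);
    the output label [v<index>] is just an illustrative name.
    """
    seconds = offset_ms / 1000  # tpad's start_duration is given in seconds
    return (
        f"[{index}:v]tpad=start_duration={seconds}"
        f":start_mode=add:color=black[v{index}]"
    )


if __name__ == "__main__":
    print(video_delay_filter(1, 2500))
```

`start_mode=add` inserts new frames of the given color; the alternative `clone` would repeat the first real frame instead.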
Now, the hard part. Each participant may have intervals with video and intervals without it (e.g., when they turn off their camera). For the intervals without video, you need at least a thumbnail with their name on it (ideally an avatar/badge, but that's another layer of complexity). A simple thumbnail can be made with the `color` filter, which generates a solid background, followed by the `drawtext` filter to write the name on it. Each participant's `tpad`-ded video is then `overlay`-ed onto this thumbnail, using `enable` timeline editing so that only the intervals with actual video are shown and the padded black frames are not overlaid onto the thumbnail.
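A sketch of one participant's tile: a `color` background with `drawtext`, and the `tpad`-ded video overlaid only during the intervals (in seconds) when the participant actually had video. How you obtain those intervals is up to you; here they are simply passed in. The tile size, duration, colors, font size, and the `[v<index>]`/`[tile<index>]` labels are hypothetical, and the video is assumed to already match the tile size (e.g., via the scaling described below). Depending on your FFmpeg build, `drawtext` may also need a `fontfile=` option.

```python
def participant_tile(index: int, name: str, intervals: list[tuple[float, float]],
                     size: str = "640x360", duration: float = 60.0) -> str:
    """Build a name thumbnail with the participant's video overlaid when present."""
    # Solid background for the whole call, with the name centred on it.
    # NOTE: `name` is not escaped; names containing ':' or quotes need drawtext escaping.
    bg = (
        f"color=c=0x333333:s={size}:d={duration}:r=30,"
        f"drawtext=text='{name}':fontcolor=white:fontsize=32:"
        f"x=(w-text_w)/2:y=(h-text_h)/2[bg{index}]"
    )
    # Enable the overlay only while the camera was on; '+' ORs the intervals.
    enable = "+".join(f"between(t,{start},{end})" for start, end in intervals)
    ov = f"[bg{index}][v{index}]overlay=enable='{enable}'[tile{index}]"
    return ";".join([bg, ov])


if __name__ == "__main__":
    print(participant_tile(1, "Alice", [(2.5, 30.0), (45.0, 60.0)]))
```

Outside the enabled intervals the overlay is skipped, so the name thumbnail shows through instead of the black padding.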
After you have each participant's video, decide on the layout. Then use the `scale` filter on each video according to its tile size in the layout. For cameras, I use the `force_original_aspect_ratio=increase` scaling option followed by the `crop` filter, so that 4:3 cameras fill a 16:9 tile (by cropping the top and bottom). Screen shares, however, should be scaled with `force_original_aspect_ratio=decrease` followed by the `pad` filter, so that no part of the screen share is cropped (it may contain important text!).
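The two scaling strategies can be sketched as filter-chain builders. The input/output labels and the 640x360 tile size used in the usage lines are hypothetical; the filter options themselves are the ones described above.

```python
def fit_camera(label_in: str, label_out: str, w: int, h: int) -> str:
    """Fill the tile, cropping the overflow (centred by default).

    Example: a 640x480 camera into a 640x360 tile is scaled to 640x480,
    then 60 px are cropped from the top and from the bottom.
    """
    return (
        f"[{label_in}]scale={w}:{h}:force_original_aspect_ratio=increase,"
        f"crop={w}:{h}[{label_out}]"
    )


def fit_screen_share(label_in: str, label_out: str, w: int, h: int) -> str:
    """Fit inside the tile and pad (letterbox) so nothing is cut off."""
    return (
        f"[{label_in}]scale={w}:{h}:force_original_aspect_ratio=decrease,"
        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2[{label_out}]"
    )


if __name__ == "__main__":
    print(fit_camera("tile0", "cell0", 640, 360))
    print(fit_screen_share("share", "cell1", 640, 360))
```

The `pad` expressions `(ow-iw)/2` and `(oh-ih)/2` centre the scaled screen share inside the tile.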
Finally, use the `xstack`, `vstack`, or `hstack` filters to arrange the videos in a grid or whatever layout you chose. If the number of videos is a perfect square (e.g., 2x2, 3x3, or 4x4 grids), `xstack` will do the trick. If not, you will need either an explicit `xstack` layout or a combination of the `vstack`, `hstack`, and `scale` filters.
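A sketch of an `xstack` grid for equally sized tiles, using pixel positions in the `layout` option. It picks the smallest square-ish column count and leaves unused cells black via `fill` (this option needs a reasonably recent FFmpeg). The `[cell<i>]`/`[vout]` labels and the function name are hypothetical.

```python
import math


def grid_layout(n: int, tile_w: int, tile_h: int) -> str:
    """Place n equally sized tiles on a grid with xstack."""
    if n == 1:
        # xstack needs at least two inputs; pass a single tile through unchanged.
        return "[cell0]null[vout]"
    cols = math.ceil(math.sqrt(n))
    positions = []
    for i in range(n):
        row, col = divmod(i, cols)
        # Each position is "x_y" in pixels, fine here because all tiles share one size.
        positions.append(f"{col * tile_w}_{row * tile_h}")
    inputs = "".join(f"[cell{i}]" for i in range(n))
    return f"{inputs}xstack=inputs={n}:layout={'|'.join(positions)}:fill=black[vout]"


if __name__ == "__main__":
    print(grid_layout(3, 640, 360))
```

For three tiles this produces a 2x2 grid with the bottom-right cell left black; for tiles of different sizes you would compute each position from the layout instead.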
Of course, in the end you mux the composite video and the mixed audio together into one output file.
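A sketch of the final command line, assuming the whole filter graph ends in the labels `[vout]` and `[aout]` and that the input order (audio first, then video) matches the input indices used in the graph. The file names, the output name, and the `libx264`/`aac` codec choices are only examples.

```python
import shlex


def ffmpeg_command(audio_files: list[str], video_files: list[str],
                   filter_complex: str, output: str = "composite.mp4") -> list[str]:
    """Assemble an ffmpeg argv that maps the composite video and mixed audio."""
    cmd = ["ffmpeg"]
    # Input order defines the indices ([0:a], [1:v], ...) used inside the graph.
    for path in audio_files + video_files:
        cmd += ["-i", path]
    cmd += [
        "-filter_complex", filter_complex,
        "-map", "[vout]", "-map", "[aout]",
        "-c:v", "libx264", "-c:a", "aac",
        output,
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and a placeholder graph.
    print(shlex.join(ffmpeg_command(["alice.opus"], ["alice.webm"], "...")))
```

Building the arguments as a list (rather than one shell string) avoids quoting problems with the `'`-quoted expressions inside the filter graph when you run it via `subprocess.run`.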
Br, Erwin