Hi everyone,
sorry for the late reply! Let me try to answer your questions and get
rid of some misconceptions about EBU R128 as well as FFmpeg's loudnorm
filter.
First of all, the `loudnorm` filter chain I provided will do a dynamic
loudness normalization, possibly including audio compression and true
peak limitation if required. It should hence achieve the desired effect
that if a second speaker is louder than the first, the loudness of the
second one will be adjusted to appear equally loud. Of course, this
effect is configurable.
This may become a bit more clear when you have a look at my suggested
configuration:
I=-23
Set the overall target loudness to -23 LUFS
LRA=1
Set the loudness range to 1 LU
tp=-1
Set the true peak limit to -1 dBTP to avoid clipping.
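Put together, the three settings above give the filter chain I
suggested earlier (file names here are just placeholders):

```shell
# Single-pass ("live") EBU R128 loudness normalization to -23 LUFS
# with a loudness range of 1 LU and a true peak limit of -1 dBTP.
# input.mp4/output.mp4 are placeholder file names.
ffmpeg -i input.mp4 \
    -filter:a loudnorm=I=-23:LRA=1:dual_mono=true:tp=-1 \
    -c:v copy \
    output.mp4
```

The `-c:v copy` simply avoids re-encoding the video stream when only
the audio is being normalized.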
This should produce a similar effect to a dedicated audio compression
filter, with the benefit of a similar output loudness regardless of the
input. A dedicated compressor certainly gives you more control, but
that also means you probably need to know more about your content.
Note that the algorithm also includes ways of dealing with silence: not
only a fixed audio gate, but also a dynamic gate measuring loudness
relative to the previous content. That is also why short silent parts
in an audio stream should not heavily affect the normalization.
For details of how the algorithm works, take a look at this post by the
filter's author:
http://k.ylo.ph/2016/04/04/loudnorm.html
…as well as EBU TECH 3341, which describes in more detail how this
recommended audio normalization should work:
https://tech.ebu.ch/docs/tech/tech3341.pdf
What is important to know is that the filter configuration I provided
is the single-pass or `live` version. A two-pass normalization is
possible and should yield more accurate results.
An obvious scenario where the single-pass version may yield unwanted
results is a recording starting with a very quiet signal (e.g. a
recording with an overhead microphone and some students mumbling in the
room). Since the normalization filter only knows about the signal at
the beginning, it would already normalize the mumbling to -23 LUFS
instead of identifying it as silence, as the two-pass filter would due
to the huge relative loudness difference compared to the following
signal.
Still, if we cut material or apply the normalization to the material
after the editor, this should be no big problem for resulting Opencast
recordings. And having a single pass does make things a lot easier :)
That said, implementing a two-pass version should also be relatively
easy if we store the measured loudness metadata in the media package.
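For illustration, a two-pass run could look roughly like the sketch
below: the first pass only measures and prints the loudness statistics,
the second pass feeds those measured values back into the filter. The
`print_format` and `measured_*` options are part of FFmpeg's `loudnorm`
filter; the file names and the concrete numbers in pass two are made-up
examples to be replaced by the values from the pass-one output.

```shell
# Pass 1: measure only; loudnorm prints its statistics (as JSON here)
# to stderr, and the null muxer discards the media output.
ffmpeg -i input.mp4 \
    -filter:a loudnorm=I=-23:LRA=1:tp=-1:print_format=json \
    -f null /dev/null

# Pass 2: normalize using the values measured in pass 1.
# The measured_* numbers below are placeholders; substitute the
# input_i, input_lra, input_tp and input_thresh values from the JSON.
ffmpeg -i input.mp4 \
    -filter:a "loudnorm=I=-23:LRA=1:tp=-1:measured_I=-30.2:\
measured_LRA=4.1:measured_tp=-8.3:measured_thresh=-40.5" \
    output.mp4
```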
Maybe I can generate a few audio samples to demonstrate these effects.
That would probably be helpful anyway if we create proper documentation
from this thread.
Regarding the concern about "crappy devices" and insufficient loudness
when normalizing content to -23 LUFS as recommended by EBU R128, I
found an AES recommendation for streaming content which addresses
exactly this concern (and which references EBU R128).
The AES recommendation still strongly discourages being as loud as
possible before clipping (applying strong compression and peak
limiters). I think that is a very good idea: even if you can usually
get away with a very low dynamic range on Opencast recordings, students
with crappy devices will probably still play other content with a
higher dynamic range, which means they will need to make loudness
adjustments whenever they jump between e.g. Opencast and YouTube. That
is something I always find very annoying.
In fact, the recommendation (AES TD1004.1.15-10) still outlines EBU
R128 as an optimal solution if only the loudness of mobile devices in
the EU weren't limited to prevent hearing loss… a very good idea, but
it also introduces a disadvantage ;)
In the end, to compensate for that, their recommendation is to
normalize all streams to a target in the range of -16 LUFS to -20 LUFS,
limiting loudness jumps when switching between content to an acceptable
maximum. By selecting a value in that range, you can either choose to
be louder or have more room for a higher dynamic range, without being
too loud or too quiet.
For more details on the recommendations as well as a background of why
these recommendations were chosen, have a look at the (relatively
short) AES document:
http://www.aes.org/technical/documents/AESTD1004_1_15_10.pdf
One additional interesting recommendation in this document which we
could probably implement is to include the loudness level in the
stream's metadata to allow devices to automatically adjust their volume
accordingly. If that were properly implemented by everyone (I don't
think it is implemented anywhere), people could easily switch between
-23 LUFS broadcasting content and -16 LUFS streaming content without an
annoying jump in loudness. Maybe we could set a good example.
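As a rough sketch of what such tagging could look like: for Ogg/Opus,
RFC 7845 actually standardizes an R128_TRACK_GAIN tag (a Q7.8
fixed-point gain relative to the -23 LUFS reference), which FFmpeg's
generic `-metadata` option can write as a regular comment tag. The file
names and the gain value below are made-up examples:

```shell
# Hypothetical sketch: store a standardized loudness gain tag so that
# players could adjust playback volume automatically.
# R128_TRACK_GAIN is defined by RFC 7845 for Ogg/Opus; the value is
# Q7.8 fixed point, so 256 would mean +1.0 dB relative to -23 LUFS.
ffmpeg -i input.opus -c copy \
    -metadata:s:a:0 R128_TRACK_GAIN=256 \
    output.opus
```

For other containers there is no comparable standardized tag as far as
I know, which is probably part of why no player implements this.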
When reading the AES recommendations, I was also wondering if we should
select a value like -20 LUFS for normalization in Opencast. That would
make us compliant with the AES recommendations, while switching to any
broadcasting channel on the same device would still result in a
tolerable difference in loudness. Though given our content, we should
certainly be able to use -16 LUFS. As Jan pointed out, we do not need
(and probably do not even want) a high dynamic range.
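Should we pick such a streaming-friendly target, only the `I` parameter
of the filter changes (file names again being placeholders):

```shell
# Same single-pass filter as before, but with a -20 LUFS target as
# discussed above; use I=-16 for the louder end of the AES range.
ffmpeg -i input.mp4 \
    -filter:a loudnorm=I=-20:LRA=1:dual_mono=true:tp=-1 \
    -c:v copy \
    output.mp4
```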
Best regards,
Lars
On Tue, 27 Feb 2018 18:27:47 +0100
Lars Kiesow <lki...@uos.de> wrote:
> Hi everyone
>
> tldr; Use the following FFmpeg filter as part of your encoding
> profiles for audio normalization in Opencast:
>
> ffmpeg -i … -filter:a loudnorm=I=-23:LRA=1:dual_mono=true:tp=-1 …
>
>
> For ETH, I had a look at including audio normalization into their
> workflows to deal with fluctuations in their audio loudness (e.g. to
> avoid having some very quiet recordings). Since 2014, Opencast has a
> SoX integration for exactly this purpose which works quite fine:
>
>
> https://docs.opencast.org/develop/admin/workflowoperationhandlers/normalizeaudio-woh/
>
> Starting with a video recording, the normalization operation would
> extract the recording's audio stream, have SoX analyze and normalize it
> to a certain RMS dB value and then integrate the new audio stream
> again into the original video container (there are a few more modes
> but that is probably the most common use case).
>
> While this works fine, current versions of FFmpeg make things quite a
> bit easier by providing the `loudnorm` filter for EBU R128 loudness
> normalization which can be used as part of any FFmpeg operation.
> Hence, we can normalize the audio for example as part of generating
> the distribution artifacts or the work files. Of course, you could
> also still do it as a separate operation :)
>
> If you are now wondering what the hell EBU R128 is: EBU R128 is the
> name of the European Broadcasting Union's recommendation for loudness
> normalization and permitted maximum level of audio signals, an attempt
> to avoid huge loudness fluctuations between different broadcasting
> channels.
>
> The most important thing relevant for Opencast is probably the
> recommendation to have the audio normalized to a target level of -23.0
> LUFS. There are more recommendations on which I based the filter
> definition above. For details, read:
>
>
> https://tech.ebu.ch/loudness
>
> Less relevant to Opencast but still another interesting read when it
> comes to audio normalization and EBU R128 is the following document.
> It also outlines some reasoning behind the recommendation and possible
> solutions for what can be done if broadcasters ignore the
> recommendations to be the loudest out there:
>
>
> https://tech.ebu.ch/docs/techreview/trev_2012-Q3_Loudness_van_Everdingen.pdf
>
>