Audio level normalization

noquierouser · May 31, 2021, 7:51am

As far as I can see, many people in this forum work at big TV stations, such as NRK, SVT, TVP or even SBS, while others come from smaller TV stations, or even community ones like the one I work at.

I am really grateful for spaces like this, where I can comment, learn and share experiences with people who work every day to make television a great experience for viewers.

Taking advantage of the available space, I need to ask a question about a topic that personally worries me a lot, and that is the normalization of loudness in the final TV output.

I have been studying this subject for some time, so I am aware of the ITU-R BS.1770 standard and its derivatives, such as EBU R128, ATSC A/85 and ARIB TR-B32. Currently in Chile there is no legislation that regulates loudness in television stations; however, at our channel we decided to adopt the EBU R128 standard for our recorded television programs, which has given us excellent results with our audience.

However, a couple of years ago we started to receive content from other television and audiovisual production companies that do not work with loudness standards, so their material does not comply with the standard we adopted. These programs are louder and more heavily compressed than the rest of our programming. This affects how the audience perceives our own content, which sounds noticeably quieter than the external content and than other television channels.

This puts us in the difficult position of applying gain to the final output, which works without a problem for our own programs but creates tremendous clipping on external content, with the consequent complaints from our audience.

Given this situation, I would like to ask what solutions you have built, or would propose, to mitigate this problem in our signal. I would also like to share the ideas our small technical team has come up with to face this dilemma, and the solutions we have tested with mixed results.

I thank you in advance for any help you can provide in this regard.

didikunz · May 31, 2021, 8:08am

The big TV stations that you referred to solve this by giving specifications to production companies, who must deliver content that respects these specs. They also check all incoming material and reject it if it does not comply.

The smaller stations normally have neither the power nor the personnel to enforce such specs. I worked for many years at a small regional broadcaster here in Switzerland and know the situation very well. We normally imported every external clip into an editor (Premiere), because our station automation only worked with a specific file format, so we needed to convert the material anyway. In that process we could correct major level mismatches. But that is also a time-consuming process.

Do you have a broadcast limiter on your output? We used a TC Electronic DBMax. It’s not cheap, but it does a good job of leveling out the audio without audible artifacts.

hreinnbeck · May 31, 2021, 10:25am

You can build something that uses ffmpeg for EBU R128, both to process files and for live input/output. Or if you have the budget, I can highly recommend the TC Electronic DB6 range.
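For readers who want to try the ffmpeg route for files, here is a minimal sketch of a two-pass run of ffmpeg's `loudnorm` filter, driven from Python. The function names, the file names in the test data and the LRA value of 11 are assumptions made for illustration; only the -23 LUFS and -1 dBTP targets come from EBU R128. The helpers only build command lines and parse the measurement, so nothing is executed until you pass the lists to something like `subprocess.run`.

```python
import json

TARGET_I = -23.0    # EBU R128 programme loudness target, in LUFS
TARGET_TP = -1.0    # EBU R128 maximum true peak, in dBTP
TARGET_LRA = 11.0   # assumption: loudness range target, not mandated by R128


def first_pass_cmd(src):
    """Measurement pass: decode the audio, print loudnorm stats as JSON, write nothing."""
    af = f"loudnorm=I={TARGET_I}:TP={TARGET_TP}:LRA={TARGET_LRA}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", src, "-vn", "-af", af, "-f", "null", "-"]


def parse_measurement(stderr_text):
    """loudnorm prints a flat JSON object at the end of ffmpeg's stderr output."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def second_pass_cmd(src, dst, m):
    """Correction pass: feed the measured values back and copy the video stream untouched."""
    af = (
        f"loudnorm=I={TARGET_I}:TP={TARGET_TP}:LRA={TARGET_LRA}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )
    # loudnorm upsamples internally, so the output sample rate is set explicitly.
    return ["ffmpeg", "-hide_banner", "-i", src, "-c:v", "copy",
            "-af", af, "-ar", "48000", dst]
```

A typical use would be to run the first command with `subprocess.run(..., capture_output=True, text=True)`, pass its `stderr` to `parse_measurement`, and then run the second command. Note that `linear=true` only requests a linear correction; the filter can still fall back to dynamic processing when the targets cannot be met with plain gain.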

rrebuffo · May 31, 2021, 12:18pm

I believe FFAStrans can do the transcoding part automatically with loudness correction. Did you give it a look, @noquierouser?

dan · May 31, 2021, 4:01pm

for files have a look here:

For live audio we have had good results with a Jünger Audio processor based on Level Magic.

The idea is to use hardware EBU R128 processors only for live segments. For files, the best solution is always to analyze the whole file and then apply a single correction.

If you use live hardware units for files, you will change not only the alignment level but also the dynamics, which might not be what you want.
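The "single correction" for files comes down to simple arithmetic once the whole file has been measured. The sketch below is an illustration only: the function name and the example measurements are made up, and the -23 LUFS / -1 dBTP defaults are the EBU R128 values. The gain is the difference between target and measured integrated loudness, capped so that the corrected true peak stays under the ceiling.

```python
def static_gain_db(measured_lufs, measured_true_peak_dbtp,
                   target_lufs=-23.0, max_true_peak_dbtp=-1.0):
    """Return one gain value (in dB) to apply to the whole file.

    Because it is a single static gain, the dynamics of the programme
    are left untouched; only the overall level moves.
    """
    wanted = target_lufs - measured_lufs                     # gain that hits the target
    headroom = max_true_peak_dbtp - measured_true_peak_dbtp  # gain that hits the peak ceiling
    # Attenuation is never limited by headroom; boosts are capped by it.
    return round(min(wanted, headroom), 2)


# A loud, heavily compressed external programme: it simply gets turned down.
loud = static_gain_db(measured_lufs=-14.2, measured_true_peak_dbtp=-0.3)

# A quiet programme with high peaks: the boost is capped by the true-peak ceiling,
# so it ends up below target instead of clipping.
quiet = static_gain_db(measured_lufs=-27.0, measured_true_peak_dbtp=-2.5)
```

In the second case the file cannot reach the target with gain alone, which is the point where a limiter or a dynamic processor becomes a deliberate choice rather than a side effect.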

noquierouser · June 1, 2021, 12:56am

Thank you all for the responses. There are several solutions that I was unaware of, and they sound like very good proposals.

@didikunz indeed, I imagined that the big stations set the rules and those who want to be broadcast must comply or be rejected; but as you rightly say, small stations don’t have that luxury, so the roles are reversed. I was previously using Adobe Media Encoder for that purpose, but it is time consuming and, unfortunately, Adobe does not yet make proper use of the encoding power of GPUs, so I had to opt for tools like X-Media Recode instead. Unfortunately this tool has other serious problems: it is unstable with certain conversions, it does not support some input formats, and it does not have loudness normalization.

We have thought of implementing a compressor/normalizer in the audio output chain, but there are several considerations to take into account to achieve that goal, such as an audio de-embedder/embedder to process the final output.

@hreinnbeck and @didikunz, I have considered a real-time processor with ffmpeg, although the idea of using a TC Electronic DB series rig sounds more appealing. We will take it into consideration for our budget.

@rrebuffo, your solution involving FFAStrans looks quite interesting and I didn’t know about it, so I’d like to know a bit more, mainly how you use it and the workflows and use cases it has helped you with.

@dan, do you have any links to the Jünger Audio equipment you are using? I know that a national TV station in my country is using the T*AP Television Audio Processor, but unfortunately they use it only for their international signal and not the national signal, so it is a bit of a wasted effort for the local audience.

I am aware that using compression can alter the dynamics of the sound, so I want to be very careful with this issue. In that respect FreeLCS looks interesting too, although I am not sure whether it handles AVC or HEVC video, or whether it remuxes or re-encodes the final video.

I’m going to briefly mention which solutions I’ve tried so far:

Using a compressor for live audio inputs: This is mostly for live shows and we use it only for hosts and guests, as our hardware compressor is 4-in/4-out only. So far so good, but it needs more testing and adjustment to comply.
Using a normalizer plugin: Right now I’m using OBS to compose my live shows (I promise I’ll share my experience with this someday, once I iron out all the wrinkles in my setup), and I’m applying the HoRNet ELM128 MK2 VST filter to the sources that need normalization. So far it works OK in live shows. There are some limitations, mostly regarding OBS, that keep this approach from being a final solution yet, but it’s pretty close.
Manually applying gain compensation for each show: There must be a reason why god invented automation, am I right? This is effective, but way too tedious.
Batch conversion and normalization: As I mentioned earlier, I resorted to using X-Media Recode for conversion, but it has a lot of issues and doesn’t support loudness normalization (only volume normalization). I keep using it because other solutions (Handbrake, VidCoder and others) don’t support upscaling and/or letterboxing.
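For the batch conversion case above, ffmpeg can do the upscaling, letterboxing and loudness normalization in one run. The following is only a sketch: the function names, the 1920x1080 target and the default audio filter settings are assumptions, and the video filter chain assumes square-pixel sources. It combines ffmpeg's `scale` filter (with `force_original_aspect_ratio=decrease`) and its `pad` filter.

```python
def letterbox_filter(width=1920, height=1080):
    """Scale the picture to fit inside width x height, then pad with black to fill the frame."""
    return (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,"
        "setsar=1"
    )


def batch_cmd(src, dst, width=1920, height=1080,
              audio_filter="loudnorm=I=-23:TP=-1:LRA=11"):
    """Build one ffmpeg command that letterboxes the video and normalizes the audio.

    The default audio filter is a single-pass loudnorm, which works dynamically;
    a measured two-pass filter string can be passed in instead.
    """
    return ["ffmpeg", "-hide_banner", "-i", src,
            "-vf", letterbox_filter(width, height),
            "-af", audio_filter, "-ar", "48000",
            dst]
```

Video and audio codec options are left out on purpose, since they depend on what the playout system accepts.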

dan · June 1, 2021, 5:50am

@noquierouser our Junger processor is EOL.

You can go with this one:
https://www.junger-audio.com/en/products/slim-line/dual-stereo-level-magic-audio-processor-easy-loudness-sdi

FreeLCS flow: demultiplex, measure and process the audio, then remultiplex.
No video recompression. We are using it with OP1a.

I’ll stop here since it is a CasparCG forum

didikunz · June 1, 2021, 8:21am

It is, but I think we can sometimes go a bit off topic, as we all share the passion of doing TV broadcasting. Just don’t start political discussions, as I accidentally did a few days ago in my IBC post.

mou22y · June 2, 2021, 5:15pm

I do agree with you @didikunz. Let us discuss TV/STREAM production. It is a Caspar forum, but what other forum gathers the world’s best in broadcasting and streaming? I salute these kinds of questions. Bring it on.

rrebuffo · June 2, 2021, 5:31pm

I use it to automate tedious ingest workflows. Mainly to simplify the file structure mess that Panasonic’s P2 cards and some others have.
It has a node based workflow builder that is very easy to understand and provides very good results.

mint · June 3, 2021, 5:26am

Just adding to the discussion: I built myself a tool that can bulk transcode using ffmpeg from a drag and drop UI. For weird files (like raw camera or cellphones) I can do a reencode for CasparCG and for known good files I have a different preset that just rerenders the audio while applying the loudnorm filter. There’s also some conformance checking for things like still frames, freeze frames, black frames etc.
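As an illustration of what such a conformance check can look like with plain ffmpeg (this is not taken from the tool described above), the sketch below builds a `blackdetect` run and parses the segments it reports on stderr. The function names and the 0.5 second minimum duration are assumptions.

```python
import re

# blackdetect logs lines such as:
# [blackdetect @ 0x...] black_start:0 black_end:2.5 black_duration:2.5
BLACK_RE = re.compile(
    r"black_start:(?P<start>[\d.]+)\s+"
    r"black_end:(?P<end>[\d.]+)\s+"
    r"black_duration:(?P<dur>[\d.]+)"
)


def blackdetect_cmd(src, min_duration=0.5):
    """Analysis-only run: decode the video, report black segments, write no output file."""
    return ["ffmpeg", "-hide_banner", "-i", src,
            "-vf", f"blackdetect=d={min_duration}",
            "-an", "-f", "null", "-"]


def parse_black_segments(stderr_text):
    """Return a list of (start, end) times in seconds for every reported black segment."""
    return [(float(m["start"]), float(m["end"]))
            for m in BLACK_RE.finditer(stderr_text)]
```

Freeze frames can be checked the same way with ffmpeg's `freezedetect` filter, which logs its results in a different format and would need its own parser.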

For a facility I’d probably still recommend FFAStrans with drop folders, but if it’s just me and a laptop this is a nice and simple solution.

noquierouser · June 3, 2021, 7:35am

I remember seeing you comment about that on Twitter some time ago, and I expressed my interest in testing that tool or, if available, using it. FFAStrans seems like a nice approach for other kinds of adaptable workflows, and I’m keeping an eye on it as well.

mint · June 3, 2021, 4:08pm

I never ended up publishing a binary, but now I have: Release v0.1.0-0 · baltedewit/media-conformer · GitHub

Please note that there might still be rough patches!