Polimake

Audio mixing: what it is and how to do it well

Audio mixing explained seriously: dB, LUFS, EBU R128, mixing for each platform (Spotify -14, broadcast -23), and how to avoid the mistake that ruins the most videos.

The team behind Polimake. We explore the intersection of technology, creativity, and automation.

There's a brutal asymmetry in how technical problems are received in a video: a slightly out-of-focus image is forgiven, a poorly placed shadow too, even a clumsy transition goes unnoticed. Badly mixed audio is not. Within seconds, the audience leaves or, at the very least, turns the volume down and stops paying attention. Studies on viewer behavior show that sound quality affects the perceived credibility of a brand even before image quality does.

And yet, audio mixing remains the most underrated phase in many productions. "Audio comes last," "we'll fix it in post," "you can understand it, that's fine." Those shortcuts produce the pieces that sound like home videos no matter how much visual production sits behind them.

This article covers what an audio mix really is, the loudness standards every professional production should meet, the technical decisions that most affect a piece, and the mistakes still seen in 2026 even in productions with generous budgets.

What a mix is, exactly

An audio mix is the process of combining several independent tracks—voice, music, effects, ambiences—into a single final file that sounds good in its playback context. Each track arrives with its own volume, color, and dynamics, and the mix balances them so that none drowns out the others and so the result has the intended character.

In a typical corporate video, a mix can include:

  • Voice of the interviewee or presenter (the main track when there's dialogue).
  • Narrative voice-over.
  • Background music (music bed).
  • Spot effects (transitions, sonic emphasis).
  • Ambient sound (ambience, room tone) that adds texture.

Each element has its volume, its EQ (tonal equalization), its compression, its spatial effect. Getting them to coexist without fighting one another is the mix.

A brief history: from the studio to the phone

Audio mixing as a discipline is a century old. In the 1920s through the 1940s, Bell Labs and commercial radio developed the first systematic techniques. Multitrack recording—recording separate tracks instead of a single take—was introduced in the mid-1950s: Les Paul popularized multitrack recording with his home studio around 1953-54, and Atlantic Records adopted the 8-track format in 1957.

Phil Spector developed the "Wall of Sound" in the early 1960s, a technique of dense layers that changed pop production. George Martin, the Beatles' producer, pushed the possibilities of the studio between 1963 and 1969 with recordings that are still studied in production schools today. SSL consoles set the standard for professional rooms from the late 1970s.

The digital revolution reached the mixing room at the turn of the 1990s: Digidesign (now Avid) released Sound Tools in 1989 and Pro Tools in 1991. For the first time, mixing could be done on a computer with professional results. Logic Pro (originally Notator/Emagic, bought by Apple in 2002), Ableton Live (2001), Cubase, and Reaper democratized access.

The big recent change—and the one that most affects brand production—has been the shift in playback context. Historically, audio was mixed for cinemas, concert halls, living rooms with decent speakers. Today, most audio is consumed on phone speakers, on low-quality earbuds, on Bluetooth speakers, on smart speakers. A mix that ignores that context fails no matter how spectacular it sounds in the studio.

The loudness standards: dB, LUFS, EBU R128

This is where the most important technical detail of modern professional mixing comes in. Skipping it is what separates serious production from improvised work.

Historically, audio was measured in dBFS (decibels relative to full scale), which describes instantaneous peak level. 0 dBFS is the absolute digital maximum; anything beyond it clips. Because the scale is logarithmic, -6 dBFS is roughly half the maximum amplitude and -12 dBFS roughly a quarter.
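
The relationship between dB values and linear amplitude can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and are not taken from any audio library.

```python
import math

def db_to_amplitude(db: float) -> float:
    """Convert a level in dB to a linear amplitude ratio (1.0 = full scale)."""
    return 10 ** (db / 20)

def amplitude_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio (> 0) to dB."""
    return 20 * math.log10(ratio)

for level in (0, -6, -12, -20):
    print(f"{level:>4} dBFS -> {db_to_amplitude(level):.3f} of full scale")
```

Every 6 dB down halves the amplitude (approximately), and every 20 dB down divides it by ten.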

The problem: dBFS only measures peaks, not how "loud" the audio is perceived to be. A compressed voice can sound much louder than an uncompressed orchestra, even though both have the same peaks. That's why the "loudness war"—where music and commercials were pushed ever more compressed to sound "louder"—got out of control between the 1990s and the 2000s.
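
A rough way to see the gap between peak level and perceived level is to compare two synthetic signals with the same peak. The sketch below uses plain RMS as a crude stand-in for loudness; that is an assumption made for illustration, since real loudness measurement adds frequency weighting and gating. The signals themselves are invented test tones.

```python
import math

RATE = 48000  # samples per second (assumed for this example)

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """Average (RMS) level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# One second of a 1 kHz tone at full scale: dense, constant energy.
dense = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(RATE)]

# Same tone, but only the first 10% is at full scale; the rest is 20 dB lower.
dynamic = [s if i < RATE // 10 else 0.1 * s for i, s in enumerate(dense)]

for name, signal in (("dense", dense), ("dynamic", dynamic)):
    print(f"{name:8s} peak {peak_db(signal):6.2f} dB   rms {rms_db(signal):6.2f} dB")
```

Both signals report the same peak, yet their average levels differ by several dB, which is the gap a peak meter cannot show.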

The solution was LUFS (Loudness Units relative to Full Scale). The ITU-R BS.1770 standard was published in 2006 and refined in later versions. LUFS measures perceived loudness—how it sounds to the human ear—weighting frequencies and averaging over time. It's the honest metric.

The target loudness standards every professional mix should respect:

Broadcast (TV):

  • EBU R128 (Europe, 2010): -23 LUFS integrated, max true peak -1 dBTP.
  • ATSC A/85 / CALM Act (U.S., in effect since December 2012): -24 LKFS.
  • AS/NZS 4646 (Australia, NZ): -24 LKFS.
  • Programs and commercials that miss these targets are rejected by networks or penalized.

Streaming (music and audio):

  • Spotify: -14 LUFS since 2017 (previously -11). Audio that arrives louder is automatically turned down.
  • Apple Music: -16 LUFS.
  • YouTube: -14 LUFS.
  • Amazon Music: -14 LUFS.
  • Tidal: -14 LUFS.

Cinema:

  • Cinema (theater): mixing standard at -27 LUFS or equivalent with Dolby calibration. Much more dynamic than streaming.

Podcast:

  • AES TD1004 (AES recommendation, 2017): -16 LUFS for podcasts (mono or stereo).

The consequence for brand mixing: a professional mix needs different versions depending on the destination. Mixing at -23 LUFS for broadcast and uploading the same file to Spotify produces audio 9 LU below the platform's target. Platforms can raise it automatically, but by then the original tonal curve and dynamics are already compromised. Better to master differently per destination.
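
The arithmetic behind "one master per destination" can be sketched as follows. The targets are the ones listed above; the dictionary keys, the function names, and the example measurements (-23 LUFS integrated, -3 dBTP true peak) are assumptions for illustration.

```python
# Integrated loudness targets taken from the lists above (LUFS).
TARGETS_LUFS = {
    "broadcast_ebu_r128": -23.0,
    "spotify": -14.0,
    "apple_music": -16.0,
    "youtube": -14.0,
    "podcast": -16.0,
}

def gain_to_target(measured_lufs: float, destination: str) -> float:
    """Static gain in dB (equivalently LU) needed to reach the destination's target."""
    return TARGETS_LUFS[destination] - measured_lufs

def needs_limiting(true_peak_dbtp: float, gain_db: float, ceiling_dbtp: float = -1.0) -> bool:
    """True if applying gain_db would push the true peak above the ceiling."""
    return true_peak_dbtp + gain_db > ceiling_dbtp

# Hypothetical broadcast master: -23 LUFS integrated, -3 dBTP true peak.
measured_lufs, measured_peak = -23.0, -3.0
for destination in TARGETS_LUFS:
    gain = gain_to_target(measured_lufs, destination)
    flag = "needs limiting" if needs_limiting(measured_peak, gain) else "fits as is"
    print(f"{destination:20s} {gain:+5.1f} dB  ({flag})")
```

For this hypothetical master, every destination louder than broadcast would push the peaks past -1 dBTP, so raising the level is not a plain gain change: something has to limit the peaks, and that alters the dynamics.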

How a mix is worked in practice

Beyond final loudness, the typical steps of a mix:

1. Cleanup. Remove noise, electrical hum, clicks, and plosives (those explosive "p" sounds) from each track. Tools: iZotope RX (the professional standard, especially RX 11 in 2026), Adobe Audition Speech Enhancement, Auphonic. Recent AI models—iZotope Voice De-noise, Adobe Enhance Speech (released publicly in 2023)—have transformed this phase: what used to take an hour of manual cleanup is now done in minutes with superior quality.

2. Level balancing (gain staging). Adjust each track's gain so they live in a reasonable range, without clipping or staying too low. Errors here propagate through the rest of the process.

3. Equalization (EQ). Filter out problematic frequencies, boost the ones that add intelligibility or character. Voice typically benefits from a high-pass cut around 80-100 Hz to remove rumble, and a subtle boost at 2-4 kHz for intelligibility.

4. Compression. Reduce the dynamic range so the quietest parts can be heard and the loudest don't clip. Critical for voice in video: a 3:1 ratio with the right threshold makes the voice uniform and understandable.

5. Reverb and spatial effects if the piece calls for them. A voice can benefit from a touch of subtle reverb to "fit" into a setting; too much overdoes it.

6. Balance between tracks. Voice typically between -16 and -12 dBFS peak in a mix with music. The music bed between -28 and -24 dBFS when there's voice; it rises to -16 to -12 when the voice isn't present. Spot effects can be louder to punctuate.

7. Automation. Volume and EQ changes throughout the piece. The music drops when the voice comes in and rises when it stops; reverb intensifies in emotional moments; a sound is attenuated so it doesn't compete with a cut.

8. Mastering / final loudness. A limiter on the master bus to ensure the peak doesn't exceed -1 dBTP, and bring the LUFS to the destination's target.

9. Testing on different systems. Studio headphones, consumer earbuds, monitor speakers, phone speaker, car if possible. Each system reveals different problems.
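
The 3:1 compression in step 4 can be made concrete with the static curve of a compressor. This is a simplified sketch: it models a hard-knee compressor with no attack or release, and the -18 dB threshold is an assumed value, not a recommendation from this article.

```python
def compressed_level_db(input_db: float, threshold_db: float = -18.0, ratio: float = 3.0) -> float:
    """Output level of a hard-knee compressor for a given input level (both in dB).

    Below the threshold the signal passes unchanged; above it, every `ratio` dB
    of input produces only 1 dB of extra output.
    """
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

for level in (-30.0, -18.0, -12.0, -6.0):
    out = compressed_level_db(level)
    print(f"in {level:6.1f} dB -> out {out:6.1f} dB  (reduction {level - out:4.1f} dB)")
```

With these settings, a voice that swings between -30 and -6 dB (24 dB of range) comes out between -30 and -14 dB (16 dB of range). Make-up gain then brings the whole track back up.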

The most common mistake in brand video

If there were room to point out only one: music too loud over the voice. The music bed that sounds "discreet" and emotional in the studio completely covers what's being said on the audience's phone. And because the viewer can't understand it, they leave.

The rule of thumb that rarely fails: when there's dialogue, the music should be at least 10-12 dB below the voice. If in doubt, drop the music another 3 dB. The music isn't there to be heard in detail; it's there to create emotional context beneath the speech. The viewer won't appreciate the music any less for being lower; on the contrary, if it's louder they won't understand the voice.
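
The rule of thumb translates into a small calculation. In this sketch the function name and the example levels are hypothetical, and both levels are assumed to be measured the same way (for example, both as short-term loudness).

```python
def duck_gain_db(music_db: float, voice_db: float, offset_db: float = 12.0) -> float:
    """Gain to apply to the music so it sits at least offset_db below the voice.

    Returns 0.0 when the music is already low enough; never returns a boost.
    """
    return min(0.0, (voice_db - offset_db) - music_db)

print(duck_gain_db(music_db=-18.0, voice_db=-14.0))  # -8.0: pull the bed down 8 dB
print(duck_gain_db(music_db=-30.0, voice_db=-14.0))  # 0.0: already 16 dB under the voice
```

In a real session this gain would be applied through automation or sidechain ducking only while the voice is present, and released when it stops.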

Other recurring mistakes

Voice with background noise. A room with echo, air conditioning, a humming computer. A voice with noise is a voice that tires the listener. Solutions: record with the microphone closer, in a deader room, or use AI cleanup.

Untreated plosives and sibilance. The explosive "p" and "b" sounds, the hissing "s" sounds—each requires a specific tool (de-popper, de-esser).

Abrupt cuts between clips. Every audio cut between takes should have a smooth crossfade. Hard cuts produce audible "clicks" that pull the viewer out of the content.
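
What a crossfade does to a cut can be sketched on raw sample lists. This example uses equal-power (sine/cosine) curves, which is one common choice among several; the function and the test signals are illustrative and do not describe how any particular editor implements it.

```python
import math

def crossfade(a, b, overlap):
    """Join clips a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b using equal-power curves."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        t = (i + 0.5) / overlap                  # position inside the fade, 0..1
        fade_out = math.cos(t * math.pi / 2)     # gain applied to clip a
        fade_in = math.sin(t * math.pi / 2)      # gain applied to clip b
        blended.append(a[start + i] * fade_out + b[i] * fade_in)
    return a[:start] + blended + b[overlap:]

# Two clips that would produce a full-scale jump if butt-joined.
clip_a = [1.0] * 100
clip_b = [-1.0] * 100
joined = crossfade(clip_a, clip_b, overlap=20)
largest_step = max(abs(y - x) for x, y in zip(joined, joined[1:]))
print(len(joined), round(largest_step, 3))
```

A hard cut between these two clips would jump by 2.0 in a single sample, which is what is heard as a click; the crossfade spreads the change over the overlap.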

Volume differences between scenes. If the first interviewee is mixed at -16 LUFS and the next at -12, the viewer perceives the jump and reads it as a technical problem. Mixing everything to a common loudness is basic.
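
A leveling pass of this kind can be sketched as a check over per-clip measurements. The clip names, the -16 LUFS target, and the 1 LU tolerance are assumptions for the example; the loudness values would come from a meter.

```python
def leveling_gains(clip_lufs, target_lufs=-16.0, tolerance_lu=1.0):
    """Return {clip_name: gain_db} for clips that deviate from the target
    by more than tolerance_lu; clips within tolerance are left alone."""
    gains = {}
    for name, loudness in clip_lufs.items():
        offset = target_lufs - loudness
        if abs(offset) > tolerance_lu:
            gains[name] = offset
    return gains

clips = {"interviewee_1": -16.0, "interviewee_2": -12.0, "voice_over": -16.5}
print(leveling_gains(clips))  # {'interviewee_2': -4.0}
```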

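Saturation / clipping. Peaks that hit 0 dBFS clip and produce audible distortion, and true peaks above -1 dBTP can distort once the file is converted to a lossy format. A limiter on the master bus should prevent both, but if the individual tracks clip earlier, mastering won't save it.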
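
Clipping that already happened in a track leaves a recognizable trace: runs of consecutive samples stuck at full scale. The detector below is a simple heuristic sketch; the 0.999 threshold and the minimum run of 3 samples are assumed values, and it works on sample peaks rather than true peaks.

```python
import math

def find_clipped_runs(samples, threshold=0.999, min_run=3):
    """Return (start_index, length) for each run of at least min_run
    consecutive samples whose absolute value reaches the threshold."""
    runs = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

# A tone recorded 50% too hot and flattened at full scale, two periods long.
too_hot = [max(-1.0, min(1.0, 1.5 * math.sin(2 * math.pi * i / 48))) for i in range(96)]
clean = [0.9 * math.sin(2 * math.pi * i / 48) for i in range(96)]

print(len(find_clipped_runs(too_hot)), len(find_clipped_runs(clean)))
```

Turning the clipped track down afterwards lowers its level but keeps the flattened waveform, which is why the fix belongs at the recording or gain-staging stage.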

Overdone effects. Every transition sound effect—the "swoosh," the "boom," the "ding" of the final logo—competes with the voice if it's poorly placed and poorly mixed. The test: does this effect add something or just decorate?

Mixing only on studio headphones. Sennheiser HD 650s or equivalents are a professional tool, but the audience listens on cheap phone speakers. Testing on consumer systems is indispensable.

Not checking the final LUFS. Uploading a file without measuring its loudness is hoping the platform will normalize it automatically—which it does, but at the cost of the original curve.
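
One way to measure before uploading is FFmpeg's loudnorm filter run in analysis mode. The sketch below only builds the command line; it assumes ffmpeg is installed, and the file name and helper function are hypothetical. LRA=11 is an assumed loudness-range setting, not a value from this article.

```python
import shlex

def loudnorm_measure_command(path, target_lufs=-14.0, true_peak_db=-1.0):
    """Build an ffmpeg command that analyzes a file with the loudnorm filter
    and prints a JSON summary of the measurement, without writing any output file."""
    audio_filter = (
        f"loudnorm=I={target_lufs}:TP={true_peak_db}:LRA=11:print_format=json"
    )
    return ["ffmpeg", "-hide_banner", "-i", path, "-af", audio_filter, "-f", "null", "-"]

command = loudnorm_measure_command("final_mix.wav")
print(shlex.join(command))
# To actually run it: subprocess.run(command, capture_output=True, text=True)
# and read the JSON summary that ffmpeg writes to stderr.
```

Comparing the measured integrated loudness and true peak against the destination's target shows whether the file is ready or needs another mastering pass.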

Mixing before approving the edit. Mixing is slow. If the duration or the order of the clips changes, you have to redo the mix. Better to mix after approving the fine cut of the video.

Not keeping the stems. The stems are the individual audio tracks (voice, music, effects) exported separately. Keeping them allows you to remix later—swap the voice for another translation, adjust for another channel, replace the music with a differently licensed track. Without stems, you have to start over.

How to fit the mix into the workflow

A well-done mix isn't improvised. It needs time, tools, and process.

Time: a professional mix of a 2-3 minute corporate video typically requires 4-8 hours of work from a competent engineer. For more complex pieces (multi-dialogue, original music, dubbing), considerably more.

Tools: Pro Tools, Logic Pro, DaVinci Resolve Fairlight (integrated into DaVinci, free), Reaper (low-cost). Plugins from iZotope (Ozone, RX, Neutron), FabFilter (Pro-Q 3), Waves, Soundtoys.

Process: mix in a room with proper monitoring (not in an office with background noise), with quality headphones and monitor speakers, validating on consumer systems before approving.

Creative operations are what ensure the mix isn't the rushed phase of the last day. At Polimake, Studio defines the audio quality criteria by piece type; Media carries out production with adequate time and tools, exporting stems and a master per channel; Studio coordinates the timelines so the mix has its space before delivery.

This relates to shooting commercials, where audio determines perceived quality, to postproduction as the phase where the mix lives, and to the decision about delivery format, which defines what loudness each destination requires.

To close

Audio mixing is the detail the audience never names yet that decides whether the piece feels complete or gets abandoned. A piece with a brilliant image and mediocre audio reads as amateur; one with a modest image and impeccable audio reads as professional. The asymmetry favors investing more than intuition suggests in sound.

The practice that ages best: treat the mix as a phase with its own time, its standards (LUFS, EBU R128, target by platform), its tools, and its judgment. Test on consumer systems before approving. Keep the stems. Mix versions per destination, not a single generic master. When that discipline exists, the piece sounds good wherever it's played, and the viewer doesn't think about the audio—which is exactly the sign of a well-done mix.

Quick reference

  • dBFS measures peaks; LUFS measures perceived loudness (what matters).
  • Broadcast Europe: -23 LUFS (EBU R128), max true peak -1 dBTP.
  • Music streaming: -14 to -16 LUFS depending on platform.
  • Podcast: -16 LUFS (AES TD1004).
  • Cinema: more dynamic mix, calibrated differently.
  • Music under voice: 10-12 dB below, minimum.
  • Stems always: voice, music, effects as separate exportable tracks.
  • Master per destination, not generic.
  • Test on consumer systems (phone, cheap earbuds, car), not just pro monitoring.
  • AI tools (iZotope, Adobe Enhance Speech, Auphonic) have transformed the cleanup phase.
  • Mix after approving the edit, not the other way around.