Overlay and layers in video and design

Name: Polimake
Author: Polimake

Overlay in video and design: from Disney's multiplane camera to digital compositing. Layers, blending modes, hierarchy, and mistakes that ruin the image.

Published: June 5, 2025

When an editor talks about "putting text over the video," when a designer adjusts the layer order in Photoshop, when a developer sets the z-index of a floating menu -- they're all doing the same thing: overlay. Combining visual elements so they coexist in the same image without losing legibility.

It sounds trivial. It isn't. The difference between a professional piece and an amateur one is often decided on this terrain: how elements are stacked, in what order, with what transparency, with what hierarchy. A poor overlay turns a good video into a confusing ad, a polished brand into a noisy one, a clean screen into one saturated with information.

This article covers what overlay is, where it comes from technically, how it's worked with today, and where teams go wrong when they start stacking layers.

From the multiplane camera to digital compositing

The idea of overlaying images to create a richer composition predates computers by a long way. In animated film, Walt Disney Studios developed the multiplane camera -- patented in 1936 and first used in the short The Old Mill (1937) -- which filmed several layers of drawing on physical planes at different distances, producing a sense of depth unattainable until then. Snow White and the Seven Dwarfs (1937) and Bambi (1942) made it the standard.

In live-action cinematography, compositing was born earlier but grew more sophisticated in the 1960s. Mary Poppins (1964) used the "sodium vapor process" to combine actors with animation; Star Wars (1977) refined optical compositing with VistaVision; The Abyss (1989) introduced the first significant digital compositing in a commercial film; Forrest Gump (1994) made digital compositing a central narrative tool, inserting Tom Hanks into historical footage.

In graphic design, the concept of "layers" reached the mainstream with Photoshop 3.0, released in 1994. Before that, designers worked on a single layer and changes were destructive. Layers made it possible to compose non-destructively: each element lives in isolation and can be moved, hidden, or deleted without affecting the rest. Today it's hard to imagine the impact this had, but it completely changed the way people work.

In video, After Effects (CoSA, 1993; part of Adobe since 1994) brought the layer model to motion graphics, and Nuke (created in-house at Digital Domain in 1993, developed by Foundry since 2007, and dominant in VFX since the mid-2000s) made node-based compositing the professional standard used in film today.

On the web, layers are z-index: a CSS property standardized in CSS 2.0 (W3C, 1998) that decides which element appears on top of which when they overlap. Frontend developers spend more time than they'd admit wrestling with stacking contexts.

Four contexts -- animation, film, graphic design, web -- and the same underlying idea: visually ordering what occupies the same space.

How it works technically

When two visual elements overlap, the computer has to decide pixel by pixel what is visible. That depends on three parameters:

Layer order. Layers stack from bottom to top: where two layers overlap, the upper one is drawn over the lower one. In After Effects, Premiere, or Photoshop the order is shown in a vertical panel, and the layer at the top of the list renders on top in the image. In CSS, z-index (together with stacking contexts) serves this function.

Transparency (opacity / alpha). Each layer can be partially transparent. 100% opacity completely blocks what's below; at 50%, the layer below shows through at half strength; at 0%, the layer is invisible. Transparency can be uniform (the whole layer at the same level) or variable per pixel through an alpha channel (a PNG-32 already carries per-pixel alpha; a JPG doesn't support alpha at all).

Blending mode. Instead of simply "covering" what's below, the layer can be combined mathematically with the lower layers. Multiply, screen, overlay, soft light, darken, lighten... Each mode applies a different operation pixel by pixel. Photoshop popularized this nomenclature (originating in printing and brought into software in the late 1980s) and almost all modern software uses the same names and math.
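To make the first two parameters concrete, here is a minimal sketch of how layer order and opacity combine for a single pixel. It uses the standard "over" compositing operator with straight (non-premultiplied) alpha. The names `Pixel`, `over`, and `flatten` are invented for this illustration and do not come from any particular editing tool, which will typically do the same thing on whole images and in a specific color space.

```python
from typing import List, Tuple

# A pixel as (red, green, blue, alpha), each component in the range 0.0-1.0.
# Straight (non-premultiplied) alpha is an assumption made for this sketch.
Pixel = Tuple[float, float, float, float]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite `top` over `bottom` with the standard "over" operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # both pixels are fully transparent

    def channel(t: float, b: float) -> float:
        # Weight each color by how much of it remains visible.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (channel(tr, br), channel(tg, bg), channel(tb, bb), out_a)


def flatten(layers_bottom_to_top: List[Pixel]) -> Pixel:
    """Flatten a stack by applying each layer over the result so far."""
    result: Pixel = (0.0, 0.0, 0.0, 0.0)  # start from an empty canvas
    for layer in layers_bottom_to_top:
        result = over(layer, result)
    return result


background = (0.0, 0.0, 1.0, 1.0)  # opaque blue
half_red = (1.0, 0.0, 0.0, 0.5)    # red at 50% opacity
print(flatten([background, half_red]))  # (0.5, 0.0, 0.5, 1.0)
```

Swapping the order of the two layers gives plain blue, because the opaque layer then sits on top and hides the red one entirely. That is the whole point of layer order.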

The most-used blending modes:

Multiply: darkens. Useful for ink over a light background.
Screen: lightens. Useful for lights, flares, over a dark background.
Overlay: combines multiply and screen based on luminosity. Useful for adding contrast or texture.
Soft Light: like overlay but more subtle. Useful for delicate finishes.
Darken and Lighten: compare and choose the darkest or lightest pixel between layers.

Knowing these five entries (six modes, counting darken and lighten separately) covers most motion graphics and design needs. The rest are refinements.
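The modes listed above can be written as small per-channel functions. The sketch below assumes channel values normalized to 0.0-1.0, with `base` as the lower layer and `blend` as the layer being applied. The multiply, screen, overlay, darken, and lighten formulas are the commonly documented ones. Soft light has several variants in circulation, and the one here follows the W3C Compositing and Blending specification, so a given application may not match it exactly.

```python
import math


def multiply(base: float, blend: float) -> float:
    # The result is never lighter than either input.
    return base * blend


def screen(base: float, blend: float) -> float:
    # Multiply applied to the inverted values: never darker than either input.
    return 1.0 - (1.0 - base) * (1.0 - blend)


def overlay(base: float, blend: float) -> float:
    # Multiply where the base is dark, screen where the base is light.
    if base <= 0.5:
        return 2.0 * base * blend
    return 1.0 - 2.0 * (1.0 - base) * (1.0 - blend)


def soft_light(base: float, blend: float) -> float:
    # W3C variant (an assumption; other tools use slightly different curves).
    if blend <= 0.5:
        return base - (1.0 - 2.0 * blend) * base * (1.0 - base)
    if base <= 0.25:
        d = ((16.0 * base - 12.0) * base + 4.0) * base
    else:
        d = math.sqrt(base)
    return base + (2.0 * blend - 1.0) * (d - base)


def darken(base: float, blend: float) -> float:
    return min(base, blend)


def lighten(base: float, blend: float) -> float:
    return max(base, blend)


# Compare what each mode does to a mid-dark base with a light blend value.
for mode in (multiply, screen, overlay, soft_light, darken, lighten):
    print(f"{mode.__name__:>10}: {mode(0.25, 0.8):.3f}")
```

A 50% gray blend layer leaves the base unchanged in both overlay and soft light, which is why neutral gray is a common starting point for texture and contrast layers.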

Typical uses in video

When people talk about overlay in video, they usually mean these cases:

Lower thirds. The block of text in the lower third with the interviewee's name and title. Originating in 1980s television news, it has become almost universal in corporate video. A text layer over a background layer (rectangle, gradient, brand shape), all with entrance and exit animation.

Burned-in subtitles. Fixed text inside the image, not removable by the player. The correct option for Reels, TikTok, and Stories where 80%+ of consumption is sound-off.

Logo bug. A permanent logo in a corner -- typically top right -- throughout the video. It identifies the channel or brand. In broadcast it's standard; in digital formats it's used less frequently.

Brand overlay. Visual elements -- lines, shapes, gradients -- that frame the content to align it with the brand identity. They can be persistent (the whole video) or transitional.

Split screen. Two or more videos visible at once. Each on its own layer with a mask that defines the area it occupies.

Call-outs. Arrows, circles, labels that point to an element in the video. Frequent in tutorials and explainer videos.

Picture-in-picture. A smaller video overlaid on a larger one. Common in videos where someone comments over a shared screen or gameplay.

Kinetic typography. Animated text that emphasizes words or rhythm. A technique with its own history, popularized in Saul Bass's title sequences from the 1950s (his work for Vertigo in 1958 is still a reference point) and democratized by After Effects.

Transparency effects and double exposure. Images that blend to create semitransparent layers -- a common visual device in aspirational branding, music videos, and editorial content.

Overlaid B-roll. Complementary footage that enriches the main narration. Typical in documentaries: the interviewee speaks while the image alternates between their face and shots related to what they're saying.

Visual hierarchy: what gets seen first

A technically correct overlay can be visually disastrous. The question that matters isn't just "what's on the screen" but "what do I want the viewer to look at first."

Visual hierarchy is built with three tools:

Size. Big dominates. The largest element on screen is usually read first.

Contrast. What stands out against its background is read first. White text on a dark background has more hierarchy than white text on a light background, even if the second is larger.

Position. In left-to-right reading cultures, the eye travels from top-left to bottom-right. Elements in the top-left quadrant get more attention by default.

An overlay that ignores these three dimensions produces visual noise: many elements competing, none clearly dominant, the viewer not knowing where to look and, in the end, remembering nothing.

The most useful rule of thumb: a single dominant element at any given moment. If there's a lower third, it doesn't compete with kinetic text at the same time. If there's an important brand overlay, a call-out isn't added on top of it. If the interviewee's face is what matters, the graphics breathe beside it, not over it.
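Of the three tools, contrast is the easiest to measure. One way to put a number on it is the contrast ratio defined in the WCAG web accessibility guidelines. This sketch borrows that formula as an assumption: it was written for web content rather than video, and it only compares two flat colors, so treat the result as a rough check for text over a solid box or an even background. The function names are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB components, 0-255


def relative_luminance(color: RGB) -> float:
    """Relative luminance of an sRGB color, following the WCAG definition."""

    def linearize(value: int) -> float:
        c = value / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Contrast ratio from 1.0 (identical) up to about 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


white = (255, 255, 255)
dark_navy = (10, 20, 60)
pale_gray = (240, 240, 240)

print(f"white on dark navy: {contrast_ratio(white, dark_navy):.1f}:1")
print(f"white on pale gray: {contrast_ratio(white, pale_gray):.1f}:1")
```

WCAG asks for at least 4.5:1 for normal-size text. White text on a pale background falls far below that regardless of how large it is set, which matches the point above about contrast outweighing size.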

Mistakes every beginner repeats

Too many active layers at once. Three elements competing for attention produce frustration. If everything is highlighted, nothing is.

Text illegible from insufficient contrast. White text on a light background, thin text over a busy image. Solution: a semitransparent background box, a soft shadow, a thin outline, or reserving an area of flat background for the text.

Typography used poorly. Condensed typefaces at small sizes, loose kerning, two or three typefaces mixed without logic. Text overlay requires careful typography or it shows.

Excessive animation. Each layer entering and exiting with a different animation. The whole thing feels chaotic. Solution: a reduced, consistent vocabulary -- one or two entrance animations, one or two exit animations, across the whole piece.

Huge logo bug. A logo at 15% of the screen throughout the video distracts more than it identifies. The traditional broadcast rule is around 5-7% of the shorter side.

Safe margin ignored. Text too close to the edge can be cut off by social media interfaces (TikTok's icons occupy the right and bottom edges; Instagram Stories has zones reserved for the user and replies). Solution: an interior safe zone of 10-15% on each side.

Inconsistency between pieces of the same campaign. Each video in the campaign has its own overlay system, its own lower third, its own typography. Result: the brand feels improvised. Solution: a defined motion system -- typefaces, colors, animations, positions, margins -- applied across all pieces.

Blending modes used as decoration. Applying overlay just "because it looks interesting" usually worsens the image. Each blending mode solves a specific problem; using them without purpose introduces noise.

Ignoring the target format. An overlay designed for 16:9 horizontal can fall outside the frame in a 9:16 vertical version, or be covered by the social media interface. Designing with all versions in mind from the start avoids the redesign in the last week.
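Two of these mistakes, safe margins and logo size, reduce to simple arithmetic that can be checked per format. The sketch below is a hedged illustration: `Rect`, `safe_area`, and `logo_bug_size` are hypothetical helpers, the 10% margin and 6% logo fraction are picked from inside the ranges mentioned above, and applying the logo fraction to the logo's longest dimension is an interpretation of the rule, not something the rule specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies completely inside this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def safe_area(width: int, height: int, margin: float = 0.10) -> Rect:
    """Interior safe zone, leaving `margin` of the frame free on each side."""
    mx = round(width * margin)
    my = round(height * margin)
    return Rect(mx, my, width - mx, height - my)


def logo_bug_size(width: int, height: int, fraction: float = 0.06) -> int:
    """Target logo size in pixels as a fraction of the frame's shorter side."""
    return round(min(width, height) * fraction)


# A lower third positioned for a 1920x1080 horizontal frame (example values).
lower_third = Rect(left=200, top=850, right=1100, bottom=950)

horizontal = safe_area(1920, 1080)
vertical = safe_area(1080, 1920)

print(horizontal.contains(lower_third))  # True
print(vertical.contains(lower_third))    # False
print(logo_bug_size(1920, 1080))         # 65
```

The same lower third that sits comfortably inside the horizontal safe zone spills past the right edge of the vertical one, which is the failure described in the last item above.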

How the digital context changes things

Social media -- Instagram, TikTok, YouTube Shorts -- has changed overlay conventions.

Large, early text. Because consumption is sound-off and the viewer decides in 1-2 seconds whether to keep watching, the main text appears in the first few seconds and takes up a noticeable portion of the screen.

Stylized burned-in subtitles. No longer the white Netflix band. They're animated text, with color highlights, rhythm synced to the voice, part of the piece's visual language.

Stickers and native elements. TikTok, Instagram, and BeReal have their own visual elements -- polls, countdowns, mentions -- that coexist with editorial overlays. You have to leave space for the platform to add its own without breaking the design.

Verticality. Frames and compositions designed for vertical, not ported from horizontal. Traditional lower thirds don't work in vertical: there's no "lower third" in the same way. Information goes in the center or in specific safe zones.

How to fit overlay into the workflow

The difference between a team that produces coherent pieces and one that improvises each time is whether overlay is systematized or not.

Motion templates with predefined lower thirds, overlays, typefaces, and animations, ready to apply in any project.
A documented brand system with rules for logo use, margins, colors, typefaces -- not a PDF manual that nobody opens, but living files in the editing software.
Templates adapted to each format (16:9, 9:16, 1:1, 4:5) so the vertical version isn't an improvised crop.
Systematic visual review: a check before export that validates hierarchy, contrast, safe margins, and consistency with the campaign.
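Part of the pre-export review in the last item can be automated. As one example, this sketch checks the hierarchy rule of a single dominant element at any moment by looking for dominant overlays whose time ranges overlap. The `OverlayElement` structure and its `dominant` flag are assumptions made for the illustration. A real project would have to derive them from the editing software's timeline or from a template definition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OverlayElement:
    name: str
    start: float    # seconds from the beginning of the piece
    end: float      # seconds from the beginning of the piece
    dominant: bool  # True if the element is meant to be the focus


def dominant_conflicts(elements: List[OverlayElement]) -> List[Tuple[str, str]]:
    """Return pairs of dominant elements that are on screen at the same time."""
    dominant = sorted((e for e in elements if e.dominant), key=lambda e: e.start)
    conflicts: List[Tuple[str, str]] = []
    for i, first in enumerate(dominant):
        for second in dominant[i + 1:]:
            if second.start >= first.end:
                break  # sorted by start, so later elements cannot overlap either
            conflicts.append((first.name, second.name))
    return conflicts


timeline = [
    OverlayElement("logo bug", 0.0, 30.0, dominant=False),
    OverlayElement("lower third", 2.0, 6.0, dominant=True),
    OverlayElement("kinetic text", 4.0, 8.0, dominant=True),
    OverlayElement("call-out", 10.0, 12.0, dominant=True),
]

print(dominant_conflicts(timeline))  # [('lower third', 'kinetic text')]
```

A check like this only catches timing conflicts. Whether an element is actually dominant on screen still depends on its size, contrast, and position, so it complements the visual review and does not replace it.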

Creative operations is what ensures this doesn't depend on the memory of whoever is editing that day. At Polimake, Studio defines the motion system and oversees brand consistency; Media executes the overlay on each piece following templates; Studio coordinates the requests so the templates are applied across all formats without gaps.

This relates to motion graphics as a broader territory, to the postproduction phase where overlay is worked on, and to the brand identity that defines when and how brand elements appear.

To close

Overlaying visual elements is the easiest thing in the world in any modern software: you drag a layer, it lands on top of the other, done. Doing it well -- so it communicates instead of saturating, respects hierarchy, keeps coherence across pieces, and survives crops and different formats -- is what separates a polished brand from a noisy one.

The technique behind it -- layer orders, opacity, blending modes, safe margins -- is learned in a week. The discipline -- not stacking on instinct, keeping a system, reviewing before publishing -- is what takes years. And it's what, in the end, decides whether viewers remember your message or only remember that the screen was full of stuff.

Quick references

A single dominant hierarchy at any moment. If everything stands out, nothing does.
Contrast first, animation second. Illegible text isn't fixed with effects.
A 10-15% safe margin on each side to survive interfaces.
A small logo bug, around 5-7% of the shorter side.
Five blending modes cover 90%: multiply, screen, overlay, soft light, darken/lighten.
Motion templates, not manual decisions every time.
Design for all formats from the start (16:9, 9:16, 1:1).
Burned-in subtitles for sound-off channels; a separate SRT for the rest.
An applied brand system, not a PDF manual that nobody opens.
