The Short-Form Video Script Formula: How to Hold Attention Through the Last Second
Most short-form videos lose 60% or more of their audience by the halfway point. The ones that hold their audience share a specific structural formula. Here is how to write short-form scripts that retain viewers from hook to CTA.
The Retention Cliff Most Creators Never Solve
Getting viewers to stop scrolling is a solved problem. There are documented hook formulas, platform-specific opening strategies, and endless A/B data on what makes someone pause. The harder problem — the one most creators never solve — is what happens after the hook.
[YouTube's creator analytics documentation](https://support.google.com/youtube/answer/9314393) shows that the most common retention curve for short-form video is a sharp cliff immediately after the first 3–5 seconds, followed by a gradual decline for the remainder of the video. By the halfway point, most short-form videos have lost 60–70% of the viewers who made it through the hook.
Creators who consistently keep average view duration at 70% or more of video length on short-form content are not using better hooks. They take a different structural approach to what comes after the hook. The difference is scriptwriting architecture.
The Four-Part Short-Form Script Structure
The script structure that produces high average view duration is consistent across categories — educational content, product demonstrations, storytelling, and entertainment formats. It has four components, each with a specific job.
Part 1: The Hook (0–3 seconds)
The hook's job is to create a reason to keep watching. Its function is entirely about the next 3–5 seconds, not the whole video. A hook that promises something, such as "here is the framework that tripled our content output," sets up an implicit obligation that the viewer expects you to honor. They have been promised something, and they want to collect.
The hook is covered extensively elsewhere. The important structural note is that the hook must promise something your video can deliver. Over-promising in the hook and under-delivering in the content is the primary cause of high hook retention paired with low video completion. That pattern is one of the most damaging for algorithm performance, because it signals that your content disappointed viewers who were initially interested.
Part 2: The Setup (3–15 seconds)
The setup's job is to confirm that the promise is relevant to the viewer. It does two things: establishes the specific problem or context the video addresses, and signals who this video is for.
"If you're posting to TikTok three times a week and still under 5,000 followers, the reason is almost certainly one of these three things" is a setup. It confirms the problem (stalled growth), signals the audience (active TikTok creators), and creates a more specific forward pull than the hook alone.
The setup is where most creators waste time. Generic setups — "today I want to talk about short-form video strategy" — burn the 3–15 second window where the viewer is most likely to abandon. The setup should read like the second sentence of a conversation that is already interesting.
Part 3: The Payload (15 seconds in to 5 seconds before the end)
The payload is the content. This is where you deliver what you promised. Four structural principles maximize retention through the payload:
Front-load the value: Give the most important information first. The common instinct to build toward a payoff produces the opposite result in short-form — viewers who sense they are being walked to a point they cannot yet see will abandon before reaching it.
One idea per video: Each video should contain exactly one coherent idea with no more than three supporting points. Videos that try to cover four or five points within 60 seconds lose viewers at each transition because the implied promise of "here is the idea" cannot accommodate multiple shifts in direction.
Explicit transitions: Announce each movement: "the second reason," "and here is why that matters," "but here is the part most people miss." These verbal landmarks reset the viewer's attention clock — they know where they are in the video, which reduces the cognitive load of following the content and increases the probability of completion.
Restatement at the midpoint: At roughly the halfway mark, restate the core point or promise. "Going back to the main thing here" or "this is what that means in practice" re-anchors the viewer who has started passively watching and re-activates their engagement.
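As a rough illustration, two of the payload principles above (no more than three supporting points, and an explicit transition into each later point) can be checked mechanically on a draft. This is a minimal sketch, not an established tool: the function name `check_payload` and the cue list are hypothetical, and the cue phrases are simply the examples quoted above, so a real script would need its own list.

```python
MAX_SUPPORTING_POINTS = 3  # from the "one idea per video" guideline

# Illustrative cue phrases only (assumption); extend with your own verbal landmarks.
TRANSITION_CUES = (
    "the second reason",
    "the third reason",
    "here is why that matters",
    "here is the part most people miss",
)


def check_payload(points: list[str]) -> list[str]:
    """Return human-readable warnings for a payload's supporting points."""
    warnings = []
    if len(points) > MAX_SUPPORTING_POINTS:
        warnings.append(
            f"{len(points)} supporting points; the guideline is at most "
            f"{MAX_SUPPORTING_POINTS}"
        )
    # The first point follows the setup directly, so only later points need a cue.
    for index, point in enumerate(points[1:], start=2):
        lowered = point.lower()
        if not any(cue in lowered for cue in TRANSITION_CUES):
            warnings.append(f"point {index} has no explicit transition")
    return warnings


draft = [
    "Your hook promises more than the video delivers.",
    "The second reason is a generic setup.",
    "Nobody restates the promise at the midpoint.",
]
for warning in check_payload(draft):
    print(warning)
```

Substring matching is deliberately crude. It only flags points that open without any recognizable landmark, which is a prompt to reread the draft rather than a verdict on it.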
Part 4: The Close (Final 5 seconds)
The close does three things: confirms delivery of the promise, gives the viewer something to do or think about, and signals that the video is ending.
Abrupt endings, where the content simply stops, put post-view behavior at risk. A viewer who feels the video ended without a conclusion is unlikely to visit the channel, subscribe, or share.
Strong closes for short-form take one of three forms:
The callback: Reference the opening promise explicitly. "That is the three-part framework I mentioned at the start." This creates a satisfying resolution loop.
The implication: Tell the viewer what the information means for them. "If you apply this in your next three videos, your completion rates will move." The viewer leaves with a concrete next step.
The open question: Pose a question that the viewer will continue thinking about after the video ends. Post-view contemplation is a positive satisfaction signal — it produces the "this was worth my time" reflection that drives subscribes and shares.
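The time windows in the four part headings translate into concrete start and end times once you pick a video length. The sketch below does that arithmetic and adds a rough word budget per section. The window boundaries come from the headings; the speaking rate of 2.5 words per second (about 150 words per minute) and all names such as `Section` and `section_windows` are assumptions for illustration.

```python
from dataclasses import dataclass

HOOK_END = 3.0          # seconds, from "Part 1: The Hook (0-3 seconds)"
SETUP_END = 15.0        # seconds, from "Part 2: The Setup (3-15 seconds)"
CLOSE_LENGTH = 5.0      # seconds, from "Part 4: The Close (Final 5 seconds)"
WORDS_PER_SECOND = 2.5  # assumption: roughly 150 spoken words per minute


@dataclass(frozen=True)
class Section:
    name: str
    start: float  # seconds from the start of the video
    end: float

    @property
    def word_budget(self) -> int:
        """Approximate number of spoken words that fit, rounded down."""
        return int((self.end - self.start) * WORDS_PER_SECOND)


def section_windows(video_length: float) -> list[Section]:
    """Split a video of the given length (seconds) into the four parts."""
    close_start = video_length - CLOSE_LENGTH
    if close_start <= SETUP_END:
        raise ValueError("video is too short to leave room for a payload")
    return [
        Section("hook", 0.0, HOOK_END),
        Section("setup", HOOK_END, SETUP_END),
        Section("payload", SETUP_END, close_start),
        Section("close", close_start, video_length),
    ]


def restatement_time(video_length: float) -> float:
    """Rough timestamp for the midpoint restatement of the core point."""
    return video_length / 2


for section in section_windows(60):
    print(f"{section.name}: {section.start}-{section.end}s, "
          f"about {section.word_budget} words")
```

For a 60-second video under these assumptions, the payload runs from 15 to 55 seconds and holds roughly 100 spoken words, with the restatement landing near the 30-second mark.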
Applying the Formula to Different Content Types
The four-part structure adapts to different short-form formats without losing its core retention mechanics.
Educational / how-to: Hook (promise a skill or insight), Setup (frame the specific problem or gap), Payload (deliver the framework or steps with explicit transitions), Close (call back to the promise, add one implication).
Product demonstration: Hook (show the result before the process), Setup (identify the problem the product solves), Payload (demonstrate the solution with explicit "notice how" call-outs), Close (implication for the viewer's specific situation).
Storytelling / case study: Hook (start at the most interesting moment — not the beginning), Setup (establish what was at stake), Payload (tell the story with "here is why this matters" inserted at the midpoint), Close (the lesson in one sentence, directly stated).
Opinion / contrarian take: Hook (state the position immediately — don't build to it), Setup (acknowledge the conventional view you are challenging), Payload (present your argument with specific evidence points), Close (restate your position, add an open question).
The Retention Test Before Publishing
Before publishing any short-form script, apply this test: can you identify the exact moment in each section where you are re-earning the viewer's continued attention?
In the hook, the re-earning mechanism is the promise. In the setup, it is the audience identification signal. In the payload, it is each explicit transition that tells the viewer where they are and what is coming. In the close, it is the resolution.
If any section lacks a clear re-earning mechanism, the viewer has no reason to stay through it. Add one or cut the section.
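The test above can be kept as a simple checklist alongside the script. In this minimal sketch, the writer notes in one line how each section re-earns attention, and any section left blank is flagged. The function name and the dictionary layout are hypothetical, chosen only for illustration.

```python
SECTIONS = ("hook", "setup", "payload", "close")


def sections_missing_mechanism(notes: dict[str, str]) -> list[str]:
    """Return the sections with no stated re-earning mechanism.

    `notes` maps a section name to the writer's one-line description of
    how that section re-earns the viewer's attention.
    """
    return [name for name in SECTIONS if not notes.get(name, "").strip()]


notes = {
    "hook": "promise: the framework that tripled our content output",
    "setup": "audience signal: creators posting three times a week",
    "payload": "",  # no explicit transitions written yet
    "close": "callback to the opening promise",
}
print(sections_missing_mechanism(notes))
```

Any section the function returns is a candidate for the "add one or cut the section" decision.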
The short-form creators who consistently outperform on retention have internalized this framework to the point where they write and edit to it automatically. Every unnecessary sentence gets cut. The goal is not to save time: every unnecessary sentence is a moment where viewers have no reason to stay, and some percentage of them will leave.