GM Tips
April 28, 2026
13 min read

You Don't Need to Do Accents: Make Every NPC Recognizable

Every dwarf ends up Scottish. Every wizard ends up British. Here's the four-variable system for NPC voices that any GM can run tonight — no acting training required.

An open leather notebook on a candlelit medieval desk surrounded by six framed NPC portraits — a bureaucrat, young guard, mercenary, conspirator, merchant, and frightened witness — connected to the notebook by glowing amber speech-line motifs, with a quill, inkwell, and polyhedral dice nearby
A small roster of recognizable NPC voices does the work of an entire campaign

Quick Answer

How do you make NPCs sound distinct without doing accents?

Stop chasing accents and build a small voice repertoire from four variables: pacing (fast/slow), pitch (high/low), forcefulness (loud/soft), and sentence structure (formal/clipped). Combine them with one strong motivation per NPC and you get a recognizable character without any voice acting. Six to eight prepped voices is enough to cover a whole campaign.

  • Players need contrast and consistency, not theatrical perfection
  • Three binary variables (pacing, pitch, forcefulness) give you eight base voices
  • Sentence structure and word choice are a fourth axis that does heavy lifting
  • Motivation shapes voice automatically once you know what an NPC wants badly
  • Keep a notebook of 6-12 voice archetypes and reuse them

Read on for the full breakdown.

It's session night. Your party walks into a tavern you didn't fully prep, the bartender opens his mouth, and out comes the same gravelly Scottish dwarf you've been running for two years. The wizard you introduce twenty minutes later is, somehow, also British. The orc growls. The elf does that breathy thing.

You can hear yourself doing it. You promised yourself last week you'd fix this.

If you've ever lurked in r/DMAcademy or the D&D Beyond forum threads where GMs admit they feel they "have to" do voices, you've seen the same confession over and over. We watch a Critical Role clip and walk away thinking the price of admission is professional voice work. Then we sit at the kitchen table on a Thursday night with three friends who just want to know what the bartender thinks about the dead body upstairs, and we stall.

Here's the thing nobody tells new GMs. Accents are one tool. They're not the most important one, and they are absolutely not the price of admission.

What players actually want from an NPC voice

Strip it down. What do players need from an NPC's voice to enjoy the scene?

Two things. Contrast: the bartender doesn't sound like the wizard who walked in behind him. Consistency: the bartender sounds like the bartender across three sessions, so when he shows up again the table goes "oh, him." That's it. Players are not running a casting audition. They're trying to follow a conversation in their imagination.

Toren Atkinson, an actual working voice actor and a long-time GM, confesses on his own blog that even pros start with a small set of default voices and reuse them. He keeps a list of about a dozen. He picks from it. He doesn't try to invent a fresh voice for every random shopkeeper, because nobody can sustain that, and nobody at the table actually needs it. If a working voice actor operates from a fixed roster, what exactly are we doing trying to invent thirty original voices a session?

The goal is recognizability, not range. Label your NPCs so the table can tell them apart, then trust the labels to do the work. Once you accept that, the craft gets a lot less scary.

A medieval fantasy tavern interior at night, with a grizzled mercenary in worn leather armor leaning closed and weary against the bar on the left, while a tall robed wizard in an embroidered blue and gold robe gestures theatrically with a long-fingered hand on the right — the same scene, two completely different silhouettes

The three-variable system: eight voices from three switches

Here's the framework that does the most work for the least effort. Three binary axes:

  1. Pacing: fast or slow.
  2. Pitch placement: high in the head or low in the chest.
  3. Forcefulness: loud and direct, or soft and reedy.

Three switches, two settings each. That's eight combinations. Eight distinct-sounding NPCs from a system you can memorize in thirty seconds. None of them require an accent. None of them require a voice teacher. You already do the "fast, high, soft" voice when you're stressed about being late, and the "slow, low, loud" voice when you're explaining something to a contractor.
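If you like seeing the arithmetic, here is a minimal Python sketch of the three switches. The axis names and settings come from the list above; the variable and function names are just illustrative choices, not part of any tool.

```python
from itertools import product

# Three binary switches, as described above. Names are illustrative.
AXES = {
    "pacing": ("fast", "slow"),
    "pitch": ("high", "low"),
    "forcefulness": ("loud", "soft"),
}

def base_voices():
    """Return every combination of the three switches as a dict."""
    return [dict(zip(AXES, combo)) for combo in product(*AXES.values())]

voices = base_voices()
for voice in voices:
    print(", ".join(voice.values()))  # e.g. "fast, high, loud"

print(len(voices))  # 8
```

Two settings on each of three axes gives 2 x 2 x 2 = 8 base voices, which is where the "eight" comes from.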

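Let me name a few so you can hear them. A couple of these park the pitch in the middle, which is fine; treat the switches as starting points, not rules: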

  • The bureaucrat: slow, mid-pitch, soft. Every sentence ends like there's still a comma. Reads forms aloud.
  • The eager young guard: fast, high, loud. Sentences trip over each other. Apologizes for the noise mid-sentence.
  • The world-weary mercenary: slow, low, loud. Three-word answers. Long pauses. Calls everyone "kid."
  • The conspirator: fast, low, soft. Words shoved together. Eyes scanning the room between phrases.
  • The pompous merchant: slow, mid-to-high, loud. Drawn-out vowels on the words he wants you to admire.
  • The frightened witness: fast, high, soft. Trails off. Restarts sentences. Makes you lean in.

Six voices, no accents. If you ran those six on rotation for a whole session, no two NPCs would feel the same. The party would build mental models for each one within two lines of dialogue. That's the whole job.

The trick is committing to one combination per NPC and staying there. Drift is what kills these voices, not a lack of range. If your bureaucrat speeds up halfway through the conversation because you got excited, the table loses the label. Pick a setting. Lock in. Trust that the contrast does the work.

The fourth variable: how the words actually go together

Pacing, pitch, and forcefulness handle the sound of a voice. Sentence structure handles the brain behind it. The team at High Level Games walks through five non-accent voice dynamics, and the one that surprised me the most is just word choice.

Education, status, and mood all show up in syntax before they show up in sound. A few axes worth playing with:

  • Verbose vs. clipped. The bureaucrat uses ten words to say what the mercenary says in two. "We will require a moment of your time, if it pleases the gentlemen" vs. "Wait."
  • Formal vs. street. The court advisor says "I'm afraid I must decline." The dock fence says "Nah, I'm out."
  • Technical vs. plain. The wizard reaches for "evocation" when she means "the spell that throws fire." The blacksmith says "the bit that breaks first."
  • Hedged vs. direct. The middle manager piles on softeners ("I think maybe we could possibly"). The captain says "We will."

Pick a structure habit and stick to it. Combine that with one of your three-variable voices and you've doubled your effective roster without learning a new sound. The bureaucrat (slow, mid, soft) speaking in verbose hedged sentences is one character. Take the same vocal setting and give him clipped, technical sentences and you have a completely different bureaucrat. Say, a forensic clerk who's bored of you.
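The "doubled" claim is easy to check. This is a rough sketch that assumes you add just one structure habit, verbose vs. clipped, as a fourth two-way switch on top of the three sound switches; the names `BASE`, `HABIT`, and `roster` are made up for illustration.

```python
from itertools import product

# The three sound switches: pacing, pitch, forcefulness.
BASE = list(product(("fast", "slow"), ("high", "low"), ("loud", "soft")))

# One structure habit treated as a fourth switch (an assumption for this sketch).
HABIT = ("verbose", "clipped")

# Pair every sound combination with every structure habit.
roster = [(*voice, habit) for voice, habit in product(BASE, HABIT)]

print(len(BASE), len(roster))  # 8 16
```

In practice you would never prep all sixteen. The point is only that a structure habit multiplies the voices you already have instead of asking you to learn a new sound.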

Motivation is the master key

This is the move that turns "I am performing a voice" into "I am playing a person."

Decide what the NPC wants. Badly. One thing.

Not their backstory. Not their stat block. Their dominant want in this scene. Money. Respect. Escape. Secrecy. To not get yelled at again. To go home before their shift ends.

Once you know that, the voice writes itself. Michael Ghelfi's improv guide makes this point clearly: if you know an NPC's core fear, pride, or desire, you don't have to think about how they'd talk. You just talk like someone who wants that thing.

Worked example. Same NPC concept: a city watch sergeant being questioned by the party.

  • Sergeant who wants to not get fired. Soft, hedged, lots of "I'd have to check with the captain" and "well, technically." Pacing slows when he gets nervous. Eyes elsewhere. He's not lying. He just doesn't want to be on record about anything.
  • Sergeant who wants the party gone. Clipped, loud, mid-pitch, direct. "Look. I told you. He's not here. Move along." Every sentence is a door closing.
  • Sergeant who wants the party to like him because he's lonely. Fast, slightly high, warm. Tells you more than he should. Volunteers his cousin's name. Laughs at his own jokes.

A middle-aged city watch sergeant in leather and chainmail standing in a stone guard post, with three softly glowing translucent echoes of himself around him — one slumped and anxious, one with crossed arms and dismissive, one leaning forward and friendly — each posture suggesting a different motivation

Same human being. Three completely different scenes, and you didn't have to invent three voices. You picked a want and the voice came along with it.

This also rescues you when you get caught off-guard. Party walks into an unprepped tavern at 10 PM and starts interrogating the bartender. You don't know who this guy is. Don't reach for an accent. Reach for a want. He wants to close in twenty minutes and his back hurts. Now he's clipped, slow, low, mildly annoyed, and you can run him for as long as you need to.

The voice notebook: your roster of six to eight

Atkinson's most practical advice is simple: keep a list. He suggests the back page of your GM notebook. I keep mine in the Campaign Assistant alongside my session notes, but the format matters more than the medium. Each entry should fit on one line.

Close-up of an open leather-bound notebook on a worn medieval desk, both pages filled with handwritten illegible cursive entries arranged as a list of voice archetypes, with tiny ink-drawn margin sketches of cloaked figures and soldiers, a black-feathered quill resting across the page, an open inkwell, and polyhedral dice nearby, lit by warm candlelight

Here's a starter roster you can borrow today. Eight voices, each one written in pacing/pitch/forcefulness/structure terms. No celebrity impressions, because that road ends in a Sean Connery dwarf and shame.

Name           | Pacing   | Pitch    | Forcefulness | Structure habit               | Common use
---------------|----------|----------|--------------|-------------------------------|----------------------------------------------
The Clerk      | Slow     | Mid      | Soft         | Verbose, hedged               | Bureaucrats, librarians, scribes
The Hothead    | Fast     | High     | Loud         | Clipped, sweary               | Young soldiers, bar regulars, rivals
The Gravel     | Slow     | Low      | Loud         | Clipped, blunt                | Mercenaries, blacksmiths, retired adventurers
The Whisper    | Fast     | Low      | Soft         | Confidential, fragmentary     | Conspirators, fences, spies
The Showman    | Variable | Mid-high | Loud         | Theatrical, drawn vowels      | Merchants, performers, cult recruiters
The Mouse      | Fast     | High     | Soft         | Trailing, restarted sentences | Witnesses, servants, frightened NPCs
The Professor  | Medium   | Mid      | Medium       | Technical, long sentences     | Wizards, sages, doctors
The Dispatcher | Fast     | Mid      | Direct       | Imperative, no softeners      | Captains, foremen, anyone running a shift

Eight. That's a campaign. You can run a whole long-form game out of these and keep characters distinct just by mixing in motivation. The bartender is a Gravel who wants to close. The temple priest is a Clerk who's lonely. The crime boss is a Whisper who wants the party to think he's reasonable.
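If your notes live in a file rather than on paper, the same idea can be sketched as data. This is only an illustration: the `Voice` class, the `ROSTER` dictionary, and the `voice_card` helper are hypothetical names, not features of any particular tool, and only three of the eight archetypes are filled in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Voice:
    """One line of the voice notebook."""
    name: str
    pacing: str
    pitch: str
    forcefulness: str
    structure: str

# Three entries copied from the starter roster; add the rest the same way.
ROSTER = {
    v.name: v
    for v in (
        Voice("Clerk", "slow", "mid", "soft", "verbose, hedged"),
        Voice("Gravel", "slow", "low", "loud", "clipped, blunt"),
        Voice("Whisper", "fast", "low", "soft", "confidential, fragmentary"),
    )
}

def voice_card(npc: str, archetype: str, want: str) -> str:
    """Combine a roster archetype with one strong want into a one-line card."""
    v = ROSTER[archetype]
    return (
        f"{npc}: {v.name} ({v.pacing}, {v.pitch}, {v.forcefulness}; "
        f"{v.structure}) who wants {want}"
    )

print(voice_card("Bartender", "Gravel", "to close up"))
# Bartender: Gravel (slow, low, loud; clipped, blunt) who wants to close up
```

The archetype supplies the sound and the want supplies the person, so one roster entry can back many different NPCs.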

When a new NPC matters enough to deserve their own voice, write a ninth entry. Build slowly. Don't try to seed the notebook with thirty voices on day one. You won't remember them and you'll panic-grab Scottish at the table.

In-person vs. online: physicality leaks into your voice

One thing the Explorers Design piece on roleplaying without accents gets right is that voice doesn't live in your throat alone. It lives in your body. And which body parts you have available changes depending on whether you're at a kitchen table or on Roll20.

At an in-person table, posture is your secret weapon. Sit up straight and lean forward for an authority figure: your voice will get louder and clearer without you trying. Slouch and look down for a defeated NPC and the voice goes quiet and slow on its own. Hand gestures change pacing. A pointed finger sharpens cadence. Folded arms close it down. You don't have to think about any of this. Just commit to the body and the voice follows.

On a VTT, posture matters less because nobody can see your shoulders. But two things matter more:

  1. Facial expression on webcam. Eyebrows up = higher pitch. Jaw set = lower, harder voice. The camera is doing more work than you think; your players are reading your face the way they'd read a mini at the table.
  2. Mic distance. Lean into the mic for confidential NPCs and your whisper actually sounds intimate instead of mumbled. Lean back for the loud, public-square ones. This is closer to a director's trick than a voice trick, but it works on Discord whether or not anyone's noticed it.

The big online failure mode is that GMs go mic-flat: same distance, same energy, every NPC. The voices were prepped fine; the audio doesn't deliver them. If you run online, do one pass through your roster on your actual mic before the session and listen back. The eight archetypes above should sound different through your headset, not just in your head.

Where AI fits in this workflow

A quick aside on AI voice tools, because the question keeps coming up. Tools like ElevenLabs are getting genuinely good at producing usable audio, and Tribality has looked at whether AI voice can give NPCs unforgettable voices. But the question worth asking is "good for what?"

Live AI voice at the table is mostly a gimmick. It breaks pacing, it sounds canned, and your players want to hear you perform, flaws included. That part is yours.

Where AI voice tools are actually useful is prep. Before the session, when you're trying to decide what your new villain sounds like, you can describe the character ("tired, mid-fifties, used to be charming, lost too much") and audition five different deliveries in two minutes. Pick the one that lands. Write the direction down ("slow, low, soft, drawn vowels, sounds like he's talking to himself") and now you have a target you can hit at the table without ever playing the AI clip out loud.

This is the workflow gap the Campaign Assistant ends up closing for me. Not the performance, the recall. I write voice directions on the NPC notes in my campaign library, so when the players ambush me with "we want to talk to the duke again" three sessions later, the duke's voice card is right there next to his goals. Whatever tool you use, the point is the same: store the voice direction with the NPC, not in your head.

Try it next session, then again, then again

Here's your homework. Pick one NPC for your next session — just one — and run them off the four-variable system. Pacing, pitch, forcefulness, structure. Write the direction on the NPC card. Pair it with a single sentence about what they want badly. No accent. No funny voice. Just the variables.

You will feel weird the first time. The voice won't feel "characterful enough" because you're used to reaching for accent as the marker of effort. Trust it anyway.

Run the system for three sessions before you judge it. Around session three something clicks: you stop performing voices and start playing people, and the difference shows up in your players. They'll start quoting NPCs back to you. They'll argue about the bureaucrat's motives. The bartender from session one will get a fan club.

Four players gathered around a wooden gaming table covered with a hand-drawn battle map, character minis, and dice, all leaning in with engaged expressions — one laughing, another grinning while pointing, a third gesturing mid-quote as if imitating an NPC, the fourth smiling at the others, lit by a warm hanging lamp overhead

That's the entire goal. Not impressions. Not theatre. Just NPCs your table remembers, built from a system that fits in your back pocket.

What's your worst stock voice? The one every NPC keeps drifting toward? Drop it in our Discord — we're collecting horror stories. Mine is a Brooklyn butcher that somehow shows up in every merchant I run, and I have no idea how it started. We've all got one. The first step is admitting it.
