Your AI isn't lying to you. Your list is.
When a cold email campaign flops, the model gets blamed first. Wrong tone, weak copy, "the AI just doesn't get our product." Almost always the real culprit is upstream, in the list. Better data, not better models, is what separates a reply from a bounce. The list decides who hears you. The model only decides how you phrase it, and a clever sentence sent to the wrong person at a dead address is still nothing.
ReplyLabs is an AI research assistant in Google Sheets. We watch enrichment and scoring runs all day, including our own, and the pattern is boring and consistent: the chain breaks at the data layer long before it reaches the prompt. So before you swap GPT-4 for something newer, look at what you're feeding it.
Key takeaways
- In outbound, the list decides who exists; the model only decides phrasing. A better model can't fix a wrong address.
- Three things to audit before blaming the AI: list freshness, email verification, and the research behind each personalisation signal.
- At ReplyLabs, verifying an email costs $0.01 per address, and we only charge for rows that succeed.
- Judgment - what's actually true and worth saying about a person - is the one input no model supplies for you.
Why is better data, not better models, the real lever in outbound?
Because the model never chooses who you contact. Your list does. A frontier model writing to a person who left the company eight months ago produces a fluent message that lands in an inbox nobody reads. The ceiling on any campaign is set by the worst data in the row, and no amount of model quality lifts that ceiling.
Think of the chain in two halves. Data tells you who exists: the name, the role, the company, the address that resolves. Judgment tells you what to say: the one true, specific thing about this person that earns a reply. Models help with the second half only after the first half is clean. Feed a brilliant writer a list of ghosts and you get brilliant letters to ghosts.
What does the model actually do, and what can't it do?
The model drafts language. It can rewrite a value proposition five ways, match a tone, and summarise a company's website into two sentences. That's real work and worth automating. What it can't do is know whether the person still works there, whether the address bounces, or whether the "trigger" you found is a year stale.
A good reply comes from someone who actually fits, responding to something true and specific about them, sent by a person who meant it. The model can phrase the "true and specific" part. It cannot verify it. That verification is data work, and it happens before the prompt ever runs. Skip it and the model confidently builds on sand.
How do I check if my list is fresh enough?
Look at the date each row was sourced and the date the role was last confirmed. People change jobs constantly, and B2B contact data decays fast - a list that was accurate in January is partly fiction by summer. If you can't say when a row was last checked, treat it as suspect, not as truth.
Freshness has a simple test in a spreadsheet. Add a column for "source date" and a column for "last verified." If either is blank or older than a quarter, that row needs a re-check before it goes anywhere near a send. Old lists don't announce themselves; they just quietly bounce and tank your sender reputation while you blame the copy.
The cheapest fix here is to stop treating a list as a permanent asset. It's perishable stock. Re-research the rows that matter right before a campaign, not once a year.
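Here is a minimal sketch of that freshness test in Python, for anyone who exports the sheet and checks it in a script. It assumes "older than a quarter" means more than 90 days, and the column names and sample rows are made up for illustration.

```python
from datetime import date, timedelta

# Assumption: "older than a quarter" is treated as more than 90 days.
MAX_AGE = timedelta(days=90)

def needs_recheck(source_date, last_verified, today):
    """Return True if either date is blank (None) or older than MAX_AGE."""
    for d in (source_date, last_verified):
        if d is None or today - d > MAX_AGE:
            return True
    return False

# Hypothetical rows; column names are illustrative, not a required schema.
rows = [
    {"email": "a@example.com", "source_date": date(2024, 1, 10), "last_verified": date(2024, 5, 20)},
    {"email": "b@example.com", "source_date": date(2024, 4, 15), "last_verified": date(2024, 5, 20)},
    {"email": "c@example.com", "source_date": date(2024, 5, 1), "last_verified": None},
]

today = date(2024, 6, 1)
stale = [r["email"] for r in rows
         if needs_recheck(r["source_date"], r["last_verified"], today)]
print(stale)  # ['a@example.com', 'c@example.com']
```

Row a fails on an old source date even though it was verified recently, and row c fails on a blank. Both get re-researched before a send.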
What does email verification actually catch?
Verification confirms an address can receive mail before you send to it. It catches typos, dead mailboxes, catch-all domains, and addresses that were guessed by a pattern tool and never existed. Each unverified send to a dead address is a bounce, and bounces are the fastest way to wreck deliverability for the addresses that are real.
This is the part people skip because it feels like plumbing. It's the highest-leverage plumbing you own. We're on the spam filter's side here: filters exist because most cold email deserves filtering, and sending to unverified junk is exactly the behaviour filters punish. Verify first, send less, send to real people.
In ReplyLabs, verification runs per row in the sheet at $0.01 per email verified, and we don't charge for rows that fail. Paying for a failed row is wrong, so you don't. That single column - "does this address resolve?" - prevents more campaign damage than any model upgrade you could buy.
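The billing arithmetic is simple enough to sketch. This is an illustration, not ReplyLabs code: it assumes you have one true/false flag per row saying whether the verification run succeeded, and the 500-row run is hypothetical. Only the $0.01 price comes from the pricing above.

```python
# $0.01 per verified email, kept in whole cents to avoid float rounding.
PRICE_CENTS_PER_VERIFIED = 1

def verification_cost_cents(results):
    """results: one boolean per row, True where the row succeeded and is billable."""
    billable_rows = sum(1 for ok in results if ok)
    return billable_rows * PRICE_CENTS_PER_VERIFIED

# Hypothetical run: 500 rows, 460 succeed, 40 fail and are not charged.
results = [True] * 460 + [False] * 40
cost = verification_cost_cents(results)
print(f"${cost / 100:.2f}")  # $4.60
```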
What is "the research behind each signal," and why does it matter?
A signal is the reason you're reaching out: they're hiring SDRs, they just raised, they shipped a feature you integrate with. Research is the proof that the signal is real and current. A personalisation line built on an unverified signal reads worse than no personalisation, because it's specifically, checkably wrong.
This is where AI helps and also where it quietly lies if you let it. Ask a model to "find a reason to reach out" and it will invent a plausible one. Ask it to summarise a page you actually scraped, and it stays honest because it's grounded in a real document. The difference is whether a fact sits behind the sentence.
In a sheet, that means one column holds the scraped source - the careers page, the funding note, the product page - and the next column holds the model's summary of it. A human can read both side by side and delete the row if the signal is thin. That review surface is the architecture, not an afterthought.
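One way to enforce "a fact sits behind the sentence" is to refuse to build a prompt at all when the source column is empty or thin. The sketch below is an assumption-heavy illustration: the function name, the 200-character threshold, and the prompt wording are all hypothetical, and it stops short of calling any model.

```python
# Hypothetical threshold for "too thin to summarise"; tune it for your pages.
MIN_SOURCE_CHARS = 200

def build_summary_prompt(source_text):
    """Return a prompt grounded in scraped text, or None if there is no usable source."""
    text = (source_text or "").strip()
    if len(text) < MIN_SOURCE_CHARS:
        return None  # no document, no summary: leave the cell blank for a human
    return (
        "Summarise the page below in two sentences. "
        "Use only facts stated in the page. "
        "If nothing relevant is stated, reply NONE.\n\n"
        "--- PAGE ---\n" + text
    )

print(build_summary_prompt(""))  # None
```

A blank result is the useful outcome here. It tells the reviewer there is no signal to build on, instead of handing them an invented one.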
How do I run this without buying another platform?
Add the checks as columns to the list you already have. Most platforms make you "import your CSV," which is their quiet admission that the spreadsheet was the system of record the whole time. So keep the grid and add compute to it: verify the address, refresh the role, scrape the source behind each signal, score the fit.
We run this on our own outbound before any prospect sees it. We are our own heaviest user; multi-hundred-row enrichment and scoring runs are routine for us. The order is always the same: clean the data, confirm the signal, then let the model phrase the message. Reverse that order and you're polishing copy on a list that was never going to reply.
Every output lands in a cell you can read, sort, fix, or delete before anything depends on it. ReplyLabs researches; it never sends - the product has no send capability at all. The judgment about who's worth contacting and what's true about them stays with you, which is exactly where it belongs.
Conclusion: audit the list before you audit the model
Next time a campaign underperforms, resist the urge to change the model. Open the sheet instead. Check three columns: when was this row sourced, does the address verify, and is there a real document behind the signal you're using. Fix those and the same "boring" model usually starts working.
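The three-column check can be sketched as one small function per row. The column names, the 90-day cutoff, and the sample rows are assumptions for illustration; swap in whatever your sheet actually uses.

```python
from datetime import date, timedelta

def audit_row(row, today, max_age=timedelta(days=90)):
    """Return a list of problems for one row; an empty list means it passes."""
    problems = []
    sourced = row.get("source_date")
    if sourced is None or today - sourced > max_age:
        problems.append("stale or undated source")
    if row.get("email_verified") is not True:
        problems.append("address not verified")
    if not (row.get("signal_source") or "").strip():
        problems.append("no document behind signal")
    return problems

today = date(2024, 6, 1)
good = {"source_date": date(2024, 5, 1), "email_verified": True,
        "signal_source": "https://example.com/careers"}
bad = {"source_date": None, "email_verified": False, "signal_source": ""}

print(audit_row(good, today))  # []
print(audit_row(bad, today))
```

Returning the reasons, not just pass or fail, means the output can sit in its own column where you can sort and fix by problem type.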
The model was never the bottleneck. Your judgment plus clean data is the engine; the model is the typist. If you want to test the data layer on your own list, there's $20 of credit on a free ReplyLabs account.
Sources
- ReplyLabs pricing and product facts (email verification at $0.01 per email, only-succeeded-rows billing, $20 free credit): https://replylabs.io