AI in Google Sheets means running a language model prompt against every row of a column and writing the result back into the sheet, all without code. The fastest way to do it is a sidebar add-on: you write one prompt, map it to your input columns, and the add-on runs the model row by row. ReplyLabs does this from inside Sheets, prices each run before it starts, and only charges for rows that succeed. This guide covers how it works, what it costs, and how to keep the output tied to real data instead of generic filler.
What "AI in Google Sheets" actually means
There are three common shapes, and they are not equal.
- Custom functions. A script that exposes something like =AI(prompt). It works for a few cells but stalls on large ranges because Apps Script caps each custom function call at 30 seconds, caps any script run at six minutes, and sets a tight quota on external calls.
- Copy-paste into a chatbot. Manual, slow, and impossible to keep in sync when the source data changes.
- A sidebar add-on that batches the work. You write the prompt once, point it at the columns, and the add-on dispatches the rows in parallel server-side. This is the only approach that holds up at thousands of rows.
ReplyLabs is the third shape. The model runs outside the spreadsheet, so the six-minute script limit never applies, and one prompt can fill an entire column whether it has fifty rows or fifty thousand.
Why people put AI in a spreadsheet at all
A spreadsheet is already where most go-to-market data lives. Your prospect list, your product catalogue, your support tickets, your survey responses: they sit in rows and columns long before anyone thinks about a model. Pulling that data into a separate tool, running it through AI, and pushing the results back is friction that kills the workflow. Doing the AI step where the data already lives removes that round trip.
The jobs people reach for most often are:
- Writing at scale. A first-line opener for every prospect, a product description for every SKU, a reply draft for every inbound message.
- Classifying. Tagging each row as a fit or not, sorting feedback into themes, scoring intent from a free-text field.
- Extracting. Pulling a job title, a city, or a funding stage out of a messy "notes" column into clean structured fields.
- Summarising. Condensing a long transcript or scraped page into one sentence you can scan in a list.
Each of these is the same underlying operation: take the values in a row, hand them to a model with an instruction, write the answer back. Once you can do that reliably across thousands of rows, the specific use case is just a different prompt.
How to run a prompt across a column
The flow is the same whether you are writing emails, classifying leads, or extracting fields:
- Open the sidebar with Extensions > ReplyLabs > Open sidebar.
- Select the range you want to process.
- Write your prompt and reference columns by header, for example "Write a one-line opener for {{Company}} based on {{Recent news}}."
- Review the cost preview for your exact row count.
- Click Run. Results stream back into a new column, row by row.
Because each row is its own model call, an error on one row never blocks the rest, and you only pay for the rows that return output. If row 412 has a malformed input and fails, rows 413 onward keep going, and your bill reflects only the rows that produced a result.
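The per-row isolation described above can be sketched in a few lines of Python. This is an illustration only, not ReplyLabs code: `run_rows` and `fake_model` are hypothetical names, the model call is a stub, and the loop is sequential where the add-on dispatches rows in parallel.

```python
from typing import Callable, Optional


def run_rows(
    rows: list[dict[str, str]],
    call_model: Callable[[dict[str, str]], str],
) -> tuple[list[Optional[str]], int]:
    """Run call_model on every row independently.

    Returns the outputs (None for a failed row) and the number of
    succeeded rows, which is the only count that would be billed.
    """
    outputs: list[Optional[str]] = []
    succeeded = 0
    for row in rows:
        try:
            result = call_model(row)
        except Exception:
            # The failure is recorded for this row only; later rows still run.
            outputs.append(None)
            continue
        outputs.append(result)
        succeeded += 1
    return outputs, succeeded


def fake_model(row: dict[str, str]) -> str:
    # Hypothetical stand-in for a real model call: fails on a blank Company.
    if not row.get("Company"):
        raise ValueError("missing Company")
    return f"Opener for {row['Company']}"


rows = [{"Company": "Acme"}, {"Company": ""}, {"Company": "Globex"}]
outputs, billed = run_rows(rows, fake_model)
print(outputs)  # ['Opener for Acme', None, 'Opener for Globex']
print(billed)   # 2
```

The middle row fails, the third row still completes, and the billed count is two, not three.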
Choosing the right model for the job
ReplyLabs ships a managed catalogue spanning OpenAI, Anthropic, Google, and Mistral, so you pick the model to match the task instead of forcing every job through one default.
- Small, fast models are ideal for classification, tagging, and short extractions where you want low cost per row and high throughput. Most "is this a fit, yes or no" jobs belong here.
- Mid-tier models suit personalised writing where tone and nuance matter, such as cold-email openers or product copy.
- Frontier models earn their cost on reasoning-heavy rows: multi-step extraction, judgement calls, or synthesis across several input columns.
A practical pattern is to draft and test a prompt on a cheap model over twenty rows, confirm the structure of the output, then either keep that model for the full run or step up one tier if the sample is not sharp enough. The cost preview updates with the model you choose, so the trade-off is visible before you commit.
Writing prompts that hold up at scale
A prompt that works on one row in a chat window often falls apart across a thousand rows in a sheet, because the data is messier than your test case. A few habits keep the output consistent:
- Reference columns by header, not by hand. Use {{Company}} or {{First name}} so the right value is substituted for each row. Hard-coding a value means every row gets the same answer.
- Tell the model what to do with missing inputs. Real lists have blank cells. A line like "If no recent news is provided, write a generic but warm opener" stops the model from inventing facts.
- Constrain the format. If you want one sentence, say "one sentence, no greeting." If you want a label, say "Reply with exactly one of: Fit, Not a fit." Tight output is easier to filter and use downstream.
- Show one example. A single worked example inside the prompt anchors tone and length far more reliably than adjectives like "concise" or "compelling."
These are not ReplyLabs-specific tricks; they are how language models behave. But they matter more in a spreadsheet because you cannot eyeball five thousand outputs, so the prompt has to be right before you run.
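To make the header substitution concrete, here is a minimal Python sketch of how a {{Header}} template might be filled for one row. It is an assumption about the general technique, not the add-on's implementation. `render_prompt` and the "(not provided)" marker for blank cells are hypothetical choices made for the illustration.

```python
import re

# Matches {{Header}} and tolerates spaces just inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def render_prompt(template: str, row: dict[str, str], blank: str = "(not provided)") -> str:
    """Replace each {{Header}} with this row's value for that column."""

    def substitute(match: re.Match) -> str:
        header = match.group(1)
        if header not in row:
            # A typo in a header should fail loudly, not silently produce junk.
            raise KeyError(f"No column named {header!r}")
        value = row[header].strip()
        # Blank cells get an explicit marker so the prompt's
        # missing-input instruction has something to react to.
        return value if value else blank

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Write a one-line opener for {{Company}} based on {{Recent news}}. "
    "If no recent news is provided, write a generic but warm opener."
)
row = {"Company": "Acme Corp", "Recent news": ""}
print(render_prompt(template, row))
```

Raising on an unknown header is a deliberate choice in this sketch: a misspelled column name is cheaper to catch on the first row than after a full run.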
What it costs
ReplyLabs prices AI at the provider's raw cost times 1.25, plus a $0.0025 base fee per succeeded row. The base fee alone comes to $2.50 per 1,000 succeeded rows, so a 1,000-row run on a small model lands in low single-digit dollars, and the exact figure shows in the cost preview before you commit. On Pro and Scale you can bring your own AI key, in which case the AI step runs at your provider's raw rate with no markup and no base fee. For a deeper breakdown see how to estimate AI API cost, or try the AI cost calculator.
Two things keep the bill predictable. First, you see the price for your exact row count before anything runs, so there is no surprise after the fact. Second, only succeeded rows are charged, which means failed or skipped rows cost nothing. If you filter a list down before running, you pay for the rows you actually want.
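The pricing rule above is simple enough to check by hand. The Python sketch below uses the 1.25 multiplier and the $0.0025 base fee from the text. The raw per-row provider cost of $0.0004 is an assumed figure for illustration only, and `estimate_cost` is a hypothetical helper, not part of the product.

```python
from decimal import Decimal

MARKUP = Decimal("1.25")      # multiplier on the provider's raw cost
BASE_FEE = Decimal("0.0025")  # dollars per succeeded row
CENT = Decimal("0.01")


def estimate_cost(succeeded_rows: int, raw_cost_per_row: Decimal, own_key: bool = False) -> Decimal:
    """Estimate the AI cost in dollars for a run, counting succeeded rows only."""
    raw_total = raw_cost_per_row * succeeded_rows
    if own_key:
        # Bring-your-own-key: raw provider rate, no markup, no base fee.
        return raw_total.quantize(CENT)
    return (raw_total * MARKUP + BASE_FEE * succeeded_rows).quantize(CENT)


raw = Decimal("0.0004")  # ASSUMED raw provider cost per row, not a real price
print(estimate_cost(1000, raw))                # 3.00
print(estimate_cost(1000, raw, own_key=True))  # 0.40
```

At that assumed rate, 1,000 succeeded rows cost $0.50 in marked-up model usage plus $2.50 in base fees. The real figure depends on the model and on how long your prompts and outputs are, which is what the cost preview accounts for.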
Keeping output tied to real data
Generic AI fills in a name and calls it personalised. The output is only as good as the columns you feed it. Two habits matter:
- Feed scraped or verified facts, not just the company name. A prompt that reads a scraped "about" snippet or a funding stage writes a sentence that could only apply to one prospect.
- Filter before you spend. Drop rows that will not pay you back (undeliverable emails, missing inputs) before the AI step runs. See running a prompt across thousands of rows for the batching mechanics.
The single biggest lever on output quality is the input. A model handed only "Acme Corp" can write something true of any company. A model handed "Acme Corp, a Series B logistics firm that just opened a Rotterdam hub" can write something true of exactly one. That extra context comes from enrichment and scraping, which is why AI rarely lives alone in a real workflow.
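The "filter before you spend" habit amounts to a predicate applied to each row before the AI step. The Python sketch below assumes two hypothetical columns, "Email status" and "About", and made-up sample rows. Your sheet's headers and status values will differ.

```python
def worth_running(row: dict[str, str]) -> bool:
    """Keep a row only if its email verified and it has a fact to personalise on."""
    deliverable = row.get("Email status", "").strip().lower() == "valid"
    has_context = bool(row.get("About", "").strip())
    return deliverable and has_context


# Made-up sample rows for illustration.
rows = [
    {"Company": "Acme Corp", "Email status": "valid",
     "About": "Series B logistics firm that just opened a Rotterdam hub"},
    {"Company": "Globex", "Email status": "undeliverable",
     "About": "Industrial supplier"},
    {"Company": "Initech", "Email status": "valid", "About": ""},
]

to_run = [row for row in rows if worth_running(row)]
print([row["Company"] for row in to_run])  # ['Acme Corp']
```

Whether to drop rows with a blank About column or let the prompt fall back to a generic opener is a judgement call. This sketch drops them.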
Where AI fits in a wider workflow
In practice the AI step is one stage in a chain that starts with raw data and ends with something you can act on. A typical outbound sequence looks like this:
- Scrape company pages or news to fill context columns with real facts.
- Verify the email addresses so you do not waste sends or AI spend on dead rows.
- Filter to the rows worth contacting.
- Run AI to write a personalised opener that references the scraped facts.
Each stage feeds the next, and because it all happens in one sheet, there is no exporting and re-importing between tools. The AI step is where the value becomes visible, but it leans on the verify and scrape steps that came before it. That is the difference between personalisation that reads like a template and personalisation that reads like a human did the research.
Common mistakes to avoid
- Running the whole list before testing the prompt. Always sample twenty rows first. A small fix to the prompt saves you re-running thousands of rows.
- Personalising on the company name alone. Without a real fact in the prompt, the output is generic by construction. Add a scraped or enriched column.
- Ignoring the cost preview. It is there to stop surprises. Glance at it before every run, especially when you change the model.
- Forgetting that failed rows are free. You do not need to manually exclude bad rows to avoid paying for them, but filtering still saves time and keeps outputs clean.
Keeping output accurate and on-brand
At small volumes you read every cell. At thousands of rows you cannot, so you need a way to trust the output without inspecting all of it. The reliable approach is to sample and constrain rather than to inspect everything.
- Sample before you scale. Run twenty rows, read every one, and only then run the full list. Most prompt problems show up in the first twenty rows.
- Spot-check after the full run. Sort the output column and skim the top, middle, and bottom. Outliers in length or tone surface fast when the column is sorted.
- Pin the voice with an example. A single in-prompt example of the tone you want does more than a paragraph of instructions. The model copies the example far more faithfully than it follows adjectives.
- Forbid the failure modes you have seen. If the model keeps opening with "I hope this email finds you well," add "do not open with a greeting" to the prompt. Negative instructions are blunt but effective.
Because outputs land in a normal column, anything you would normally do to a column still works: conditional formatting to flag empty results, a filter to isolate rows that need a re-run, or a quick formula to count outputs over a length limit. The AI step does not lock the data away; it just fills cells.
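If you export the output column, or prefer a script to a formula, the same checks are a few lines of Python. This is a sketch under assumptions: `review_outputs` is a hypothetical helper, and the 25-word limit is borrowed from the opener example, not a built-in rule.

```python
def review_outputs(outputs: list[str], max_words: int = 25) -> dict[str, list[int]]:
    """Return 1-based positions of outputs that are empty or over the word limit."""
    flagged: dict[str, list[int]] = {"empty": [], "too_long": []}
    for position, text in enumerate(outputs, start=1):
        if not text.strip():
            flagged["empty"].append(position)      # candidates for a re-run
        elif len(text.split()) > max_words:
            flagged["too_long"].append(position)   # candidates for a tighter prompt
    return flagged


outputs = [
    "Congrats on opening the Rotterdam hub.",
    "   ",
    "word " * 30,
]
print(review_outputs(outputs))  # {'empty': [2], 'too_long': [3]}
```

The flagged positions give you a short list to read, so you do not have to skim the whole column.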
A worked example: personalised openers
Say you have a sheet of 2,000 prospects with columns Company, First name, and a scraped About snippet. You want a one-line opener for each that references something real.
- Confirm the inputs are real. Make sure the About column actually holds scraped text, not a placeholder. If it is empty for some rows, decide what the prompt should do with them.
- Write the prompt. "Write a one-line cold-email opener for {{First name}} at {{Company}}. Reference one specific detail from this about text: {{About}}. One sentence, no greeting, under 25 words. If the about text is empty, write a warm generic opener that mentions {{Company}}."
- Sample. Run the first twenty rows. Read them. Tighten the prompt if the openers are too long or too generic.
- Run the full list. Output streams into a new column. The cost preview already told you the total before you clicked Run.
- Spot-check and ship. Sort by opener length, skim the extremes, then use the column in your sequence.
The thing that makes this work is step one. The opener is only as specific as the About column, which is why scraping and enrichment feed the AI step rather than competing with it.
Common questions
Does this need an OpenAI account?
No. ReplyLabs includes a managed model catalogue across OpenAI, Anthropic, Google, and Mistral. You can also bring your own key on Pro and Scale.
Will it time out on large sheets?
No. The model runs server-side, outside Apps Script, so neither the 30-second custom-function limit nor the six-minute script limit applies. Runs of thousands of rows are normal.
Can I use my own data columns in the prompt?
Yes. Reference any column by its header in double braces, for example {{First name}} or {{Company}}, and the value for that row is substituted before the model runs.
Is my data sent anywhere I should know about?
Row inputs and outputs are passed to the AI provider you select to generate the result, then retained only briefly before being purged. Metadata about the run persists so you can see your history. If you bring your own key, the AI call goes through your own provider account.
What is an AI prompt in this context?
It is the instruction template you write once and the add-on applies to every row, with column values substituted in. See the AI prompt definition.