To estimate the AI API cost of a Google Sheets job, multiply the average input tokens per row by the model's input rate, add the average output tokens per row times the output rate, divide by one million to get the cost per row, then multiply by the number of rows you expect to succeed. That single formula covers the whole estimate. In ReplyLabs you do not have to do it by hand for managed runs, because the cost preview shows the figure before you run, but understanding the maths lets you sanity-check the preview, compare models, and decide whether BYOK is worth it. This article walks through the formula, a worked example, the managed-versus-BYOK adjustment, and the gotchas that throw estimates off.
How do you estimate AI API cost?
The cost of one model call is driven by two token counts and two rates. Providers price per million tokens, and they price input (what you send) and output (what the model generates) separately, with output almost always costing more.
The per-row formula is:
cost per row = ((input_tokens x input_rate) + (output_tokens x output_rate)) / 1,000,000
The job estimate is then:
job cost = cost per row x rows_that_succeed
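The two formulas above can be sketched in Python as follows. This is a minimal illustration, not part of any ReplyLabs or provider API: the function and parameter names are assumptions chosen for this example, and rates are expressed in dollars per million tokens.

```python
def cost_per_row(input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """Raw model cost of one row, in dollars.

    Rates are dollars per million tokens, so the weighted sum
    is divided by 1,000,000. (Names here are hypothetical.)
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def job_cost(per_row: float, rows_that_succeed: int) -> float:
    """Scale the per-row cost by the rows expected to succeed."""
    return per_row * rows_that_succeed
```

Keeping the per-row and per-job steps separate makes it easy to swap in a different model's rates without touching the row count.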
A token is a subword unit, not a word: on average one token is roughly four English characters, or about 0.75 words. So a 300-word prompt is around 400 tokens, and a 150-word generated answer is around 200 tokens. Those rough conversions are enough to estimate; you do not need exact token counts to get within a sensible margin.
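As a rough sketch of those conversions, the helpers below turn a word count or a piece of text into an approximate token count. The constants are the rule-of-thumb averages quoted above, and the function names are assumptions for this example; a provider's own tokenizer will give different, exact counts.

```python
import math

# Rule-of-thumb averages for English text (approximations, not exact).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_words(word_count: int) -> int:
    """Approximate token count from a word count, rounded up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def tokens_from_text(text: str) -> int:
    """Approximate token count from the character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

For instance, `tokens_from_words(300)` gives 400 and `tokens_from_words(150)` gives 200, matching the figures above.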
A worked example
Say you are running a prompt down a column of 50,000 rows. Each row sends a prompt of about 400 input tokens and the model returns about 200 output tokens. The model you picked is priced at $2.50 per million input tokens and $15.00 per million output tokens.
Input cost per row: (400 x $2.50) / 1,000,000 = $0.0010.
Output cost per row: (200 x $15.00) / 1,000,000 = $0.0030.
Raw cost per row: $0.0010 + $0.0030 = $0.0040.
Job cost at 50,000 succeeded rows: $0.0040 x 50,000 = $200 of raw model spend.
Notice that output dominated, at three-quarters of the per-row cost, even though it was only half the input's token count, because the output rate was six times the input rate. That asymmetry is the single most important thing to internalise: trimming a verbose output instruction usually saves more than trimming the prompt.
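The arithmetic of this example can be reproduced in a few lines of Python. The figures are the ones used in the worked example; the variable names and the 25% trimming scenario are assumptions added only to illustrate the input/output asymmetry.

```python
# Figures from the worked example.
INPUT_TOKENS, OUTPUT_TOKENS = 400, 200
INPUT_RATE, OUTPUT_RATE = 2.50, 15.00  # dollars per million tokens
ROWS = 50_000

input_cost = INPUT_TOKENS * INPUT_RATE / 1_000_000     # about $0.0010 per row
output_cost = OUTPUT_TOKENS * OUTPUT_RATE / 1_000_000  # about $0.0030 per row
raw_per_row = input_cost + output_cost                 # about $0.0040 per row
job_total = raw_per_row * ROWS                         # about $200

# Share of the per-row cost that comes from output tokens.
output_share = output_cost / raw_per_row

# Hypothetical scenario: cut 25% of the tokens on one side only.
saving_trim_input = 0.25 * input_cost * ROWS
saving_trim_output = 0.25 * output_cost * ROWS

print(f"job total: ${job_total:.2f}, output share: {output_share:.0%}")
print(f"trim input 25%: ${saving_trim_input:.2f}, "
      f"trim output 25%: ${saving_trim_output:.2f}")
```

Under these assumptions, trimming a quarter of the output saves three times as much as trimming a quarter of the prompt, even though the output is the shorter of the two.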
How does managed pricing change the estimate?
The maths above gives you the raw provider cost, which is what you pay under BYOK (bring your own key), billed to your own provider account. Managed mode in ReplyLabs adds two things on top of the raw cost: a markup of 1.25x and a base fee of $0.0025 per succeeded row.
So adapt the per-row figure depending on mode:
- BYOK per row = raw cost (the $0.0040 above), billed by your provider.
- Managed per row = (raw cost x 1.25) + $0.0025 = ($0.0040 x 1.25) + $0.0025 = $0.0075.
Across 50,000 succeeded rows that is $200 of provider spend under BYOK versus $375 managed. The $175 gap is the markup and base fee. The cheaper the raw call, the larger the fixed $0.0025 base fee looms in proportion, so high-volume runs on small fast models are where BYOK pulls furthest ahead. For the full break-even logic, the bring your own API key page lays it out, and the definition of the two modes is on the BYOK glossary entry.
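The managed adjustment can be sketched as below. The 1.25x markup and the $0.0025 base fee are the figures stated above; the function names and the ratio helper are assumptions introduced for this example.

```python
MANAGED_MARKUP = 1.25      # multiplier on the raw model cost
MANAGED_BASE_FEE = 0.0025  # dollars per succeeded row


def byok_per_row(raw_cost: float) -> float:
    """Under BYOK the per-row cost is the raw provider cost."""
    return raw_cost


def managed_per_row(raw_cost: float) -> float:
    """Managed mode: marked-up raw cost plus the fixed base fee."""
    return raw_cost * MANAGED_MARKUP + MANAGED_BASE_FEE


def managed_to_byok_ratio(raw_cost: float) -> float:
    """How many times more a managed row costs than a BYOK row."""
    return managed_per_row(raw_cost) / byok_per_row(raw_cost)
```

With the $0.0040 raw cost from the worked example, the ratio is about 1.9. For a hypothetical call that costs a tenth of that ($0.0004), the same formula gives a ratio of 7.5, which shows how the fixed base fee dominates on cheap calls.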
You do not have to run any of this by hand. The AI cost calculator takes your token counts, input and output rates, and row count and shows the managed and raw figures side by side, so you can compare modes before committing to a run.
What throws an estimate off?
Five things account for most of the gap between an estimate and the real bill.
- Output length varies more than input. Your prompt is fixed, but how much the model writes back is not. If outputs range from one line to a paragraph, estimate against the longer end so you are not surprised.
- Only succeeded rows are charged. In both managed and BYOK mode, failed and skipped rows cost nothing on the ReplyLabs side. So estimate against rows you expect to succeed, not your total row count. A list with 20% junk inputs costs less than its size suggests.
- Retries. A row that fails once and succeeds on retry still bills once, for the succeeded result. Retries do not multiply your bill on the ReplyLabs side, but heavy provider-side retry traffic under BYOK can show up on your provider invoice.
- Prompt caching and batch discounts. Many providers offer cached-input pricing (at some providers around 10% of the base input rate for repeated context) and batch APIs at roughly half price. If your prompt reuses a large fixed system block across every row, the effective input rate can be far below list, so a naive estimate overstates the cost.
- Model choice. Rates span a wide range, from cents per million tokens on budget models to tens of dollars per million on frontier reasoning models. Re-estimate whenever you switch models; the same job can differ by 50x depending on the model.
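Two of these adjustments are easy to fold into an estimate. The sketch below is illustrative only: the function names are hypothetical, the 10% cached-rate multiplier is the rough figure mentioned above rather than any specific provider's price, and real cache pricing can include extra charges (for example for writing to the cache) that are not modelled here.

```python
def effective_input_rate(base_rate: float, cached_fraction: float,
                         cached_rate_multiplier: float = 0.10) -> float:
    """Blend the list input rate with a discounted cached rate.

    cached_fraction is the share of input tokens served from cache (0 to 1).
    cached_rate_multiplier is an assumed discount, not a quoted price.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    uncached = 1.0 - cached_fraction
    return base_rate * (uncached + cached_fraction * cached_rate_multiplier)


def expected_succeeded_rows(total_rows: int, junk_fraction: float) -> int:
    """Rows expected to succeed after discounting junk inputs."""
    if not 0.0 <= junk_fraction <= 1.0:
        raise ValueError("junk_fraction must be between 0 and 1")
    return round(total_rows * (1.0 - junk_fraction))
```

As an example under these assumptions, a $2.50 list input rate with 80% of input tokens cached works out to an effective rate of about $0.70 per million, and a 50,000-row list with 20% junk inputs should be estimated against 40,000 rows.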
A quick estimation checklist
Before a large run, settle five numbers: average input tokens per row, average output tokens per row, the model's input rate, its output rate, and the count of rows you expect to succeed. Multiply through the per-row formula, scale by the row count, then apply the managed adjustment if you are not on BYOK. Cross-check against the cost preview in the sidebar or the AI cost calculator. For configuring the prompt and run itself, see the AI prompts help guide, and to start, the ReplyLabs home page lists the plans.
Common questions
How do I calculate the cost of an AI API call?
Multiply input tokens by the input rate and output tokens by the output rate, then divide by one million. Providers price input and output separately per million tokens, and output usually costs several times more than input.
How many tokens is a word?
On average one token is about 0.75 words, or roughly four English characters. A 300-word prompt is around 400 tokens. These rough conversions are accurate enough for estimating.
Why is my actual bill different from my estimate?
Usually because output length varied, fewer rows succeeded than expected, or prompt caching and batch discounts lowered the effective rate. Only succeeded rows are charged, so junk inputs cost nothing and can make the real bill lower than a naive estimate.
Does ReplyLabs charge for failed rows?
No. In both managed and BYOK mode, only succeeded rows are charged. Failed and skipped rows cost nothing on the ReplyLabs side.
How do I estimate cost for a managed run versus BYOK?
Estimate the raw cost first. That is the BYOK figure. For managed, multiply the raw cost by 1.25 and add $0.0025 per succeeded row. The AI cost calculator shows both side by side.