Web scraping in Google Sheets: methods, limits, and how to do it at scale

Web scraping in Google Sheets means pulling data off web pages and writing it into rows and columns, ideally without leaving the spreadsheet. There are three ways to do it: the built-in IMPORTXML and IMPORTHTML functions for simple static pages, a sidebar add-on that scrapes server-side for pages that need scale or JavaScript rendering, and a raw scraping API for engineering teams who want to write their own pipeline. The native functions are free but cap out fast, stall on JavaScript pages, and stop at roughly 50 calls per sheet. An add-on like ReplyLabs runs the requests off-spreadsheet, so it handles thousands of URLs, renders dynamic pages, and only charges for URLs that return data. This guide covers each method, where it breaks, the costs, and how to scrape responsibly.

What are the ways to scrape a website into Google Sheets?

There are three practical shapes, and they suit different jobs.

Native import functions. IMPORTXML, IMPORTHTML, and IMPORTDATA live inside Sheets and need no setup. You write a formula, point it at a URL and an XPath or a table index, and the result lands in the cell. Free, instant, and fine for a handful of static pages.
A sidebar add-on. You select a column of URLs, the add-on fetches each page on a server outside the spreadsheet, and results stream back into a new column. This is the approach that survives at thousands of rows and on JavaScript-heavy sites, because the work never touches Apps Script.
A scraping API. A developer writes code that calls a scraping service, parses the response, and pushes rows into Sheets through the Sheets API. Maximum control, but it is a build project, not a spreadsheet task.

ReplyLabs is the second shape, a web scraper that runs from inside Google Sheets. It also bundles the API providers so you can route a run through whichever engine fits the target site, without writing any code yourself.

How do IMPORTXML and IMPORTHTML work, and where do they fail?

The native functions are the obvious starting point because they cost nothing and need no install. IMPORTHTML pulls a table or list by index. IMPORTXML pulls any element you can target with an XPath query. Typical formulas look like this:

=IMPORTXML("https://example.com/about", "//h1")
=IMPORTHTML("https://example.com/pricing", "table", 1)

The XPath in the first formula, //h1, matches every level-one heading on the page, and IMPORTXML spills each match into its own row; write it as (//h1)[1] if you want only the first. You can target nested nodes too, for example a div inside an article, written as //article//div. For a static page with clean HTML, this works and it is genuinely useful.
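To see what these path expressions select outside of Sheets, here is a minimal sketch using Python's standard library. The page content is made up for illustration, and ElementTree only supports a subset of XPath (it spells //h1 as .//h1), so treat it as an approximation of what IMPORTXML does rather than a replica.

```python
import xml.etree.ElementTree as ET

# Hypothetical static page, kept well-formed so the stdlib XML parser accepts it.
PAGE = """<html><body>
<h1>About us</h1>
<article><div class="intro">We make widgets.</div></article>
<h1>Our team</h1>
</body></html>"""

root = ET.fromstring(PAGE)

# ".//h1" is ElementTree's spelling of "//h1": it matches every h1, not just one.
headings = [h.text for h in root.findall(".//h1")]

# Nested target: a div that sits directly inside an article.
intro = [d.text for d in root.findall(".//article/div")]

print(headings)
print(intro)
```

Both headings come back because the expression matches every h1 in the document, which is why a single formula can fill several rows.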

The problems show up the moment you scale or hit a modern site:

JavaScript-rendered pages return nothing. IMPORTXML reads the raw HTML the server sends, not the page a browser builds after running scripts. Most large, modern sites render their content with JavaScript, so the data you want simply is not in the HTML the function sees. You get an empty cell or an error.
Roughly 50 calls per sheet. Each spreadsheet can run only about 50 import calls before they start failing, and each call returns up to 50,000 characters. A list of 500 URLs is out of reach on the native functions alone.
Unpredictable refresh and slowdowns. The import functions re-check for updates roughly every hour while the document is open, and once you have more than a dozen of them the sheet slows down and behaves erratically because every edit can trigger a recalculation.
No retries, no rendering, no error handling. When a page is slow, blocks the request, or needs a header, the formula just fails. There is nowhere to add a retry or a fallback.
Rate limiting and #N/A. Hammer a site with formulas and Google throttles you, returning errors that look like the data does not exist.
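The JavaScript problem is easier to see with a concrete page. The sketch below parses a made-up chunk of raw HTML in which a script fills in the price after load. A parser that does not run scripts, which is the situation the import functions are in, finds the container empty. The HTML, the element id, and the helper class are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML as a server might send it: the price container is empty
# and a script fills it in later, inside the browser.
RAW_HTML = """<html><body>
<h1>Pricing</h1>
<div id="price"></div>
<script>document.getElementById("price").textContent = "$49";</script>
</body></html>"""


class TextById(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


parser = TextById("price")
parser.feed(RAW_HTML)
price_text = "".join(parser.chunks).strip()
print(repr(price_text))
```

The price only exists as a string inside the script, so nothing that reads the markup alone will ever see it in the div.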

IMPORTXML is the right tool for a quick one-off on a static page. It is the wrong tool for enriching a lead list, monitoring competitor pages, or anything that runs more than a few dozen times.

What is the 6-minute Apps Script limit and why does it matter?

If the native functions are not enough, the next instinct is to write a custom Apps Script that loops over your URLs and fetches each one. This hits a hard wall: Google caps a single Apps Script execution at six minutes. Fetch a page, parse it, write the row, repeat, and you will burn through that budget after a few hundred URLs at best, often far fewer if the pages are slow. The script dies mid-run and you are left with a half-filled column and no clean way to resume.

You can work around it with time-based triggers that chunk the work into six-minute slices, but now you are maintaining a brittle queue-and-resume system instead of scraping. This is the core reason a serious scrape does not belong inside Apps Script. The fix is to move the fetching off the spreadsheet entirely. ReplyLabs runs every request on its own servers, so the six-minute limit never applies and a run of fifty thousand URLs is just a longer run, not a broken one.
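For a sense of what that queue-and-resume workaround involves, here is a rough sketch of the pattern in Python rather than Apps Script. The safety margin, the function names, and the fake clock are assumptions made for the demo; a real version would also have to persist the cursor between executions.

```python
import time

BUDGET_SECONDS = 6 * 60      # the per-execution cap described above
SAFETY_MARGIN = 30           # assumption: stop early to leave time to save state


def run_slice(urls, start_index, fetch, clock=time.monotonic,
              budget=BUDGET_SECONDS - SAFETY_MARGIN):
    """Process urls from start_index until the time budget runs out.

    Returns (results, next_index); next_index == len(urls) means finished.
    """
    started = clock()
    results = {}
    i = start_index
    while i < len(urls) and clock() - started < budget:
        results[urls[i]] = fetch(urls[i])
        i += 1
    return results, i


# Demo with a fake clock where every fetch "takes" 100 seconds.
fake_now = [0.0]


def fake_fetch(url):
    fake_now[0] += 100.0
    return f"text of {url}"


urls = [f"https://example.com/page{n}" for n in range(10)]
cursor, slices = 0, 0
while cursor < len(urls):
    _, cursor = run_slice(urls, cursor, fake_fetch, clock=lambda: fake_now[0])
    slices += 1
print(slices, cursor)
```

Even this toy version needs a cursor, a budget check, and an outer loop standing in for the trigger, which is the maintenance burden the paragraph above is warning about.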

Is web scraping legal and how do I do it ethically?

Scraping publicly available data is broadly legal in the US and EU, but the way you do it matters more than whether you do it. The landmark hiQ Labs v. LinkedIn case established that accessing a page anyone can view without logging in is not "unauthorised access" under US anti-hacking law, a position the Ninth Circuit reaffirmed in 2022. That does not make scraping a free-for-all. The same case ended in a settlement where hiQ destroyed its scraped data, and newer disputes, such as Reddit's late-2025 suit against Perplexity, turn on whether technical access controls were bypassed rather than on whether the data was public.

A few rules keep you on the right side of both the law and basic courtesy:

Respect robots.txt. It is not a contract, and ignoring it is not hacking, but it signals intent. Honouring it is the clearest sign of good faith and the easiest way to avoid being blocked.
Rate-limit yourself. Scraping fast enough to degrade a site can count as trespass to chattels. A common rule of thumb is no more than one request per second per domain. ReplyLabs paces requests and backs off rather than flooding a host.
Do not scrape personal data without a lawful basis. GDPR and CCPA apply to names, emails, and photos regardless of whether the page was public. The Clearview AI fines, over 91 million euros across 15 jurisdictions by 2025, are the cautionary tale.
Do not scrape behind a login. Public pages only. Anything behind a sign-in wall changes the legal picture entirely.
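Checking robots.txt does not have to be manual. A minimal sketch with Python's standard urllib.robotparser follows; the robots.txt content and the user agent string are made up, and a real run would load the file from the target site instead of a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-sheet-scraper"  # hypothetical user agent string

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before fetching: is this path open to our user agent?
allowed = rp.can_fetch(USER_AGENT, "https://example.com/about")
blocked = rp.can_fetch(USER_AGENT, "https://example.com/private/report")

# Some sites also state how long they would like you to wait between requests.
delay = rp.crawl_delay(USER_AGENT)

print(allowed, blocked, delay)
```

Running the check once per domain and filtering the URL list before the scrape starts is usually enough.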

The short version: scrape public pages, slowly, honour the site's stated wishes, and never collect personal data you have no basis to hold.
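The one-request-per-second rule of thumb can be sketched as a small per-domain pacer. This illustrates the idea and is not a description of how ReplyLabs paces requests; the class name and the injected clock and sleep functions are assumptions that keep the demo instant.

```python
import time
from urllib.parse import urlsplit


class DomainPacer:
    """Space out requests so each domain is hit at most once per interval."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_hit = {}   # domain -> time of the previous request

    def wait(self, url):
        """Block until url's domain may be requested again; return seconds waited."""
        domain = urlsplit(url).netloc
        now = self.clock()
        previous = self.last_hit.get(domain)
        delay = 0.0
        if previous is not None:
            delay = max(0.0, self.min_interval - (now - previous))
        if delay:
            self.sleep(delay)
        self.last_hit[domain] = now + delay
        return delay


# Demo with a fake clock so nothing actually sleeps.
t = [0.0]


def fake_sleep(seconds):
    t[0] += seconds


pacer = DomainPacer(clock=lambda: t[0], sleep=fake_sleep)
waits = [pacer.wait(u) for u in (
    "https://a.example/1",
    "https://a.example/2",   # same domain: has to wait a full second
    "https://b.example/1",   # different domain: goes straight through
)]
print(waits)
```

Pacing per domain rather than globally means a mixed list of sites still moves quickly while no single host sees more than one request per interval.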

How do I scrape with ReplyLabs, step by step?

The flow is the same whether you are pulling page text, prices, or company details:

Put your target URLs in a column. Open the sidebar with Extensions > ReplyLabs > Open sidebar.
Select the range of URLs.
Choose a scrape engine, or leave it on the default in-house scraper, which falls back automatically to another engine if a page resists the first attempt.
Pick what to extract: full page text for AI to read, or a specific field.
Review the cost preview for your exact URL count.
Click Run. Each URL is fetched on a server, and results stream back into a new column.

Because every URL is its own request, a page that blocks or times out fails on its own row without stopping the rest, and you only pay for URLs that return data. Failed rows cost nothing, so a list with dead links does not inflate your bill.
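The per-row behaviour can be sketched in a few lines. This is an illustration of the general pattern, not ReplyLabs' implementation; scrape_rows and fake_fetch are hypothetical names, and the fake fetcher stands in for a real HTTP call.

```python
def scrape_rows(urls, fetch):
    """Fetch each URL independently; one failure never stops the others.

    Returns (rows, billable): rows holds the text or an error note per URL,
    and billable counts only the URLs that returned data.
    """
    rows, billable = [], 0
    for url in urls:
        try:
            text = fetch(url)
        except Exception as exc:      # a blocked or timed-out page lands here
            rows.append(f"ERROR: {exc}")
            continue
        if text:
            rows.append(text)
            billable += 1
        else:
            rows.append("ERROR: empty response")
    return rows, billable


def fake_fetch(url):
    """Stand-in for a real request: the 'dead' URL times out."""
    if "dead" in url:
        raise TimeoutError("timed out")
    return f"text of {url}"


rows, billable = scrape_rows(
    ["https://example.com/a", "https://example.com/dead", "https://example.com/b"],
    fake_fetch,
)
print(rows)
print(billable)
```

The failed row carries its own error note, the rows on either side of it are unaffected, and only the two successful URLs count towards the bill.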

What does scraping cost in ReplyLabs?

ReplyLabs offers four engines so you can match the price to the target site:

ReplyLabs Scrape (in-house), from $0.002 per URL. Fast, with automatic fallback to another engine when a page resists. The default for most lists.
Google Sheets Scrape, which runs through your own Apps Script so requests come from Google IP addresses. Useful when a site treats Google traffic differently.
Jina, at $0.005 per URL, a managed reader that returns clean page text.
Firecrawl, at $0.0075 per URL, for the hardest JavaScript-heavy pages.

You see the price for your exact URL count before anything runs, and only successful URLs are charged. New accounts start with $20 of free credit, enough to scrape thousands of URLs on the in-house engine before paying anything. To model a full run alongside AI and verification costs, use the cost calculator.
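The arithmetic is simple enough to check by hand or in a few lines of code. The sketch below uses the per-URL prices quoted above; the dictionary keys and function names are made up, and the in-house price is a "from" figure, so the numbers for that engine are a best case.

```python
from decimal import Decimal

# Per-URL prices as quoted above; the in-house price is a "from" figure,
# so treat it as a lower bound.
PRICE_PER_URL = {
    "replylabs": Decimal("0.002"),
    "jina": Decimal("0.005"),
    "firecrawl": Decimal("0.0075"),
}
FREE_CREDIT = Decimal("20")


def run_cost(engine, succeeded_urls):
    """Cost of a run: only URLs that returned data are charged."""
    return PRICE_PER_URL[engine] * succeeded_urls


def urls_covered_by_credit(engine, credit=FREE_CREDIT):
    """How many successful URLs a credit balance pays for."""
    return int(credit // PRICE_PER_URL[engine])


firecrawl_1000 = run_cost("firecrawl", 1000)
covered = {engine: urls_covered_by_credit(engine) for engine in PRICE_PER_URL}
print(firecrawl_1000)
print(covered)
```

At the quoted prices, the $20 credit covers up to 10,000 successful URLs on the in-house engine, 4,000 on Jina, or 2,666 on Firecrawl.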

What can you actually do with scraped data in a sheet?

Two patterns cover most of the real work.

Enrich a list. You have a column of company URLs and you want context: what the company does, where it is based, a recent headline. Scrape the about page or homepage into a text column, then run an AI prompt over it to extract clean fields or write a personalised line. This is the engine behind lead enrichment in Google Sheets, where a scraped fact turns a generic opener into one that could only apply to a single prospect.

Monitor pages over time. Scrape a set of competitor pricing pages, product pages, or job boards on a schedule and diff the results against last week. A change in a price, a new job posting, or a removed product line shows up as a changed cell. Because the output is a normal column, any spreadsheet trick still works: conditional formatting to flag changes, a filter to isolate movers, a formula to count differences.
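The week-over-week comparison can be done with formulas in the sheet, but the logic is easier to see in code. A minimal sketch, assuming each scrape is stored as a mapping from URL to the extracted value; the URLs and prices are invented for the demo.

```python
def diff_snapshots(previous, current):
    """Compare two {url: value} snapshots and report what moved."""
    changed = {
        url: (previous[url], current[url])
        for url in previous.keys() & current.keys()
        if previous[url] != current[url]
    }
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    return changed, added, removed


# Hypothetical scrapes of the same competitor pages, one week apart.
last_week = {
    "https://example.com/pricing/basic": "$19",
    "https://example.com/pricing/pro": "$49",
    "https://example.com/jobs/designer": "open",
}
this_week = {
    "https://example.com/pricing/basic": "$19",
    "https://example.com/pricing/pro": "$59",
    "https://example.com/jobs/engineer": "open",
}

changed, added, removed = diff_snapshots(last_week, this_week)
print(changed)
print(added)
print(removed)
```

The three outputs map onto the cases in the paragraph above: a changed price, a new job posting, and a page that disappeared.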

In both cases the scrape is rarely the finish line. It feeds an AI step or a verification step, and because it all happens in one sheet there is no exporting and re-importing between tools.

Common questions

Can IMPORTXML scrape JavaScript-rendered pages?

No. IMPORTXML reads the raw HTML the server returns, before any JavaScript runs. Most modern sites build their content with JavaScript, so the data is not in that raw HTML and the formula returns an empty cell or an error. A server-side scraper that renders the page, like the ReplyLabs Firecrawl engine, is the way to reach that content.

How many URLs can I scrape in Google Sheets?

With native functions, roughly 50 import calls per spreadsheet before they start failing. With a server-side add-on like ReplyLabs, there is no spreadsheet-side cap, because the requests run off the sheet. Runs of thousands of URLs are normal.

Why does my IMPORTXML formula return an error or `#N/A`?

Usually one of three reasons: the page is JavaScript-rendered so the content is not in the raw HTML, you have exceeded the per-sheet import limit, or Google is rate-limiting your requests. The native functions give you no way to retry or render, which is why scale and dynamic pages need a different tool.

Is scraping public web pages legal?

Scraping publicly accessible pages is broadly legal in the US and EU, as established by hiQ Labs v. LinkedIn. The conditions matter: stay on pages with no login, respect robots.txt, rate-limit your requests, and do not collect personal data without a lawful basis under GDPR or CCPA.

Do I pay for URLs that fail to scrape?

No. ReplyLabs only charges for URLs that return data. A page that blocks the request, times out, or is a dead link fails on its own row at no cost, so a messy list does not inflate your bill.

What is the difference between scraping and the native import functions?

The native functions run inside the spreadsheet, are free, and break on JavaScript pages and at about 50 calls. A scraper runs requests on a server, renders dynamic pages, retries failures, and scales to thousands of URLs. See what a web scraper is for the definition, or start with ReplyLabs to try it on your own list.
