A web scraper is a tool that automatically fetches web pages and extracts data from them into a structured format such as rows and columns. Instead of opening each page by hand and copying text, a scraper requests the page, reads its HTML, picks out the values you want, and writes them somewhere usable. In a spreadsheet context, a web scraper turns a column of URLs into a column of clean data: page text, a company description, a price, a job title.
How a web scraper works
A scraper does three things for every URL. It fetches the page, sending an HTTP request the way a browser would. It parses the response, locating the data within the HTML, sometimes after rendering JavaScript so the page looks the way a human would see it. It extracts and stores the chosen values into a structured output. A good scraper also handles the messy reality of the web: retrying a slow page, backing off when a site rate-limits, and skipping a dead link without crashing the whole run.
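The fetch, parse, and extract loop described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how any particular product implements it: the function names (`http_fetch`, `extract_title`, `scrape`), the choice of the page title as the extracted value, and the retry settings are all assumptions made for the example. It does not render JavaScript.

```python
import time
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Parse step: collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def http_fetch(url, timeout=10):
    """Fetch step: a plain HTTP GET (hypothetical User-Agent string)."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_title(html):
    """Extract step: pull one value out of the HTML."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def scrape(urls, fetch=http_fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Return one row per URL; a dead link yields an error row, not a crash."""
    rows = []
    for url in urls:
        row = {"url": url, "title": "", "error": ""}
        for attempt in range(retries):
            try:
                row["title"] = extract_title(fetch(url))
                row["error"] = ""
                break
            except OSError as exc:  # covers URLError, HTTPError, and timeouts
                row["error"] = str(exc)
                if attempt < retries - 1:
                    # Exponential backoff: base_delay, then 2x, then 4x, ...
                    sleep(base_delay * 2 ** attempt)
        rows.append(row)
    return rows
```

Calling `scrape(["https://example.com"])` returns a list of dictionaries that can be written out as rows. The `fetch` and `sleep` parameters are injectable only so the loop can be exercised without network access or real waiting.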
Scraper vs the native import functions
Google Sheets ships with IMPORTXML and IMPORTHTML, which are lightweight scrapers for static pages. They read the raw HTML a server returns, so they cannot see content that JavaScript builds in the browser, and they cap out at roughly 50 calls per sheet. A dedicated web scraper runs the requests on a server outside the spreadsheet, renders dynamic pages, retries failures, and scales to thousands of URLs without running into the six-minute execution limit that a scraper written in Apps Script would hit.
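To see why reading raw HTML misses JavaScript-built content, consider the sketch below. The page markup is invented for the example, and `visible_text` is a hypothetical helper: it collects the text a static reader would find in the server's response, without executing any script.

```python
from html.parser import HTMLParser

# Invented sample page: the price element is empty in the raw HTML and is
# only filled in when a browser runs the script.
RAW_HTML = """
<html><body>
  <h1>Widget</h1>
  <span id="price"></span>
  <script>document.getElementById("price").textContent = "$19.99";</script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text present in the static HTML, with no script execution."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(visible_text(RAW_HTML))  # prints "Widget"; the price never appears
```

The heading is found because it is in the server's response. The price is not, because it only exists after a browser executes the script, which is the step a rendering scraper adds.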
Web scraper in ReplyLabs
ReplyLabs is a web scraper that runs from inside Google Sheets. You select a column of URLs, pick an engine, and results stream back into a new column. You are charged only for URLs that return data, and when a page resists the in-house engine, ReplyLabs automatically falls back to another engine. Scraped text is most useful as input to an AI step, which is why scraping and enrichment usually run together.
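The general pattern of falling back between engines and charging only for successful URLs can be sketched as follows. This is a generic illustration, not ReplyLabs' implementation or API: the function `scrape_with_fallback`, the engine names, and the `billable` field are all hypothetical.

```python
def scrape_with_fallback(url, engines):
    """Try each (name, engine) pair in order and return the first non-empty result.

    `engines` is a list of (name, callable) pairs; each callable takes a URL
    and returns the scraped text, or raises or returns "" on failure.
    """
    for name, engine in engines:
        try:
            text = engine(url)
        except Exception:
            # This engine failed on the page; move on to the next one.
            continue
        if text:
            return {"url": url, "engine": name, "text": text, "billable": True}
    # No engine returned data, so the URL is not counted as billable.
    return {"url": url, "engine": None, "text": "", "billable": False}
```

With a primary engine that raises on a page and a backup that succeeds, the result names the backup engine and is marked billable. If every engine fails, the row comes back empty and unbilled instead of aborting the run.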