How to scrape LinkedIn company pages safely (and the legal limits)

Scraping LinkedIn company pages means collecting public company information, such as a company name, industry, size band, headquarters location, or website, from pages anyone can view. The honest answer up front: scraping LinkedIn directly violates LinkedIn's User Agreement, even though a court has said the underlying public data is not protected by anti-hacking law. The safer, more durable approach is to take the pointer LinkedIn gives you to a company's own footprint, such as its website, and scrape that instead. There is no LinkedIn terms-of-service conflict there, and the data is usually richer. This guide explains what the law actually says, where LinkedIn's terms draw the line, and a workflow that stays on the right side of both.

Is it legal to scrape LinkedIn company pages?

It is more complicated than a yes or no. Two separate things are in play: anti-hacking law, and LinkedIn's own contract.

On anti-hacking law, the Ninth Circuit in hiQ Labs v. LinkedIn held, in reviewing a preliminary injunction, that scraping data that is publicly visible without logging in likely does not count as unauthorised access under the US Computer Fraud and Abuse Act (CFAA). In plain terms, a court found that public LinkedIn data is not protected by the federal anti-hacking statute.

On contract, the same case later went against hiQ. In 2022 a district court found hiQ had breached LinkedIn's User Agreement, which prohibits scraping and the creation of fake accounts, and the parties settled. So the data may be public, but the act of scraping LinkedIn still breaks the agreement you accept by using the site. Terms of service, copyright, and privacy law each constrain scraping independently of the CFAA.

The takeaway: public data being legally accessible is not the same as scraping LinkedIn being allowed. If you scrape LinkedIn directly, you are violating its terms, and LinkedIn enforces this aggressively with account bans and litigation.

What is the safer way to get company data?

Pivot off LinkedIn to the company's own footprint. A LinkedIn company page almost always lists the company's website. That website can be scraped under the usual public-page rules (its own terms and robots.txt still apply), it carries no LinkedIn terms-of-service problem, and it is usually richer than the LinkedIn summary: the about page, product pages, pricing, recent news, and contact details.

So the compliant pattern is not "scrape LinkedIn", it is "use LinkedIn to identify companies, then scrape their public websites". You get the company description, positioning, and signals you actually need for outreach or research, from a source you are allowed to read at scale. This is the same enrichment loop described in lead enrichment in Google Sheets, where a scraped fact from a company's own site turns a generic message into a specific one.
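
As a rough illustration of the "identify on LinkedIn, scrape elsewhere" pattern, the sketch below turns a column of website values into a clean list of homepage URLs and refuses to queue anything hosted on LinkedIn. The function names and the blocked-host list are assumptions made for this example, not part of ReplyLabs or any other tool.

```python
from typing import List, Optional
from urllib.parse import urlsplit

# Hosts this sketch refuses to queue. The list is an illustrative assumption.
BLOCKED_HOSTS = ("linkedin.com",)


def _bare(host: str) -> str:
    """Drop a leading 'www.' so www.example.com and example.com compare equal."""
    return host[4:] if host.startswith("www.") else host


def homepage(raw: str) -> Optional[str]:
    """Return the homepage URL for a cell value, or None if it should be skipped."""
    raw = raw.strip()
    if not raw:
        return None
    # Bare domains such as "example.com" need a scheme before urlsplit finds the host.
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or "." not in host:
        return None
    bare = _bare(host)
    # Skip LinkedIn itself and any of its subdomains.
    if any(bare == b or bare.endswith("." + b) for b in BLOCKED_HOSTS):
        return None
    return f"{parts.scheme}://{host}/"


def build_targets(cells: List[str]) -> List[str]:
    """Normalise a column of cells, dropping blanks, blocked hosts, and duplicates."""
    seen = set()
    targets = []
    for cell in cells:
        url = homepage(cell)
        if url is None:
            continue
        key = _bare(urlsplit(url).hostname or "")
        if key not in seen:
            seen.add(key)
            targets.append(url)
    return targets
```

Deduplicating on the host keeps one row per company even when the source column mixes bare domains, deep links, and "www." variants.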

What company data can you ethically collect?

Stick to firmographic, non-personal data about the organisation, gathered from sources you are permitted to scrape:

Company description and positioning from the about or homepage.
Industry and what the company sells from product and pricing pages.
Headquarters location and office regions from contact or footer pages.
Recent signals such as a funding announcement, a new product, or hiring activity from the company's news or careers page.

Avoid scraping personal data about individuals, such as employee names, profiles, or photos, without a lawful basis. GDPR applies to personal data regardless of whether a page was public, and CCPA exempts only a defined category of publicly available information, so a page being public is not a safe assumption under either. The Clearview AI fines, over 91 million euros across multiple jurisdictions, show how seriously regulators treat scraped personal data. Company-level facts about an organisation are a far safer footing than personal profiles.

How do I build a compliant company-data workflow in a spreadsheet?

The workflow keeps you on public, permitted pages the whole way through:

Build your target list of companies, with each company's website URL in a column. If you started from LinkedIn, copy the website link the company itself publishes, not profile data behind the network.
Open the ReplyLabs sidebar via Extensions > ReplyLabs > Open sidebar, then select the column of website URLs.
Choose a scrape engine, or leave it on the in-house default, which falls back to another engine when a page resists.
Scrape the homepage or about page into a text column.
Run an AI step over that text to extract clean fields: a one-line description, the industry, the location, or a personalised opener.

Every fetch in this flow targets a company's own public site and is paced and retried on a server. None of them touches LinkedIn, whether behind its login or in breach of its terms. ReplyLabs is a web scraper that runs inside Google Sheets, so the scrape and the AI extraction happen in one place with no exporting between tools.
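
For readers who want to see what the "scrape a page into a text column" step involves, here is a minimal standard-library sketch. It is not how ReplyLabs works internally. The user-agent string, the `max_chars` limit, and the function names are all assumptions for illustration, and a JavaScript-heavy page would need a rendering engine that this does not provide.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of non-visible elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Collapse every run of whitespace into a single space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html, max_chars=5000):
    """Reduce an HTML document to plain text short enough for one cell."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]


def fetch(url, user_agent="example-research-bot/0.1"):
    """Download one public page. The user agent is a hypothetical placeholder."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `html_to_text(fetch(url))` for each row gives the raw text that an extraction step can then turn into a description, industry, or location.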

How should I scrape responsibly?

The same rules that keep any scrape defensible apply doubly when company data is involved:

Stay on pages with no login. Public pages only. Anything behind a sign-in wall, including most of LinkedIn's useful data, changes the legal picture entirely.
Respect robots.txt. It signals a site's wishes. Honouring it is the clearest sign of good faith.
Rate-limit yourself. Roughly one request per second per domain. Scraping fast enough to degrade a site can count as trespass to chattels. ReplyLabs paces requests and backs off rather than flooding a host.
Read the terms. A site's terms of service can prohibit scraping even when the data is public, as hiQ learned. Honour them.
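
Two of these rules, honouring robots.txt and pacing requests per domain, can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the user-agent name is hypothetical, the one-second interval mirrors the rough figure above, and the robots.txt body is passed in as text so that fetching it remains the caller's decision.

```python
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-research-bot"  # hypothetical name for this sketch


def allowed(robots_txt, url, agent=USER_AGENT):
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class DomainPacer:
    """Enforce a minimum gap between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time of the most recent request

    def wait(self, url):
        """Sleep if the host was hit too recently; return the seconds slept."""
        host = urlsplit(url).hostname or ""
        delay = 0.0
        last = self._last.get(host)
        if last is not None:
            delay = max(0.0, self._min_interval - (self._clock() - last))
        if delay:
            self._sleep(delay)
        self._last[host] = self._clock()
        return delay
```

Call `pacer.wait(url)` immediately before each fetch. Requests to different hosts are not delayed by one another, so a list spread across many companies still moves quickly.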

What does scraping company sites cost?

ReplyLabs charges only for URLs that return data, across four engines. The in-house engine starts at $0.002 per URL with automatic fallback, Jina is $0.005 per URL for clean text, and Firecrawl is $0.0075 per URL for the hardest JavaScript-heavy pages. There is also a dedicated LinkedIn company data option billed at $0.005 per succeeded row for permitted company-level lookups. You see the price for your exact count before running, and new accounts get $20 of free credit. Model a full run with the cost calculator.
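
The arithmetic is simple enough to check by hand or with a few lines of code. The sketch below uses only the per-URL prices quoted above and assumes every billed URL is charged at the listed rate. Because the in-house price is described as a starting price with fallback, real totals for that engine may differ. The engine keys are labels invented for this example.

```python
from decimal import Decimal

# Per-URL prices as quoted in this article. The keys are labels made up for this sketch.
PRICE_PER_URL = {
    "in_house": Decimal("0.002"),
    "jina": Decimal("0.005"),
    "firecrawl": Decimal("0.0075"),
}
FREE_CREDIT = Decimal("20")  # free credit for new accounts, in dollars


def estimate_cost(engine, succeeded_urls):
    """Cost in dollars, counting only URLs that returned data."""
    return PRICE_PER_URL[engine] * succeeded_urls


def urls_covered_by_credit(engine):
    """How many successful URLs the free credit pays for at the listed price."""
    return int(FREE_CREDIT // PRICE_PER_URL[engine])
```

At the listed prices, `estimate_cost("firecrawl", 1000)` comes to $7.50, and the free credit covers 10,000 successful URLs on the in-house engine if no fallback pricing applies.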

Common questions

Can I scrape LinkedIn company pages directly?

Scraping LinkedIn directly violates its User Agreement, which prohibits scraping, even though a court found the underlying public data is not protected by US anti-hacking law. The durable approach is to use LinkedIn to identify companies, then scrape their own public websites, where there is no terms conflict.

What did the hiQ v. LinkedIn case decide?

The Ninth Circuit held that scraping publicly visible data is not unauthorised access under the CFAA. Separately, a later ruling found hiQ had breached LinkedIn's contract by scraping and creating fake accounts. Public data can be legally accessible while scraping the site still breaks its terms.

Is company data personal data under GDPR?

Firmographic facts about an organisation, such as industry, size, and headquarters, are generally not personal data. Information about identifiable individuals, such as employee names and profiles, is personal data and needs a lawful basis to collect, even when public.

How do I get company data into a spreadsheet safely?

Scrape each company's own public website into a column, then run an AI step to extract clean fields. See lead enrichment in Google Sheets for the full pattern, or start with ReplyLabs.
