How to Scrape LinkedIn: Step-by-Step Guide

LinkedIn scraping has the potential to transform your business: perform lead generation and brand monitoring using this guide and build your own LinkedIn scraper!

Article content
  1. Why Scrape Data from LinkedIn?
  2. What Data Can You Scrape from LinkedIn?
  3. How to Scrape LinkedIn with Infatica API
  4. Features of Infatica LinkedIn Scraper API
  5. Scrape LinkedIn Using Selenium and BeautifulSoup in Python
  6. Is scraping LinkedIn legal?
  7. Conclusion: What is the Best Way to Scrape LinkedIn?
  8. Frequently Asked Questions

Social media platforms like LinkedIn are growing every year – and so is the importance of LinkedIn scraping. In this article, we’re covering the ins and outs of scraping data from LinkedIn – and how to use Infatica’s easy-to-understand LinkedIn scraping API.

Why Scrape Data from LinkedIn?

Different users connected to the LinkedIn network

As a popular social network, LinkedIn is home to public discourse on thousands of topics among millions of users. Bundled together, these individuals’ posts, comments, and reactions are a goldmine if you’re looking for actionable data – here’s why:

Lead generation effort. LinkedIn profile scrapers are particularly useful for their ability to collect public profile data and create effective outreach campaigns, lists of potential customers, and lists of potential leads.

⭐ Further reading: See our detailed guide to lead generation via web scraping.

Brand monitoring. By extension, your LinkedIn page and relevant groups can be a great way for your clients to interact with your brand – this includes both positive and negative feedback, which you can use to improve your product and create effective outreach campaigns.

Consumer research. LinkedIn scrapers can also provide customer behavior data: Users’ public comments and posts can be analyzed for social listening and personalized engagement.

What Data Can You Scrape from LinkedIn?

Different scrapable LinkedIn pages

As a platform, LinkedIn includes a wide set of page types that aggregate different kinds of data: these include profile info, posts, groups, recommendations, courses, jobs, and more. Here’s a quick overview of what kind of data a typical LinkedIn scraping API can collect:

  • LinkedIn profiles: Scrape public profile information
  • LinkedIn groups: Scrape public group information
  • LinkedIn job listings: Scrape listed jobs
  • LinkedIn jobs: Scrape job descriptions
  • LinkedIn company profiles: Scrape company information
  • LinkedIn search results: Scrape information from keywords and filters

How to Scrape LinkedIn with Infatica API

Scraping bot uses Infatica Scraper API to connect to LinkedIn

Infatica Scraper API is a scraping tool for industry-grade data collection. Supporting platforms like LinkedIn, Amazon, Google, Facebook, and more, it’s sure to become your ultimate LinkedIn scraping API if you give it a try.

Step 1. Sign in to your Infatica account

Your Infatica account has different useful features (traffic dashboard, support area, how-to videos, etc.). It also has your unique user_key value, which we’ll need to use the API – you can find it in your personal account’s billing area. The user_key value is a 20-character combination of numbers and lower- and uppercase letters, e.g. KPCLjaGFu3pax7PYwWd3.

Step 2. Send a JSON request

This request will contain all necessary data attributes. Here’s a sample request:

{
	"user_key":"KPCLjaGFu3pax7PYwWd3",
	"URLS":[
			{
				"URL":"https://www.linkedin.com/in/clementdelangue/",
				"Headers":{
					"Connection":"keep-alive",
					"User-Agent":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
					"Upgrade-Insecure-Requests":"1"
				},
				"userId":"ID-0"
			}
	]
}

Here are attributes you need to specify in your request:

  • user_key: Hash key for API interactions; available in the personal account’s billing area.
  • URLS: Array containing all planned downloads.
  • URL: Download link.
  • Headers: List of headers that are sent within the request; additional headers (e.g. cookie, accept, and more) are also accepted. Required headers are: Connection, User-Agent, Upgrade-Insecure-Requests.
  • userId: Unique identifier within a single request; returning responses contain the userId attribute.

Here’s a sample request containing 4 LinkedIn URLs:

{
	"user_key":"KPCLjaGFu3pax7PYwWd3",
	"URLS":[
		{
			"URL":"https://www.linkedin.com/in/clementdelangue/",
			"Headers":{"Connection":"keep-alive","User-Agent":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0","Upgrade-Insecure-Requests":"1"},
			"userId":"ID-0"
		},
		{
			"URL":"https://www.linkedin.com/in/julienchaumond/",
			"Headers":{"Connection":"keep-alive","User-Agent":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0","Upgrade-Insecure-Requests":"1"},
			"userId":"ID-1"
		},
		{
			"URL":"https://www.linkedin.com/company/huggingface/people/",
			"Headers":{"Connection":"keep-alive","User-Agent":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0","Upgrade-Insecure-Requests":"1"},
			"userId":"ID-2"
		},
		{
			"URL":"https://www.linkedin.com/feed/update/urn:li:activity:6945049650122948608/",
			"Headers":{"Connection":"keep-alive","User-Agent":"Mozilla\/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0","Upgrade-Insecure-Requests":"1"},
			"userId":"ID-3"
		}
	]
}
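
When the URL list is long, a request body like the ones above can be assembled programmatically. The sketch below is a minimal example under a few assumptions: the helper names is_valid_user_key and build_request are our own and not part of Infatica’s API, and the key check only encodes the 20-character format described in Step 1. It builds the JSON body and nothing more; where to send it depends on the endpoint listed in your account documentation, which this guide doesn’t cover.

```python
import json
import re

# Headers the guide lists as required for every URL entry.
DEFAULT_HEADERS = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) "
                  "Gecko/20100101 Firefox/71.0",
    "Upgrade-Insecure-Requests": "1",
}


def is_valid_user_key(user_key):
    """Check the format described in Step 1: 20 letters and digits."""
    return re.fullmatch(r"[A-Za-z0-9]{20}", user_key) is not None


def build_request(user_key, urls, headers=None):
    """Return the JSON request body as a string (hypothetical helper)."""
    if not is_valid_user_key(user_key):
        raise ValueError("user_key should be 20 alphanumeric characters")
    entries = [
        {
            "URL": url,
            "Headers": dict(headers or DEFAULT_HEADERS),
            "userId": f"ID-{index}",  # unique within this request
        }
        for index, url in enumerate(urls)
    ]
    return json.dumps({"user_key": user_key, "URLS": entries}, indent=2)


body = build_request(
    "KPCLjaGFu3pax7PYwWd3",
    [
        "https://www.linkedin.com/in/clementdelangue/",
        "https://www.linkedin.com/in/julienchaumond/",
    ],
)
print(body)
```

Generating the userId values from the list index keeps them unique within a request, which is what lets you match each response entry back to its URL.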

Step 3. Get the response and download the files

When finished, the API will send a JSON response with one entry per userId – in our case, four. Each entry has two attributes: status (HTTP status) and link (file download link). Follow the links of the entries that have them to download the corresponding contents.

{
	"ID-0":{"status":996,"link":""},
	"ID-1":{"status":200,"link":"https://www.domain.com/files/product2.txt"},
	"ID-2":{"status":200,"link":"https://www.domain.com/files/product3.txt"},
	"ID-3":{"status":null,"link":""}
}
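
A response like this one can be sorted into finished downloads and entries that need another attempt. The sketch below is only an illustration: split_results is a hypothetical helper, and it assumes that an entry is ready when its status is 200 and its link is non-empty. This guide doesn’t document what other values (such as 996 or null) mean, so the code simply treats them as not ready.

```python
import json

# Sample response copied from the guide.
raw_response = """
{
    "ID-0": {"status": 996, "link": ""},
    "ID-1": {"status": 200, "link": "https://www.domain.com/files/product2.txt"},
    "ID-2": {"status": 200, "link": "https://www.domain.com/files/product3.txt"},
    "ID-3": {"status": null, "link": ""}
}
"""


def split_results(response_text):
    """Separate entries that have a download link from those that do not."""
    ready, failed = {}, []
    for user_id, entry in json.loads(response_text).items():
        if entry.get("status") == 200 and entry.get("link"):
            ready[user_id] = entry["link"]
        else:
            failed.append(user_id)
    return ready, failed


ready, failed = split_results(raw_response)
print(ready)   # userId -> download link
print(failed)  # userIds to inspect or resubmit
```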

Please note that the server stores each file for 20 minutes. For best results, keep each request below 1,000 URLs; processing 1,000 URLs may take 1-5 minutes.
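
If you have more URLs than fit comfortably into one request, you can split them into batches first. This is a minimal sketch: chunk_urls is a hypothetical helper, the default batch size of 999 is our reading of “below 1,000”, and the example URLs are placeholders.

```python
def chunk_urls(urls, batch_size=999):
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


# Placeholder profile URLs, used only to show the resulting batch sizes.
urls = [f"https://www.linkedin.com/in/example-{i}/" for i in range(2500)]
batches = list(chunk_urls(urls))
print([len(batch) for batch in batches])  # [999, 999, 502]
```

Since files are kept for 20 minutes, it makes sense to download each batch’s results before submitting the next one.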

Features of Infatica LinkedIn Scraper API

Noticing the struggle of our clients to build a reliable and fast LinkedIn scraping API, we decided to solve this problem by creating an easy-to-use web scraper with support for platforms like LinkedIn, Google, Amazon, Facebook, and many more. The result is Infatica Scraper API and its awesome features:

Millions of proxies: Scraper utilizes a pool of 35+ million datacenter and residential IP addresses across dozens of global ISPs, supporting real devices, smart retries and IP rotation.

100+ global locations: Choose from 100+ global locations to send your web scraping API requests – or simply use random geo-targets from a set of major cities all across the globe.

Robust infrastructure: Make your projects scalable and enjoy advanced features like concurrent API requests, CAPTCHA solving, browser support and JavaScript rendering.

Flexible pricing: Infatica Scraper offers a wide set of flexible pricing plans for small-, medium-, and large-scale projects, starting at just $25 per month.

Scrape LinkedIn Using Selenium and BeautifulSoup in Python

Scraping bot uses Python to connect to LinkedIn

An alternative method of LinkedIn web scraping involves using Python, one of the most popular programming languages for data collection, to build a scraping service of our own. Additionally, we’ll use Selenium, a browser automation suite, and BeautifulSoup, a Python library for parsing HTML documents.

❔ Further reading: We have an up-to-date overview of Python web crawlers on our blog – or you can watch its video version on YouTube.

🍲 Further reading: Using Python's BeautifulSoup to scrape images

🎭 Further reading: Using Python's Puppeteer to automate data collection

Set up the components

First, let’s install the libraries we mentioned above:

pip install selenium
pip install beautifulsoup4
pip install lxml

Additionally, our setup requires a web driver – an interface that lets software remotely control a web browser. Web drivers are available for Chrome (Chromium), Firefox, Edge, Internet Explorer, and Safari; in this guide, we’ll be using the Chrome web driver (ChromeDriver), which you can download from its official page.

Log into your LinkedIn account

We’ll start with initializing the Selenium web driver and submitting a GET request. Then, we’ll locate page elements that manage the process of logging in. Here’s the code snippet to do that:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Creating a webdriver instance (Selenium 4 syntax);
# this instance will be used to log into LinkedIn
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))

# Opening LinkedIn's login page
driver.get("https://linkedin.com/uas/login")

# Waiting for the page to load
time.sleep(5)

# Locating the username field
username = driver.find_element(By.ID, "username")
# In case of an error, try changing the element
# ID used here.

# Enter your email address
username.send_keys("User_email")

# Locating the password field
pword = driver.find_element(By.ID, "password")
# In case of an error, try changing the element
# ID used here.

# Enter your password
pword.send_keys("User_pass")

# Clicking on the log in button
# Format (syntax) of writing XPath -->
# //tagname[@attribute='value']
driver.find_element(By.XPATH, "//button[@type='submit']").click()
# In case of an error, try changing the
# XPath used here.
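
Typing your email and password directly into the script is fine for a quick test, but it’s easy to leak them by accident. One possible alternative is to read them from environment variables. The sketch below assumes two variable names of our own choosing, LINKEDIN_EMAIL and LINKEDIN_PASSWORD; load_credentials is a hypothetical helper, not part of Selenium.

```python
import os


def load_credentials(env=None):
    """Read the login email and password from environment variables."""
    env = os.environ if env is None else env
    try:
        return env["LINKEDIN_EMAIL"], env["LINKEDIN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable"
        ) from None


# Demo with an explicit mapping; in a real run, call load_credentials()
# with no arguments so that the values come from os.environ.
email, password = load_credentials(
    {"LINKEDIN_EMAIL": "user@example.com", "LINKEDIN_PASSWORD": "not-a-real-password"}
)
print(email)
```

The returned values can then be passed to the send_keys() calls in place of the hard-coded strings.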

Scrape LinkedIn profile introduction

LinkedIn data doesn’t load fully unless the page is scrolled to the bottom. To create a LinkedIn profile scraper, we’ll need to input the profile’s URL and scroll the page completely:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Creating an instance (Selenium 4 syntax)
driver = webdriver.Chrome(service=Service("Enter-Location-Of-Your-Web-Driver"))

# Logging into LinkedIn
driver.get("https://linkedin.com/uas/login")
time.sleep(5)

username = driver.find_element(By.ID, "username")
username.send_keys("")  # Enter Your Email Address

pword = driver.find_element(By.ID, "password")
pword.send_keys("")  # Enter Your Password

driver.find_element(By.XPATH, "//button[@type='submit']").click()
time.sleep(5)  # giving the login time to complete

# Opening Kunal's Profile
# paste the URL of Kunal's profile here
profile_url = "https://www.linkedin.com/in/kunalshah1/"

driver.get(profile_url)  # this will open the link

start = time.time()

# Vertical position (in pixels) to scroll to;
# will be used in the while loop
finalScroll = 1000

while True:
    # window.scrollTo(x, y) takes absolute coordinates, so we keep
    # x at 0 and move y further down the page on each pass
    driver.execute_script(f"window.scrollTo(0, {finalScroll})")
    finalScroll += 1000

    # we will stop the script for 3 seconds so that
    # the data can load
    time.sleep(3)
    # You can change it as per your needs and internet speed

    end = time.time()

    # We will scroll for 20 seconds.
    # You can change it as per your needs and internet speed
    if round(end - start) > 20:
        break

To complete our LinkedIn scraper, we’ll add the BeautifulSoup library to process the page structure. Let’s save the LinkedIn profile page’s source code to a variable and feed it into BeautifulSoup:

src = driver.page_source

# Now using beautiful soup
soup = BeautifulSoup(src, 'lxml')

LinkedIn profile introduction contains elements like first and last name, company name, city, and more – each element has a corresponding HTML tag we’ll need to locate. Let’s use Chrome’s Developer Tools to find them.

We now see that the <div> we need has the class pv-text-details__left-panel. Let’s pass this class to BeautifulSoup:

# Extracting the HTML of the complete introduction box
# that contains the name, company name, and the location
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})

print(intro)

In the HTML output, we’ll notice the HTML tags we were looking for – let’s input these tags in this code snippet to scrape the profile information:

# In case of an error, try changing the tags used here.

name_loc = intro.find("h1")

# Extracting the Name
name = name_loc.get_text().strip()
# strip() is used to remove any extra blank spaces

works_at_loc = intro.find("div", {'class': 'text-body-medium'})

# this gives us the HTML of the tag in which the Company Name is present
# Extracting the Company Name
works_at = works_at_loc.get_text().strip()


location_loc = intro.find_all("span", {'class': 'text-body-small'})

# Extracting the Location
# The 2nd element in the location_loc variable has the location
location = location_loc[1].get_text().strip()

print("Name -->", name,
	"\nWorks At -->", works_at,
	"\nLocation -->", location)
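
Because LinkedIn’s markup changes often, any of the find() calls above may return None, and calling get_text() on None stops the script with an AttributeError. A small guard can keep the scraper running; text_or_default is a hypothetical helper name, and the sketch relies only on the fact that BeautifulSoup’s find() returns None when nothing matches.

```python
def text_or_default(node, default=""):
    """Return the stripped text of a parsed element, or a default if it is missing."""
    if node is None:
        return default
    return node.get_text().strip()


# With the variables from the snippet above, usage could look like this:
#   name = text_or_default(intro.find("h1"), default="n/a")
print(text_or_default(None, default="n/a"))  # n/a
```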

Scrape LinkedIn profile experience section

The same method applies to the experience section of a LinkedIn profile – let’s open the Developer Tools and see the corresponding <div> tags. Here’s the code snippet to scrape profile experience:

# In case of an error, try changing the tags used here.

# Assumption: the experience block is the <section> with the id
# "experience-section"; check the current id in Developer Tools.
experience = soup.find("section", {"id": "experience-section"})

li_tags = experience.find('div')
a_tags = li_tags.find("a")
job_title = a_tags.find("h3").get_text().strip()

print(job_title)

company_name = a_tags.find_all("p")[1].get_text().strip()
print(company_name)

joining_date = a_tags.find_all("h4")[0].find_all("span")[1].get_text().strip()

employment_duration = a_tags.find_all("h4")[1].find_all(
    "span")[1].get_text().strip()

print(joining_date + ", " + employment_duration)

Scrape LinkedIn Job Search Data

Selenium allows us to automate opening the Jobs section. The code snippet below reuses the logged-in driver from the previous steps, opens the Jobs page, and tells BeautifulSoup to collect this data. Here are the data types and the corresponding CSS classes that we'll look for:

  • LinkedIn jobs: 'class': 'job-card-list__title'
  • LinkedIn companies: 'class': 'job-card-container__company-name'
  • LinkedIn job locations: 'class': 'job-card-container__metadata-wrapper'

import time
from selenium.webdriver.common.by import By

jobs = driver.find_element(By.XPATH, "//a[@data-link-to='jobs']/span")
# In case of an error, try changing the XPath.

jobs.click()
time.sleep(5)  # waiting for the Jobs page to load

job_src = driver.page_source

soup = BeautifulSoup(job_src, 'lxml')

jobs_html = soup.find_all('a', {'class': 'job-card-list__title'})
# In case of an error, try changing the class name.

job_titles = []

for title in jobs_html:
    job_titles.append(title.text.strip())

print(job_titles)

company_name_html = soup.find_all(
    'div', {'class': 'job-card-container__company-name'})
company_names = []

for name in company_name_html:
    company_names.append(name.text.strip())

print(company_names)

import re  # for removing the extra blank spaces

location_html = soup.find_all(
    'ul', {'class': 'job-card-container__metadata-wrapper'})

location_list = []

for loc in location_html:
    res = re.sub('\n\n +', ' ', loc.text.strip())

    location_list.append(res)

print(location_list)
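
The three lists are easier to work with once they are combined into rows. The sketch below is one way to do this with Python’s standard csv module; to_csv and clean are hypothetical helper names, and the sample values are made up to stand in for the scraped lists. Note that zip() stops at the shortest list, so a missing company or location on one card will shift or drop rows.

```python
import csv
import io


def clean(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def to_csv(job_titles, company_names, locations):
    """Pair the three lists row by row and return the result as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "company", "location"])
    for row in zip(job_titles, company_names, locations):
        writer.writerow([clean(value) for value in row])
    return buffer.getvalue()


# Made-up sample values standing in for the scraped lists.
print(to_csv(
    ["Data Engineer", "ML Researcher"],
    ["Example Corp", "Sample Labs"],
    ["Paris, France\n\n   Remote", "Berlin, Germany"],
))
```

To save the result, write the returned string to a file opened with open("jobs.csv", "w", newline="", encoding="utf-8").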

Troubleshooting

Although this code has been tested to work in July 2022, it might break at some point in time: LinkedIn is constantly changing its page structure, in part to prevent LinkedIn data scraping efforts. In this case, you’ll have to manually open the page source and edit this guide’s code, replacing obsolete page elements.

If your internet connection is slow, the connection may fail and LinkedIn may terminate your session. To address this problem, use the time.sleep() function, which will allow the bot more time to connect. For instance, to pause for 5 additional seconds, use time.sleep(5).
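
Fixed pauses either waste time on fast connections or turn out too short on slow ones. A possible alternative is to poll for a condition until it holds or a timeout expires. The sketch below uses only the standard library; wait_until is a hypothetical helper name, and the page_ready function merely simulates a page that becomes ready on the third check.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Simulated condition: "ready" on the third check.
attempts = []


def page_ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(page_ready, timeout=5, interval=0.01)
print(len(attempts))  # 3
```

With Selenium, the predicate could be something like lambda: driver.find_elements(By.ID, "username"), since find_elements() returns an empty list until the element exists. Selenium also ships its own explicit-wait helper, WebDriverWait, in selenium.webdriver.support.ui.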

Is scraping LinkedIn legal?

Scraping bot is confused about various data collection laws

Please note that this section is not legal advice – it’s an overview of the latest legal practice related to this topic. We encourage you to consult legal professionals to review each web scraping project on a case-by-case basis.

🌍 Further reading: Our blog also features a detailed overview of web scraping legality with analysis of latest legal practice related to data collection, including LinkedIn scraping.

LinkedIn is a global platform, attracting users from all over the world – but regulations that govern LinkedIn data scraping are created at the regional, national, and even state level. Here’s a list of intellectual property, privacy, and cybersecurity laws that pertain to the legality of LinkedIn web scraping:

  • Europe: General Data Protection Regulation (GDPR) and Digital Single Market Directive (DSM),
  • California in particular: California Consumer Privacy Act (CCPA),
  • The US as a whole: Computer Fraud and Abuse Act (CFAA) and the fair use doctrine,
  • And more.

Here’s the good news: In several cases, courts have interpreted these regulations in a way that permits scraping publicly available data. The fair use doctrine in particular offers a useful guideline: transforming the collected data in a meaningful way puts you on much safer ground – a good example is using a LinkedIn group scraper to create outreach software. Conversely, simply copying the platform’s data and republishing it is likely to be treated as infringement.

HiQ Labs v. LinkedIn – and its consequences

HiQ Labs is a data analytics company that used LinkedIn profile scrapers to analyze employee attrition. In 2017, LinkedIn sent HiQ Labs a cease-and-desist letter, arguing that continued scraping would violate the CFAA and was more akin to hacking. HiQ Labs responded by taking the dispute to court.

In 2017 and again in 2022, courts sided with HiQ Labs on this point, commenting that scraping LinkedIn likely wasn’t a violation of the CFAA: the profiles in question are publicly available and don’t require authorization; therefore, accessing them doesn’t constitute hacking.

The 2022 decision came from the U.S. Court of Appeals for the Ninth Circuit, after the Supreme Court had sent the case back for reconsideration in 2021. It is an influential precedent, although courts in other circuits aren’t bound by it. Still, like any major tech company, LinkedIn uses a set of anti-scraping measures: reCAPTCHA, Cloudflare, etc. This makes residential proxies essential for a successful LinkedIn web scraping pipeline – without proxies, you’re running a much higher risk of getting blocked.

Conclusion: What is the Best Way to Scrape LinkedIn?

With the many use cases of LinkedIn web scraping in mind, we made sure to engineer Infatica Scraper API to be the optimal choice: Its reliability and high performance are supplemented by Infatica’s proxy network, which helps you keep your request success rate high.

Power users with more time on their hands can try building their own LinkedIn group scraper using Python and its libraries. Although this solution provides more flexibility and control over the web scraping process, maintaining it and fixing unexpected errors falls on the developer – and this can eat up a lot of their time.

Another alternative involves software with point-and-click interfaces: for instance, Chrome extensions and standalone apps. They can be a great choice when you’re just starting out with LinkedIn scraping: Their graphical interface allows you to collect data without writing code yourself – but this can also be a disadvantage: Should LinkedIn change its page structure, you’ll have to wait for the scraper’s developer to fix this.

Frequently Asked Questions

LinkedIn prohibits web scraping in its terms of service, and access to its official APIs is limited to approved use cases. Additionally, tech companies like LinkedIn use anti-scraping mechanisms like reCAPTCHA and Cloudflare to detect automated access to their services and ban suspicious IP addresses.

Here are three popular methods of scraping LinkedIn:
  1. Use Infatica Scraper API, which does the majority of the heavy lifting during LinkedIn scraping for you (managing proxies, establishing the connection, saving data, etc.)
  2. Build a LinkedIn scraper using Python, Selenium, and BeautifulSoup.
  3. Use a browser extension or an app with a graphical interface.

To scrape LinkedIn posts, one way involves your browser’s developer tools, which can show the post’s HTML structure. Copy the post’s HTML tags and class names into the code we outlined in this article, and the BeautifulSoup scraper might be able to parse the post data.

To scrape LinkedIn jobs, as demonstrated in the code above, use the following CSS classes in your scraper:
  • 'class': 'job-card-list__title' for LinkedIn jobs
  • 'class': 'job-card-container__company-name' for LinkedIn companies
  • 'class': 'job-card-container__metadata-wrapper' for LinkedIn job locations

Sharon Bennett

Sharon Bennett is a networking professional who analyzes various measures of online censorship. She lends her expertise to Infatica to explore how proxies can help to address this problem.
