

Web scraping has evolved far beyond fetching static HTML from server-side rendered pages. Today’s websites are often dynamic, JavaScript-heavy, and built with frameworks like React, Angular, or Vue. That’s where Playwright shines – offering capabilities that make it particularly well-suited for scraping modern web applications. Let’s learn how to use it!
What Is Playwright?
Playwright is a powerful open-source automation library developed by Microsoft that enables developers to programmatically control web browsers. Built on Node.js, Playwright allows scripts to interact with modern web applications in much the same way a real user would – clicking buttons, filling forms, and navigating through pages.
Originally designed for end-to-end testing, Playwright has quickly gained popularity in the web scraping community thanks to its robust browser automation capabilities and built-in support for dynamic content. Unlike traditional scraping tools that work best with static HTML, Playwright excels at handling JavaScript-heavy websites where content loads dynamically or requires user interaction to appear.
Why Use Playwright for Web Scraping?
JavaScript rendering: Playwright launches real browser instances, meaning it can fully render pages, execute JavaScript, and access content that only appears after client-side processing. This gives it a major edge over traditional HTTP-based scrapers that can’t see dynamic content.
Automation-focused API: With its intuitive and powerful API, Playwright allows you to:
- Wait for elements to appear using built-in waiting mechanisms
- Interact with page elements (clicks, inputs, scrolls)
- Capture screenshots, PDFs, and other media
- Extract structured data from rendered pages using selectors
This programmatic control mimics user behavior, helping you bypass common client-side obstacles to scraping.
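For example, a brief sketch might combine several of these capabilities on a single page (the #load-more selector is purely illustrative):

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Built-in waiting: pause until the element exists in the DOM
  await page.waitForSelector('#load-more'); // illustrative selector
  await page.click('#load-more'); // user-like interaction

  // Capture media and extract data from the fully rendered page
  await page.screenshot({ path: 'page.png' });
  const headings = await page.$$eval('h1', els => els.map(el => el.textContent));
  console.log(headings);

  await browser.close();
})();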
Multi-browser and headless support: Playwright supports three major browser engines – Chromium, Firefox, and WebKit – allowing you to target a wide range of environments. You can run these in headless mode for speed or headed mode when debugging.
Robust anti-bot evasion: Because it controls real browsers, Playwright can better mimic human interaction than pure HTTP scraping tools. When paired with techniques like user-agent rotation and stealth plugins, it becomes much harder for websites to detect and block scraping activity.
Cross-platform and scalable: Whether you're running scripts locally, deploying them in a CI/CD pipeline, or scaling scraping jobs across servers, Playwright works consistently across operating systems. It's also compatible with Docker, making it easier to build and scale containerized scraping workflows.
Setting Up Playwright
Before you can start scraping with Playwright, you'll need to get the library installed and set up in your environment. The process is straightforward and works across Windows, macOS, and Linux systems.
1. Installation
Playwright is available via npm and supports all major browsers. To get started, open your terminal and run:
npm install playwright
This installs the Playwright library and downloads the necessary browser binaries (Chromium, Firefox, and WebKit) during installation. If you want to install only a specific browser, you can use:
npm install playwright-chromium
2. Basic Usage Example
Here's a minimal example that launches a Chromium browser, navigates to a website, and logs the page title. Let’s see how easy it is to automate browsing tasks using Playwright:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch(); // Use { headless: false } to see the browser
  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.title();
  console.log(`Page title: ${title}`);
  await browser.close();
})();
3. Running in Headless or Headed Mode
By default, Playwright runs in headless mode (no UI). If you want to observe browser behavior during development, set:
const browser = await chromium.launch({ headless: false });
This opens a visible browser window, making it easier to debug selectors and scraping logic.
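Playwright's slowMo launch option pairs well with headed mode for debugging:

// slowMo inserts a pause (in milliseconds) between each operation,
// so you can watch the script act step by step
const browser = await chromium.launch({ headless: false, slowMo: 250 });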
4. Using Playwright with Docker
For scalable, consistent deployments – especially in production – you can run Playwright in a Docker container. The Playwright team offers an official Docker image that includes all dependencies and browsers pre-installed:
docker pull mcr.microsoft.com/playwright
You can then run your scraping script in a containerized environment, simplifying infrastructure setup and improving reliability across deployments.
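For instance, assuming your script and package.json sit in the current directory (scrape.js is a placeholder for your own entry point), a one-off run might look like this; the Playwright team recommends --ipc=host when running Chromium in Docker:

docker run -it --rm --ipc=host -v "$(pwd)":/app -w /app \
  mcr.microsoft.com/playwright /bin/bash -c "npm install && node scrape.js"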
Integrating Proxies in Playwright
When scraping at scale or targeting content that varies by location, proxies are essential. They allow you to route requests through different IP addresses and regions, avoid rate limits, and reduce the risk of being blocked. Playwright supports proxy configuration natively, making it easy to route browser traffic through your proxy provider of choice.
How to Configure a Proxy in Playwright
When launching a browser instance, Playwright allows you to pass a proxy option in the launch configuration. Here's a basic example using a proxy server:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://your.proxy.server:port',
      username: 'yourUsername', // optional
      password: 'yourPassword'  // optional
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
Choosing the Right Proxy Provider
A reliable proxy network is critical for sustained scraping. Look for a large and diverse IP pool, multiple geolocation options, high uptime and speed, rotation support, and usage transparency. Infatica’s proxy network includes both residential and datacenter proxies – making it easy to scale and geotarget your scraping operations while avoiding common blocks.
Handling Anti-Scraping Measures
Modern websites employ a wide range of defenses to detect and block automated scraping. While Playwright gives you the power to mimic real user behavior, combining it with the right anti-detection strategies is key to building a resilient scraper.
Headless Detection and Mitigation
Many sites try to detect whether a browser is running in headless mode. To avoid this, launch Playwright in non-headless mode during critical interactions:
const browser = await chromium.launch({ headless: false });
Additionally, randomize browser viewport sizes and user agents, and use tools like playwright-extra with stealth plugins to minimize fingerprinting.
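A minimal sketch using playwright-extra with the stealth plugin (both installed separately via npm install playwright-extra puppeteer-extra-plugin-stealth) might look like this:

const { chromium } = require('playwright-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin, which patches common fingerprinting
// signals such as navigator.webdriver
chromium.use(StealthPlugin());

(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();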
User-Agent and Header Spoofing
Default headers often signal automation. Customize them to resemble real browser traffic:
await page.setExtraHTTPHeaders({
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...',
  'Accept-Language': 'en-US,en;q=0.9'
});
Rotate these headers periodically to simulate diverse user traffic.
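One straightforward approach is to pick a user agent at random per session; the strings below are illustrative, and setting userAgent at context creation is Playwright's supported way to override it:

// Illustrative pool – keep these current and realistic in practice
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
];

const randomUA = userAgents[Math.floor(Math.random() * userAgents.length)];
// A user agent set at context creation applies to every page in it
const context = await browser.newContext({ userAgent: randomUA });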
CAPTCHAs and Bot Challenges
CAPTCHAs can halt scraping if not addressed. Your options include:
- Use services like 2Captcha or Anti-Captcha to solve challenges.
- Detect CAPTCHA elements and implement fallback logic (skip, retry, log) – see the sketch after this list.
- Reduce CAPTCHA triggers with good proxy hygiene and behavior simulation.
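As a rough sketch of the second option – the selector is hypothetical and will vary per site:

// Hypothetical check: many sites embed reCAPTCHA in an iframe
const captchaFrame = await page.$('iframe[src*="recaptcha"]');
if (captchaFrame) {
  // Fallback logic: log the URL and retry later with a fresh proxy/session
  console.warn(`CAPTCHA encountered at ${page.url()} – skipping`);
} else {
  // ...proceed with normal extraction
}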
Session & Cookie Management
Some sites track sessions and cookies to identify bots. Playwright lets you manage sessions easily:
const context = await browser.newContext();
await context.addCookies([{ name: 'session_id', value: 'abc', domain: '.example.com' }]);
Preserve cookies across sessions to appear more consistent and human-like.
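Playwright's storageState API makes this straightforward – persist the state at the end of one run and restore it in the next:

// Save cookies and localStorage to disk at the end of a session
await context.storageState({ path: 'state.json' });

// ...later, create a new context that restores the saved state
const restoredContext = await browser.newContext({ storageState: 'state.json' });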
Proxy Rotation
One of the most effective evasion tactics is rotating IPs using a proxy pool. This:
- Prevents bans tied to specific IPs
- Distributes scraping load
- Makes traffic appear organic
Infatica’s rotating proxy network integrates seamlessly with Playwright and offers location-specific IPs – helping you maintain access and stay under detection thresholds.
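A simple rotation pattern is to launch each browser (or batch of work) with the next proxy from a pool; the endpoints below are placeholders for your provider's:

const { chromium } = require('playwright');

// Placeholder endpoints – substitute your provider's proxy addresses
const proxies = [
  'http://proxy1.example.com:8000',
  'http://proxy2.example.com:8000'
];

(async () => {
  for (const server of proxies) {
    const browser = await chromium.launch({ proxy: { server } });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(`${server} -> ${await page.title()}`);
    await browser.close();
  }
})();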
Sample Use Case: Scraping a Dynamic Website
To see Playwright in action, let’s walk through a real-world example: scraping product data from a JavaScript-heavy e-commerce website. Many modern sites load product listings dynamically using client-side rendering, making tools like Playwright ideal for the job.
1. Target: A Dynamic Product Page
Let’s say you want to scrape a list of product names and prices from a search results page that loads data via JavaScript (e.g., electronics or clothing items). Here’s how you might do that with Playwright:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto('https://example-ecommerce.com/search?q=smartphone');

  // Wait for products to load
  await page.waitForSelector('.product-item');

  // Extract data
  const products = await page.$$eval('.product-item', items => {
    return items.map(item => ({
      name: item.querySelector('.product-title')?.innerText.trim(),
      price: item.querySelector('.product-price')?.innerText.trim()
    }));
  });

  console.log(products);
  await browser.close();
})();
2. Handling Pagination
Many product listings span multiple pages. Playwright allows you to click through pages or dynamically generate URLs:
await page.click('button.next-page'); // Or:
await page.goto(`https://example.com/?page=${pageNumber}`);
Looping through pages and aggregating the data as you go lets you collect the full result set.
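Here's a rough sketch of such a loop, reusing the selectors from the example above (the next-page button selector is illustrative, and real sites usually need an extra wait after each click for new content to replace the old):

const allProducts = [];
while (true) {
  await page.waitForSelector('.product-item');
  const products = await page.$$eval('.product-item', items =>
    items.map(item => ({
      name: item.querySelector('.product-title')?.innerText.trim(),
      price: item.querySelector('.product-price')?.innerText.trim()
    }))
  );
  allProducts.push(...products);

  // Stop when the "next" button is gone (selector is illustrative)
  const nextButton = await page.$('button.next-page');
  if (!nextButton) break;
  await nextButton.click();
}
console.log(`${allProducts.length} products collected`);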
3. Error Handling and Retries
Always wrap your scraping logic in try/catch blocks to handle timeouts or missing elements gracefully:
try {
  await page.waitForSelector('.product-item', { timeout: 5000 });
} catch (err) {
  console.warn('Product list did not load in time.');
}
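For transient failures, a small generic retry helper (not a Playwright API – just plain JavaScript) often suffices:

// Retry an async operation up to `attempts` times, pausing between tries
async function withRetries(fn, attempts = 3, delayMs = 2000) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      console.warn(`Attempt ${i} failed: ${err.message}`);
      if (i === attempts) throw err;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: retry a flaky navigation before giving up
await withRetries(() => page.goto('https://example.com', { timeout: 10000 }));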
Scaling and Best Practices
Getting a Playwright scraper working is just the beginning. The real challenge lies in scaling it – making sure it runs efficiently, handles thousands of requests, recovers from errors, and doesn’t get blocked. In this chapter, we’ll walk through some of the key lessons and strategies that help turn a working scraper into a production-grade data pipeline.
Parallelizing Your Scraping
One of Playwright’s great strengths is its ability to run multiple browser contexts or pages at once. Rather than waiting for one task to finish before starting the next, you can launch several sessions in parallel. This significantly speeds up data collection, especially when dealing with large datasets or multiple URLs.
To manage this, you can spin up multiple browser contexts or even run multiple worker processes. Promise-based concurrency, queue systems like Redis with BullMQ, or simple thread pools can all help distribute the load effectively:
const context = await browser.newContext();
// Pages in the same context share cookies and storage; create separate
// contexts via browser.newContext() when you need isolated sessions
const page1 = await context.newPage();
const page2 = await context.newPage();
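Building on that, a sketch of promise-based concurrency might give each URL its own isolated context (assuming a launched browser is already in scope; the URLs are placeholders):

const urls = [
  'https://example.com/page/1',
  'https://example.com/page/2',
  'https://example.com/page/3'
];

// One isolated context per URL; all tasks run concurrently
const results = await Promise.all(urls.map(async (url) => {
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto(url);
  const title = await page.title(); // replace with your extraction logic
  await context.close();
  return { url, title };
}));
console.log(results);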
Avoiding Detection with Throttling and Random Delays
As your scraping scales up, so does the risk of detection. Sites often flag bots that hit them too fast or too frequently. That’s why it’s important to slow things down just enough – introducing random delays between requests, rotating user agents, and mimicking real browsing behavior. These subtle changes make your traffic blend in and reduce the chances of being blocked.
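A random delay helper can be as simple as this (the selector in the usage lines is illustrative):

// Sleep for a random interval between min and max milliseconds
const randomDelay = (min = 1000, max = 5000) =>
  new Promise(resolve => setTimeout(resolve, min + Math.random() * (max - min)));

await page.goto('https://example.com');
await randomDelay(); // pause as a human would while reading
await page.click('a.some-link'); // illustrative selector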
Even better, route your requests through proxies to distribute the traffic. A rotating pool of residential or datacenter IPs – such as those offered by Infatica – helps keep your scrapers flying under the radar while expanding your reach across different regions.
Planning for Errors and Outages
No matter how carefully you build your scraper, something will eventually go wrong. A selector might change. A page might time out. A proxy might fail. Instead of trying to avoid all possible issues upfront, build for resiliency.
Catch errors when they occur, retry failed requests, and log everything – successes, failures, and unexpected behavior. This logging is what makes debugging and improving your scraper possible over time.
Keeping an Eye on Performance
As scraping becomes more integral to your operations, monitoring becomes essential. You’ll want to track success rates, identify spikes in failures, and be alerted to bans or major changes in site structure. Whether you build simple dashboards or integrate with tools like Prometheus and Grafana, visibility into how your scrapers are performing helps you stay ahead of problems.
Playwright vs. Other Web Scraping Tools
With so many web scraping tools available today, choosing the right one depends on your use case, technical preferences, and the complexity of the websites you're targeting. Playwright stands out for its modern design, automation power, and robustness against anti-bot systems – but how does it compare to other popular options?
| Feature | Playwright | Puppeteer | Selenium | Scrapy |
|---|---|---|---|---|
| Headless Browser Support | ✓ Chromium, Firefox, WebKit | ✓ Chromium only (official) | ✓ Chrome, Firefox, others | ✗ (requires integration) |
| Cross-Browser Testing | ✓ Native | ✗ | ✓ via WebDriver | ✗ |
| JavaScript Rendering | ✓ Excellent | ✓ Good | ✓ Moderate | ✗ |
| Stealth & Anti-Bot Evasion | ✓ Strong (with plugins) | ✓ Moderate | ✗ Limited | ✗ Requires middleware |
| Performance at Scale | ✓ Efficient with parallelism | ✓ Lightweight | ✗ Slower, heavy processes | ✓ Optimized for speed |
| Language Support | JavaScript, Python, .NET, Java | JavaScript only | Multiple (Python, Java, etc.) | Python only |
| Ease of Use | ✓ Modern, consistent API | ✓ Familiar syntax | ✗ Verbose setup | ✓ Pythonic, declarative |



Comparison Summary
If your target sites are JavaScript-heavy, Playwright offers arguably the most powerful and modern solution, with broad browser support and anti-bot resistance. Puppeteer is simpler and lighter for Chromium-only tasks. Selenium remains useful for traditional testing and broader language support, but it lags in scraping efficiency. Meanwhile, Scrapy is a great option when speed and structured crawling are the priority – and when JavaScript rendering isn't required.