Pagination in Web Scraping: From Page Numbers to Infinite Scroll

Struggling with paginated websites? Explore proven scraping techniques, code snippets, and how proxies + APIs help overcome blocks and scalability issues.

Pavlo Zinkovski · 8 min read

Article content
  1. Common Types of Pagination
  2. Challenges in Scraping Paginated Content
  3. Techniques for Scraping Paginated Data
  4. Best Practices for Scraping Paginated Data
  5. When to Use a Scraping API vs. DIY Scraping
  6. Frequently Asked Questions

Pagination introduces an extra layer of complexity for web scrapers. Instead of extracting all the data from a single page, scrapers must be designed to navigate through multiple pages to capture complete datasets. Missing just one step in the pagination process can result in incomplete or inaccurate results – a major issue for projects that rely on comprehensive data. Let's explore the most common types of pagination, the obstacles they present, and the best practices for scraping paginated data efficiently and ethically.

Common Types of Pagination

Websites implement pagination in different ways depending on their design and performance needs. For web scrapers, recognizing the underlying pagination method is the first step toward building an effective extraction strategy.

Page-based pagination

https://example.com/products?page=3

The simplest form, where each page is identified by a page number in the URL. Easy to scrape since the structure is predictable, but you have to ensure that you don’t skip or duplicate pages when page counts change.

Offset-based pagination

https://example.com/products?offset=50&limit=25

Content is divided based on a starting point (offset) and number of results (limit). Useful for databases and APIs. However, large offsets can slow queries or trigger anti-bot measures.

Cursor-based (or token-based) pagination

https://api.example.com/products?cursor=eyJpZCI6IjEyMyJ9

Instead of page numbers or offsets, the site provides a cursor or token that points to the next set of results. Common in modern APIs and social media feeds. The challenging part is that the tokens often expire quickly and require careful handling.

Infinite scrolling / “Load More” buttons

Results load dynamically as the user scrolls down or clicks a “Load More” button. Powered by JavaScript and AJAX requests. Requires headless browsers or API call inspection, since content isn’t always visible in the initial HTML.

Challenges in Scraping Paginated Content

While pagination helps websites stay organized and efficient, it also adds extra hurdles for web scraping projects. Each pagination type introduces its own complications, and failing to account for them can leave your dataset incomplete or inconsistent. Below are the most common challenges:

Large data volumes

Paginated sites often contain thousands of entries spread across dozens or even hundreds of pages. Scrapers must be designed to handle this volume without crashing, stalling, or missing data.

JavaScript-rendered pagination

Infinite scroll and “Load More” features rely on JavaScript and background AJAX requests. Since the data isn’t present in the initial HTML, scrapers need more advanced techniques such as headless browsers or network request inspection.

Rate limiting and blocking

Requesting multiple pages in quick succession can trigger anti-bot defenses. Websites may respond with CAPTCHAs, throttled connections, or outright IP bans.

💡 This is where proxies become essential: rotating IP addresses across a proxy network helps distribute requests, making your scraper appear more like normal user traffic and reducing the risk of blocks.

Duplicate or missing data

Dynamic pagination structures can change over time, leading to skipped or duplicated results. Scrapers need logic for deduplication and error handling to ensure accuracy.

Changing pagination patterns

Sites frequently redesign their pagination systems, especially e-commerce and travel platforms. A scraper that worked last week might suddenly break if URLs, tokens, or scrolling mechanics change.

Techniques for Scraping Paginated Data

There’s no one-size-fits-all approach to scraping paginated websites. The right method depends on how the site structures its pages and loads its data. Let’s explore some techniques with code examples in Python to illustrate how they work.

Static HTML scraping (page-based pagination)

Many websites use predictable URLs like ?page=1, ?page=2, etc. A scraper can simply loop through these pages.

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/products?page={}"

for page in range(1, 6):  # scrape first 5 pages
    url = base_url.format(page)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # extract product titles
    for item in soup.select(".product-title"):
        print(item.get_text())

The script builds each paginated URL, fetches the HTML, and parses product titles. This is the simplest method but only works when the pagination pattern is numeric and consistent.

Offset-based pagination

Some websites use offsets and limits instead of page numbers, e.g., ?offset=50&limit=25.

import requests

base_url = "https://example.com/products?offset={}&limit=25"

for offset in range(0, 100, 25):  # scrape 4 pages of 25 items each
    url = base_url.format(offset)
    response = requests.get(url)
    data = response.json()  # often these endpoints return JSON
    for product in data["products"]:
        print(product["name"])

The scraper increments the offset, requests each “slice” of data, and extracts results. This approach is common in APIs and database-driven sites.

Cursor-based (token-based) pagination

Modern APIs often return a token (cursor) pointing to the next page of results.

import requests

url = "https://api.example.com/products"
params = {"limit": 25}

while True:
    data = requests.get(url, params=params).json()
    for product in data["data"]:
        print(product["name"])

    # update the cursor for the next request; stop when the API
    # returns no cursor (or a null one), which signals the last page
    cursor = data.get("next_cursor")
    if not cursor:
        break
    params["cursor"] = cursor
Instead of page numbers, the API provides a cursor (like a unique ID). The scraper updates its request with the new cursor until there are no more results.

Infinite scroll / “Load More” buttons

Sites with infinite scrolling load data dynamically via AJAX requests. To scrape them, you can either inspect the underlying network requests or use a headless browser.

import requests

url = "https://api.infatica.io/v1/scraper"
params = {
    "url": "https://example.com/products?page=1",
    "render_js": True,  # handles infinite scroll
    "pagination": "auto" # optional parameter for automated pagination
}

response = requests.get(url, params=params, auth=("API_KEY", ""))
print(response.json())

Here, the scraping API loads the page in a browser environment with JavaScript enabled, so the scrolling and the AJAX calls happen server-side and the fully rendered content comes back in the response.
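If you'd rather drive a headless browser yourself, the sketch below uses Playwright to scroll the page and harvest the rendered results. The .product-title selector, scroll count, and wait times are illustrative assumptions, not values from a real site.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/products")

    # scroll several times, pausing so the AJAX responses can arrive
    for _ in range(5):
        page.mouse.wheel(0, 10000)
        page.wait_for_timeout(1500)

    # extract the dynamically rendered titles (selector is an assumption)
    for title in page.locator(".product-title").all_text_contents():
        print(title)

    browser.close()

Alternatively, open your browser's developer tools while scrolling: if the page fetches a JSON endpoint behind the scenes, you can often call that endpoint directly with requests and skip browser automation entirely.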

Best Practices for Scraping Paginated Data

Pagination is a powerful way to structure large datasets, but scraping it efficiently requires more than just looping through URLs. To avoid incomplete results, wasted resources, or getting blocked, it’s important to follow a set of best practices.

Respect website limits

Scraping too aggressively can overload servers and trigger defenses. Always add short delays between requests or implement rate limiting in your code.
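As a minimal sketch, the page-based loop from earlier can be slowed down with a randomized delay between requests; the one-to-two-second range here is an arbitrary example, not a universal safe value.

import time
import random
import requests

for page in range(1, 6):
    response = requests.get(f"https://example.com/products?page={page}", timeout=10)
    # process the response here ...
    time.sleep(1 + random.random())  # pause 1-2 seconds; jitter looks less robotic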

Rotate IPs and user agents

Websites monitor traffic patterns to detect bots. Sending hundreds of requests from a single IP or with the same browser fingerprint is a red flag.

💡 Using a reliable proxy network solves this by rotating IPs across different regions, distributing requests, and reducing the chance of detection.
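For a DIY setup, a rough sketch looks like the following; the proxy endpoints and user-agent strings are placeholders you'd replace with real values from your provider.

import random
import requests

# placeholder proxy endpoints and user agents -- substitute your own
proxies_pool = [
    {"http": "http://user:pass@proxy1.example.com:8080",
     "https": "http://user:pass@proxy1.example.com:8080"},
    {"http": "http://user:pass@proxy2.example.com:8080",
     "https": "http://user:pass@proxy2.example.com:8080"},
]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

# each request goes out through a random proxy with a random user agent
response = requests.get(
    "https://example.com/products?page=1",
    proxies=random.choice(proxies_pool),
    headers={"User-Agent": random.choice(user_agents)},
    timeout=10,
)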

Deduplicate results

Paginated structures can change or overlap, especially with infinite scroll or token-based APIs. Implement deduplication checks to keep your datasets clean and accurate.
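A simple approach is to track a stable unique key for each record, such as a product ID or URL, in a set. The sketch below assumes each scraped record carries an "id" field; substitute whatever unique key your target site provides.

seen_ids = set()
unique_products = []

for product in scraped_products:  # records collected by any of the loops above
    key = product["id"]           # assumes a stable unique identifier exists
    if key not in seen_ids:
        seen_ids.add(key)
        unique_products.append(product)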

Monitor for structural changes

Websites frequently update their design and pagination systems. Automating health checks for your scraper helps you detect issues early and avoid data gaps.
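A lightweight health check might fetch a single known page and verify that the selectors your scraper depends on still match; the selectors below are placeholders for whatever markup your scraper relies on.

import requests
from bs4 import BeautifulSoup

def pagination_health_check():
    resp = requests.get("https://example.com/products?page=1", timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    # if the expected elements vanish, the site layout probably changed
    if not soup.select(".product-title") or not soup.select(".pagination"):
        raise RuntimeError("Pagination markup changed - update the scraper")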

Cache and reuse data when possible

Instead of scraping the same page repeatedly, store results locally and refresh only when necessary. This saves bandwidth, reduces costs, and lessens the load on target sites.
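One minimal sketch of an on-disk cache keyed by URL, with a one-day freshness window (an arbitrary choice), might look like this:

import hashlib
import json
import time
from pathlib import Path

import requests

CACHE_DIR = Path("cache")
CACHE_DIR.mkdir(exist_ok=True)
MAX_AGE = 24 * 3600  # refresh anything older than one day

def fetch_cached(url):
    name = hashlib.md5(url.encode()).hexdigest() + ".json"
    cache_file = CACHE_DIR / name
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < MAX_AGE:
        return json.loads(cache_file.read_text())  # fresh enough: reuse it
    data = requests.get(url, timeout=10).json()    # stale or missing: refetch
    cache_file.write_text(json.dumps(data))
    return data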

Prioritize ethics and compliance

Always review a site’s robots.txt file and terms of service. Ethical scraping not only protects you legally but also builds long-term sustainability for your data pipelines.
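Python's standard library includes a robots.txt parser, so this check can be automated before each crawl; the "MyScraperBot" user agent below is a placeholder.

from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# only fetch pages the site's robots.txt allows for your user agent
if robots.can_fetch("MyScraperBot", "https://example.com/products?page=1"):
    print("Allowed to fetch")
else:
    print("Disallowed by robots.txt - skipping")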

When to Use a Scraping API vs. DIY Scraping

Both DIY scrapers and managed scraping APIs can handle pagination, but the right choice depends on your project’s scale, complexity, and resources. Let's compare the two approaches:

Setup & Maintenance
  DIY Scraping: Requires building and updating code for pagination, proxies, and anti-bot measures.
  Web Scraping API: No setup needed; the API handles pagination, proxies, and rendering automatically.

Scalability
  DIY Scraping: Limited by your infrastructure and IP pool; scaling requires more resources.
  Web Scraping API: Scales easily to millions of requests with built-in proxy rotation and load balancing.

Handling JavaScript / Infinite Scroll
  DIY Scraping: Needs headless browsers (e.g., Selenium, Playwright), which are slow and resource-intensive.
  Web Scraping API: JavaScript rendering supported out of the box, with no extra setup required.

Error Handling
  DIY Scraping: Must implement retries, deduplication, and monitoring manually.
  Web Scraping API: Automatic retries, deduplication, and error handling included.

Costs
  DIY Scraping: Lower initial cost, but hidden expenses for proxies, servers, and maintenance.
  Web Scraping API: Transparent pricing; one service covers proxies, scraping logic, and infrastructure.

Best For
  DIY Scraping: Small projects, experiments, or scraping static HTML sites.
  Web Scraping API: Large-scale, dynamic, or time-sensitive scraping projects.

Frequently Asked Questions

What is pagination in web scraping?

Pagination refers to splitting content across multiple pages, like product listings or search results. In web scraping, handling pagination correctly ensures you capture the complete dataset instead of just the first page, avoiding incomplete or duplicated data.

What are the common types of pagination?

Websites use several pagination methods: page-based (numeric URLs), offset-based, cursor- or token-based APIs, and infinite scroll/load-more buttons. Each type requires different scraping strategies to navigate pages and extract data reliably.

Why do websites block scrapers on paginated content?

High-frequency requests, repetitive IP addresses, and predictable patterns can trigger anti-bot measures like CAPTCHAs or IP bans. Using proxies and rotating user agents helps distribute requests, reducing the risk of being blocked.

How can I scrape infinite scroll pages?

Infinite scroll or dynamic pagination often relies on JavaScript. Scrapers can use headless browsers like Selenium or Puppeteer, or a Web Scraping API that automatically renders JS content and handles pagination without extra setup.

Should I build my own scraper or use a scraping API?

DIY scraping works for small, static sites. For large-scale, dynamic, or JS-heavy sites, a scraping API simplifies pagination, manages proxies, and handles errors automatically, saving time and improving reliability for enterprise-level projects.


Pavlo Zinkovski

As Infatica's CTO & CEO, Pavlo shares his knowledge of the technical fundamentals of proxies.
