Pagination in Web Scraping: From Page Numbers to Infinite Scroll

Struggling with paginated websites? Explore proven scraping techniques, code snippets, and how proxies + APIs help overcome blocks and scalability issues.

Pavlo Zinkovski · 8 min read

Article content
  1. Common Types of Pagination
  2. Challenges in Scraping Paginated Content
  3. Techniques for Scraping Paginated Data
  4. Best Practices for Scraping Paginated Data
  5. When to Use a Scraping API vs. DIY Scraping
  6. Frequently Asked Questions

Pagination introduces an extra layer of complexity for web scrapers. Instead of extracting all the data from a single page, scrapers must be designed to navigate through multiple pages to capture complete datasets. Missing just one step in the pagination process can result in incomplete or inaccurate results – a major issue for projects that rely on comprehensive data. Let's explore the most common types of pagination, the obstacles they present, and the best practices for scraping paginated data efficiently and ethically.

Common Types of Pagination

Websites implement pagination in different ways depending on their design and performance needs. For web scrapers, recognizing the underlying pagination method is the first step toward building an effective extraction strategy.

Page-based pagination

https://example.com/products?page=3

The simplest form, where each page is identified by a page number in the URL. Easy to scrape since the structure is predictable, but you have to ensure that you don’t skip or duplicate pages when page counts change.

Offset-based pagination

https://example.com/products?offset=50&limit=25

Content is divided based on a starting point (offset) and number of results (limit). Useful for databases and APIs. However, large offsets can slow queries or trigger anti-bot measures.

Cursor-based (or token-based) pagination

https://api.example.com/products?cursor=eyJpZCI6IjEyMyJ9

Instead of page numbers or offsets, the site provides a cursor or token that points to the next set of results. Common in modern APIs and social media feeds. The challenging part is that the tokens often expire quickly and require careful handling.

Infinite scrolling / “Load More” buttons

Results load dynamically as the user scrolls down or clicks a “Load More” button. Powered by JavaScript and AJAX requests. Requires headless browsers or API call inspection, since content isn’t always visible in the initial HTML.

Challenges in Scraping Paginated Content

While pagination helps websites stay organized and efficient, it also adds extra hurdles for web scraping projects. Each pagination type introduces its own complications, and failing to account for them can leave your dataset incomplete or inconsistent. Below are the most common challenges:

Large data volumes

Paginated sites often contain thousands of entries spread across dozens or even hundreds of pages. Scrapers must be designed to handle this volume without crashing, stalling, or missing data.

JavaScript-rendered pagination

Infinite scroll and “Load More” features rely on JavaScript and background AJAX requests. Since the data isn’t present in the initial HTML, scrapers need more advanced techniques such as headless browsers or network request inspection.

Rate limiting and blocking

Requesting multiple pages in quick succession can trigger anti-bot defenses. Websites may respond with CAPTCHAs, throttled connections, or outright IP bans.

💡 This is where proxies become essential: rotating IP addresses across a proxy network helps distribute requests, making your scraper appear more like normal user traffic and reducing the risk of blocks.

Duplicate or missing data

Dynamic pagination structures can change over time, leading to skipped or duplicated results. Scrapers need logic for deduplication and error handling to ensure accuracy.

Changing pagination patterns

Sites frequently redesign their pagination systems, especially e-commerce and travel platforms. A scraper that worked last week might suddenly break if URLs, tokens, or scrolling mechanics change.

Techniques for Scraping Paginated Data

There’s no one-size-fits-all approach to scraping paginated websites. The right method depends on how the site structures its pages and loads its data. Let’s explore some techniques with code examples in Python to illustrate how they work.

Static HTML scraping (page-based pagination)

Many websites use predictable URLs like ?page=1, ?page=2, etc. A scraper can simply loop through these pages.

import requests
from bs4 import BeautifulSoup

base_url = "https://example.com/products?page={}"

for page in range(1, 6):  # scrape first 5 pages
    url = base_url.format(page)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # extract product titles
    for item in soup.select(".product-title"):
        print(item.get_text())

The script builds each paginated URL, fetches the HTML, and parses product titles. This is the simplest method but only works when the pagination pattern is numeric and consistent.

Offset-based pagination

Some websites use offsets and limits instead of page numbers, e.g., ?offset=50&limit=25.

import requests

base_url = "https://example.com/products?offset={}&limit=25"

for offset in range(0, 100, 25):  # scrape 4 pages of 25 items each
    url = base_url.format(offset)
    response = requests.get(url)
    data = response.json()  # often these endpoints return JSON
    for product in data["products"]:
        print(product["name"])

The scraper increments the offset, requests each “slice” of data, and extracts results. This approach is common in APIs and database-driven sites.

Cursor-based (token-based) pagination

Modern APIs often return a token (cursor) pointing to the next page of results.

import requests

url = "https://api.example.com/products"
params = {"limit": 25}

while True:
    data = requests.get(url, params=params).json()
    for product in data["data"]:
        print(product["name"])

    # update the cursor for the next request; stop when the API
    # returns no cursor (or a null one), which signals the last page
    cursor = data.get("next_cursor")
    if not cursor:
        break
    params["cursor"] = cursor
Instead of page numbers, the API provides a cursor (like a unique ID). The scraper updates its request with the new cursor until there are no more results.

Infinite scroll / “Load More” buttons

Sites with infinite scrolling load data dynamically via AJAX requests. To scrape them, you can either inspect the underlying network requests or use a headless browser.

import requests

url = "https://api.infatica.io/v1/scraper"
params = {
    "url": "https://example.com/products?page=1",
    "render_js": True,  # handles infinite scroll
    "pagination": "auto" # optional parameter for automated pagination
}

response = requests.get(url, params=params, auth=("API_KEY", ""))
print(response.json())

Here, the scraping API loads the page in a browser environment with JavaScript enabled, so the scrolling and the AJAX calls happen server-side and the fully rendered content comes back in the response.
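If you'd rather drive a headless browser yourself, the sketch below uses Playwright to scroll the page and harvest the rendered results. The .product-title selector, scroll count, and wait times are illustrative assumptions, not values from a real site.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/products")

    # scroll several times, pausing so the AJAX responses can arrive
    for _ in range(5):
        page.mouse.wheel(0, 10000)
        page.wait_for_timeout(1500)

    # extract the dynamically rendered titles (selector is an assumption)
    for title in page.locator(".product-title").all_text_contents():
        print(title)

    browser.close()

Alternatively, open your browser's developer tools while scrolling: if the page fetches a JSON endpoint behind the scenes, you can often call that endpoint directly with requests and skip browser automation entirely.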

Best Practices for Scraping Paginated Data

Pagination is a powerful way to structure large datasets, but scraping it efficiently requires more than just looping through URLs. To avoid incomplete results, wasted resources, or getting blocked, it’s important to follow a set of best practices.

Respect website limits

Scraping too aggressively can overload servers and trigger defenses. Always add short delays between requests or implement rate limiting in your code.
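As a minimal sketch, the page-based loop from earlier can be slowed down with a randomized delay between requests; the one-to-two-second range here is an arbitrary example, not a universal safe value.

import time
import random
import requests

for page in range(1, 6):
    response = requests.get(f"https://example.com/products?page={page}", timeout=10)
    # process the response here ...
    time.sleep(1 + random.random())  # pause 1-2 seconds; jitter looks less robotic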

Rotate IPs and user agents

Websites monitor traffic patterns to detect bots. Sending hundreds of requests from a single IP or with the same browser fingerprint is a red flag.

💡 Using a reliable proxy network solves this by rotating IPs across different regions, distributing requests, and reducing the chance of detection.
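For a DIY setup, a rough sketch looks like the following; the proxy endpoints and user-agent strings are placeholders you'd replace with real values from your provider.

import random
import requests

# placeholder proxy endpoints and user agents -- substitute your own
proxies_pool = [
    {"http": "http://user:pass@proxy1.example.com:8080",
     "https": "http://user:pass@proxy1.example.com:8080"},
    {"http": "http://user:pass@proxy2.example.com:8080",
     "https": "http://user:pass@proxy2.example.com:8080"},
]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

# each request goes out through a random proxy with a random user agent
response = requests.get(
    "https://example.com/products?page=1",
    proxies=random.choice(proxies_pool),
    headers={"User-Agent": random.choice(user_agents)},
    timeout=10,
)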

Deduplicate results

Paginated structures can change or overlap, especially with infinite scroll or token-based APIs. Implement deduplication checks to keep your datasets clean and accurate.
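A simple approach is to track a stable unique key for each record, such as a product ID or URL, in a set. The sketch below assumes each scraped record carries an "id" field; substitute whatever unique key your target site provides.

seen_ids = set()
unique_products = []

for product in scraped_products:  # records collected by any of the loops above
    key = product["id"]           # assumes a stable unique identifier exists
    if key not in seen_ids:
        seen_ids.add(key)
        unique_products.append(product)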

Monitor for structural changes

Websites frequently update their design and pagination systems. Automating health checks for your scraper helps you detect issues early and avoid data gaps.
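A lightweight health check might fetch a single known page and verify that the selectors your scraper depends on still match; the selectors below are placeholders for whatever markup your scraper relies on.

import requests
from bs4 import BeautifulSoup

def pagination_health_check():
    resp = requests.get("https://example.com/products?page=1", timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    # if the expected elements vanish, the site layout probably changed
    if not soup.select(".product-title") or not soup.select(".pagination"):
        raise RuntimeError("Pagination markup changed - update the scraper")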

Cache and reuse data when possible

Instead of scraping the same page repeatedly, store results locally and refresh only when necessary. This saves bandwidth, reduces costs, and lessens the load on target sites.
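One minimal sketch of an on-disk cache keyed by URL, with a one-day freshness window (an arbitrary choice), might look like this:

import hashlib
import json
import time
from pathlib import Path

import requests

CACHE_DIR = Path("cache")
CACHE_DIR.mkdir(exist_ok=True)
MAX_AGE = 24 * 3600  # refresh anything older than one day

def fetch_cached(url):
    name = hashlib.md5(url.encode()).hexdigest() + ".json"
    cache_file = CACHE_DIR / name
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < MAX_AGE:
        return json.loads(cache_file.read_text())  # fresh enough: reuse it
    data = requests.get(url, timeout=10).json()    # stale or missing: refetch
    cache_file.write_text(json.dumps(data))
    return data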

Prioritize ethics and compliance

Always review a site’s robots.txt file and terms of service. Ethical scraping not only protects you legally but also builds long-term sustainability for your data pipelines.
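Python's standard library includes a robots.txt parser, so this check can be automated before each crawl; the "MyScraperBot" user agent below is a placeholder.

from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# only fetch pages the site's robots.txt allows for your user agent
if robots.can_fetch("MyScraperBot", "https://example.com/products?page=1"):
    print("Allowed to fetch")
else:
    print("Disallowed by robots.txt - skipping")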

When to Use a Scraping API vs. DIY Scraping

Both DIY scrapers and managed scraping APIs can handle pagination, but the right choice depends on your project’s scale, complexity, and resources. Let's compare the two approaches:

Setup & Maintenance
  DIY Scraping: Requires building and updating code for pagination, proxies, and anti-bot measures.
  Web Scraping API: No setup needed; the API handles pagination, proxies, and rendering automatically.

Scalability
  DIY Scraping: Limited by your infrastructure and IP pool; scaling requires more resources.
  Web Scraping API: Scales easily to millions of requests with built-in proxy rotation and load balancing.

Handling JavaScript / Infinite Scroll
  DIY Scraping: Needs headless browsers (e.g., Selenium, Playwright), which are slow and resource-intensive.
  Web Scraping API: JavaScript rendering supported out of the box, with no extra setup required.

Error Handling
  DIY Scraping: Must implement retries, deduplication, and monitoring manually.
  Web Scraping API: Automatic retries, deduplication, and error handling included.

Costs
  DIY Scraping: Lower initial cost, but hidden expenses for proxies, servers, and maintenance.
  Web Scraping API: Transparent pricing; one service covers proxies, scraping logic, and infrastructure.

Best For
  DIY Scraping: Small projects, experiments, or scraping static HTML sites.
  Web Scraping API: Large-scale, dynamic, or time-sensitive scraping projects.

Frequently Asked Questions

What is pagination in web scraping?

Pagination refers to splitting content across multiple pages, like product listings or search results. In web scraping, handling pagination correctly ensures you capture the complete dataset instead of just the first page, avoiding incomplete or duplicated data.

What are the common types of pagination?

Websites use several pagination methods: page-based (numeric URLs), offset-based, cursor- or token-based APIs, and infinite scroll/load-more buttons. Each type requires different scraping strategies to navigate pages and extract data reliably.

Why do websites block scrapers on paginated content?

High-frequency requests, repetitive IP addresses, and predictable patterns can trigger anti-bot measures like CAPTCHAs or IP bans. Using proxies and rotating user agents helps distribute requests, reducing the risk of being blocked.

How can I scrape infinite scroll pages?

Infinite scroll or dynamic pagination often relies on JavaScript. Scrapers can use headless browsers like Selenium or Puppeteer, or a Web Scraping API that automatically renders JS content and handles pagination without extra setup.

Should I build my own scraper or use a scraping API?

DIY scraping works for small, static sites. For large-scale, dynamic, or JS-heavy sites, a scraping API simplifies pagination, manages proxies, and handles errors automatically, saving time and improving reliability for enterprise-level projects.


Pavlo Zinkovski

As Infatica's CTO & CEO, Pavlo shares his knowledge of the technical fundamentals of proxies.
