CSS Selectors Cheat Sheet for Web Scraping

Pavlo Zinkovski 21 Jun 2023 10 min read

Article content

What’s the use of CSS selectors in web scraping?
How to handle dynamic and interactive web pages with CSS selectors
How to deal with common challenges and errors when using CSS selectors
How to use advanced CSS selectors features such as pseudo-classes, pseudo-elements, and functions
Basic selectors cheat sheet
Attribute selectors cheat sheet
Combinators cheat sheet
Pseudo-classes cheat sheet
Pseudo-elements cheat sheet
Functions cheat sheet
Frequently Asked Questions

CSS selectors are patterns that match one or more elements in an HTML document. They are essential for web scraping, as they allow you to target the data you want to extract from web pages. In this cheat sheet, you will learn how to use different types of CSS selectors, such as basic selectors, attribute selectors, combinators, pseudo-classes, pseudo-elements, and functions.

You will also learn how to handle dynamic and interactive web pages with CSS selectors, how to deal with common challenges and errors when using CSS selectors, and how to use advanced CSS selectors features. By the end of this cheat sheet, you will be able to use CSS selectors confidently and effectively for web scraping.

What’s the use of CSS selectors in web scraping?

Inspecting a web page with developer tools

CSS selectors are patterns that match one or more elements in an HTML document. They are commonly used to apply styles to those elements using CSS (Cascading Style Sheets), but they can also be used for web scraping. By using CSS selectors, you can tell your scraper which elements contain the data you want to collect.

For example, suppose you want to scrape the titles of all the articles from a blog. You can inspect the HTML structure of the page using your browser's developer tools and find out that the titles are inside <h4> tags with a class of card-title. You can then use the following CSS selector to target those elements:

h4.card-title

This selector will match all <h4> elements that have a card-title class attribute. You can then extract the text content of those elements and save it to a file or database.

There are many types of CSS selectors, such as class selectors, ID selectors, attribute selectors, pseudo-class selectors, pseudo-element selectors, and combinators. Each type has a different syntax and functionality. For example, a[href^="https"] is an attribute selector that matches all <a> elements whose href attribute value begins with "https".

You can also combine multiple selectors using combinators, such as descendant combinator ( ), child combinator (>), adjacent sibling combinator (+), general sibling combinator (~), or list combinator (,). For example, div p is a descendant combinator that matches all <p> elements inside <div> elements.

CSS selectors are very powerful and flexible for web scraping, as they allow you to select elements based on their properties, relationships, and states. You can use various online tools to test your CSS selectors and see how they work.

How to handle dynamic and interactive web pages with CSS selectors

Some web pages are not static, but dynamic and interactive: This means that they may change their content or layout based on user actions, such as clicking, scrolling, hovering, or resizing. They may also use JavaScript to modify the HTML structure of the page. This can pose a challenge for web scraping using CSS selectors, as the selectors may not work consistently or reliably.

To handle dynamic and interactive web pages with CSS selectors, you need to inspect the page carefully and find out how it changes and what triggers the changes. You also need to use the right tools and techniques to capture the data you want.

Some tips for handling dynamic and interactive web pages with CSS selectors are:

Use your browser's developer tools to inspect the HTML structure of the page and see how it changes when you interact with it. You can also use the console to run JavaScript commands or CSS queries on the page.
Use selectors that are specific enough to target the elements you want, but not too specific that they break when the page changes. For example, avoid using selectors that rely on fixed attributes or positions, such as iframe[id="ext-gen262"] or div:nth-child(3). Instead, use selectors that rely on meaningful attributes or relationships, such as iframe[src*="example.com"] or div.card-title.
Use pseudo-classes and pseudo-elements to target elements based on their states or parts. For example, use a:hover to target links on mouse over, or p::first-line to target the first line of every paragraph.
Use combinators to combine multiple selectors and target elements based on their relationships. For example, use div p to target all paragraphs inside divs, or li + a to target the first link after each list item.
Use functions to target elements based on complex criteria. For example, use :not() to exclude elements that match a selector, or :nth-child() to target elements based on their position among siblings.

How to deal with common challenges and errors when using CSS selectors

When using CSS selectors for web scraping, you may encounter some common challenges and errors that can affect your results. Some of these are:

The page is not fully loaded when you run your scraper. This can cause some elements to be missing or incomplete in the HTML source code. To avoid this, you need to wait for the page to load completely before running your scraper. You can use tools such as Selenium or Puppeteer to automate this process and control the browser behavior.

The page uses JavaScript to modify its content dynamically. This can cause some elements to change or disappear in the HTML source code after you run your scraper. To avoid this, you need to capture the data before it changes or disappears. You can use tools such as Selenium or Puppeteer to execute JavaScript commands on the page and access the data directly from the DOM.

The page has anti-scraping measures that detect and block your scraper. This can cause your scraper to fail or return inaccurate data. To avoid this, you need to disguise your scraper as a normal browser and avoid sending too many requests in a short time. You can use tools such as Infatica proxies to bypass anti-scraping measures and access the data you want.

How to use advanced CSS selectors features such as pseudo-classes, pseudo-elements, and functions

CSS selectors have some advanced features that can help you target elements more precisely and flexibly. Some of these features are:

Pseudo-classes: These are keywords that start with a colon (:) and specify a special state of an element. For example, a:visited matches links that have already been clicked, or input:checked matches checked input elements.

Pseudo-elements: These are keywords that start with a double colon (::) and specify a part of an element. For example, p::first-letter matches the first letter of every paragraph, or div::after inserts content after every div element.

Functions: These are keywords that start with a colon (:) and take arguments in parentheses (()). For example, :not(.special) matches elements that do not have a special class attribute, or :nth-child(odd) matches elements that are odd among their siblings.

You can use these features to target elements based on their properties, relationships, and states. You can also combine them with other selectors and combinators to create complex selectors. For example, you can use p:not(:first-child)::first-line to match the first line of every paragraph except the first one, or div > p:nth-of-type(2n+1) to match every odd paragraph that is a direct child of a div element.

CSS selectors cheat sheet

There are multiple categories of CSS selectors, each featuring dozens of unique selectors. For each major category, we’ve selected five popular selectors:

Basic selectors cheat sheet

p: Selects all <p> elements and makes their text blue.

p {
  color: blue;
}

.special: Selects all elements with class="special" and makes their background yellow.

.special {
  background: yellow;
}

#intro: Selects the element with id="intro" and makes its font size larger.

#intro {
  font-size: 2em;
}

*: Selects all elements and adds a border around them.

* {
  border: 1px solid black;
}

element.class: Selects all elements with a specific tag name and class name. For example, p.special selects all <p> elements with class="special" and makes their text italic.

p.special {
  font-style: italic;
}

Attribute selectors cheat sheet

[attribute]: Selects all elements with a specific attribute. For example, [href] selects all elements with an href attribute and makes their text green.

[href] {
  color: green;
}

[attribute=value]: Selects all elements with a specific attribute and value. For example, [href="https://example.com"] selects all elements with href="https://example.com" and makes their text red.

[href="https://example.com"] {
  color: red;
}

[attribute^=value]: Selects all elements with an attribute value that begins with a specific value. For example, [href^="https"] selects all elements with an href value that begins with "https" and makes their text bold.

[href^="https"] {
  font-weight: bold;
}

[attribute$=value]: Selects all elements with an attribute value that ends with a specific value. For example, [href$=".pdf"] selects all elements with an href value that ends with ".pdf" and makes their text underline.

[href$=".pdf"] {
  text-decoration: underline;
}

[attribute*=value]: Selects all elements with an attribute value that contains a specific value. For example, [href*="infatica"] selects all elements with an href value that contains "infatica" and makes their text purple.

[href*="infatica"] {
  color: purple;
}

Combinators cheat sheet

element element: Selects all elements that are descendants of a specific element. For example, div p selects all <p> elements that are inside <div> elements and makes their text align center.

 div p {
   text-align: center;
 }

element > element: Selects all elements that are direct children of a specific element. For example, div > p selects all <p> elements that are direct children of <div> elements and makes their text align left.

div > p {
  text-align: left;
}

element + element: Selects all elements that are adjacent siblings of a specific element. For example, div + p selects all <p> elements that are immediately after <div> elements and makes their text align right.

div + p {
  text-align: right;
}

element ~ element: Selects all elements that are general siblings of a specific element. For example, div ~ p selects all <p> elements that are after <div> elements and makes their text align justify.

div ~ p {
  text-align: justify;
}

element, element: Selects all elements that match either of the two selectors. For example, div, p selects all <div> and <p> elements and makes their margin 10 pixels.

div, p {
  margin: 10px;
}

Pseudo-classes cheat sheet

:hover: Selects an element when the mouse pointer is over it. For example, a:hover selects links on mouse over and makes their text orange.

a:hover {
  color: orange;
}

:visited: Selects links that have already been visited. For example, a:visited selects links that have been clicked before and makes their text gray.

a:visited {
  color: gray;
}

:first-child: Selects an element that is the first child of its parent. For example, p:first-child selects every <p> element that is the first child of its parent and makes its text larger.

p:first-child {
  font-size: 1.5em;
}

:nth-child(n): Selects an element that is the nth child of its parent. For example, p:nth-child(2) selects every <p> element that is the second child of its parent and makes its text smaller.

p:nth-child(2) {
  font-size: 0.8em;
}

:not(selector): Selects every element that does not match the selector. For example, p:not(.special) selects every <p> element that does not have a special class attribute and makes its text black.

p:not(.special) {
  color: black;
}

Pseudo-elements cheat sheet

::first-letter: Selects the first letter of an element. For example, p::first-letter selects the first letter of every <p> element and makes it uppercase.

p::first-letter {
  text-transform: uppercase;
}

::first-line: Selects the first line of an element. For example, p::first-line selects the first line of every <p> element and makes it italic.

p::first-line {
  font-style: italic;
}

::before: Inserts content before an element. For example, div::before {content: "Hello";} inserts “Hello” before every <div> element.

div::before {
  content: "Hello";
}

::after: Inserts content after an element. For example, div::after {content: "Bye";} inserts “Bye” after every <div> element.

div::after {
  content: "Bye";
}

::selection: Selects the portion of an element that is selected by the user. For example, p::selection {color: red;} changes the color of the selected text in every <p> element to red.

p::selection {
  color: red;
}

Functions cheat sheet

:has(selector): Selects an element if it contains another element that matches the selector. For example, div:has(p) selects every <div> element that contains a <p> element.

 div:has(p) {
   border: 1px solid blue;
 }

:matches(selector1, selector2, ...): Selects an element if it matches any of the selectors. For example, p:matches(.special, .important) selects every <p> element that has either a special or an important class attribute.

p:matches(.special, .important) {
  font-weight: bold;
}

:nth-child(an+b): Selects an element that is the an+bth child of its parent. For example, p:nth-child(2n+1) selects every odd <p> element among its siblings.

p:nth-child(2n+1) {
  background: pink;
}

:nth-last-child(an+b): Selects an element that is the an+bth child of its parent, counting from the last child. For example, p:nth-last-child(2n+1) selects every odd <p> element among its siblings, counting from the last one.

p:nth-last-child(2n+1) {
  background: lightblue;
}

:nth-of-type(an+b): Selects an element that is the an+bth element of its type among its siblings. For example, p:nth-of-type(2n+1) selects every odd <p> element among its siblings of the same type.

p:nth-of-type(2n+1) {
  background: yellow;

Frequently Asked Questions

A CSS selector is a pattern that matches one or more elements in the HTML document. You can use selectors to apply styles to those elements. For example, p is a selector that matches all <p> elements.

There are many types of CSS selectors, such as class selectors, ID selectors, attribute selectors, pseudo-class selectors, pseudo-element selectors, and combinators. Each type has a different syntax and functionality. For example, .intro is a class selector that matches all elements with class="intro".

You can use online tools such as CSS Selector Tester or Selector Playground to test your CSS selectors. You can also use the developer tools in your browser to inspect the elements and their styles.

You can combine CSS selectors using combinators, such as child combinator (>), adjacent sibling combinator (+), general sibling combinator (~), or list combinator (,). For example, div p is a descendant combinator that matches all <p> elements inside <div> elements.

Pseudo-classes and pseudo-elements are keywords that start with a colon (:) or a double colon (::) and specify a special state or part of an element. For example, a:hover is a pseudo-class selector that matches links on mouse over, and p::first-letter is a pseudo-element selector that matches the first letter of every <p> element.

Contact Sales

Web scraping

Pavlo Zinkovski

As infatica`s CTO & CEO, Pavlo shares the knowledge on the technical fundamentals of proxies.

CSS selectors cheat sheet for web scraping

What’s the use of CSS selectors in web scraping?

How to handle dynamic and interactive web pages with CSS selectors

How to deal with common challenges and errors when using CSS selectors

How to use advanced CSS selectors features such as pseudo-classes, pseudo-elements, and functions

CSS selectors cheat sheet

Basic selectors cheat sheet

Attribute selectors cheat sheet

Combinators cheat sheet

Pseudo-classes cheat sheet

Pseudo-elements cheat sheet

Functions cheat sheet

Frequently Asked Questions

You can also learn more about:

CSS selectors cheat sheet for web scraping

What’s the use of CSS selectors in web scraping?

How to handle dynamic and interactive web pages with CSS selectors

How to deal with common challenges and errors when using CSS selectors

How to use advanced CSS selectors features such as pseudo-classes, pseudo-elements, and functions

CSS selectors cheat sheet

Basic selectors cheat sheet

Attribute selectors cheat sheet

Combinators cheat sheet

Pseudo-classes cheat sheet

Pseudo-elements cheat sheet

Functions cheat sheet

Frequently Asked Questions

What is a CSS selector?

What are the different types of CSS selectors?

How do I test CSS selectors?

How do I combine CSS selectors?

How do I use pseudo-classes and pseudo-elements in CSS selectors?

You can also learn more about: