HTTP cookies are an essential part of the modern web – when a user visits practically any website, they’re greeted with a cookie consent window. With the rise of cross-site tracking and ad networks, cookies became a concern for user privacy. So what are HTTP cookies and what role do they play in the user's web browser? How important are they for storing data? How are first- and third-party cookies different? How should web developers create them the right way? In this guide, we’re taking a closer look at this technology and answering all of these questions.
What are HTTP Cookies?
An HTTP cookie is a small piece of data exchanged between the web browser (e.g. Mozilla Firefox) and the target web server (e.g. wikipedia.org). When you visit a website for the first time, the server can send cookies in its HTTP response headers, and your browser stores them on your device together with their parameters. Cookies are typically used for saving session state, i.e. the data that lets the server recognize your browser between requests. You may also hear people refer to them as “web cookies”, “internet cookies”, and “browser cookies”.
Let’s take a look at the cookies themselves. We can do this via your browser’s developer tools, which are typically accessible by pressing F12. In Google Chrome, for instance, you can then navigate to the Application tab (select it from the top menu) and open the Cookies submenu (select it from the list on the left). This will show you a table containing all first- and third-party cookies that your browser acquired on the given website.
We can also see that each HTTP cookie has certain parameters like:
- Name: cookie name.
- Value: cookie value.
- Domain: target servers that can acquire the given cookie.
- Expires / Max-Age: the cookie's expiration date or max age attribute.
- Size: The cookie's size, in bytes.
- And more.
Why is it called a “cookie”? The technology’s name is tied to the cookie-the-baked-treat, so designers always add icons of these snacks to cookie consent pop-ups. Here’s the explanation: Fortune cookies (those with messages on a paper inside them) inspired the Unix term magic cookies, which denoted a small piece of data exchanged between computer programs.
How HTTP Cookies Work
When a user visits a website, the web server can send a Set-Cookie HTTP header in the response. This header contains the cookie's name and value, and optionally other attributes such as Expires, Max-Age, Domain, Path, and Secure. For example, a simple Set-Cookie header might look like this:
Set-Cookie: sessionId=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT
This instructs the user's browser to store the cookie and send it back to the server with subsequent requests to the same domain.
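As a rough illustration of how such a header could be assembled on the server, here is a minimal sketch. The function name serializeCookie and its option keys are assumptions made for this example, not part of any standard API.

```javascript
// Build a Set-Cookie header value from a name, a value, and optional attributes.
// serializeCookie and its option names are illustrative, not a standard API.
function serializeCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.expires instanceof Date) parts.push(`Expires=${options.expires.toUTCString()}`);
  if (Number.isInteger(options.maxAge)) parts.push(`Max-Age=${options.maxAge}`);
  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.secure) parts.push("Secure");
  return parts.join("; ");
}

// Reproduce the example header from the text (months are zero-based, so 5 = June).
const exampleHeader = serializeCookie("sessionId", "abc123", {
  expires: new Date(Date.UTC(2021, 5, 9, 10, 18, 14)),
});
console.log(`Set-Cookie: ${exampleHeader}`);
```

The value is passed through encodeURIComponent because characters such as semicolons and spaces would otherwise break the header format.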
How Browsers Store and Send Cookies
The browser performs cookie management locally on the user's device. Cookies are associated with the domain from which they came, and the browser ensures that only cookies matching the domain and path of subsequent requests are included in the Cookie HTTP header. When the user makes another request to the same domain, the browser includes the stored cookies in the request header like so:
Cookie: sessionId=abc123
This way, the server receives the cookie information with each request, allowing it to maintain a stateful session with the browser.
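On the server side, the Cookie header arrives as a single string of name=value pairs separated by semicolons. The sketch below shows one way to turn it into an object; parseCookieHeader is a hypothetical helper written for this guide.

```javascript
// Parse a Cookie request header ("a=1; b=2") into a plain object.
// parseCookieHeader is a hypothetical helper, not a built-in Node.js function.
function parseCookieHeader(header) {
  const cookies = {};
  if (!header) return cookies;
  for (const pair of header.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip malformed pairs without "="
    const name = pair.slice(0, index).trim();
    const rawValue = pair.slice(index + 1).trim();
    if (!name) continue;
    try {
      cookies[name] = decodeURIComponent(rawValue);
    } catch {
      cookies[name] = rawValue; // keep the raw value if it is not valid percent-encoding
    }
  }
  return cookies;
}

const parsedCookies = parseCookieHeader("sessionId=abc123; theme=dark");
console.log(parsedCookies.sessionId);
```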
Recognizing and Identifying Users
Cookies allow websites to recognize users by assigning them a unique identifier (usually a random string). When a user returns to a website, the server can read the unique identifier from the cookie and match it to the user's previous session. This enables the server to deliver a personalized experience, such as remembering user preferences, login states, and providing targeted content. Tracking cookies, in particular, can follow a user's browsing activity across different sites, which is used for targeted advertising and analytics.
By using cookies, websites can maintain user sessions, store user preferences, and track user behavior, which is essential for providing a seamless and personalized web experience.
Purpose of HTTP Cookies
Thanks to HTTP cookies, modern web applications are vastly different from their Web 1.0 counterparts: they’re interactive and dynamic. Shopping gets quicker, logging in gets easier, and major platforms like Google get uncannily adept at targeting ads at you:
Personalization
While old(er) static websites stay the same day in, day out, each HTTP cookie helps create the illusion of modern websites “getting to know you” (i.e. personalization), showing you more relevant content the more you browse them. Websites typically provide specialized menus for customizing their appearance (e.g. font size), language, and functionality. For instance, the DuckDuckGo user preferences let you choose the web page’s colors and save this data in a cookie, which the browser sends back on the next visit.
State/Session Management
The cookie's original goal was realizing the concept of an online shopping cart – a simple website feature that was previously impossible without a way to exchange data between the browser and the web server. Nowadays, the server exchanges session identifier cookies with the user each time the latter makes a request, while the server stores all users’ shopping cart data in its database. This way, it can use a cookie with a session identifier to display the correct shopping cart.
Identifiers also help with login sessions: when the user opens a login page, the browser receives a cookie with a unique session identifier. If the user logs in successfully, the server marks that session as authenticated and associates it with the given cookie.
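The session mechanism described above can be sketched as follows. This is a simplified illustration: the in-memory Map stands in for the server's database, and the names createSession, addToCart, and logIn are made up for this example.

```javascript
const crypto = require("node:crypto");

// In-memory session store: session ID -> session data.
// A real site would typically keep this in a database (assumption for this sketch).
const sessions = new Map();

// Create a session with an unpredictable identifier; the ID is what goes into the cookie.
function createSession() {
  const sessionId = crypto.randomBytes(16).toString("hex");
  sessions.set(sessionId, { cart: [], loggedInUser: null });
  return sessionId;
}

// Add an item to the cart that belongs to the given session ID.
function addToCart(sessionId, item) {
  const session = sessions.get(sessionId);
  if (!session) return false; // unknown or expired session
  session.cart.push(item);
  return true;
}

// Mark the session as authenticated after a successful login.
function logIn(sessionId, username) {
  const session = sessions.get(sessionId);
  if (!session) return false;
  session.loggedInUser = username;
  return true;
}

const demoSessionId = createSession();
addToCart(demoSessionId, "book");
logIn(demoSessionId, "alice");
console.log(sessions.get(demoSessionId));
```

Only the identifier travels in the cookie; the cart contents and login state stay on the server.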
Tracking Users
The same site identifiers could be used to monitor the user's browsing history across multiple sites – and so user tracking was born: In most cases, these are third-party cookies which don’t generate value for the end-user (via website personalization or session management) and simply track them instead.
While tracking cookies aren’t “evil” per se, they’re prone to serious misuse when the advertiser wants to earn even more money.
For example, behavior tracking can be exploited for customer behavior analytics at scale: An experiment by the Wall Street Journal showed that, upon visiting the 50 most popular websites in the US (Google, YouTube, Facebook, Amazon, Yahoo, etc.), you would have 3,180 third-party cookies on your device. These tracking cookies, working actively on millions of devices, create powerful ad networks and have even pushed government regulators to step in (e.g. Google Analytics and the EU).
Types of HTTP cookies
Two major cookie types, session cookies and persistent cookies, have different use cases and characteristics. Let’s explore them in greater detail:
Session cookies
Session cookies, also known as transient cookies, are temporary cookies that are deleted after a browser session. The primary purpose of a session cookie is to store activity data and simplify user navigation by making web pages load faster during a single browsing session.
Examples of when to use session cookies:
- Shopping cart functionality: Session cookies can store items added to a shopping cart during a single browsing session.
- Login state: They can store a session identifier that keeps the user logged in for the duration of a session, allowing seamless navigation without repeated logins.
Persistent cookies
Persistent cookies store preferences across sessions, so they are saved on the device and remain until expiration, i.e. even after the user closes the browser. Their duration is determined by the Max-Age or Expires attribute. Unlike session cookies, which are stored in memory, persistent cookies are stored on the user's hard disk drive (HDD).
First-Party Cookies
A first party cookie, as the name implies, comes from a first party: In case of web browsing, these cookies are sent by the website/domain itself. First-party cookies are important for performance as websites often rely on them to provide:
- Automatic login via a username/password combination,
- Website preferences,
- Visited links (shown in purple),
- Saved settings,
- Shopping cart items,
- And more.
Third-party cookies
Conversely, third-party cookies are created by other domains: For example, visiting a domain abc.com may create cookies for advertising tracking from the domain xyz.com. Third-party cookies aren’t usually related to website functionality; instead, they collect users’ unique identifiers to enable cross-site tracking en masse.
Moreover, a third-party cookie may remain on the user’s device even after it reboots or shuts down, which allows advertisers to monitor the user’s activity on other websites. For example, they can show a product ad and later use third-party cookies to check if the user purchased it. Oftentimes, this cookie type creates privacy concerns and requires certain browser blocking options.
Secure Cookies
In an earlier screenshot, you may have noticed that a cookie may have the Secure attribute. If it is set, the browser sends the cookie only over a secure protocol (e.g. HTTPS). This way, secure cookies protect the confidentiality of their data in transit.
However, security mechanisms of a cookie can be bypassed via network threats, end system threats, cross site request forgery, and cookie harvesting, which would allow the attacker to intercept data like login details and session identifiers. For this reason, some web browser programmers advise to avoid transmitting sensitive information via cookies.
Zombie Cookies
As the name suggests, zombie cookies come back to life after deletion: While a regular cookie is engineered to respect the user’s privacy, its zombie counterpart ignores all restrictions: In addition to “respawning”, it can be saved in several locations, preventing the user from controlling their web data properly. Notably, third-party cookies often use the “zombie” mechanic to boost their efficiency.
Many privacy-conscious users decline web cookies or delete them on a regular basis. This is a concern for ad networks, which use tracking cookies to monitor the user’s behavior across different services. A zombie cookie is an effective, if morally dubious, solution to this problem: They enable web traffic monitoring across different websites via obtaining the user’s unique IDs meant for other websites. Moreover, they may ignore the confinements of a single user's browser and track them across their entire device.
These shady practices are largely to blame for the criticism targeting the cookie as a whole, resulting in a number of pro-privacy regulations like the EU cookie law (also known as the ePrivacy Directive), GDPR, and more.
HTTP Cookie properties
Much of cookies’ functionality and quirks are contained within their special properties. Here’s how they work:
Domain
The Domain attribute specifies which hosts (domains) are allowed to receive the cookie. If a cookie has a Domain attribute set to a particular domain, then it is also available to all subdomains of that domain. For example:
- If a cookie is set with Domain=example.com, it will be sent with requests to example.com as well as any subdomains like sub.example.com.
- If the Domain attribute is not specified, the cookie will be returned only to the same server that set it and not to any subdomains.
“Buckets" typically refer to the categorization or grouping of cookies based on certain attributes or policies. While the term "buckets" is not an official part of the specification, it is sometimes used informally to describe how browsers manage and limit the number of cookies stored for each domain. This helps prevent excessive resource usage and potential performance issues.
Subdomains are part of the larger domain and are typically used to organize different sections of a website or to host different services. Cookies can be set such that they are accessible by the main domain and all its subdomains.
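The Domain rules above can be expressed as a small matching function. This is a simplified sketch of the domain-matching idea; the cookie object shape (domain, originHost) is an assumption for illustration, and real browsers apply further checks such as the public suffix list.

```javascript
// Return true if requestHost equals cookieDomain or is one of its subdomains.
function domainMatches(requestHost, cookieDomain) {
  const host = requestHost.toLowerCase();
  const domain = cookieDomain.toLowerCase().replace(/^\./, ""); // a leading dot is ignored
  if (host === domain) return true;
  // Suffix matching applies to host names only, not to IP addresses.
  const isIpAddress = /^[\d.]+$/.test(host) || host.includes(":");
  return !isIpAddress && host.endsWith(`.${domain}`);
}

// Decide whether a stored cookie is in scope for a request host.
// cookie.domain is the Domain attribute (if any); cookie.originHost is the host that set it.
function isCookieInScope(cookie, requestHost) {
  if (cookie.domain === undefined) {
    // No Domain attribute: only the exact host that set the cookie receives it.
    return requestHost.toLowerCase() === cookie.originHost.toLowerCase();
  }
  return domainMatches(requestHost, cookie.domain);
}

console.log(isCookieInScope({ domain: "example.com", originHost: "example.com" }, "sub.example.com"));
console.log(isCookieInScope({ originHost: "example.com" }, "sub.example.com"));
```

Note the dot boundary in the suffix check: without it, a host such as badexample.com would wrongly match example.com.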
Path
The Path attribute specifies the URL path that must exist in the requested URL for the browser to send the Cookie header. Path defines the scope of the cookie: it tells the browser which paths on the server should include the cookie in the HTTP request. The default value for the Path attribute is the path of the URL that the browser used when the cookie was received. If a cookie has been set with Path=/docs, the cookie will be included in requests to /docs and any subdirectories like /docs/subdirectory.
By setting the Path attribute, you can control where the cookie is sent and prevent it from being sent to every page on your domain, which can enhance security and performance. Remember, the more specific the path, the narrower the scope of the cookie.
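To make the path scoping concrete, here is a minimal sketch of a path-matching check. The function name pathMatches is chosen for this example; the logic follows the prefix-at-a-slash-boundary rule described above.

```javascript
// Return true if a cookie with the given Path attribute applies to requestPath.
function pathMatches(requestPath, cookiePath) {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // A prefix match only counts at a "/" boundary, so /docs does not cover /docsets.
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

console.log(pathMatches("/docs/subdirectory", "/docs"));
console.log(pathMatches("/docsets", "/docs"));
```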
Expires and Max-age
The Expires and Max-Age attributes are used to control the lifetime of a cookie, determining how long the cookie should be stored on the client's device before being deleted. The Expires attribute specifies an expiration date and time for the cookie as an HTTP-date timestamp. When the specified date and time are reached, the cookie is automatically deleted from the client's device. If neither Expires nor Max-Age is set, the cookie is treated as a session cookie and will be deleted when the browser session ends.
The Max-Age attribute defines the maximum age of the cookie in seconds, indicating the number of seconds until the cookie expires from the time it was set. If both Expires and Max-Age are set, Max-Age takes precedence, and Expires is ignored.
Persistent cookies include an Expires or Max-Age attribute and remain stored on the user's device even after the browser is closed, until they expire. On the other hand, session cookies do not have an Expires or Max-Age attribute and are deleted when the user closes their browser or when the session ends.
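The precedence between the two attributes can be sketched like this. The attributes object with maxAge and expires keys is an assumed shape for this example, not a browser API.

```javascript
// Compute when a cookie expires, or null if it is a session cookie.
// attributes.maxAge is in seconds; attributes.expires is an HTTP-date string.
function cookieExpiry(attributes, now = new Date()) {
  if (attributes.maxAge !== undefined) {
    // Max-Age takes precedence over Expires when both are present.
    return new Date(now.getTime() + attributes.maxAge * 1000);
  }
  if (attributes.expires !== undefined) {
    return new Date(attributes.expires);
  }
  return null; // neither attribute: the cookie lasts for the browser session only
}

const receivedAt = new Date("2021-06-01T00:00:00Z");
const bothSet = { maxAge: 3600, expires: "Wed, 09 Jun 2021 10:18:14 GMT" };
console.log(cookieExpiry(bothSet, receivedAt).toISOString());
```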
SameSite
The SameSite attribute allows servers to specify whether cookies should be sent with cross-site requests (where the site is defined by the registrable domain and the scheme: http or https). This attribute provides protection against cross-site request forgery (CSRF) attacks by controlling cookie behavior.
The concept of "site" refers to the combination of the specific domain suffix and the part of the domain just before it. For example, if a user is on www.web.dev and requests an image from static.web.dev, that's considered a same-site request. The public suffix list defines what pages count as being on the same website. It includes not only top-level domains like .com but also services like github.io. This allows subdomains like your-project.github.io and my-project.github.io to count as separate sites.
The SameSite attribute can take three possible values:
- Strict: The cookie is sent only in a first-party context. Useful for cookies related to features that are always behind an initial navigation, such as changing a password or making a purchase.
- Lax: Allows cookies to be sent with top-level navigations, making it less restrictive than Strict. Good for preferences or features that users expect to work across different pages.
- None: Necessary for scenarios where cookies need to be shared across different third-party domains (e.g., embedded widgets, cross-site analytics). It should be used together with the Secure attribute.
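The three values can be summarized as a decision function. This is a deliberately simplified model: the cookie and request object shapes are assumptions for this example, and real browsers apply additional rules.

```javascript
// HTTP methods considered "safe" for the purposes of SameSite=Lax.
const SAFE_METHODS = ["GET", "HEAD", "OPTIONS", "TRACE"];

// Decide whether a cookie accompanies a request (simplified model).
// cookie: { sameSite: "Strict" | "Lax" | "None", secure: boolean }
// request: { isSameSite: boolean, isTopLevelNavigation: boolean, method: string }
function shouldSendCookie(cookie, request) {
  if (request.isSameSite) return true; // same-site requests always carry the cookie
  switch (cookie.sameSite) {
    case "Strict":
      return false; // never sent cross-site
    case "Lax":
      // Sent cross-site only when the user navigates to the site with a safe method.
      return request.isTopLevelNavigation && SAFE_METHODS.includes(request.method);
    case "None":
      return cookie.secure === true; // SameSite=None is expected to be paired with Secure
    default:
      throw new Error(`Unknown SameSite value: ${cookie.sameSite}`);
  }
}

// A user clicking a link on another site, and an image embedded on another site.
const crossSiteLink = { isSameSite: false, isTopLevelNavigation: true, method: "GET" };
const crossSiteImage = { isSameSite: false, isTopLevelNavigation: false, method: "GET" };

console.log(shouldSendCookie({ sameSite: "Lax", secure: true }, crossSiteLink));
console.log(shouldSendCookie({ sameSite: "Lax", secure: true }, crossSiteImage));
```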
How to Create HTTP Cookies
To better understand how they work under the hood, we can create them manually: One option involves using a web browser; the alternative is using a web server to set a cookie header.
Client-side
For this method, we’ll need the document.cookie property and any web browser – for instance, Google Chrome. Let’s open Chrome’s Developer Tools (F12), navigate to the Console tab, and type:
document.cookie="testcookie=1"
Let’s check if our test cookie has been added correctly. In the Application menu tab, select the Cookies menu on the left. Let’s look for our test cookie in the list:
Here it is! We successfully created it.
Web server-side
Creating cookies on the backend is a more common option, rich with different approaches towards this task: JavaScript or Python, backend code itself or a web server like Nginx, etc. In any case, the backend sets cookies in the HTTP response and then the server sends the data to the user.
We can start with a simple HTML page (saved as index.html):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Document</title>
</head>
<body>
<button id="btnCreateCookie">Create Cookie</button>
<script>
// Set a cookie named "example" with the value "3" when the button is clicked.
const btnCreateCookie = document.getElementById("btnCreateCookie");
btnCreateCookie.addEventListener("click", () => { document.cookie = "example=3"; });
</script>
</body>
</html>
We can then use Node.js with the Express framework to serve the HTML page via an index.js app:
// index.js: serve index.html on port 8080 (requires the "express" package).
const app = require("express")();
app.get("/", (req, res) => {
  res.sendFile(`${__dirname}/index.html`);
});
app.listen(8080, () => console.log("listening on port 8080"));
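Upon running the script and opening http://localhost:8080, you can create a cookie by pressing the button. Alternatively, the server can set the cookie itself by adding a Set-Cookie header to the response; passing an array lets you set several cookies at once:
const app = require("express")();
app.get("/", (req, res) => {
  // Each array element becomes its own Set-Cookie header in the response.
  res.setHeader("set-cookie", ["setfromserver=1"]);
  res.sendFile(`${__dirname}/index.html`);
});
app.listen(8080, () => console.log("listening on port 8080"));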
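For comparison, the same idea can be sketched with only Node.js's built-in http module, without Express. The cookie names and attributes below are placeholders chosen for this example, and the server is only started when startServer is called.

```javascript
const http = require("node:http");

// Request handler: sets two cookies and echoes back any cookies the browser sent.
function handleRequest(req, res) {
  // An array value produces one Set-Cookie header per element.
  res.setHeader("Set-Cookie", [
    "setfromserver=1; Path=/; HttpOnly",
    "theme=dark; Path=/; Max-Age=86400",
  ]);
  res.setHeader("Content-Type", "text/plain; charset=utf-8");
  res.end(`Cookies received: ${req.headers.cookie || "(none)"}\n`);
}

// Start the server on the given port (not called automatically in this sketch).
function startServer(port = 8080) {
  return http
    .createServer(handleRequest)
    .listen(port, () => console.log(`listening on port ${port}`));
}
```

After calling startServer(), the first visit to http://localhost:8080 should report no cookies, while a reload should show the cookies set by the previous response.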
Security and Privacy Concerns
Let's delve into the security and privacy aspects related to cookies:
Cross-Site Scripting (XSS)
Attackers exploit XSS vulnerabilities to inject malicious scripts into web pages. By doing so, they can steal cookies from other users' browsers. Stolen cookies allow attackers to impersonate legitimate users, perform unauthorized transactions, or manipulate user accounts.
Session Hijacking (Cookie Poisoning)
In session hijacking attacks, attackers manipulate genuine cookies to gain unauthorized access or compromise data. They intercept cookies before they are sent back to the server or create forged cookies to impersonate users. Successful cookie poisoning allows bypassing security measures and gaining unauthorized access to user sessions.
Techniques for Securing Cookies:
- Use HTTPS: Always serve your web pages over HTTPS. Secure cookies should only be transmitted over encrypted channels (HTTPS, WSS) to prevent eavesdropping and interception.
- Set Secure and HttpOnly Attributes: Use the Secure attribute to ensure cookies are sent only over secure connections. Set the HttpOnly attribute to prevent client-side scripts (such as JavaScript) from accessing cookies, reducing the risk of cookie theft via XSS attacks.
- Implement SameSite Attribute: The SameSite attribute controls when cookies are sent in cross-site requests. Use SameSite=Lax or SameSite=Strict to limit cross-site cookie transmission. Set SameSite=None only for cookies that require cross-site access (e.g., embedded widgets) and use it with Secure to ensure HTTPS.
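One way to apply the checklist above is a small audit helper that flags Set-Cookie values missing the recommended attributes. The function auditSetCookie and its warning messages are hypothetical, written only to illustrate the checks.

```javascript
// Inspect a Set-Cookie header value and list missing security attributes.
// auditSetCookie is a hypothetical helper, not part of any library.
function auditSetCookie(headerValue) {
  // Everything after the first ";" is an attribute; compare case-insensitively.
  const attributes = headerValue
    .split(";")
    .slice(1)
    .map((part) => part.trim().toLowerCase());
  const has = (name) => attributes.includes(name);
  const sameSite = attributes.find((attr) => attr.startsWith("samesite="));

  const warnings = [];
  if (!has("secure")) warnings.push("missing Secure");
  if (!has("httponly")) warnings.push("missing HttpOnly");
  if (!sameSite) {
    warnings.push("missing SameSite");
  } else if (sameSite === "samesite=none" && !has("secure")) {
    warnings.push("SameSite=None requires Secure");
  }
  return warnings;
}

console.log(auditSetCookie("sessionId=abc123; Secure; HttpOnly; SameSite=Lax"));
console.log(auditSetCookie("sessionId=abc123"));
```

An empty result means the cookie carries all three attributes discussed in the list; it does not by itself guarantee that the application is secure.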
How to manage HTTP Cookies
Managing cookies is essential for privacy and security. Let's explore how users can control cookies through their browser settings. Remember, however, that cookie blocking and cookie deletion can create certain website functionality limitations.
Browsers offer several grades of cookie protection – you can typically select one of these options:
- Allow all cookies: Enables all cookies, including third-party trackers.
- Block third-party cookies in Incognito: Blocks third-party trackers in Incognito mode.
- Block third-party cookies only: Blocks potential tracking cookies.
- Block all cookies: Disables all cookies.
Google Chrome
- Click the three vertical dots in the upper-right corner.
- Select Settings.
- Under Privacy and security, click Site settings.
- Choose Cookies and other site data.
- Adjust the cookie settings based on your preferences.
Microsoft Edge
- Click the three horizontal dots in the upper-right corner.
- Select Settings.
- Navigate to Cookies and site permissions.
- You can manage cookies from there.
Mozilla Firefox
- Click the three horizontal lines in the upper-right corner.
- Choose Options.
- Go to Privacy & Security.
- Under Cookies and Site Data, select your preferred settings.
Apple Safari
- Click Safari in the menu bar.
- Choose Preferences.
- Go to the Privacy tab.
- Adjust cookie settings as needed.
Cookies in Web Scraping
Cookies play a significant role in mimicking human behavior and bypassing anti-scraping technologies (e.g. bot detection). Many of these features help Infatica’s Web Scraper API collect data more efficiently:
Mimicking Human Behavior
User-Agent Header: Websites often use the User-Agent header to identify the client (browser or bot) making an HTTP request. By manually modifying the User-Agent header in web scraping code, you can mimic a commonly used web browser or device, making requests appear more like those of legitimate users.
JavaScript Execution: Particular websites rely on JavaScript to load content dynamically. To appear human-like, web scrapers can execute JavaScript code (using tools like Puppeteer or Playwright) to interact with the page, click buttons, scroll, and wait for elements to load.
Mouse Movements and Delays: Introducing random mouse movements and delays between requests can simulate human browsing patterns. Bots can move the cursor slightly or pause before making subsequent HTTP requests.
Bypassing Anti-Scraping Measures
Rotating IP Addresses: Web scrapers can use multiple IP addresses (via proxies or VPNs) to avoid detection. Rotating IPs makes it difficult for websites to track scraping activity from a single source.
Avoiding Honeypots: Some websites set up honeypots (fake pages) to trap scrapers. By analyzing the HTML structure, scrapers can avoid accessing these traps.
Handling CAPTCHAs: CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are common anti-bot measures. Techniques to bypass CAPTCHAs include using CAPTCHA solvers, smart proxies, OCR (Optical Character Recognition), and machine learning algorithms.
Saving Cookies: Cookies store session information. By saving and reusing cookies across requests, scrapers can maintain state and appear more like returning users.
Avoiding Hidden Traps: Some websites include hidden links or forms that bots might accidentally trigger. Scraper logic should avoid interacting with hidden elements.
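The "Saving Cookies" technique can be sketched as a tiny cookie jar that remembers cookies from responses and replays them on later requests. This is a simplified illustration: the CookieJar class is made up for this example and ignores attributes such as Domain, Path, and expiration.

```javascript
// Minimal cookie jar: stores name/value pairs and rebuilds the Cookie header.
// Attributes (Domain, Path, Expires, ...) are ignored in this sketch.
class CookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Store cookies from an array of Set-Cookie header values.
  storeFromResponse(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const [pair] = header.split(";"); // the first segment is name=value
      const index = pair.indexOf("=");
      if (index === -1) continue; // skip malformed headers
      this.cookies.set(pair.slice(0, index).trim(), pair.slice(index + 1).trim());
    }
  }

  // Build the value of the Cookie header for the next request.
  toRequestHeader() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}

const jar = new CookieJar();
jar.storeFromResponse(["sessionId=abc123; Path=/; HttpOnly", "theme=dark; Max-Age=86400"]);
jar.storeFromResponse(["theme=light"]); // a later response overwrites the earlier value
console.log(jar.toRequestHeader());
```

Sending the resulting string in the Cookie header of subsequent requests lets the scraper keep the same session, much like a returning browser would.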
Conclusion
Even though cookies have been at the center of controversy in recent years, they’re still an integral part of the modern web: Without them, navigating websites would take much more time – and web development would be restricted in terms of website functionality. Creating and managing a cookie correctly is important for data collection and similar tasks – and we hope this guide helped you understand this topic better.
Frequently Asked Questions
HttpOnly is an attribute that prevents client-side scripts from reading the cookie. This way, only the server can access cookie data, which makes cookie usage more secure and protects against malicious client-side scripts. To add the HttpOnly attribute to the HTTP cookie, append ; HttpOnly like this: Set-Cookie: <name>=<value>[; Max-Age=<age>][; expires=<date>][; domain=<domain_name>][; path=<some_path>][; secure][; HttpOnly]
The server sets a cookie by sending Set-Cookie: name=value in its HTTP response header. If a cookie is already set, the browser sends it back in the request instead: Cookie: name=value.
Google Chrome stores cookies in the Cookies file in the \AppData\Local\Google\Chrome\User Data\Default folder; Mozilla Firefox stores them in cookies.sqlite located in \AppData\Roaming\Mozilla\Firefox\Profiles.