Today, a large share of internet content is served through content delivery networks (CDNs). Despite this popularity, there is little research on how online censorship affects the technology. Researchers from the University of Massachusetts Amherst have analyzed possible methods of blocking CDN-served content, using China as a real-life example. They have also developed a tool that bypasses online censorship using cached content.
Here is a roundup of the main ideas of their research.
Intro
Online censorship is a global threat to free access to information. Today's censorship is possible because the internet inherited the end-to-end communication model of the telephone networks of the 1970s. In such a system, a censor can block access to specific content with little effort, using nothing but its IP address.
However, the development of the internet has led to new technologies for information exchange. One of them is content caching, which speeds up communication and improves performance. Today, CDN providers process vast amounts of web content. Akamai alone handles 30% of global web traffic.
A CDN is a distributed system for faster content delivery. A typical CDN consists of multiple servers around the world where content is cached, so that a user can download it from the nearest caching server instead of connecting to the origin server where the material was initially stored. This significantly increases the overall speed of communication and decreases the load on the websites that originally host the content.
CDN-based content censorship
Despite the popularity of CDN technology, there is little research on how censors worldwide control it. The authors of the study analyzed popular online censorship methods and assessed how effective each is at blocking CDN traffic. They also compared their analysis with the practical situation in China, the country with the most advanced state-backed censorship system (the "Great Firewall").
IP filtering
IP filtering is the most popular online censorship technique because of its simplicity. The censor identifies the IP addresses of resources hosting forbidden content, and censor-controlled ISPs then stop delivering packets sent to these addresses.
Modern network equipment has built-in IP filtering capabilities, so this method is straightforward to deploy. However, it is poorly suited to censoring CDNs, for three reasons: CDNs are distributed (content is served from many IP addresses); edge servers are shared (a block also affects non-forbidden content hosted on the same edge server); and IP addresses are assigned dynamically (even if the censor finds the IP address to block, the mapping soon expires and the content is served from another address).
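The shared-server problem can be illustrated with a small Python sketch. The hosting table, the domain names, and the IP addresses (taken from the documentation-only 203.0.113.0/24 range) are all hypothetical; the point is only to show how blocking every IP that serves one forbidden site also takes down unrelated sites.

```python
# Hypothetical table: which sites are cached on which edge-server IP.
EDGE_HOSTING = {
    "203.0.113.10": {"blocked.example", "news.example", "shop.example"},
    "203.0.113.11": {"blocked.example", "blog.example"},
}


def collateral_damage(blocked_domain, hosting):
    """Return the IPs a censor would block and the unrelated sites lost with them."""
    ips = sorted(ip for ip, sites in hosting.items() if blocked_domain in sites)
    collateral = set()
    for ip in ips:
        # Every other site on a blocked IP becomes unreachable too.
        collateral |= hosting[ip] - {blocked_domain}
    return ips, sorted(collateral)


ips, lost = collateral_damage("blocked.example", EDGE_HOSTING)
print("IPs to block:", ips)
print("Unrelated sites lost:", lost)
```

In this toy table, blocking one domain by IP makes three unrelated sites unreachable, and the table itself would go stale as soon as the CDN reassigns addresses.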
DNS interference
Another popular censorship method interferes with DNS. The goal is to prevent users from even discovering the IP addresses of resources with forbidden content. There are several ways to do this, including hijacking DNS connections, DNS poisoning, and blocking DNS requests.
DNS interference is a very effective method. However, it can be bypassed with non-standard DNS resolution methods, such as out-of-band channels. For this reason, DNS interference is often combined with IP filtering. As mentioned above, though, IP filtering is ineffective for censoring CDNs.
URL/keywords filtering via DPI
Modern network monitoring equipment can inspect transmitted data packets for specific URLs and keywords. This technology is called DPI (deep packet inspection). If the system detects a forbidden URL or keyword in the data stream, it drops the connection.
This is a very effective yet resource-intensive censorship method. CDN content can be protected from DPI interference by using encryption (HTTPS).
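A minimal Python sketch of keyword-based inspection shows why encryption defeats it. The keyword list, the sample request, and the byte string standing in for encrypted traffic are all assumptions made for illustration; real DPI equipment works on live packet streams and is far more elaborate.

```python
# Hypothetical list of forbidden keywords and host names.
FORBIDDEN = [b"blocked.example", b"forbidden-topic"]


def dpi_should_drop(payload: bytes, keywords=FORBIDDEN) -> bool:
    """Return True if any forbidden keyword appears in the raw payload."""
    lowered = payload.lower()
    return any(keyword in lowered for keyword in keywords)


# A plaintext HTTP request exposes the URL to the inspecting device.
plain_request = b"GET /forbidden-topic.html HTTP/1.1\r\nHost: news.example\r\n\r\n"

# Opaque bytes standing in for an encrypted record (not real TLS output).
opaque_record = bytes.fromhex("1703030020a4f1c97e0b5d22e8")

print(dpi_should_drop(plain_request))
print(dpi_should_drop(opaque_record))
```

The plaintext request matches a keyword, while the opaque bytes give the filter nothing to match on.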
Advanced DPI analysis
Besides searching for specific keywords and URLs, DPI can be used for more advanced analysis, such as online or offline statistical traffic analysis and protocol identification. These methods are very resource-intensive, and there is no evidence that censors actively use them.
Self-censorship of CDN providers
When the censor is a state, it has a wide range of means to block an entire CDN provider on the territory it controls. To avoid such destructive consequences, CDN providers often censor themselves so that they can keep operating in a specific country.
Real-world CDN censorship: how the Chinese Firewall works
The Great Firewall of China is considered the most effective and sophisticated censorship system, so the researchers studied how well it filters CDN traffic.
They obtained a Linux-based node inside China and several computers outside the country. The researchers also downloaded a list of censored websites from GreatFire.org and analyzed the blocking method applied in each case.
The CDNs operating in China include CloudFlare, Amazon CloudFront, EdgeCast, Fastly, and SoftLayer. The only CDN with its own China-based infrastructure is Akamai, so it was the primary object of research. During the experiments, the researchers identified the addresses of Akamai edge servers inside and outside China. The servers inside China did not return forbidden content and answered with an HTTP 403 Forbidden error. Outside the country, the same websites opened without any restrictions. This was evidence of self-censorship by Akamai.
Providers without their own China-based infrastructure do not apply self-censorship for local users. For such companies, the most widespread censorship method is DNS filtering, in which requests for blocked websites are redirected to wrong IP addresses. The Chinese Firewall does not block the IPs of CDN edge servers, so that blocks stay targeted and do not affect uncensored information stored on the same servers.
However, such targeted blocks work only for unencrypted traffic. For HTTPS connections, the Firewall blocks the entire domain instead of specific pages.
Also, China has its own CDN networks (ChinaCache, ChinaNetCenter, and CDNetworks). All these providers follow local regulations and block content forbidden in the country.
CacheBrowser: fighting censorship using CDN
Because it is hard for censors to block CDN traffic, the researchers decided to use this technology to build a circumvention tool. Censors can tamper with DNS, but a CDN does not strictly need DNS to work: the user can download content by sending requests directly to the edge server where it is already cached.
The user installs the client-side software and can access the content via a regular web browser.
When the browser tries to connect to a website, it first queries the local DNS database (LocalDNS) for the IP address of the host. Regular DNS is queried only for domains that are not yet in LocalDNS. The Scraper module continuously goes through the list of requested URLs and searches for potentially censored domain names. It then asks the Resolver module to resolve the newly found blocked domains and adds the results to LocalDNS. Afterward, the browser's cache is cleared to flush the existing DNS records for the blocked domain.
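The lookup order described above can be sketched in Python as follows. This is not the authors' code: the function names, the `looks_censored` predicate (the article does not say how the Scraper recognizes censored names), and the stand-in resolvers are hypothetical, and a plain dictionary plays the role of LocalDNS.

```python
from typing import Callable, Dict, Iterable, Optional


def lookup(domain: str, local_dns: Dict[str, str],
           regular_dns: Callable[[str], Optional[str]]) -> Optional[str]:
    """Answer from LocalDNS when possible, otherwise fall back to regular DNS."""
    if domain in local_dns:
        return local_dns[domain]
    return regular_dns(domain)


def scrape(requested: Iterable[str], looks_censored: Callable[[str], bool],
           resolver: Callable[[str], Optional[str]],
           local_dns: Dict[str, str]) -> None:
    """Resolve newly found censored domains and store them in LocalDNS."""
    for domain in requested:
        if domain in local_dns or not looks_censored(domain):
            continue
        edge_ip = resolver(domain)
        if edge_ip is not None:
            local_dns[domain] = edge_ip


# Stand-ins: a regular DNS that returns a wrong address for the blocked
# domain, and a Resolver that knows one edge-server address.
def tampered_dns(domain: str) -> str:
    return "198.51.100.66" if domain == "blocked.com" else "192.0.2.7"


resolver = {"blocked.com": "203.0.113.10"}.get
local_dns: Dict[str, str] = {}

before = lookup("blocked.com", local_dns, tampered_dns)
scrape(["blocked.com", "open.example"], lambda d: d == "blocked.com",
       resolver, local_dns)
after = lookup("blocked.com", local_dns, tampered_dns)
print(before, "->", after)
```

Once the Scraper has filled LocalDNS, the tampered answer from regular DNS is never consulted for the blocked domain, while other domains still resolve the usual way.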
If the Resolver module cannot determine which CDN provider a domain belongs to, it passes the request to the Bootstrapper module.
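The fallback can be sketched like this in Python. The two lookup tables, their contents, and the assumption that the Bootstrapper hands back an edge-server IP address directly are all illustrative; the article does not specify the exact interface between the modules.

```python
from typing import Callable, Dict, List, Optional


def resolve(domain: str,
            customer_to_cdn: Dict[str, str],
            cdn_to_ips: Dict[str, List[str]],
            bootstrapper: Callable[[str], Optional[str]]) -> Optional[str]:
    """Map a blocked domain to an edge-server IP, asking the Bootstrapper if needed."""
    cdn = customer_to_cdn.get(domain)
    if cdn is None:
        # The CDN provider is unknown: delegate to the Bootstrapper.
        return bootstrapper(domain)
    edge_ips = cdn_to_ips.get(cdn, [])
    # Any edge server of the CDN will do in this sketch; take the first one.
    return edge_ips[0] if edge_ips else None


# Hypothetical data for the demonstration.
customer_to_cdn = {"blocked.com": "ExampleCDN"}
cdn_to_ips = {"ExampleCDN": ["203.0.113.10", "203.0.113.11"]}
bootstrapper = {"new-site.example": "203.0.113.99"}.get

known = resolve("blocked.com", customer_to_cdn, cdn_to_ips, bootstrapper)
unknown = resolve("new-site.example", customer_to_cdn, cdn_to_ips, bootstrapper)
print(known, unknown)
```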
How it works in real life
The client-side software is designed for Linux but can be ported to Windows as well. Mozilla Firefox is used for browsing. The Scraper and Resolver modules are written in Python, while the Customer-to-CDN and CDN-to-IP databases are stored as plain-text files. The standard Linux /etc/hosts file serves as the LocalDNS database.
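To make the role of /etc/hosts concrete, here is a small Python sketch that reads and extends text in the standard hosts format (an IP address followed by one or more host names, with `#` starting a comment). The helper names and the sample entries are hypothetical, and the sketch works on a string rather than on the real file, which requires root privileges to modify.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file text into a {hostname: ip} mapping."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # the first entry for a name wins
    return mapping


def add_entry(text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text with a new entry appended, unless the name exists."""
    if hostname in parse_hosts(text):
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + f"{ip}\t{hostname}\n"


sample = "127.0.0.1 localhost\n# LocalDNS entries\n"
updated = add_entry(sample, "203.0.113.10", "blocked.com")
print(parse_hosts(updated))
```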
For a blocked URL such as https://blocked.com/, the script takes the IP address of an edge server from /etc/hosts and sends an HTTP GET request for BlockedURL.html with the following headers:
Host: blocked.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1
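The request described above can be sketched in Python with the standard socket and ssl modules. This is an illustration, not the researchers' script: the function names, the `Connection: close` header, the port, and the use of `server_hostname` for the TLS handshake are assumptions. The sketch connects to the edge server by IP address, so no DNS lookup takes place.

```python
import socket
import ssl

USER_AGENT = "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1"


def build_request(host: str, path: str) -> bytes:
    """Build a raw HTTP/1.1 GET request carrying the blocked site's Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {USER_AGENT}",
        "Connection: close",  # assumption: lets us read until the server closes
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_from_edge(edge_ip: str, host: str, path: str,
                    timeout: float = 10.0) -> bytes:
    """Send the request straight to an edge-server IP and return the raw response."""
    context = ssl.create_default_context()
    with socket.create_connection((edge_ip, 443), timeout=timeout) as raw:
        # The certificate is checked against the host name, not the IP address.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


request = build_request("blocked.com", "/BlockedURL.html")
print(request.decode("ascii"))
```

A call such as `fetch_from_edge(edge_ip, "blocked.com", "/BlockedURL.html")` would then use an address taken from /etc/hosts; it is left out of the demonstration because it needs network access and a real edge server.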
The Bootstrapper module is built on the free tool digwebinterface.com. This is a DNS resolver that can't be blocked. It answers DNS requests from multiple geographically distributed locations.
Using this toolkit, the researchers managed to access Facebook, which has been blocked in China for many years.