IRTF Open Meeting @ IETF-95, Buenos Aires, Argentina
Tuesday, April 5, 2016 (AST), 10:00-12:30, Tuesday Morning Session I

Applied Networking Research Prize (ANRP) Award Talks (2x 45 min)

*** Roya Ensafi ***

For examining how the Chinese "Great Firewall" discovers hidden circumvention servers:

Roya Ensafi, David Fifield, Philipp Winter, Nick Feamster, Nicholas Weaver, and Vern Paxson. Examining How the Great Firewall Discovers Hidden Circumvention Servers. Proc. ACM Internet Measurement Conference (IMC), Tokyo, Japan, October 28-30, 2015.

Q&A

Mat Ford: I was wondering whether there was any risk of prosecution, or worse, for the measurement clients you were using in China?

RE: This is a big issue in any censorship measurement, but inside knowledge is key. We know that China doesn't go after anybody for connecting to Tor; there have been no previous prosecutions. China doesn't actually care about individuals; they just want to ensure that the majority of people don't get access to proxy servers. So if you have a limited number of users, you can set up a proxy server and the authorities don't care.

Nalini Elkins: I'm curious whether you saw IPv4, IPv6, or both?

RE: We only looked at IPv4. There is a common rumour that IPv6 is not being blocked, but we haven't looked into it.

Dave Plonka: I monitor IPv6, and the only place I see IPv6 from China is coming across R&E networks; it seems to come across Internet2. Presumably this traffic is being treated differently somehow. Did you see a difference in how R&E traffic was treated in the IPv4 space?

RE: Yes. The main reason we chose CERNET in addition to Unicom is the rumours that CERNET was being treated differently. We also heard rumours that business areas like Shanghai have more relaxed controls. Previous work (Great Firewall over Time and Space) tried to investigate that: it used special side channels to choose clients all over China and tried to see whether Tor is being blocked differently in different locations.
It didn't find evidence of regional differences. At the DNS layer, other colleagues performed an investigation and didn't find any differences either. There are contradictory results. For Tor, there is some central system.

DP: The bifurcation of the v4/v6 Internet is interesting.

RE: The GFW is constantly upgraded, so increased use of IPv6 could cause a reaction.

Steve Paget: You mentioned that regular Tor was blocked, but that DPI could detect active Tor. How do you know that active probing is generating most of the blocks?

RE: We see the IPs of active probers showing up at our private bridges.

SP: Do you think that's the reason they're being blocked, or is it because of DPI?

RE: The DPI first detects the IPs of the bridges, then the active probers confirm that they are running the protocol, and then the bridge is blocked. It might be that the DPI is separate from the active probers.

SP: If one signal comes through but not the other, what would happen? What if DPI detected it but active probing did not?

RE: Active probing gets its input from the DPI; it can't selectively choose whether to probe or not.

SP: What if active probing then fails?

RE: For a while, Tor detected that the active probe was from China and didn't respond. That was one way of dealing with active probers, so that is possible.

SP: Did it work?

RE: It's a temporary solution. The pluggable transports were upgraded, and we think they are safe.

Dino: Do you have any plans to look at other national firewalls, like North Korea's, to see whether they behave differently?

RE: Yes, we are working on that; it's not published yet. We know that China leads the way. If we see other countries making similar implementations, we could conclude that the GFW's operators are selling their expertise.

MF: This is obviously an arms race. Can you speculate on what the next steps in obfuscation technology might be? Do we just connect to Tor every 25hrs?

RE: My suggestion was that every 24hrs we ask users to connect to the Tor directory authorities to get the latest Tor bridge IPs, but these are just hacks.
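The two-stage pipeline RE describes (DPI flags a flow, every flagged destination is handed to an active prober, and the address is blocked only if the probe confirms Tor) can be sketched as a toy model. This is a hypothetical illustration of the behaviour reported in the talk, not the GFW's actual implementation; the class `ToyCensor`, its fields, and the example addresses are made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCensor:
    """Hypothetical toy model of the DPI -> active probe -> block pipeline."""
    probed: list = field(default_factory=list)   # destinations handed to the prober
    blocked: set = field(default_factory=set)    # destinations confirmed and blocked

    def observe(self, dst_ip: str, dpi_flags_tor: bool, answers_probe: bool) -> None:
        # Stage 1: DPI inspects the client's flow; unflagged flows are ignored.
        if not dpi_flags_tor:
            return
        # Stage 2: every DPI hit is probed (no selective choice, per the talk).
        self.probed.append(dst_ip)
        # Stage 3: block only if the endpoint responds like a Tor bridge.
        if answers_probe:
            self.blocked.add(dst_ip)


censor = ToyCensor()
censor.observe("192.0.2.10", dpi_flags_tor=True, answers_probe=True)    # ordinary bridge
censor.observe("192.0.2.20", dpi_flags_tor=True, answers_probe=False)   # bridge ignoring probes
censor.observe("192.0.2.30", dpi_flags_tor=False, answers_probe=True)   # flow DPI misses
print(sorted(censor.blocked))
print(censor.probed)
```

In this model, a bridge that stays silent towards recognised probers (the temporary Tor defence RE mentions) is probed but never blocked, while traffic the DPI never flags is not probed at all.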
The current pluggable transport, obfs4, is being used in China. Obfuscation is also one way to get around the GFW's DPI; Tor has other kinds of pluggable transports that work right now.

Dao Yun: I live in China, so I experience some of the GFW's inconvenience. You think the GFW is using DPI to detect the protocol, but if the DPI responds in real time to the setup procedure at every Internet border, what kind of DPI can provide that kind of high throughput?

RE: Good question. It was interesting for us to be able to answer it. Three years ago, a probe showed up every 15 minutes; now it's much faster. We don't know what technology they are using; they can probably come up with their own solutions.

DY: I don't think it's possible to do this kind of work in real time using DPI.

RE: Our observation was surprising. Our relay was a private relay, so connections to it from China can only have resulted from monitoring traffic, and all the probes arrived immediately after our client in China connected. We used new VPSes. The fact that thousands of active probers showed up, more than 50 in real time, is shocking. This was a surprise to us too.

?: Traffic between the client and Tor is encrypted, so how do active probes work?

RE: The TLS handshake is unencrypted in the vanilla version, so the cipher list can be monitored. By looking at it, Tor traffic can be fingerprinted.

Nalini Elkins: Did you look at L2 addresses at all? You said there were multiple L3 addresses, but it seemed like they were all coming from the same thing.

RE: Good question. We didn't see any pattern; I don't think we saw anything special about L2.

NE: Do you see different results depending on whether the client is a phone, wireless, etc.?

RE: I'm not sure.

Raphael: Did you see probes from China to outside China, blocking traffic passing through China?

RE: No, but you point at something interesting.
We had control bridges set up (shadow infrastructure), residing in the same /24 as the real bridge, to which we never established any connection from China. We didn't see traffic to those control bridges, so we know that the probers come after us because they observe our traffic.

*** Zakir Durumeric ***

For an empirical analysis of email delivery security:

Zakir Durumeric, David Adrian, Ariana Mirian, James Kasten, Elie Bursztein, Nicolas Lidzborski, Kurt Thomas, Vijay Eranti, Michael Bailey, and J. Alex Halderman. Neither Snow Nor Rain Nor MITM… An Empirical Analysis of Email Delivery Security. Proc. ACM Internet Measurement Conference (IMC), Tokyo, Japan, October 28-30, 2015.

Dave Oran: Did you ever backtrace from received spam or phishing attacks to figure out whether STARTTLS is actually helping in any regard, or is there little correlation between how bad the incoming email is from a social-engineering point of view and the transport that was used to deliver it?

ZD: I believe that less spam is protected with STARTTLS, but I don't know whether that affects what the user does with it. Until recently, users didn't even get any indication. So it's an interesting question, but I don't think we have an answer today.

Erik Nordmark: On the next-to-last slide you had a graph showing things slowly picking up. What's the scale on the y-axis?

ZD: It's the percentage of inbound mail to Gmail: 65%, 70%, 75%.

Steve Paget: One of your slides showed that only 0.4% of certificates were properly created. Is that correct? That is, 99.6% of certs are not properly deployed on mail servers?

ZD: Correct. That is, they have the name of the domain itself and not of the host.

SP: 36% were still properly associated with the MX, right?

ZD: Correct.

SP: Do we need to do something here? If 99.6% have this set up wrong, that seems like a problem.

ZD: I agree. We've ended up with a chicken-and-egg problem: it's hard to encourage deployment of something that nobody validates, but we don't validate stuff that isn't deployed.
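ZD's point is that many mail servers present a certificate naming the mail domain (e.g. `example.com`) rather than the MX host the sender actually connects to, so a sender that checked the certificate against the MX host name would reject it. A minimal sketch of that check, assuming a simplified RFC 6125-style rule in which `*` may stand only for the whole left-most label; the function names and domains are hypothetical.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name match: exact, or '*' as the entire left-most label."""
    p = pattern.lower().rstrip(".").split(".")
    h = hostname.lower().rstrip(".").split(".")
    if len(p) != len(h):
        return False          # a wildcard never spans more or fewer labels
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h


def cert_valid_for(cert_names: list, hostname: str) -> bool:
    """True if any name on the certificate matches the host we connected to."""
    return any(name_matches(n, hostname) for n in cert_names)


mx_host = "mx1.example.com"           # hypothetical MX record target
domain_only_cert = ["example.com"]    # names the domain, not the host
host_cert = ["*.example.com"]         # covers hosts under the domain

print(cert_valid_for(domain_only_cert, mx_host))
print(cert_valid_for(domain_only_cert, "example.com"))
print(cert_valid_for(host_cert, mx_host))
```

Real TLS stacks apply further rules (for example on internationalised names and wildcard placement); the sketch only shows why a domain-only certificate fails a host-name check against the MX.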
I think there are also a couple of roadblocks (large cloud providers): we don't have an obvious answer for how a large mail provider should present a certificate for each of many different domains. So yes, this is a problem; we do want to authenticate where we are sending our mail, but there are challenges to overcome before we could issue advice on how to do that.

?: You didn't mention DANE.

ZD: Yes, DANE does help solve this, particularly when you have a certificate that matches the host name or if you want to provide some sort of key pinning. The problem, and the reason these drafts have been put out, is that we're not at the point where DANE has been widely deployed. If DANE and DNSSEC were widely deployed, we could say go out and do this, but it still requires a fair amount of effort for an organisation to go down that path, versus, say, publishing a DNS record that says what you should do about my server.

MF: Thanks for the talk. ANRP nominations will open again in the June/July timeframe and close at the end of October, so please let us know about great research papers that you read, and we'll hopefully have some more interesting talks like these at the IETF meetings next year.

DP: Can we keep Zakir for a few minutes to talk about ZMap and Censys?

ZD: For those who don't know, ZMap is an open-source project released 3 years ago that allows you to perform a port scan of the entire IPv4 address space in an hour using a 1 Gbps connection. It's widely used in the research community to understand TLS deployment, embedded devices, and SCADA devices. The problem when it was released was that people didn't have the bandwidth or infrastructure; only top-tier research institutions could use it. We published datasets from the University of Michigan, e.g. port 443 handshakes on a weekly basis (random numbers, cipher suites, content provided over HTTP). People still found this hard to process (800GB per day). So we released a tool called Censys, a query engine on top of the data.
It allows researchers to run SQL queries over the dataset. It has been used to find TLS vulnerabilities, to work with browsers to identify websites that will stop working when features are deprecated, and for censorship measurement and embedded-device security. Today ZMap is fairly stable. What about IPv6? IPv6 is so much larger that a brute-force approach won't work; a statistically driven approach is required. We're still working to deliver a tool that people could actually use. The more difficult piece is the layers above: application-layer protocols and identifying devices. For embedded devices, how do we tractably identify them? We started manually tagging devices (UPS, SCADA controller), but a manual approach won't scale, so we're trying to develop a machine-learning approach. Censys continues to move along; people propose things and add protocols, and we're always looking for help with this work.
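The gap between the IPv4 and IPv6 cases can be checked with back-of-the-envelope arithmetic. This sketch assumes one probe per address, each sent as a minimum-size Ethernet frame (64 bytes plus 20 bytes of preamble and inter-frame gap on the wire) at full 1 Gbps line rate; it ignores retransmissions and is not a description of ZMap's internals.

```python
# Back-of-the-envelope scan times at 1 Gbps line rate (assumptions in comments).
LINK_BPS = 1_000_000_000            # 1 Gbps link
WIRE_BYTES_PER_PROBE = 64 + 20      # min Ethernet frame + preamble/SFD (8) + inter-frame gap (12)
PROBES_PER_SEC = LINK_BPS / (WIRE_BYTES_PER_PROBE * 8)   # about 1.49 million probes/s

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scan_seconds(n_addresses: int) -> float:
    """Time to send one probe to each of n_addresses at full line rate."""
    return n_addresses / PROBES_PER_SEC


ipv4_minutes = scan_seconds(2**32) / 60
ipv6_one_64_years = scan_seconds(2**64) / SECONDS_PER_YEAR    # a single /64 subnet
ipv6_full_years = scan_seconds(2**128) / SECONDS_PER_YEAR     # entire IPv6 space

print(f"IPv4 (2^32):      {ipv4_minutes:.0f} minutes")
print(f"one IPv6 /64:     {ipv6_one_64_years:,.0f} years")
print(f"all IPv6 (2^128): {ipv6_full_years:.1e} years")
```

Under these assumptions the IPv4 sweep takes roughly 48 minutes, consistent with "in an hour", while even a single /64 subnet takes on the order of 390,000 years, which is why exhaustive scanning gives way to a statistically driven approach for IPv6.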