Measurement and Analysis for Protocols Research Group (maprg) Agenda @ IETF-101 (London)
=========================================================================================

Date: Tuesday, March 20, 9:30-12:00 - Tuesday Morning session I
Room: Sandringham
Scribed by: Mat Ford

Abstracts of all talks are included at the foot of these notes.

## Intro & Overview
Dave Plonka (DP) and Mirja Kühlewind (MK)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-intro-overview-05

## Heads-up talk: Challenges in measuring 1 Gbps access speeds
Jason Livingood
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-measurement-challenges-in-the-gigabit-era-01

## Heads-up talk: Zesplot in five minutes - An attempt to visualise IPv6 address space
Luuk Hendriks
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-zesplot-an-attempt-to-visualise-ipv6-address-space-00

DP: Interesting transition from a tool to a useful measurement (anycast).

## Update on previous presentation: Measuring the quality of DNSSEC deployment
Roland van Rijswijk-Deij
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-measuring-the-quality-of-dnssec-deployment-00

## Update on previous presentation: Update on IPv6 Performance Data
Tommy Pauly (TP)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-update-on-ipv6-performance-data-tommy-pauly-01

Tim Chown (TC): Where are the UK cellular measurements coming from?
TP: This is just sampling from Apple devices, iPhones etc.; other devices may have been configured differently. This is the view that we have of UK carriers.
TC: I believe some UK operator has about a million handsets on IPv6, but maybe not Apple devices.
MK: Do you have absolute numbers?
TP: For this specific measurement, it's on the order of 1,000 connections, so very little.
Lorenzo Colitti (LC): Based on your charts I get the impression that across the board you end up using IPv6 for about half of the total connections.
TP: Based on happy-eyeballs data, that would be mainly from hosts that are not dual stack.
LC: Say you had a large head start for IPv6, how would the numbers change?
TP: Looking at our happy-eyeballs data, when IPv6 is available in the network and you have a dual-stack host, we use IPv6 95% of the time.
LC: Maybe I'm just misreading the graph. Why is WiFi used only 14% of the time if there is 39% IPv6 on US WiFi?
TP: I imagine it's because a lot of our connections are to things that don't have dual-stacked service.
LC: So the 15% difference is the content factor.
TP: Yes.
Michael Tuexen: What is the RTT of a TCP connection?
TP: This is the smoothed average. At the end of the connection we look at the smoothed average of what TCP saw during the lifetime of the connection.
Geoff Huston (GH): You're not measuring IPv6 and IPv4 to the same endpoint. You're measuring IPv6 when happy eyeballs says use IPv6, IPv4 when happy eyeballs says use IPv4, and IPv4 when there is no IPv6 to use, right?
TP: Yes, there are lots of biases in here.
GH: I get a very different answer when I look at one endpoint and measure IPv6 and IPv4 to the same endpoint. If you changed the happy-eyeballs timers, would you change this profile?
TP: We're including all IPv4 connections, some of which are not eligible for happy eyeballs. It's just trying to get an overall picture.
GH: You're comparing oranges and mandarins!
TP: Yes.
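The bias discussed above comes from how Happy Eyeballs (RFC 8305) races IPv6 against IPv4, giving the preferred family a short head start. As a rough illustration only (not Apple's measurement code; the hostnames and the 250 ms head start below are arbitrary examples), Python 3.8+ exposes this racing directly:

```python
# Minimal sketch of Happy Eyeballs-style racing (RFC 8305) using asyncio.
# Which address family "wins" depends on reachability and the head-start
# delay, which is exactly the sampling bias discussed in the Q&A above.
import asyncio
import socket

async def fetch_family(host: str, port: int = 443, head_start: float = 0.25) -> str:
    # asyncio resolves A/AAAA records and races connection attempts,
    # giving the preferred family a `happy_eyeballs_delay` head start.
    reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=head_start
    )
    family = writer.get_extra_info("socket").family
    writer.close()
    await writer.wait_closed()
    return "IPv6" if family == socket.AF_INET6 else "IPv4"

async def main() -> None:
    for host in ("www.example.com", "www.ietf.org"):
        try:
            print(host, "->", await fetch_family(host))
        except OSError as exc:
            print(host, "->", exc)

asyncio.run(main())
```

Because the winning family depends on the head-start delay and on which endpoints are dual-stacked at all, per-family RTT samples collected this way are not drawn from comparable populations, which is the point Geoff Huston raises.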
## Update on client adoption for both TLS SNI and IPv6
Erik Nygren
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-update-on-tls-sni-and-ipv6-client-adoption-01

## On the use of TCP's Initial Congestion Window in IPv4 and by Content Delivery Networks
Jan Rüth (JR)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-on-the-use-of-tcps-initial-congestion-window-in-ipv4-and-by-content-delivery-networks-00

Bob Briscoe: There was a presentation at a recent IETF that measured ECN but also incidentally measured IW10. It got similar results but didn't go into as much detail, and that study showed some measurement results for mobile networks. I can post the URL on the mailing list.
JR: Thanks.
Stephen Strowes: I'm looking forward to seeing your IPv6 results! CDNs do pretty aggressive MSS clamping - interested to see how that affects your results.
JR: We were wondering how CDNs were delivering their initial windows - some of them seem to pace their data, but not all.

## A First Look at QUIC in the Wild
Jan Rüth (JR)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-a-first-look-at-quic-in-the-wild-00

DP: This is an upcoming publication next week at PAM, so you're some of the first to see these results.
Dmitri Tikhonov: LiteSpeed is spelled L I T E.
MK: There is a question in jabber about the availability of data - are your own measurements available or shareable?
JR: Some measurement data is available - the IXP and ISP traces are available, MAWI data is available from their website, and we can't publish the TLD data because the domain lists are under NDA. See https://quic.netray.io - IPv4 weekly scans run on Friday, with data available on Saturdays.

## Adoption, Human Perception, and Performance of HTTP/2 Server Push
Torsten Zimmermann (TZ)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-adoption-human-perception-and-performance-of-http2-server-push-00

DP: How do you translate from an Alexa 1M domain name to a CDN to determine who is hosting content?
TZ: We started with IPs, mapped them to AS numbers, and also do regular DNS measurements, analysing the DNS CNAME chain and identifiers - this could be an incomplete list.
DP: Hint - using passive DNS can tell you the collection of FQDNs below a name; there are many Alexa services that use many CDNs simultaneously.
Erik Nygren (EN): Especially as more sites move to SNI, the zmap IPv4 scan is going to become less and less useful because many servers will behave differently depending on which SNI they see. IPv6 deployment growth is also a consideration. Alexa very much focuses on www-style sites, but an increasing amount of content is on separate domains for images, videos etc. that don't show up as well on that list. When looking at push performance behaviour, you show that there are different providers providing push - is that common across the board, or did some providers have behaviours that were generally negative while others were generally positive?
TZ: We couldn't attribute that to the provider itself; it's more the configuration of the website, so the user of the website. Push requires manual configuration, but we also see some sites that use plugins, like WordPress plugins, that scan the file tree and observe static files to do that configuration.
EN: There are starting to be product features from various vendors that analyse site behaviour and then modify push behaviour. You may see cases where that kind of closed-loop analysis of what to push has very different behaviours than more static configurations.
Ian Swett: Thanks for publishing this work - I've long been looking for data on the benefits or otherwise of push. It's hard to get good data and this is the best I've seen; it makes clear the challenges of making push work in the wild. Have you considered evaluating cache digests and whether they make things much better?
TZ: We will look into that. There will be a talk at the httpbis meeting today on cache digests. There's still a lot that can go wrong with server push - it's a cool feature but maybe standardised too early - without cache digests it shouldn't be used.

## Inferring BGP Blackholing Activity in the Internet
Georgios Smaragdakis (GS)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-inferring-bgp-blackholing-in-the-internet-00

Alexander Azimov (AA): Unfortunately blackholing is a service - it's not only DDoS mitigation, it's also a service for censorship. Especially in Russia it is used to block some resources, and it is quite popular.
GS: I didn't want to say that, you said that, fine. Some of the long-lived ones are candidates for censorship - we have enough indications for that, but we don't have ground truth. If you see days of blackholing, and we also use other analysis, we can find that these are websites with political content.
AA: It looks like blackholing plus hijacks.

## Deploying MDA Traceroute on RIPE Atlas Probes
Kevin Vermeulen (KV)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-deploying-mda-traceroute-on-ripe-atlas-probes-kevin-vermeulen-00

DP: Interesting to contrast what you've found with this breadth of paths versus what we usually call the diameter of the Internet. We usually see a maximum length. Is this terminology with diamonds something new, or is there some existing literature about this?
KV: There is a paper from Transactions on Networking that conducted a survey 7 years ago; it defined length and width. We have added symmetry and meshing for our heuristics - these metrics are important.
Kyle Rose: Could these techniques be used by an attacker to discover weak points in a network? A wide node could mean something, for example. Could you identify a specific load-balanced node and try to take it down because you've identified it as a bottleneck, for example?
KV: I'm not into security, but maybe, I suppose so. The goal here was to make a nice map, so by merging all the traceroutes that we get, indeed we see the core of the network, which may be useful to an attacker.
Tim Chown (TC): I regularly use a package called perfSONAR to measure loss, latency and throughput between a large number of sites. Its trace task has no idea, when something changes, where the change occurred. It's nice to have an algorithm that helps distinguish between the endpoints, the endpoint networks and the network in between.
KV: The heuristics don't take into account where the load balancing is taking place, but they could - it's a question that we've considered.
TC: Will follow up by email.

## An endhost-centric approach to detect network performance problems
Olivier Tilmans (OT)
Slides: https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-an-endhost-centric-approach-to-detect-network-performance-problems-00

Tim Chown (TC): Is code available?
OT: If you want it, I can give it to you, but it's not yet published. I am open to deploying it elsewhere and comparing results. Sharing the dataset has privacy implications for the students.
Lorenzo Colitti (LC): Are you using cgroup eBPF filters?
OT: I am putting a tracing mechanism in the kernel attached to ...
LC: So you see socket calls, not packets?
OT: I'm intercepting kprobes ...
LC: What does this give you for QUIC? It shouldn't give you anything.
OT: You can do this in user space as well.
LC: You have to do something like either LD_PRELOAD or whatever to attach ...
OT: Not even, actually - we can set up the kernel such that any userspace or kernel space call matching some symbol will have ...
LC: So your implementation only works for a given version of the QUIC binary?
OT: Yes, that is a challenge we have. We know that students, for example, have Chromium; if we want to move to another implementation we have to make it compatible - we need to define a better, more generic way.
LC: It could break between Chromium version 65 and 66.
OT: Yes, but we own the machines so we own the version - we control everything.
LC: Alright, thank you.
Neal Cardwell: Wanted to offer a conjecture on some of the mysterious delay spikes you're seeing. Some of the TCP RTTs you're seeing are suspiciously similar to delayed ACK values on common operating systems (200ms and 40ms). So something to look at.
Theresa Enghardt: Where can I find your code?
OT: I can share it privately, but it's not published yet. If you want to collaborate, drop me an email.

## Closing remarks
DP: You may have noticed a difference between the last two talks and the remainder of the agenda. This was deliberate, to carve out space for talks about tooling and measurement strategies. In maprg we want to bring insight to the engineering and operation of protocols. We suggest that you can only come and talk about a tool if you have a novel measurement for us.
MK: Thanks all for being on time, and please give us feedback if there were talks you liked, didn't like, or things you'd like to see in future.

Abstracts
=========

Zesplot in five minutes - An attempt to visualise IPv6 address space (Luuk Hendriks)

Visualising IPv6 address space is a challenging exercise. While approaches based on Hilbert curves have proven to be useful in the IPv4 space, they end up producing uselessly large visualisations when applied to the IPv6 space. Inspired by the IPv6 Hackathon organized by the RIPE NCC in November 2017, our experimental tool Zesplot is an attempt to apply the idea of so-called squarified treemaps [0] to IPv6 prefixes and addresses.

Zesplot produces plots based on two inputs: a list of prefixes and a list of addresses. The list of IPv6 prefixes is used to display squares, where the size of the square reflects the size of the prefix. Then, the list of IPv6 addresses is used to determine the colour of said squares: the more addresses fall within a certain prefix, the brighter that square is coloured. Thus, one can easily spot outliers in an input set: a small but bright square, for example, means many 'hits' from a small prefix. Example use cases are visualisation of access logs of e.g. webservers, the origin of spam mail, or gaining insight into measurement results for anything related to IPv6. Another possible use case is education or address planning, where one can directly see the impact of splitting up a prefix in different ways. Currently, Zesplot outputs to SVG with an HTML/JS wrapper, allowing for zooming in/out on the plot, and providing additional info (think ASN, number of addresses per prefix) while hovering over the squares.
We are eager to learn which use cases are most useful for people, both operators and researchers, to determine the direction for Zesplot. A first version of the tool should be available under a permissive open source license soon.

[0] https://www.win.tue.nl/~vanwijk/stm.pdf

Measuring the quality of DNSSEC deployment (Roland van Rijswijk-Deij)

In 2017 we performed two extensive studies of the DNSSEC ecosystem using longitudinal data collected by the OpenINTEL active DNS measurement system (https://openintel.nl/). Both studies focused on the quality of DNSSEC deployments. In other words: if organisations bother to deploy DNSSEC, do they deploy it in a secure way? We find that in generic TLDs, DNSSEC deployment is low (1%). Fortunately, that 1% does mostly get it right; "real" errors in DNSSEC deployment are rare. When we zoom in on two ccTLDs that have incentivized DNSSEC deployment (.nl and .se), the picture is a bit more grim. While errors are rare, deployments seldom follow best practices, leading to potentially insecure DNSSEC deployments.

Update on client adoption for both TLS SNI and IPv6 (Erik Nygren)

With the exhaustion of IPv4, the multi-tenancy enabled by TLS SNI is critical to supporting the rapid adoption of HTTPS. Over the past few years, TLS SNI has gone from having insufficient adoption to be generally useful to being viable in a majority of cases. IPv6 can also help here by not being address-limited, and it has also seen solid growth in many countries. Akamai has been closely tracking global adoption of both IPv6 and TLS SNI (and taking steps to influence both) over the past few years. This talk will provide an update on where the world is with end-user and client adoption for both TLS SNI and IPv6, based on traffic statistics collected from Akamai traffic delivery. We will highlight both leaders and laggards, looking at areas that can have the most leverage for increasing global adoption of both.

On the use of TCP's Initial Congestion Window in IPv4 and by Content Delivery Networks (Jan Rüth)

Paper “Large-Scale Scanning of TCP’s Initial Window”: https://conferences.sigcomm.org/imc/2017/papers/imc17-final43.pdf

Improving web performance is fueling the debate over sizing TCP's initial congestion window (IW). This debate has yielded several RFC updates to recommended IW sizes, e.g., an increase to IW10 in 2010. The current adoption of these IW recommendations is, however, unknown. First, we conduct large-scale measurements covering the entire IPv4 space, inferring the IW size distribution by probing HTTP and HTTPS. We find that many relevant systems have followed the recommendation of IW10, yet a large body of legacy systems is still holding on to past standards. Second, to understand whether the standardization and research perspectives still meet Internet reality, we further study the IW configurations of major Content Delivery Networks (CDNs), as known adopters of performance optimizations. Our study makes use of a globally distributed infrastructure of VPNs giving access to residential access links, which enables us to shed light on network-dependent configurations. We observe that most CDNs are well aware of the IW's impact, and we find a high amount of customization that goes beyond current Internet standards. Further, we find CDNs that utilize different IWs for different customers and content, while others resort to fixed values. We find various initial window configurations, most below 50 segments, yet with exceptions of up to 100 segments, ten times the current standard. Our study highlights that Internet reality has drifted away from recommended practices and that updates are required.
A First Look at QUIC in the Wild (Jan Rüth)

Paper (author's version): https://arxiv.org/abs/1801.05168

For the first time since the establishment of TCP and UDP, the Internet transport layer is subject to a major change with the introduction of QUIC. Initiated by Google in 2012, QUIC provides a reliable, connection-oriented, low-latency and fully encrypted transport. We provide the first broad assessment of QUIC usage in the wild. We have been monitoring the entire IPv4 address space since August 2016, and about 46% of the DNS namespace, to detect QUIC-capable infrastructures. As of October 2017, our measurements show that the number of QUIC-capable IPs has more than tripled since then, to over 617.59K. We find around 161K domains hosted on QUIC-enabled infrastructure, but only 15K of them present valid certificates over QUIC. We publish up-to-date data at https://quic.comsys.rwth-aachen.de. Second, we analyze over one year of traffic traces provided by MAWI, one day of traffic from a major European tier-1 ISP, and traces from a large IXP to understand the share of QUIC in the Internet traffic mix. We find QUIC to account for 2.6% to 9.1% of current Internet traffic, depending on the vantage point. This share is dominated by Google, which pushes up to 42.1% of its traffic via QUIC.

Adoption, Human Perception, and Performance of HTTP/2 Server Push (Torsten Zimmermann)

The web is currently subject to a major protocol shift with the transition to HTTP/2, which overcomes limitations of HTTP/1. For instance, it is now a binary protocol that enables request-response multiplexing and introduces Server Push as a new request model. While Push is regarded as a key feature to speed up the web by saving unnecessary round trips, the IETF standard does not define its usage, i.e., what to push and when. The goal of our work is to inform standardization with an up-to-date picture of i) its current usage, ii) its influence on user perception, and iii) optimization potential.

Our Push usage assessment is based on large-scale measurements [1] covering the IPv4 address space and the complete set of .com/.net/.org domains. We regularly report our results at https://push.comsys.rwth-aachen.de. We find both HTTP/2 and Push adoption to be steadily increasing, yet Push usage is orders of magnitude lower than that of HTTP/2, highlighting the complexity of using it (e.g., 220K domains in the Alexa 1M support HTTP/2 but only 932 use Push). Second, our performance evaluation of Push-enabled sites shows that Push can both speed up and slow down the web [1][2]. These detrimental effects cannot be attributed to simple factors like the type, size, or fraction of pushed objects, again highlighting the complexity of using Push correctly. We assessed in a user study [2] whether these effects are perceivable by users, i.e., whether current engineering and standardization efforts are indeed sufficient to optimize the web. Server Push can yield human-perceivable improvements, but it can also lead to impairments. Notably, these effects are highly website-specific and indicate that finding a generic strategy is challenging.

Our ongoing work studies how to better use Push. We thus thoroughly analyze Push performance impacts in a controlled and isolated testbed. Based on these results and the previous contributions, we investigate a novel approach to realizing Server Push, incorporating website-specific knowledge and client-side aspects, that can lead to improvements for some websites. We believe that our work can help to understand how standardized features are applied in the wild and what the resulting consequences are.
[1] https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-zimmermann-networking-push.pdf
[2] https://www.comsys.rwth-aachen.de/fileadmin/papers/2017/2017-zimmermann-internetqoe-push.pdf

Inferring BGP Blackholing Activity in the Internet (Georgios Smaragdakis)

The Border Gateway Protocol (BGP) has been used for decades as the de facto protocol to exchange reachability information among networks in the Internet. However, little is known about how this protocol is used to restrict reachability to selected destinations, e.g., destinations that are under attack. While such a feature, BGP blackholing, has been available for some time, we lack a systematic study of its Internet-wide adoption, practices, and network efficacy, as well as the profile of blackholed destinations. In this presentation we describe how we develop and evaluate a methodology to automatically detect BGP blackholing activity in the wild. We apply our method to both public and private BGP datasets. We find that hundreds of networks, including large transit providers, as well as about 50 Internet exchange points (IXPs), offer a blackholing service to their customers, peers, and members. Between 2014 and 2017, the number of blackholed prefixes increased by a factor of 6, peaking at 5K concurrently blackholed prefixes announced by up to 400 Autonomous Systems. We assess the effect of blackholing on the data plane using both targeted active measurements and passive datasets, finding that blackholing is indeed highly effective in dropping traffic before it reaches its destination, though it also discards legitimate traffic. We augment our findings with an analysis of the target IP addresses of blackholing. We also show that BGP blackholing activity correlates with periods of high DDoS attack activity. Our tools and insights are relevant for operators considering offering or using BGP blackholing services, as well as for researchers studying DDoS mitigation in the Internet.

Deploying MDA Traceroute on RIPE Atlas Probes (Kevin Vermeulen)

Traceroute is widely used by network operators, to troubleshoot, and by scientists, to understand the topology of the Internet. In the presence of load balancing, classic traceroute can lead to errors and misinterpretations, and these have been corrected by the widely used Paris Traceroute, which we developed and maintain. Paris Traceroute's Multipath Detection Algorithm (MDA) allows a user to discover the load-balanced paths between a source and a destination, with configurable statistical guarantees on the completeness of the results. The more complex these topologies are, the more packets the MDA requires in order to provide the guarantees. For a single route trace, the numbers can run into the thousands of packets, and even tens of thousands. In a resource-constrained environment, such as a RIPE Atlas probe, this is too costly. We have made an empirical study of the patterns in which load balancers tend to reveal themselves in route traces, and we are using the results to implement and deploy a new MDA, which promises to significantly reduce the number of probe packets required to discover complete multipath routes. We describe ongoing work that was presented last week at CAIDA's AIMS workshop. In order to reduce the number of probe packets required to discover complete multipath routes, we conducted a survey of load balancers, probing 350,000 IP destinations from 35 PlanetLab nodes as sources, and extracted some metrics. This talk will mainly focus on the results of that survey and the metrics we have extracted to classify the load balancers we found.
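To give a sense of why MDA probe counts grow so quickly, the following is an illustrative sketch of the stopping rule commonly described in the MDA literature (our own reading, not the authors' code): assuming a load balancer spreads probes uniformly over k next-hop interfaces, one keeps probing a hop until the probability of having missed an interface drops below a failure bound alpha.

```python
# Illustrative sketch of the usual MDA stopping rule (not the authors' implementation).
# If a load balancer hashes flows uniformly over k interfaces, the probability that
# n probes miss at least one interface is at most k * (1 - 1/k)**n (union bound).
# Keep probing until that bound falls below the per-hop failure probability alpha.
import math

def probes_needed(k: int, alpha: float = 0.05) -> int:
    """Probes needed at one hop to rule out a k-way split when only k-1 interfaces were seen."""
    if k < 2:
        return 1
    return math.ceil(math.log(alpha / k) / math.log(1.0 - 1.0 / k))

# Per-hop totals grow quickly (6, 11, 16, 21, 27, ... probes for k = 2, 3, 4, 5, 6),
# and a full multipath trace repeats this for every interface at every hop, which is
# how a single route trace can reach thousands of packets.
if __name__ == "__main__":
    for k in range(2, 17):
        print(f"rule out {k}-way split: {probes_needed(k)} probes")
```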
An endhost-centric approach to detect network performance problems (Olivier Tilmans and Olivier Bonaventure)

As enterprises increasingly rely on cloud services, their networks become a vital part of their daily operations. Many enterprise networks use passive measurement techniques and tools, such as NetFlow. However, these do not make it possible to estimate Key Performance Indicators (KPIs) of connections, for example losses or delays. Although monitoring functions on routers or middleboxes can be convenient from a deployment viewpoint, they miss a lot of information about performance problems, as they need to infer the state of each connection, and they will become less and less useful as encrypted protocols are deployed (e.g., QUIC encrypts transport headers). It is time to revisit the classical approaches to network monitoring and exploit the information available on the end hosts. In this talk, we propose a new monitoring framework where monitoring daemons directly instrument end hosts and export KPIs about the different transport protocols towards an IPFIX collector. More specifically, our monitoring daemons insert lightweight probes at runtime into the native transport stacks (e.g., the Linux kernel TCP stack, libc's name resolution routines, QUIC implementations) to extract general statistics from the state maintained for each connection. An aggregation daemon analyzes these statistics to detect events (e.g., connection establishment, RTOs, reordering) and exports KPIs towards an IPFIX collector. We will present a prototype deployment of these monitoring daemons in a campus network, and discuss early measurement results.
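As a much-simplified illustration of the kind of in-kernel instrumentation described above (this is not the authors' code; the probe point, map layout, and reporting loop are our own choices, it only handles IPv4, and it only counts retransmissions rather than exporting IPFIX records), a kprobe attached via the BCC Python bindings can count TCP retransmissions per destination:

```python
# Minimal sketch, assuming Linux with BCC installed and root privileges.
# It attaches a kprobe to the kernel's tcp_retransmit_skb() and counts
# retransmissions per IPv4 destination address; a real deployment like the
# one in the talk would extract richer per-connection state and export it
# via IPFIX instead of printing to stdout.
from bcc import BPF
from socket import inet_ntop, AF_INET
from struct import pack
import time

bpf_text = r"""
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(retrans_by_daddr, u32, u64);

int trace_retransmit(struct pt_regs *ctx, struct sock *sk)
{
    // IPv4 destination address of the retransmitting socket (network byte order).
    u32 daddr = sk->__sk_common.skc_daddr;
    retrans_by_daddr.increment(daddr);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_retransmit_skb", fn_name="trace_retransmit")

print("Counting TCP retransmissions per IPv4 destination; Ctrl-C to stop.")
try:
    while True:
        time.sleep(5)
        for daddr, count in sorted(b["retrans_by_daddr"].items(),
                                   key=lambda kv: kv[1].value, reverse=True):
            print(f"{inet_ntop(AF_INET, pack('I', daddr.value)):>15}  {count.value}")
except KeyboardInterrupt:
    pass
```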