Internet Engineering Task Force                                  C. Xie
Internet-Draft                                            China Telecom
Intended status: Informational                                  L. Song
Expires: May 29, 2019                       Beijing Internet Institute
                                                     J. Palet Martinez
                                                      The IPv6 Company
                                                     November 25, 2018

     Network-side Happy Eyeballs based on accurate IPv6 measurement
                draft-xie-v6ops-network-happyeyeballs-01

Abstract

During the period of IPv6 transition, both ISPs and ICPs (Internet Content Providers) care about users' experience in dual-stack networks. They hesitate to provide IPv6 to their users for fear of poor IPv6 performance. Network-side Happy Eyeballs (NHE) is proposed in this memo as an approach to help ISPs identify IPv6 connectivity issues and provide better connectivity to end users. NHE performs accurate measurement and comparison of IPv6/IPv4 performance on the network side, rather than on the client side as in Happy Eyeballs v2 (HEv2) [RFC8305]. It works independently of the client's adoption of HEv2, and the two can coexist without conflict.

REMOVE BEFORE PUBLICATION: The source of this document, together with a test script, is currently hosted on GitHub [NHE-GitHub]. Comments and pull requests are welcome.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 29, 2019.

Xie, et al.
Expires May 29, 2019                [Page 1]

Internet-Draft   Network-side Happy Eyeballs based on accurate   November 2018

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Overview of NHE Framework
   3. IPv6/IPv4 Performance Measurement
      3.1. Performance metrics
      3.2. Location of IPv6/IPv4 Measurement
      3.3. Reducing measurement traffic
   4. Reporting IPv6 failures using syslog
      4.1. Discovery of the syslog collector NSP
   5. One Use Case of Troubleshooting action
   6. Security considerations
   7. IANA Considerations
   8. Acknowledgments
   9. References
   Authors' Addresses

1. Introduction

During the period of IPv6 transition, both ISPs and ICPs (Internet Content Providers) care about users' experience in dual-stack networks. They hesitate to provide IPv6 to their users for fear of poor IPv6 performance.
Happy Eyeballs v2 (HEv2) [RFC8305] enables clients to attempt multiple connections in parallel, which helps them work around blocked, broken, or sub-optimal networks. Because its design gives priority to IPv6, HEv2 helps increase IPv6 traffic in networks while also reducing client-side delay when IPv6 connectivity is poorer. Most modern web browsers now support HEv2 well, thanks to popular web browser engines such as WebKit and Trident. However, in practice there are still barriers for mobile developers who build apps with APIs and libraries that do not implement HEv2.

First, HEv2 adds complexity and uncertainty to both development and operation. For example, Section 8 of [RFC8305] lists six configurable values, such as Resolution Delay and Connection Attempt Delay. This raises the bar for small application developers, who must produce a "nuanced implementation" that tunes these values according to network dynamics.

Second, the parallel connections initiated by HEv2 produce a larger volume of traffic, which consumes both mobile data (and therefore fees) and battery power. As a result, mobile application developers may choose not to adopt HEv2, or may postpone their IPv6 transition.

Third, client-based HEv2 hides some possible IPv6 connectivity issues from the operator: because users do not notice that anything is broken, they do not report it to their providers.

These issues are more notable in regions where IPv6 performance is worse than IPv4 in terms of RTT and failure rate [APNIC-v6perf].

This memo proposes Network-side Happy Eyeballs (NHE), an approach to improving IPv6 connectivity through network-side IPv6 measurement and failure reporting. Instead of requiring the client to race IPv6 and IPv4 connections, NHE performs the "race" on the network side.
NHE aims to provide useful alert information so that ISPs can fix networking issues themselves. In addition, this memo introduces a potential use case of NHE for working around networking issues that cannot be resolved locally (for example, issues with third parties on the path to the destination, or with the destination itself).

The rationale of the NHE approach is simple. ISPs, typically mobile and broadband network providers, have more resources, capability, and motivation to perform accurate IPv6/IPv4 performance measurement. When they use existing protocols to alert and report failures immediately, those failures can be analyzed and resolved, improving network reliability for the benefit of their users. With sufficient and accurate troubleshooting information, ISPs gain a clear view of their IPv6 network performance and the means to improve it.

2. Overview of NHE Framework

As shown in Figure 1, the NHE framework consists of three key components: IPv6/IPv4 performance measurement, IPv6 failure reporting, and troubleshooting actions on these failures.

To address the problem of client-based HEv2 concealing operational issues in the IPv6 network, IPv6 failure reporting is a key element of NHE: it reports and collects precise performance information about the IPv6 network. Note that an IPv6 failure event is not necessarily triggered only by disconnection or severe packet loss; it includes every event in which the IPv6 connection is slower than IPv4 in the race (by more than the configurable threshold described in Section 3.1). Section 4 describes how IPv6 failure reporting works.

The IPv6/IPv4 performance measurement component feeds IPv6 failure reporting by performing IPv6/IPv4 RTT measurements on a special list of domains (called the Measuring list).
The list of measured domains can be seeded from a well-populated cache and then updated to track the changing popularity of domains in the network. To achieve better measurement accuracy, probes may be located adjacent to clients at the edge of the network. The criteria for putting a specific domain into the list, and how to perform the measurement, are described in Section 3.

   +-------------+     +--------------+     +---------------+
   |  IPv6/IPv4  +---> | IPv6 Failure +---> |Troubleshooting|
   | Measurement |     |  Reporting   |     |    Actions    |
   +-------------+     +--------------+     +---------------+

                Figure 1: High-Level NHE Framework

After IPv6 failure information is collected and analyzed, various troubleshooting actions can be taken accordingly. Most of these actions are similar to those used in IPv4 network troubleshooting. For example, if the problem is local, operators should resolve the networking issue as soon as possible. If the problem is caused by the far end or by third parties, the ISP may contact its upstream ISPs or transit/peering ASes to clear the issue (for example, by withdrawing some BGP peerings).

One action that can be adopted temporarily to mitigate poor IPv6 performance for a specific domain is described in Section 5.

Note that NHE works independently of the client's adoption of HEv2, and the two coexist without conflict in the NHE framework.

3. IPv6/IPv4 Performance Measurement

Accurate IPv6 performance measurement is vital to the success of NHE. Accuracy depends on what is measured, where, and how.

3.1. Performance metrics

In client-side HEv2, a round-trip delay (round-trip time, RTT) metric is used: a race between IPv6 and IPv4 connections is measured, starting from domain name resolution and continuing through TCP setup on both address families.
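For each domain, the race NHE simulates ultimately reduces to comparing the best RTT of each address family against an operator threshold. The Python sketch below illustrates that comparison together with the "alert to operator" bookkeeping laid out in the steps that follow. It rests on assumptions not stated in this memo: RTT samples (DNS resolution plus TCP handshake, in milliseconds) are assumed to be collected already, a failed connection attempt is assumed to be recorded as infinity, and the names diff_rtt, AlertState, and the "Cleared" event are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def diff_rtt(rtt_v6_ms: List[float], rtt_v4_ms: List[float]) -> Optional[float]:
    """Diff-RTT = min(RTT-IPv6) - min(RTT-IPv4), using the best address per family.

    Each sample is DNS resolution time plus TCP handshake time for one returned
    address; a failed attempt is recorded as math.inf (assumption of this sketch).
    Returns None when no comparison is possible (e.g. no AAAA record, or every
    IPv4 attempt failed), in which case the domain is skipped.
    """
    if not rtt_v6_ms or not rtt_v4_ms:
        return None
    best_v6, best_v4 = min(rtt_v6_ms), min(rtt_v4_ms)
    if math.isinf(best_v4):
        return None
    return best_v6 - best_v4


@dataclass
class AlertState:
    """Per-domain "alert to operator" flags with an operator-tunable threshold."""
    threshold_ms: float
    flagged: Dict[str, bool] = field(default_factory=dict)

    def update(self, domain: str, rtt_v6_ms: List[float],
               rtt_v4_ms: List[float]) -> Optional[str]:
        """Return the event to report, or None when nothing should be sent."""
        d = diff_rtt(rtt_v6_ms, rtt_v4_ms)
        if d is None:
            return None
        poor = d > self.threshold_ms
        already = self.flagged.get(domain, False)
        if poor and not already:
            self.flagged[domain] = True
            return "Poor IPv6 Performance"   # would trigger reporting (Section 4)
        if not poor and already:
            self.flagged[domain] = False
            return "Cleared"                 # hypothetical name for the clear event
        return None                          # state unchanged: no repetitive alert


if __name__ == "__main__":
    state = AlertState(threshold_ms=50.0)   # example threshold, not from the memo
    print(state.update("example.com", [120.0, 95.0], [40.0]))  # 95 - 40 = 55 > 50
    print(state.update("example.com", [110.0], [40.0]))        # still poor, already flagged
    print(state.update("example.com", [60.0], [40.0]))         # 20 <= 50, flag cleared
```

Run as a script, the three calls print "Poor IPv6 Performance", None, and "Cleared". Keeping the flag per domain is what suppresses repeated alerts: a report is produced only when a domain crosses the threshold in either direction.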
In NHE, a similar approach is adopted in the ISP network to simulate a client racing connections for each domain in a list:

o  Look up the domain name. If a positive response with at least one valid AAAA record is received, the process continues. If a negative response with no AAAA record is received, the process stops for this domain and continues with the next domain in the list. Note that if a ServFail response is received, indicating an error at the far-end server, the domain should be marked "alert to operator" to report a "ServFail" incident; some clients have been observed to keep sending AAAA queries after receiving a ServFail response.

o  Make TCP connections to all returned IPv6 and IPv4 destination addresses. Unlike client-side HEv2, NHE uses neither address sorting nor a Connection Attempt Delay; the NHE measuring server can connect to all returned addresses concurrently.

o  Measure the round-trip delay as the RTT of the domain name resolution plus the RTT of the TCP setup (starting when the SYN is sent and ending when the SYN-ACK is received). If there is more than one IP address in the AAAA or A record responses, the round-trip delay should be measured for every address.

o  Calculate the difference in round-trip delay (Diff-RTT) between the two address families. If there is more than one IP address in either the AAAA or A record responses, the minimum RTT within each address family is used:

      Diff-RTT = min(RTT-IPv6) - min(RTT-IPv4)

o  For each domain, if Diff-RTT is larger than a configurable threshold, the domain is recorded in a local list and flagged as "alert to operator" with a "Poor IPv6 Performance" incident. This triggers the reporting algorithm described in Section 4. If the domain is already in the local list with the "alert to operator" flag, nothing should be done, to avoid repetitive alerts.
o  When a follow-up measurement shows that a domain previously flagged as "alert to operator" no longer has an issue, the flag must be cleared and the reporting algorithm triggered.

Note that the threshold value should be tunable by the network provider, to strike a better tradeoff between IPv6 and IPv4 performance and to allow adjustment of the local IPv6-versus-IPv4 priority policy.

The measurement process described above relies on an important domain list, called the Measuring list, which contains the targeted domains. The list can be built and updated from the popular domains visited by users of the network. The measurement should be repeated periodically for each domain, at an interval of a configurable number of hours.

Compared with client-side HEv2, NHE operated by ISPs has more resources for performance measurement. For example, the race on a handset measures the round-trip delay of a single, instantaneous connection, which does not fully represent the performance of an address family over a sustained period; erratic variation in delay (caused by network jitter), for instance, makes it difficult to support many interactive real-time applications. Statistics of round-trip delay therefore help ISPs build more sophisticated measurements. Section 4 of [RFC2681] specifies statistics for round-trip delay that can be used for advanced measurement, such as percentile, median, minimum, and inverse percentile.

In addition, HEv2 measurements may be influenced by access network problems, which do not affect NHE measurements. The ISP should measure access network problems by alternative means.

3.2. Location of IPv6/IPv4 Measurement

Because the measurements are meant to simulate user experience accurately, the location where they are performed is very important. The intuitive approach is to place the measuring probes or servers at the edge, close to end users. In 4G LTE cellular networks, as a typical case, the performance measurement servers can be located close to base stations (or an aggregation point), so that the end-to-end path from the probe to a destination differs from that of a real end user by at most one hop. In broadband networks, the measuring probes can be collocated with the BRAS, OLTs, or equivalent aggregation points, depending on the access technology.

Setting up probes in different parts of the network, including the core or close to upstream provider connectivity, can help determine the source of issues, especially if they affect many domains.

Moreover, probes can be built into special mobile devices for reporting purposes, and there are many ways to do so. For example, a dedicated application can perform the probing and report to a collector operated by the operator. Modern APM (application performance measurement) and NPM (network performance measurement) technologies allow ordinary application software to integrate a special SDK (Software Development Kit) for measurement purposes. It is therefore possible for mobile app providers to adopt the NHE framework themselves to avoid poor IPv6 performance, although their troubleshooting actions differ from those in the ISP case.

3.3. Reducing measurement traffic

Since IPv6/IPv4 performance varies per domain, there is a concern that NHE may generate a large amount of measurement traffic. Two approaches may help reduce it.
The first is to keep the Measuring list to a moderate size, for example the 1,000 most popular domains in the cache. The size of the Measuring list can also be made configurable according to ISP local policy. An optional way to limit the size of the list is to focus on the top 1,000 apps rather than domains; ISPs can cooperate with ICPs to maintain a list of the domains used by the top apps for NHE.

The second approach is to use passive measurement. The round-trip delay of the DNS lookup for a particular domain is negligible in most cases if there is a cache hit, so passive measurement should focus on monitoring TCP connections to specific destinations. For example, with the 1,000 most popular domains in the Measuring list, roughly a thousand TCP connections would be monitored passively to measure round-trip delay.

4. Reporting IPv6 failures using syslog

To simplify the reporting of NHE failures, syslog [RFC5424] over UDP [RFC5426] MUST be used, on the default port (514) and over IPv6 only. The intent is to keep this reporting very simple, so no choice of alternative ports or transport protocols is offered.

Operators willing to use this reporting MUST configure at least one syslog collector. The configuration can be done statically, by providing dedicated IPv4/IPv6 addresses for the syslog collectors and probes. Alternatively, a more automated procedure is to configure at least one syslog collector at the IPv6 address formed as:

      Network-Specific Prefix::192.88.99.1
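As a rough illustration, the Python sketch below derives the collector address from an NSP and emits one report as an RFC 5424 message over UDP/IPv6 on port 514. It assumes the NSP is a /96, so that the 32 low-order bits carry 192.88.99.1 as Section 4.1 describes. Because this memo leaves the NHE message encoding to be defined, the facility and severity, the APP-NAME "nhe", the MSGID values, and the message text are hypothetical placeholders; only the header layout of [RFC5424] and the single-datagram transport of [RFC5426] come from the referenced specifications.

```python
import ipaddress
import socket
from datetime import datetime, timezone

SYSLOG_PORT = 514  # default syslog/UDP port; this section offers no alternative
COLLECTOR_V4 = ipaddress.IPv4Address("192.88.99.1")
ANYCAST_RANGE = ipaddress.IPv4Network("192.88.99.0/24")


def collector_address(nsp: str) -> ipaddress.IPv6Address:
    """Form NSP::192.88.99.1, assuming a /96 NSP (low 32 bits carry the IPv4 part)."""
    net = ipaddress.IPv6Network(nsp)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles a /96 Network-Specific Prefix")
    return ipaddress.IPv6Address(int(net.network_address) | int(COLLECTOR_V4))


def in_collector_range(addr: str, nsp: str) -> bool:
    """True if addr lies in NSP + 192.88.99.0/24, where anycast collectors may live."""
    a = ipaddress.IPv6Address(addr)
    low32 = ipaddress.IPv4Address(int(a) & 0xFFFFFFFF)
    return a in ipaddress.IPv6Network(nsp) and low32 in ANYCAST_RANGE


def syslog_message(hostname: str, msgid: str, text: str,
                   facility: int = 16, severity: int = 4) -> bytes:
    """Build <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG (RFC 5424).

    PRI = facility * 8 + severity (defaults local0/warning -> 132, hypothetical).
    PROCID and STRUCTURED-DATA are the NILVALUE "-".
    """
    pri = facility * 8 + severity
    ts = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return f"<{pri}>1 {ts} {hostname} nhe - {msgid} - {text}".encode("utf-8")


def send_report(collector: ipaddress.IPv6Address, payload: bytes) -> None:
    """Send one syslog message as a single UDP datagram over IPv6 (RFC 5426)."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (str(collector), SYSLOG_PORT))
```

For example, with the documentation prefix 2001:db8:64::/96 standing in for an operator's NSP, collector_address returns 2001:db8:64::c058:6301, and a probe could call send_report(collector_address("2001:db8:64::/96"), syslog_message("probe1.example.net", "POOR_IPV6", "domain=example.com")). Sending each report as one self-contained datagram keeps the probe stateless, in line with the intent of keeping the reporting simple.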
As introduced in Section 3.1, syslog of NHE should contain two kinds of message to report "Poor IPv6 Performance" incident and "ServFail" incident. 4.1. Discovery of the syslog collector NSP In case the automated procedure is used, the same mechanism described by RFC7050 ([RFC7050]) should be used to look for the address of the syslog collector(s). Because the collectors will be using an IPv6 address with the 32 low order bits from the reserved range 192.88.99.0/24, this will not be in conflict with any public addresses used in Internet, so this mechanism is compatible with the expected usage of the NSP for NAT64. 5. One Use Case of Troubleshooting action Besides the normal network troubleshooting measures taken by network operators as usual in IPv4 networks, there are other troubleshooting actions for temporary but urgent workarounds. Before [RFC6555] and [RFC8305] were documented, selective filtering of the DNS AAAA record (returning NODATA) was proposed as a practice making the IPv6 transition less painful [Less-painful]. The basic idea introduced is that ISP DNS Recursive servers does not return AAAA for users who have broken IPv6 connectivity. There are some working implementations of such filter AAAA option in BIND 9 [Filtering-AAAA]. However, it should be noticed that there are two security risks on selective filtering. One is that it may break DNSSEC and omit RRSIG records covering type AAAA as well as AAAA record. The second is that filtering AAAA records cause DNS incoherency in the end-users perspective which may causes some risks if end user's application depend on the integrity of DNS data. To reduce both security risks, an alternative approach for an ISP using NHE, could be to run a special resolver which artificially delays the AAAA answers of a targeted domain name. A domain name being targeted means that the IPv6 performance of that domain name is measured and reported with poor performance. So, instead of Xie, et al. 
Expires May 29, 2019 [Page 8] Internet-DraftNetwork-side Happy Eyeballs based on accurateNovember 2018 filtering the AAAA record, postponing the AAAA responses with a configurable timer (i.e., 300 ms) may cause IPv6 connection losing the race on the client side which avoid Concurrent IPv6 and IPv4 connection attempts. It will help the HEv2 client. The non-HE client will fall back sooner to IPv4 without IPv6 connections and retries. Note that there is a corner case when negative response with ServFail are received for a domain name lookup, no ServFail response should be returned to the client, because it is observed that some clients will continue querying for AAAA RRs after receiving ServFail response. In this case, the resolver could silently drop the query without responding to the client. 6. Security considerations TBD 7. IANA Considerations No IANA considerations for this memo 8. Acknowledgments Acknowledgments are given to Geoff Huston, David Schinazi, Marc Blanchet, and Paul Vixie who gave comments and suggestions on the conception of NHE. Thanks to Tony Finch and Tommy Pauly who gave positive comment on the part which 01 version stands on. 9. References [APNIC-v6perf] APNIC, "APNIC IPv6 Performance Monitoring", . [Filtering-AAAA] "Filter AAAA option in BIND 9", August 2017, . [Less-painful] Yahoo, "IPv6 and recursive resolvers:How do we make the transition less painful?", March 2010, . Xie, et al. Expires May 29, 2019 [Page 9] Internet-DraftNetwork-side Happy Eyeballs based on accurateNovember 2018 [NHE-GitHub] BII, "GitHub Repository of Network-side Happy Eyeballs", . [RFC2681] Almes, G., Kalidindi, S., and M. Zekauskas, "A Round-trip Delay Metric for IPPM", RFC 2681, DOI 10.17487/RFC2681, September 1999, . [RFC5424] Gerhards, R., "The Syslog Protocol", RFC 5424, DOI 10.17487/RFC5424, March 2009, . [RFC5426] Okmianski, A., "Transmission of Syslog Messages over UDP", RFC 5426, DOI 10.17487/RFC5426, March 2009, . [RFC6555] Wing, D. and A. 
Yourtchenko, "Happy Eyeballs: Success with Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April 2012.

[RFC7050] Savolainen, T., Korhonen, J., and D. Wing, "Discovery of the IPv6 Prefix Used for IPv6 Address Synthesis", RFC 7050, DOI 10.17487/RFC7050, November 2013.

[RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: Better Connectivity Using Concurrency", RFC 8305, DOI 10.17487/RFC8305, December 2017.

Authors' Addresses

Chongfeng Xie
China Telecom
No.118 Xizhimennei street, Xicheng District
Beijing 100035
P. R. China

Email: xiechf.bri@chinatelecom.cn

Linjian Song
Beijing Internet Institute
2nd Floor, Building 5, No.58 Jing Hai Wu Lu, BDA
Beijing 100176
P. R. China

Email: songlinjian@gmail.com

Jordi Palet Martinez
The IPv6 Company
Molino de la Navata, 75
Madrid, La Navata - Galapagar 28420
Spain

Email: jordi.palet@theipv6company.com
URI: http://www.theipv6company.com/