idnits 2.17.1 

draft-song-atr-large-resp-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([ATR-Github]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 189: '...f ATR payload size and timer SHOULD be...'
     RFC 2119 keyword, line 250: '...at AT bit and TC bit SHOULD be set and...'
     RFC 2119 keyword, line 365: '...hat in IPv4 ATR payload size SHOULD be...'
     RFC 2119 keyword, line 366: '...n IPv6 the value SHOULD be 1232 octets...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 8, 2018) is 2180 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ATR-Github'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Bennett'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Brownlee'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'How-ATR-Work'

  ** Downref: Normative reference to an Informational draft:
     draft-taylor-v6ops-fragdrop (ref. 'I-D.taylor-v6ops-fragdrop')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IPv6-frag-DNS'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Liang'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Not-speak-TCP'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Paxson'

  ** Downref: Normative reference to an Informational RFC: RFC 7872

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Tinta'


     Summary: 5 errors (**), 0 flaws (~~), 1 warning (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                  L. Song
3	Internet-Draft                                Beijing Internet Institute
4	Intended status: Standards Track                             May 8, 2018
5	Expires: November 9, 2018

7	       ATR: Additional Truncation Response for Large DNS Response
8	                      draft-song-atr-large-resp-01

10	Abstract

12	   With the increasing use of DNSSEC and IPv6, there is more public
13	   evidence of, and concern about, IPv6 fragmentation issues caused by
14	   larger DNS payloads over IPv6.  This memo introduces a simple
15	   improvement to DNS servers: replying with an additional truncated
16	   response just after the normal fragmented response.  It can relieve
17	   users suffering from DNS latency and failures due to large DNS
18	   responses.  It can also be used as a measurement and troubleshooting
19	   tool to locate and resolve the issue.

21	   REMOVE BEFORE PUBLICATION: The source of the document with test
22	   script is currently placed at GitHub [ATR-Github].  Comments and pull
23	   requests are welcome.

25	Status of This Memo

27	   This Internet-Draft is submitted in full conformance with the
28	   provisions of BCP 78 and BCP 79.

30	   Internet-Drafts are working documents of the Internet Engineering
31	   Task Force (IETF).  Note that other groups may also distribute
32	   working documents as Internet-Drafts.  The list of current Internet-
33	   Drafts is at https://datatracker.ietf.org/drafts/current/.

35	   Internet-Drafts are draft documents valid for a maximum of six months
36	   and may be updated, replaced, or obsoleted by other documents at any
37	   time.  It is inappropriate to use Internet-Drafts as reference
38	   material or to cite them other than as "work in progress."

40	   This Internet-Draft will expire on November 9, 2018.

42	Copyright Notice

44	   Copyright (c) 2018 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (https://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction
60	   2.  The ATR mechanism
61	   3.  Indicating an ATR response
62	   4.  Operational considerations
63	     4.1.  ATR timer
64	     4.2.  ATR payload size
65	     4.3.  Less aggressiveness of ATR
66	   5.  Security Considerations
67	   6.  IANA considerations
68	   7.  Acknowledgments
69	   8.  References
70	   Appendix A.  How well does ATR actually work?
71	   Appendix B.  Considerations on Resolver awareness of ATR
72	   Appendix C.  Revision history of this document
73	     C.1.  draft-song-atr-large-resp-01
74	   Author's Address

76	1.  Introduction

78	   Large DNS responses have long been identified as an issue.  There
79	   is an inherent mechanism defined in [RFC1035] to handle large DNS
80	   responses (larger than 512 octets): setting the TrunCation (TC) bit
81	   tells the resolver to fall back to TCP.  Due to concern about the
82	   cost of TCP, EDNS(0) [RFC6891] was proposed, which encourages servers
83	   to send larger responses instead of falling back to TCP.  However,
84	   with the increasing use of DNSSEC and IPv6, there is more public
85	   evidence of, and concern about, users suffering from packet drops
86	   caused by IPv6 fragmentation of large DNS responses.

88	   It has been observed that some IPv6 network devices, such as
89	   firewalls, intentionally drop IPv6 packets with fragmentation
90	   headers [I-D.taylor-v6ops-fragdrop].  [RFC7872] reported drop rates
91	   of more than 30% for fragmented packets.  Regarding IPv6
92	   fragmentation caused by larger DNS payloads in responses, one
93	   measurement [IPv6-frag-DNS] reported that 35% of endpoints using
94	   IPv6-capable DNS resolvers cannot receive a fragmented IPv6 response
95	   over UDP.  Moreover, most of the underlying issues with fragments are
96	   hidden by the good redundancy and resilience of DNS.  It is hard
97	   for DNS client and server operators to trace and locate the issue
98	   when fragments are blocked or dropped.  The noticeable DNS failures
99	   and latency experienced by end users are just the tip of the iceberg.

101	   Depending on its retry model, a resolver that fails to receive a
102	   fragmented response may experience long latency or failure due to
103	   timeouts and retries.  One typical case is that the resolver finally
104	   gets the answer after several retries, falling back to TCP after
105	   decreasing the payload size in EDNS0.  To avoid that issue, some
106	   authoritative servers may adopt a policy of ignoring the UDP payload
107	   size in the EDNS0 extension and always truncating the response when
108	   it is larger than an expected size.  However, one study
109	   [Not-speak-TCP] shows that about 17% of resolvers in the sample
110	   cannot send a query over TCP when they receive a truncated response.
111	   This is a dilemma: hurt either the users who cannot receive
112	   fragments or the users without TCP fallback capability.  There are
113	   also voices for "moving all DNS over TCP".  But it is generally
114	   desired that DNS keep its efficiency and high performance by using
115	   UDP most of the time and fall back to TCP as soon as possible when
116	   necessary for some corner cases.

118	   To relieve the problem, this memo introduces a small improvement to
119	   the DNS responding process: replying with an Additional Truncated
120	   Response (ATR) just after a normal large response that is to be
121	   fragmented.  Generally speaking, ATR decouples EDNS0 and TCP
122	   fallback so that they can work independently according to the server
123	   operator's requirements.  One goal of ATR is to relieve the pain of
124	   users, both stub and recursive resolvers, from the server side, on
125	   both authoritative and recursive servers.  It does not require any
126	   changes to resolvers and has a deploy-and-gain property that
127	   encourages operators to implement it to benefit their resolvers.

129	   Another goal of ATR is to help DNS operators with troubleshooting:
130	   ATR can be deployed as a measurement tool to identify vulnerable
131	   servers and users.  A flag bit is required in the EDNS0 OPT header
132	   to distinguish an ATR response from an ordinary truncated response.
133	   A resolver (or troubleshooter) can tell that a fragment was lost in
134	   a given transaction with a name server when it receives only the
135	   ATR response and no ordinary UDP response.  Another way of using ATR
136	   as a measurement tool is to send ATR responses to a specific group
137	   of resolvers and record the TCP connections received during that
138	   period.  This can help identify vulnerable users and apply ATR to
139	   them selectively.

141	   [REMOVE BEFORE PUBLICATION] Note that Appendix A of this memo gives
142	   a brief introduction to the test done by APNIC on how well ATR
143	   actually works, together with comments by the author of this memo.
144	   It may help readers understand the benefits and tradeoffs that ATR
145	   brings.

147	2.  The ATR mechanism

149	   The ATR mechanism is very simple: it adds an ATR module to the
150	   responding process of current DNS implementations.  As shown in the
151	   following diagram, the ATR module comes right after the truncation
152	   loop and is entered if the response is not going to be truncated.

154	   A DNS +-------------+        +-------------+  Normal
155	   query |             | No     |             | response
156	   +------>  Truncation +-------->     ATR     +--------->
157	         |    loop     |        |    Module   |
158	         | truncation? |        | truncation? |
159	         +-------------+        +-------------+
160	             yes|                   yes|     +-----+
161	                |                      +-----+timer+-->
162	                |                            +-----+
163	                |                      Truncated Response
164	                +--------------->
165	                 Truncated Response

167	         Figure 1: The ATR module in the DNS responding process

169	   The ATR responding process goes as follows:

171	   o  When an authoritative server receives a query and enters the
172	      responding process, it first goes through the normal truncation
173	      loop to see whether the size of the response exceeds the EDNS0
174	      payload size.  If yes, it responds with a truncated packet.  If
175	      no, it enters the ATR module.

177	   o  In the ATR module, as in the truncation loop, the size of the
178	      response is compared with a value called the ATR payload size.
179	      If the response is larger than the ATR payload size, the server
180	      first sends the normal response and then builds a truncated
181	      response with the same ID as the query.

183	   o  The server can send the truncated response immediately.  But
184	      considering the possible impact of network reordering, it is
185	      suggested to use a timer to delay the second, truncated
186	      response, for example by 10 to 50 milliseconds, which can be
187	      configured by the local operator.
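
   The responding process above can be sketched as follows.  This is a
   minimal illustration, not the author's implementation: the names
   PlannedResponse and plan_responses, and the 10 ms default delay, are
   assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlannedResponse:
    """One UDP response the server intends to send (hypothetical type)."""
    truncated: bool   # True means a TC=1 response
    delay_ms: int     # how long to wait before sending it


def plan_responses(response_size: int,
                   edns_payload_size: int,
                   atr_payload_size: int,
                   atr_timer_ms: int = 10) -> List[PlannedResponse]:
    """Return the responses to send for one query, in sending order."""
    # Normal truncation loop: the response does not fit the EDNS0 size.
    if response_size > edns_payload_size:
        return [PlannedResponse(truncated=True, delay_ms=0)]
    # ATR module: send the normal response, then a delayed truncated one.
    if response_size > atr_payload_size:
        return [PlannedResponse(truncated=False, delay_ms=0),
                PlannedResponse(truncated=True, delay_ms=atr_timer_ms)]
    # Small response: only the normal response is sent.
    return [PlannedResponse(truncated=False, delay_ms=0)]
```

   For example, plan_responses(1800, 4096, 1232) yields the normal
   response followed by a truncated response delayed by the timer.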

189	   Note that the choice of ATR payload size and timer SHOULD be
190	   configured locally.  Operational considerations and guidance are
191	   discussed in Section 4.2 and Section 4.1 respectively.

193	   There are three typical cases of ATR-unaware resolver behavior when
194	   a resolver sends a query to an ATR server and the server generates
195	   a large response that is fragmented:

197	   o  Case 1: A resolver (or stub resolver) receives both the large
198	      response and a very small truncated response in sequence.  It
199	      happily accepts the first response and drops the second one
200	      because the transaction is over.

202	   o  Case 2: If a fragment is dropped on the path, the resolver ends
203	      up receiving only the small truncated response.  It retries over
204	      TCP immediately.

206	   o  Case 3: Resolvers (probably 30%*17% of them) that cannot speak
207	      TCP and sit behind a firewall stubbornly dropping fragments get
208	      no help from ATR.  Just say good luck to them!
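
   The three cases can be summarized in a small sketch.  The function
   name resolver_outcome and the constant names are hypothetical; the
   two rates are the rough figures quoted in Section 1, and multiplying
   them treats the two conditions as independent, which is only an
   approximation.

```python
def resolver_outcome(fragments_delivered: bool, tcp_capable: bool) -> str:
    """Classify what an ATR-unaware resolver experiences."""
    if fragments_delivered:
        # The reassembled response arrives first and ends the transaction.
        return "case 1: accepts large UDP response, drops ATR response"
    if tcp_capable:
        # Only the small truncated response arrives, so TCP is tried.
        return "case 2: sees only ATR response, retries over TCP"
    # Neither fragments nor TCP work; ATR cannot help here.
    return "case 3: no answer over UDP or TCP"


FRAGMENT_DROP_RATE = 0.30  # rough figure quoted from [RFC7872]
NO_TCP_RATE = 0.17         # rough figure quoted from [Not-speak-TCP]

# The "30%*17%" estimate for Case 3, rounded to three decimals.
case3_share = round(FRAGMENT_DROP_RATE * NO_TCP_RATE, 3)
```

   Under these assumptions case3_share comes to about 0.051, that is,
   roughly 5% of resolvers.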

210	   Where an authoritative server truncates all responses exceeding a
211	   certain value, for example by setting IPv6-edns-size to 1220 octets,
212	   ATR is helpful for resolvers with TCP capability, because the
213	   resolver still has a fair chance to receive the large response.

215	3.  Indicating an ATR response

217	   As introduced in Section 1, an ATR response needs to be
218	   distinguishable from an ordinary truncated response.  This enables
219	   resolver operators to log cases where an ATR response is received
220	   without a (reassembled) UDP response to a query.  Without an
221	   indicator that distinguishes ATR responses, there would be no way to
222	   avoid false alarms from authoritative servers that always and only
223	   return truncated responses when the message exceeds some size.  This
224	   is the use case that Google RDNS is considering.  Google RDNS would
225	   like to use such indications to flag problematic name servers for
226	   which RDNS should restrict the maximum EDNS size to a value lower
227	   than the default of 4096 that is currently used.

229	   A simple way to provide that indicator is to define a bit in the Z
230	   field of the EDNS0 OPT header in the response.  This has the virtue
231	   of simplicity, and only a minimal risk of breaking existing
232	   implementations.  This bit is referred to as the "ATR Response" (AT)
233	   bit.  In the context of the EDNS0 OPT meta-RR, the AT bit directly
234	   follows the DO bit: it is the second bit of the third and fourth
235	   bytes of the "extended RCODE and flags" portion (Section 6.1.3 of
236	   [RFC6891]) of the EDNS0 OPT meta-RR, structured as follows:

238	                    +0 (MSB)                +1 (LSB)
239	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
240	         0: |   EXTENDED-RCODE      |       VERSION         |
241	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
242	         2: |DO|AT|                 Z                       |
243	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

245	       Figure 2: Wire format of extended RCODE and flags with AT bit

247	   The logic of the AT bit is simple: setting the AT bit to one in a
248	   response indicates to the resolver that the response is an ATR
249	   response.  The AT bit cleared (set to zero) indicates the response is
250	   an ordinary response.  Note that AT bit and TC bit SHOULD be set and
251	   appear in the response as a pair.  The response will be ignored if
252	   only the AT bit is set.
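
   As an illustration of the wire format in Figure 2, the following
   sketch packs and inspects the 32-bit OPT TTL field.  The AT bit value
   0x4000 follows the position requested in this memo and is not an
   allocated flag; the function names are hypothetical.

```python
DO_BIT = 0x8000  # DNSSEC OK bit, the leftmost of the 16 flag bits
AT_BIT = 0x4000  # second flag bit, as requested by this memo (unassigned)


def pack_opt_ttl(extended_rcode: int, version: int, flags: int) -> int:
    """Build the OPT TTL: 8-bit extended RCODE, 8-bit version, 16 flags."""
    return (((extended_rcode & 0xFF) << 24)
            | ((version & 0xFF) << 16)
            | (flags & 0xFFFF))


def opt_flags(opt_ttl: int) -> int:
    """Extract the 16 flag bits (third and fourth bytes) from the TTL."""
    return opt_ttl & 0xFFFF


def classify_response(tc_bit: bool, opt_ttl: int) -> str:
    """Apply the AT/TC pairing rule described in the text above."""
    at_bit = bool(opt_flags(opt_ttl) & AT_BIT)
    if at_bit and tc_bit:
        return "atr"       # AT and TC set as a pair
    if at_bit:
        return "ignore"    # AT without TC: the response is ignored
    return "ordinary"      # AT cleared: an ordinary response
```

   A resolver that logs "atr" results for which no reassembled UDP
   response arrived would obtain the measurement signal described in
   this section.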

254	   The ATR indication defined in this memo is for measurement and
255	   logging purposes.  But resolver operators may use it in other ways,
256	   because it signals to the resolver that the large response is
257	   fragmented and possibly dropped on the path.  The resolver could act
258	   more proactively if it is able to recognize that bit.  More is
259	   discussed in Appendix B.

261	4.  Operational considerations

263	   The previous sections specify only the behavior of the ATR server
264	   and the AT bit.  That leaves room for operational choices, such as
265	   the values of the ATR timer and ATR payload size, and policies on
266	   when ATR is triggered to avoid side effects.

268	4.1.  ATR timer

270	   As introduced in Section 2, the ATR timer is a way to avoid the
271	   impact of network reordering (RO).  The value of the timer is
272	   critical: if the delay is too short, the ATR response may arrive
273	   before the fragmented response (the first piece), and the resolver
274	   will fall back to TCP, bearing a cost that should have been avoided.
275	   If the delay is too long, the client may time out and retry, which
276	   negates the incremental benefit of ATR.  Generally speaking, the
277	   delay of the timer should be "long enough, but not too long".
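
   The "long enough, but not too long" rule can be expressed as a simple
   check.  This is only a sketch: the function name and its two bounds
   (an observed reordering gap on the path and the resolver timeout) are
   assumptions for illustration, and an operator would have to measure
   both.

```python
def check_atr_timer(delay_ms: float,
                    reorder_gap_ms: float,
                    resolver_timeout_ms: float) -> str:
    """Judge an ATR timer delay against two operator-measured bounds."""
    # Too short: the ATR response may overtake the fragmented response.
    if delay_ms <= reorder_gap_ms:
        return "too short"
    # Too long: the client may time out and retry before ATR arrives.
    if delay_ms >= resolver_timeout_ms:
        return "too long"
    return "ok"
```

   For instance, with a 20 ms reordering gap and an 800 ms resolver
   timeout (both hypothetical values), a 10 ms delay is too short and a
   50 ms delay passes the check.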

279	   To the best of the author's knowledge, the nature of RO can be
280	   characterized as follows, which may help ATR users understand RO and
281	   how to operate ATR appropriately in its presence.

283	   o  RO is mainly caused by parallelism in Internet components and
284	      links rather than by network anomalies [Bennett].  It was
285	      observed that RO is highly related to the traffic load of
286	      Internet components.  So RO will persist as long as traffic load
287	      continues to increase and parallelism is used to enhance network
288	      throughput.

290	   o  The probability of RO varies widely depending on the test
291	      samples.  Some work showed an RO probability below 2% [Paxson]
292	      [Tinta], while other work reported above 90% [Bennett].  But it
293	      is agreed that RO is site-dependent and path-dependent.  It is
294	      observed that when RO happens, it is mostly exhibited
295	      consistently in a small percentage of the paths.  It is also
296	      observed that smaller packets sent at higher rates were more
297	      prone to RO because the inter-packet sending interval was small.

299	   o  It was reported that the inter-arrival time of RO varies from a
300	      few milliseconds to multiple tens of milliseconds [Tinta].  The
301	      larger the packet, the larger the inter-arrival time, since
302	      larger packets take longer to transmit.

304	   We can reasonably infer, first, that RO should be taken into account
305	   because it persists due to intermediate Internet components and
306	   cannot be avoided end to end.  Second, the mixture of large and
307	   small packets in the ATR case will increase the inter-arrival time
308	   of RO as well as its probability.  The good news is that RO is
309	   highly site-specific, path-specific, and persistent, which means the
310	   ATR operator can identify a few sites and paths and either set up a
311	   tunable timer setting for them or just put them on a blacklist and
312	   not send them ATR responses.

314	   Based on the above analysis, it is hard to provide a perfect ATR
315	   timer value for all ATR users due to the diversity of networks.  It
316	   seems acceptable to set the timer in a range from ten to hundreds of
317	   milliseconds, just below the timeout setting of a typical resolver.
318	   It is suggested that each operator decide according to the RTT
319	   statistics of its users.  Some measurements [Brownlee][Liang] show
320	   that the mean response time is below 50 ms for sites with many
321	   anycast instances, such as the L-root, .com, and .net name servers.
322	   For those sites, a delay of less than 50 ms is appropriate.

324	4.2.  ATR payload size

326	   Regarding the operational choice of ATR payload size, there is good
327	   input from an APNIC study [scoring-dns-root] on how authoritative
328	   servers should react to large DNS payloads.  The difference is that
329	   ATR focuses on the second response, sent after the ordinary one.

331	   For IPv4 DNS servers, the study suggests not truncating or
332	   fragmenting IPv4 UDP responses with a payload up to 1472 octets,
333	   which is the Ethernet MTU (1500) minus the sum of the IPv4 header
334	   (20) and UDP header (8).  The reason is to avoid gratuitously
335	   fragmenting outbound packets and TCP fallback at the source.

337	   In the case of ATR, the first, ordinary response is emitted without
338	   knowing whether it will be fragmented on the path.  If a large value
339	   of up to 1472 octets is set, payloads between 512 octets and that
340	   value will probably get fragmented by aggressive firewalls, which
341	   loses the benefit of ATR.  If the ATR payload size is set to exactly
342	   512 octets, in most cases the ATR response and the single
343	   unfragmented packet race each other, at the risk of RO.

345	   Given that the IPv4 fragmentation issue is less serious than in
346	   IPv6, this memo suggests setting the ATR payload size to 1472
347	   octets, which means ATR only applies to DNS responses whose packets
348	   would exceed 1500 octets in IPv4.

350	   For IPv6 DNS servers, similar to IPv4, the APNIC study suggests not
351	   truncating IPv6 UDP packets with a payload up to 1452 octets, which
352	   is the Ethernet MTU (1500) minus the sum of the IPv6 header (40) and
353	   UDP header (8).  1452 octets was chosen to avoid TCP fallback, given
354	   that the TCP MSS on most root servers was not set properly at that
355	   time.

357	   In the case of ATR, considering the second, truncated response, a
358	   smaller size of 1232 octets, which is the minimum IPv6 MTU (1280)
359	   minus the sum of the IPv6 header (40) and UDP header (8), should be
360	   chosen as the ATR payload size to trigger the necessary TCP
361	   fallback.  As a complementary requirement with ATR, the TCP MSS
362	   should be set to 1220 octets to avoid Packet Too Big ICMP messages,
363	   as suggested in the APNIC study.

365	   In short, it is recommended that in IPv4 ATR payload size SHOULD be
366	   1472 octets, and in IPv6 the value SHOULD be 1232 octets.
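
   The sizes quoted in this section follow from simple header
   arithmetic, shown here as a worked sketch.  The constant and function
   names are hypothetical; the 20-octet TCP header assumes no TCP
   options.

```python
ETHERNET_MTU = 1500   # octets
IPV6_MIN_MTU = 1280   # octets
IPV4_HEADER = 20      # octets, without IP options
IPV6_HEADER = 40      # octets, without extension headers
UDP_HEADER = 8        # octets
TCP_HEADER = 20       # octets, without TCP options


def max_udp_payload(mtu: int, ip_header: int) -> int:
    """Largest UDP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - UDP_HEADER


ipv4_atr_payload = max_udp_payload(ETHERNET_MTU, IPV4_HEADER)   # 1472
ipv6_no_truncate = max_udp_payload(ETHERNET_MTU, IPV6_HEADER)   # 1452
ipv6_atr_payload = max_udp_payload(IPV6_MIN_MTU, IPV6_HEADER)   # 1232
ipv6_tcp_mss = IPV6_MIN_MTU - IPV6_HEADER - TCP_HEADER          # 1220
```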

368	4.3.  Less aggressiveness of ATR

370	   There is a concern that ATR sends TC=1 responses too aggressively,
371	   especially at the beginning of adoption.  One idea to mitigate this
372	   aggressiveness is for ATR to send TC=1 responses with a low
373	   probability, such as 10%.

375	   Another mitigation is to send ATR responses selectively.  It is
376	   observed that RO and IPv6 fragmentation issues are path-specific and
377	   persistent due to Internet components and middleboxes.  So it is
378	   reasonable to keep an ATR "whitelist" by counting retries and
379	   recording the destination IP addresses of large responses that
380	   cause many retries.  ATR then acts only on queries from the IP
381	   addresses in the whitelist.
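
   Both mitigations can be sketched in a few lines.  This is an
   illustration under stated assumptions rather than a specified
   algorithm: the names sample_atr and RetryWhitelist and the retry
   threshold of 3 are hypothetical, and how a server detects a retry is
   left open.

```python
import random
from collections import Counter
from typing import Optional


def sample_atr(probability: float,
               rng: Optional[random.Random] = None) -> bool:
    """Decide at random whether to send the ATR response this time."""
    source = rng if rng is not None else random
    # random() returns a float in [0.0, 1.0).
    return source.random() < probability


class RetryWhitelist:
    """Track clients whose large responses caused many retries."""

    def __init__(self, retry_threshold: int = 3) -> None:
        self.retry_threshold = retry_threshold
        self._retries: Counter = Counter()

    def record_retry(self, client_ip: str) -> None:
        """Count one observed retry from this client address."""
        self._retries[client_ip] += 1

    def __contains__(self, client_ip: str) -> bool:
        # Listed once the retry count reaches the threshold.
        return self._retries[client_ip] >= self.retry_threshold
```

   A server could then send the ATR response only when
   sample_atr(0.10) returns True, or only when the client address is in
   the whitelist.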

383	5.  Security Considerations

385	   There may be concerns about DDoS attacks because ATR introduces
386	   multiple responses from the authoritative server.  DNS cookies
387	   [RFC7873] and RRL on authoritative servers are possible solutions.

389	6.  IANA considerations

391	   EDNS(0) [RFC6891] defines 16 bits as extended flags in the OPT
392	   record; these bits are encoded into the TTL field of the OPT record.
393	   IETF Standards Action is required for assignments of new EDNS(0)
394	   flags.

396	   This document reserves one of these bits as the AT bit.  It is
397	   requested that the second bit from the left be allocated.  Thus the
398	   use of the OPT record TTL field would look like:

400	                    +0 (MSB)                +1 (LSB)
401	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
402	         0: |   EXTENDED-RCODE      |       VERSION         |
403	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
404	         2: |DO|AT|                 Z                       |
405	            +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

407	         Figure 3: Use of the OPT record TTL field for the AT bit

409	7.  Acknowledgments

411	   Many thanks to the reviewers for their comments.  Geoff Huston and
412	   Joao Damas tested the question "How well does ATR actually work?".
413	   Alexander Dupuy proposed the idea of distinguishing ATR responses
414	   from normal ones.  Akira Kato contributed ideas on operational
415	   considerations.

417	8.  References

419	   [ATR-Github]
420	              "XML source file and test script of DNS ATR", September
421	              2017, <https://github.com/songlinjian/DNS_ATR>.

423	   [Bennett]  "Packet Reordering is Not Pathological Network Behavior",
424	              December 1999, <http://citeseerx.ist.psu.edu/viewdoc/
425	              download?doi=10.1.1.461.7629&rep=rep1&type=pdf>.

427	   [Brownlee]
428	              "Response time distributions for global name servers",
429	              2002,
430	              <http://www.caida.org/publications/papers/2002/nsrtd/
431	              nsrtd.pdf>.

433	   [How-ATR-Work]
434	              APNIC, "How well does ATR actually work?", April 2018,
435	              <https://blog.apnic.net/2018/04/16/
436	              how-well-does-atr-actually-work/>.

438	   [I-D.taylor-v6ops-fragdrop]
439	              Jaeggli, J., Colitti, L., Kumari, W., Vyncke, E., Kaeo,
440	              M., and T. Taylor, "Why Operators Filter Fragments and
441	              What It Implies", draft-taylor-v6ops-fragdrop-02 (work in
442	              progress), December 2013.

444	   [IPv6-frag-DNS]
445	              "Dealing with IPv6 fragmentation in the DNS", August 2017,
446	              <https://blog.apnic.net/2017/08/22/
447	              dealing-ipv6-fragmentation-dns>.

449	   [Liang]    Tsinghua University, "Measuring Query Latency of Top Level
450	              DNS Servers", February 2013,
451	              <https://netsec.ccert.edu.cn/duanhx/files/2013/02/
452	              latency.pdf>.

454	   [Not-speak-TCP]
455	              "A Question of DNS Protocols", August 2013,
456	              <https://labs.ripe.net/Members/gih/
457	              a-question-of-dns-protocols>.

459	   [Paxson]   "End-to-End Internet Packet Dynamics", August 1999,
460	              <https://cseweb.ucsd.edu/classes/fa01/cse222/papers/
461	              paxson-e2e-packets-sigcomm97.pdf>.

463	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
464	              specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
465	              November 1987, <https://www.rfc-editor.org/info/rfc1035>.

467	   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension Mechanisms
468	              for DNS (EDNS(0))", STD 75, RFC 6891,
469	              DOI 10.17487/RFC6891, April 2013,
470	              <https://www.rfc-editor.org/info/rfc6891>.

472	   [RFC7872]  Gont, F., Linkova, J., Chown, T., and W. Liu,
473	              "Observations on the Dropping of Packets with IPv6
474	              Extension Headers in the Real World", RFC 7872,
475	              DOI 10.17487/RFC7872, June 2016,
476	              <https://www.rfc-editor.org/info/rfc7872>.

478	   [RFC7873]  Eastlake 3rd, D. and M. Andrews, "Domain Name System (DNS)
479	              Cookies", RFC 7873, DOI 10.17487/RFC7873, May 2016,
480	              <https://www.rfc-editor.org/info/rfc7873>.

482	   [scoring-dns-root]
483	              APNIC, "Scoring the DNS Root Server System", November
484	              2016, <https://blog.apnic.net/2016/11/15/
485	              scoring-dns-root-server-system/>.

487	   [Tinta]    "Characterizing End-to-End Packet Reordering with UDP
488	              Traffic", August 2009, <https://static.googleusercontent.c
489	              om/media/research.google.com/en//pubs/archive/35247.pdf>.

491	Appendix A.  How well does ATR actually work?

493	   It is worth mentioning the APNIC report [How-ATR-Work] on "How well
494	   does ATR actually work?" by Geoff Huston and Joao Damas, done after
495	   the 00 version of the ATR draft.  It was first reported at the IEPG
496	   meeting before IETF 101 and later posted on the APNIC Blog.

498	   The report says the test was performed over 55 million endpoints,
499	   using an online ad distribution network to deliver the test script
500	   across the Internet.  The result is positive: ATR works!  From the
501	   end users' perspective, in some 9% of IPv4 cases the use of ATR by
502	   the server will improve the speed of resolution of a fragmented UDP
503	   response by signaling to the client an immediate switch to TCP to
504	   perform a re-query.  In IPv6, ATR would improve the resolution
505	   times in 15% of cases.

507	   The report also analyzed the pros and cons of ATR.  On one hand, it
508	   says that ATR certainly looks attractive if the objective is to
509	   improve the speed of DNS resolution when passing large DNS
510	   responses, and ATR is incrementally deployable, leaving the
511	   decision to each server operator.  On the other hand, ATR also has
512	   some negative factors.  One issue is that it adds another DNS DDoS
513	   attack vector due to the additional packet sent by ATR (author's
514	   note: a very small addition, actually).  Another issue is the risk
515	   of RO from the choice of delay timer, discussed fully in Section 4.1.

517	   In conclusion, the report says that "ATR does not completely fix the
518	   large response issue.  If a resolver cannot receive fragmented UDP
519	   responses and cannot use TCP to perform DNS queries, then ATR is not
520	   going to help.  But where there are issues with IP fragment
521	   filtering, ATR can make the inevitable shift of the query to TCP a
522	   lot faster than it is today.  But it does so at a cost of additional
523	   packets and additional DNS functionality".  "If a faster DNS service
524	   is your highest priority, then ATR is worth considering", the report
525	   says at its end.

527	   This test and report definitely made ATR more visible, attracting
528	   attention in the community.  But there are still some unknowns
529	   about "How well does ATR actually work".  First, the test only
530	   counts "success" cases ("failure" otherwise), where "success" in
531	   the large UDP case can also be achieved by normal retries.  The
532	   latency of those retries, which ATR may reduce, is not taken into
533	   account.

535	   Second, more analysis could be done in the future to compare ATR
536	   with TCP.  In terms of user failure rates, ATR and TCP perform
537	   similarly (the same for IPv4, and only a 2% difference in IPv6).
538	   But some resolvers can receive the fragments sooner in the ATR
539	   case, whereas they must fall back to TCP in the TCP case.  If the
540	   author understands correctly, these two unknowns lead to an
541	   underestimate of the benefit of ATR in terms of response speed.

543	   The third unknown concerns the ATR timer and the impact of RO.  In
544	   the APNIC test, 10 ms was adopted as the ATR timer delay, following
545	   the 00 version of this draft.  A different delay may not change the
546	   key result on the gains of ATR (9% for IPv4 and 15% for IPv6).  But
547	   the cost of RO is not measured.  In the majority of cases ATR is not
548	   needed, say 87% in IPv4 and 79% in IPv6.  So the test overestimates
549	   ATR in this regard if the RO cost is not taken into account.

551	Appendix B.  Considerations on Resolver awareness of ATR

553	   ATR as proposed in this memo is a server-side function that requires
554	   no change in resolvers, so resolvers are not required to recognize
555	   or use the AT bit.  But it may be helpful in some special cases for
556	   a resolver to be able to recognize or use the AT bit.

558	   One case is that, on receiving an ATR response, an ATR-aware
559	   resolver can adopt a "happy eyeballs" strategy by opening a separate
560	   transaction that sends the query via TCP, instead of falling back to
561	   TCP and closing the original UDP transaction.  Listening for the
562	   answer on both TCP and UDP will enhance availability and reduce
563	   latency.  It will add more tolerance to network reordering as well.
564	   However, the balance of the resolver's resources should be taken
565	   into account.  Lower priority should be given to that function when
566	   the resolver is "busy".
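
   The strategy of keeping the UDP transaction open while also querying
   over TCP can be sketched with two concurrent tasks.  This is a
   simulation only: fake_transport stands in for real UDP and TCP
   exchanges, and all names and delays are hypothetical.

```python
import asyncio


async def first_answer(udp_answer, tcp_answer):
    """Return the result of whichever transport answers first."""
    tasks = [asyncio.ensure_future(udp_answer),
             asyncio.ensure_future(tcp_answer)]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)
    # Stop waiting on the slower transport once an answer is in hand.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


async def fake_transport(label: str, delay_s: float) -> str:
    """Stand-in for a real exchange: answer after a fixed delay."""
    await asyncio.sleep(delay_s)
    return label


async def demo() -> str:
    # Here the reassembled UDP response is slow and TCP answers first.
    return await first_answer(fake_transport("udp", 0.2),
                              fake_transport("tcp", 0.01))
```

   Running asyncio.run(demo()) returns the label of the faster
   transport; a busy resolver could skip the extra TCP task entirely.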

568	   Another case is that an ATR-aware resolver can indicate its support
569	   for ATR to prudent servers that do not send ATR responses to every
570	   query and resolver.  If that requirement is valid, the resolver
571	   could reuse the AT bit defined in this memo as an indication
572	   requesting ATR responses from the server.

574	   However, the two cases are currently outside the scope of this
575	   server-side ATR specification.  They need further discussion.

577	Appendix C.  Revision history of this document

579	C.1.  draft-song-atr-large-resp-01

581	   After receiving reviews and comments, the changes in the 01 version
582	   are as follows:

584	   o  Rewrite introduction and add another goal of ATR as a measuring
585	      tool;

587	   o  Add Section 3, Indicating an ATR response.  A bit in the EDNS0 OPT
588	      header is defined as an indicator of an ATR response.  The flag
589	      bit is called the "ATR Response" (AT) bit;

591	   o  Add Section 4, Operational considerations, which discusses the ATR
592	      timer, ATR payload size, and less aggressiveness of ATR;

594	   o  Add IANA considerations to register the AT bit;

596	   o  Add Section 7, Acknowledgments;

598	   o  Append a list of references regarding Network reordering, and
599	      APNIC's study on IPv6 and DNS;

601	   o  Add Appendix A, an introduction to APNIC's testing work with the
602	      author's comments;

604	   o  Add Appendix B, Considerations on Resolver awareness of ATR;

606	   o  Change to category="std".  RFC6891 says that IETF Standards
607	      Action is required for assignments of new EDNS(0) flags.  So the
608	      draft should be categorized as standards track if registering the
609	      AT bit is desired in this document.

611	   Change history is also available in the public GitHub repository
612	   where this document is maintained: <https://github.com/songlinjian/
613	   DNS_ATR>.

615	Author's Address

617	   Linjian Song
618	   Beijing Internet Institute

620	   Email: songlinjian@gmail.com
621	   URI:   http://www.biigroup.com/