idnits 2.17.1

draft-przygienda-idr-compressed-updates-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (Feb 18, 2019) is 1887 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-05) exists of
     draft-idr-bgp-route-refresh-options-03

  == Outdated reference: A later version (-36) exists of
     draft-ietf-idr-bgp-extended-messages-21

  -- Possible downref: Non-RFC (?) normative reference: ref. 'QUANT'

  ** Downref: Normative reference to an Informational RFC: RFC 1950

  ** Downref: Normative reference to an Informational RFC: RFC 1951

  ** Obsolete normative reference: RFC 2283 (Obsoleted by RFC 2858)

     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                      A. Przygienda
Internet-Draft                                                   Juniper
Intended status: Standards Track                              A. Lingala
Expires: August 22, 2019                                            AT&T
                                                                 C. Mate
                                                          NIIF/Hungarnet
                                                             J. Tantsura
                                                          Nuage Networks
                                                            Feb 18, 2019

                     Compressed BGP Update Message
               draft-przygienda-idr-compressed-updates-06

Abstract

   This document specifies an optional compressed BGP update message
   format that allows an address-family-independent reduction in BGP
   control traffic volume.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 22, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  IANA Considerations
   4.  Procedures
     4.1.  Decompression Capability Negotiation
     4.2.  Compressed BGP Update Messages
     4.3.  Compressor Overflow
     4.4.  Compressor Restarts
     4.5.  Error Handling
   5.  Special Considerations
     5.1.  Impact on Network Sniffing Tools
   6.  Packet Formats
     6.1.  Decompressor Capability
     6.2.  Compressed Update Messages
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Authors' Addresses

1.  Introduction

   BGP as a protocol has evolved over the years to carry larger and
   larger volumes of information, and this trend seems to continue
   unabated.  While much of the growth can be attributed to the advent
   of new address families spurred by [RFC2283], a steady increase in
   the number and size of attributes amplifies this tendency.
   Recently, even the same NLRI may be advertised multiple times by
   means of the ADD-PATH [RFC7911] extensions.  All those developments
   drive up the volume of information BGP needs to exchange to
   synchronize the RIBs of the peers.

   Although the BGP update format provides a simple "semantic"
   compression mechanism that avoids the repetition of attributes when
   multiple NLRIs share them, in practical terms the packing of updates
   has proven a difficult challenge.  The packing attempts are further
   undermined by the plethora of "per-NLRI tagging" attributes such as
   extended communities [RFC4360].

   One could of course dismiss the growing raw volume of the data
   necessary to exchange BGP information between two peers as a mere
   trifle given the still rising link bandwidths, but we are facing
   other sustained trends that make a reduction of the data volume
   exchanged by BGP highly desirable:

   o  Link delays will remain constant until radically new transmission
      mechanisms become commonplace [QUANT].  Barring those
      developments, and given the prevailing constant Ethernet MTU, the
      increasing volume of BGP traffic will cause more and more IP
      packets to be sent, with the BGP synchronization speed being
      limited by the expanding bandwidth-delay product.

   o  The data volume, which for one peer may be reasonable, becomes
      less so when many peers need to be refreshed due to [RFC4724]
      and [RFC7313] interactions.  Use of those techniques is expected
      to increase due to increasing demands on BGP reliability and
      novel variants of state synchronization between peers.

   o  BGP message length is limited to 4K, which in itself is a
      recognized problem.  Extensions to the message length
      [ID.draft-ietf-idr-bgp-extended-messages-21] are being worked on,
      but they put their own requirements and memory pressure on
      implementations and ultimately will not help with attributes
      exceeding the 4K size limit in mixed environments.

   o  Virtualization techniques introduce an increasing number of
      context switches that an IP packet has to cross between two BGP
      instances.  Coupled with difficulties in estimating a reasonable
      TCP MSS in virtualized environments and the number of IP packets
      TCP generates, more and more context switching overhead per
      update is incurred before good-put BGP processing can happen.

   Obviously, unless we change the BGP encoding drastically, e.g. by
   introducing more context to allow for semantic compression, we
   cannot expect a reduction in data volume without paying some kind of
   price.  Ideas such as changing the BGP format to decouple attribute
   value updates from NLRI updates could be a viable course of action.
   The challenges of such a scheme are significant, and since such
   "compression" would extend the semantics and formats of the updates
   as we have them today, former and future drafts may interact with
   such an approach in ways not discernible today.  Last but not least,
   attempting to introduce a smarter, context-rich encoding is likely
   to cause dependency problems and a slow-down in BGP encoding
   procedures.

   Fortunately, some observations can be made and emerging trends
   exploited to attempt a reduction in BGP data volumes without the
   mentioned disadvantages:

   o  BGP updates are very repetitive.  The smallest change in
      attribute values causes extensive repetition of all attributes,
      and any difference prevents packing of NLRIs into the same
      update.  On top of that, each BGP update message still carries a
      marker that largely lost its practical value some time ago.  One
      could generalize those facts by saying that BGP updates tend to
      exhibit very low entropy.

   o  CPU cycles available to run control protocols are getting more
      and more abundant, as is, to a certain extent, memory.  They tend
      to no longer be available in the easily harvested "single core
      with higher frequency" form but as multiple cores that introduce
      the usual pitfalls of parallelization.  In short, getting a lot
      of independent work done is getting cheaper and cheaper, while
      speeding up a single strand of execution that depends on previous
      results is less so.  This nevertheless opens the possibility of
      applying different filters to BGP streams, possibly even
      executing in parallel threads.  One possible filter can compress
      the data in a manner completely transparent to the rest of an
      existing implementation.

   Hence, we suggest in this document the removal of redundancy in the
   BGP update stream via Huffman codes, which can be applied as a
   filter to a BGP update stream concurrently with the rest of the BGP
   processing and per peer.  Accordingly, this document describes an
   optional scheme to compress BGP update traffic with a deflate
   variant of Huffman encoding [RFC1950], [RFC1951].

   In the broadest terms, such a scheme will be beneficial if a BGP
   implementation finds itself in an I/O-constrained scenario while
   having spare CPU cycles available.  Compression will ease the
   pressure on TCP processing and synchronization as well as reduce the
   raw number of IP packets exchanged between peers.

2.  Terminology

3.  IANA Considerations

   This document will request IANA to assign a new BGP message type
   value and a new optional capability value in the BGP Capability
   Codes registry.  The suggested value for the Compressed Updates
   message type in this process will be 7, and for the Capability Code
   the suggested value will be 76.

   IANA will be requested as well to assign a new subcode in the "BGP
   Cease NOTIFICATION message subcodes" registry.  The suggested name
   for the code point will be "Decompression Error".  The suggested
   value will be 10.

4.  Procedures

4.1.  Decompression Capability Negotiation

   The capability to *decompress* a new, optional message type carrying
   compressed updates is advertised via the usual BGP optional
   capability negotiation technique.
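   As a rough illustration only, the capability layout drawn in
   Section 6.1 could be built and checked along the following lines.
   The function names are hypothetical, and two points are assumptions
   rather than statements of this document: that the capability length
   value is 2 (the CM/CINFO octet plus the reserved octet), and that CM
   sits in the high-order nibble as the diagram suggests.  The limits
   CM = 8 and CINFO <= 7 come from [RFC1950]; code 76 is only the
   suggested value.

```python
import struct

CAP_CODE = 76    # suggested value from Section 3, not yet assigned
CM_DEFLATE = 8   # RFC 1950: CM = 8 denotes the deflate method
MAX_CINFO = 7    # RFC 1950: CINFO values above 7 are not allowed


def encode_capability(cinfo):
    """Build Code | Length | CM+CINFO | Reserved as four octets."""
    if not 0 <= cinfo <= MAX_CINFO:
        raise ValueError("CINFO out of range")
    # Assumption: length value 2, CM in the high nibble, reserved = 0.
    return struct.pack("!BBBB", CAP_CODE, 2, (CM_DEFLATE << 4) | cinfo, 0)


def decode_capability(raw):
    """Return the peer's CINFO, or None if the capability is ignored."""
    if len(raw) != 4:
        return None
    code, length, cm_cinfo, _reserved = struct.unpack("!BBBB", raw)
    if code != CAP_CODE or length != 2:
        return None
    cm, cinfo = cm_cinfo >> 4, cm_cinfo & 0x0F
    if cm != CM_DEFLATE or cinfo > MAX_CINFO:
        return None  # invalid values lead to the capability being ignored
    return cinfo


def window_size(cinfo):
    """RFC 1950: CINFO is log2 of the LZ77 window size, minus eight."""
    return 1 << (cinfo + 8)
```

   A sender following this sketch would compress only towards peers
   for which decode_capability() returned a value, and would size its
   window from that value.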

   A peer MUST NOT send any compressed updates towards peers that did
   not advertise the capability to decompress.  A peer MAY send
   compressed updates towards peers that advertised such capability.

4.2.  Compressed BGP Update Messages

   A new BGP message is introduced under the name of "Compressed BGP
   Update".  It contains an arbitrary number of the following message
   types

   o  normal BGP updates

   o  Enhanced Route Refresh [RFC7313] subtype 1 and 2 (BoRR and EoRR)

   o  Route Refresh with Options
      [ID.draft-idr-bgp-route-refresh-options-03] subtype 4 and 5 (BoRR
      and EoRR with options)

   following each other and compressed while following the rules below:

   1.  Compressed and uncompressed BGP updates MAY follow each other in
       arbitrary order, with the exception of the compressor overflow
       scenario per Section 4.3.

   2.  After decompression of the stream of interleaved compressed and
       uncompressed BGP update messages, the resulting uncompressed
       sequence does not have to be identical to the sequence in a
       stream that would be generated without compression.  However,
       the processing of the uncompressed sequence MUST ensure that the
       ultimate semantics of the message stream are the same to the
       peer as in a correct uncompressed case.

   3.  The sender is explicitly permitted to generate outgoing updates
       in a manner that reorders them as compared to the uncompressed
       stream, but if it does so it MUST ensure that the resulting
       stream of updates retains the original semantics as if
       compression was not in use.

   4.  The updates and refreshes contained within the compressed BGP
       update message MUST be stripped of the initial marker while
       preserving the BGP update or route refresh message header.  The
       length field in the BGP header retains its original value.

   5.  Each compressed BGP Update MUST carry a sequence of non-
       fragmented original messages, i.e. it cannot contain just a part
       of an original BGP update.  Section 4.3 presents the only
       exception to this rule.

   6.  Each compressed BGP Update MUST be sent as a block, i.e. the
       decompression MUST be able to yield the decompressed results of
       the update without waiting for further compressed updates.  This
       is different from the normally used stream compression mode.
       Section 4.3 presents the only exception to this rule.

   7.  The compressed update message MAY exceed the maximum message
       size, but in such a case compressor overflow per Section 4.3
       MUST be invoked.

4.3.  Compressor Overflow

   To achieve optimal compression rates it is desirable to provide the
   compressor with enough data so the resulting compressed update is as
   close to the maximum BGP update size as possible.  Unfortunately, a
   Huffman coder with an adaptive dictionary compresses at a constantly
   varying ratio, which can lead to an overflow unless it is used very
   conservatively.  A special provision, optionally to be used at the
   sender's discretion, allows for such overruns and simplifies the
   handling of overflow events.

   In case the compressed block size exceeds the maximum BGP update
   size, the compressing peer MUST set the corresponding bit (the O-bit
   of Section 6.2) in the compressed update generated and MUST follow
   it with one and only one compressed update that has the overflow and
   compressor restart bits cleared and carries the remainder of the
   block.  No other BGP update messages are allowed in the TCP stream
   between the compressed update of a certain compressor and its
   overflow fragment.  In case of any deviations, the error procedures
   of Section 4.5 MUST be followed.

   The receiving peer MUST concatenate the first compressed update and
   the following overflow update into a single compressed block and
   apply decompression to it.

   The first update MAY be smaller than the maximum BGP update size.

4.4.  Compressor Restarts

   In certain scenarios it is beneficial for the compressing peer to be
   able to restart any of the compressors at any point in the ongoing
   BGP session.  To indicate such an occurrence, each compressed update
   can carry a flag signaling to the decompressing peer that it MUST
   restart the given de-compressor before attempting to handle the
   update.

4.5.  Error Handling

   If the decompression fails for any reason, the failure MUST cause an
   immediate CEASE notification with a newly introduced subcode of
   "Decompression Error" (as documented in the IANA BGP Error Codes
   registry).  The peer which experienced the failure MAY initiate the
   connection again, but it SHOULD NOT advertise the decompressor
   capability until an administrative reset of the session or re-
   configuration of the peer.  This will achieve self-stabilization of
   the feature in case of implementation problems.

   The compressing peer MAY send such a CEASE notification as well and
   close the session.  Given such a notification, it is at the
   discretion of the decompressing peer to omit the decompression
   capability on the next OPEN.

5.  Special Considerations

5.1.  Impact on Network Sniffing Tools

   Network sniffing tools today have the capability to monitor an
   ongoing BGP session and try to reconstruct the state of the peers
   from the updates parsed.  Obviously, with compression enabled, such
   a monitor cannot follow the compressed updates unless the session is
   monitored from the first compressed update on.

   Several possibilities to deal with the problem exist, the simplest
   one being the restart of the compressors on a periodic basis to
   allow the monitoring tool to 'sync up'.  It goes without saying that
   this will be detrimental to the compression ratio achieved.
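   The cost of such restarts, and the per-update block behaviour of
   Section 4.2, can be illustrated with a minimal Python sketch built
   on the standard zlib module.  It assumes that "sent as a block"
   maps onto zlib's Z_SYNC_FLUSH and that CINFO = 7 was negotiated
   (zlib wbits = 15); neither mapping is stated by this document.  The
   class names and the stand-in payload are hypothetical.

```python
import zlib

WBITS = 15  # assumption: CINFO = 7, i.e. a 32 KiB window


class PeerCompressor:
    """One per-session compressor (hypothetical helper)."""

    def __init__(self):
        self._z = zlib.compressobj(wbits=WBITS)

    def restart(self):
        # Section 4.4: drop all history; the sender would set the R bit.
        self._z = zlib.compressobj(wbits=WBITS)

    def block(self, messages):
        # Z_SYNC_FLUSH pushes out all pending output so the receiver can
        # decode this block without waiting for later compressed updates.
        data = b"".join(messages)
        return self._z.compress(data) + self._z.flush(zlib.Z_SYNC_FLUSH)


class PeerDecompressor:
    """The matching per-session decompressor (hypothetical helper)."""

    def __init__(self):
        self._z = zlib.decompressobj(wbits=WBITS)

    def restart(self):
        self._z = zlib.decompressobj(wbits=WBITS)

    def block(self, payload):
        return self._z.decompress(payload)


# Stand-in for a marker-less BGP UPDATE; not a real message.
update = bytes((i * 37 + 11) % 251 for i in range(300))

tx, rx = PeerCompressor(), PeerDecompressor()
first = tx.block([update])
second = tx.block([update])   # can refer back to the first block
assert rx.block(first) == update
assert rx.block(second) == update

tx.restart()
rx.restart()                  # receiver acts on the R bit
third = tx.block([update])    # history is gone again
assert rx.block(third) == update

print(len(first), len(second), len(third))
```

   In this sketch the second block is far smaller than the first
   because it reuses the compressor's history, while the block sent
   after the restart is back to roughly the size of the first one.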

   Another possibility would have been to periodically send the Huffman
   dictionary over the wire, but this complexity has been left out so
   as not to overburden this specification.  Moreover, at the current
   time, such a capability is not part of any standard Huffman
   implementation that could be easily referred to.

6.  Packet Formats

6.1.  Decompressor Capability

   The Decompressor Capability follows the normal procedures of
   [RFC5492].  In its generic form the option can support different
   compressors in the future.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     | type| de/compressor parameters|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This document specifies only DEFLATE Huffman support per [RFC1950].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     |  CM   | CINFO |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Code:  To be obtained by early allocation, suggested value in this
      process will be 76.

   Length:  1 octet.

   CM:  4 bits of CM indicating the DEFLATE compressed format value as
      specified in [RFC1950].

   CINFO:  4 bits of CINFO as specified in [RFC1950].  Invalid values
      MUST lead to the capability being ignored.  The compressing peer
      MUST use this value for the parametrization of its algorithm.

6.2.  Compressed Update Messages

   This message carries the original updates in a single message with
   content adhering to Section 4.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |R|O| ULI | ID# |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  compressed data ...
   +-+-+-+-+-+-+-+-+-+- ...

   Type:  To be obtained by early allocation, suggested value in this
      process will be 7.

   Length:  2 octets.

   ID#:  3 bits.  Indicates the number of the compressor used.  Up to 8
      compressors MAY be used by the compressing peer to allow for
      multiple threads of execution to compress the BGP update stream.
      Accordingly the decompressing side MUST support up to 8
      independent decompressors.

   R:  If the bit is set, the corresponding de-compressor MUST be
      initialized before the following compressed data is decompressed
      per Section 4.4.  The bit MAY be set on the first compressed
      update sent for the compressor on the session or is otherwise
      implied (sapienti sat).  The bit MUST NOT be set on the overflow
      fragment in case of overflow.

   O:  If the bit is set, the procedures in Section 4.3 MUST be
      applied.  If both the R-bit and the O-bit are set, the de-
      compressor must be re-initialized before the update and its
      overflow are assembled and decompression attempted.

   ULI:  Original uncompressed length indication, to be interpreted as
      2**(11+ULI).  This MUST indicate a buffer large enough to hold
      the decompressed data (including overflow).  The indication MAY
      be ignored by the receiver but should allow for efficient buffer
      allocation.  The field MUST be ignored on the overflow fragment.

7.  Security Considerations

   This document introduces no new security concerns to BGP or other
   specifications referenced in this document.

8.  Acknowledgements

   Thanks to John Scudder for some bar discussions that primed the
   creative process.  Thanks to Eric Rosen, Jeff Haas and Acee Lindem
   for their careful reviews.  Thanks to David Lamperter for discussions
   on reordering issues.

9.  Normative References

   [ID.draft-idr-bgp-route-refresh-options-03]
              Patel, K., et al., "Extension to BGP's Route Refresh
              Message", internet-draft
              draft-idr-bgp-route-refresh-options-03.txt, May 2017.

   [ID.draft-ietf-idr-bgp-extended-messages-21]
              Bush, R., et al., "Extended Message support for BGP",
              internet-draft
              draft-ietf-idr-bgp-extended-messages-21.txt, May 2016.

   [QUANT]    Zyga, L., "Worldwide Quantum Web May Be Possible with Help
              from Graphs", New Journal on Physics, June 2016.

   [RFC1950]  Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format
              Specification version 3.3", RFC 1950,
              DOI 10.17487/RFC1950, May 1996.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2283]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 2283,
              DOI 10.17487/RFC2283, February 1998.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              DOI 10.17487/RFC4724, January 2007.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
              2009.

   [RFC7313]  Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
              Route Refresh Capability for BGP-4", RFC 7313,
              DOI 10.17487/RFC7313, July 2014.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

Authors' Addresses

   Tony Przygienda
   Juniper
   1137 Innovation Way
   Sunnyvale, CA
   USA

   Email: prz@juniper.net


   Avinash Lingala
   AT&T
   200 S Laurel Ave
   Middletown, NJ
   USA

   Email: ar977m@att.com


   Csaba Mate
   NIIF/Hungarnet
   18-22 Victor Hugo
   Budapest 1132
   Hungary

   Email: matecs@niif.hu


   Jeff Tantsura
   Nuage Networks
   755 Ravendale Drive
   Mountain View, CA 94043
   USA

   Email: jefftant.ietf@gmail.com