Network Working Group                                      A. Przygienda
Internet-Draft                                                   Juniper
Intended status: Standards Track                              A. Lingala
Expires: December 19, 2017                                          AT&T
                                                             J. Tantsura
                                              Futurewei Technologies Inc
                                                           June 17, 2017


                     Compressed BGP Update Message
               draft-przygienda-idr-compressed-updates-01

Abstract

   This document specifies an optional compressed BGP update message
   format that allows an address-family-independent reduction in BGP
   control traffic volume.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 19, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  IANA Considerations
   4.  Procedures
     4.1.  Decompression Capability Negotiation
     4.2.  Compressed BGP Update Messages
     4.3.  Compressor Overflow
     4.4.  Compressor Restarts
     4.5.  Error Handling
   5.  Special Considerations
     5.1.  Impact on Network Sniffing Tools
   6.  Packet Formats
     6.1.  Decompressor Capability
     6.2.  Compressed Update Messages
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Authors' Addresses

1.  Introduction

   BGP as a protocol has evolved over the years to carry ever larger
   volumes of information, and this trend seems to continue unabated.
   While much of the growth can be attributed to the advent of new
   address families spurred by [RFC2283], a steady increase in the
   number and size of attributes amplifies this tendency.  Recently,
   even the same NLRI may be advertised multiple times by means of the
   ADD-PATH [RFC7911] extensions.  All these developments drive up the
   volume of information BGP needs to exchange to synchronize the RIBs
   of the peers.

   Although the BGP update format already provides a simple "semantic"
   compression mechanism that avoids repeating attributes shared by
   multiple NLRIs, in practice the packing of updates has proven
   difficult.  Packing attempts are further undermined by the plethora
   of "per-NLRI tagging" attributes such as extended communities
   [RFC4360].

   One could, of course, dismiss the growing raw volume of data needed
   to exchange BGP information between two peers as a mere trifle given
   still-rising link bandwidths.  However, other sustained trends make a
   reduction of the data volume exchanged by BGP highly desirable:

   o  Link delays will remain essentially constant until radically new
      transmission mechanisms become commonplace [QUANT].  Barring such
      developments, and given the prevailing fixed Ethernet MTU, an
      increasing volume of BGP traffic will cause more and more IP
      packets to be sent, with BGP synchronization speed limited by the
      expanding bandwidth-delay product.

   o  A data volume that is reasonable for a single peer becomes less so
      when many peers need to be refreshed due to graceful restart
      [RFC4724] and enhanced route refresh [RFC7313] interactions.  Use
      of these techniques is expected to increase with growing demands
      on BGP reliability and with novel variants of state
      synchronization between peers.

   o  BGP message length is limited to 4096 octets, which is in itself a
      recognized problem.  Extensions to the message length
      [ID.draft-ietf-idr-bgp-extended-messages-12] are being worked on,
      but they impose their own requirements and memory pressure on
      implementations and ultimately will not help with attributes
      exceeding the 4096-octet limit in mixed environments.

   o  Virtualization techniques increase the number of context switches
      an IP packet has to cross between two BGP instances.
      Coupled with the difficulty of estimating a reasonable TCP MSS in
      virtualized environments and the resulting number of IP packets
      TCP generates, more and more context-switching overhead is
      incurred per update before useful ("goodput") BGP processing can
      happen.

   Obviously, unless BGP encoding is changed drastically, e.g., by
   introducing more context to allow for semantic compression, we cannot
   expect a reduction in data volume without paying some kind of price.
   Changing the BGP format to decouple attribute value updates from NLRI
   updates could be a viable course of action.  The challenges of such a
   scheme are significant, however: since such "compression" would
   extend the semantics and formats of updates as we have them today,
   existing and future drafts may interact with such an approach in ways
   not discernible today.  Last but not least, introducing a smarter,
   context-rich encoding is likely to cause dependency problems and to
   slow down BGP encoding procedures.

   Fortunately, some observations can be made and emerging trends
   exploited to reduce BGP data volumes without the disadvantages
   mentioned above:

   o  BGP updates are very repetitive.  The smallest change in attribute
      values causes all attributes to be repeated, and any difference
      prevents packing NLRIs into the same update.  On top of that, each
      BGP update message still carries a 16-octet marker that lost most
      of its practical value some time ago.  One could generalize these
      facts by saying that BGP updates tend to exhibit very low entropy.

   o  CPU cycles available to run control protocols are becoming more
      and more abundant, as, to a certain extent, is memory.  However,
      they are no longer available in the easily harvested "single core
      with higher frequency" form factor but as multiple cores that
      introduce the usual pitfalls of parallelization.
      In short, getting a lot of independent work done keeps getting
      cheaper, while speeding up a single strand of execution that
      depends on previous results gets cheaper much more slowly.  This
      nevertheless opens the possibility of applying different filters
      to BGP streams, possibly even executing in parallel threads.  One
      such filter can compress the data in a manner completely
      transparent to the rest of an existing implementation.

   Hence, this draft suggests removing the redundancy in the BGP update
   stream with a compression filter that can be applied per peer,
   concurrently with the rest of BGP processing.  Specifically, this
   document describes an optional scheme to compress BGP update traffic
   with DEFLATE, which combines LZ77 with Huffman coding [RFC1950],
   [RFC1951].

   In broadest terms, such a scheme is beneficial if a BGP
   implementation finds itself in an I/O-constrained scenario while
   having spare CPU cycles available.  Compression eases the pressure on
   TCP processing and synchronization and reduces the raw number of IP
   packets exchanged between peers.

2.  Terminology

3.  IANA Considerations

   This document will request IANA to assign a new BGP message type
   value and a new optional capability value in the BGP Capability Codes
   registry.  The suggested value is 6 for the Compressed Updates
   message type and 76 for the Capability Code.

   IANA will also be requested to assign a new subcode in the "BGP
   Cease NOTIFICATION message subcodes" registry.  The suggested name
   for the code point is "Decompression Error", and the suggested value
   is 10.

4.  Procedures

4.1.  Decompression Capability Negotiation

   The capability to *decompress* a new, optional message type carrying
   compressed updates is advertised via the usual BGP optional
   capability negotiation technique [RFC5492].
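   The Python sketch below illustrates how a receiving implementation
   might interpret a peer's Decompressor Capability and decide whether
   compressed updates may be sent at all.  It is not part of this
   specification and rests on assumptions: the capability code is the
   suggested value 76, CM is treated as a 4-bit field as in [RFC1950]
   and placed in the high nibble of the first value octet following the
   field order of Section 6.1, and the function names are hypothetical.

```python
from typing import Dict, Optional

DECOMPRESSOR_CAP_CODE = 76  # suggested value (Section 3), not IANA-assigned
CM_DEFLATE = 8              # RFC 1950: CM = 8 denotes DEFLATE


def parse_decompressor_cap(value: bytes) -> Optional[int]:
    """Return log2 of the peer's LZ77 window (CINFO + 8), or None.

    None means the capability is to be ignored, e.g. because CM is not
    DEFLATE or CINFO lies outside the range RFC 1950 allows (0..7).
    """
    if not value:
        return None
    cm = value[0] >> 4        # assumption: CM in the high nibble
    cinfo = value[0] & 0x0F   # assumption: CINFO in the low nibble
    if cm != CM_DEFLATE or cinfo > 7:
        return None
    return cinfo + 8          # window size is 2**(CINFO + 8) octets


def may_send_compressed(peer_caps: Dict[int, bytes]) -> bool:
    """True only if the peer advertised a usable Decompressor Capability.

    peer_caps maps capability codes received in the peer's OPEN to
    their raw values (a hypothetical representation).
    """
    value = peer_caps.get(DECOMPRESSOR_CAP_CODE)
    return value is not None and parse_decompressor_cap(value) is not None
```

   For example, a first value octet of 0x87 (CM 8, CINFO 7) yields 15,
   i.e., a 32 KiB window; a sender using Python's zlib could honor it by
   passing 15 as the wbits argument of zlib.compressobj.  An absent
   capability, or one with CINFO above 7, makes may_send_compressed
   return False, so no compressed updates are sent to that peer.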

   A peer MUST NOT send any compressed updates towards peers that did
   not advertise the capability to decompress.  A peer MAY send
   compressed updates towards peers that advertised such capability.

4.2.  Compressed BGP Update Messages

   A new BGP message is introduced under the name "Compressed BGP
   Update".  It carries an arbitrary number of messages of the following
   types:

   o  regular BGP updates

   o  Enhanced Route Refresh [RFC7313] subtypes 1 and 2 (BoRR and EoRR)

   o  Route Refresh with Options
      [ID.draft-idr-bgp-route-refresh-options-02] subtypes 4 and 5 (BoRR
      and EoRR with options)

   These messages follow each other and are compressed according to the
   following rules:

   1.  Compressed and uncompressed BGP updates MAY follow each other in
       arbitrary order, with the exception of the compressor overflow
       scenario described in Section 4.3.  After decompression of a
       stream of interleaved compressed and uncompressed BGP update
       messages, the resulting sequence of updates does not have to be
       identical to the sequence in a stream generated without
       compression.  However, the resulting sequence MUST convey the
       same ultimate semantics to the peer as in the no-compression
       case.

   2.  The updates contained within the compressed BGP update message
       MUST be stripped of the initial 16-octet marker while the rest of
       the BGP message header is preserved.  The length field in the BGP
       message header retains its original value, i.e., it still counts
       the stripped marker octets.

   3.  Each compressed BGP Update MUST carry a sequence of complete
       (non-fragmented) original updates, i.e., it cannot contain only a
       part of an original BGP update.  Section 4.3 presents the only
       exception to this rule.

   4.  Each compressed BGP Update MUST be sent as a block, i.e., the
       decompressor MUST be able to yield the decompressed content of
       the update without waiting for further compressed updates.  This
       differs from the commonly used stream compression mode.
       Section 4.3 presents the only exception to this rule.

   5.  The compressed output of a block MAY exceed the maximum message
       size, but in that case the compressor overflow procedure of
       Section 4.3 MUST be invoked.

4.3.  Compressor Overflow

   To achieve optimal compression ratios, it is desirable to provide the
   compressor with enough data that the resulting compressed update is
   as close to the maximum BGP update size as possible.  Unfortunately,
   an adaptive Huffman-based compressor achieves a constantly varying
   compression ratio, which can lead to an overflow unless it is used
   very conservatively.  A special provision, to be used at the sender's
   discretion, allows for such overruns and simplifies the handling of
   overflow events.

   If the compressed block size exceeds the maximum BGP update size, the
   compressing peer MUST set the overflow bit (O-bit, Section 6.2) in
   the compressed update it generates and MUST follow that update with
   exactly one compressed update that carries the remainder of the block
   and has both the overflow and the compressor restart bits cleared.
   No other BGP update messages are allowed in the TCP stream between
   the compressed update of a given compressor and its overflow
   fragment.  In case of any deviation, the error procedures of
   Section 4.5 MUST be followed.

   The receiving peer MUST concatenate the compressed data of the first
   compressed update and of the following overflow update into a single
   compressed block and apply decompression to it.

   The first update MAY be smaller than the maximum BGP update size.

4.4.  Compressor Restarts

   In certain scenarios, it is beneficial for the compressing peer to be
   able to restart any of its compressors at any point in the ongoing
   BGP session.  To indicate such an occurrence, each compressed update
   MAY carry a flag (the R-bit, Section 6.2) signaling to the
   decompressing peer that it MUST restart the given decompressor before
   attempting to handle the update.

4.5.  Error Handling

   If decompression fails for any reason, the failure MUST cause an
   immediate Cease NOTIFICATION with the newly introduced subcode
   "Decompression Error" (to be registered in the "BGP Cease
   NOTIFICATION message subcodes" registry, see Section 3).  The peer
   that experienced the failure MAY re-establish the connection, but it
   SHOULD NOT advertise the decompressor capability until an
   administrative reset of the session or a re-configuration of the
   peer.  This achieves self-stabilization of the feature in case of
   implementation problems.

   The compressing peer MAY send such a Cease NOTIFICATION as well and
   close the session.  On receiving such a notification, the
   decompressing peer decides at its discretion whether to omit the
   decompression capability from its next OPEN.

5.  Special Considerations

5.1.  Impact on Network Sniffing Tools

   Network sniffing tools today can monitor an ongoing BGP session and
   try to reconstruct the state of the peers from the parsed updates.
   With compression enabled, such a monitor cannot follow the compressed
   updates unless the session is monitored from the first compressed
   update on.

   Several possibilities to deal with this problem exist, the simplest
   being to restart the compressors periodically so that the monitoring
   tool can "sync up".  This will, of course, be detrimental to the
   compression ratio achieved.

   Another possibility would have been to periodically send the Huffman
   dictionary over the wire, but this complexity has been left out so
   as not to overburden this specification.  Moreover, at the current
   time, such a capability is not part of any standard Huffman
   implementation that could easily be referred to.

6.  Packet Formats

6.1.  Decompressor Capability

   The Decompressor Capability follows the normal procedures of
   [RFC5492].
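   As an illustration only, the Python sketch below builds the DEFLATE
   variant of this capability and carries it in the Capabilities
   Optional Parameter (Parameter Type 2) of an OPEN message, which is
   how [RFC5492] conveys capabilities.  The capability code 76 is the
   suggested value from Section 3; placing a 4-bit CM in the high nibble
   and CINFO in the low nibble of the first value octet, followed by a
   Reserved octet sent as zero, is an assumption based on the DEFLATE
   layout in this section, and the function names are hypothetical.

```python
import struct

CAPABILITIES_OPT_PARAM = 2  # RFC 5492: Capabilities Optional Parameter type
DECOMPRESSOR_CAP_CODE = 76  # suggested value (Section 3), not IANA-assigned
CM_DEFLATE = 8              # RFC 1950: CM = 8 denotes DEFLATE


def encode_decompressor_cap(cinfo: int) -> bytes:
    """Build the capability TLV: Code, Length, CM|CINFO octet, Reserved."""
    if not 0 <= cinfo <= 7:
        raise ValueError("RFC 1950 allows CINFO values 0..7 only")
    # Assumed layout: CM in the high nibble, CINFO in the low nibble,
    # followed by one Reserved octet transmitted as zero.
    value = bytes([(CM_DEFLATE << 4) | cinfo, 0])
    return struct.pack("!BB", DECOMPRESSOR_CAP_CODE, len(value)) + value


def wrap_in_optional_parameter(*capabilities: bytes) -> bytes:
    """Place capability TLVs into one Capabilities Optional Parameter."""
    body = b"".join(capabilities)
    if len(body) > 255:
        raise ValueError("capabilities exceed the one-octet parameter length")
    return struct.pack("!BB", CAPABILITIES_OPT_PARAM, len(body)) + body
```

   Under these assumptions, wrap_in_optional_parameter(
   encode_decompressor_cap(7)) produces the octets 02 04 4c 02 87 00,
   advertising a 32 KiB DEFLATE window.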
   In its generic form, the capability can support different
   compressors in the future:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     | type  | de/compressor params  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This document specifies only DEFLATE support per [RFC1950]:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     |  CM   | CINFO |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Code:  To be obtained by early allocation; the suggested value in
      this process is 76.

   Length:  1 octet.

   CM:  4 bits.  The compression method as specified in [RFC1950],
      which defines CM as a 4-bit field; the value 8 denotes DEFLATE.

   CINFO:  4 bits of CINFO as specified in [RFC1950].  Invalid values
      MUST lead to the capability being ignored.  The compressing peer
      MUST use this value to parametrize its algorithm.

6.2.  Compressed Update Messages

   This message carries the original updates in a single message whose
   content adheres to Section 4.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |R|O| ULI | ID# |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  compressed data ...
   +-+-+-+-+-+-+-+-+-+- ...

   Type:  To be obtained by early allocation; the suggested value in
      this process is 6.

   Length:  2 octets.

   ID#:  3 bits.  Indicates the number of the compressor used.  Up to 8
      compressors MAY be used by the compressing peer to allow multiple
      threads of execution to compress the BGP update stream.
      Accordingly, the decompressing side MUST support up to 8
      independent decompressors.

   R:  If the bit is set, the corresponding decompressor MUST be
      initialized before the following compressed data is decompressed,
      per Section 4.4.  The bit MAY be set on the first compressed
      update sent by a given compressor on the session; if it is not,
      the initialization is implied.  The bit MUST NOT be set on the
      overflow fragment.

   O:  If the bit is set, the procedures in Section 4.3 MUST be applied.
      If both the R-bit and the O-bit are set, the decompressor must be
      re-initialized before the update and its overflow fragment are
      assembled and decompression is attempted.

   ULI:  3 bits.  Original uncompressed length indication, interpreted
      as 2**(11+ULI) octets.  It MUST indicate a buffer large enough to
      hold the decompressed data (including any overflow fragment).  The
      receiver MAY ignore the indication, but it should allow for
      efficient buffer allocation.  The field MUST be ignored on the
      overflow fragment.

7.  Security Considerations

   This document introduces no new security concerns to BGP or to other
   specifications referenced in this document.

8.  Acknowledgements

   Thanks to John Scudder for some bar discussions that primed the
   creative process.  Thanks to Eric Rosen, Jeff Haas, Acee Lindem, and
   Jeff Tantsura for their careful reviews.

9.  Normative References

   [ID.draft-idr-bgp-route-refresh-options-02]
              Patel, K., et al., "Extension to BGP's Route Refresh
              Message", internet-draft
              draft-idr-bgp-route-refresh-options-02.txt, May 2017.

   [ID.draft-ietf-idr-bgp-extended-messages-12]
              Bush, R., et al., "Extended Message support for BGP",
              internet-draft
              draft-ietf-idr-bgp-extended-messages-12.txt, May 2016.

   [QUANT]    Zyga, L., "Worldwide Quantum Web May Be Possible with Help
              from Graphs", New Journal of Physics, June 2016.

   [RFC1950]  Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format
              Specification version 3.3", RFC 1950,
              DOI 10.17487/RFC1950, May 1996.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2283]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 2283,
              DOI 10.17487/RFC2283, February 1998.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              DOI 10.17487/RFC4724, January 2007.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
              2009.

   [RFC7313]  Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
              Route Refresh Capability for BGP-4", RFC 7313,
              DOI 10.17487/RFC7313, July 2014.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

Authors' Addresses

   Tony Przygienda
   Juniper
   1137 Innovation Way
   Sunnyvale, CA
   USA

   Email: prz@juniper.net


   Avinash Lingala
   AT&T
   200 S Laurel Ave
   Middletown, NJ
   USA

   Email: ar977m@att.com


   Jeff Tantsura
   Futurewei Technologies Inc

   Email: jefftant.ietf@gmail.com