Network Working Group                                      A. Przygienda
Internet-Draft                                                   Juniper
Intended status: Standards Track                          March 10, 2017
Expires: September 11, 2017


                     Compressed BGP Update Message
               draft-przygienda-idr-compressed-updates-00

Abstract

   Specification of compressed BGP update message formats and
   procedures.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 11, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  IANA Considerations
   4.  Procedures
     4.1.  Decompression Capability Negotiation
     4.2.  Compressed BGP Update Messages
     4.3.  Compressor Overflow
     4.4.  Compressor Restarts
     4.5.  Error Handling
   5.  Special Considerations
     5.1.  Impact on Network Sniffing Tools
   6.  Packet Formats
     6.1.  Decompressor Capability
     6.2.  Compressed Update Messages
   7.  Security Considerations
   8.  Acknowledgements
   9.  Normative References
   Author's Address

1.  Introduction

   BGP as a protocol has evolved over the years to carry more and more
   information, and this trend seems to continue unabated.  While much
   of the growth can be attributed to the advent of new address
   families spurred by [RFC2283], a steady increase in the number and
   size of attributes adds to it.  Recently, even the same NLRI may be
   advertised multiple times by means of the ADD-PATH
   [ID.draft-ietf-idr-add-paths-15] extensions.  All these developments
   drive up the volume of information BGP needs to exchange to
   synchronize the RIBs of the peers.

   Although the BGP update format provides a simple "semantic"
   compression mechanism that avoids the repetition of attributes when
   multiple NLRIs share them, in practical terms the packing of updates
   has proven a difficult challenge.  Packing attempts are further
   undermined by the plethora of "per-NLRI tagging" attributes such as
   extended communities [RFC4360].

   One could of course dismiss the growing raw volume of data necessary
   to exchange BGP information between two peers as a mere trifle given
   the still rising link bandwidths.  Alas, we are facing other
   sustained trends that make a reduction of the data volume exchanged
   by BGP highly desirable:

   o  Link delays will remain constant until radically new transmission
      mechanisms become commonplace [QUANT].  Barring those
      developments, and given the prevailing constant Ethernet MTU, an
      increasing volume of BGP traffic will cause more and more IP
      packets to be sent, with the BGP synchronization speed being
      limited by the expanding bandwidth-delay product.

   o  The data volume, which for one peer may be reasonable, becomes
      less so when many peers need to be refreshed due to [RFC4724] and
      [RFC7313] interactions.  Use of those techniques is expected to
      increase due to increasing demands on BGP reliability and novel
      variants of state synchronization between peers.

   o  BGP message length is limited to 4K, which in itself is a
      recognized problem.  Extensions to the message length
      [ID.draft-ietf-idr-bgp-extended-messages-12] are being worked on,
      but they put their own requirements and memory pressure on
      implementations and ultimately will not help with attributes
      exceeding the 4K size limit in mixed environments.

   o  Virtualization techniques introduce an increasing number of
      context switches that an IP packet has to cross between two BGP
      instances.
      Coupled with the difficulty of estimating a reasonable TCP MSS in
      virtualized environments, the number of IP packets that TCP
      generates causes more and more overhead before real BGP update
      processing can happen.

   Obviously, unless we change the BGP encoding drastically, e.g. by
   introducing more context to allow for semantic compression, we cannot
   expect a reduction in data volume without paying some kind of price.
   Ideas such as changing the BGP format to allow for decoupling of
   attribute value updates from the NLRI updates could be a viable
   course of action.  The challenges of such a scheme are significant,
   and since such "compression" would extend the semantics and formats
   of the updates as we have them today, earlier and future drafts may
   interact with such an approach in ways not discernible today.  Last
   but not least, attempting to introduce a smarter, context-rich
   encoding is likely to cause dependency problems and a slowdown in
   BGP encoding procedures.

   Fortunately, some observations can be made and an emerging trend
   exploited to attempt a reduction in BGP data volumes without this
   kind of disadvantage:

   o  BGP updates are very repetitive.  The smallest change in attribute
      values causes extensive repetition of all attributes, and any
      difference prevents packing of NLRIs into the same update.  On
      top of that, each BGP update message still carries a marker that
      lost most of its practical value some time ago.  One could
      summarize this by saying that BGP updates tend to exhibit very
      low entropy.

   o  CPU cycles available to run control protocols are getting more and
      more abundant, as is memory to a certain extent.  They tend to no
      longer be available in the easily harvested "single core with
      higher frequency" form but as multiple cores that introduce the
      usual pitfalls of parallelization.
      In short, getting a lot of independent work done is getting
      cheaper and cheaper, while speeding up a single strand of
      execution that depends on previous results is not.  This
      nevertheless opens the possibility of applying different filters
      to BGP streams, possibly even executing in parallel threads.  One
      possible filter can compress the data in a manner completely
      transparent to the rest of an existing implementation.

   Hence, we suggest in this draft the removal of redundancy in the BGP
   update stream via Huffman codes, which can be applied as a filter to
   a BGP update stream concurrently with the rest of the BGP processing
   and per peer.  Accordingly, this document describes an optional
   scheme to compress BGP update traffic with the deflate variant of
   Huffman encoding [RFC1950], [RFC1951].

   In broadest terms, such a scheme will be beneficial if a BGP
   implementation finds itself in an I/O-constrained scenario while
   having spare CPU cycles available.  Compression will ease the
   pressure on TCP processing and synchronization as well as reduce the
   raw number of IP packets exchanged between peers.

2.  Terminology

3.  IANA Considerations

   This document requests IANA to assign a new BGP message type value
   and a new optional capability value in the BGP Capability Codes
   registry.  The suggested value for the Compressed Updates message
   type is 6 and for the Capability Code the suggested value is 76.

   IANA is requested as well to assign a new subcode in the "BGP Cease
   NOTIFICATION message subcodes" registry.  The suggested name for the
   code point is "Decompression Error".  The suggested value is 10.

4.  Procedures

4.1.  Decompression Capability Negotiation

   The capability to *decompress* a new, optional message type carrying
   compressed updates is advertised via the usual BGP optional
   capability negotiation technique.

   A peer MUST NOT send any compressed updates towards peers that did
   not advertise the capability to decompress.  A peer MAY send
   compressed updates towards peers that advertised such capability.

4.2.  Compressed BGP Update Messages

   A new BGP message is introduced under the name of "Compressed BGP
   Update".  It contains an arbitrary number of normal BGP update
   messages following each other, compressed according to the rules
   below:

   1.  Compressed and uncompressed BGP updates MAY follow each other in
       arbitrary order with the exception of the compressor overflow
       scenario per Section 4.3.  After decompression of the stream of
       compressed and interleaved uncompressed BGP update messages, the
       resulting sequence of updates does not have to be identical to
       the sequence in a stream generated without compression.
       However, the resulting sequence MUST ensure that the ultimate
       semantics of the updates are the same to the peer as in the
       no-compression case.

   2.  The updates contained within the compressed BGP update message
       MUST be stripped of the initial marker while preserving the BGP
       update message header.  The length field in the BGP update
       header retains its original value.

   3.  Each compressed BGP Update MUST carry a sequence of
       non-fragmented original updates, i.e. it cannot contain a part
       of an original BGP update.  Section 4.3 presents the only
       exception to this rule.

   4.  Each compressed BGP Update MUST be sent as a block, i.e. the
       decompression MUST be able to yield decompressed results of the
       update without waiting for further compressed updates.  This is
       different from the normally used stream compression mode.
       Section 4.3 presents the only exception to this rule.

   5.  The compressed update message MAY exceed the maximum message
       size, but in such a case compressor overflow per Section 4.3
       MUST be invoked.

4.3.  Compressor Overflow

   To achieve optimal compression rates it is desirable to provide the
   compressor with enough data so that the resulting compressed update
   is as close to the maximum BGP update size as possible.
   Unfortunately, Huffman coding with an adaptive dictionary compresses
   at a constantly varying ratio, which can lead to an overflow unless
   it is used very conservatively.  A special provision, optionally to
   be used at the sender's discretion, allows for such overruns and
   simplifies the handling of overflow events.

   In case the compressed block size exceeds the maximum BGP update
   size, the compressing peer MUST set the overflow bit (O-bit,
   Section 6.2) in the generated compressed update and MUST follow it
   with one and only one compressed update that has the overflow and
   compressor restart bits cleared and carries the remainder of the
   block.  No other BGP update messages are allowed in the TCP stream
   between the compressed update of a certain compressor and its
   overflow fragment.  In case of any deviations, the error procedures
   of Section 4.5 MUST be followed.

   The receiving peer MUST concatenate the first compressed update and
   the following overflow update into a single compressed block and
   apply decompression to it.

   The first update MAY be smaller than the maximum BGP update size.

4.4.  Compressor Restarts

   In certain scenarios it is beneficial for the compressing peer to be
   able to restart any of the compressors at any point in the ongoing
   BGP session.  To indicate such an occurrence, each compressed update
   MAY carry a flag signaling to the decompressing peer that it MUST
   restart the given decompressor before attempting to handle the
   update.

4.5.  Error Handling

   If the decompression fails for any reason, the failure MUST cause an
   immediate CEASE notification with the newly introduced subcode
   "Decompression Error" (as documented in the IANA BGP Error Codes
   registry).
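
   As a non-normative illustration of the block handling in Section 4.2
   and Section 4.4 and of the failure case above, the following minimal
   sketch assumes Python's standard zlib module as the [RFC1950]
   implementation.  All class and function names are hypothetical, the
   window size is left at the library default rather than derived from
   CINFO, and the subcode value 10 is only the one suggested in
   Section 3.

```python
import zlib

MARKER_LEN = 16                 # octets of marker in the BGP message header
CEASE_DECOMPRESSION_ERROR = 10  # suggested in Section 3, not an assigned value


class DecompressionError(Exception):
    """Hypothetical signal to send the CEASE notification and close."""


def strip_marker(update: bytes) -> bytes:
    # Rule 2 of Section 4.2: drop the marker, keep the length and type
    # fields (the length keeps its original value).
    return update[MARKER_LEN:]


class BlockCompressor:
    def __init__(self):
        self.restart()

    def restart(self):
        # Section 4.4: fresh compressor state; the sender would set the
        # R bit on the next compressed update.
        self._z = zlib.compressobj()

    def compress_block(self, updates):
        data = b"".join(strip_marker(u) for u in updates)
        # Z_SYNC_FLUSH emits everything consumed so far while keeping the
        # compression history, so each block can be decoded without
        # waiting for later blocks (rule 4 of Section 4.2).
        return self._z.compress(data) + self._z.flush(zlib.Z_SYNC_FLUSH)


class BlockDecompressor:
    def __init__(self):
        self.restart()

    def restart(self):
        self._z = zlib.decompressobj()

    def decompress_block(self, payload: bytes, r_bit: bool = False) -> bytes:
        if r_bit:
            # Restart before handling the update, per Section 4.4.
            self.restart()
        try:
            return self._z.decompress(payload)
        except zlib.error as exc:
            # Section 4.5: any failure ends in a CEASE notification.
            raise DecompressionError(CEASE_DECOMPRESSION_ERROR) from exc
```

   Under the same assumptions, an overflow pair per Section 4.3 would
   be handled by concatenating both payloads before a single call to
   decompress_block().
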
   The peer which experienced the failure MAY initiate the connection
   again, but it SHOULD NOT advertise the decompressor capability until
   an administrative reset of the session or re-configuration of the
   peer.  This will achieve self-stabilization of the feature in case
   of implementation problems.

   The compressing peer MAY send such a CEASE notification as well and
   close the session.  Given such a notification, it is at the
   discretion of the decompressing peer to omit the decompression
   capability on the next OPEN.

5.  Special Considerations

5.1.  Impact on Network Sniffing Tools

   Network sniffing tools today have the capability to monitor an
   ongoing BGP session and try to reconstruct the state of the peers
   from the updates parsed.  Obviously, with compression enabled, such
   a monitor cannot follow the compressed updates unless the session is
   monitored from the first compressed update on.

   Several possibilities to deal with the problem exist, the simplest
   one being the restart of the compressors on a periodic basis to allow
   the monitoring tool to 'sync up'.  It goes without saying that this
   will be detrimental to the compression ratio achieved.

   Another possibility would have been to periodically send the Huffman
   dictionary over the wire, but this complexity has been left out so
   as not to overburden this specification.  Moreover, at the current
   time, such a capability is not part of any standard Huffman
   implementation that could be easily referred to.

6.  Packet Formats

6.1.  Decompressor Capability

   The Decompressor Capability follows the normal procedures of
   [RFC5492].  In its generic form the option can support different
   compressors in the future.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     | type| de/compressor parameters|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This document specifies only DEFLATE Huffman support per [RFC1950].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Code      |    Length     | CM  | CINFO |    Reserved     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Code:  TBD, suggested value of 76.

   Length:  1 octet.

   CM:  3 bits of CM indicating the DEFLATE compressed format value as
      specified in [RFC1950].

   CINFO:  4 bits of CINFO as specified in [RFC1950].  Invalid values
      MUST lead to the capability being ignored.  The compressing peer
      MUST use this value for the parameterization of its algorithm.

6.2.  Compressed Update Messages

   This message carries the original updates in a single message with
   content adhering to Section 4.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |     Type      |R|O| ULI | ID# |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  compressed data ...
   +-+-+-+-+-+-+-+-+-+- ...

   Type:  TBD, suggested value is 6.

   Length:  2 octets.

   ID#:  3 bits.  Indicates the number of the compressor used.  Up to 8
      compressors MAY be used by the compressing peer to allow for
      multiple threads of execution to compress the BGP update stream.
      Accordingly, the decompressing side MUST support up to 8
      independent decompressors.

   R:  If the bit is set, the corresponding decompressor MUST be
      initialized before the following compressed data is decompressed
      per Section 4.4.
      The bit MAY be set on the first compressed update sent for a
      compressor on the session; if it is not set there, it is implied.
      The bit MUST NOT be set on the overflow fragment in case of
      overflow.

   O:  If the bit is set, the procedures in Section 4.3 MUST be applied.
      If both the R-bit and the O-bit are set, the decompressor must be
      re-initialized before the update and its overflow are assembled
      and decompression is attempted.

   ULI:  Original uncompressed length indication, to be interpreted as
      2**(11+ULI).  This MUST indicate a buffer large enough for the
      decompressed data (including overflow) to fit in.  The indication
      MAY be ignored by the receiver but should allow for efficient
      buffer allocation.  The field MUST be ignored on an overflow
      fragment.

7.  Security Considerations

   This document introduces no new security concerns to BGP or other
   specifications referenced in this document.

8.  Acknowledgements

   Thanks to John Scudder for some bar discussions that primed the
   creative process.  Thanks to Eric Rosen, Jeff Haas, Acee Lindem and
   Jeff Tantsura for their careful reviews.

9.  Normative References

   [ID.draft-ietf-idr-add-paths-15]
              Walton, D., et al., "Advertisement of Multiple Paths in
              BGP", internet-draft draft-ietf-idr-add-paths-15.txt, May
              2016.

   [ID.draft-ietf-idr-bgp-extended-messages-12]
              Bush, R., et al., "Extended Message support for BGP",
              internet-draft
              draft-ietf-idr-bgp-extended-messages-12.txt, May 2016.

   [QUANT]    Zyga, L., "Worldwide Quantum Web May Be Possible with Help
              from Graphs", New Journal on Physics, June 2016.

   [RFC1950]  Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format
              Specification version 3.3", RFC 1950,
              DOI 10.17487/RFC1950, May 1996.

   [RFC1951]  Deutsch, P., "DEFLATE Compressed Data Format Specification
              version 1.3", RFC 1951, DOI 10.17487/RFC1951, May 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2283]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 2283,
              DOI 10.17487/RFC2283, February 1998.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
              February 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              DOI 10.17487/RFC4724, January 2007.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
              2009.

   [RFC7313]  Patel, K., Chen, E., and B. Venkatachalapathy, "Enhanced
              Route Refresh Capability for BGP-4", RFC 7313,
              DOI 10.17487/RFC7313, July 2014.

Author's Address

   Tony Przygienda
   Juniper
   1137 Innovation Way
   Sunnyvale, CA
   USA

   Email: prz@juniper.net