<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="no"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-przygienda-idr-compressed-updates-07"
    ipr="trust200902"
    obsoletes="" submissionType="IETF" updates="" xml:lang="en">
	<front>
		<title abbrev="draft-przygienda-idr-compressed-updates">
            Compressed BGP Update Message</title>

        <author fullname="Tony Przygienda" initials="A." surname="Przygienda">
			<organization>Juniper</organization>
			<address>
                <postal>
                    <street>1137 Innovation Way
                    </street>
                    <city>Sunnyvale</city>
                    <region>CA
                    </region>
                    <code/>
                    <country>USA
                    </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>prz@juniper.net
                </email>
				<uri/>
			</address>
		</author>

        <author fullname="Avinash Lingala" initials="A." surname="Lingala">
            <organization>AT&amp;T</organization>
            <address>
                <postal>
                    <street>200 S Laurel Ave
                    </street>
                    <city>Middletown</city>
                    <region>NJ
                    </region>
                    <code/>
                    <country>USA
                    </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>ar977m@att.com
                </email>
                <uri/>
            </address>
        </author>

<author fullname="Csaba Mate" initials="C." surname="Mate">
    <organization>NIIF/Hungarnet</organization>
    <address>
        <postal>
            <street>18-22 Victor Hugo</street>
            <city>Budapest</city>
            <region>
            </region>
            <code>1132</code>
            <country>Hungary
            </country>
        </postal>
        <phone/>
        <facsimile/>
        <email>matecs@niif.hu
        </email>
        <uri/>
    </address>
</author>

        <!--
         <author fullname="Hannes Gredler" initials="H." surname="Gredler">
            <organization>RtBrick Inc.</organization>
            <address>
                <email>hannes@rtbrick.com
                </email>
                <uri/>
            </address>
        </author>
        -->

        <author fullname="Jeff Tantsura" initials="J." surname="Tantsura">
            <organization>Nuage Networks</organization>
            <address>
                <postal>
                    <street>755 Ravendale Drive</street>
                    <city>Mountain View</city>
                    <region>CA
                    </region>
                    <code>94043</code>
                    <country>USA
                    </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>jefftant.ietf@gmail.com
                </email>
                <uri/>
            </address>
        </author>

		<date year="2019"/>
		<abstract>
			<t>This document specifies an optional
                compressed BGP update
                message format that allows an address-family-independent
                reduction in BGP control traffic volume.
			</t>
		</abstract>
		<note title="Requirements Language">
			<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref format="default"
          pageno="false" target="RFC2119">RFC 2119</xref>.</t>
		</note>
	</front>
	<middle>
        
        <section title="Introduction" toc="default">
            <t>
                BGP has evolved over the years to carry ever larger volumes of
                information, and this trend seems to continue unabated.
                While much of the growth can be attributed to the advent of new address families spurred by
                <xref format="default" pageno="false" target="RFC2283"/>, a steady increase in the number and size of attributes amplifies
                this tendency.
                More recently, even the same NLRI may be advertised multiple times by means of the <xref format="default" pageno="false"
                    target="RFC7911">ADD-PATH</xref> extensions.
                All these developments drive up the volume of information BGP needs to exchange to synchronize the RIBs of the peers.
                
            </t>
            <t>
                Although the BGP update format provides a simple "semantic" compression mechanism that avoids repeating attributes
                shared by multiple NLRIs, in practical terms
                the packing of updates has proven a difficult challenge. Packing attempts are further undermined by the plethora of "per-NLRI tagging"
                attributes such as extended
                communities <xref format="default" pageno="false" target="RFC4360"/>.
            </t>
            <t>
                One could of course dismiss the growing raw volume of data needed to exchange BGP information between two peers as
                a mere trifle given the still rising link bandwidths. Alas, other sustained trends make a
                reduction of the data volume exchanged by BGP highly desirable:
                
            </t>
            
            <t>
                <list style="symbols">
                    <t>Link delays will remain constant until radically new transmission mechanisms become commonplace
                        <xref format="default" pageno="false" target="QUANT"/>.
                        Barring those developments, and given the prevailing constant Ethernet MTU, the increasing volume of BGP traffic will
                        cause more and more IP packets
                        to be sent, with the
                        BGP synchronization speed being limited by the
                        expanding bandwidth-delay product.
                    </t>
                    <t>The data volume, which for one peer may be reasonable, becomes less so when many of those need to be refreshed due to
                        <xref format="default" pageno="false" target="RFC4724"/> and <xref format="default" pageno="false" target="RFC7313"/>
                        interactions. Use of those techniques is expected to increase due to increasing demands on BGP reliability and
                        novel variants of state synchronization between peers.
                    </t>
                    <t>BGP message length is limited to 4096 octets, which in itself is a recognized problem. <xref format="default"
                        pageno="false" target="ID.draft-ietf-idr-bgp-extended-messages-21">Extensions to the message length</xref> are
                    being worked on, but they impose
                    their own requirements and memory pressure on implementations and ultimately will not help with attributes
                    exceeding the 4096-octet limit in mixed environments.
                    </t>
                    <t>Virtualization techniques introduce
                        an increasing number of context switches that an IP packet has to cross between two
                        BGP instances.
                        Coupled with the difficulty of estimating a reasonable TCP MSS in virtualized environments and
                        the number of
                        IP packets TCP generates, more
                        and more context switching overhead per update is incurred
                        before goodput BGP processing can happen.
                    </t>
                </list>
            </t>
            <t>
                Obviously, unless we change the BGP encoding drastically, e.g. by introducing
                more context to allow for semantic compression, we cannot expect a reduction in
                data volume without paying some kind of price.
                Ideas such as changing the BGP format to allow for decoupling of attribute value updates from the NLRI
                updates could be a viable course of action. The challenges of such a scheme are significant, and since such
                "compression" would extend the semantics and formats of the updates as we have them today, former and future drafts may
                interact with such an approach in ways not discernible today. Last but not least, attempting to introduce a smarter, context-rich encoding
                is likely to cause dependency problems and a slowdown in BGP encoding procedures.
            </t>
            <t>
                
                Fortunately, some observations can be made and emerging
                trends exploited to attempt a reduction in
                BGP data volumes without the disadvantages mentioned above:
                <list style="symbols">
                    <t>BGP updates are very repetitive. The smallest change in attribute values causes extensive repetition of all attributes,
                        and any difference prevents packing of NLRIs in the same update. On top of that, each BGP update message still carries
                        a marker that largely lost its practical value some time ago. One could generalize those facts by saying that BGP updates tend to
                        exhibit very low
                        entropy.
                        
                    </t>
                    <t>
                        CPU cycles available to run control protocols are becoming more and more abundant, as is, to a certain
                        extent, memory. They tend no longer to be available in the easily harvested "single core with higher frequency" form but as
                        multiple cores that introduce the usual pitfalls of parallelization. In short, getting a lot of
                        independent work done is getting cheaper and cheaper, while speeding up a single strand of
                        execution that depends on previous results is not. This nevertheless opens the possibility of applying
                        different filters to BGP streams, possibly even executing in parallel threads. One such filter can
                        compress the data in a manner completely transparent to the rest of an existing implementation.
                    </t>
                </list>
            </t>
            <t>
                Hence, this document suggests removing redundancy
                from the BGP update stream
                via Huffman codes, which can be applied as a filter to a BGP
                update stream concurrently with
                the rest of the BGP processing and per peer.
                Specifically, this document describes an optional scheme to
                compress BGP update traffic with the DEFLATE variant of Huffman
                encoding <xref format="default"
                pageno="false" target="RFC1950"/>, <xref format="default"
                pageno="false" target="RFC1951"/>.
            </t>
            <t>In the broadest terms, such a scheme will be beneficial if a BGP
                implementation finds itself in an I/O constrained scenario
                while having spare CPU cycles available. Compression will
                ease the pressure on TCP processing and synchronization as well
                as reduce the raw number of IP packets exchanged between peers.
            </t>
            
            
            
        </section>
		<section title="Terminology" toc="default">
		<t>
					</t>
        
        </section>
        
        <section anchor="IANA" title="IANA Considerations" toc="default">
            <t>This document will request IANA to assign a new BGP message type value
                and a new optional capability value in the BGP Capability
                Codes registry. The suggested value for the Compressed
                Updates message type in this process will be 7 and
                for the Capability Code the suggested
                value will be 76.</t>
            
            <t> IANA will be requested as well to assign a new subcode in the "BGP Cease
                NOTIFICATION message subcodes" registry.  The suggested name for the
                code point will be "Decompression Error".  The suggested value
                will be 10.</t>
        </section>
        
        <section title="Procedures">
            
            <section title="Decompression Capability Negotiation">
                
                <t>
                    The capability to *decompress* a new, optional message type carrying
                    compressed updates is advertised via the usual BGP optional capability
                    negotiation technique.
                </t>
                
                <t>A peer MUST NOT send any compressed updates towards peers that did
                    not advertise the capability to decompress. A peer MAY send
                    compressed updates towards peers that advertised such
                    capability.
                </t>

                
            </section>

        <section title="Compressed BGP Update Messages" anchor="updmsg">
    
    <t>
        A new BGP message is introduced under the name of "Compressed BGP Update".
        It contains an arbitrary number of messages of the following types:
</t>
    <t>
    <list style="symbols">
        <t>Normal BGP updates</t>
        <t>Enhanced Route Refresh <xref format="default"
            pageno="false" target="RFC7313"/> subtypes 1 and 2 (BoRR and EoRR)
            </t>
        <t>Route Refresh with Options <xref format="default"
            pageno="false" target="ID.draft-idr-bgp-route-refresh-options-03"/>
            subtypes 4 and 5 (BoRR and EoRR with options)
    </t>
        </list>

</t>

<t>
        These messages follow
        each other and are compressed subject to the rules below:
        
    </t>
    
    <t>
        <list style="numbers">
            <t>Compressed and uncompressed BGP updates MAY follow each other in
                arbitrary order, with the exception of the compressor overflow scenario per
                <xref format="default"
                pageno="false" target="overflow"/>.
</t>

<t>
                After decompression of the stream of
                interleaved compressed and uncompressed BGP update messages,
                the resulting uncompressed sequence does not have
                to be identical to the sequence in a stream that would be
                generated without compression. However, the processing of the
                uncompressed
                sequence MUST ensure that the ultimate semantics of the message
                stream are the same to the peer as in a correct uncompressed case.
</t>

            <t>
                The sender is explicitly permitted to generate outgoing
                updates in a
                manner that
                reorders them as compared to the uncompressed stream, but
                if it does so it MUST ensure that the
                resulting stream of updates retains the original
                semantics as if
                compression were not in use.
</t>

            <t>The updates and refreshes
                contained within the compressed BGP update message
                MUST be stripped of the initial marker while preserving the
                BGP update or route refresh message
                header. The length field in the BGP header retains its
                 original value. 
            </t>
            <t>Each compressed BGP Update MUST carry a sequence of
                non-fragmented original messages,
                i.e., it cannot contain only a part of an original BGP update.
                <xref format="default"
                pageno="false" target="overflow"/> presents the only exception
                to this rule.
            </t>
            <t>Each compressed
                BGP Update MUST be sent as a block, i.e. the decompression
                MUST be able to yield decompressed results of the update
                without waiting for
                further compressed updates. This is different from the
                normally used
                stream compression mode. <xref format="default"
                pageno="false" target="overflow"/> presents the only exception
                to this rule.
                
            </t>
            <t>The compressed update message MAY exceed the maximum
                message size but in such case compressor overflow per
                <xref format="default"
                pageno="false" target="overflow"/> MUST be invoked.</t>
        </list>
    </t>
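    <t>
        The rules above can be illustrated with a minimal, non-normative
        Python sketch. It assumes that the block property is obtained by
        ending every compressed update with a zlib sync flush on a
        long-lived compressor; this document does not prescribe that
        mechanism. The helper names (make_msg, strip_marker,
        compress_block, split_messages) are hypothetical and exist only
        for this illustration.
    </t>
    <figure align="left" alt="" height="" suppress-title="false" title="" width="">
        <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
import struct
import zlib

MARKER = b"\xff" * 16  # 16-octet BGP header marker (RFC 4271)

def make_msg(msg_type: int, body: bytes) -> bytes:
    """Build a plain BGP message: marker, length, type, body."""
    return MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body

def strip_marker(msg: bytes) -> bytes:
    """Drop the marker; Length and Type stay exactly as they were."""
    if len(msg) < 19 or msg[:16] != MARKER:
        raise ValueError("not a BGP message")
    return msg[16:]

def compress_block(comp, messages) -> bytes:
    """Compress whole messages into one block (assumed mechanism)."""
    body = b"".join(strip_marker(m) for m in messages)
    # Z_SYNC_FLUSH emits everything consumed so far, so the receiver
    # never has to wait for a later update to decode this one.
    return comp.compress(body) + comp.flush(zlib.Z_SYNC_FLUSH)

def split_messages(plain: bytes):
    """Re-split decompressed octets and put the marker back."""
    out, pos = [], 0
    while pos < len(plain):
        if len(plain) - pos < 3:
            raise ValueError("truncated inner header")
        (length,) = struct.unpack_from("!H", plain, pos)
        end = pos + length - 16  # Length still counts the marker
        if length < 19 or end > len(plain):
            raise ValueError("malformed inner message")
        out.append(MARKER + plain[pos:end])
        pos = end
    return out
```
]]>
        </artwork>
    </figure>
    <t>
        In this sketch one compressor object and one decompressor object
        live as long as the session, so later blocks can reference data
        seen in earlier ones while each block still decodes on arrival.
    </t>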

        </section>
        
        <section title="Compressor Overflow" anchor="overflow">
            
            <t>
                To achieve optimal compression rates it is desirable to
                provide the compressor with enough data that the
                resulting compressed update is as close to the maximum
                BGP update size as possible. Unfortunately, Huffman coding
                with an adaptive dictionary compresses at a constantly varying ratio,
                which can
                lead to an overflow unless it is used very conservatively.
                A special provision, to be used at the sender's
                discretion, allows for such overruns and simplifies
                the handling of overflow events.
            </t>
            <t>
                If the compressed block size exceeds
                the
                maximum BGP update size, the compressing peer MUST set
                the overflow (O) bit in the compressed update generated and
                MUST follow it with one and only one compressed update that has
                the overflow and compressor restart bits cleared and carries the
                remainder of the block.
                No other BGP update messages are allowed in the TCP
                stream between the compressed update of a certain compressor
                and its overflow fragment.
                In case of any deviations, the error procedures of
                <xref format="default"
                pageno="false" target="error"/> MUST be followed.
            </t>
            
            <t>
                The receiving peer MUST concatenate the first compressed
                update and the following overflow update into a single
                compressed block and apply decompression to it.
            </t>
            <t>
                The first update MAY be smaller than the maximum BGP
                update size.
                </t>
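            <t>
                The overflow procedure can be sketched as follows. This is a
                non-normative illustration: the functions fragment and
                reassemble, the (O bit, payload) tuples, and the max_payload
                parameter are hypothetical, and the space actually
                available for compressed data in one update is left to the
                caller because it depends on the header layout.
            </t>
            <figure align="left" alt="" height="" suppress-title="false" title="" width="">
                <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
import zlib

def fragment(block: bytes, max_payload: int, first_len=None):
    """Return [(o_bit, payload)] with one or two entries."""
    if len(block) <= max_payload:
        return [(0, block)]
    # The first update may be cut shorter than the maximum.
    cut = max_payload if first_len is None else first_len
    if not 0 < cut <= max_payload:
        raise ValueError("bad size for the first update")
    rest = block[cut:]
    if len(rest) > max_payload:
        raise ValueError("block needs more than one overflow fragment")
    return [(1, block[:cut]), (0, rest)]

def reassemble(frags) -> bytes:
    """Rebuild the compressed block on the receiving side."""
    if len(frags) == 1 and frags[0][0] == 0:
        return frags[0][1]
    if len(frags) == 2 and frags[0][0] == 1 and frags[1][0] == 0:
        return frags[0][1] + frags[1][1]
    raise ValueError("unexpected O-bit sequence")
```
]]>
                </artwork>
            </figure>
            <t>
                A receiver would hand the result of reassemble to the
                de-compressor only once both pieces have arrived; any other
                O-bit sequence is treated as an error in this sketch.
            </t>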

            
        </section>
        
        <section title="Compressor Restarts" anchor="restart">
            
            <t>
                In certain scenarios it is beneficial for the
                compressing peer to be able to restart any of the
                compressors at any point in the ongoing BGP session.
                To indicate such an occurrence, each compressed
                update MAY carry a flag signaling to the decompressing
                peer that it MUST restart the given de-compressor before
                attempting to handle the update.
            </t>
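            <t>
                A receiver-side restart could look roughly like the
                following non-normative sketch. The class Decompressors and
                the function block are hypothetical names, and the use of
                Python's zlib objects with a sync flush per update is an
                assumption made for illustration only.
            </t>
            <figure align="left" alt="" height="" suppress-title="false" title="" width="">
                <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
import zlib

class Decompressors:
    """Eight independent de-compressor slots, indexed by ID#."""

    def __init__(self, wbits: int = 15):
        self.wbits = wbits
        self.slots = [None] * 8

    def handle(self, comp_id: int, r_bit: int, payload: bytes) -> bytes:
        # Initialize on R=1, and implicitly on the first update
        # seen for this compressor.
        if r_bit or self.slots[comp_id] is None:
            self.slots[comp_id] = zlib.decompressobj(wbits=self.wbits)
        return self.slots[comp_id].decompress(payload)

def block(comp, data: bytes) -> bytes:
    """One sync-flushed block from a long-lived compressor."""
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
```
]]>
                </artwork>
            </figure>
            <t>
                In this sketch a sender restart amounts to creating a new
                compressor object and setting the flag on the next update;
                a receiver that starts listening at that update can decode
                it without any earlier history.
            </t>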

        </section>
        
        <section title="Error Handling" anchor="error">
            
            <t>If the decompression fails for any reason,
                the failure MUST cause an
                immediate CEASE notification with a newly introduced subcode of
                "Decompression Error" (as documented in the IANA BGP Error
                Codes registry).
                The peer which experienced the failure MAY initiate the
                connection again but
                it SHOULD NOT advertise the decompressor capability until an
                administrative
                reset of the session or re-configuration of the peer. This will
                achieve self-stabilization of the feature in case of
                implementation problems.
            </t>
            <t>The compressing peer MAY send such a CEASE notification as well
                and close the session.
                It is at the discretion of the decompressing peer given such a
                notification to omit the decompression capability on the next
                OPEN.</t>
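            <t>
                As a non-normative sketch, a receiver might map a
                decompression failure to the notification as shown below.
                NOTIFICATION message type 3 and the Cease error code 6 are
                existing BGP values; the subcode 10 is only the value
                suggested in this document, and the function names are
                hypothetical.
            </t>
            <figure align="left" alt="" height="" suppress-title="false" title="" width="">
                <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
import struct
import zlib

MARKER = b"\xff" * 16
NOTIFICATION = 3           # BGP message type (RFC 4271)
CEASE = 6                  # BGP error code (RFC 4271)
DECOMPRESSION_ERROR = 10   # suggested subcode, not yet assigned

def cease_decompression_error() -> bytes:
    """NOTIFICATION without data: 19-octet header plus 2 octets."""
    return MARKER + struct.pack(
        "!HBBB", 21, NOTIFICATION, CEASE, DECOMPRESSION_ERROR)

def try_decompress(decomp, payload: bytes):
    """Return (data, None) on success, (None, notification) on failure."""
    try:
        return decomp.decompress(payload), None
    except zlib.error:
        return None, cease_decompression_error()
```
]]>
                </artwork>
            </figure>
            <t>
                After sending the notification, the caller would close the
                session and, per the text above, refrain from advertising
                the capability on reconnect until an administrative reset.
            </t>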
        </section>
        
        </section>
        
        
        <section title="Special Considerations">
           
            
            <section title="Impact on Network Sniffing Tools">
                
                <t>
                    Network sniffing tools today have the capability to
                    monitor an ongoing BGP session and to try to reconstruct
                    the state of the peers from the updates parsed. Obviously,
                    with compression enabled, such a monitor cannot
                    follow the compressed updates unless the session is
                    monitored from the first compressed update on.
                    </t>
                <t>
                    Several possibilities to deal with the problem exist,
                    the simplest one being the restart of the compressors on a periodic
                    basis to allow the monitoring tool to 'sync up'. It goes without
                    saying that this will be detrimental to the compression
                    ratio achieved.
                    </t>
                <t>Another possibility would have been to periodically send
                    the Huffman dictionary over the wire, but this complexity
                    has been left out so as not to overburden this specification.
                    Moreover, at the current time,
                    such a capability is not part of any standard
                    Huffman implementation that could be easily referred to.
                </t> 
            </section>
        </section>
        
        <section title="Packet Formats" toc="default">
		
<section title="Decompressor Capability">
    <t>
        The Decompressor Capability follows the normal procedures of
        <xref format="default"
        pageno="false" target="RFC5492"/>. In its generic form the capability
        can support different compressors in the future.
        </t>
    
    <figure align="left" alt="" height="" suppress-title="false" title="" width="">
        
        <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
            <![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Code         |    Length     | type| de/compressor parameters|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            ]]>
        </artwork>
    </figure>
<t>This document specifies only DEFLATE Huffman support per
    <xref format="default"
    pageno="false" target="RFC1950"/>.
</t>
<figure align="left" alt="" height="" suppress-title="false" title="" width="">
    
    <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
        <![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Code         |    Length     |  CM   | CINFO |   Reserved    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        ]]>
    </artwork>
</figure>

<t>
    <list style="hanging">
        <t hangText="Code:">To be obtained by early allocation,
            suggested value in this process will be 76.</t>
        <t hangText="Length:">1 octet.</t>
        <t hangText="CM:">4 bits of CM indicating DEFLATE compressed format
            value
            as specified in <xref format="default" pageno="false"
            target="RFC1950"/>.</t>
        <t hangText="CINFO:">4 bits of CINFO as specified in <xref
            format="default"
            pageno="false" target="RFC1950"/>.  Invalid values MUST lead to the
            capability being ignored.
                The compressing peer MUST use this value for the parametrization
                of its algorithm.</t>
               </list>
    </t>
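    <t>
        The capability encoding can be sketched as follows. This is a
        non-normative illustration under two assumptions that the text
        above does not state explicitly: the capability value is two
        octets long (CM and CINFO in the first, Reserved in the second),
        and CM occupies the high-order four bits as drawn in the figure.
        The function names are hypothetical.
    </t>
    <figure align="left" alt="" height="" suppress-title="false" title="" width="">
        <artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
import struct

CAP_CODE = 76    # suggested capability code, not yet assigned
CM_DEFLATE = 8   # RFC 1950 compression method "deflate"

def encode_capability(window_bits: int) -> bytes:
    """Advertise a de-compressor for a 2**window_bits window."""
    cinfo = window_bits - 8  # RFC 1950: log2(window size) minus 8
    if not 0 <= cinfo <= 7:
        raise ValueError("window_bits must be between 8 and 15")
    value = bytes([(CM_DEFLATE << 4) | cinfo, 0])  # Reserved = 0
    return struct.pack("!BB", CAP_CODE, len(value)) + value

def decode_capability(tlv: bytes):
    """Return window_bits, or None if the capability is ignored."""
    if len(tlv) != 4:
        return None
    code, length = struct.unpack_from("!BB", tlv)
    if code != CAP_CODE or length != 2:
        return None
    cm, cinfo = tlv[2] >> 4, tlv[2] & 0x0F
    if cm != CM_DEFLATE or cinfo > 7:
        return None  # invalid values lead to the capability being ignored
    return cinfo + 8
```
]]>
        </artwork>
    </figure>
    <t>
        The compressing peer would then use the decoded window size, and
        nothing larger, when parametrizing its compressor.
    </t>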

</section>

		<section title="Compressed Update Messages" anchor="S2L">
			<t>
			This message carries the original updates in a single message with content
            adhering to <xref format="default"
            pageno="false" target="updmsg"/>.
			</t>
	<figure align="left" alt="" height="" suppress-title="false" title="" width="">

<artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve"> 
<![CDATA[
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Length           |      Type     |R|O| ULI | ID# |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | compressed data    ...
   +-+-+-+-+-+-+-+-+-+- ...
                                                                    
]]>
</artwork>
</figure>			
<t>
					<list style="hanging">
						<t hangText="Type:">To be obtained by early allocation,
                            suggested value in this process will be 7.</t>
						<t hangText="Length:">2 octets.</t>
                        <t hangText="ID#:">3 bits. Indicates the number of
                            the compressor used. Up to 8 compressors MAY
                            be used by the compressing peer to allow for
                            multiple threads of execution to compress the
                            BGP update stream. Accordingly, the decompressing
                        side MUST support up to 8 independent decompressors.</t>

                        <t hangText="R:">If the bit is set,
                            the corresponding de-compressor MUST
                            be initialized before the following compressed data is
                            decompressed per <xref format="default"
                            pageno="false" target="restart"/>.
                            The bit MAY be set on the first compressed
                            update sent for a compressor on the session;
                            if it is not set there, initialization is
                            nevertheless implied.
                            The bit MUST NOT be set on the overflow fragment in case
                            of overflow.
                        <!-- If the bit is set despite the fact that no
                        compressor restart support has been indicated, procedures in
                        <xref format="default"
                        pageno="false" target="error"/> MUST be followed while
                        announcing a subcode of "Decompression Error".
                        -->
                        </t>
                        <t hangText="O:">If the bit is set, procedures in
                            <xref format="default"
                            pageno="false" target="overflow"/> MUST be applied.
                            <!-- If the bit is set despite the fact that no
                        overflow support has been indicated,
                        <xref format="default"
                        pageno="false" target="error"/> MUST be followed while
                            announcing a subcode of "Decompression Error". 
                             --> If both
                        the R-bit and the O-bit are set, the de-compressor MUST
                        be re-initialized before the update and its overflow fragment are
                        assembled and
                        decompression is attempted.</t>
                        <t hangText="ULI:">Uncompressed length indication,
                            to be interpreted as
                            2**(11+ULI). This MUST indicate a buffer large enough
                            to hold the decompressed
                            data (including overflow). The indication
                            MAY be ignored by
                            the receiver but should allow for efficient buffer
                            allocation. The field MUST be
                            ignored on the overflow fragment.

                            </t>
					</list>
				</t>
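<t>
    The flag octet and the ULI computation can be sketched as follows.
    This is a non-normative illustration that assumes the bit positions
    drawn in the figure (R as the most significant bit, then O, a
    three-bit ULI and a three-bit ID#) and that ULI expresses a size in
    octets. The function names are hypothetical.
</t>
<figure align="left" alt="" height="" suppress-title="false" title="" width="">
<artwork align="left" alt="" height="" name="" type="" width="" xml:space="preserve">
<![CDATA[
```python
def uli_for(uncompressed_len: int) -> int:
    """Smallest ULI whose 2**(11+ULI) buffer holds the data."""
    for uli in range(8):  # ULI is drawn as a 3-bit field
        if uncompressed_len <= 1 << (11 + uli):
            return uli
    raise ValueError("larger than 2**18, not expressible in ULI")

def pack_flags(r: int, o: int, uli: int, comp_id: int) -> int:
    """Pack the octet drawn as |R|O| ULI | ID# |."""
    if r not in (0, 1) or o not in (0, 1):
        raise ValueError("R and O are single bits")
    if not (0 <= uli <= 7 and 0 <= comp_id <= 7):
        raise ValueError("ULI and ID# are 3-bit fields")
    return r << 7 | o << 6 | uli << 3 | comp_id

def unpack_flags(octet: int):
    """Return (R, O, ULI, ID#) from the flag octet."""
    return octet >> 7, (octet >> 6) & 1, (octet >> 3) & 7, octet & 7
```
]]>
</artwork>
</figure>
<t>
    For example, 3000 octets of uncompressed data would be announced
    with ULI 1, i.e. a 4096-octet buffer, under the assumptions above.
</t>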

		</section>
        
</section>
        
        
        
        <section title="Security Considerations">
            <t>This document introduces no new security concerns to BGP or other
                specifications referenced in this document.</t>
            </section>
        
        <section title="Acknowledgements">
            <t>Thanks to John Scudder for some bar discussions that primed
                the creative process. Thanks to Eric Rosen, Jeff Haas and
                Acee Lindem for their careful reviews. Thanks to David Lamperter
            for discussions on reordering issues.</t>
            </section>
        
	</middle>
	<back>
		<references title="Normative References">
            <?rfc include="reference.RFC.1950"?>
            <?rfc include="reference.RFC.1951"?>
			<?rfc include="reference.RFC.2119"?>
            <?rfc include="reference.RFC.2283"?>
            <?rfc include="reference.RFC.4360"?>
            <?rfc include="reference.RFC.4724"?>
            <?rfc include="reference.RFC.5492"?>
            <?rfc include="reference.RFC.7313"?>
            <?rfc include="reference.RFC.7911"?>
            
            <reference anchor="ID.draft-ietf-idr-bgp-extended-messages-21">
                <front>
                    <title>Extended Message support for BGP</title>
                    <author initials="R." surname="Bush et al.">
                        <organization/>
                    </author>
                    <date month="May" year="2016"/>
                </front>
                <seriesInfo name="internet-draft" value="draft-ietf-idr-bgp-extended-messages-21.txt"/>
                <format target="http://tools.ietf.org/id/draft-ietf-idr-bgp-extended-messages-21.txt" type="TXT"/>
            </reference>

<reference anchor="ID.draft-idr-bgp-route-refresh-options-03">
    <front>
        <title>Extension to BGP's Route Refresh Message</title>
        <author initials="K." surname="Patel et al.">
            <organization/>
        </author>
        <date month="May" year="2017"/>
    </front>
    <seriesInfo name="internet-draft" value="draft-idr-bgp-route-refresh-options-03.txt"/>
    <format target="http://tools.ietf.org/id/draft-idr-bgp-route-refresh-options-03.txt" type="TXT"/>
</reference>


            
            <reference anchor="QUANT">
                <front>
                    <title>Worldwide Quantum Web May Be Possible with Help from Graphs</title>
                    <author initials="L." surname="Zyga">
                        <organization>New Journal on Physics</organization>
                        </author>
                    <date month="June" year="2016"/>
                </front>
                <seriesInfo name="New Journal on Physics" value=""/>
                    <format target="http://phys.org/news/2016-06-worldwide-quantum-web-graphs.html" type="TXT"/>
                
			</reference>
            
		</references>
	</back>
</rfc>
