PCN                                                              T. Tsou
Internet-Draft                                                 T. Taylor
Expires: May 20, 2008                                             Huawei
                                                       November 17, 2007

                      PCN Boundary Node Behaviour
                  draft-tsou-pcn-boundary-behav-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 20, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Tsou & Taylor             Expires May 20, 2008                  [Page 1]
Internet-Draft        PCN Boundary Node Behaviour          November 2007

Abstract

The Pre-Congestion Notification Architecture document defines a PCN domain and the PCN-ingress and PCN-egress nodes that form its boundary. The present document is an attempt to describe the detailed behaviour of the PCN boundary nodes. It is a contribution toward the PCN WG milestone: "Suggested Flow Admission and Termination Boundary Mechanisms". This first version is expected to evolve with discussion and further thought toward a more precise and prescriptive view of boundary node behaviour.

Table of Contents

1. Introduction
2. Terminology
3. Overview of Boundary Node Functions
4. Details of Boundary Node Functions
   4.1. PCN-ingress-node Functions
   4.2. PCN-egress-node Functions
   4.3. Peer Address Determination and Flow Aggregation
5. Communication Behavior between Boundary Nodes
   5.1. PCN-ingress-node to PCN-egress-node
   5.2. PCN-egress-node to PCN-ingress-node
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
   9.1. Normative References
   9.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1. Introduction

The Pre-Congestion Notification Architecture document [I-D.PCNarch] defines a PCN domain and the PCN-ingress and PCN-egress nodes that form its boundary. The present document is an attempt to describe the detailed behaviour of the PCN boundary nodes. As part of this effort, it deals with the following issues:

   o  mapping from flows to aggregates;
   o  discovery of peer addresses;
   o  processing of observations at the PCN-egress-node;
   o  transmission policy for congestion level estimates (CLE).

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The formal definitions of "PCN-domain", "PCN-boundary-node", "PCN-interior-node", "PCN-ingress-node" and "PCN-egress-node" are given in section 2 of [I-D.PCNarch]. These terms are used here, generally without the hyphens, to have the same meaning.

This memo uses the following abbreviations:

   CE    Congestion Experienced (ECN marking)
   CLE   Congestion level estimates
   DSCP  Differentiated Services Codepoint
   ECMP  Equal cost multi-path
   ECN   Explicit congestion notification
   PCN   Pre-congestion notification

3. Overview of Boundary Node Functions

A PCN domain is a self-controlled group of nodes, some of which, the PCN boundary nodes, connect the PCN domain to other network domains. PCN boundary nodes play an extremely important role in the implementation of the PCN mechanism; they are in charge of flow admission and flow termination so as to protect the existing payload within the PCN domain. PCN boundary nodes fulfill two functional roles, those of PCN-ingress-node and of PCN-egress-node.

A given flow enters the PCN domain through the PCN-ingress-node and leaves it through the PCN-egress-node. In physical terms, the flow passes through an ingress-egress pair. This natural pairing of the PCN boundary nodes through which a given flow passes is bi-directional: the PCN boundary node that serves as the PCN-ingress-node for one flow also serves as the PCN-egress-node for a flow in the opposite direction and vice versa.

As [I-D.PCNarch] specifies, a PCN domain is a Diffserv domain. There are different priority traffic classes within the PCN domain. When a flow is presented to the PCN domain, the PCN-ingress-node should figure out whether it is a PCN flow or not. If it is, the PCN-ingress-node marks the flow packets accordingly. When these packets leave the domain, the PCN-egress-node should remove the PCN markings so they do not confuse a subsequent domain.

The PCN-ingress-node necessarily supports flow recognition, since it could not otherwise enforce flow admission and termination (policy action).

The PCN-egress-node measures the aggregate flow for each PCN-ingress/PCN-egress pair for which it is the PCN-egress-node. In some circumstances, the PCN-egress-node should also support flow recognition. At present, we do not require this, for reasons of scalability and simplicity. If the PCN-egress-node did measure each individual flow, it would add too much cost to the cached flow table.

As part of its measurements for a given ingress-egress aggregate, the PCN-egress-node obtains information relating to congestion level estimates (CLE). The PCN-egress-node sends the CLE in a message to the PCN-ingress-node or to a node acting as centralized collection point. The strategy for when the CLE messages are sent is discussed in Section 5.2. The PCN-ingress-node or the centralized collector node acting on its behalf makes flow admission or termination judgements based on these CLE messages.

When the aggregate flow from some PCN-ingress-node carries no traffic, or too little traffic, congestion either cannot be measured at all or can be measured only inaccurately. As a result, either no CLE message can be sent, or the CLE message that is sent is too inaccurate to support the right decision. In this case the PCN-ingress-node needs to send a probe message to the PCN-egress-node to gain more information. According to the design requirement, ECMP within a PCN domain can also best be resolved by a probe message. For the sake of efficiency, the probing operation requires careful design to ensure that it does not significantly affect the existing load within the domain. Current discussion has concluded that probing is a topic that should be considered after the basic mechanisms have been defined.
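
The egress-side measurement outlined above can be illustrated with a minimal sketch. It assumes EWMA smoothing applied separately to the numerator and denominator of the congestion ratio, which Section 4.2 mentions only as one example. The class name, the byte-based counters, the weight, and the minimum-traffic cutoff are all hypothetical values chosen for illustration; none of them is defined by this memo or by [I-D.PCNarch].

```python
from typing import Optional


class CleEstimator:
    """Hypothetical per-aggregate CLE estimator for a PCN-egress-node."""

    def __init__(self, weight: float = 0.25, min_total: float = 1000.0) -> None:
        self.weight = weight        # EWMA weight given to the newest observation (assumed)
        self.min_total = min_total  # below this smoothed volume, no estimate is reported (assumed)
        self.marked = 0.0           # smoothed PCN-marked bytes per measurement period
        self.total = 0.0            # smoothed total PCN bytes per measurement period

    def observe(self, marked_bytes: float, total_bytes: float) -> None:
        """Fold one measurement period into the smoothed numerator and denominator."""
        w = self.weight
        self.marked = (1.0 - w) * self.marked + w * marked_bytes
        self.total = (1.0 - w) * self.total + w * total_bytes

    def cle(self) -> Optional[float]:
        """Return the smoothed marked/total ratio, or None when traffic is too low."""
        if self.total < self.min_total:
            return None
        return self.marked / self.total


estimator = CleEstimator(weight=0.5, min_total=100.0)
estimator.observe(marked_bytes=0.0, total_bytes=1000.0)
estimator.observe(marked_bytes=500.0, total_bytes=1000.0)
print(estimator.cle())
```

A result of None corresponds to the low-traffic case described above, in which the ingress node would have to fall back on some other source of information such as a probe.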

The following sections deal with three major topics. Section 4 is a detailed functional definition of the PCN-ingress-node and PCN-egress-node. This part will draw upon the existing content of [I-D.PCNarch], but will identify specific issues that have to be addressed. Section 4.3 deals with the specific issues of flow-to-aggregate mapping and peer address determination. Finally, Section 5 considers the control and communication messaging that must occur between the PCN-ingress-node and PCN-egress-node.

4. Details of Boundary Node Functions

4.1. PCN-ingress-node Functions

The PCN-ingress-node enforces flow admission and flow termination decisions on flows offered to the PCN domain. Quoting from section 5.2 of [I-D.PCNarch], its functions are:

   o  Packet classify
   o  Police
   o  PCN-color
   o  PCN-meter

These basic actions imply some additional requirements on the PCN-ingress-node:

a. The PCN-ingress-node should know the address of the PCN-egress-node for each new flow request that arrives. This is the key that allows the PCN-ingress-node to select the applicable CLE information from the messages it has received. See Section 4.3 for a discussion of how the PCN-ingress-node and PCN-egress-node determine each other's address and associate individual flows to the aggregate flow between them.

b. When the admission decision function is implemented in the PCN-ingress-node, that node should know the congestion level to each PCN-egress-node to which it has admitted flows. This is needed to perform flow admission and flow termination.

   Generally, the PCN-ingress-node should possess the latest congestion level information. The information is saved and refreshed periodically as new CLE messages are received.
   If too long a period elapses without receipt of congestion level information from a given PCN-egress-node, the PCN-ingress-node may have to take an active part in gathering the information, by polling or by sending a probing message.

   It would seem undesirable to add to network load by polling or probing unless there is a decision to be made. Admission decisions (and consequent probes) are covered by the preceding bullet. The one case that may require thought is where CLE messages fail to get through because of congestion in the egress-to-ingress direction, and this is matched by congestion in the ingress-to-egress direction that would call for flow termination.

   Although it is out of scope of the current charter, the admission decision function may be implemented in a centralized control node. CLE maintenance and refreshment will then be the responsibility of this centralized node. In that case, the PCN-ingress-node will act as a policy enforcement point only, admitting or rejecting flows in accordance with the policy provided by the centralized control node.

c. Currently, the PCN architectural requirements do not include support of ECN. [I-D.PCNarch] spells out the present assumptions about the interaction between ECN and PCN.

d. When the PCN-ingress-node terminates a flow, it should send a signaling message to notify the flow source about the congestion condition and reason for termination.

4.2. PCN-egress-node Functions

The PCN-egress-node is in charge of aggregate flow measurement and emission of CLE messages. The basic functions of the PCN-egress-node are listed in [I-D.PCNarch] as packet classify, PCN-meter, and PCN-color; this section adds more details.

a. Packet classify - determine which PCN-ingress-node a PCN-packet has come from. This is a requirement of measurement.
   After packet classification, all packets arriving at the PCN-egress-node are grouped into their respective ingress-egress aggregate flows. In the case of tunnelled packets, the PCN-egress-node differentiates ingress nodes according to the ingress node address in the tunnel encapsulation header. Otherwise the PCN-egress-node must use the source address and possibly other information within the packet header.

   In this latter case, the PCN-egress-node must conceptually keep a boundary node address table, in which it saves the mapping between PCN-ingress-node address and flow source/destination prefix. By searching on the flow source/destination address, the PCN-egress-node can get the PCN-ingress-node address. For a discussion on how this table can be set up, see Section 4.3.

b. PCN-meter - make "measurements of PCN-traffic". The measurements are made on the aggregate flow of all PCN-packets from a particular PCN-ingress-node. Smoothing is done over time, using for example the EWMA (Exponentially Weighted Moving Average) method applied separately to numerator and denominator of the congestion ratio.

   The measurement period for individual observations requires careful calculation. Shorter measurement periods increase the amount of computation required at the PCN-egress-node while increasing the volatility of the results. Too long a measurement period reduces the responsiveness of the system to signs of approaching congestion. Smoothing has a similar effect to lengthening the measurement period, but gives more weight to more recent measurements.

   Instead of smoothing, one might consider looking at the process as one of statistical estimation of a marking probability that is step-wise time-varying. One assesses each new observation to decide whether it represents a continuation of the previous regime or is the result of a new value of the estimated probability.
   The decision would use a standard deviation based on the assumption of a binomial probability distribution.

   Once the PCN-marking rate calculations have been carried out, the PCN-egress-node must send a CLE message back to the PCN-ingress-node providing the results. Again there is a requirement to know the peer address. See Section 4.3.

c. PCN-color - for PCN-packets, set the DSCP field or DSCP and ECN fields to the appropriate value(s) for use outside the PCN-domain.

4.3. Peer Address Determination and Flow Aggregation

Both at ingress and at egress the boundary nodes are faced with the problem of classifying flows by aggregate. As mentioned in the previous section, this is conceptually equivalent to having a table in each node, mapping source/destination prefix pair to the identity of the peer node. Using tunnels between ingress and egress requires the equivalent information, since otherwise the ingress node does not know the tunnel to which a given flow should be directed. The PCN-egress-node must in addition have a mapping from the PCN-ingress-node identity to its address. This mapping is trivial if the node address is used to represent its identity.

The table just described is equivalent to the information required to establish full-mesh direct routing between the boundary nodes. It seems unfortunate if it is really necessary to bypass the information-hiding benefits of routing through interior nodes. Let us consider the possibilities for acquiring the necessary mappings, to see if we can do better.

   o  The mappings could be installed in each boundary node by configuration. This raises obvious concerns for scalability and responsiveness to changes in the external prefixes served by each boundary node.

   o  The mappings could be acquired by an automatic peer-to-peer discovery procedure tied to the exchange of routing data within the PCN-domain.

   o  The mapping for each flow could be determined as it is offered to the PCN-ingress-node, through use of an RSVP PATH message or NSIS equivalent.

The first two methods have the disadvantage that they require the persistent storage of a full mapping table at each boundary node. Their advantage is that they would require less messaging and associated resource consumption than the third approach. Moreover, when a new flow is offered to the PCN-ingress-node, it is in a position to make an immediate decision to admit or not, rather than having to wait a round-trip for a response from the PCN-egress-node.

The third method, per-flow mapping query, works best if flows tend to be focussed between specific ingress-egress pairs rather than spread uniformly around the network. The per-flow burden will be reduced to the extent that flow mappings can be cached and reused at the ingress and egress nodes. Since in general new flows will not be between the same source and destination addresses as existing ones, reusability of the mapping data requires that the information exchanged between the ingress and egress nodes be in the form of the prefixes routed through the respective nodes rather than the specific addresses involved in the flow that triggered the information exchange.

While the use of one or more centralized collector nodes is out of scope of the current PCN charter, one can visualize a system wherein such nodes acquire the list of served prefixes from each boundary node and provide aggregate identifiers in response to per-flow queries from ingress and egress nodes. The collector nodes receive the CLE messages from the PCN-egress-nodes, with metered results presented for each aggregate identifier active at the egress node. They forward the CLE results to the PCN-ingress-nodes after mapping from aggregate identifier to PCN-ingress-node address. It would be desirable not to exclude this model of operation when creating the basic PCN design.
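
The conceptual table described in this section, mapping a source/destination prefix pair to the identity of the peer node, can be sketched as follows. This is an illustration only, not a proposed design: the class name, the peer identifiers, and the tie-breaking rule (prefer the entry with the greatest combined prefix length) are assumptions introduced here. How the entries are populated (configuration, discovery, or per-flow signaling) is left open, as in the text.

```python
import ipaddress
from typing import List, Optional, Tuple

# One table entry: (source network, destination network, peer node identity)
Entry = Tuple[ipaddress.IPv4Network, ipaddress.IPv4Network, str]


class AggregateTable:
    """Hypothetical prefix-pair table kept by a PCN boundary node."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, src_prefix: str, dst_prefix: str, peer: str) -> None:
        """Record that flows between the two prefixes belong to the aggregate with `peer`."""
        self._entries.append(
            (ipaddress.ip_network(src_prefix), ipaddress.ip_network(dst_prefix), peer)
        )

    def lookup(self, src_addr: str, dst_addr: str) -> Optional[str]:
        """Return the peer for a flow, or None if no prefix pair covers it."""
        src = ipaddress.ip_address(src_addr)
        dst = ipaddress.ip_address(dst_addr)
        best_peer: Optional[str] = None
        best_len = -1
        for src_net, dst_net, peer in self._entries:
            if src in src_net and dst in dst_net:
                # Assumed tie-break: the most specific prefix pair wins.
                length = src_net.prefixlen + dst_net.prefixlen
                if length > best_len:
                    best_peer, best_len = peer, length
        return best_peer


table = AggregateTable()
table.add("10.1.0.0/16", "10.2.0.0/16", "egress-A")
table.add("10.1.0.0/16", "10.2.3.0/24", "egress-B")
print(table.lookup("10.1.5.5", "10.2.3.4"))
```

Because entries are keyed by prefix rather than by individual address, a mapping learned for one flow can be reused for later flows between the same prefixes, which is the property the per-flow query method depends on.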

5. Communication Behavior between Boundary Nodes

5.1. PCN-ingress-node to PCN-egress-node

The discussion of the previous section suggests that, except when configuration is used to provide the flow-to-aggregate and aggregate-to-peer address mappings, the PCN-ingress-node will have to send some sort of message to the PCN-egress-node to establish these mappings, for a specific flow or all flows that could occur between them. We have already remarked on the possible use of the RSVP PATH message (modified to carry source prefix information as well as the specific source and destination addresses) in the ingress-to-egress direction on a per-flow basis. In the reverse direction, the RESV (which may contain a CLE message) would carry the destination prefix served by the PCN-egress-node. Whether the prefix information in each direction is merely the range within which the specific source or destination address of the flow lies or the complete set of prefixes served by the respective node is for further discussion.

In addition to the creation of mappings, the PCN-ingress-node may also send probe messages to the PCN-egress-node. Current list discussion seems to lean toward putting off any consideration of probing in our initial work, but we may come back to it in the future. Probe messages are useful when there is no traffic between ingress and egress or too little traffic for the PCN-egress-node to measure accurately. Probing would be initiated at the PCN-ingress-node if after mapping an offered flow to an aggregate it found stale or no CLE information for that aggregate. The design of the probing operation should consider the appropriate action if there is excessive delay in receiving the probe response.

5.2. PCN-egress-node to PCN-ingress-node

The messaging that the PCN-egress-node may have to do as part of the flow-to-aggregate mapping procedure has already been discussed. The previous section also suggested that the PCN-egress-node will have to respond to probe messages. The nature of that response depends on the particular marking behaviour and algorithms used in the network.

Aside from helping to generate mappings and responding to probes, the PCN-egress-node must report the results of its measurements. As indicated already, this is done by sending a CLE message to each PCN-ingress-node (or to a central collector node on its behalf). The content of these messages provides the basis for the PCN-ingress-node or some other policy entity to make flow admission or flow termination decisions.

One question that must be addressed is: when should the CLE message be sent? There are three basic possibilities:

   o  The CLE message is sent in response to polling by the PCN-ingress-node. An interesting variant of this is that the trigger for sending the CLE is receipt of an RSVP PATH message or NSIS equivalent, sent to acquire the mapping between a specific flow and an aggregate. The CLE information would thus be made available precisely when it is needed to make the admission decision. This fails to take care of the requirements for flow termination, however, so something more would be needed.

   o  The second possibility is that the CLE message is sent autonomously whenever the congestion level estimate crosses pre-configured lower and upper thresholds. Because of the lack of information redundancy in the CLE messages transmitted compared to the other methods, this approach requires that the CLE message be delivered reliably.
      This method could be used to supplement the polling approach described in the previous bullet, with the CLE messages being sent autonomously only for transitions across the upper threshold.

   o  The final possibility is that the CLE message is sent periodically at a fixed interval. This method is more robust than the others when CLE messages go missing, since the PCN-ingress-node has past data which it can extrapolate until the next measurement arrives.

6. Security Considerations

PCN-ingress-nodes and PCN-egress-nodes must deal with DDoS and similar attacks. When a DDoS attack arrives at a PCN-ingress-node, that node should recognize the attack traffic and take action to protect both the existing payload and itself from failure.

7. IANA Considerations

This memo presents no IANA considerations.

8. Acknowledgements

Thanks to Gabriele Corliano for ideas contributed in preliminary discussion, and to Philip Eardley for an excellent review of an earlier version of this memo.

9. References

9.1. Normative References

[I-D.PCNarch] Eardley, P., "Pre-Congestion Notification Architecture", October 2007.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2. Informative References

[I-D.encodComp] Chan, K. and G. Karagiannis, "Pre-Congestion Notification Encoding Comparison", July 2007. draft-chan-pcn-encoding-comparison-00.txt (Work in progress.)

[I-D.tsvwgCLarch] Briscoe, B., "Pre-Congestion Notification Encoding Comparison", October 2006. draft-briscoe-tsvwg-cl-architecture-04.txt (Expired work in progress.)

[RFC1633] Braden, B., Clark, D., and S. Shenker, "Integrated Services in the Internet Architecture: an Overview", RFC 1633, June 1994.

Authors' Addresses

   Tina Tsou
   Huawei Technologies
   F3-5-089S, R&D Center, Longgang District
   Shenzhen 518129
   China
   Email: tena@huawei.com

   Tom Taylor
   Huawei Technologies
   1852 Lorraine Ave
   Ottawa, Ontario K1H 6Z8
   Canada
   Phone: +1 613 680 2675
   Email: tom.taylor@rogers.com

Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).