idnits 2.17.1 draft-ietf-forces-sctptml-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** The document seems to lack a License Notice according IETF Trust Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009 Section 6.b -- however, there's a paragraph with a matching beginning. Boilerplate error? (You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Feb 2009 rather than one of the newer Notices. See https://trustee.ietf.org/license-info/.) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([RFC3654], [RFC4960], [FE-PROTO]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 691: '... The FE SHOULD do channel connection...' RFC 2119 keyword, line 693: '...ion. The CE, however, MUST NOT assume...' RFC 2119 keyword, line 745: '... ForCES FE or CE MUST support the foll...' RFC 2119 keyword, line 862: '...t is recommended that the FE SHOULD do...' RFC 2119 keyword, line 865: '...The CE, however, MUST NOT assume that ...' (1 more instance...) 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 1, 2009) is 5413 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 4960 (Obsoleted by RFC 9260) -- Obsolete informational reference (is this intentional?): RFC 3768 (Obsoleted by RFC 5798) Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group J. Hadi Salim 3 Internet-Draft Mojatatu Networks 4 Expires: January 2, 2010 K. Ogawa 5 NTT Corporation 6 July 1, 2009 8 SCTP based TML (Transport Mapping Layer) for ForCES protocol 9 draft-ietf-forces-sctptml-04 11 Status of this Memo 13 This Internet-Draft is submitted to IETF in full conformance with the 14 provisions of BCP 78 and BCP 79. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 
21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet-Drafts as reference 24 material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt. 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 This Internet-Draft will expire on January 2, 2010. 34 Copyright Notice 36 Copyright (c) 2009 IETF Trust and the persons identified as the 37 document authors. All rights reserved. 39 This document is subject to BCP 78 and the IETF Trust's Legal 40 Provisions Relating to IETF Documents in effect on the date of 41 publication of this document (http://trustee.ietf.org/license-info). 42 Please review these documents carefully, as they describe your rights 43 and restrictions with respect to this document. 45 Abstract 47 This document defines the SCTP based TML (Transport Mapping Layer) 48 for the ForCES protocol. It explains the rationale for choosing the 49 SCTP (Stream Control Transmission Protocol) [RFC4960] and also 50 describes how this TML addresses all the requirements described in 51 [RFC3654] and the ForCES protocol [FE-PROTO] draft. 53 Table of Contents 55 1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 56 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 57 3. Protocol Framework Overview . . . . . . . . . . . . . . . . . 3 58 3.1. The PL . . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 3.2. The TML . . . . . . . . . . . . . . . . . . . . . . . . . 5 60 3.2.1. TML and PL Interfaces . . . . . . . . . . . . . . . . 5 61 3.2.2. TML Parameterization . . . . . . . . . . . . . . . . . 6 62 4. SCTP TML overview . . . . . . . . . . . . . . . . . . . . . . 6 63 4.1. Rationale for using SCTP for TML . . . . . . . . . . . . . 8 64 4.2. 
Meeting TML requirements . . . . . . . . . . . . . . . . . 8 65 4.2.1. SCTP TML Channels . . . . . . . . . . . . . . . . . . 9 66 4.2.2. Satisfying TML Requirements . . . . . . . . . . . . . 14 67 5. SCTP TML Channel Work . . . . . . . . . . . . . . . . . . . . 16 68 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 69 7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 70 7.1. IPsec Usage . . . . . . . . . . . . . . . . . . . . . . . 17 71 7.1.1. SAD and SPD setup . . . . . . . . . . . . . . . . . . 18 72 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 73 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 18 74 9.1. Normative References . . . . . . . . . . . . . . . . . . . 18 75 9.2. Informative References . . . . . . . . . . . . . . . . . . 19 76 Appendix A. SCTP TML Channel Work Implementation . . . . . . . . 19 77 A.1. SCTP TML Channel Initialization . . . . . . . . . . . . . 20 78 A.2. Channel work scheduling . . . . . . . . . . . . . . . . . 20 79 A.2.1. FE Channel work scheduling . . . . . . . . . . . . . . 20 80 A.2.2. CE Channel work scheduling . . . . . . . . . . . . . . 21 81 A.3. SCTP TML Channel Termination . . . . . . . . . . . . . . . 21 82 A.4. SCTP TML NE level channel scheduling . . . . . . . . . . . 22 83 Appendix B. Service Interface . . . . . . . . . . . . . . . . . . 22 84 B.1. TML Boot-strapping . . . . . . . . . . . . . . . . . . . . 23 85 B.2. TML Shutdown . . . . . . . . . . . . . . . . . . . . . . . 24 86 B.3. TML Sending and Receiving . . . . . . . . . . . . . . . . 25 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26 89 1. Definitions 91 The following definitions are taken from [RFC3654]and [RFC3746]: 93 ForCES Protocol -- The protocol used at the Fp reference point in the 94 ForCES Framework in [RFC3746]. 
96 ForCES Protocol Layer (ForCES PL) -- A layer in ForCES protocol 97 architecture that defines the ForCES protocol architecture and the 98 state transfer mechanisms as defined in [FE-PROTO]. 100 ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in 101 ForCES protocol architecture that specifically addresses the protocol 102 message transportation issues, such as how the protocol messages are 103 mapped to different transport media (like SCTP, IP, ATM, Ethernet, 104 etc), and how to achieve and implement reliability, security, etc. 106 2. Introduction 108 The ForCES (Forwarding and Control Element Separation) working group 109 in the IETF defines the architecture and protocol for separation of 110 Control Elements(CE) and Forwarding Elements(FE) in Network 111 Elements(NE) such as routers. [RFC3654] and [RFC3746] respectively 112 define architectural and protocol requirements for the communication 113 between CE and FE. The ForCES protocol layer specification 114 [FE-PROTO] describes the protocol semantics and workings. The ForCES 115 protocol layer operates on top of an inter-connect hiding layer known 116 as the TML. The relationship is illustrated in Figure 1. 118 This document defines the SCTP based TML for the ForCES protocol 119 layer. It also addresses all the requirements for the TML including 120 security, reliability, etc as defined in [FE-PROTO]. 122 3. Protocol Framework Overview 124 The reader is referred to the Framework document [RFC3746], and in 125 particular sections 3 and 4, for an architectural overview and 126 explanation of where and how the ForCES protocol fits in. 128 There is some content overlap between the ForCES protocol draft 129 [FE-PROTO] and this section (Section 3) in order to provide basic 130 context to the reader of this document. 132 The ForCES protocol layering constitutes two pieces: the PL and TML 133 layer. This is depicted in Figure 1. 
135 +----------------------------------------------+ 136 | CE PL | 137 +----------------------------------------------+ 138 | CE TML | 139 +----------------------------------------------+ 140 ^ 141 | 142 ForCES PL | messages 143 | 144 v 145 +-----------------------------------------------+ 146 | FE TML | 147 +-----------------------------------------------+ 148 | FE PL | 149 +-----------------------------------------------+ 151 Figure 1: Message exchange between CE and FE to establish an NE 152 association 154 The PL is in charge of the ForCES protocol. Its semantics and 155 message layout are defined in [FE-PROTO]. The TML is necessary to 156 connect two ForCES end-points as shown in Figure 1. 158 Both the PL and TML are standardized by the IETF. While only one PL 159 is defined, different TMLs are expected to be standardized. The TML 160 at each of the nodes (CE and FE) is expected to be of the same 161 definition in order to inter-operate. 163 When transmitting from a ForCES end-point, the PL delivers its 164 messages to the TML. The TML then delivers the PL message to the 165 destination TML(s). 167 On reception of a message, the TML delivers the message to its 168 destination PL level (as described in the ForCES header). 170 3.1. The PL 172 The PL is common to all implementations of ForCES and is standardized 173 by the IETF [FE-PROTO]. The PL level is responsible for associating 174 an FE or CE to an NE. It is also responsible for tearing down such 175 associations. 177 An FE may use the PL level to asynchronously send packets to the CE. 178 The FE may redirect via the PL (from outside the NE) various control 179 protocol packets (e.g. OSPF, etc) to the CE. Additionally, the FE 180 delivers various events that CE has subscribed-to via PL [FE-MODEL]. 182 The CE and FE may interact synchronously via the PL. The CE issues 183 status requests to the FE and receives responses via the PL. 
The CE 184 also configures the associated FE's LFBs' components using the PL 185 [FE-MODEL]. 187 3.2. The TML 189 The TML level is responsible for transport of the PL level messages. 190 [FE-PROTO] section 5 defines the requirements that need to be met by 191 a TML specification. The SCTP TML specified in this document meets 192 all the requirements specified in [FE-PROTO] section 5. 193 Section 4.2.2 describes how the TML requirements are met. 195 3.2.1. TML and PL Interfaces 197 There are two interfaces to the PL and TML, both of which are out of 198 scope for ForCES. The first one is the interface between the PL and 199 TML and the other is the CE Manager (CEM)/FE Manager (FEM)[RFC3746] 200 interface to both the PL and TML. Both interfaces are shown in 201 Figure 2. 203 +----------------------------+ 204 | +----------------------+ | 205 | | | | 206 +---------+ | | PL Layer | | 207 | | | +----------------------+ | 208 |FEM/CEM |<---->| ^ | 209 | | | | | 210 +---------+ | |TML API | 211 | | | 212 | V | 213 | +----------------------+ | 214 | | | | 215 | | TML Layer | | 216 | | | | 217 | +----------------------+ | 218 +----------------------------+ 220 Figure 2: The TML-PL interface 222 Figure 2 also shows an interface referred to as CEM/FEM[RFC3746] 223 which is responsible for bootstrapping and parameterization of the 224 TML. In its most basic form the CEM/FEM interface takes the form of 225 a simple static config file which is read on startup in the pre- 226 association phase. 228 Appendix B discusses in more details the service interfaces. 230 3.2.2. TML Parameterization 232 It is expected that it should be possible to use a configuration 233 reference point, such as the FEM or the CEM, to configure the TML. 235 Some of the configured parameters may include: 237 o PL ID 239 o Connection Type and associated data. For example if a TML uses 240 IP/SCTP then parameters such as SCTP ports and IP addresses need 241 to be configured. 
243 o Number of transport connections 245 o Connection Capability, such as bandwidth, etc. 247 o Allowed/Supported Connection QoS policy (or Congestion Control 248 Policy) 250 4. SCTP TML overview 252 SCTP [RFC4960] is an end-to-end transport protocol that is equivalent 253 to TCP, UDP, or DCCP in many aspects. With a few exceptions, SCTP 254 can do most of what UDP, TCP, or DCCP can achieve. SCTP can also 255 do most of what a combination of the other transport protocols can 256 achieve (e.g. TCP and DCCP or TCP and UDP). 258 Like TCP, it provides ordered, reliable, connection-oriented, flow- 259 controlled, congestion controlled data exchange. Unlike TCP, it does 260 not provide byte streaming and instead provides message boundaries. 262 Like UDP, it can provide unreliable, unordered data exchange. Unlike 263 UDP, it does not provide multicast support. 265 Like DCCP, it can provide unreliable, ordered, congestion controlled, 266 connection-oriented data exchange. 268 SCTP also provides other services that none of the 3 transport 269 protocols mentioned above provide. These include: 271 o Multi-homing 272 An SCTP connection can make use of multiple destination IP 273 addresses to communicate with its peer. 275 o Runtime IP address binding 276 With the SCTP Dynamic Address Reconfiguration ([RFC5061]) feature, 277 a new IP address can be bound at runtime. This allows for 278 migration of endpoints without restarting the association 279 (valuable for high availability). 281 o A range of reliability shades with congestion control 282 SCTP offers a range of services from full reliability to none, and 283 from full ordering to none. With SCTP, on a per message basis, 284 the application can specify a message's time-to-live. When the 285 expressed time expires, the message can be "skipped". 287 o Built-in heartbeats 288 SCTP has a built-in heartbeat mechanism that validates the 289 reachability of peer addresses. 
291 o Multi-streaming 292 A known problem with TCP is head of line (HOL) blocking. If you 293 have independent messages, TCP enforces ordering of such messages. 294 Loss of a message at the head of the queue delays delivery of 295 subsequent messages. SCTP allows for defining up to 64K 296 independent streams over the same socket connection, which are 297 ordered independently. 299 o Message boundaries with reliability 300 SCTP allows for easier message parsing (just like UDP but with 301 reliability built in) because it establishes boundaries on a PL 302 message basis. On a TCP stream, one would have to use techniques 303 such as peeking into the message to figure out the boundaries. 305 o Improved SYN DOS protection 306 Unlike TCP, which does a 3 way connection setup handshake, SCTP 307 does a 4 way handshake. This improves protection against SYN-flood attacks 308 because listening sockets do not set up state until a connection 309 is validated. 311 o Simpler transport events 312 An application (such as the TML) can subscribe to be notified of 313 both local and remote transport events. Events that can be 314 subscribed to include indication of association changes, 315 addressing changes, remote errors, expiry of timed messages, etc. 316 These events are off by default and require explicit subscription. 318 o Simplified replicasting 319 Although SCTP does not allow for multicasting, it allows for a 320 single message from an application to be sent to multiple peers. 321 This reduces the messaging that typically crosses different memory 322 domains within a host (for example, between kernel and user space in 323 an operating system). 325 4.1. Rationale for using SCTP for TML 327 SCTP has all the features required to provide a robust TML. As a 328 transport that is all-encompassing, it negates the need for having 329 multiple transport protocols in order to satisfy the TML requirements 330 ([FE-PROTO] section 5). 
As a result, it allows for simpler coding and 331 therefore reduces a lot of the interoperability concerns. 333 SCTP is also very mature and widely used, making it a good choice for 334 ubiquitous deployment. 336 4.2. Meeting TML requirements 338 PL 339 +----------------------+ 340 | | 341 +-----------+----------+ 342 | TML API 343 TML | 344 +-----------+----------+ 345 | | | 346 | +------+------+ | 347 | | TML core | | 348 | +-+----+----+-+ | 349 | | | | | 350 | SCTP socket API | 351 | | | | | 352 | | | | | 353 | +-+----+----+-+ | 354 | | SCTP | | 355 | +------+------+ | 356 | | | 357 | | | 358 | +------+------+ | 359 | | IP | | 360 | +-------------+ | 361 +----------------------+ 363 Figure 3: The TML-SCTP interface 365 Figure 3 details the interfacing between the PL and SCTP TML and the 366 internals of the SCTP TML. The core of the TML interacts on its 367 north-bound interface with the PL (utilizing the TML API). On the 368 south-bound interface, the TML core interfaces to the SCTP layer 369 utilizing the standard socket interface [SCTP-API]. There are three 370 SCTP socket connections opened between any two PL endpoints (whether 371 FE or CE). 373 4.2.1. SCTP TML Channels 375 +--------------------+ 376 | | 377 | TML core | 378 | | 379 +-+-------+--------+-+ 380 | | | 381 | Med prio, | 382 | Semi-reliable | 383 | channel | 384 | | Low prio, 385 | | Unreliable 386 | | channel 387 | | | 388 ^ ^ ^ 389 | | | 390 Y Y Y 391 High prio,| | | 392 reliable | | | 393 channel | | | 394 Y Y Y 395 +-+--------+--------+-+ 396 | | 397 | SCTP | 398 | | 399 +---------------------+ 401 Figure 4: The TML-SCTP channels 403 Figure 4 details further the interfacing between the TML core and 404 SCTP layers. There are 3 channels used to separate and prioritize 405 the different types of ForCES traffic. Each channel constitutes a 406 socket interface. 
It should be noted that all SCTP channels are 407 congestion aware (and for that reason that detail is left out of the 408 description of the 3 channels). SCTP ports 6700, 6701, and 6702 are used 409 for the higher, medium, and lower priority channels, respectively. 411 4.2.1.1. Justifying Choice of 3 Sockets 413 SCTP allows up to 64K streams to be sent over a single socket 414 interface. The authors initially envisioned using a single socket 415 for all three channels (mapping a channel to an SCTP stream). This 416 approach simplifies programming of the TML and conserves SCTP 417 ports. 419 Further analysis revealed head of line blocking issues with this 420 initial approach. Lower priority packets not needing reliable 421 delivery could block higher priority packets (needing reliable 422 delivery) under congestion for an indeterminate period of 423 time (depending on how many outstanding lower priority packets are 424 pending). For this reason, we elected to map each of the 425 three channels to a different SCTP socket (instead of a different 426 stream within a single socket). 428 4.2.1.2. Higher Priority, Reliable channel 430 The higher priority (HP) channel uses a standard SCTP reliable socket 431 on port 6700. It is used for CE solicited messages and their 432 responses: 434 1. ForCES configuration messages flowing from the CE to the FE and responses 435 from the FE to the CE. 437 2. ForCES query messages flowing from the CE to the FE and responses from 438 the FE to the CE. 440 It is recommended that PL priorities 4-7 be used for this channel and 441 that the following PL messages use the HP channel for transport: 443 o Association Setup 445 o Association Setup Response 447 o Association Teardown 449 o Config 451 o Config Response 453 o Query 455 o Query Response 457 4.2.1.3. Medium Priority, Semi-Reliable channel 459 The medium priority (MP) channel uses SCTP-PR on port 6701. 
Time 460 limits on how long a message is valid are set on each outgoing 461 message. This channel is used for events from the FE to the CE that 462 are obsoleted over time. Events that are cumulative in nature and 463 are recoverable by the CE (by issuing a query to the FE) can tolerate 464 loss and therefore should use this channel. For example, a 465 generated event which carries the value of a counter that is 466 monotonically incrementing is a good fit for this channel. 468 It is recommended that PL priorities 2-3 be used for this channel and 469 that the following PL messages use the MP channel for transport: 471 o Event Notification 473 4.2.1.4. Lower Priority, Unreliable channel 475 The lower priority (LP) channel uses SCTP port 6702. This channel 476 also uses SCTP-PR with lower timeout values than the MP channel. The 477 reason an unreliable channel is used for redirect messages is to 478 allow the control protocol at both the CE and its peer-endpoint to 479 take charge of the end-to-end semantics of the said control 480 protocol's operations. For example: 482 1. Some control protocols are reliable in nature; making 483 this channel reliable would introduce an extra layer of reliability 484 which could be harmful. Any end-to-end retransmissions will therefore 485 come from the remote endpoint. 487 2. Some control protocols may prefer obsolescence of 488 messages over retransmissions; making this channel reliable 489 contradicts that preference. 491 Given that ForCES PL level heartbeats are traffic sensitive, sending them 492 over the LP channel also makes sense. If the other end is not 493 processing other channels, it will eventually get heartbeats; and if 494 it is busy processing other channels, heartbeats will be obsoleted 495 locally over time (and it does not matter if they did not make it). 
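The channel assignments described in Sections 4.2.1.2 through 4.2.1.4 can be summarized as a lookup. The following is a minimal Python sketch, not part of this specification: the names Channel, DEFAULT_PORT, MESSAGE_CHANNEL, channel_for_message and channel_for_priority are hypothetical and chosen only for illustration, and it assumes PL priorities run from 0 to 7, with the LP channel taking the priorities below those recommended for the MP channel.

```python
from enum import Enum


class Channel(Enum):
    """The three SCTP TML channels (hypothetical naming)."""
    HP = "high priority, reliable"
    MP = "medium priority, semi-reliable"
    LP = "low priority, unreliable"


# Default SCTP ports given in this document for each channel.
DEFAULT_PORT = {Channel.HP: 6700, Channel.MP: 6701, Channel.LP: 6702}

# PL message names mapped to the channel this document recommends.
MESSAGE_CHANNEL = {
    "Association Setup": Channel.HP,
    "Association Setup Response": Channel.HP,
    "Association Teardown": Channel.HP,
    "Config": Channel.HP,
    "Config Response": Channel.HP,
    "Query": Channel.HP,
    "Query Response": Channel.HP,
    "Event Notification": Channel.MP,
    "Packet Redirect": Channel.LP,
    "Heartbeats": Channel.LP,
}


def channel_for_message(name: str) -> Channel:
    """Return the recommended channel for a PL message name."""
    return MESSAGE_CHANNEL[name]


def channel_for_priority(priority: int) -> Channel:
    """Map a PL priority to a channel: 4-7 HP, 2-3 MP, 0-1 LP.

    Assumption: priorities outside 0..7 are rejected.
    """
    if not 0 <= priority <= 7:
        raise ValueError("PL priority must be in the range 0..7")
    if priority >= 4:
        return Channel.HP
    if priority >= 2:
        return Channel.MP
    return Channel.LP
```

A TML core built along these lines would pick the socket for an outgoing message with DEFAULT_PORT[channel_for_message(name)], or with channel_for_priority() when only the PL priority is known.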
497 It is recommended that PL priorities 0-1 be used for this channel and 498 that the following PL messages use the LP channel for transport: 500 o Packet Redirect 502 o Heartbeats 504 4.2.1.5. Scheduling of The 3 Channels 506 Strict priority work-conserving scheduling is used by the TML Core for both 507 sending and receiving of PL messages, as shown 508 in Figure 5. 510 This means that the HP messages are always processed first until 511 there are no more left. A lower priority channel is processed only 512 when no higher priority channel has messages left 513 to process. This means that under congestion, a higher 514 priority channel with sufficient messages to occupy the available 515 bandwidth would starve lower priority channel(s). 517 The design intent of the SCTP TML is to tie prioritization as 518 described in Section 4.2.1.1 and transport congestion control to 519 provide implicit node congestion control. This is further detailed 520 in Appendix A.2. 522 SCTP channel +----------+ 523 Work available | DONE +---<--<--+ 524 | +---+------+ | 525 Y ^ 526 | +-->--+ +-->---+ | 527 +-->-->-+ | | | | | 528 | | | | | | ^ 529 | ^ ^ Y ^ Y | 530 ^ / \ | | | | | 531 | / \ | ^ | ^ ^ 532 | / Is \ | / \ | / \ | 533 | / there \ | /Is \ | /Is \ | 534 ^ / HP work \ ^ /there\ ^ /there\ ^ 535 | \ ? / | /MP work\ | /LP work\ | 536 | \ / | \ ? / | \ ? / | 537 | \ / | \ / | \ / ^ 538 | \ / ^ \ / ^ \ / | 539 | \ / | \ / | \ / | 540 ^ Y-->-->-->+ Y-->-->-->+ Y->->->-+ 541 | | NO | NO | NO 542 | | | | 543 | Y Y Y 544 | | YES | YES | 545 ^ | | | 546 | Y Y Y 547 | +----+------+ +---|-------+ +----|------+ 548 | |- process | |- process | |- process | 549 | | HP work | | MP work | | LP work | 550 | +------+----+ +-----+-----+ +-----+-----+ 551 | | | | 552 ^ Y Y Y 553 | | | | 554 | Y Y Y 555 +--<--<---+--<--<----<----+-----<---<-----+ 557 Figure 5: SCTP TML Strict Priority Scheduling 559 4.2.1.6. 
SCTP TML Parameterization 561 The following is a list of parameters needed for booting the TML. It 562 is expected that these parameters will be extracted via the FEM/CEM 563 interface for each PL ID. 565 1. The IP address or a resolvable DNS/hostname of the CE/FE. 567 2. Whether to use IPsec or not. If IPsec is used, how to 568 parameterize the different required ciphers, keys, etc., as 569 described in Section 7.1. 571 3. The HP SCTP port, as discussed in Section 4.2.1.2. The default 572 HP port value is 6700 (Section 6). 574 4. The MP SCTP port, as discussed in Section 4.2.1.3. The default 575 MP port value is 6701 (Section 6). 577 5. The LP SCTP port, as discussed in Section 4.2.1.4. The default 578 LP port value is 6702 (Section 6). 580 4.2.2. Satisfying TML Requirements 582 [FE-PROTO] section 5 lists requirements that a TML needs to meet. 583 This section describes how the SCTP TML satisfies those requirements. 585 4.2.2.1. Satisfying Reliability Requirement 587 As mentioned earlier, a range of reliability shades is possible in 588 SCTP. Therefore, this requirement is met. 590 4.2.2.2. Satisfying Congestion Control Requirement 592 Congestion control is built into SCTP. Therefore, this requirement 593 is met. 595 4.2.2.3. Satisfying Timeliness and Prioritization Requirement 597 By using 3 sockets in conjunction with the partial-reliability 598 feature, both timeliness and prioritization can be achieved. 600 4.2.2.4. Satisfying Addressing Requirement 602 There are no extra headers required for SCTP to fulfil this 603 requirement. SCTP can be told to replicast packets to multiple 604 destinations. The TML implementation will need to translate PL level 605 addresses to a variety of unicast IP addresses in order to emulate 606 multicast and broadcast PL addresses. 608 4.2.2.5. Satisfying HA Requirement 610 Transport link resiliency is one of SCTP's strongest points. Failure 611 detection and recovery are built in, as mentioned earlier. 
613 o The SCTP multi-homing feature is used to provide path diversity. 614 Should one of the peer IP addresses become unreachable, the 615 other(s) are used without needing lower layer convergence 616 (routing, for example) or even the TML becoming aware. 618 o SCTP heartbeats and data transmission thresholds are used on a per 619 peer IP address basis to detect reachability faults. The faults could 620 be a result of an unreachable address or peer, which may be caused 621 by a variety of reasons, like interface, network, or endpoint 622 failures. The cause of the fault is noted. 624 o With the ADDIP feature, one can migrate IP addresses to other 625 nodes at runtime. This is not unlike the use of the VRRP [RFC3768] 626 protocol. This feature is used in addition to multi-homing in a 627 planned migration of activity from one FE/CE to another. In such 628 a case, part of the provisioning recipe at the CE for replacing an 629 FE involves migrating activity of one FE to another. 631 4.2.2.6. Satisfying Node Overload Prevention Requirement 633 The architecture of this TML defines three separate channels, one per 634 socket, to be used within any FE-CE setup. The scheduling design for 635 processing the TML channels (Section 4.2.1.5) is strict priority. A 636 fundamental goal of the strict prioritization is to ensure that 637 more important work always gets node resources such as CPU and 638 bandwidth over less important work. 640 When a ForCES node CPU is overwhelmed because the incoming packet 641 rate is higher than it can keep up with, the channel queues grow and 642 transport congestion subsequently follows. By virtue of using SCTP, 643 the congestion is propagated back to the source of the incoming 644 packets and eventually alleviated. 646 The HP channel work gets prioritized at the expense of the MP channel, which 647 in turn gets prioritized over the LP channel. The preferential scheduling only 648 kicks in when there is node overload regardless of whether there is 649 transport congestion. 
As a result of the preferential work 650 treatment, the ForCES node achieves a robust steady processing 651 capacity. Refer to Appendix A.2 for details on scheduling. 653 As an example of how overload prevention works, consider a 654 scenario where an overwhelming amount of redirected packets (from 655 outside the NE) coming into the NE may overload the FE while it has 656 outstanding config work from the CE. In such a case, the FE, while 657 it is busy processing config requests from the CE, does not process 658 the redirect packets on the LP channel. If enough redirect packets 659 accumulate, they are dropped either because the LP channel threshold 660 is exceeded or because they are obsoleted. If, on the other hand, the 661 FE has successfully processed the higher priority channels and their 662 related work, then it can proceed and process the LP channel. So, as 663 demonstrated in this case, the TML ties transport and node overload 664 implicitly together. 666 4.2.2.7. Satisfying Encapsulation Requirement 668 There is no extra encapsulation added by the SCTP TML. 670 In the future, should the need arise, a new SCTP extension/chunk can 671 be defined to meet newer ForCES requirements [RFC4960]. 673 5. SCTP TML Channel Work 675 There are two levels of TML channel work within an NE when a ForCES 676 node (CE or FE) is connected to multiple other ForCES nodes: 678 1. NE-level I/O work where a ForCES node (CE or FE) needs to choose 679 which of the peer nodes to process. 681 2. Node-level I/O work where a ForCES node handles the three SCTP 682 TML channels separately for each single ForCES endpoint. 684 NE-level scheduling definition is left up to the implementation and 685 is considered out of scope for this document. Appendix A.4 briefly 686 discusses some constraints that an implementor needs to worry about. 688 This document provides suggestions on SCTP channel work 689 implementation in Appendix A. 
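For node-level work, the strict priority processing of the three channels (Section 4.2.1.5) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation suggested in Appendix A: the class name StrictPriorityScheduler and its methods are hypothetical, and in-memory queues stand in for the three SCTP sockets. After each unit of work, the scheduler starts again from the HP channel, so lower priority work runs only when every higher priority channel is empty.

```python
from collections import deque


class StrictPriorityScheduler:
    """Strict priority, work-conserving scheduling over HP, MP and LP.

    Hypothetical sketch: each deque stands in for the pending work
    of one SCTP TML channel towards a single peer.
    """

    ORDER = ("HP", "MP", "LP")  # highest priority first

    def __init__(self):
        self.queues = {name: deque() for name in self.ORDER}

    def enqueue(self, channel, message):
        """Add a message to the named channel's pending work."""
        self.queues[channel].append(message)

    def next_message(self):
        """Return (channel, message) for the next unit of work.

        The channels are always scanned from HP downwards, so MP is
        served only when HP is empty, and LP only when both HP and MP
        are empty. Returns None when no channel has work.
        """
        for name in self.ORDER:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None

    def drain(self):
        """Process all pending work and return it in service order."""
        served = []
        item = self.next_message()
        while item is not None:
            served.append(item)
            item = self.next_message()
        return served
```

Because every call to next_message() rescans from the HP channel, HP work that arrives while LP work is pending is served before the remaining LP work, which is the starvation behaviour described in Section 4.2.1.5.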
691 The FE SHOULD do channel connections to the CE in order of 692 increasing priority, i.e., the LP socket first, followed by the MP and 693 ending with the HP socket connection. The CE, however, MUST NOT assume 694 that there is ordering of socket connections from any FE. 696 6. IANA Considerations 698 This document requests that IANA reserve SCTP ports 6700, 6701, 699 and 6702. 701 7. Security Considerations 703 The SCTP TML provides the following security services to the PL 704 level: 706 o A mechanism to authenticate ForCES CEs and FEs at transport level 707 in order to prevent the participation of unauthorized CEs and 708 unauthorized FEs in the control and data path processing of a 709 ForCES NE. 711 o A mechanism to ensure message authentication of PL data and 712 headers transferred from the CE to FE (and vice-versa) in order to 713 prevent the injection of incorrect data into PL messages. 715 o A mechanism to ensure the confidentiality of PL data and headers 716 transferred from the CE to FE (and vice-versa), in order to 717 prevent disclosure of PL level information transported via the 718 TML. 720 Security choices provided by the TML are made by the operator and 721 take effect during the pre-association phase of the ForCES protocol. 722 An operator may choose to use all, some or none of the security 723 services provided by the TML in a CE-FE connection. 725 When operating in a secured environment, or for other operational 726 concerns (in some cases performance issues), the operator may turn off 727 all the security functions between CE and FE. 729 IP Security Protocol (IPsec) [RFC4301] is used to provide the needed 730 security mechanisms. 732 IPsec is an IP level security scheme transparent to the higher-layer 733 applications and therefore can provide security for any transport 734 layer protocol. 
This gives IPsec the advantage that it can be used 735 to secure everything between the CE and FE without expecting the TML 736 implementation to be aware of the details. 738 The IPsec architecture is designed to provide the message integrity and 739 message confidentiality outlined in the TML security requirements 740 ([FE-PROTO]). Mutual authentication and key exchange are 741 provided by Internet Key Exchange (IKE) [RFC4109]. 743 7.1. IPsec Usage 745 A ForCES FE or CE MUST support the following: 747 o Internet Key Exchange (IKE) [RFC4109] with certificates for 748 endpoint authentication. 750 o Transport Mode Encapsulating Security Payload (ESP) [RFC4303]. 752 o HMAC-SHA1-96 [RFC2404] for message integrity protection. 754 o AES-CBC with 128-bit keys [RFC3602] for message confidentiality. 756 o Replay protection [RFC4301]. 758 It is expected to be possible for the CE or FE to be operationally 759 configured to negotiate other cipher suites and even use manual 760 keying. 762 7.1.1. SAD and SPD setup 764 To minimize the operational configuration, it is recommended that only 765 the IANA-issued SCTP protocol number (132) be used as a selector in 766 the Security Policy Database (SPD) for ForCES. In such a case, only a 767 single SPD and SAD entry is needed. 769 It should be straightforward to extend such a policy to alternatively 770 use the 3 SCTP TML port numbers as SPD selectors. But as noted above, 771 this choice will require an increased number of SPD entries. 773 In scenarios where multiple IP addresses are used within a single 774 association, and there is a desire to configure different policies on a 775 per-IP-address basis, it is recommended to follow [RFC3554]. 777 8. Acknowledgements 779 The authors would like to thank Joel Halpern, Michael Tuxen, Randy 780 Stewart and Evangelos Haleplidis for engaging us in discussions that 781 have made this draft better. 783 9. References 785 9.1. Normative References 787 [RFC2404] Madson, C. and R.
Glenn, "The Use of HMAC-SHA-1-96 within 788 ESP and AH", RFC 2404, November 1998. 790 [RFC3554] Bellovin, S., Ioannidis, J., Keromytis, A., and R. 791 Stewart, "On the Use of Stream Control Transmission 792 Protocol (SCTP) with IPsec", RFC 3554, July 2003. 794 [RFC3602] Frankel, S., Glenn, R., and S. Kelly, "The AES-CBC Cipher 795 Algorithm and Its Use with IPsec", RFC 3602, 796 September 2003. 798 [RFC4109] Hoffman, P., "Algorithms for Internet Key Exchange version 799 1 (IKEv1)", RFC 4109, May 2005. 801 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 802 Internet Protocol", RFC 4301, December 2005. 804 [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", 805 RFC 4303, December 2005. 807 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", 808 RFC 4960, September 2007. 810 [RFC5061] Stewart, R., Xie, Q., Tuexen, M., Maruyama, S., and M. 811 Kozuka, "Stream Control Transmission Protocol (SCTP) 812 Dynamic Address Reconfiguration", RFC 5061, 813 September 2007. 815 9.2. Informative References 817 [FE-MODEL] 818 Halpern, J. and J. Hadi Salim, "ForCES Forwarding Element 819 Model", October 2008. 821 [FE-PROTO] 822 Doria (Ed.), A., Haas (Ed.), R., Hadi Salim (Ed.), J., 823 Khosravi (Ed.), H., M. Wang (Ed.), W., Dong, L., and R. 824 Gopal, "ForCES Protocol Specification", November 2008. 826 [RFC3654] Khosravi, H. and T. Anderson, "Requirements for Separation 827 of IP Control and Forwarding", RFC 3654, November 2003. 829 [RFC3746] Yang, L., Dantu, R., Anderson, T., and R. Gopal, 830 "Forwarding and Control Element Separation (ForCES) 831 Framework", RFC 3746, April 2004. 833 [RFC3768] Hinden, R., "Virtual Router Redundancy Protocol (VRRP)", 834 RFC 3768, April 2004. 836 [SCTP-API] 837 Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P. 838 Lei, "Sockets API Extensions for Stream Control 839 Transmission Protocol (SCTP)", Feb. 2009. 841 Appendix A. 
SCTP TML Channel Work Implementation 843 As mentioned in Section 5, there are two levels of TML channel work 844 within an NE when a ForCES node (CE or FE) is connected to multiple 845 other ForCES nodes: 847 1. NE-level I/O work where a ForCES node (CE or FE) needs to choose 848 which of the peer nodes to process. 850 2. Node-level I/O work where a ForCES node handles the three SCTP 851 TML channels separately for each single ForCES endpoint. 853 NE-level scheduling definition is left up to the implementation and 854 is considered out of scope for this document. Appendix A.4 briefly 855 discusses some constraints that an implementor needs to consider. 857 Appendix A.1, Appendix A.2 and 858 Appendix A.3 discuss details of node-level I/O work. 860 A.1. SCTP TML Channel Initialization 862 As discussed in Section 5, it is recommended that the FE SHOULD do 863 socket connections to the CE in the order of incrementing priorities, 864 i.e., LP socket first, followed by MP and ending with HP socket 865 connection. The CE, however, MUST NOT assume that there is ordering 866 of socket connections from any FE. Appendix B.1 has more details on 867 the expected initialization of SCTP channel work. 869 A.2. Channel work scheduling 871 This section provides high level details of the scheduling view of 872 the SCTP TML core (Section 4.2.1). A practical scheduler 873 implementation takes care of many little details (such as timers, 874 work quanta, etc.) not described in this document. The implementor is 875 left to take care of those details. 877 The CE(s) and FE(s) are coupled together in the principles of the 878 scheduling scheme described here to tie together node overload with 879 transport congestion. The design intent is to provide the highest 880 possible robust work throughput for the NE under any network or 881 processing congestion. 883 A.2.1.
FE Channel work scheduling 885 The FE scheduling, in priority order, needs to process the following I/O work: 887 1. The HP channel I/O in the following priority order: 889 1. Transmitting back to the CE any outstanding result of 890 executed work via the HP channel transmit path. 892 2. Taking new incoming work from the CE which creates ForCES 893 work to be executed by the FE. 895 2. ForCES events which result in transmission of unsolicited ForCES 896 packets to the CE via the MP channel. 898 3. Incoming Redirect work in the form of control packets that come 899 from the CE via the LP channel. After redirect processing, these 900 packets get sent out on an external (to the NE) interface. 902 4. Incoming Redirect work in the form of control packets that come 903 from other NEs via external (to the NE) interfaces. After some 904 processing, such packets are sent to the CE. 906 It is worth emphasizing at this point again that the SCTP TML 907 processes the channel work in strict priority. For example, as long 908 as there are messages to send to the CE on the HP channel, they will 909 be processed first until there are no more left before processing the 910 next priority work (which is to read new messages on the HP channel 911 incoming from the CE). 913 A.2.2. CE Channel work scheduling 915 The CE scheduling, in priority order, needs to deal with: 917 1. The HP channel I/O in the following priority order: 919 1. Processing incoming responses to work requests it made to the 920 FE(s). 922 2. Transmitting any outstanding HP work it needs the FE(s) 923 to complete. 925 2. Incoming ForCES events from the FE(s) via the MP channel. 927 3. Outgoing Redirect work in the form of control packets that get 928 sent from the CE via the LP channel destined to an external (to the NE) 929 interface on FE(s). 931 4. Incoming Redirect work in the form of control packets that come 932 from other NEs via external (to the NE) interfaces on the FE(s).
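The CE priority order listed above can be illustrated with a minimal strict-priority sketch. This is not part of the draft: the class, the function, and the work-class names (hp_rx, hp_tx, mp_rx, lp_tx, redirect_rx) are hypothetical labels chosen for the illustration, and a practical scheduler would add the timers and work quanta mentioned in Appendix A.2.

```python
from collections import deque

# Work classes in the CE priority order listed above, highest first.
# The names are illustrative only; the draft does not define them.
CE_PRIORITY_ORDER = (
    "hp_rx",        # 1.1 responses from the FE(s) on the HP channel
    "hp_tx",        # 1.2 outstanding HP work to transmit to the FE(s)
    "mp_rx",        # 2   ForCES events from the FE(s) on the MP channel
    "lp_tx",        # 3   redirect packets sent by the CE on the LP channel
    "redirect_rx",  # 4   redirect packets arriving via the FE(s)
)


class StrictPriorityScheduler:
    """Serve the highest-priority non-empty queue first."""

    def __init__(self, order):
        self.order = tuple(order)
        self.queues = {name: deque() for name in self.order}

    def enqueue(self, work_class, item):
        self.queues[work_class].append(item)

    def next_work(self):
        # Rescan from the top on every call, so a lower class is served
        # only when every higher class is empty at that moment.
        for name in self.order:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None


def drain(scheduler):
    """Run the scheduler until no work is left; return the service order."""
    served = []
    work = scheduler.next_work()
    while work is not None:
        served.append(work)
        work = scheduler.next_work()
    return served


if __name__ == "__main__":
    sched = StrictPriorityScheduler(CE_PRIORITY_ORDER)
    sched.enqueue("redirect_rx", "pkt")
    sched.enqueue("hp_tx", "config")
    sched.enqueue("hp_rx", "response")
    print(drain(sched))
```

Because next_work() rescans from the highest class each time, HP work that arrives while MP or LP work is pending is served first, which is the strict-priority behaviour the text describes. The FE order in Appendix A.2.1 could be modelled the same way with a different tuple.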
934 It is worth repeating for emphasis that the SCTP TML processes 935 the channel work in strict priority. For example, if there are 936 messages incoming from an FE on the HP channel, they will be 937 processed first until there are no more left before processing the 938 next priority work, which is to transmit any outstanding HP channel 939 messages going to the FE. 941 A.3. SCTP TML Channel Termination 943 Appendix B.2 describes a controlled disassociation of the FE from the 944 NE. 946 It is also possible for connectivity to be lost between the FE and CE 947 on one or more sockets. In cases where SCTP multi-homing features 948 are used for path availability, the disconnection of a socket will 949 only occur if all paths are unreachable; otherwise, SCTP will ensure 950 reachability. In the situation of a total connectivity loss of even 951 one SCTP socket, it is recommended that the FE and CE SHOULD assume a 952 state equivalent to ForCES Association Teardown being issued and 953 follow the sequence described in Appendix B.2. 955 A CE could also disconnect sockets to an FE to indicate an "emergency 956 teardown". The "emergency teardown" may be necessary in cases when a 957 CE needs to disconnect an FE but knows that the FE is busy processing 958 a lot of outstanding commands (some of which the FE hasn't got around 959 to processing yet). By virtue of the CE closing the connections, the 960 FE will immediately be asynchronously notified and will not have to 961 process any outstanding commands from the CE. 963 A.4. SCTP TML NE level channel scheduling 965 In handling NE-level I/O work, an implementation needs to worry about 966 being both fair and robust across peer ForCES nodes. 968 Fairness is desired so that each peer node makes progress across the 969 NE. For the sake of illustration, consider two FEs connected to a CE: 970 while one FE has only a few HP messages that need to be processed by the 971 CE, the other may have an unbounded number of HP messages.
The scheduling scheme may 972 decide to use a quota scheduling system to ensure that the second FE 973 does not hog the CE cycles. 975 Robustness is desired so that the NE does not succumb to a DoS attack 976 from hostile entities and always achieves a maximum stable workload 977 processing level. For the sake of illustration, consider again two 978 FEs connected to a CE. Consider FE1 as having a large number of HP 979 and MP messages and FE2 as having a large number of MP and LP messages. 980 The scheduling scheme needs to ensure that while FE1 always gets its 981 messages processed, at some point FE2 messages are also allowed to be 982 processed. Promotion- and preemption-based scheduling could be used 983 by the CE to resolve this issue. 985 Appendix B. Service Interface 987 This section describes the high-level service interface between FEM/CEM 988 and TML, the PL and TML, and between local and remote TMLs. The 989 intent of this interface discussion is to provide general guidelines. 990 The implementer is expected to work out the details and may even follow a 991 different approach if needed. 993 The theory of operation for the PL-TML service is as follows: 995 1. The PL starts up and bootstraps the TML. The end result of a 996 successful TML bootstrap is that the CE TML and the FE TML 997 connect to each other at the transport level. 999 2. Sending and reception of the PL level messages commences after a 1000 successful TML bootstrap. The PL uses send and receive PL-TML 1001 interfaces to communicate with its peers. The TML is agnostic to 1002 the nature of the messages being sent or received. The first 1003 message exchanges that happen are to establish the ForCES 1004 association. Subsequent messages may be unsolicited events 1005 from the FE PL, control message redirects between the CE and the 1006 FE in either direction, or configuration from the CE to the FE with the responses 1007 flowing from the FE to the CE. 1009 3.
The PL shuts down the TML after terminating the ForCES 1010 association. 1012 B.1. TML Boot-strapping 1014 Figure 6 illustrates a flow for the TML bootstrapped by the PL. 1016 When the PL starts up (possibly after some internal initialization), 1017 it boots up the TML. The TML first interacts with the FEM/CEM and 1018 acquires the necessary TML parameterization (Section 4.2.1.6). Next, 1019 the TML uses the information it retrieved from the FEM/CEM interface 1020 to initialize itself. 1022 The TML on the FE proceeds to connect the 3 channels to the CE. The 1023 socket interface is used for each of the channels. The TML continues 1024 to retry the connections to the CE until all 3 channels are 1025 connected. It is advisable that the number of connection retry 1026 attempts and the time between retries also be configurable via the 1027 FEM. On failure to connect one or more channels, and after the 1028 configured retry threshold is exceeded, the TML will 1029 return an appropriate failure indicator to the PL. On success (as 1030 shown in Figure 6), a success indication is presented to the PL. 1032 FE PL FE TML FEM CEM CE TML CE PL 1033 | | | | | | 1034 | | | | | Bootup | 1035 | | | | |<-------------------| 1036 | Bootup | | | | | 1037 |----------->| | |get CEM info| | 1038 | |get FEM info | |<-----------| | 1039 | |------------>| ~ ~ | 1040 | ~ ~ |----------->| | 1041 | |<------------| | | 1042 | | |-initialize TML | 1043 | | |-create the 3 chans.| 1044 | | | to listen to FEs | 1045 | | | | 1046 | |-initialize TML |Bootup success | 1047 | |-create the 3 chans. locally |------------------->| 1048 | |-connect 3 chans.
remotely | | 1049 | |------------------------------>| | 1050 | ~ ~ - FE TML connected ~ 1051 | ~ ~ - FE TML info init ~ 1052 | | channels connected | | 1053 | |<------------------------------| | 1054 | Bootup | | | 1055 | succeeded | | | 1056 |<-----------| | | 1057 | | | | 1059 Figure 6: SCTP TML Bootstrapping 1061 On the CE things are slightly different. After initializing from the 1062 CEM, the TML on the CE side proceeds to initialize the 3 channels to 1063 listen to remote connections from the FEs. The success or failure 1064 indication is passed on to the CE PL level (in the same manner as was 1065 done in the FE). 1067 Post boot-up, the CE TML waits for connections from the FEs. Upon a 1068 successful connection by an FE, the CE TML level keeps track of the 1069 transport level details of the FE. Note, at this stage only 1070 transport level connection has been established; ForCES level 1071 association follows using send/receive PL-TML interfaces (refer to 1072 Appendix B.3 and Figure 8). 1074 B.2. TML Shutdown 1076 Figure 7 shows an example of an FE shutting down the TML. It is 1077 assumed at this point that the ForCES Association Teardown has been 1078 issued by the CE. 1080 When the FE PL issues a shutdown to its TML for a specific PL ID, the 1081 TML releases all the channel connections to the CE. This is achieved 1082 by closing the sockets used to communicate to the CE. 1084 FE PL FE TML CE TML CE PL 1085 | | | | 1086 | Shutdown | | | 1087 |----------->| | | 1088 | |-disconnect 3 chans. 
| | 1089 | |------------------------>| | 1090 | | | | 1091 | | |-FE TML info cleanup| 1092 | | |-optionally tell PL | 1093 | | |------------------->| 1094 | |- clean up any state of | | 1095 | | channels disconnected | | 1096 | | | | 1097 | |<------------------------| | 1098 | Shutdown | | | 1099 | succeeded | | | 1100 |<-----------| | | 1101 | | | | 1103 Figure 7: FE Shutting down 1105 On the CE side, a TML level disconnection would result in possible 1106 cleanup of the FE state. Optionally, depending on the 1107 implementation, there may be need to inform the PL about the TML 1108 disconnection. 1110 B.3. TML Sending and Receiving 1112 The TML is agnostic to the nature of the PL message it delivers to 1113 the remote TML (which subsequently delivers the message to its PL). 1114 Figure 8 shows an example of a message exchange originated at the FE 1115 and sent to the CE (such as a ForCES association message) which 1116 illustrates all the necessary service interfaces for sending and 1117 receiving. 1119 When the FE PL sends a message to the TML, the TML is expected to 1120 pick one of HP/MP/LP channels and send out the ForCES message. 1122 FE PL FE TML CE TML CE PL 1123 | | | | 1124 |PL send | | | 1125 |----------->| | | 1126 | | | | 1127 | |-Format msg. | | 1128 | |-pick channel | | 1129 | |-TML Send | | 1130 | |------------->| | 1131 | | |-TML Receive on chan. | 1132 | | |-decapsulate | 1133 | | |- mux to PL/PL recv | 1134 | | |--------------------->| 1135 | | | ~ 1136 | | | ~ PL Process 1137 | | | ~ 1138 | | | PL send | 1139 | | |<---------------------| 1140 | | |-Format msg. for send | 1141 | | |-pick chan to send on | 1142 | | |-TML send | 1143 | |<-------------| | 1144 | |-TML Receive | | 1145 | |-decapsulate | | 1146 | |-mux to PL | | 1147 | PL Recv | | | 1148 |<---------- | | | 1149 | | | | 1151 Figure 8: Send and Recv Flow 1153 When the CE TML receives the ForCES message on the channel it was 1154 sent on, it demultiplexes the message to the CE PL. 
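The send and receive flow of Figure 8 can be sketched in memory, without any SCTP sockets. Everything here is an assumption made for illustration: the LoopbackTml class, the message "kind" hint passed by the PL, and in particular the mapping of kinds to channels (configuration and responses on HP, events on MP, and redirects on LP follow Appendix A.2; placing association messages on HP is a guess that this section does not state).

```python
from enum import Enum


class Channel(Enum):
    HP = "high-priority"
    MP = "medium-priority"
    LP = "low-priority"


# Hypothetical message kinds and their channels (see the lead-in for
# which entries are assumptions rather than statements from the draft).
CHANNEL_FOR_KIND = {
    "association": Channel.HP,  # assumption, not stated in this section
    "config": Channel.HP,
    "response": Channel.HP,
    "event": Channel.MP,
    "redirect": Channel.LP,
}


class LoopbackTml:
    """In-memory stand-in for one TML endpoint (FE or CE side)."""

    def __init__(self, pl_receive):
        self.pl_receive = pl_receive  # callback standing in for "PL recv"
        self.peer = None

    def connect(self, peer):
        # Stand-in for the transport-level bootstrap of Appendix B.1.
        self.peer = peer
        peer.peer = self

    def send(self, kind, payload):
        """PL send: pick a channel, then hand the message to the peer TML."""
        channel = CHANNEL_FOR_KIND[kind]
        self.peer.receive(channel, payload)
        return channel

    def receive(self, channel, payload):
        """TML receive: deliver the payload to the local PL unchanged."""
        self.pl_receive(channel, payload)


if __name__ == "__main__":
    ce_log, fe_log = [], []
    fe = LoopbackTml(lambda ch, msg: fe_log.append((ch.name, msg)))
    ce = LoopbackTml(lambda ch, msg: ce_log.append((ch.name, msg)))
    fe.connect(ce)
    fe.send("association", b"setup")      # FE PL -> FE TML -> CE TML -> CE PL
    ce.send("response", b"setup-result")  # CE PL -> CE TML -> FE TML -> FE PL
    print(ce_log, fe_log)
```

The payload is passed through untouched, reflecting that the TML is agnostic to the message content; only the kind hint supplied by the PL influences which channel is used.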
1156 The CE PL, after some processing (in this example dealing with the 1157 FE's association), sends the response to the TML. As in the case 1158 of the FE TML, the CE TML picks the channel to send on before sending. 1160 The processing of the ForCES message upon arriving at the FE TML and 1161 delivery to the FE PL is similar to the CE side equivalent 1162 shown above. 1164 Authors' Addresses 1166 Jamal Hadi Salim 1167 Mojatatu Networks 1168 Ottawa, Ontario 1169 Canada 1171 Email: hadi@mojatatu.com 1173 Kentaro Ogawa 1174 NTT Corporation 1175 3-9-11 Midori-cho 1176 Musashino-shi, Tokyo 180-8585 1177 Japan 1179 Email: ogawa.kentaro@lab.ntt.co.jp