Network                                                        A. Antony
Internet-Draft                                               S. Klassert
Intended status: Standards Track                                 secunet
Expires: August 26, 2021                                      P. Wouters
                                                                 Red Hat
                                                       February 22, 2021


                 IKEv2 support for per-queue Child SAs
                 draft-pwouters-multi-sa-performance-01

Abstract

This document defines two Notification Payloads for the Internet Key Exchange Protocol Version 2 (IKEv2): NUM_QUEUES and QUEUE_INFO. These payloads add support for negotiating multiple identical Child SAs. Such Child SAs are used either to optimize performance based on the number of queues or CPUs, or to create separate Child SAs for different Quality of Service (QoS) levels. The payloads indicate that a newer identical Child SA should not be interpreted as a replacement for an existing Child SA. Using multiple identical Child SAs has the benefit that each stream has its own Sequence Number, so CPUs don't have to synchronize their crypto state or disable their packet replay detection.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 26, 2021.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Performance bottlenecks
   3.  Negotiation of performance specific Child SAs
   4.  Implementation specifics
     4.1.  One Child SA per CPU
     4.2.  QoS Child SA's
   5.  Payload Format
     5.1.  NUM_QUEUES Notify Payload
     5.2.  QUEUE_INFO Notify Payload
   6.  Security Considerations
   7.  Implementation Status
     7.1.  Linux XFRM
     7.2.  Libreswan
     7.3.  strongSWAN
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

IPsec implementations are currently limited to using one queue or CPU per Child SA.
As a result, a machine with many queues or CPUs cannot spread the traffic of a single Child SA across them. This severely limits the throughput that can be obtained. For example, an unencrypted link of 10 Gbps or more is commonly reduced to 2-3 Gbps when IPsec with AES-GCM is used to encrypt the link.

Furthermore, IPsec implementations are currently limited to using the same Child SA for all Quality of Service (QoS) types, because the QoS type is not part of the Traffic Selectors (TS). As a result, IPsec cannot actively prioritize traffic by QoS without disabling anti-replay detection. Packets that QoS scheduling reorders within a single Child SA can fall outside the receiver's replay window. This could be mitigated by setting up multiple narrowed Child SAs, for example using Populate From Packet (PFP) as specified in [RFC4301], but that IPsec feature is not widely implemented.

To make better use of multiple network queues and CPUs, it can be beneficial to negotiate and install multiple identical Child SAs. IKEv2 [RFC7296] already allows installing multiple identical Child SAs. However, implementations often assume that the older Child SA is being replaced by the newer Child SA, even when no INITIAL_CONTACT notify payload was received.

When two IKEv2 peers want to negotiate multiple Child SAs, it is useful for them to be able to convey how many Child SAs are required for optimized traffic. This avoids triggering CREATE_CHILD_SA exchanges that will only be rejected by the peer.

1.1.  Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.  Performance bottlenecks

Currently, most IPsec implementations are limited to using one CPU or network queue per Child SA.
There are a number of practical reasons for this. A key limitation is that sharing the AEAD state, counters and sequence numbers between multiple CPUs is not feasible without a significant performance penalty. Implementations therefore need to negotiate and establish multiple Child SAs with identical TSi/TSr on a per-queue or per-CPU basis.

3.  Negotiation of performance specific Child SAs

The NUM_QUEUES notify payload conveys the number of instances of a Child SA for a particular TSi/TSr combination, beyond the initial Child SA. Each peer sends the minimum number of Child SAs it prefers to install. Both peers then use the higher of the two numbers (within reason). For example, if one peer prefers 16 and the other peer prefers 48, the number negotiated is 48. If a 49th additional Child SA is then attempted with a QUEUE_INFO notify payload, it can be rejected using TS_UNACCEPTABLE.

The NUM_QUEUES Notify payload is sent as part of the IKE_AUTH exchange, or as part of a CREATE_CHILD_SA exchange for an initial new Child SA request. It identifies the initial Child SA of a set. This initial Child SA (or its rekeyed successor) MUST remain active for the lifetime of the IPsec session. This ensures there is always a Child SA that can be selected to send traffic over. Further CREATE_CHILD_SA messages for subsequent copies of the original Child SA MUST NOT contain the NUM_QUEUES notify payload.
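The negotiation rule above can be sketched in Python as follows. This is a minimal, non-normative sketch that assumes the 4-octet NUM_QUEUES value layout of Section 5.1. The function names are hypothetical, as is the MAX_REASONABLE cap, which is one possible reading of "within reason". None of them are defined by this document.

```python
import struct

# Hypothetical local limit: "within reason" is not defined by this document,
# so this cap is an assumption of the sketch, not part of the protocol.
MAX_REASONABLE = 1024


def parse_num_queues(notify_data: bytes) -> int:
    """Decode the 4-octet Minimum number field (network byte order)."""
    if len(notify_data) != 4:
        raise ValueError("NUM_QUEUES data must be exactly 4 octets")
    (value,) = struct.unpack("!I", notify_data)
    # A received value of 0 MUST be interpreted as 1.
    return max(value, 1)


def negotiate(local_min: int, peer_min: int, cap: int = MAX_REASONABLE) -> int:
    """Both peers use the higher of the two numbers, bounded by a local cap."""
    return min(max(local_min, peer_min), cap)


def accept_additional(num_installed: int, negotiated: int) -> bool:
    """Check whether one more additional Child SA (sent with QUEUE_INFO) fits.

    The initial Child SA is not counted. False means the request can be
    rejected with TS_UNACCEPTABLE.
    """
    return num_installed < negotiated


peer = parse_num_queues(struct.pack("!I", 48))
n = negotiate(16, peer)
print(n)                         # 48
print(accept_additional(47, n))  # True: this would be the 48th
print(accept_additional(48, n))  # False: a 49th is rejected
```

Taking the higher value lets the peer with more queues or CPUs get its preferred count. The hypothetical local cap lets either side bound resource use. This sketch simply clamps to the cap; an implementation could instead refuse values above it.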
Subsequent Child SAs can be installed with an additional selector, such as CPU, queue, or ToS value. The QUEUE_INFO Notify MUST be sent in the CREATE_CHILD_SA exchange for subsequent copies of the original Child SA. It indicates the queue, CPU, or QoS value of this specific copy of the initial Child SA. These additional Child SAs can be started on demand or all at once. They can also be deleted if a peer deems this specific queue, CPU, or QoS value to be idle.

During a CREATE_CHILD_SA exchange sent for a Child SA rekey, the QUEUE_INFO Notify MUST NOT be included. As with Traffic Selector payloads, the QUEUE_INFO value of the rekeyed Child SA cannot differ from that of the Child SA being rekeyed. This implies that a CREATE_CHILD_SA exchange can carry either a QUEUE_INFO notify or a NUM_QUEUES notify, but not both. If both Notify types are received, NUM_QUEUES has precedence and QUEUE_INFO MUST be ignored.

The NUM_QUEUES notify can be sent in the IKE_AUTH exchange together with the TS payloads, but it is not an attribute of the IKE peer. It is an attribute of the Child SA, similar to the USE_TRANSPORT_MODE notify payload. This allows an IKE peer to have multiple Child SAs covering different traffic selectors. The peer can then decide, per Child SA, whether or not to use multiple identical copies of it.

4.  Implementation specifics

There are various considerations that an implementation can use to determine the best way to install the multiple Child SAs. Below are examples of such strategies.

4.1.  One Child SA per CPU

A simple distribution could be to install one Child SA on each CPU. At least one of the Child SAs must act as the "fallback" for a CPU that has no Child SA of its own. This role is performed by the initial Child SA of the set of identical Child SAs.
This ensures that any CPU generating traffic to be encrypted has an available (if not optimal) Child SA to use. Any subsequent Child SAs with identical TSi/TSr are installed so that each is only used by a single CPU.

Implementations supporting per-CPU SAs SHOULD extend their traffic-triggered on-demand negotiation to include a CPU (or queue) identifier in the ACQUIRE message from the SPD to the IKE daemon (e.g., via NETLINK or PFKEYv2). If the ACQUIRE message does not support a per-CPU identifier, the IKE daemon may initiate all its Child SAs immediately upon receiving an ACQUIRE.

Performing per-CPU Child SA negotiations can result in both peers initiating additional Child SAs at once. This is especially likely in the per-CPU ACQUIRE case. Responders should install the additional Child SA on a CPU with the fewest additional Child SAs for this TSi/TSr pair. When counting, they should treat outstanding ACQUIREs as assigned additional Child SAs. It is still possible for both peers to act on an ACQUIRE at the same time when only one slot is left to assign. The initiator that receives the CREATE_CHILD_SA response last (e.g., the initiator of the slowest duplicate Child SA) MAY delete the duplicate additional Child SA.

As an optimization, additional Child SAs that see little traffic MAY be deleted. The initial Child SA, which is not limited to a single CPU, MUST NOT be deleted when idle. It is likely to be idle once enough per-CPU Child SAs are installed.
However, a per-CPU Child SA may be deleted because it was idle, and that CPU may later start generating traffic again. That traffic should then be encrypted by the initial, non-CPU-specific Child SA while the IKE daemon processes the ACQUIRE to bring up a new per-CPU Child SA.

The number of queues or CPUs may differ between the peers. In that case, the peer with fewer queues or CPUs MAY decide not to install a second outbound Child SA, since it will never use that Child SA to send traffic. However, it MUST install all inbound Child SAs, because it has committed to receiving traffic on these negotiated Child SAs. It MUST NOT generate an error when deleting the (missing) outbound SA component of such a Child SA.

A per-CPU ACQUIRE message SHOULD still provide the Traffic Selector (TSi) entry containing the information of the trigger packet. The peer MAY use this information to select the optimal CPU on which to install the additional Child SA. For example, if the trigger packet was for TCP destination port 25 (SMTP), the peer might install the Child SA on the CPU that is also running the mail server process. Trigger packet Traffic Selectors are documented in [RFC7296] Section 2.9.

The QUEUE_INFO Notify payload MUST be sent in the CREATE_CHILD_SA request for the additional Child SAs. It conveys the QoS stream or CPU ID. This ID value does not necessarily have to match any physical CPU ID.

[Clarify narrowing Traffic Selectors. Should it be allowed/forbidden?]

[Clarify CP / INTERNAL_ADDRESS. Should it be allowed/forbidden?]

[UDP encap: Because of how UDP-encapsulated ESP is handled at the receiver's NIC queues and by intermediate routers for parallel paths, UDP-encapsulated ESP may use multiple source ports.
We need to define a way to select UDP source ports for the additional Child SAs while the IKE SA and the initial Child SA remain on UDP port 4500 - 4500. NOTE: libreswan has an experimental implementation for Linux XFRM.]

[Add text about how this parallel SA use may interoperate with RFC 6311? Maybe not?]

4.2.  QoS Child SA's

Multiple Child SAs for different QoS levels are installed using a method similar to the per-CPU one. The initial Child SA is used for all QoS levels not matched by more specific Child SAs. Additional Child SAs are installed per QoS level. This can be done on demand if the kernel's IPsec subsystem can send per-QoS-level ACQUIREs to the IKE daemon.

A request for a Child SA for a specific QoS value MUST include the QUEUE_INFO Notify payload set to the required QoS value. This ensures both endpoints use the same Child SA for the same QoS level. If a proposed QoS level is not acceptable to the responder, TS_UNACCEPTABLE MUST be returned. During a Child SA rekey, the QUEUE_INFO Notify MUST NOT be included, and MUST be ignored when received.

5.  Payload Format

All multi-octet fields representing integers are laid out in big endian order (also known as "most significant byte first", or "network byte order").

5.1.  NUM_QUEUES Notify Payload

The NUM_QUEUES Notify payload is related to a Child SA. It MAY be exchanged in IKE_AUTH or in a CREATE_CHILD_SA exchange for a new Child SA. It MUST NOT be sent in a CREATE_CHILD_SA exchange for a rekey, and it MUST be ignored if received for a rekey operation. See [RFC7296] Section 1.3.1.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                  Minimum number of IPsec SAs                  !
   +---------------------------------------------------------------+

   o  Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

   o  SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

   o  Notify Message Type (2 octets) - set to [TBD]

   o  Minimum number of per-CPU IPsec SAs (4 octets). Value MUST be greater than 0. If 0 is received, it MUST be interpreted as 1.

Note: The first Child SA, which is not bound to a single CPU, is not counted as part of this number.

5.2.  QUEUE_INFO Notify Payload

The QUEUE_INFO Notify payload is an optional payload related to a Child SA. It MAY be exchanged in IKE_AUTH or in a CREATE_CHILD_SA exchange for a new Child SA. It MUST NOT be sent in a CREATE_CHILD_SA exchange for a rekey; see [RFC7296] Section 1.3.1.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-----------------------------+-------------------------------+
   ! Next Payload  !C!  RESERVED   !         Payload Length        !
   +---------------+---------------+-------------------------------+
   !  Protocol ID  !   SPI Size    !      Notify Message Type      !
   +---------------+---------------+-------------------------------+
   !                                                               !
   ~                     Optional payload data                     ~
   !                                                               !
   +---------------------------------------------------------------+

   o  Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.

   o  SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.

   o  Notify Message Type (2 octets) - set to [TBD]

   o  Optional Payload Data. This can be set to identify the QoS value or the CPU ID. The interpretation of the value is left to local implementations? [Probably needs to be specified by this document]

6.  Security Considerations

[TO DO]

7.  Implementation Status

[Note to RFC Editor: Please remove this section and the reference to [RFC6982] before publication.]

This section records the status of known implementations of the protocol defined by this specification at the time of posting of this Internet-Draft. It is based on a proposal described in [RFC7942]. The description of implementations in this section is intended to assist the IETF in its decision processes in progressing drafts to RFCs. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.

According to [RFC7942], "this will allow reviewers and working groups to assign due consideration to documents that have the benefit of running code, which may serve as evidence of valuable experimentation and feedback that have made the implemented protocols more mature. It is up to the individual working groups to use this information as they see fit".

Authors are requested to add a note to the RFC Editor at the top of this section, advising the Editor to remove the entire section before publication, as well as the reference to [RFC7942].

7.1.  Linux XFRM

Organization: Linux kernel XFRM

Name: XFRM-PCPU-v1 https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/log/?h=xfrm-pcpu-v1

Description: An initial Kernel IPsec implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the initial Child SA and per-CPU additional Child SAs. Also implements per-CPU ACQUIREs using NETLINK. PFKEYv2 is not supported.

Licensing: GPLv2

Implementation experience: TBD

Contact: Linux IPsec: members@linux-ipsec.org

7.2.  Libreswan

Organization: The Libreswan Project

Name: pcpu-3 https://libreswan.org/wiki/XFRM_pCPU

Description: An initial IKE implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the initial Child SA and per-CPU additional Child SAs.

Licensing: GPLv2

Implementation experience: TBD

Contact: Libreswan Development: swan-dev@libreswan.org

7.3.  strongSWAN

Organization: Secunet

Name: strongSWAN https://github.com/antonyantony/strongswan/

Description: An initial IKE implementation of the per-CPU method.

Level of maturity: Alpha

Coverage: Implements the initial Child SA and per-CPU additional Child SAs.

Licensing: GPLv2

Implementation experience: The Linux XFRM implementation needs an additional flag on the SPD entry, XFRM_POLICY_CPU_ACQUIRE. It should be set only on the "outgoing" policy, and should be disabled when the policy is a trap policy without SPD state. After a successful negotiation of NUM_QUEUES, the SPD policy is updated to enable the XFRM_POLICY_CPU_ACQUIRE flag. For the outgoing additional Child SAs, the u32 XFRMA_SA_PCPU attribute is set, starting from 0. The incoming SAs do not need XFRMA_SA_PCPU; the kernel internally sets the value 0xFFFFFF. The strongSWAN implementation uses private space values for NUM_QUEUES (40970) and QUEUE_INFO (40971). The iproute2 software that supports these two attributes is available at https://github.com/antonyantony/iproute2/tree/pcpu-v1

Contact: Antony Antony: antony.antony@secunet.com

8.  IANA Considerations

This document defines two new IKEv2 Notify messages for the IANA "IKEv2 Notify Message Types - Status Types" registry.

   Value    Notify Messages - Status Types    Reference
   -----    ------------------------------    ---------------
   [TBD]    NUM_QUEUES                        [this document]
   [TBD]    QUEUE_INFO                        [this document]

                                Figure 1

9.  References

9.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC7296]  Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. Kivinen, "Internet Key Exchange Protocol Version 2 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October 2014, <https://www.rfc-editor.org/info/rfc7296>.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2.  Informative References

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005, <https://www.rfc-editor.org/info/rfc4301>.

[RFC6982]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", RFC 6982, DOI 10.17487/RFC6982, July 2013, <https://www.rfc-editor.org/info/rfc6982>.

[RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", BCP 205, RFC 7942, DOI 10.17487/RFC7942, July 2016, <https://www.rfc-editor.org/info/rfc7942>.

Authors' Addresses

Antony Antony
secunet Security Networks AG
Email: antony.antony@secunet.com

Steffen Klassert
secunet Security Networks AG
Email: steffen.klassert@secunet.com

Paul Wouters
Red Hat
Email: pwouters@redhat.com