idnits 2.17.1
draft-omara-sframe-00.txt:
Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------
No issues found here.
Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------
** There are 41 instances of too long lines in the document, the longest
one being 21 characters in excess of 72.
Miscellaneous warnings:
----------------------------------------------------------------------------
== The copyright year in the IETF Trust and authors Copyright Line does not
match the current year
== Line 796 has weird spacing: '...verhead bps@4...'
== Line 804 has weird spacing: '...verhead bps@3...'
-- The document date (May 19, 2020) is 1438 days in the past. Is this
intentional?
Checking references for intended status: Informational
----------------------------------------------------------------------------
== Missing Reference: 'RFC5116' is mentioned on line 591, but not defined
== Missing Reference: 'Optional' is mentioned on line 601, but not defined
Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).
Run idnits with the --verbose option for more detailed information about
the items above.
--------------------------------------------------------------------------------
2 Network Working Group E. Omara
3 Internet-Draft J. Uberti
4 Intended status: Informational Google
5 Expires: November 20, 2020 A. GOUAILLARD
6 S. Murillo
7 CoSMo Software
8 May 19, 2020
10 Secure Frame (SFrame)
11 draft-omara-sframe-00
13 Abstract
15 This document describes the Secure Frame (SFrame) end-to-end
16 encryption and authentication mechanism for media frames in a
17 multiparty conference call, in which central media servers (SFUs) can
18 access the media metadata needed to make forwarding decisions without
19 having access to the actual media. The proposed mechanism differs
20 from other approaches through its use of media frames as the
21 encryptable unit, instead of individual RTP packets, which makes it
22 more bandwidth efficient and also allows use with non-RTP transports.
24 Status of This Memo
26 This Internet-Draft is submitted in full conformance with the
27 provisions of BCP 78 and BCP 79.
29 Internet-Drafts are working documents of the Internet Engineering
30 Task Force (IETF). Note that other groups may also distribute
31 working documents as Internet-Drafts. The list of current Internet-
32 Drafts is at https://datatracker.ietf.org/drafts/current/.
34 Internet-Drafts are draft documents valid for a maximum of six months
35 and may be updated, replaced, or obsoleted by other documents at any
36 time. It is inappropriate to use Internet-Drafts as reference
37 material or to cite them other than as "work in progress."
39 This Internet-Draft will expire on November 20, 2020.
41 Copyright Notice
43 Copyright (c) 2020 IETF Trust and the persons identified as the
44 document authors. All rights reserved.
46 This document is subject to BCP 78 and the IETF Trust's Legal
47 Provisions Relating to IETF Documents
48 (https://trustee.ietf.org/license-info) in effect on the date of
49 publication of this document. Please review these documents
50 carefully, as they describe your rights and restrictions with respect
51 to this document. Code Components extracted from this document must
52 include Simplified BSD License text as described in Section 4.e of
53 the Trust Legal Provisions and are provided without warranty as
54 described in the Simplified BSD License.
56 Table of Contents
58 1. Introduction
59 2. Terminology
60 3. Goals
61 4. SFrame
62 4.1. SFrame Format
63 4.2. SFrame Header
64 4.3. Encryption Schema
65 4.3.1. Key Derivation
66 4.3.2. Encryption
67 4.3.3. Decryption
68 4.3.4. Duplicate Frames
69 4.3.5. Key Rotation
70 4.4. Authentication
71 4.5. Ciphersuites
72 4.5.1. SFrame
73 4.5.2. DTLS-SRTP
74 5. Key Management
75 5.1. MLS-SFrame
76 6. Media Considerations
77 6.1. SFU
78 6.1.1. LastN and RTP stream reuse
79 6.1.2. Simulcast
80 6.1.3. SVC
81 6.2. Video Key Frames
82 6.3. Partial Decoding
83 7. Overhead
84 7.1. Audio
85 7.2. Video
86 7.3. SFrame vs PERC-lite
87 7.3.1. Audio
88 7.3.2. Video
89 8. Security Considerations
90 8.1. Key Management
91 8.2. Authentication tag length
92 9. IANA Considerations
93 10. References
94 10.1. Normative References
95 10.2. Informative References
96 Authors' Addresses
98 1. Introduction
100 Modern multi-party video call systems use Selective Forwarding Unit
101 (SFU) servers to efficiently route RTP streams to call endpoints
102 based on factors such as available bandwidth, desired video size,
103 and codec support. In order for the SFU to work
104 properly though, it needs to be able to access RTP metadata and RTCP
105 feedback messages, which is not possible if all RTP/RTCP traffic is
106 end-to-end encrypted.
108 As such, two layers of encryption and authentication are required:
109 1- Hop-by-hop (HBH) encryption of media, metadata, and feedback
110 messages between the endpoints and the SFU
111 2- End-to-end (E2E) encryption of media between the endpoints
113 While DTLS-SRTP can be used as an efficient HBH mechanism, it is
114 inherently point-to-point and therefore not suitable as an E2E
115 mechanism in an SFU context. In addition, given the various
116 scenarios in which video calling occurs, minimizing the bandwidth
117 overhead of end-to-end encryption is also an important goal.
119 This document proposes a new end-to-end encryption mechanism known as
120 SFrame, specifically designed to work in group conference calls with
121 SFUs.
123 +-------------------------------+-------------------------------+^+
124 |V=2|P|X| CC |M| PT | sequence number | |
125 +-------------------------------+-------------------------------+ |
126 | timestamp | |
127 +---------------------------------------------------------------+ |
128 | synchronization source (SSRC) identifier | |
129 |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| |
130 | contributing source (CSRC) identifiers | |
131 | .... | |
132 +---------------------------------------------------------------+ |
133 | RTP extension(s) (OPTIONAL) | |
134 +^---------------------+------------------------------------------+ |
135 | | payload header | | |
136 | +--------------------+ payload ... | |
137 | | | |
138 +^+---------------------------------------------------------------+^+
139 | : authentication tag : |
140 | +---------------------------------------------------------------+ |
141 | |
142 ++ Encrypted Portion* Authenticated Portion +--+
144 SRTP packet format
146 2. Terminology
148 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
149 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
150 "OPTIONAL" in this document are to be interpreted as described in BCP
151 14 [RFC2119] [RFC8174] when, and only when, they appear in all
152 capitals, as shown here.
154 SFU: Selective Forwarding Unit (AKA RTP Switch)
156 IV: Initialization Vector
158 MAC: Message Authentication Code
160 E2EE: End to End Encryption
162 HBH: Hop By Hop
164 KMS: Key Management System
166 3. Goals
168 SFrame is designed to be a suitable E2EE protection scheme for
169 conference call media in a broad range of scenarios, as outlined by
170 the following goals:
172 1. Provide a secure E2EE mechanism for audio and video in
173 conference calls that can be used with arbitrary SFU servers.
175 2. Decouple media encryption from key management to allow SFrame to
176 be used with an arbitrary KMS.
178 3. Minimize packet expansion to allow successful conferencing in as
179 many network conditions as possible.
181 4. Independence from the underlying transport, including use in non-
182 RTP transports, e.g., WebTransport.
184 5. When used with RTP and its associated error resilience
185 mechanisms, i.e., RTX and FEC, require no special handling for
186 RTX and FEC packets.
188 6. Minimize the changes needed in SFU servers.
190 7. Minimize the changes needed in endpoints.
192 8. Work with the most popular audio and video codecs used in
193 conferencing scenarios.
195 4. SFrame
197 We propose a frame level encryption mechanism that provides effective
198 end-to-end encryption, is simple to implement, has no dependencies on
199 RTP, and minimizes encryption bandwidth overhead. Because SFrame
200 encrypts the full frame, rather than individual packets, bandwidth
201 overhead is reduced by having a single IV and authentication tag for
202 each media frame.
204 Also, because media is encrypted prior to packetization, the
205 encrypted frame is packetized using a generic RTP packetizer instead
206 of codec-dependent packetization mechanisms. With this move to a
207 generic packetizer, media metadata is moved from codec-specific
208 mechanisms to a generic frame RTP header extension which, while
209 visible to the SFU, is authenticated end-to-end. This extension
210 includes metadata needed for SFU routing such as resolution, frame
211 beginning and end markers, etc.
213 The generic packetizer splits the E2E encrypted media frame into one
214 or more RTP packets and adds the SFrame header to the beginning of
215 the first packet and an auth tag to the end of the last packet.
217 +-------------------------------------------------------+
218 | |
219 | +----------+ +------------+ +-----------+ |
220 | | | | SFrame | |Packetizer | | DTLS+SRTP
221 | | Encoder +----->+ Enc +----->+ +-------------------------+
222 ,+. | | | | | | | | +--+ +--+ +--+ |
223 `|' | +----------+ +-----+------+ +-----------+ | | | | | | | |
224 /|\ | ^ | | | | | | | |
225 + | | | | | | | | | |
226 / \ | | | +--+ +--+ +--+ |
227 Alice | +-----+------+ | Encrypted Packets |
228 | |Key Manager | | |
229 | +------------+ | |
230 | || | |
231 | || | |
232 | || | |
233 +-------------------------------------------------------+ |
234 || |
235 || v
236 +------------+ +-----+------+
237 E2EE channel | Messaging | | Media |
238 via the | Server | | Server |
239 Messaging Server | | | |
240 +------------+ +-----+------+
241 || |
242 || |
243 +-------------------------------------------------------+ |
244 | || | |
245 | || | |
246 | || | |
247 | +------------+ | |
248 | |Key Manager | | |
249 ,+. | +-----+------+ | Encrypted Packets |
250 `|' | | | +--+ +--+ +--+ |
251 /|\ | | | | | | | | | |
252 + | v | | | | | | | |
253 / \ | +----------+ +-----+------+ +-----------+ | | | | | | | |
254 Bob | | | | SFrame | | De+ | | +--+ +--+ +--+ |
255 | | Decoder +<-----+ Dec +<-----+Packetizer +<------------------------+
256 | | | | | | | | DTLS+SRTP
257 | +----------+ +------------+ +-----------+ |
258 | |
259 +-------------------------------------------------------+
261 The E2EE keys used to encrypt the frame are exchanged out of band
262 using a secure E2EE channel.
264 4.1. SFrame Format
266 +------------+------------------------------------------+^+
267 |S|LEN|X|KID | Frame Counter | |
268 +^+------------+------------------------------------------+ |
269 | | | |
270 | | | |
271 | | | |
272 | | | |
273 | | Encrypted Frame | |
274 | | | |
275 | | | |
276 | | | |
277 | | | |
278 +^+-------------------------------------------------------+^+
279 | | Authentication Tag | |
280 | +-------------------------------------------------------+ |
281 | |
282 | |
283 +----+Encrypted Portion Authenticated Portion+---+
285 4.2. SFrame Header
287 Since each endpoint can send multiple media layers, each frame will
288 have a unique frame counter that will be used to derive the
289 encryption IV. The frame counter must be unique and monotonically
290 increasing to avoid IV reuse.
292 As each sender will use their own key for encryption, the SFrame
293 header will include the key id to allow the receiver to identify the
294 key that needs to be used for decrypting.
296 Both the frame counter and the key id are encoded in a variable
297 length format to decrease the overhead, so the first byte in the
298 SFrame header is fixed and contains the header metadata with the
299 following format:
301 0 1 2 3 4 5 6 7
302 +-+-+-+-+-+-+-+-+
303 |S|LEN |X| K |
304 +-+-+-+-+-+-+-+-+
305 SFrame header metadata
307 Signature flag (S): 1 bit. This field indicates that the payload contains
308 a signature if set. Counter Length (LEN): 3 bits. This field indicates
309 the length of the CTR field in bytes. Extended Key Id Flag (X): 1 bit.
310 Indicates whether the key field contains the key id or the key length.
311 Key or Key Length (K): 3 bits. This field contains the key id (KID) if
312 the X flag is set to 0, or the key length (KLEN) if set to 1.
314 If X flag is 0 then the KID is in the range of 0-7 and the frame
315 counter (CTR) is found in the next LEN bytes:
317 0 1 2 3 4 5 6 7
318 +-+-+-+-+-+-+-+-+---------------------------------+
319 |S|LEN |0| KID | CTR... (length=LEN) |
320 +-+-+-+-+-+-+-+-+---------------------------------+
322 Key id (KID): 3 bits. The key id (0-7). Frame counter (CTR):
323 variable length. Frame counter value up to 8 bytes long.
325 If the X flag is 1, then KLEN is the length of the key id (KID), which
326 is found after the SFrame header metadata byte. After the key id (KID),
327 the frame counter (CTR) will be found in the next LEN bytes:
329 0 1 2 3 4 5 6 7
330 +-+-+-+-+-+-+-+-+---------------------------+---------------------------+
331 |S|LEN |1|KLEN | KID... (length=KLEN) | CTR... (length=LEN) |
332 +-+-+-+-+-+-+-+-+---------------------------+---------------------------+
334 Key length (KLEN): 3 bits. The key id length in bytes. Key id (KID):
335 variable length. The key id value up to 8 bytes long. Frame counter
336 (CTR): variable length. Frame counter value up to 8 bytes long.
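The header layout described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the normative encoding: the text gives 3-bit LEN and KLEN fields together with values of up to 8 bytes, so the sketch assumes those fields store the length minus one; it also assumes big-endian, minimal-length integers. The function names are hypothetical.

```python
def _min_bytes(value: int) -> bytes:
    """Encode a non-negative integer big-endian in the fewest bytes (at least 1)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big")


def encode_header(kid: int, ctr: int, signature: bool = False) -> bytes:
    """Build the SFrame header: metadata byte, optional KID bytes, CTR bytes."""
    ctr_bytes = _min_bytes(ctr)
    if len(ctr_bytes) > 8:
        raise ValueError("frame counter does not fit in 8 bytes")
    # Bit 0 (most significant) is S, bits 1-3 are LEN.
    # Assumption: LEN stores the counter length minus one.
    first = (int(signature) << 7) | ((len(ctr_bytes) - 1) << 4)
    if kid < 8:
        # X = 0: the key id fits directly in the 3-bit K field.
        return bytes([first | kid]) + ctr_bytes
    kid_bytes = _min_bytes(kid)
    if len(kid_bytes) > 8:
        raise ValueError("key id does not fit in 8 bytes")
    # X = 1: K carries KLEN (assumed length minus one), KID follows.
    first |= 0x08 | (len(kid_bytes) - 1)
    return bytes([first]) + kid_bytes + ctr_bytes


def decode_header(data: bytes):
    """Return (signature_flag, kid, ctr, header_length)."""
    first = data[0]
    signature = bool(first & 0x80)
    ctr_len = ((first >> 4) & 0x07) + 1
    k = first & 0x07
    pos = 1
    if first & 0x08:
        kid_len = k + 1
        kid = int.from_bytes(data[pos:pos + kid_len], "big")
        pos += kid_len
    else:
        kid = k
    ctr = int.from_bytes(data[pos:pos + ctr_len], "big")
    return signature, kid, ctr, pos + ctr_len
```

With these assumptions, a frame with key id 3 and counter 5 needs a 2-byte header, while key id 300 and counter 65536 need 6 bytes.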
338 4.3. Encryption Schema
340 4.3.1. Key Derivation
342 Each client creates a 32-byte secret key K and shares it with the
343 other participants via an E2EE channel. From K, we derive 3 secrets:
345 1- Salt key used to calculate the IV
347 Key = HKDF(K, 'SFrameSaltKey', 16)
349 2- Encryption key to encrypt the media frame
351 Key = HKDF(K, 'SFrameEncryptionKey', 16)
353 3- Authentication key to authenticate the encrypted frame and the
354 media metadata
356 Key = HKDF(K, 'SFrameAuthenticationKey', 32)
357 The IV is 128 bits long and is calculated from the CTR field of the
358 SFrame header:
360 IV = CTR XOR Salt key
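A minimal sketch of this derivation in Python, using only the standard library. Several details are assumptions, because the text does not fix them: HKDF is taken to be RFC 5869 with SHA-256 (the hash recommended in Section 4.5.1) and an all-zero salt, the quoted labels are used as the HKDF "info" input, and CTR is left-padded with zeros to 16 bytes before the XOR.

```python
import hashlib
import hmac


def hkdf(key: bytes, info: bytes, length: int) -> bytes:
    """HKDF per RFC 5869 with SHA-256 and an all-zero salt (assumption)."""
    # Extract step: PRK = HMAC(salt, input key material).
    prk = hmac.new(b"\x00" * 32, key, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i).
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(k: bytes):
    """Derive the salt, encryption and authentication keys from K."""
    salt_key = hkdf(k, b"SFrameSaltKey", 16)
    enc_key = hkdf(k, b"SFrameEncryptionKey", 16)
    auth_key = hkdf(k, b"SFrameAuthenticationKey", 32)
    return salt_key, enc_key, auth_key


def derive_iv(ctr: int, salt_key: bytes) -> bytes:
    """IV = CTR XOR salt key, with CTR zero-padded to 128 bits (assumption)."""
    ctr_block = ctr.to_bytes(16, "big")
    return bytes(a ^ b for a, b in zip(ctr_block, salt_key))
```

Because the frame counter never repeats for a given key, each frame gets a distinct IV; a counter of 0 yields the salt key itself.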
362 4.3.2. Encryption
364 After encoding the frame and before packetizing it, the necessary
365 media metadata will be moved out of the encoded frame buffer, to be
366 used later in the RTP generic frame header extension. The encoded
367 frame, the metadata buffer and the frame counter are passed to the
368 SFrame encryptor. The encryptor constructs the SFrame header using
369 the frame counter and key id and derives the encryption IV. The frame
370 is encrypted using the encryption key, and the header, the encrypted
371 frame and the media metadata are authenticated using the
372 authentication key. The authentication tag is then truncated (if
373 supported by the cipher suite) and appended to the end of the
374 ciphertext.
376 The encrypted payload is then passed to a generic RTP packetizer,
377 which constructs the RTP packets and encrypts them using SRTP keys
378 for the HBH encryption to the media server.
380 +---------------+ +---------------+
381 | | | frame metadata+----+
382 | | +---------------+ |
383 | frame | |
384 | | |
385 | | |
386 +-------+-------+ |
387 | |
388 CTR +---------------> IV |Enc Key <----Master Key |
389 derive IV | | |
390 + | | |
391 | + v |
392 | encrypt Auth Key |
393 | | + |
394 | | | |
395 | v | |
396 | +-------+-------+ | |
397 | | | | |
398 | | encrypted | v |
399 | | frame +---->Authenticate<-----+
400 + | | +
401 encode CTR | | |
402 + +-------+-------+ |
403 | | |
404 | | |
405 | | |
406 | generic RTP packetize |
407 | + |
408 | | |
409 | | +--------------+
410 +----------+ v |
411 | |
412 | +---------------+ +---------------+ +---------------+ |
413 +-> | SFrame header | | | | | |
414 +---------------+ | | | payload N/N | |
415 | | | payload 2/N | | | |
416 | payload 1/N | | | +---------------+ |
417 | | | | | auth tag | <-+
418 +---------------+ +---------------+ +---------------+
419 Encryption flow
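The authentication step of this flow can be illustrated with a short Python sketch. It covers only the tag computation and truncation; the frame encryption itself (for example AES-CTR) is not in the Python standard library, so the ciphertext is treated as already produced. The input order header || ciphertext || metadata, the use of HMAC-SHA256, and the function names are assumptions made for illustration; a real encoding would also need to delimit the three inputs unambiguously.

```python
import hashlib
import hmac


def auth_tag(auth_key: bytes, header: bytes, ciphertext: bytes,
             metadata: bytes, tag_len: int) -> bytes:
    """Truncated HMAC-SHA256 over header, encrypted frame and metadata."""
    mac = hmac.new(auth_key, header + ciphertext + metadata, hashlib.sha256)
    return mac.digest()[:tag_len]


def protect(auth_key: bytes, header: bytes, ciphertext: bytes,
            metadata: bytes, tag_len: int = 10) -> bytes:
    """Return header || ciphertext || tag, ready for the packetizer."""
    tag = auth_tag(auth_key, header, ciphertext, metadata, tag_len)
    return header + ciphertext + tag


def verify(auth_key: bytes, protected: bytes, header_len: int,
           metadata: bytes, tag_len: int = 10) -> bytes:
    """Check the tag and return the ciphertext, or raise ValueError."""
    header = protected[:header_len]
    ciphertext = protected[header_len:-tag_len]
    tag = protected[-tag_len:]
    expected = auth_tag(auth_key, header, ciphertext, metadata, tag_len)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext
```

The default tag length of 10 bytes corresponds to the 80-bit MAC suggested for video; audio would pass tag_len=4 for a 32-bit MAC.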
421 4.3.3. Decryption
423 The receiving client buffers all packets that belong to the same
424 frame using the frame beginning and ending marks in the generic RTP
425 frame header extension, and once all packets are available, it passes
426 them to SFrame for decryption. SFrame maintains multiple decryptor
427 objects, one for each client in the call. Initially the client might
428 not have the mapping between the incoming streams and the users'
429 keys; in this case SFrame tries all unmapped keys until it finds one
430 that passes the authentication verification and uses it to decrypt
431 the frame. If the client has the mapping ready, it can push it down
432 to SFrame later.
434 The KeyId field in the SFrame header, which the sender increments
435 when switching to a new key, is used to find the right key for that
436 user.
438 For frames that fail to decrypt because no key is available yet,
439 SFrame will buffer them and retry decrypting them once a key is
440 received.
442 4.3.4. Duplicate Frames
444 Unlike in messaging applications, in video calls, receiving a
445 duplicate frame doesn't necessarily mean the client is under a replay
446 attack. There are other reasons that might cause this; for example,
447 the sender might just be sending duplicates in case of packet loss.
448 SFrame decryptors use the highest received frame counter to protect
449 against replay. They accept older frames only within a short
450 interval, to support out-of-order delivery.
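One way to picture this check is the sliding-window sketch below, written in Python. The window size, the class name and the decision to drop an exact duplicate inside the window are all assumptions of this sketch; the text only says that the highest received counter is tracked and that older frames are accepted within a short interval.

```python
class ReplayWindow:
    """Tracks the highest frame counter and tolerates limited reordering."""

    def __init__(self, window: int = 32):
        self.window = window      # how far behind the highest counter we accept
        self.highest = -1         # highest frame counter seen so far
        self.seen = set()         # counters already accepted inside the window

    def accept(self, ctr: int) -> bool:
        if ctr > self.highest:
            # A newer frame always advances the window.
            self.highest = ctr
            self.seen.add(ctr)
            # Forget counters that have fallen out of the window.
            self.seen = {c for c in self.seen
                         if c > self.highest - self.window}
            return True
        if ctr <= self.highest - self.window:
            return False          # too old: outside the allowed interval
        if ctr in self.seen:
            return False          # duplicate inside the window (sketch's choice)
        self.seen.add(ctr)
        return True               # late but still within the interval
```

For example, with a window of 4 and a highest counter of 10, counters 7 to 9 are still accepted once each, while 6 and below are rejected.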
452 4.3.5. Key Rotation
454 Because the E2EE keys could be rotated during the call when people
455 join and leave, these new keys are exchanged using the same E2EE
456 secure channel used in the initial key negotiation. Sending fresh
457 keys is an expensive operation, so the key management component
458 might choose to send new keys only when other clients leave the call
459 and use hash ratcheting for the join case, so that there is no need
460 to send a new key to the clients who are already on the call. SFrame
461 supports both modes.
463 4.3.5.1. Key Ratcheting
465 When an SFrame decryptor fails to decrypt one of the frames, it
466 automatically ratchets the key forward and retries until one
467 ratchet succeeds or it reaches the maximum allowed ratcheting window.
468 If a new ratchet succeeds in decrypting, all previous ratchets are
469 deleted.
471 K(i) = HKDF(K(i-1), 'SFrameRatchetKey', 32)
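The ratchet-and-retry behaviour can be sketched in Python as shown below. As before, HKDF is assumed to be RFC 5869 with SHA-256 and an all-zero salt, which the text does not specify. The try_decrypt callback and the max_ratchets limit are hypothetical names introduced for illustration: the callback stands in for the real SFrame decryption and returns None when authentication fails.

```python
import hashlib
import hmac


def hkdf(key: bytes, info: bytes, length: int) -> bytes:
    """HKDF per RFC 5869 with SHA-256 and an all-zero salt (assumption)."""
    prk = hmac.new(b"\x00" * 32, key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def ratchet(key: bytes) -> bytes:
    """K(i) = HKDF(K(i-1), 'SFrameRatchetKey', 32)."""
    return hkdf(key, b"SFrameRatchetKey", 32)


def decrypt_with_ratchet(key: bytes, try_decrypt, max_ratchets: int = 8):
    """Try the current key, then up to max_ratchets ratcheted keys.

    Returns (plaintext, key_that_worked). The caller is expected to keep
    only the returned key and delete all earlier ones.
    """
    candidate = key
    for _ in range(max_ratchets + 1):
        plaintext = try_decrypt(candidate)
        if plaintext is not None:
            return plaintext, candidate
        candidate = ratchet(candidate)
    raise ValueError("no key within the ratcheting window decrypts the frame")
```

Since the ratchet is one-way, a participant who joins and receives K(i) cannot recover K(i-1) or the media protected with it.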
473 4.3.5.2. New Key
475 SFrame will set the key on the decryptors immediately when it is
476 received and destroy the old key material, so if the key manager
477 sends a new key during the call, it is recommended not to start using
478 it immediately and to wait for a short time to make sure it is
479 delivered to all other clients before using it, to decrease the
480 number of decryption failures. It is up to the application and the
481 key manager to define how long this period is.
483 4.4. Authentication
485 Every client in the call knows the secret key for all other clients
486 so it can decrypt their traffic. This also means a malicious client
487 can impersonate any other client in the call by using the victim's
488 key to encrypt their traffic. This might not be a problem for consumer
489 applications where the number of clients in the call is small and
490 users know each other; however, for enterprise use cases where large
491 conference calls are common, an authentication mechanism is needed to
492 protect against malicious users. This authentication will come with
493 extra cost.
495 Adding a digital signature to each encrypted frame would be
496 overkill; instead we propose adding a signature over multiple frames.
498 The signature is calculated by concatenating the authentication tags
499 of the frames that the sender wants to authenticate (in reverse sent
500 order) and signing it with the signature key. Signature keys are
501 exchanged out of band along with the encryption keys.
503 Signature = Sign(Key, AuthTag(Frame N) || AuthTag(Frame N-1) || ...|| AuthTag(Frame N-M))
505 The authentication tags for the previous frames covered by the
506 signature and the signature itself will be appended at the end of the
507 frame, after the current frame authentication tag, in the same order
508 that the signature was calculated, and the SFrame header metadata
509 signature bit (S) will be set to 1.
511 +^ +------------------+
512 | | SFrame header S=1|
513 | +------------------+
514 | | Encrypted |
515 | | payload |
516 | | |
517 |^ +------------------+ ^+
518 | | Auth Tag N | |
519 | +------------------+ |
520 | | Auth Tag N-1 | |
521 | +------------------+ |
522 | | ........ | |
523 | +------------------+ |
524 | | Auth Tag N-M | |
525 | +------------------+ ^|
526 | | NUM | Signature : |
527 | +-----+ + |
528 | : | |
529 | +------------------+ |
530 | |
531 +-> Authenticated with +-> Signed with
532 Auth Tag N Signature
534 Encrypted Frame with Signature
536 Note that the authentication tag for the current frame will only
537 authenticate the SFrame header and the encrypted payload, and not the
538 signature nor the previous frames' authentication tags (N-1 to N-M)
539 used to calculate the signature.
541 The last byte (NUM) after the authentication tag list and before the
542 signature indicates the number of the authentication tags from
543 previous frames present in the current frame. All the
544 authentication tags MUST have the same size, which MUST be equal to
545 the authentication tag size of the current frame. The signature has
546 a fixed size depending on the signature algorithm used (for example,
547 64 bytes for Ed25519).
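To make the trailer layout concrete, the Python sketch below splits a frame with S=1 into its parts, working backwards from the end. It assumes the layout shown in the figure (current tag, previous tags, NUM byte, signature) and a 64-byte signature as in the Ed25519 example; the function name is hypothetical. Verifying the signature itself needs a signature library and is left to the caller.

```python
def split_signed_frame(frame: bytes, tag_len: int, sig_len: int = 64):
    """Split a signed SFrame into (body, tags, signed_data, signature).

    body        : SFrame header plus encrypted payload
    tags        : [tag N, tag N-1, ..., tag N-M] in the order they appear
    signed_data : concatenation of the tags, i.e. the signature input
    signature   : the trailing sig_len bytes
    """
    if len(frame) < sig_len + 1 + tag_len:
        raise ValueError("frame too short for a signature trailer")
    signature = frame[-sig_len:]
    # NUM counts only the tags of previous frames, not the current one.
    num = frame[-sig_len - 1]
    tags_end = len(frame) - sig_len - 1
    tags_start = tags_end - (num + 1) * tag_len
    if tags_start < 0:
        raise ValueError("NUM is larger than the frame allows")
    tags = [frame[i:i + tag_len]
            for i in range(tags_start, tags_end, tag_len)]
    body = frame[:tags_start]
    return body, tags, b"".join(tags), signature
```

The first returned tag authenticates the body of the current frame; the remaining tags are matched against the stored tags of earlier frames before those frames are marked as authenticated.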
549 The receiver has to keep track of all the frames received but not
550 yet verified, by storing the authentication tags of each received
551 frame. When a signature is received, the receiver will verify it with
552 the signature key associated to the key id of the frame the signature
553 was sent in. If the verification is successful, the receiver will
554 mark the frames as authenticated and remove them from the list of
555 unverified frames. It is up to the application to decide what to do
556 when signature verification fails.
558 When using SVC, the hash will be calculated over all the frames of
559 the different spatial layers within the same superframe/picture.
560 However the SFU will be able to drop frames within the same stream
561 (either spatial or temporal) to match target bitrate.
563 If the signature is sent on a frame whose layer is dropped by
564 the SFU, the receiver will not receive it and will not be able to
565 verify the signature of the other received layers.
567 An easy way of solving the issue would be to perform signature only
568 on the base layer or take into consideration the frame dependency
569 graph and send multiple signatures in parallel (each for a branch of
570 the dependency graph).
572 In case of simulcast or K-SVC, each spatial layer should be
573 authenticated with different signatures to prevent the SFU from
574 discarding frames with the signature info.
576 In any case, it is possible that the frame with the signature is lost
577 or the SFU drops it, so the receiver MUST be prepared to not receive
578 a signature for a frame and remove it from the pending to be verified
579 list after a timeout.
581 4.5. Ciphersuites
583 4.5.1. SFrame
585 Each SFrame session uses a single ciphersuite that specifies the
586 following primitives:
588 o A hash function. This is used for the key derivation and frame
589 hashes for signature. We recommend using the SHA256 hash function.
591 o An AEAD encryption algorithm [RFC5116]. While any AEAD algorithm can
592 be used to encrypt the frame, we recommend using algorithms with safe
593 MAC truncation like AES-CTR and HMAC to reduce the per-frame
594 overhead. In this case we can use an 80-bit MAC for video frames and
595 a 32-bit MAC for audio frames, similar to DTLS-SRTP cipher suites:
597 1- AES_CM_128_HMAC_SHA256_80
599 2- AES_CM_128_HMAC_SHA256_32
601 o [Optional] A signature algorithm. If signatures are supported, we
602 recommend using Ed25519.
604 4.5.2. DTLS-SRTP
606 SRTP is used for HBH encryption. Since the media payload is already
607 encrypted and SRTP only protects the RTP headers, an implementation
608 could use a 4-byte outer auth tag to decrease the overhead; however,
609 it is up to the application to use other ciphers like AES-128-GCM
610 with a full authentication tag.
612 5. Key Management
614 SFrame must be integrated with an E2EE key management framework to
615 exchange and rotate the encryption keys. This framework will
616 maintain a group of participant endpoints who are in the call. At
617 call setup time, each endpoint will create fresh key material and
618 optionally a signing key pair for that call and encrypt the key
619 material and the public signing key to every other endpoint. The
620 encrypted keys are delivered by the messaging delivery server using a
621 reliable channel.
623 The KMS will monitor the group changes, and exchange new keys when
624 necessary. It is up to the application to define this group; for
625 example, one application could have an ephemeral group for every call
626 and keep rotating keys when endpoints join or leave the call, while
627 another application could have a persisted group that can be used for
628 multiple calls and exchange keys with all group endpoints for every
629 call.
631 When new key material is created during the call, we recommend not
632 to start using it immediately in SFrame to give time for the new keys
633 to be delivered. If the application supports delivery receipts, they
634 can be used to track if the key is delivered to all other endpoints
635 on the call before using it.
637 Keys must have a sequential id starting from 0 and incremented every
638 time a new key is generated for this endpoint. The key id will be
639 added in the SFrame header during encryption, so the recipient knows
640 which key to use for the decryption.
642 5.1. MLS-SFrame
644 While any other E2EE KMS can be used with SFrame, there is a big
645 advantage if it is used with [MLSARCH] which natively supports very
646 large groups efficiently. When [MLSPROTO] is used, the endpoints'
647 keys (AKA Application secret) can be used directly for SFrame without
648 the need to exchange separate key material. The application secret
649 is rotated automatically by [MLSPROTO] when group membership changes.
651 6. Media Considerations
653 6.1. SFU
655 Selective Forwarding Units (SFUs) as described in
656 https://tools.ietf.org/html/rfc7667#section-3.7 receive the RTP
657 streams from each participant and select which ones should be
658 forwarded to each of the other participants. There are several
659 approaches about how to do this stream selection but in general, in
660 order to do so, the SFU needs to access metadata associated to each
661 frame and modify the RTP information of the incoming packets when
662 they are transmitted to the receiving participants.
664 This section describes how these normal SFU modes of operation
665 interact with the E2EE provided by SFrame.
667 6.1.1. LastN and RTP stream reuse
669 The SFU may choose to send only a certain number of streams based on
670 the voice activity of the participants. To reduce the number of SDP
671 O/A exchanges required to establish a new RTP stream, the SFU may
672 decide to reuse previously existing RTP sessions or even pre-allocate
673 a predefined number of RTP streams and choose at each moment in time
674 which participant's media will be sent through it. This means that
675 the same RTP stream (defined by either SSRC or MID) may carry
676 media from different streams of different participants. As different
677 keys are used by each participant for encrypting their media, the
678 receiver will be able to verify which is the sender of the media
679 coming within the RTP stream at any given point in time, preventing
680 the SFU from impersonating any of the participants with another
681 participant's media. Note that in order to prevent impersonation by
682 a malicious participant (not the SFU), usage of the signature is
683 required. In case of video, a new signature should be started
684 each time a key frame is sent to allow the receiver to identify the
685 source faster after a switch.
687 6.1.2. Simulcast
689 When using simulcast, the same input image will produce N different
690 encoded frames (one per simulcast layer) which would be processed
691 independently by the frame encryptor and assigned a unique counter
692 for each.
694 6.1.3. SVC
696 In both temporal and spatial scalability, the SFU may choose to drop
697 layers in order to match a certain bitrate or forward specific media
698 sizes or frames per second. In order to support it, the sender MUST
699 encode each spatial layer of a given picture in a different frame.
700 That is, an RTP frame may contain more than one SFrame encrypted
701 frame with an incrementing frame counter.
703 6.2. Video Key Frames
705 Forward and Post-Compromise Security requires that the e2ee keys are
706 updated anytime a participant joins/leave the call.
708 The key exchange happens async and on a different path than the SFU
709 signaling and media. So it may happen that when a new participant
710 joins the call and the SFU side requests a key frame, the sender
711 generates the e2ee encrypted frame with a key not known by the
712 receiver, so it will be discarded. When the sender updates his
713 sending key with the new key, it will send it in a non-key frame, so
714 the receiver will be able to decrypt it, but not decode it.
716 Receiver will re-request an key frame then, but due to sender and sfu
717 policies, that new key frame could take some time to be generated.
719 If the sender sends a key frame when the new e2ee key is in use, the
720 time required for the new participant to display the video is
721 minimized.
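One way to read the recommendation above is that the sender marks a
key frame as pending whenever it switches to a new e2ee key. The
sketch below illustrates only that idea; SenderState and its methods
are hypothetical names, and real encoders expose key frame requests
through their own APIs.

```python
from typing import Tuple


class SenderState:
    """Tracks the active e2ee key and whether a key frame is pending."""

    def __init__(self, key_id: int) -> None:
        self.key_id = key_id
        self.key_frame_pending = False

    def on_key_updated(self, new_key_id: int) -> None:
        # Switch to the new sending key and remember to emit a key
        # frame as the first frame encrypted with it.
        self.key_id = new_key_id
        self.key_frame_pending = True

    def next_frame(self) -> Tuple[int, str]:
        # Returns (key id used for encryption, frame type to encode).
        frame_type = "key" if self.key_frame_pending else "delta"
        self.key_frame_pending = False
        return self.key_id, frame_type


sender = SenderState(key_id=1)
before = sender.next_frame()
sender.on_key_updated(2)
after = sender.next_frame()
```

Here the first frame sent under key 2 is a key frame, so a newly
joined receiver can both decrypt and decode it without issuing a
further key frame request.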
723 6.3. Partial Decoding
725 Some codecs support partial decoding, where the decoder can decode
726 individual packets without waiting for the full frame to arrive. With
727 SFrame this won't be possible, because the decoder will not access
728 the packets until the entire frame has arrived and been decrypted.
730 7. Overhead
732 The encryption overhead will vary between audio and video streams.
733 In audio each packet is considered a separate frame, so every packet
734 carries an extra MAC and IV, whereas a video frame usually
735 consists of multiple RTP packets. The number of overhead bytes per
736 frame is calculated as 1 + FrameCounter length + 4, where the
737 constant 1 is the SFrame header byte and 4 is the length in bytes of
738 the HBH authentication tag for both audio and video packets.
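The per-frame formula above can be written as a small helper. This is
an illustrative sketch only; the function and constant names are
assumptions, and the conversion to bits per second simply multiplies
the per-frame bytes by 8 and by the frame rate.

```python
SFRAME_HEADER_BYTES = 1  # the constant 1 in the formula above
AUTH_TAG_BYTES = 4       # the 4-byte tag length assumed in this section


def overhead_bytes_per_frame(counter_len: int) -> int:
    # 1 + FrameCounter length + 4
    return SFRAME_HEADER_BYTES + counter_len + AUTH_TAG_BYTES


def overhead_bps(counter_len: int, frames_per_second: float) -> float:
    # Bytes per frame, converted to bits, times frames per second.
    return overhead_bytes_per_frame(counter_len) * 8 * frames_per_second


audio_20ms = overhead_bps(counter_len=1, frames_per_second=50)
video_30fps = overhead_bps(counter_len=1, frames_per_second=30)
```

For a 1-byte counter this gives 6 bytes per frame, that is, 2400 bps
for audio at 50 packets/s and 1440 bps for video at 30 fps.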
740 7.1. Audio
742 Using three different audio frame durations, 20ms (50 packets/s),
743 40ms (25 packets/s) and 100ms (10 packets/s), a frame counter of up to
744 3 bytes (3.8 days of data at 20ms) and a fixed 4-byte MAC length:
746 +------------+-----------+-----------+----------+-----------+
747 | Counter len| Packets   | Overhead  | Overhead | Overhead  |
748 |            |           | bps@20ms  | bps@40ms | bps@100ms |
749 +------------+-----------+-----------+----------+-----------+
750 | 1          | 0-255     | 2400      | 1200     | 480       |
751 | 2          | 256 - 65K | 2800      | 1400     | 560       |
752 | 3          | 65K - 16M | 3200      | 1600     | 640       |
753 +------------+-----------+-----------+----------+-----------+
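The 3.8 days quoted for a 3-byte counter at 20ms can be checked by
dividing the number of distinct counter values by the packet rate.
The helper below is a hypothetical sketch of that arithmetic, not part
of the SFrame format.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def counter_lifetime_days(counter_len: int, frames_per_second: float) -> float:
    # A counter of counter_len bytes can take 2^(8 * counter_len) values.
    distinct_values = 2 ** (8 * counter_len)
    return distinct_values / frames_per_second / SECONDS_PER_DAY


days_3_bytes_20ms = counter_lifetime_days(counter_len=3, frames_per_second=50)
```

With 2^24 values consumed at 50 packets/s, the counter lasts roughly
3.88 days before a longer counter is needed.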
755 7.2. Video
757 The per-stream overhead in bits per second is calculated for the
758 following video encodings: 30fps@1000Kbps (4 packets per frame),
759 30fps@512Kbps (2 packets per frame), 15fps@200Kbps (2 packets per
760 frame), and 7.5fps@30Kbps (1 packet per frame), as:
761 Overhead bps = (Counter length + 1 + 4) * 8 * fps
763 +------------+-----------+------------+------------+------------+
764 | Counter len| Frames    | Overhead   | Overhead   | Overhead   |
765 |            |           | bps@30fps  | bps@15fps  | bps@7.5fps |
766 +------------+-----------+------------+------------+------------+
767 | 1          | 0-255     | 1440       | 1440       | 720        |
768 | 2          | 256 - 65K | 1680       | 1680       | 840        |
769 | 3          | 65K - 16M | 1920       | 1920       | 960        |
770 | 4          | 16M - 4B  | 2160       | 2160       | 1080       |
771 +------------+-----------+------------+------------+------------+
773 7.3. SFrame vs PERC-lite
775 [PERC] has significant overhead over SFrame because the overhead is
776 per packet, not per frame, and because of the OHB (Original Header
777 Block), which duplicates any RTP header/extension field modified by
778 the SFU. [PERCLITE] is slightly better because it doesn't
780 use the OHB anymore; however, it still does per-packet encryption
781 using SRTP. Below is the overhead in [PERCLITE] as implemented by
782 CoSMo Software, which uses an extra 11 bytes per packet to preserve
783 the PT, SEQ_NUM, TIME_STAMP and SSRC fields in addition to the extra
784 MAC tag per packet.
786 OverheadPerPacket = 11 + MAC length
787 Overhead bps = PacketPerSecond * OverheadPerPacket * 8
789 As with SFrame, we assume the HBH authentication tag length is
790 always 4 bytes for audio and video, even though that is not the
791 case in this [PERCLITE] implementation.
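The two [PERCLITE] formulas above, combined with the assumed 4-byte
tag, can be sketched as follows. The function name and the default
argument are illustrative assumptions rather than anything defined by
[PERCLITE].

```python
PRESERVED_HEADER_BYTES = 11  # PT, SEQ_NUM, TIME_STAMP and SSRC copies


def perc_lite_overhead_bps(packets_per_second: float,
                           mac_len: int = 4) -> float:
    # OverheadPerPacket = 11 + MAC length
    overhead_per_packet = PRESERVED_HEADER_BYTES + mac_len
    # Overhead bps = PacketPerSecond * OverheadPerPacket * 8
    return packets_per_second * overhead_per_packet * 8


audio_20ms = perc_lite_overhead_bps(50)
video_30fps_4_packets = perc_lite_overhead_bps(30 * 4)
```

Because the cost is paid per packet, a 30 fps stream split into 4
packets per frame is charged for 120 packets per second, which is
where the gap to the per-frame SFrame overhead comes from.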
793 7.3.1. Audio
795 +-------------------+--------------------+--------------------+
796 | Overhead bps@20ms | Overhead bps@40ms  | Overhead bps@100ms |
797 +-------------------+--------------------+--------------------+
798 | 6000              | 3000               | 1200               |
799 +-------------------+--------------------+--------------------+
801 7.3.2. Video
803 +---------------------+----------------------+-----------------------+
804 | Overhead bps@30fps  | Overhead bps@15fps   | Overhead bps@7.5fps   |
805 |(4 packets per frame)| (2 packets per frame)| (1 packet per frame)  |
806 +---------------------+----------------------+-----------------------+
807 | 14400               | 7200                 | 3600                  |
808 +---------------------+----------------------+-----------------------+
810 For a conference with a single incoming audio stream (@ 50 pps) and 4
811 incoming video streams (@200 Kbps), the savings in overhead is 34800
812 - 9600 = ~25 Kbps, or ~3%.
814 8. Security Considerations
816 8.1. Key Management
818 The key exchange mechanism is out of scope of this document; however,
819 every client MUST change its keys when a client joins or leaves
820 the call for "Forward Secrecy" and "Post Compromise Security".
822 8.2. Authentication tag length
824 The cipher suites defined in this draft use short authentication
825 tags; however, SFrame can easily support other ciphers with full
826 authentication tags if the short ones are proven insecure.
828 9. IANA Considerations
830 This document makes no requests of IANA.
832 10. References
834 10.1. Normative References
836 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
837 Requirement Levels", BCP 14, RFC 2119,
838 DOI 10.17487/RFC2119, March 1997.
841 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
842 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
843 May 2017.
845 10.2. Informative References
847 [MLSARCH] Omara, E., Barnes, R., Rescorla, E., Inguva, S., Kwon, A.,
848 and A. Duric, "Messaging Layer Security Architecture",
849 2020.
851 [MLSPROTO]
852 Barnes, R., Millican, J., Omara, E., Cohn-Gordon, K., and
853 R. Robert, "Messaging Layer Security Protocol", 2020.
855 [PERC] Jennings, C., Jones, P., Barnes, R., and A. Roach, "PERC",
856 2020.
858 [PERCLITE]
859 GOUAILLARD, A. and S. Murillo, "PERC-Lite", 2020.
862 Authors' Addresses
864 Emad Omara
865 Google
867 Email: emadomara@google.com
869 Justin Uberti
870 Google
872 Email: juberti@google.com
874 Alexandre GOUAILLARD
875 CoSMo Software
877 Email: Alex.GOUAILLARD@cosmosoftware.io
879 Sergio Garcia Murillo
880 CoSMo Software
882 Email: sergio.garcia.murillo@cosmosoftware.io