idnits 2.17.1

draft-ietf-payload-rtp-jpegxs-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 8, 2020) is 1478 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 1014

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO21122-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO21122-2'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO21122-3'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'SMPTE-ST2110-10'

  -- Possible downref: Non-RFC (?) normative reference: ref.
     'SMPTE-ST2110-21'

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

avtcore                                                         S. Lugan
Internet-Draft                                               A. Descampe
Intended status: Standards Track                               C. Damman
Expires: October 10, 2020                                        intoPIX
                                                              T. Richter
                                                                     IIS
                                                              A. Willeme
                                                              UCL/ICTEAM
                                                           April 8, 2020

             RTP Payload Format for ISO/IEC 21122 (JPEG XS)
                    draft-ietf-payload-rtp-jpegxs-03

Abstract

   This document specifies a Real-Time Transport Protocol (RTP) payload
   format to be used for transporting JPEG XS (ISO/IEC 21122) encoded
   video. JPEG XS is a low-latency, lightweight image coding system.
   Compared to an uncompressed video use case, it allows higher
   resolutions and frame rates, while offering visually lossless
   quality, reduced power consumption, and end-to-end latency confined
   to a fraction of a frame.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 10, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions, Definitions, and Abbreviations
   3.  Media Format Description
     3.1.  Image Data Structures
     3.2.  Codestream
     3.3.  Video support box and colour specification box
     3.4.  JPEG XS Frame
   4.  Payload Format
     4.1.  RTP packetization
     4.2.  Payload Header
     4.3.  Payload Data
     4.4.  Traffic Shaping and Delivery Timing
   5.  Congestion Control Considerations
   6.  Payload Format Parameters
     6.1.  Media Type Definition
     6.2.  Mapping to SDP
       6.2.1.  General
       6.2.2.  Media type and subtype
       6.2.3.  Traffic shaping
       6.2.4.  Offer/Answer Considerations
   7.  IANA Considerations
   8.  Security Considerations
   9.  RFC Editor Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
     10.3.  URIs
   Authors' Addresses

1.  Introduction

   This document specifies a payload format for packetization of JPEG XS
   encoded video signals into the Real-time Transport Protocol (RTP)
   [RFC3550].

   The JPEG XS coding system offers compression and recompression of
   image sequences with very moderate computational resources while
   remaining robust under multiple compression and decompression cycles
   and mixing of content sources, e.g. embedding of subtitles, overlays
   or logos. Typical target compression ratios ensuring visually
   lossless quality are in the range of 2:1 to 10:1, depending on the
   nature of the source material. The end-to-end latency can be
   confined to a fraction of a frame, typically ranging from a small
   number of lines down to below a single line.

2.  Conventions, Definitions, and Abbreviations

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Application Data Unit (ADU)
      The unit of source data provided as payload to the transport
      layer, and corresponding, in this RTP payload definition, to a
      single JPEG XS frame.

   Colour specification box (CS box)
      An ISO colour specification box defined in ISO/IEC 21122-3
      [ISO21122-3] that includes colour-related metadata required to
      correctly display JPEG XS frames, such as colour primaries,
      transfer characteristics and matrix coefficients.

   EOC marker
      A marker that consists of the two bytes 0xff11 indicating the end
      of a JPEG XS codestream.

   JPEG XS codestream
      A sequence of bytes representing a compressed image formatted
      according to JPEG XS Part-1 [ISO21122-1].

   JPEG XS codestream header
      A sequence of bytes, starting with a SOC marker, at the beginning
      of each JPEG XS codestream, encoded in multiple markers and marker
      segments, that does not carry entropy coded data, but metadata
      such as the frame dimension and component precision.

   JPEG XS frame
      The concatenation of a video support box, as defined in ISO/IEC
      21122-3 [ISO21122-3], a colour specification box, as defined in
      ISO/IEC 21122-3 [ISO21122-3] as well, and either one JPEG XS
      codestream in the case of a progressive frame or two JPEG XS
      codestreams in the case of an interlaced frame.

   JPEG XS header segment
      The concatenation of a video support box, as defined in ISO/IEC
      21122-3 [ISO21122-3], a colour specification box, as defined in
      ISO/IEC 21122-3 [ISO21122-3] as well, and a JPEG XS codestream
      header.

   JPEG XS stream
      A sequence of JPEG XS frames.

   Marker
      A two-byte functional sequence that is part of a JPEG XS
      codestream starting with a 0xff byte and a subsequent byte
      defining its function.

   Marker segment
      A marker along with a 16-bit marker size and payload data
      following the size.

   Packetization unit
      A portion of an Application Data Unit whose boundaries shall
      coincide with boundaries of RTP packet payloads, i.e. the first
      (resp. last) byte of a packetization unit shall be the first
      (resp. last) byte of an RTP packet payload.

   Slice
      The smallest independently decodable unit of a JPEG XS codestream,
      bearing in mind that it decodes to wavelet coefficients which
      still require inverse wavelet filtering to give an image.

   SOC marker
      A marker that consists of the two bytes 0xff10 indicating the
      start of a JPEG XS codestream.

   Video support box (VS box)
      An ISO video support box defined in ISO/IEC 21122-3 [ISO21122-3]
      that includes metadata required to play back a JPEG XS stream,
      such as its maximum bitrate, its subsampling structure, its buffer
      model and its frame rate.

3.  Media Format Description

3.1.  Image Data Structures

   JPEG XS is a low-latency lightweight image coding system for coding
   continuous-tone grayscale or continuous-tone colour digital images.

   This coding system provides an efficient representation of image
   signals through the mathematical tool of wavelet analysis. The
   wavelet filter process separates each component into multiple bands,
   where each band consists of multiple coefficients describing the
   image signal of a given component within a frequency domain specific
   to the wavelet filter type, i.e. the particular filter corresponding
   to the band.

   Wavelet coefficients are grouped into precincts, where each precinct
   includes all coefficients over all bands that contribute to a spatial
   region of the image.

   One or multiple precincts are furthermore combined into slices
   consisting of an integer number of precincts. Precincts do not cross
   slice boundaries, and wavelet coefficients in precincts that are part
   of different slices can be decoded independently from each other.
   Note, however, that the wavelet transformation runs across slice
   boundaries. A slice always extends over the full width of the image,
   but may only cover parts of its height.

3.2.  Codestream

   A JPEG XS codestream header, followed by several slices, and
   terminated by an EOC marker form a JPEG XS codestream.

   The overall codestream format, including the definition of all
   markers, is further defined in ISO/IEC 21122-1 [ISO21122-1]. It
   represents sample values of a single image, without any
   interpretation relative to a colour space.

3.3.  Video support box and colour specification box

   While the information defined in the codestream is sufficient to
   reconstruct the sample values of one image, the interpretation of the
   samples remains undefined by the codestream itself. This
   interpretation is given by the video support box and the colour
   specification box, which contain the information needed to correctly
   play the JPEG XS stream. The layout and syntax of these boxes,
   together with their content, are defined in ISO/IEC 21122-3
   [ISO21122-3]. The video support box provides information on the
   maximum bitrate, the frame rate, the subsampling image format, the
   timecode of the current JPEG XS frame, the profile, level and
   sublevel used (as defined in ISO/IEC 21122-2 [ISO21122-2]), and
   optionally on the buffer model and the mastering display metadata.
   The colour specification box indicates the colour primaries, transfer
   characteristics, matrix coefficients and video full range flag needed
   to specify the colour space of the video stream.

3.4.  JPEG XS Frame

   The concatenation of a video support box, a colour specification box
   and one or two JPEG XS codestreams forms a JPEG XS frame. In the
   case of a video stream made of progressive frames, only one
   codestream follows the boxes. In the case of a video stream made of
   interlaced frames, two codestreams follow the boxes, each
   corresponding to a field of the interlaced frame. The video
   information box included in the video support box contains a frat
   field indicating if the frame is progressive or interlaced (see ISO/
   IEC 21122-3 [ISO21122-3]). This information can also be found in
   each RTP packet header (see Section 4.2).

4.  Payload Format

   This section specifies the payload format for JPEG XS streams over
   the Real-time Transport Protocol (RTP) [RFC3550].

   In order to be transported over RTP, each JPEG XS stream is
   transported in a distinct RTP stream, identified by a distinct SSRC.

   A JPEG XS stream is divided into Application Data Units (ADUs), each
   ADU corresponding to a single JPEG XS frame.

4.1.  RTP packetization

   An ADU is made of several packetization units. If a packetization
   unit is bigger than the maximum size of an RTP packet payload, the
   unit is split into multiple RTP packet payloads, as illustrated in
   Figure 1. As seen there, each packet shall contain (part of) one and
   only one packetization unit. A packetization unit may extend over
   multiple packets. The payload of every packet shall have the same
   size (based e.g. on the Maximum Transmission Unit of the network),
   except (possibly) the last packet of a packetization unit. The
   boundaries of a packetization unit shall coincide with the boundaries
   of the payload of a packet, i.e. the first (resp. last) byte of the
   packetization unit shall be the first (resp. last) byte of the
   payload.

   RTP        +-----+------------------------+
   Packet #1  | Hdr | Packetization unit #1  |
              +-----+------------------------+
   RTP        +-----+--------------------------------------+
   Packet #2  | Hdr | Packetization unit #2                |
              +-----+--------------------------------------+
   RTP        +-----+--------------------------------------------------+
   Packet #3  | Hdr | Packetization unit #3 (part 1/3)                 |
              +-----+--------------------------------------------------+
   RTP        +-----+--------------------------------------------------+
   Packet #4  | Hdr | Packetization unit #3 (part 2/3)                 |
              +-----+--------------------------------------------------+
   RTP        +-----+----------------------------------------------+
   Packet #5  | Hdr | Packetization unit #3 (part 3/3)             |
              +-----+----------------------------------------------+
     ...
   RTP        +-----+-----------------------------------------+
   Packet #P  | Hdr | Packetization unit #N (part q/q)        |
              +-----+-----------------------------------------+

                 Figure 1: Example of ADU packetization

   There are two different packetization modes defined for this RTP
   payload format.

   1.  Codestream packetization mode: in this mode, the packetization
       unit shall be the entire codestream, preceded by boxes, if any.
       This means that a progressive frame will have a single
       packetization unit, while an interlaced frame will have two. The
       progressive case is illustrated in Figure 2.

   2.  Slice packetization mode: in this mode, the packetization unit
       shall be the slice, i.e. there shall be data from no more than
       one slice per RTP packet. The first packetization unit shall be
       made of the JPEG XS header segment (i.e. the concatenation of the
       VS box, the CS box and the JPEG XS codestream header). This
       first unit is then followed by successive units, each containing
       one and only one slice. The packetization unit containing the
       last slice of a JPEG XS codestream shall also contain the EOC
       marker immediately following this last slice. This is
       illustrated in Figure 3. In the case of an interlaced frame, the
       JPEG XS codestream header of the second field shall be in its own
       packetization unit.

   RTP        +-----+--------------------------------------------------+
   Packet #1  | Hdr | VS box + CS box + JPEG XS codestream (part 1/q)  |
              +-----+--------------------------------------------------+
   RTP        +-----+--------------------------------------------------+
   Packet #2  | Hdr | JPEG XS codestream (part 2/q)                    |
              +-----+--------------------------------------------------+
     ...
   RTP        +-----+--------------------------------------+
   Packet #P  | Hdr | JPEG XS codestream (part q/q)        |
              +-----+--------------------------------------+

           Figure 2: Example of codestream packetization mode

   RTP        +-----+----------------------------+
   Packet #1  | Hdr | JPEG XS header segment     |
              +-----+----------------------------+
   RTP        +-----+--------------------------------------------------+
   Packet #2  | Hdr | Slice #1 (part 1/2)                              |
              +-----+--------------------------------------------------+
   RTP        +-----+-------------------------------------------+
   Packet #3  | Hdr | Slice #1 (part 2/2)                       |
              +-----+-------------------------------------------+
   RTP        +-----+--------------------------------------------------+
   Packet #4  | Hdr | Slice #2 (part 1/3)                              |
              +-----+--------------------------------------------------+
     ...
   RTP        +-----+---------------------------------------+
   Packet #P  | Hdr | Slice #N (part q/q) + EOC marker      |
              +-----+---------------------------------------+

             Figure 3: Example of slice packetization mode

   Thanks to the constant bit-rate of JPEG XS, the codestream
   packetization mode guarantees that a JPEG XS RTP stream will produce
   a constant number of bytes per frame, and a constant number of RTP
   packets per frame. To reach the same guarantee with the slice
   packetization mode, an additional constraint needs to be set at the
   rate allocation stage in the JPEG XS encoder. For instance, one
   option would be to impose a constant bit-rate at the slice level.

4.2.  Payload Header

   Figure 4 illustrates the RTP payload header used in order to
   transport a JPEG XS stream.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |T|K|L| I |F counter|     SEP counter     |      P counter      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 4: RTP and payload headers

   The version (V), padding (P), extension (X), CSRC count (CC),
   sequence number, synchronization source (SSRC) and contributing
   source (CSRC) fields follow their respective definitions in RFC 3550
   [RFC3550].

   The timestamp SHOULD be based on a 90 kHz clock reference.

   As specified in RFC 3550 [RFC3550] and RFC 4175 [RFC4175], the
   RTP timestamp designates the sampling instant of the first octet of
   the frame to which the RTP packet belongs. Packets shall not include
   data from multiple frames, and all packets belonging to the same
   frame shall have the same timestamp. Several successive RTP packets
   will consequently have equal timestamps if they belong to the same
   frame (that is, until the marker bit is set to 1, marking the last
   packet of the frame), and the timestamp is only increased when a new
   frame begins.

   If the sampling instant does not correspond to an integer value of
   the clock, the value shall be truncated to the next lowest integer,
   with no ambiguity.

   The remaining fields are defined as follows:

   Marker (M) [1 bit]:
      The M bit is used to indicate the last packet of a frame. This
      enables a decoder to finish decoding the frame.

   Payload Type (PT) [7 bits]:
      A dynamically allocated payload type field that designates the
      payload as JPEG XS video.

   Transmission mode (T) [1 bit]:
      The T bit is set to indicate that packets are sent sequentially by
      the transmitter. A receiver could use this information to
      dimension its input buffer(s) accordingly. If T=0, nothing can be
      assumed about the transmission order and packets may be sent out-
      of-order by the transmitter. If T=1, packets must be sent
      sequentially by the transmitter.

   pacKetization mode (K) [1 bit]:
      The K bit is set to indicate which packetization mode is used.
      K=0 indicates codestream packetization mode, while K=1 indicates
      slice packetization mode. If Transmission mode (T) is set to 0,
      slice packetization mode must be used and K must be set to 1.

   Last (L) [1 bit]:
      The L bit is set to indicate the last packet of a packetization
      unit. As the end of the frame also ends the packet containing the
      last unit of the frame, the L bit is set whenever the M bit is
      set. In the case of a progressive frame using the codestream
      packetization mode, the L bit and M bit are equivalent.

   Interlaced mode (I) [2 bits]:
      These 2 bits are used to indicate how the JPEG XS frame is scanned
      (progressive or interlaced).

      00: The payload is progressively scanned.

      01: Reserved for future use.

      10: The payload is part of the first field of an interlaced video
      frame. The height specified in the JPEG XS picture header is
      half of the height of the entire displayed image.

      11: The payload is part of the second field of an interlaced
      video frame. The height specified in the JPEG XS picture header
      is half of the height of the entire displayed image.

   F counter [5 bits]:
      The frame (F) counter identifies the frame number modulo 32 to
      which a packet belongs. Frame numbers are incremented by 1 for
      each frame transmitted. The frame number, in addition to the
      timestamp, may help the decoder manage its input buffer and bring
      packets back into their natural order.

   SEP counter [11 bits]:
      The Slice and Extended Packet (SEP) counter is used differently
      depending on the packetization mode.

      *  In the case of codestream packetization mode (K=0), this
         counter resets whenever the Packet counter resets (see
         hereunder), and increments by 1 whenever the Packet counter
         overruns.

      *  In the case of slice packetization mode (K=1), this counter
         identifies the slice modulo 2047 to which the packet
         contributes. If the data belongs to the JPEG XS header
         segment, this field shall have its maximal value, namely 2047 =
         0x07ff. Otherwise, it is the slice index modulo 2047. Slice
         indices are counted from 0 (corresponding to the top of the
         frame).

   P counter [11 bits]:
      The packet (P) counter identifies the packet number modulo 2048
      within the current packetization unit. It is set to 0 at the
      start of the packetization unit and incremented by 1 for every
      subsequent packet (if any) belonging to the same unit.
      Practically, if codestream packetization mode is enabled, this
      field counts the packets within a codestream and is extended by
      the SEP counter when it overruns. If slice packetization mode is
      enabled, this field counts the packets within a slice or within
      the JPEG XS header segment.

4.3.  Payload Data

   The payload data of a JPEG XS RTP stream consists of a concatenation
   of multiple JPEG XS frames.

   Each JPEG XS frame is the concatenation of one or more packetization
   unit(s), as explained in Section 4.1. Figure 5 depicts this layout
   for an interlaced frame in the codestream packetization mode and
   Figure 6 depicts this layout for a progressive frame in the slice
   packetization mode.
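
   The payload header fields defined in Section 4.2 can be unpacked as
   sketched below. This is only an illustrative sketch, not part of the
   payload format: the names JxsPayloadHeader, parse_payload_header,
   pack_payload_header and extended_packet_index are hypothetical, and
   the bit positions are an assumption read off Figure 4 (T, K, L,
   2-bit I, 5-bit F counter, 11-bit SEP counter, 11-bit P counter, in
   network byte order).

```python
import struct
from dataclasses import dataclass


@dataclass
class JxsPayloadHeader:
    """Hypothetical container for the 32-bit JPEG XS payload header."""
    t: int            # Transmission mode (1 bit)
    k: int            # pacKetization mode (1 bit)
    l: int            # Last packet of a packetization unit (1 bit)
    i: int            # Interlaced mode (2 bits)
    f_counter: int    # Frame counter, modulo 32 (5 bits)
    sep_counter: int  # Slice and Extended Packet counter (11 bits)
    p_counter: int    # Packet counter, modulo 2048 (11 bits)


def parse_payload_header(payload: bytes) -> JxsPayloadHeader:
    """Unpack the first four payload bytes (assumed layout of Figure 4)."""
    if len(payload) < 4:
        raise ValueError("payload header needs at least 4 bytes")
    (word,) = struct.unpack("!I", payload[:4])  # big-endian 32-bit word
    return JxsPayloadHeader(
        t=(word >> 31) & 0x1,
        k=(word >> 30) & 0x1,
        l=(word >> 29) & 0x1,
        i=(word >> 27) & 0x3,
        f_counter=(word >> 22) & 0x1F,
        sep_counter=(word >> 11) & 0x7FF,
        p_counter=word & 0x7FF,
    )


def pack_payload_header(h: JxsPayloadHeader) -> bytes:
    """Inverse of parse_payload_header; out-of-range bits are masked off."""
    word = (
        ((h.t & 0x1) << 31)
        | ((h.k & 0x1) << 30)
        | ((h.l & 0x1) << 29)
        | ((h.i & 0x3) << 27)
        | ((h.f_counter & 0x1F) << 22)
        | ((h.sep_counter & 0x7FF) << 11)
        | (h.p_counter & 0x7FF)
    )
    return struct.pack("!I", word)


def extended_packet_index(h: JxsPayloadHeader) -> int:
    """In codestream mode (K=0) the SEP counter extends the P counter,
    so the packet index within the unit is SEP * 2048 + P."""
    if h.k != 0:
        raise ValueError("SEP extends P only in codestream mode (K=0)")
    return h.sep_counter * 2048 + h.p_counter
```

   In slice packetization mode (K=1) the SEP counter would instead be
   read as the slice index, with 0x07ff marking the JPEG XS header
   segment, so extended_packet_index deliberately rejects that case.
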
   The Frame counter value is not indicated in these figures because
   the value is constant for all packetization units of a given frame.

   +=====[ Packetization unit (PU) #1 ]====+
   |           Video support box           |  SEP counter = 0
   |  +---------------------------------+  |  P counter = 0
   |  :     Sub boxes of the VS box     :  |
   |  +---------------------------------+  |
   +- - - - - - - - - - - - - - - - - - - -+
   |       Colour specification box        |
   |  +---------------------------------+  |
   |  :      Fields of the CS box       :  |
   |  +---------------------------------+  |
   +- - - - - - - - - - - - - - - - - - - -+
   |     JPEG XS codestream (field 0)      |
   :              (part 1/q)               :  M=0, K=0, L=0, I=10
   +---------------------------------------+
   |     JPEG XS codestream (field 0)      |  SEP counter = 0
   |              (part 2/q)               |  P counter = 1
   :                                       :  M=0, K=0, L=0, I=10
   +---------------------------------------+
   |     JPEG XS codestream (field 0)      |  SEP counter = 0
   |              (part 3/q)               |  P counter = 2
   :                                       :  M=0, K=0, L=0, I=10
   +---------------------------------------+
   :                                       :
   +---------------------------------------+
   |     JPEG XS codestream (field 0)      |  SEP counter = 1
   |             (part 2049/q)             |  P counter = 0
   :                                       :  M=0, K=0, L=0, I=10
   +---------------------------------------+
   :                                       :
   +---------------------------------------+
   |     JPEG XS codestream (field 0)      |  SEP counter = (q-1) div 2048
   |              (part q/q)               |  P counter = (q-1) mod 2048
   :                                       :  M=0, K=0, L=1, I=10
   +===============[ PU #2 ]===============+
   |     JPEG XS codestream (field 1)      |  SEP counter = 0
   |              (part 1/q)               |  P counter = 0
   :                                       :  M=0, K=0, L=0, I=11
   +---------------------------------------+
   |     JPEG XS codestream (field 1)      |  SEP counter = 0
   |              (part 2/q)               |  P counter = 1
   :                                       :  M=0, K=0, L=0, I=11
   +---------------------------------------+
   :                                       :
   +---------------------------------------+
   |     JPEG XS codestream (field 1)      |  SEP counter = (q-1) div 2048
   |              (part q/q)               |  P counter = (q-1) mod 2048
   :                                       :  M=1, K=0, L=1, I=11
   +=======================================+

   Figure 5: Example of JPEG XS Payload Data (codestream packetization
                         mode, interlaced frame)

   +====[ PU#1: JPEG XS Header segment ]===+
   |           Video support box           |  SEP counter = 0x07FF
   |  +---------------------------------+  |  P counter = 0
   |  :     Sub boxes of the VS box     :  |
   |  +---------------------------------+  |
   +- - - - - - - - - - - - - - - - - - - -+
   |       Colour specification box        |
   |  +---------------------------------+  |
   |  :      Fields of the CS box       :  |
   |  +---------------------------------+  |
   +- - - - - - - - - - - - - - - - - - - -+
   |       JPEG XS codestream header       |
   |  +---------------------------------+  |
   |  :   Markers and marker segments   :  |
   |  +---------------------------------+  |  M=0, T=0, K=1, L=1, I=00
   +==========[ PU#2: Slice #1 ]===========+
   |  +---------------------------------+  |  SEP counter = 0
   |  |           SLH Marker            |  |  P counter = 0
   |  +---------------------------------+  |
   |  :       Entropy Coded Data        :  |
   |  +---------------------------------+  |  M=0, T=0, K=1, L=1, I=00
   +==========[ PU#3: Slice #2 ]===========+
   |               Slice #2                |  SEP counter = 1
   |              (part 1/q)               |  P counter = 0
   :                                       :  M=0, T=0, K=1, L=0, I=00
   +---------------------------------------+
   |               Slice #2                |  SEP counter = 1
   |              (part 2/q)               |  P counter = 1
   :                                       :  M=0, T=0, K=1, L=0, I=00
   +---------------------------------------+
   :                                       :
   +---------------------------------------+
   |               Slice #2                |  SEP counter = 1
   |              (part q/q)               |  P counter = q-1
   :                                       :  M=0, T=0, K=1, L=1, I=00
   +=======================================+
   :                                       :
   +========[ PU#N: Slice #(N-1) ]=========+
   |             Slice #(N-1)              |  SEP counter = N-2
   |              (part 1/r)               |  P counter = 0
   :                                       :  M=0, T=0, K=1, L=0, I=00
   +---------------------------------------+
   :                                       :
   +---------------------------------------+
   |             Slice #(N-1)              |  SEP counter = N-2
   |              (part r/r)               |  P counter = r-1
   :             + EOC marker              :  M=1, T=0, K=1, L=1, I=00
   +=======================================+

   Figure 6: Example of JPEG XS Payload Data (slice packetization mode,
                           progressive frame)

4.4.  Traffic Shaping and Delivery Timing

   The traffic shaping and delivery timing shall be in accordance with
   the Network Compatibility Model compliance definitions specified in
   SMPTE ST 2110-21 [SMPTE-ST2110-21] for either Narrow Linear Senders
   (Type NL) or Wide Senders (Type W). The session description shall
   include a format-specific parameter of either TP=2110TPNL or
   TP=2110TPW to indicate compliance with Type NL or Type W
   respectively.

   NOTE: The Virtual Receiver Buffer Model compliance definitions of ST
   2110-21 do not apply.

5.  Congestion Control Considerations

   Congestion control for RTP SHALL be used in accordance with RFC 3550
   [RFC3550], and with any applicable RTP profile: e.g., RFC 3551
   [RFC3551]. An additional requirement if best-effort service is being
   used is that users of this payload format MUST monitor packet loss
   to ensure that the packet loss rate is within acceptable parameters.
   Circuit Breakers [RFC8083] is an update to RTP [RFC3550] that defines
   criteria for when one is required to stop sending RTP Packet Streams
   and applications implementing this standard MUST comply with it. RFC
   8085 [RFC8085] provides additional information on the best practices
   for applying congestion control to UDP streams.

6.  Payload Format Parameters

6.1.  Media Type Definition

   Type name: video

   Subtype name: jxsv

   Required parameters:

      rate: The RTP timestamp clock rate. Applications using this
      payload format SHOULD use a value of 90000.

      transmission mode: Indicates if packets are sent sequentially by
      the transmitter. A receiver could use this information to
      dimension its input buffer(s) accordingly.
      If set to 0, nothing can be assumed about the transmission order
      and packets may be sent out-of-order. If set to 1, packets must
      be sent sequentially by the transmitter.

   Optional parameters:

      profile: The JPEG XS profile in use, as defined in ISO/IEC 21122-2
      (JPEG XS Part 2) [ISO21122-2].

      level: The JPEG XS level in use, as defined in ISO/IEC 21122-2
      (JPEG XS Part 2) [ISO21122-2].

      sublevel: The JPEG XS sublevel in use, as defined in ISO/IEC
      21122-2 (JPEG XS Part 2) [ISO21122-2].

      sampling: Signals the colour difference signal sub-sampling
      structure.

      Signals utilizing the non-constant luminance Y'C'B C'R signal
      format of Recommendation ITU-R BT.601-7, Recommendation ITU-R
      BT.709-6, Recommendation ITU-R BT.2020-2, or Recommendation ITU-R
      BT.2100 shall use the appropriate one of the following values for
      the Media Type Parameter "sampling":

         YCbCr-4:4:4    (4:4:4 sampling)
         YCbCr-4:2:2    (4:2:2 sampling)
         YCbCr-4:2:0    (4:2:0 sampling)

      Signals utilizing the Constant Luminance Y'C C'BC C'RC signal
      format of Recommendation ITU-R BT.2020-2 shall use the appropriate
      one of the following values for the Media Type Parameter
      "sampling":

         CLYCbCr-4:4:4  (4:4:4 sampling)
         CLYCbCr-4:2:2  (4:2:2 sampling)
         CLYCbCr-4:2:0  (4:2:0 sampling)

      Signals utilizing the constant intensity I CT CP signal format of
      Recommendation ITU-R BT.2100 shall use the appropriate one of the
      following values for the Media Type Parameter "sampling":

         ICtCp-4:4:4    (4:4:4 sampling)
         ICtCp-4:2:2    (4:2:2 sampling)
         ICtCp-4:2:0    (4:2:0 sampling)

      Signals utilizing the 4:4:4 R' G' B' or RGB signal format (such as
      that of Recommendation ITU-R BT.601, Recommendation ITU-R BT.709,
      Recommendation ITU-R BT.2020, Recommendation ITU-R BT.2100, SMPTE
      ST 2065-1 or ST 2065-3) shall use the following value for the
      Media Type Parameter "sampling":
693 RGB RGB or R' G' B' samples 695 Signals utilizing the 4:4:4 X' Y' Z' signal format (such as that defined 696 in SMPTE ST 428-1) shall use the following value for the Media Type 697 Parameter sampling. 699 XYZ X' Y' Z' samples 701 Key signals as defined in SMPTE RP 157 shall use the value KEY for 702 the Media Type Parameter sampling. The Key signal is represented 703 as a single component. 705 KEY samples of the key signal 707 depth: Determines the number of bits per sample. This is an 708 integer with typical values including 8, 10, 12, and 16. 710 width: Determines the number of pixels per line. This is an 711 integer between 1 and 32767. 713 height: Determines the number of lines per frame. This is an 714 integer between 1 and 32767. 716 exactframerate: Signals the frame rate in frames per second. 717 Integer frame rates shall be signaled as a single decimal number 718 (e.g. "25") whilst non-integer frame rates shall be signaled as a 719 ratio of two integer decimal numbers separated by a "forward-slash" 720 character (e.g. "30000/1001"), utilizing the numerically smallest 721 numerator value possible. 723 colorimetry: Specifies the system colorimetry used by the image 724 samples.
Valid values and their specification are: 726 BT601-5 ITU Recommendation BT.601-5 727 BT709-2 ITU Recommendation BT.709-2 728 SMPTE240M SMPTE standard 240M 729 BT601 as specified in Recommendation ITU-R BT.601-7 730 BT709 as specified in Recommendation ITU-R BT.709-6 731 BT2020 as specified in Recommendation ITU-R BT.2020-2 732 BT2100 as specified in Recommendation ITU-R BT.2100 733 Table 2 titled "System colorimetry" 734 ST2065-1 as specified in SMPTE ST 2065-1 Academy Color 735 Encoding Specification (ACES) 736 ST2065-3 as specified for Academy Density Exchange 737 Encoding (ADX) in SMPTE ST 2065-3 738 XYZ as specified in ISO 11664-1 section titled 739 "1931 Observer" 741 Signals utilizing the Recommendation ITU-R BT.2100 colorimetry 742 should also signal the representational range using the optional 743 parameter RANGE defined below. 745 interlace: If this OPTIONAL parameter name is present, it indicates 746 that the video is interlaced. If this parameter name is not 747 present, the progressive video format shall be assumed. 749 TCS: Transfer Characteristic System. This parameter specifies the 750 transfer characteristic system of the image samples. Valid values 751 and their specification are: 753 SDR (Standard Dynamic Range) Video streams of standard 754 dynamic range, that utilize the OETF of Recommendation 755 ITU-R BT.709 or Recommendation ITU-R BT.2020. Such 756 streams shall be assumed to target the EOTF specified 757 in ITU-R BT.1886. 758 PQ Video streams of high dynamic range video that utilize 759 the Perceptual Quantization system of Recommendation 760 ITU-R BT.2100 761 HLG Video streams of high dynamic range video that utilize 762 the Hybrid Log-Gamma system of Recommendation ITU-R 763 BT.2100 765 RANGE: This parameter should be used to signal the encoding range 766 of the sample values within the stream. 
When paired with ITU Rec 767 BT.2100 colorimetry, this parameter has two allowed values, NARROW 768 and FULL, corresponding to the ranges specified in Table 9 of ITU 769 Rec BT.2100. In any other context, this parameter has three 770 allowed values: NARROW, FULLPROTECT, and FULL, which correspond to 771 the ranges specified in SMPTE RP 2077. In the absence of this 772 parameter, NARROW shall be the assumed value in either case. 774 Encoding considerations: 775 This media type is framed and binary; see Section 4.8 in RFC 6838 776 [RFC6838]. 778 Security considerations: 779 Please see the Security Considerations section in RFC XXXX. 781 6.2. Mapping to SDP 783 6.2.1. General 785 A Session Description Protocol (SDP) object shall be created for each 786 RTP stream and it shall be in accordance with the provisions of SMPTE 787 ST 2110-10 [SMPTE-ST2110-10]. 789 The information carried in the media type specification has a 790 specific mapping to fields in the Session Description Protocol (SDP), 791 which is commonly used to describe RTP sessions. 793 6.2.2. Media type and subtype 795 The media type ("video") goes in SDP "m=" as the media name. 797 The media subtype ("jxsv") goes in SDP "a=rtpmap" as the encoding 798 name, followed by a slash ("/") and the required parameter "rate" 799 corresponding to the RTP timestamp clock rate (which for the payload 800 format defined in this document SHOULD be 90000), followed by a slash 801 ("/") and the required parameter "transmission mode" set to 1 if 802 packets are sent sequentially by the transmitter, or 0 if 803 transmission order is not constrained. The optional parameters go in 804 the SDP "a=fmtp" attribute by copying them directly from the MIME 805 media type string as a semicolon-separated list of parameter=value 806 pairs.
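As a non-normative illustration, the mapping described above can be sketched in Python. The function names exact_framerate and jxsv_sdp, their arguments, and the use of a dictionary for the "a=fmtp" parameters are assumptions made for this sketch and are not defined by this document. The checks shown cover only the TP values of Section 4.4, the two transmission mode values, and the RANGE values listed in Section 6.1.

```python
from fractions import Fraction


def exact_framerate(rate):
    """Format a Fraction as an "exactframerate" value (hypothetical helper)."""
    # Fraction keeps the ratio in lowest terms, which gives the
    # numerically smallest numerator for a non-integer rate.
    if rate.denominator == 1:
        return str(rate.numerator)
    return f"{rate.numerator}/{rate.denominator}"


def jxsv_sdp(port, payload_type, transmission_mode, params, clock_rate=90000):
    """Build the m=, a=rtpmap and a=fmtp lines for one stream (sketch only).

    params maps parameter names to values; a value of None stands for a
    flag that is signaled by its name alone, such as "interlace".
    """
    if transmission_mode not in (0, 1):
        raise ValueError("transmission mode must be 0 or 1")
    if params.get("TP") not in ("2110TPNL", "2110TPW"):
        raise ValueError("TP must be 2110TPNL or 2110TPW")
    if "RANGE" in params:
        if params.get("colorimetry") == "BT2100":
            allowed = ("NARROW", "FULL")
        else:
            allowed = ("NARROW", "FULLPROTECT", "FULL")
        if params["RANGE"] not in allowed:
            raise ValueError("RANGE value not allowed for this colorimetry")
    # The fmtp value is a single line of semicolon-separated pairs.
    fmtp = "; ".join(
        name if value is None else f"{name}={value}"
        for name, value in params.items()
    )
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} jxsv/{clock_rate}/{transmission_mode}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]


if __name__ == "__main__":
    for line in jxsv_sdp(30000, 112, 1, {
        "sampling": "YCbCr-4:2:2",
        "width": 1920,
        "height": 1080,
        "depth": 10,
        "exactframerate": exact_framerate(Fraction(60000, 1001)),
        "colorimetry": "BT709",
        "TCS": "SDR",
        "RANGE": "FULL",
        "TP": "2110TPNL",
    }):
        print(line)
```

The sketch emits the "a=fmtp" attribute as one unwrapped line, since any wrapping is a presentation matter only.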
808 A sample SDP mapping for JPEG XS video is as follows: 810 m=video 30000 RTP/AVP 112 811 a=rtpmap:112 jxsv/90000/1 812 a=fmtp:112 sampling=YCbCr-4:2:2; width=1920; height=1080; 813 depth=10; colorimetry=BT709; TCS=SDR; 814 RANGE=FULL; TP=2110TPNL 816 In this example, a JPEG XS RTP stream is being sent to UDP 817 destination port 30000, with an RTP dynamic payload type of 112 and a 818 media clock rate of 90000 Hz. Note that the "a=fmtp:" line has been 819 wrapped to fit this page, and will be a single long line in the SDP 820 file. 822 6.2.3. Traffic shaping 824 The SDP object shall include the TP parameter (either 2110TPNL or 825 2110TPW as specified in Section 4.4) and may include the CMAX 826 parameter as specified in SMPTE ST 2110-21 [SMPTE-ST2110-21]. 828 6.2.4. Offer/Answer Considerations 830 The following considerations apply when using SDP offer/answer 831 procedures [RFC3264] to negotiate the use of the JPEG XS payload in 832 RTP: 834 o The "encode" parameter can be used for sendrecv, sendonly, and 835 recvonly streams. Each encode type MUST use a separate payload 836 type number. 838 o Any unknown parameter in an offer MUST be ignored by the receiver 839 and MUST NOT be included in the answer. 841 7. IANA Considerations 843 This memo requests that IANA register video/jxsv as specified in 844 Section 6.1. The media type is also requested to be added to the 845 IANA registry for "RTP Payload Format MIME types" [1]. 847 8. Security Considerations 849 RTP packets using the payload format defined in this specification 850 are subject to the security considerations discussed in the RTP 851 specification [RFC3550] and in any applicable RTP profile such as 852 RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or 853 RTP/SAVPF [RFC5124]. This implies that confidentiality of the media 854 streams is achieved by encryption.
856 However, as "Securing the RTP Framework: Why RTP Does Not Mandate a 857 Single Media Security Solution" [RFC7202] discusses, it is not an RTP 858 payload format's responsibility to discuss or mandate what solutions 859 are used to meet the basic security goals like confidentiality, 860 integrity, and source authenticity for RTP in general. This 861 responsibility lies on anyone using RTP in an application. They can 862 find guidance on available security mechanisms and important 863 considerations in "Options for Securing RTP Sessions" [RFC7201]. 864 Applications SHOULD use one or more appropriate strong security 865 mechanisms. 867 This payload format and the JPEG XS encoding do not exhibit any 868 substantial non-uniformity, either in output or in complexity to 869 perform the decoding operation and thus are unlikely to pose a 870 denial-of-service threat due to the receipt of pathological 871 datagrams. 873 It is important to note that HD or UHDTV JPEG XS-encoded video can 874 have significant bandwidth requirements (typically more than 1 Gbps 875 for ultra high-definition video, especially if using high framerate). 876 This is sufficient to cause potential for denial-of-service if 877 transmitted onto most currently available Internet paths. 879 Accordingly, if best-effort service is being used, users of this 880 payload format MUST monitor packet loss to ensure that the packet 881 loss rate is within acceptable parameters. Packet loss is considered 882 acceptable if a TCP flow across the same network path, and 883 experiencing the same network conditions, would achieve an average 884 throughput, measured on a reasonable timescale, that is not less than 885 the RTP flow is achieving. 
This condition can be satisfied by 886 implementing congestion control mechanisms to adapt the transmission 887 rate (or the number of layers subscribed for a layered multicast 888 session), or by arranging for a receiver to leave the session if the 889 loss rate is unacceptably high. 891 This payload format may also be used in networks that provide 892 quality-of-service guarantees. If enhanced service is being used, 893 receivers SHOULD monitor packet loss to ensure that the service that 894 was requested is actually being delivered. If it is not, then they 895 SHOULD assume that they are receiving best-effort service and behave 896 accordingly. 898 9. RFC Editor Considerations 900 Note to RFC Editor: This section may be removed after carrying out 901 all the instructions of this section. 903 RFC XXXX is to be replaced by the RFC number this specification 904 receives when published. 906 10. References 908 10.1. Normative References 910 [ISO21122-1] 911 International Organization for Standardization (ISO) - 912 International Electrotechnical Commission (IEC), 913 "Information technology - JPEG XS low-latency lightweight 914 image coding system - Part 1: Core coding system", ISO/ 915 IEC PRF 21122-1, under development, 916 . 918 [ISO21122-2] 919 International Organization for Standardization (ISO) - 920 International Electrotechnical Commission (IEC), 921 "Information technology - JPEG XS low-latency lightweight 922 image coding system - Part 2: Profiles and buffer models", 923 ISO/IEC PRF 21122-2, under development, 924 . 926 [ISO21122-3] 927 International Organization for Standardization (ISO) - 928 International Electrotechnical Commission (IEC), 929 "Information technology - JPEG XS low-latency lightweight 930 image coding system - Part 3: Transport and container 931 formats", ISO/IEC FDIS 21122-3, under development, 932 . 
934 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 935 Requirement Levels", BCP 14, RFC 2119, 936 DOI 10.17487/RFC2119, March 1997, 937 . 939 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 940 with Session Description Protocol (SDP)", RFC 3264, 941 DOI 10.17487/RFC3264, June 2002, 942 . 944 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 945 Jacobson, "RTP: A Transport Protocol for Real-Time 946 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 947 July 2003, . 949 [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 950 Video Conferences with Minimal Control", STD 65, RFC 3551, 951 DOI 10.17487/RFC3551, July 2003, 952 . 954 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 955 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 956 RFC 3711, DOI 10.17487/RFC3711, March 2004, 957 . 959 [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type 960 Specifications and Registration Procedures", BCP 13, 961 RFC 6838, DOI 10.17487/RFC6838, January 2013, 962 . 964 [RFC8083] Perkins, C. and V. Singh, "Multimedia Congestion Control: 965 Circuit Breakers for Unicast RTP Sessions", RFC 8083, 966 DOI 10.17487/RFC8083, March 2017, 967 . 969 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 970 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 971 March 2017, . 973 [SMPTE-ST2110-10] 974 Society of Motion Picture and Television Engineers, "SMPTE 975 Standard - Professional Media Over Managed IP Networks: 976 System Timing and Definitions", SMPTE ST 2110-10:2017, 977 2017, . 979 [SMPTE-ST2110-21] 980 Society of Motion Picture and Television Engineers, "SMPTE 981 Standard - Professional Media Over Managed IP Networks: 982 Traffic Shaping and Delivery Timing for Video", SMPTE ST 983 2110-21:2017, 2017, 984 . 986 10.2. Informative References 988 [RFC4175] Gharai, L. and C. 
Perkins, "RTP Payload Format for 989 Uncompressed Video", RFC 4175, DOI 10.17487/RFC4175, 990 September 2005, . 992 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, 993 "Extended RTP Profile for Real-time Transport Control 994 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, 995 DOI 10.17487/RFC4585, July 2006, 996 . 998 [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for 999 Real-time Transport Control Protocol (RTCP)-Based Feedback 1000 (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February 1001 2008, . 1003 [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP 1004 Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, 1005 . 1007 [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP 1008 Framework: Why RTP Does Not Mandate a Single Media 1009 Security Solution", RFC 7202, DOI 10.17487/RFC7202, April 1010 2014, . 1012 10.3. URIs 1014 [1] http://www.iana.org/assignments/rtp-parameters 1016 Authors' Addresses 1018 Sebastien Lugan 1019 intoPIX S.A. 1020 Rue Emile Francqui, 9 1021 1435 Mont-Saint-Guibert 1022 Belgium 1024 Phone: +32 10 23 84 70 1025 Email: rtp@intopix.com 1026 URI: http://www.intopix.com 1027 Antonin Descampe 1028 intoPIX S.A. 1029 Rue Emile Francqui, 9 1030 1435 Mont-Saint-Guibert 1031 Belgium 1033 Phone: +32 10 23 84 70 1034 Email: a.descampe@intopix.com 1035 URI: http://www.intopix.com 1037 Corentin Damman 1038 intoPIX S.A. 
1039 Rue Emile Francqui, 9 1040 1435 Mont-Saint-Guibert 1041 Belgium 1043 Phone: +32 10 23 84 70 1044 Email: c.damman@intopix.com 1045 URI: http://www.intopix.com 1047 Thomas Richter 1048 Fraunhofer IIS 1049 Am Wolfsmantel 33 1050 91048 Erlangen 1051 Germany 1053 Phone: +49 9131 776 5126 1054 Email: thomas.richter@iis.fraunhofer.de 1055 URI: https://www.iis.fraunhofer.de/ 1057 Alexandre Willeme 1058 Universite catholique de Louvain 1059 Place du Levant, 2 - bte L5.04.04 1060 1348 Louvain-la-Neuve 1061 Belgium 1063 Phone: +32 10 47 80 82 1064 Email: alexandre.willeme@uclouvain.be 1065 URI: https://uclouvain.be/en/icteam