idnits 2.17.1 draft-schierl-payload-rtp-h265-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([HEVC]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 58 has weird spacing: '... units in e...' == Line 62 has weird spacing: '...nternet video...' == Line 142 has weird spacing: '... share a si...' == Line 147 has weird spacing: '...nsation and ...' == Line 154 has weird spacing: '...eblocks of u...' == (13 more instances...) -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not defined in RFC 2119. If it is intended as a requirements expression, it should be rewritten using one of the combinations defined in RFC 2119; otherwise it should not be all-uppercase. -- The document date (February 27, 2012) is 4414 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'EHVC' is mentioned on line 318, but not defined == Missing Reference: 'TBD' is mentioned on line 1722, but not defined == Unused Reference: 'RFC6184' is defined on line 1800, but no explicit reference was found in the text == Unused Reference: 'RFC6190' is defined on line 1803, but no explicit reference was found in the text == Unused Reference: 'RFC3264' is defined on line 1810, but no explicit reference was found in the text == Unused Reference: 'RFC4648' is defined on line 1814, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) Summary: 2 errors (**), 0 flaws (~~), 13 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Audio/Video Payload WG T. Schierl 2 Internet Draft Fraunhofer HHI 3 Intended status: Standards track S. Wenger 4 Expires: August 2012 Vidyo 5 Y.-K. Wang 6 Qualcomm 7 M. M. Hannuksela 8 Nokia 9 February 27, 2012 11 RTP Payload Format for High Efficiency Video Coding 12 draft-schierl-payload-rtp-h265-00.txt 14 Status of this Memo 16 This Internet-Draft is submitted to IETF in full conformance with 17 the provisions of BCP 78 and BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six 25 months and may be updated, replaced, or obsoleted by other documents 26 at any time. 
It is inappropriate to use Internet-Drafts as
27 reference material or to cite them other than as "work in progress."

29 The list of current Internet-Drafts can be accessed at
30 http://www.ietf.org/ietf/1id-abstracts.txt.

32 The list of Internet-Draft Shadow Directories can be accessed at
33 http://www.ietf.org/shadow.html.

35 This Internet-Draft will expire on August 27, 2012.

37 Copyright and License Notice

39 Copyright (c) 2012 IETF Trust and the persons identified as the
40 document authors. All rights reserved.

42 This document is subject to BCP 78 and the IETF Trust's Legal
43 Provisions Relating to IETF Documents
44 (http://trustee.ietf.org/license-info) in effect on the date of
45 publication of this document. Please review these documents
46 carefully, as they describe your rights and restrictions with
47 respect to this document. Code Components extracted from this
48 document must include Simplified BSD License text as described in
49 Section 4.e of the Trust Legal Provisions and are provided without
50 warranty as described in the Simplified BSD License.

52 Abstract

54 This memo describes an RTP payload format for High Efficiency Video
55 Coding (HEVC) [HEVC], which is currently being developed by the
56 Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload
57 format allows for packetization of one or more Network Abstraction
58 Layer (NAL) units in each RTP packet payload, as well as
59 fragmentation of a NAL unit into multiple RTP packets. Furthermore,
60 it supports transmission of an HEVC stream over a single as well as
61 multiple RTP flows. The payload format has wide applicability in
62 videoconferencing, Internet video streaming, and high bit-rate
63 entertainment-quality video, among others.

65 Table of Contents

67 Status of this Memo
68 Abstract
69 Table of Contents
70 1. Introduction
71    1.1. The HEVC Codec
72       1.1.1 Overview
73       1.1.2 Parallel Processing Support
74       1.1.3 Parameter Sets
75       1.1.4 NAL Unit Header
76    1.2. Overview of the Payload Format
77 2. Conventions
78 3. Definitions and Abbreviations
79    3.1 Definitions
80       3.1.1 Definitions from the HEVC Specification
81       3.1.2 Definitions Specific to This Memo
82    3.2 Abbreviations
83 4. RTP Payload Format
84    4.1 RTP Header Usage
85    4.2 NAL Unit Header Usage
86    4.3 Payload Structures
87    4.4 Transmission Modes
88    4.5 Packetization Modes
89    4.6 Decoding Order
90    4.7 Aggregation Packets
91       4.7.1 Single Time Aggregation Packet (STAP)
92    4.8 Fragmentation Units (FUs)
93 5. Packetization Rules
94    5.1 Common Packetization Rules
95    5.2 Non-Interleaved mode
96    5.3 Interleaved mode
98 6. De-Packetization Process
99    6.1 Non-Interleaved Mode
100    6.2 Interleaved Mode
101       6.2.1 Size of the De-interleaving Buffer
102       6.2.2 De-interleaving Process
103    6.3 Additional De-Packetization Guidelines
104 7. Payload Format Parameters
105    7.1 Media Type Registration
106    7.2 SDP Parameters
107       7.2.1 Mapping of Payload Type Parameters to SDP
108       7.2.2 Usage with the SDP Offer/Answer Model
109       7.2.3 Usage with SDP Offer/Answer Model
110       7.2.4 Usage in Declarative Session Descriptions
111       7.2.5 Signaling of Parallel Processing
112    7.3 Examples
113    7.4 Parameter Set Considerations
114 8. Security Considerations
115 9. Congestion Control
116 10. IANA Consideration
117 11. Informative Appendix: Application Examples
118    11.1 Introduction
119    11.2 Streaming
120    11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints)
121    11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint)
122 12. Acknowledgements
123 13. References
124    13.1 Normative References
125    13.2 Informative References
126 14. Authors' Addresses

128 1. Introduction

130 1.1.
The HEVC Codec

132 1.1.1 Overview

134 High Efficiency Video Coding [HEVC] is a forthcoming video coding
135 standard under development by the Joint Collaborative Team on Video
136 Coding (JCT-VC) formed by the ITU-T and ISO/IEC. It is reported to
137 provide significant coding efficiency gains over H.264 [H.264].
138 The standard will be published by ISO/IEC as ISO/IEC 23008-2,
139 informally known as MPEG-H Part 2. ITU-T may decide soon on the final
140 recommendation number.

142 H.264 and HEVC share a similar hybrid video codec design.
143 Conceptually, both technologies include a video coding layer (VCL)
144 and a network abstraction layer (NAL).

146 The VCL of HEVC includes a prediction stage that involves motion
147 compensation and spatial intra-prediction, integer transforms
148 applied to prediction residuals, and an entropy coding stage that
149 uses arithmetic coding. As in H.264, in-loop deblocking filtering
150 is applied to the reconstructed picture.

152 An important difference of HEVC compared to H.264 is the coding
153 structure within a picture. In HEVC each picture is divided into
154 treeblocks of up to 64x64 luma samples. Treeblocks can be
155 recursively split into smaller Coding Units (CUs) using a generic
156 quad-tree segmentation structure. CUs can be further split into
157 Prediction Units (PUs) used for intra- and inter-prediction and
158 Transform Units (TUs) defined for transform and quantization. HEVC
159 includes integer transforms for a number of TU sizes. HEVC also
160 includes two new in-loop filters that may be applied after the
161 deblocking filtering: Sample Adaptive Offset (SAO) and Adaptive Loop
162 Filter (ALF).

164 For random access, HEVC introduces, in addition to
165 Instantaneous Decoder Refresh (IDR) pictures, a Clean Random Access
166 (CRA) picture, which is similar to what has conventionally been
167 called an open Group-of-Pictures (GOP) intra picture.
Whereas in
168 H.264 such a picture may be signalled using a recovery point
169 Supplemental Enhancement Information (SEI) message, in HEVC a
170 distinct NAL unit type is used to indicate a CRA picture.
171 Furthermore, HEVC specifies that a conforming bitstream may start
172 with a CRA picture, whereas in H.264 a conforming bitstream must
173 start with an IDR picture.

175 Temporal layer access (TLA) pictures were introduced in HEVC to
176 indicate temporal layer switching points.

178 Predictively coded pictures can include uni-predicted and
179 bi-predicted slices. The flexibility in creating picture coding
180 structures is roughly comparable to H.264.

182 The VCL generates and consumes syntax structures designed to be
183 adaptable to MTU sizes commonly found in IP networks, irrespective
184 of the size of a coded picture. Picture segmentation is achieved
185 through slices. A concept of "fine granularity slices" (FGS) is
186 included that allows slice boundaries to be created within a
187 treeblock.

188 The Network Abstraction Layer (NAL) carries the information
189 required for decoding more than one slice, which is
190 collected in parameter sets. Information that is not
191 strictly required for the decoding process, but potentially helpful
192 in decoding systems, can be conveyed in data structures such as
193 Supplemental Enhancement Information (SEI) messages, access unit
194 delimiters, and so on.

196 All the aforementioned MTU-sized (or smaller) data structures are
197 available in the form of Network Abstraction Layer (NAL) units.

199 The single distinguishing difference between HEVC and H.264 with
200 respect to the RTP payload format design is the availability of
201 VCL-based coding tools that are specifically designed to enable
202 processing on high-level parallel architectures.
These tools are
203 described below in sufficient detail to provide motivation for the
204 parallel processing signaling support that is described in section
205 7.2.5.

207 1.1.2 Parallel Processing Support

209 The reportedly significantly higher computational demand of HEVC
210 over H.264, in conjunction with the ever-increasing video resolution
211 (both spatially and temporally) required by the market, led to the
212 adoption of VCL coding tools specifically targeted to allow for
213 parallelization on the sub-picture level. That is, parallelization
214 occurs, at the minimum, at the granularity of an integer number of
215 treeblocks. The targets for this type of high-level parallelization
216 are multicore CPUs and DSPs as well as multiprocessor systems. In a
217 system design, to be useful, these tools require signaling support,
218 which is provided in section 7.2.5 of this memo. This section
219 provides a brief overview of the tools available in [HEVC]. This
220 section is expected to be updated frequently as the HEVC draft
221 evolves.

223 For parallelization, four picture partitioning strategies are
224 available.

226 Regular slices are segments of the bitstream that can be
227 reconstructed independently from other regular slices within the
228 same picture (though there may still be interdependencies through
229 loop filtering operations). Regular slices are the only tool that
230 can be used for parallelization that is also available, in virtually
231 identical form, in H.264. Regular-slice-based parallelization does
232 not require much inter-processor or inter-core communication (except
233 for inter-processor or inter-core data sharing for motion
234 compensation when decoding a predictively coded picture, which is
235 typically much heavier than inter-processor or inter-core data
236 sharing due to in-picture prediction), as slices are designed to be
237 independently decodable.
However, for the same reason, regular
238 slices can require some coding overhead. Further, regular slices
239 (in contrast to some of the other tools mentioned below) also serve
240 as the key mechanism for bitstream partitioning to match MTU size
241 requirements, because regular slices are independent within a picture
242 and each regular slice is encapsulated in its own NAL unit. In
243 many cases, the goal of parallelization and the goal of MTU size
244 matching can place contradictory demands on the slice layout in a
245 picture. The realization of this situation led to the development
246 of the more advanced tools mentioned below. This payload format
247 does not contain any specific mechanisms aiding parallelization
248 through regular slices.

250 Entropy slices, like regular slices, break entropy decoding
251 dependencies but allow prediction (and filtering) to cross slice
252 boundaries. As such, they can be used as a lightweight mechanism to
253 parallelize the entropy decoding without affecting other
254 decoding steps. They are lightweight because, although each
255 entropy slice is encapsulated in its own NAL unit, it has a much
256 shorter slice header, as most of the slice header syntax elements are
257 not present and must be inherited from the preceding full slice
258 header. Because in-picture prediction is allowed between
259 neighboring entropy slices within a picture, the required
260 inter-processor/inter-core communication to enable in-picture prediction
261 can be substantial. For the same reason, entropy slices cannot
262 be used for MTU size matching. Entropy slices appear to be
263 useful only for system architectures that execute the entropy decoding
264 process on a multicore/multi-CPU architecture, but execute the
265 remaining decoding functionality on dedicated signal processing
266 hardware. At the time of writing, entropy slices are not included
267 in any profile defined in draft HEVC.
No support for entropy slices
268 is included in this memo.

270 In Wavefront Parallel Processing (WPP), the picture is partitioned into
271 rows of treeblocks. Entropy decoding and prediction are allowed to
272 use data from treeblocks in other partitions. Parallel processing
273 is possible through parallel decoding of rows of treeblocks, where
274 the start of the decoding of a row is delayed by two treeblocks, so
275 as to ensure that data related to a treeblock above and to the right of
276 the subject treeblock is available before the subject treeblock is
277 decoded. Using this staggered start (which appears like a
278 wavefront when represented graphically), parallelization is possible
279 with up to as many processors/cores as the picture contains
280 treeblock rows. At the time of writing, the draft HEVC includes a
281 mechanism to organize the coded bits of different treeblock rows to
282 be friendly to a particular number of parallel processors/cores.
283 For example, it is possible that coded bits of even-numbered
284 treeblock rows (treeblock rows 0, 2, 4, ...) all come before coded
285 bits of odd-numbered treeblock rows (treeblock rows 1, 3, 5, ...),
286 such that the bitstream is friendly to two parallel
287 processors/cores, though decoding of an earlier-coming treeblock row
288 (e.g. treeblock row 2) refers to a later-coming treeblock row (e.g.
289 treeblock row 1). As with entropy slices, because in-picture
290 prediction is allowed between neighboring treeblock rows within a
291 picture, the required inter-processor/inter-core communication to
292 enable in-picture prediction can be substantial. Wavefront
293 parallel processing partitioning does not result in more NAL units
294 than when it is not applied; thus wavefront parallel
295 processing cannot be used for MTU size matching. At the time of
296 writing, wavefront parallel processing is not included in any
297 profile of draft HEVC.
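The staggered start of wavefront parallel processing described above can be illustrated with a small scheduling sketch. It is not taken from the HEVC draft; it assumes that every treeblock takes one time slot to decode, that one core is available per treeblock row, and that a treeblock may start once its left neighbor and the treeblock above and to the right are finished. The function name wavefront_start_times is hypothetical.

```python
from typing import List


def wavefront_start_times(rows: int, cols: int) -> List[List[int]]:
    """Earliest start slot of each treeblock under the assumed WPP rule.

    Assumptions (illustrative only): unit decode time per treeblock and
    one core per treeblock row.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # The left neighbor in the same row must be finished.
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # The above-right neighbor must be finished; at the right
                # picture edge, fall back to the last treeblock above.
                above = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][above] + 1)
            start[r][c] = ready
    return start


if __name__ == "__main__":
    for row in wavefront_start_times(3, 4):
        print(row)
```

Under these assumptions each row starts two slots after the row above it, so a picture of 3 rows by 4 treeblocks finishes in 8 slots instead of the 12 slots a single core would need.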
This memo does not specify support for wavefront parallel processing.

299 Tiles define horizontal and vertical boundaries that partition a
300 picture into tile columns and rows. The scan order of treeblocks is
301 changed to be local within a tile (in the order of a treeblock
302 raster scan of a tile), before decoding the top-left treeblock of the
303 next tile in the order of tile raster scan of a picture. Similar to
304 regular slices, tiles break in-picture prediction dependencies
305 (including entropy decoding dependencies). However, they do not
306 need to be included in individual NAL units (same as wavefront
307 parallel processing in this regard); hence tiles cannot be used for
308 MTU size matching. Each tile can be processed by one
309 processor/core, and the inter-processor/inter-core communication
310 required for in-picture prediction between processing units decoding
311 neighboring tiles is limited to conveying the shared slice header in
312 cases where a slice spans more than one tile, and to the sharing of
313 reconstructed samples and metadata related to loop filtering. As such,
314 tiles are less demanding in terms of memory bandwidth than wavefront
315 parallel processing, due to the in-picture independence between two
316 neighboring partitions. Tiles are included in the (single) existing
318 profile of [HEVC], and their support in the context of this memo will
319 be specified in section 7 of this memo.

321 The interaction between regular slices and tiles is simplified by
322 constraints of the HEVC draft. Specifically, for each slice and
323 tile, either or both of the following conditions must be fulfilled:
324 1) all coded blocks in a slice belong to the same tile; 2) all coded
325 blocks in a tile belong to the same slice.

327 1.1.3 Parameter Sets

329 The parameter set concept is borrowed from [H.264].
In addition to
330 Sequence Parameter Sets (SPS), carrying data valid for the whole
331 video sequence, and Picture Parameter Sets (PPS), carrying
332 information valid on a picture-by-picture basis, the new Adaptation
333 Parameter Set (APS) carries picture-adaptive information that is
334 also valid on a picture-by-picture basis but is expected to change
335 (typically much) more frequently than the information in the PPS.

337 1.1.4 NAL Unit Header

339 HEVC maintains the NAL unit concept of H.264 with modifications.
340 HEVC uses a two-byte NAL unit header. Table 1 lists the allocation
341 of NAL unit types for VCL NAL units and non-VCL NAL units.

343 Table 1. NAL unit types in HEVC

345 Type    NAL Unit Name                               NAL unit type class
346 -----------------------------------------------------------------------
347 0       Unspecified                                 non-VCL
348 1       Coded slice of a non-IDR, non-CRA,          VCL
349         and non-TLA picture
350 2       Reserved                                    -
351 3       Coded slice of a TLA picture                VCL
352 4       Coded slice of a CRA picture                VCL
353 5       Coded slice of an IDR picture               VCL
354 6       Supplemental enhancement information (SEI)  non-VCL
355 7       Sequence parameter set                      non-VCL
356 8       Picture parameter set                       non-VCL
357 9       Access unit delimiter                       non-VCL
358 10..11  Reserved                                    -
359 12      Filler data                                 non-VCL
360 13      Reserved                                    -
361 14      Adaptation parameter set                    non-VCL
362 15..23  Reserved                                    -
363 24..63  Unspecified                                 non-VCL

365 The syntax and semantics of the NAL unit header are specified in
366 [HEVC], but the essential properties of the NAL unit header are
367 summarized below for convenience.

369 The first byte of the NAL unit header has the following format:

371    +---------------+
372    |0|1|2|3|4|5|6|7|
373    +-+-+-+-+-+-+-+-+
374    |F|N|   Type    |
375    +---------------+

377 The semantics of the components of the NAL unit header octets, as
378 specified in [HEVC], are described briefly below. In addition to
379 the name and size of each field, the corresponding syntax element
380 name in [HEVC] is also provided.

382 F: 1 bit
383    forbidden_zero_bit.
HEVC declares a value of 1 as a syntax
384    violation. Note: the bit is wasted for compatibility with MPEG-2
385    transport systems.

387 N: 1 bit
388    nal_ref_flag. A value of 0 indicates that the content of the NAL
389    unit is not used to reconstruct reference pictures for future
390    prediction. Such NAL units can be discarded without risking
391    the integrity of the reference pictures. A value of 1
392    indicates that the decoding of the NAL unit is required to
393    maintain the integrity of reference pictures or that the NAL unit
394    contains a parameter set.

396 Type: 6 bits
397    nal_unit_type. This component specifies the NAL unit type as
398    defined in Table 7-1 of [HEVC], and in Table 1 in this memo. For
399    a reference of all currently defined NAL unit types and their
400    semantics, please refer to Section 7.4.1 in [HEVC].

402 In NAL units specified by HEVC, the second octet of the NAL unit
403 header has the following format:

405    +---------------+
406    |0|1|2|3|4|5|6|7|
407    +-+-+-+-+-+-+-+-+
408    | TID |    R    |
409    +---------------+

411 TID: 3 bits
412    temporal_id. This component indicates the temporal identifier of
413    the NAL unit in the coded sequence. For IDR pictures or CRA
414    pictures the value is 0. For TLA pictures the value of
415    temporal_id must be greater than 0.

417 R: 5 bits
418    reserved_one_5bits. Reserved bits for future extension (such as
419    scalability and three-dimensional video extensions). R MUST be
420    equal to "00001" (in binary form). Decoders must ignore (i.e.
421    remove from the bitstream and discard) NAL units with values of
422    reserved_one_5bits not equal to "00001".

424 This memo extends the semantics of F, N, and TID, as described in
425 Section 4.2.

427 1.2.
Overview of the Payload Format

429 This payload format defines the following processes required for
430 transport of HEVC coded data over RTP [RFC3550]:

432 o Usage of RTP header with this payload format

434 o Packetization of HEVC coded NAL units into RTP packets

435 o Transmission of HEVC NAL units of the same bitstream within a
436   single RTP session or within multiple RTP sessions

438 o Payload format parameters to be used within the Session
439   Description Protocol (SDP) [RFC4566].

441 2. Conventions

443 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
444 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
445 document are to be interpreted as described in BCP 14, RFC 2119
446 [RFC2119].

448 This specification uses the notion of setting and clearing a bit
449 when bit fields are handled. Setting a bit is the same as assigning
450 that bit the value of 1 (On). Clearing a bit is the same as
451 assigning that bit the value of 0 (Off).

453 3. Definitions and Abbreviations

455 3.1 Definitions

457 This document uses the terms and definitions of [HEVC]. Section
458 3.1.1 lists relevant definitions copied from [HEVC] for convenience.
459 Section 3.1.2 gives definitions specific to this memo.

461 3.1.1 Definitions from the HEVC Specification

463 access unit: A set of NAL units that are consecutive in decoding
464    order and contain exactly one coded picture. In addition to the
465    coded slice NAL units of the coded picture, the access unit may
466    also contain other NAL units not containing slices of the coded
467    picture. The decoding of an access unit always results in a
468    decoded picture.

470 coded video sequence: A sequence of access units that consists,
471    in decoding order, of an IDR access unit followed by zero or more
472    non-IDR access units including all subsequent access units up to
473    but not including any subsequent IDR access unit.

475 CRA access unit: An access unit in which the coded picture is a
476    CRA picture.
478 CRA picture: A coded picture containing only I slices and for
479    which each slice has nal_unit_type equal to 4; all coded pictures
480    that follow the Clean Random Access (CRA) picture both in
481    decoding order and output order shall not use inter prediction
482    from any picture that precedes the CRA picture either in decoding
483    order or output order; and any picture that precedes the CRA
484    picture in decoding order also precedes the CRA picture in output
485    order.

487 IDR access unit: An access unit in which the coded picture is an
488    IDR picture.

490 IDR picture: A coded picture for which the variable IdrPicFlag is
491    equal to 1. An IDR picture causes the decoding process to mark
492    all reference pictures as "unused for reference". All coded
493    pictures that follow an IDR picture in decoding order can be
494    decoded without inter prediction from any picture that precedes
495    the IDR picture in decoding order. The first picture of each
496    coded video sequence in decoding order is an IDR picture.

498 Random Access: The act of starting the decoding process for a
499    bitstream at a point other than the beginning of the stream.

501 Tile: An integer number of treeblocks co-occurring in one column
502    and one row (each of which comprising one or more columns or rows
503    of treeblocks), ordered consecutively in treeblock raster scan of
504    the tile. The division of each picture into tiles is a
505    partitioning. Tiles in a picture are ordered consecutively in
506    tile raster scan of the picture. Although a slice contains
507    treeblocks that are consecutive in treeblock raster scan of a
508    tile, these treeblocks are not necessarily consecutive in
509    treeblock raster scan of the picture.
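As a companion to the NAL unit header summary in Section 1.1.4 and to the nal_unit_type values used in the definitions above, the following sketch shows one way a receiver or MANE might split the two header octets into their fields. It is a minimal illustration, not part of the payload format. It assumes that bit 0 in the header diagrams is the most significant bit of each octet, and that the type values are those of Table 1 in this draft, which may change as the HEVC draft evolves. The names NalUnitHeader, parse_nal_unit_header, is_random_access_point, and should_discard are hypothetical.

```python
from dataclasses import dataclass

# nal_unit_type values from Table 1 of this memo (draft HEVC, may change).
NAL_TYPE_TLA = 3
NAL_TYPE_CRA = 4
NAL_TYPE_IDR = 5


@dataclass(frozen=True)
class NalUnitHeader:
    forbidden_zero_bit: int   # F, 1 bit
    nal_ref_flag: int         # N, 1 bit
    nal_unit_type: int        # Type, 6 bits
    temporal_id: int          # TID, 3 bits
    reserved_one_5bits: int   # R, 5 bits


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the two NAL unit header octets into their fields.

    Assumes bit 0 of each diagram is the most significant bit.
    """
    if len(data) < 2:
        raise ValueError("NAL unit header needs two octets")
    b0, b1 = data[0], data[1]
    return NalUnitHeader(
        forbidden_zero_bit=(b0 >> 7) & 0x1,
        nal_ref_flag=(b0 >> 6) & 0x1,
        nal_unit_type=b0 & 0x3F,
        temporal_id=(b1 >> 5) & 0x7,
        reserved_one_5bits=b1 & 0x1F,
    )


def is_random_access_point(header: NalUnitHeader) -> bool:
    """True for coded slices of CRA or IDR pictures."""
    return header.nal_unit_type in (NAL_TYPE_CRA, NAL_TYPE_IDR)


def should_discard(header: NalUnitHeader) -> bool:
    """True if F is set or R differs from binary 00001."""
    return (header.forbidden_zero_bit == 1
            or header.reserved_one_5bits != 0b00001)


if __name__ == "__main__":
    # F=0, N=1, Type=5 (IDR slice); TID=0, R=00001.
    print(parse_nal_unit_header(bytes([0x45, 0x01])))
```

Keeping the parsed fields in an immutable record makes it easy for later stages, such as a hypothetical packet-dropping policy in a MANE, to consult nal_ref_flag and temporal_id without touching the raw octets again.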
511 3.1.2 Definitions Specific to This Memo

513 media-aware network element (MANE): A network element, such as a
514    middlebox or application layer gateway, that is capable of parsing
515    certain aspects of the RTP payload headers or the RTP payload and
516    reacting to their contents.

518       Informative note: The concept of a MANE goes beyond normal
519       routers or gateways in that a MANE has to be aware of the
520       signaling (e.g., to learn about the payload type mappings of
521       the media streams), and in that it has to be trusted when
522       working with SRTP. The advantage of using MANEs is that they
523       allow packets to be dropped according to the needs of the
524       media coding. For example, if a MANE has to drop packets due
525       to congestion on a certain link, it can identify and remove
526       those packets whose elimination produces the least adverse
527       effect on the user experience. After dropping packets, MANEs
528       must rewrite RTCP packets to match the changes to the RTP
529       packet stream as specified in Section 7 of [RFC3550].

531 NAL unit decoding order: A NAL unit order that conforms to the
532    constraints on NAL unit order given in Section 7.4.1.2.3 in
533    [HEVC].

535 NALU-time: The value that the RTP timestamp would have if the NAL
536    unit were transported in its own RTP packet.

538 RTP packet stream: A sequence of RTP packets with increasing
539    sequence numbers (except for wrap-around), identical PT and
540    identical SSRC (Synchronization Source), carried in one RTP
541    session. Within the scope of this memo, one RTP packet stream is
542    utilized to transport one or more layers.

544 transmission order: The order of packets in ascending RTP
545    sequence number order (in modulo arithmetic). Within an
546    aggregation packet, the NAL unit transmission order is the same
547    as the order of appearance of NAL units in the packet.

549 3.2 Abbreviations

551 TBD

553 4.
RTP Payload Format

555 4.1 RTP Header Usage

557 The format of the RTP header is specified in [RFC3550] and reprinted
558 in Figure 1 for convenience. This payload format uses the fields of
559 the header in a manner consistent with that specification.

561 The RTP payload (and the settings for some RTP header bits) for
562 aggregation packets and fragmentation units is specified in
563 Sections 4.7 and 4.8, respectively.

565     0                   1                   2                   3
566     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
567    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
568    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
569    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
570    |                           timestamp                           |
571    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
572    |           synchronization source (SSRC) identifier            |
573    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
574    |            contributing source (CSRC) identifiers             |
575    |                             ....                              |
576    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

578 Figure 1 RTP header according to [RFC3550]

580 The RTP header information to be set according to this RTP payload
581 format is set as follows:

583 Marker bit (M): 1 bit

585    Set for the very last packet of the access unit indicated by the
586    RTP timestamp, in line with the normal use of the M bit in video
587    formats, to allow efficient playout buffer handling. For
588    aggregation packets (STAP), the marker bit in the RTP header MUST
589    be set to the value that the marker bit of the last NAL unit of
590    the aggregation packet would have had if it were transported in
591    its own RTP packet. Decoders MAY use this bit as an early
592    indication of the last packet of an access unit but MUST NOT rely
593    on this property.

595       Informative note: Only one M bit is associated with an
596       aggregation packet carrying multiple NAL units.
Thus, if a 597 gateway has re-packetized an aggregation packet into several 598 packets, it cannot reliably set the M bit of those packets. 600 Payload type (PT): 7 bits 602 The assignment of an RTP payload type for this new packet format 603 is outside the scope of this document and will not be specified 604 here. The assignment of a payload type has to be performed 605 either through the profile used or in a dynamic way. 607 Sequence number (SN): 16 bits 609 Set and used in accordance with RFC 3550. In some packetization 610 modes (list TBD), the sequence number is used to determine 611 decoding order for the NALUs. 613 Timestamp: 32 bits 615 The RTP timestamp is set to the sampling timestamp of the 616 content. A 90 kHz clock rate MUST be used. 618 If the NAL unit has no timing properties of its own (e.g., 619 parameter set and SEI NAL units), the RTP timestamp is set to the 620 RTP timestamp of the coded picture of the access unit in which 621 the NAL unit is included, according to Section 7.4.1.2.3 of 622 [HEVC]. 624 Receivers SHOULD ignore any picture timing SEI messages included 625 in access units that have only one display timestamp. Instead, 626 receivers SHOULD use the RTP timestamp for synchronizing the 627 display process. If one access unit has more than one display 628 timestamp carried in a picture timing SEI message, then the 629 information in the SEI message SHOULD be treated as relative to 630 the RTP timestamp, with the earliest event occurring at the time 631 given by the RTP timestamp and subsequent events later, as given 632 by the difference in picture time values carried in the picture 633 timing SEI message. Let tSEI1, tSEI2, ..., tSEIn be the display 634 timestamps carried in the SEI message of an access unit, where 635 tSEI1 is the earliest of all such timestamps. Let tmadjst() be a 636 function that adjusts the SEI messages time scale to a 90-kHz 637 time scale. Let TS be the RTP timestamp. 
Then, the display time 638 for the event associated with tSEI1 is TS. The display time for 639 the event with tSEIx, where x is [2..n], is TS + tmadjst(tSEIx - 640 tSEI1). 642 4.2 NAL Unit Header Usage 644 The structure and semantics of the NAL unit header according to the 645 HEVC specification [HEVC] were introduced in Section 1.1.4. This 646 section specifies the extended semantics of the NAL unit header 647 fields. 649 4.3 Payload Structures 651 The NAL unit structure is central to HEVC [HEVC]; all HEVC coded 652 bits for representing a video signal are encapsulated in NAL units. 653 Therefore, each RTP packet payload is structured as a NAL unit, which 654 contains one or a part of one NAL unit specified in HEVC, or 655 aggregates one or more NAL units specified in HEVC. 657 4.4 Transmission Modes 659 This memo enables transmission of an HEVC bitstream over a single 660 RTP session or multiple RTP sessions. 662 TBD: SSRC Muxing for video conf. + TV broadcast/multicast. 664 4.5 Packetization Modes 666 This memo specifies the following packetization modes: 668 o Non-interleaved mode 670 o Interleaved mode 672 In the non-interleaved mode, NAL units are transmitted in NAL unit 673 decoding order. The interleaved mode allows transmission of NAL 674 units out of NAL unit decoding order. 676 The packetization mode in use MAY be signaled by the value of the 677 OPTIONAL packetization-mode media type parameter. The 678 packetization mode in use governs which NAL unit types are allowed in RTP 679 payloads. Table 2 summarizes the allowed packet payload types for 680 each packetization mode. Packetization modes are explained in more 681 detail in section 5. 683 Table 2.
Summary of allowed NAL unit types for each packetization 684 mode (yes = allowed, no = disallowed, ig = ignore)
686    Payload   Packet      Non-Interleaved   Interleaved
687    Type      Type        Mode              Mode
688    -------------------------------------------------
689    0         reserved    ig                ig
690    1-23      NAL unit    yes               no
691    24        STAP-A      yes               no
692    25        STAP-B      no                yes
693    26        FU-A        yes               yes
694    27        FU-B        no                yes
695    28-63     reserved    ig                ig
697 Some NAL unit or payload type values (indicated as reserved in 698 Table 2) are reserved for future extensions. NAL units of those 699 types SHOULD NOT be sent by a sender (directly as packet payloads, or 700 as aggregation units in aggregation packets, or as fragmented units 701 in FU packets) and MUST be ignored by a receiver. For example, the 702 payload types 1-23, with the associated packet type "NAL unit", are 703 allowed in "Non-Interleaved Mode", but disallowed in "Interleaved 704 Mode". However, NAL units of NAL unit types 1-23 can be used in 705 "Interleaved Mode" as aggregation units in STAP-B packets as well as 706 fragmented units in FU-A and FU-B packets. Similarly, NAL units of 707 NAL unit types 1-23 can also be used in the "Non-Interleaved Mode" 708 as aggregation units in STAP-A packets or fragmented units in FU-A 709 packets, in addition to being directly used as packet payloads. 711 4.6 Decoding Order 713 In the interleaved packetization mode, the transmission order of NAL 714 units is allowed to differ from the decoding order of the NAL units. 715 Decoding order number (DON) is a field in the payload structure or a 716 derived variable that indicates the NAL unit decoding order. 717 Rationale and examples of use cases for transmission out of decoding 718 order and for the use of DON are given in section 13. 720 The coupling of transmission and decoding order is controlled by the 721 OPTIONAL sprop-interleaving-depth media type parameter as follows.
722 When the value of the OPTIONAL sprop-interleaving-depth media type 723 parameter is equal to 0 (explicitly or by default), the 724 transmission order of NAL units MUST conform to the NAL unit 725 decoding order. When the value of the OPTIONAL 726 sprop-interleaving-depth media type parameter is greater than 0, 728 o the order of NAL units generated by de-packetizing STAP-Bs and 729 FUs in two consecutive packets is not required to be the NAL unit 730 decoding order. 732 The RTP payload structures for an STAP-A and an FU-A do not include 733 DON. STAP-B and FU-B structures include DON. 735 Informative note: When an FU-A occurs in interleaved mode, it 736 always follows an FU-B, which sets its DON. 738 Informative note: If a transmitter wants to encapsulate a single 739 NAL unit per packet and transmit packets out of their decoding 740 order, the STAP-B packet type can be used. 742 In the non-interleaved packetization mode, the transmission order of 743 NAL units in single NAL unit packets, STAP-As, and FU-As MUST be the 744 same as their NAL unit decoding order. The NAL units within an STAP 745 MUST appear in the NAL unit decoding order. Thus, the decoding 746 order is first provided through the implicit order within an STAP, 747 and second provided through the RTP sequence number for the order 748 between STAPs, FUs, and single NAL unit packets. 750 Signaling of the value of DON for NAL units carried in an STAP-B and in a 751 series of fragmentation units starting with an FU-B is specified in 752 sections 4.7.1 and 4.8, respectively. The DON value of the first 753 NAL unit in transmission order MAY be set to any value. Values of 754 DON are in the range of 0 to 65535, inclusive. After reaching the 755 maximum value, the value of DON wraps around to 0. 757 The decoding order of two NAL units contained in any STAP-B or in a 758 series of fragmentation units starting with an FU-B is determined as 759 follows.
Let DON(i) be the decoding order number of the NAL unit 760 having index i in the transmission order. Function don_diff(m,n) is 761 specified as follows: 763 If DON(m) == DON(n), don_diff(m,n) = 0 765 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 766 don_diff(m,n) = DON(n) - DON(m) 768 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 769 don_diff(m,n) = 65536 - DON(m) + DON(n) 771 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 772 don_diff(m,n) = - (DON(m) + 65536 - DON(n)) 774 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 775 don_diff(m,n) = - (DON(m) - DON(n)) 777 A positive value of don_diff(m,n) indicates that the NAL unit having 778 transmission order index n follows, in decoding order, the NAL unit 779 having transmission order index m. When don_diff(m,n) is equal to 780 0, then the NAL unit decoding order of the two NAL units can be in 781 either order. A negative value of don_diff(m,n) indicates that the 782 NAL unit having transmission order index n precedes, in decoding 783 order, the NAL unit having transmission order index m. 785 Values of the DON field MUST be such that the decoding order 786 determined by the values of DON, as specified above, conforms to the 787 NAL unit decoding order. If the order of two NAL units in NAL unit 788 decoding order is switched and the new order does not conform to the 789 NAL unit decoding order, the NAL units MUST NOT have the same value 790 of DON. If the order of two consecutive NAL units in the NAL unit 791 stream is switched and the new order still conforms to the NAL unit 792 decoding order, the NAL units MAY have the same value of DON. 793 Consequently, NAL units having the same value of DON can be decoded 794 in any order, and two NAL units having a different value of DON 795 should be passed to the decoder in the order specified above. 
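Informative note: The case analysis that defines don_diff(m,n) can be sketched in Python as follows. This is only an illustration and not part of the payload format. The function takes the two DON values directly (don_m for DON(m), don_n for DON(n)); these argument names and the range check are assumptions made for the sketch.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding-order distance from NAL unit m to NAL unit n.

    Positive: n follows m in decoding order. Negative: n precedes m.
    Zero: the two NAL units may be decoded in either order.
    """
    for don in (don_m, don_n):
        if not 0 <= don <= 65535:
            raise ValueError("DON is a 16-bit unsigned value")

    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        # DON wrapped around between m and n.
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        # DON wrapped around between n and m.
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768.
    return -(don_m - don_n)
```

For example, a NAL unit with DON 0 that is received after one with DON 65535 yields a difference of 1, so the wrap-around does not disturb the decoding order.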
When 796 two consecutive NAL units in the NAL unit decoding order have a 797 different value of DON, the value of DON for the second NAL unit in 798 decoding order SHOULD be the value of DON for the first, incremented 799 by one. 801 An example of the de-packetization process to recover the NAL unit 802 decoding order is given in section 6. 804 Informative note: Receivers should not expect that the absolute 805 difference of values of DON for two consecutive NAL units in the 806 NAL unit decoding order will be equal to one, even in error-free 807 transmission. An increment by one is not required, as at the 808 time of associating values of DON to NAL units, it may not be 809 known whether all NAL units are delivered to the receiver. For 810 example, a gateway may not forward coded slice NAL units of 811 non-reference pictures or SEI NAL units when there is a shortage of 812 bit rate in the network to which the packets are forwarded. In 813 another example, a live broadcast is interrupted by pre-encoded 814 content, such as commercials, from time to time. The first intra 815 picture of a pre-encoded clip is transmitted in advance to ensure 816 that it is readily available in the receiver. When transmitting 817 the first intra picture, the originator does not exactly know how 818 many NAL units will be encoded before the first intra picture of 819 the pre-encoded clip follows in decoding order. Thus, the values 820 of DON for the NAL units of the first intra picture of the 821 pre-encoded clip have to be estimated when they are transmitted, and 822 gaps in values of DON may occur. 824 4.7 Aggregation Packets 826 Aggregation packets are the NAL unit aggregation scheme of this 827 payload specification.
The scheme is introduced to reflect the 828 dramatically different MTU sizes of two key target networks: 829 wireline IP networks (with an MTU size that is often limited by the 830 Ethernet MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU- 831 T H.324/M) based wireless communication systems with preferred 832 transmission unit sizes of 254 bytes or less. To prevent media 833 transcoding between the two worlds, and to avoid undesirable 834 packetization overhead, a NAL unit aggregation scheme is introduced. 836 The Single-time aggregation packet (STAP) is defined by this 837 specification: 839 o Single-time aggregation packet (STAP): aggregates NAL units with 840 identical NALU-time. Two types of STAPs are defined, one without 841 DON (STAP-A) and another including DON (STAP-B). 843 Each NAL unit to be carried in an aggregation packet is encapsulated 844 in an aggregation unit. The structure of the RTP payload format for 845 aggregation packets is presented in Figure 2. 847 0 1 2 3 848 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 849 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 850 |F|NRI| Type | | 851 +-+-+-+-+-+-+-+-+ | 852 | | 853 | one or more aggregation units | 854 | | 855 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 856 | :...OPTIONAL RTP padding | 857 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 859 Figure 2 RTP payload format for aggregation packets 861 STAPs do have the following packetization rules: The type field of 862 the NAL unit type octet MUST be set to the appropriate value for 863 STAP, as indicated in Table 2. The F bit MUST be cleared if all F 864 bits of the aggregated NAL units are zero; otherwise, it MUST be 865 set. The value of NRI MUST be the maximum of all the NAL units 866 carried in the aggregation packet. 
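Informative note: The rules for the F bit and the NRI field of an STAP header can be sketched in Python as follows. This is a minimal illustration only. The function name and the representation of each aggregated NAL unit as an (F, NRI) pair are assumptions made for the sketch; how these values are packed into the header octet is not shown.

```python
def stap_f_and_nri(nal_units):
    """Derive the F bit and NRI value of an STAP header.

    nal_units: iterable of (f_bit, nri) pairs, one per aggregated NAL unit.
    Returns the (f_bit, nri) pair to place in the STAP header.
    """
    units = list(nal_units)
    if not units:
        raise ValueError("an aggregation packet carries at least one NAL unit")

    # F is cleared only if every aggregated NAL unit has F equal to zero.
    f_bit = 1 if any(f for f, _ in units) else 0
    # NRI is the maximum NRI of all aggregated NAL units.
    nri = max(n for _, n in units)
    return f_bit, nri
```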
868 The marker bit in the RTP header is set to the value that the marker 869 bit of the last NAL unit of the aggregation packet would have if it 870 were transported in its own RTP packet. 872 The payload of an aggregation packet consists of one or more 873 aggregation units. See section 4.7.1 for the single-time 874 aggregation unit. An aggregation packet can carry as many 875 aggregation units as necessary; however, the total amount of data in 876 an aggregation packet MUST fit into an IP packet, and the 877 size SHOULD be chosen so that the resulting IP packet is smaller 878 than the MTU size. An aggregation packet MUST NOT contain 879 fragmentation units specified in section 4.8. Aggregation packets 880 MUST NOT be nested; i.e., an aggregation packet MUST NOT contain 881 another aggregation packet. 883 4.7.1 Single-Time Aggregation Packet (STAP) 885 A single-time aggregation packet (STAP) SHOULD be used whenever NAL 886 units are aggregated that all share the same NALU-time. The payload 887 of an STAP-A consists of at least one single-time aggregation unit, as 888 presented in Figure 3. The payload of an STAP-B consists of a 16-bit 889 unsigned decoding order number (DON) (in network byte order) 890 followed by at least one single-time aggregation unit, as presented 891 in Figure 4.
893 0 1 2 3 894 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 895 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 896 : | 897 +-+-+-+-+-+-+-+-+ | 898 | | 899 | single-time aggregation units | 900 | | 901 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 902 | : 903 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 905 Figure 3 Payload format for STAP-A 907 0 1 2 3 908 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 909 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 910 : decoding order number (DON) | | 911 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 912 | | 913 | single-time aggregation units | 914 | | 915 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 916 | : 917 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 919 Figure 4 Payload format for STAP-B 921 The DON field specifies the value of DON for the first NAL unit in 922 an STAP-B in transmission order. For each successive NAL unit in 923 appearance order in an STAP-B, the value of DON is equal to (the 924 value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in 925 which '%' stands for the modulo operation. 927 A single-time aggregation unit consists of 16-bit unsigned size 928 information (in network byte order) that indicates the size of the 929 following NAL unit in bytes (excluding these two octets, but 930 including the NAL unit type octet of the NAL unit), followed by the 931 NAL unit itself, including its NAL unit type byte. A single-time 932 aggregation unit is byte aligned within the RTP payload, but it may 933 not be aligned on a 32-bit word boundary. Figure 5 presents the 934 structure of the single-time aggregation unit. 
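Informative note: Walking through the single-time aggregation units of an STAP-B and deriving the DON of each NAL unit can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative parser: the input is assumed to start at the DON field (the STAP NAL unit header has already been removed), any RTP padding is assumed to have been stripped, and the name parse_stap_b is hypothetical.

```python
import struct


def parse_stap_b(body: bytes):
    """Split an STAP-B body into (don, nal_unit) pairs.

    body: the STAP-B payload starting at the 16-bit DON field.
    """
    if len(body) < 2:
        raise ValueError("STAP-B body is too short to hold a DON")
    (don,) = struct.unpack_from("!H", body, 0)  # network byte order
    offset = 2
    units = []
    while offset < len(body):
        if offset + 2 > len(body):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", body, offset)
        offset += 2
        if offset + size > len(body):
            raise ValueError("NAL unit size exceeds the remaining payload")
        # The size covers the NAL unit including its header, not the size field.
        units.append((don, body[offset:offset + size]))
        offset += size
        # Each successive NAL unit takes the previous DON plus one, modulo 65536.
        don = (don + 1) % 65536
    if not units:
        raise ValueError("an STAP-B carries at least one aggregation unit")
    return units
```

The same loop without the leading DON field would walk the aggregation units of an STAP-A.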
936 0 1 2 3 937 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 938 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 939 : NAL unit size | | 940 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 941 | | 942 | NAL unit | 943 | | 944 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 945 | : 946 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 948 Figure 5 Structure for single-time aggregation unit (STAU) 950 Figure 6 presents an example of an RTP packet that contains an STAP- 951 A. The STAP-A contains two single-time aggregation units, labeled 952 as 1 and 2 in the figure. 954 0 1 2 3 955 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 956 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 957 | RTP Header | 958 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 959 |STAP NAL HDR | NALU 1 Size | NALU 1 HDR | 960 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 961 | NALU 1 HDR | NALU 1 Data | 962 +-+-+-+-+-+-+-+-+ | 963 : : 964 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 965 | | NALU 2 Size | NALU 2 HDR | 966 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 967 | NALU 2 HDR | NALU 2 Data | 968 +-+-+-+-+-+-+-+-+ : 969 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 970 | :...OPTIONAL RTP padding | 971 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 972 Figure 6 An example of an RTP packet including an STAP-A containing 973 two single-time aggregation units 975 Figure 7 presents an example of an RTP packet that contains an STAP- 976 B. The STAP contains two single-time aggregation units, labeled as 977 1 and 2 in the figure. 
979 0 1 2 3 980 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 981 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 982 | RTP Header | 983 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 984 |STAP-B NAL HDR | DON | NALU 1 Size | 985 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 986 | NALU 1 Size | NALU 1 HDR | NALU 1 Data | 987 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 988 : : 989 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 990 | | NALU 2 Size | NALU 2 HDR | 991 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 992 | NALU 2 HDR | NALU 2 Data | 993 +-+-+-+-+-+-+-+-+ : 994 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 995 | :...OPTIONAL RTP padding | 996 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 998 Figure 7 An example of an RTP packet including an STAP-B containing 999 two single-time aggregation units 1001 4.8 Fragmentation Units (FUs) 1003 This payload type allows fragmenting a NAL unit into several RTP 1004 packets. Doing so on the application layer instead of relying on 1005 lower layer fragmentation (e.g., by IP) may have the following use 1006 cases: 1008 o The payload format is capable of transporting NAL units bigger 1009 than 64 kbytes over an IPv4 network that may be present in pre- 1010 recorded video, particularly in High Definition formats (there is 1011 a limit of the number of slices per picture, which results in a 1012 limit of NAL units per picture, which may result in big NAL 1013 units). 1015 o The fragmentation mechanism allows fragmenting a single NAL unit 1016 and applying generic forward error correction. 1018 Fragmentation is defined only for a single NAL unit and not for any 1019 aggregation packets. A fragment of a NAL unit consists of an 1020 integer number of consecutive octets of that NAL unit. Each octet 1021 of the NAL unit MUST be part of exactly one fragment of that NAL 1022 unit. 
Fragments of the same NAL unit MUST be sent in consecutive 1023 order with ascending RTP sequence numbers (with no other RTP packets 1024 within the same RTP packet stream being sent between the first and 1025 last fragment). Similarly, a NAL unit MUST be reassembled in RTP 1026 sequence number order. 1028 When a NAL unit is fragmented and conveyed within fragmentation 1029 units (FUs), it is referred to as a fragmented NAL unit. STAPs MUST 1030 NOT be fragmented. FUs MUST NOT be nested; i.e., an FU MUST NOT 1031 contain another FU. 1033 The RTP timestamp of an RTP packet carrying an FU is set to the 1034 NALU-time of the fragmented NAL unit. 1036 Figure 8 presents the RTP payload format for FU-A. An FU-A consists 1037 of a fragmentation unit indicator of one octet, a fragmentation unit 1038 header of one octet, and a fragmentation unit payload. 1040 0 1 2 3 1041 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1042 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1043 | FU NAL HDR | FU header | | 1044 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1045 | | 1046 | FU payload | 1047 | | 1048 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1049 | :...OPTIONAL RTP padding | 1050 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1052 Figure 8 RTP payload format for FU-A 1054 Figure 9 presents the RTP payload format for FU-Bs. An FU-B 1055 consists of a fragmentation unit indicator of one octet, a 1056 fragmentation unit header of one octet, a decoding order number 1057 (DON) (in network byte order), and a fragmentation unit payload. In 1058 other words, the structure of FU-B is the same as the structure of 1059 FU-A, except for the additional DON field. 
1061 0 1 2 3 1062 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1063 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1064 | FU indicator | FU header | DON | 1065 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1066 | | 1067 | FU payload | 1068 | | 1069 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1070 | :...OPTIONAL RTP padding | 1071 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1073 Figure 9 RTP payload format for FU-B 1075 NAL unit type FU-B MUST be used in the interleaved packetization 1076 mode for the first fragmentation unit of a fragmented NAL unit. NAL 1077 unit type FU-B MUST NOT be used in any other case. In other words, 1078 in the interleaved packetization mode, each NALU that is fragmented 1079 has an FU-B as the first fragment, followed by one or more FU-A 1080 fragments. 1082 The FU NAL HDR octet has the following format: 1084 +---------------+ 1085 |0|1|2|3|4|5|6|7| 1086 +-+-+-+-+-+-+-+-+ 1087 |F|N| Type | 1088 +---------------+ 1090 A value equal to 26 in the Type field of the FU indicator octet 1091 identifies an FU-A packet and a value of 27 identifies an FU-B 1092 packet. The use of the F bit is described in section 5. The value 1093 of the N field MUST be set according to the value of the N field in 1094 the fragmented NAL unit. 1096 The FU header has the following format: 1098 +---------------+ 1099 |0|1|2|3|4|5|6|7| 1100 +-+-+-+-+-+-+-+-+ 1101 |S|E| Type | 1102 +---------------+ 1104 S: 1 bit 1105 When set to one, the Start bit indicates the start of a 1106 fragmented NAL unit. When the following FU payload is not the 1107 start of a fragmented NAL unit payload, the Start bit is set to 1108 zero. 1110 E: 1 bit 1111 When set to one, the End bit indicates the end of a fragmented 1112 NAL unit, i.e., the last byte of the payload is also the last 1113 byte of the fragmented NAL unit. 
When the following FU payload 1114 is not the last fragment of a fragmented NAL unit, the End bit is 1115 set to zero. 1117 Type: 6 bits 1118 The NAL unit payload type as defined in Table 7-1 of [HEVC]. 1120 The value of DON in FU-Bs is selected as described in section 4.6. 1122 Informative note: The DON field in FU-Bs allows gateways to 1123 fragment NAL units to FU-Bs without organizing the incoming NAL 1124 units to the NAL unit decoding order. 1126 A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the 1127 Start bit and End bit MUST NOT both be set to one in the same FU 1128 header. 1130 The FU payload consists of fragments of the payload of the 1131 fragmented NAL unit so that if the fragmentation unit payloads of 1132 consecutive FUs are sequentially concatenated, the payload of the 1133 fragmented NAL unit can be reconstructed. The NAL unit type octet 1134 of the fragmented NAL unit is not included as such in the 1135 fragmentation unit payload, but rather the information of the NAL 1136 unit type octet of the fragmented NAL unit is conveyed in F and N 1137 fields of the FU indicator octet of the fragmentation unit and in 1138 the type field of the FU header. An FU payload MAY have any number 1139 of octets and MAY be empty. 1141 If a fragmentation unit is lost, the receiver SHOULD discard all 1142 following fragmentation units in transmission order corresponding to 1143 the same fragmented NAL unit. 1145 A receiver in an endpoint or in a MANE MAY aggregate the first n-1 1146 fragments of a NAL unit to an (incomplete) NAL unit, even if 1147 fragment n of that NAL unit is not received. In this case, the 1148 forbidden_zero_bit of the NAL unit MUST be set to one to indicate a 1149 syntax violation. 1151 5. Packetization Rules 1153 The packetization modes are introduced in section 4.5. The 1154 packetization rules common to more than one of the packetization 1155 modes are specified in section 5.1. 
The packetization rules for the 1156 non-interleaved mode are specified in section 5.2, and the 1157 packetization rules for the interleaved mode are specified in 1158 sections 5.3. 1160 5.1 Common Packetization Rules 1162 All senders MUST enforce the following packetization rules 1163 regardless of the packetization mode in use: 1165 o VCL NAL units belonging to the same coded picture (and thus 1166 sharing the same RTP timestamp value) SHOULD be sent in their 1167 original decoding order to minimize the delay. Note that the 1168 decoding order is the order of the NAL units in the bitstream. 1170 o Parameter sets are handled in accordance with the rules and 1171 recommendations given in section 7.4. 1173 o MANEs MUST NOT duplicate any NAL unit except for sequence or 1174 picture parameter set NAL units, as neither this memo nor the 1175 HEVC specification provides means to identify duplicated NAL 1176 units. Sequence and picture parameter set NAL units MAY be 1177 duplicated to make their correct reception more probable, but any 1178 such duplication MUST NOT affect the contents of any active 1179 sequence or picture parameter set. Duplication SHOULD be 1180 performed on the application layer and not by duplicating RTP 1181 packets (with identical sequence numbers). 1183 Senders using the non-interleaved mode and the interleaved mode MUST 1184 enforce the following packetization rule: 1186 o MANEs MAY convert single NAL unit packets into one aggregation 1187 packet, convert an aggregation packet into several single NAL 1188 unit packets, or mix both concepts, in an RTP translator. The 1189 RTP translator SHOULD take into account at least the following 1190 parameters: path MTU size, unequal protection mechanisms (e.g., 1191 through packet-based FEC according to [RFC5109], especially for 1192 sequence and picture parameter set NAL units and coded slice data 1193 partition A NAL units), bearable latency of the system, and 1194 buffering capabilities of the receiver. 
1196 Informative note: An RTP translator is required to handle RTCP 1197 as per [RFC3550]. 1199 5.2 Non-Interleaved Mode 1201 This mode MUST be supported. This mode is in use when the value of 1202 the OPTIONAL packetization-mode media type parameter is equal to 1. 1203 It is primarily intended for low-delay applications. Only single 1204 NAL unit packets, STAP-As, and FU-As MAY be used in this mode. The 1205 transmission order of NAL units MUST comply with the NAL unit 1206 decoding order. 1208 5.3 Interleaved Mode 1210 This mode is in use when the value of the OPTIONAL 1211 packetization-mode media type parameter is equal to 2. Some receivers MAY support 1212 this mode. STAP-Bs, FU-As, and FU-Bs MAY be used. STAP-As and 1213 single NAL unit packets MUST NOT be used. The transmission order of 1214 packets and NAL units is constrained as specified in section 4.6. 1216 6. De-Packetization Process 1218 The de-packetization process is implementation dependent. 1219 Therefore, the following description should be seen as an example of 1220 a suitable implementation. Other schemes may be used as well, as 1221 long as the output for the same input is the same as that of the process 1222 described below. The output is the same when the number of 1223 NAL units and their order are both identical. Optimizations 1224 relative to the described algorithms are likely possible. Section 1225 6.1 presents the de-packetization process for the non-interleaved 1226 packetization mode and section 6.2 presents the de-packetization 1227 process for the interleaved packetization mode. 1229 All normal RTP mechanisms related to buffer management apply. In 1230 particular, duplicated or outdated RTP packets (as indicated by the 1231 RTP sequence number and the RTP timestamp) are removed. To 1232 determine the exact time for decoding, factors such as a possible 1233 intentional delay to allow for proper inter-stream synchronization 1234 must be taken into account.
1236 6.1 Non-Interleaved Mode 1238 The receiver includes a receiver buffer to compensate for 1239 transmission delay jitter. The receiver stores incoming packets in 1240 reception order into the receiver buffer. Packets are de-packetized 1241 in RTP sequence number order. If a de-packetized packet is a single 1242 NAL unit packet, the NAL unit contained in the packet is passed 1243 directly to the decoder. If a de-packetized packet is an STAP-A, 1244 the NAL units contained in the packet are passed to the decoder in 1245 the order in which they are encapsulated in the packet. For all the 1246 FU-A packets containing fragments of a single NAL unit, the 1247 de-packetized fragments are concatenated in their sending order to 1248 recover the NAL unit, which is then passed to the decoder. 1250 6.2 Interleaved Mode 1252 The general concept behind these de-packetization rules is to 1253 reorder NAL units from transmission order to the NAL unit decoding 1254 order. 1256 The receiver includes a receiver buffer, which is used to compensate 1257 for transmission delay jitter and to reorder NAL units from 1258 transmission order to the NAL unit decoding order. In this section, 1259 the receiver operation is described under the assumption that there 1260 is no transmission delay jitter. To distinguish it from a 1261 practical receiver buffer that is also used for compensation of 1262 transmission delay jitter, the receiver buffer is hereafter called 1263 the de-interleaving buffer in this section. Receivers SHOULD also 1264 prepare for transmission delay jitter; i.e., either reserve separate 1265 buffers for transmission delay jitter buffering and de-interleaving 1266 buffering or use a receiver buffer for both transmission delay 1267 jitter and de-interleaving. Moreover, receivers SHOULD take 1268 transmission delay jitter into account in the buffering operation; 1269 e.g., by additional initial buffering before starting decoding 1270 and playback.
1272 This section is organized as follows: subsection 6.2.1 presents how 1273 to calculate the size of the de-interleaving buffer. Subsection 1274 6.2.2 specifies how the receiver arranges received NAL 1275 units into the NAL unit decoding order. 1277 6.2.1 Size of the De-interleaving Buffer 1279 When the SDP Offer/Answer model or any other capability exchange 1280 procedure is used in session setup, the properties of the received 1281 stream SHOULD be such that the receiver capabilities are not 1282 exceeded. In the SDP Offer/Answer model, the receiver can indicate 1283 its capabilities to allocate a de-interleaving buffer with the 1284 deint-buf-cap media type parameter. The sender indicates the 1285 requirement for the de-interleaving buffer size with the 1286 sprop-deint-buf-req media type parameter. It is therefore RECOMMENDED to 1287 set the de-interleaving buffer size, in terms of number of bytes, 1288 equal to or greater than the value of the sprop-deint-buf-req media type 1289 parameter. See section 8.1 for further information on the deint-buf-cap 1290 and sprop-deint-buf-req media type parameters and section 8.2.2 for 1291 further information on their use in the SDP Offer/Answer model. 1293 When a declarative session description is used in session setup, the 1294 sprop-deint-buf-req media type parameter signals the requirement for 1295 the de-interleaving buffer size. It is therefore RECOMMENDED to set 1296 the de-interleaving buffer size, in terms of number of bytes, equal 1297 to or greater than the value of the sprop-deint-buf-req media type 1298 parameter. 1300 6.2.2 De-interleaving Process 1302 There are two buffering states in the receiver: initial buffering 1303 and buffering while playing. Initial buffering occurs when the RTP 1304 session is initialized. After initial buffering, decoding and 1305 playback are started, and the buffering-while-playing mode is used.
1307 Regardless of the buffering state, the receiver stores incoming NAL 1308 units, in reception order, in the de-interleaving buffer as follows. 1309 NAL units of aggregation packets are stored in the de-interleaving 1310 buffer individually. The value of DON is calculated and stored for 1311 each NAL unit. 1313 The receiver operation is described below with the help of the 1314 following functions and constants: 1316 o Function AbsDON is specified in section 7.1. 1318 o Function don_diff is specified in section 4.6. 1320 o Constant N is the value of the OPTIONAL sprop-interleaving-depth 1321 media type parameter (see section 7.1) incremented by 1. 1323 Initial buffering lasts until one of the following conditions is 1324 fulfilled: 1326 o There are N or more VCL NAL units in the de-interleaving buffer. 1328 o If sprop-max-don-diff is present, don_diff(m,n) is greater than 1329 the value of sprop-max-don-diff, in which n corresponds to the 1330 NAL unit having the greatest value of AbsDON among the received 1331 NAL units and m corresponds to the NAL unit having the smallest 1332 value of AbsDON among the received NAL units. 1334 o Initial buffering has lasted for a duration equal to or greater 1335 than the value of the OPTIONAL sprop-init-buf-time media type 1336 parameter. 1338 The NAL units to be removed from the de-interleaving buffer are 1339 determined as follows: 1341 o If the de-interleaving buffer contains at least N VCL NAL units, 1342 NAL units are removed from the de-interleaving buffer and passed 1343 to the decoder in the order specified below until the buffer 1344 contains N-1 VCL NAL units. 1346 o If sprop-max-don-diff is present, all NAL units m for which 1347 don_diff(m,n) is greater than sprop-max-don-diff are removed from 1348 the de-interleaving buffer and passed to the decoder in the order 1349 specified below.
Herein, n corresponds to the NAL unit having 1350 the greatest value of AbsDON among the NAL units in the de- 1351 interleaving buffer. 1353 The order in which NAL units are passed to the decoder is specified 1354 as follows: 1356 o Let PDON be a variable that is initialized to 0 at the beginning 1357 of the RTP session. 1359 o For each NAL unit associated with a value of DON, a DON distance 1360 is calculated as follows. If the value of DON of the NAL unit is 1361 larger than the value of PDON, the DON distance is equal to DON - 1362 PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON 1363 + 1. 1365 o NAL units are delivered to the decoder in ascending order of DON 1366 distance. If several NAL units share the same value of DON 1367 distance, they can be passed to the decoder in any order. 1369 o When a desired number of NAL units have been passed to the 1370 decoder, the value of PDON is set to the value of DON for the 1371 last NAL unit passed to the decoder. 1373 6.3 Additional De-Packetization Guidelines 1375 The following additional de-packetization rules may be used to 1376 implement an operational HEVC de-packetizer: 1378 o Intelligent RTP receivers (e.g., in gateways) may identify lost 1379 FUs. If a lost FU is found, a gateway may decide not to send the 1380 following FUs of the same fragmented NAL unit, as their 1381 information is meaningless for HEVC decoders. In this way, a MANE 1382 can reduce network load by discarding useless packets without 1383 parsing a complex bitstream. 1385 o Intelligent receivers having to discard packets or NALUs should 1386 first discard all packets/NALUs in which the value of the NRI 1387 field of the NAL unit type octet is equal to 0. This will 1388 minimize the impact on user experience and keep the reference 1389 pictures intact. If more packets have to be discarded, then 1390 packets with a lower NRI value may be discarded before 1391 packets with a higher NRI value.
However, discarding any 1392 packets with an NRI not equal to zero very likely leads to 1393 decoder drift and SHOULD be avoided. 1395 7. Payload Format Parameters 1397 This section specifies the parameters that MAY be used to select 1398 optional features of the payload format and certain features of the 1399 bitstream. The parameters are specified here as part of the media 1400 type registration for the HEVC codec. A mapping of the parameters 1401 into the Session Description Protocol (SDP) [RFC4566] is also 1402 provided for applications that use SDP. Equivalent parameters could 1403 be defined elsewhere for use with control protocols that do not use 1404 SDP. 1406 Some parameters provide a receiver with the properties of the stream 1407 that will be sent. The names of all these parameters start with 1408 "sprop" for stream properties. Some of these "sprop" parameters are 1409 limited by other payload or codec configuration parameters. For 1410 example, the sprop-parameter-sets parameter is constrained by the 1411 profile-level-id parameter. The media sender selects all "sprop" 1412 parameters rather than the receiver. This uncommon characteristic 1413 of the "sprop" parameters may be incompatible with some signaling 1414 protocol concepts, in which case the use of these parameters SHOULD 1415 be avoided. 1417 7.1 Media Type Registration 1419 The media subtype for the HEVC codec is allocated from the IETF 1420 tree. 1422 The receiver MUST ignore any unspecified parameter. 1424 Media Type name: video 1426 Media subtype name: H265 1428 Required parameters: none 1430 OPTIONAL parameters: 1432 In the following definitions of parameters, "the stream" or "the 1433 NAL unit stream" refers to all NAL units conveyed in the current 1434 RTP session in SST, and all NAL units conveyed in the current RTP 1435 session and all NAL units conveyed in other RTP sessions that the 1436 current RTP session depends on in MST. 
1438 profile-level-id: 1440 TBD 1442 sprop-parameter-sets: 1444 TBD 1446 max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br: 1447 TBD 1449 max-mbps: 1450 TBD 1452 max-smbps: 1453 TBD 1455 max-fs: 1456 TBD 1458 max-cpb: 1459 TBD 1461 max-dpb: 1462 TBD 1464 max-br: 1465 TBD 1467 redundant-pic-cap: 1468 TBD 1470 sprop-level-parameter-sets: 1471 TBD 1473 use-level-src-parameter-sets: 1474 TBD 1476 packetization-mode: 1477 This parameter signals the properties of an RTP payload type 1478 or the capabilities of a receiver implementation. Only a 1479 single configuration point can be indicated; thus, when 1480 capabilities to support more than one packetization-mode are 1481 declared, multiple configuration points (RTP payload types) 1482 must be used. 1484 When the value of packetization-mode is equal to 1, the non- 1485 interleaved mode, as defined in section 5.2 MUST be used. 1486 When the value of packetization-mode is equal to 2, the 1487 interleaved mode, as defined in section 5.3, MUST be used. 1488 The value of packetization-mode MUST be an integer in the 1489 range of 1 to 2, inclusive. 1491 sprop-interleaving-depth: 1492 This parameter MUST NOT be present when packetization-mode is 1493 not present or the value of packetization-mode is equal to 0 1494 or 1. This parameter MUST be present when the value of 1495 packetization-mode is equal to 2. 1497 This parameter signals the properties of an RTP packet stream. 1498 It specifies the maximum number of VCL NAL units that precede 1499 any VCL NAL unit in the RTP packet stream in transmission 1500 order and follow the VCL NAL unit in decoding order. 1501 Consequently, it is guaranteed that receivers can reconstruct 1502 NAL unit decoding order when the buffer size for NAL unit 1503 decoding order recovery is at least the value of sprop- 1504 interleaving-depth + 1 in terms of VCL NAL units. 1506 The value of sprop-interleaving-depth MUST be an integer in 1507 the range of 0 to 32767, inclusive. 
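The definition of sprop-interleaving-depth above can be illustrated with a minimal sketch. It assumes the sender knows, for each VCL NAL unit in transmission order, that unit's position in decoding order; the function name and the sample sequence are hypothetical and are not part of this payload format.

```python
def interleaving_depth(decoding_positions):
    """Return the interleaving depth of a stream of VCL NAL units.

    decoding_positions[k] is the decoding-order position of the k-th
    VCL NAL unit in transmission order (illustrative input format).
    For each unit, count the units sent before it that follow it in
    decoding order; the depth is the largest such count.
    """
    depth = 0
    for j, pos_j in enumerate(decoding_positions):
        sent_earlier_decoded_later = sum(
            1 for pos_i in decoding_positions[:j] if pos_i > pos_j
        )
        depth = max(depth, sent_earlier_decoded_later)
    return depth


# Hypothetical stream: units sent in the order 0 2 1 3 5 4 6 8 7.
depth = interleaving_depth([0, 2, 1, 3, 5, 4, 6, 8, 7])
# Constant N of the de-interleaving process in section 6.2.2.
n = depth + 1
```

For this sample stream, at most one VCL NAL unit is sent ahead of a unit that it follows in decoding order, so the depth is 1 and a buffer of two VCL NAL units suffices to recover decoding order.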
1509 sprop-deint-buf-req: 1510 This parameter MUST NOT be present when packetization-mode is 1511 not present or the value of packetization-mode is not equal to 1512 2. It MUST be present when the value of packetization-mode is 1513 equal to 2. 1515 sprop-deint-buf-req signals the required size of the de- 1516 interleaving buffer for the RTP packet stream. The value of 1517 the parameter MUST be greater than or equal to the maximum 1518 buffer occupancy (in units of bytes) required in such a de- 1519 interleaving buffer that is specified in section 6.2. It is 1520 guaranteed that receivers can perform the de-interleaving of 1521 interleaved NAL units into NAL unit decoding order, when the 1522 de-interleaving buffer size is at least the value of sprop- 1523 deint-buf-req in terms of bytes. 1525 The value of sprop-deint-buf-req MUST be an integer in the 1526 range of 0 to 4294967295, inclusive. 1528 Informative note: sprop-deint-buf-req indicates the 1529 required size of the de-interleaving buffer only. When 1530 network jitter can occur, an appropriately sized jitter 1531 buffer has to be provisioned for as well. 1533 deint-buf-cap: 1534 This parameter signals the capabilities of a receiver 1535 implementation and indicates the amount of de-interleaving 1536 buffer space in units of bytes that the receiver has available 1537 for reconstructing the NAL unit decoding order. A receiver is 1538 able to handle any stream for which the value of the sprop- 1539 deint-buf-req parameter is smaller than or equal to this 1540 parameter. 1542 If the parameter is not present, then a value of 0 MUST be 1543 used for deint-buf-cap. The value of deint-buf-cap MUST be an 1544 integer in the range of 0 to 4294967295, inclusive. 1546 Informative note: deint-buf-cap indicates the maximum 1547 possible size of the de-interleaving buffer of the receiver 1548 only. When network jitter can occur, an appropriately 1549 sized jitter buffer has to be provisioned for as well. 
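The relationship between sprop-deint-buf-req and deint-buf-cap can be sketched as a simple comparison. This is only an illustration under the rules stated above (an absent deint-buf-cap is treated as 0, and both values lie in the range 0 to 4294967295); the function name is hypothetical.

```python
MAX_VALUE = 4294967295  # upper bound given for both parameters


def receiver_can_handle(sprop_deint_buf_req, deint_buf_cap=None):
    """Check a sender's buffer requirement against a receiver's capability.

    deint_buf_cap=None models an absent parameter, for which a value
    of 0 is used. Both values are in units of bytes.
    """
    cap = 0 if deint_buf_cap is None else deint_buf_cap
    for value in (sprop_deint_buf_req, cap):
        if not 0 <= value <= MAX_VALUE:
            raise ValueError("parameter value out of range")
    return sprop_deint_buf_req <= cap
```

Note that this check covers the de-interleaving buffer only; any jitter buffer has to be dimensioned separately, as the informative notes above point out.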
1551 sprop-init-buf-time: 1552 This parameter MAY be used to signal the properties of an RTP 1553 packet stream. The parameter MUST NOT be present, if the 1554 value of packetization-mode is equal to 1. 1556 The parameter signals the initial buffering time that a 1557 receiver MUST wait before starting decoding to recover the NAL 1558 unit decoding order from the transmission order. The 1559 parameter is the maximum value of (decoding time of the NAL 1560 unit - transmission time of a NAL unit), assuming reliable and 1561 instantaneous transmission, the same timeline for transmission 1562 and decoding, and that decoding starts when the first packet 1563 arrives. 1565 An example of specifying the value of sprop-init-buf-time 1566 follows. A NAL unit stream is sent in the following 1567 interleaved order, in which the value corresponds to the 1568 decoding time and the transmission order is from left to 1569 right: 1571 0 2 1 3 5 4 6 8 7 ... 1573 Assuming a steady transmission rate of NAL units, the 1574 transmission times are: 1576 0 1 2 3 4 5 6 7 8 ... 1578 Subtracting the decoding time from the transmission time 1579 column-wise results in the following series: 1581 0 -1 1 0 -1 1 0 -1 1 ... 1583 Thus, in terms of intervals of NAL unit transmission times, 1584 the value of sprop-init-buf-time in this example is 1. The 1585 parameter is coded as a non-negative base10 integer 1586 representation in clock ticks of a 90-kHz clock. If the 1587 parameter is not present, then no initial buffering time value 1588 is defined. Otherwise the value of sprop-init-buf-time MUST 1589 be an integer in the range of 0 to 4294967295, inclusive. 1591 In addition to the signaled sprop-init-buf-time, receivers 1592 SHOULD take into account the transmission delay jitter 1593 buffering, including buffering for the delay jitter caused by 1594 mixers, translators, gateways, proxies, traffic-shapers, and 1595 other network elements. 
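The worked example above can be reproduced with a short sketch. It follows the column-wise subtraction used in the example (transmission time minus decoding time) and assumes a steady transmission rate, so that the k-th NAL unit is sent at time k. The function names and the 1/30-second spacing used in the conversion are assumptions made for illustration only.

```python
from fractions import Fraction

CLOCK_HZ = 90000  # sprop-init-buf-time is expressed in 90-kHz clock ticks


def init_buf_intervals(decoding_times):
    """Largest (transmission time - decoding time), in NAL unit intervals.

    decoding_times[k] is the decoding time of the k-th NAL unit sent;
    its transmission time is taken to be k (steady rate assumed).
    """
    diffs = [tx - dec for tx, dec in enumerate(decoding_times)]
    return max([0] + diffs)  # the parameter is non-negative


def to_clock_ticks(intervals, interval_seconds):
    """Convert a number of NAL unit intervals to 90-kHz clock ticks."""
    return int(intervals * Fraction(interval_seconds) * CLOCK_HZ)


intervals = init_buf_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])
# Hypothetical spacing of 1/30 second between NAL unit transmissions.
ticks = to_clock_ticks(intervals, Fraction(1, 30))
```

For the sequence in the example, init_buf_intervals yields 1, which matches the value derived in the text; with the assumed 1/30-second spacing this corresponds to 3000 clock ticks.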
1597 sprop-max-don-diff: 1598 This parameter MAY be used to signal the properties of an RTP 1599 packet stream. It MUST NOT be used to signal transmitter or 1600 receiver or codec capabilities. The parameter MUST NOT be 1601 present if the value of packetization-mode is equal to 1. 1602 sprop-max-don-diff is an integer in the range of 0 to 32767, 1603 inclusive. If sprop-max-don-diff is not present, the value of 1604 the parameter is unspecified. sprop-max-don-diff is 1605 calculated as follows: 1607 sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)}, 1608 for any i and any j>i, 1610 where i and j indicate the index of the NAL unit in the 1611 transmission order and AbsDON denotes a decoding order number 1612 of the NAL unit that does not wrap around to 0 after 65535. 1613 In other words, AbsDON is calculated as follows: Let m and n 1614 be consecutive NAL units in transmission order. For the very 1615 first NAL unit in transmission order (whose index is 0), 1616 AbsDON(0) = DON(0). For other NAL units, AbsDON is calculated 1617 as follows: 1619 If DON(m) == DON(n), AbsDON(n) = AbsDON(m) 1621 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 1622 AbsDON(n) = AbsDON(m) + DON(n) - DON(m) 1624 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 1625 AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n) 1627 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 1628 AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n)) 1630 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 1631 AbsDON(n) = AbsDON(m) - (DON(m) - DON(n)) 1633 where DON(i) is the decoding order number of the NAL unit 1634 having index i in the transmission order. The decoding order 1635 number is specified in section 4.6. 1637 Informative note: Receivers may use sprop-max-don-diff to 1638 trigger which NAL units in the receiver buffer can be 1639 passed to the decoder. 
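The AbsDON rules and the sprop-max-don-diff formula above can be sketched as follows. The five branches mirror the five cases listed in the text; the function names are hypothetical. Clamping the result at 0 for a stream that is never reordered is an assumption made here to stay within the stated range of 0 to 32767.

```python
def abs_don_sequence(dons):
    """Compute AbsDON for 16-bit DON values given in transmission order."""
    abs_dons = []
    for n, don_n in enumerate(dons):
        if n == 0:
            abs_dons.append(don_n)  # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[n - 1], abs_dons[-1]
        if don_m == don_n:
            abs_n = abs_m
        elif don_m < don_n and don_n - don_m < 32768:
            abs_n = abs_m + don_n - don_m
        elif don_m > don_n and don_m - don_n >= 32768:
            abs_n = abs_m + 65536 - don_m + don_n  # forward wrap-around
        elif don_m < don_n and don_n - don_m >= 32768:
            abs_n = abs_m - (don_m + 65536 - don_n)  # backward wrap-around
        else:  # don_m > don_n and don_m - don_n < 32768
            abs_n = abs_m - (don_m - don_n)
        abs_dons.append(abs_n)
    return abs_dons


def max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} over all i and j > i, clamped at 0."""
    abs_dons = abs_don_sequence(dons)
    if not abs_dons:
        return 0
    best = 0  # assumption: report 0 when no unit is sent out of order
    running_max = abs_dons[0]
    for value in abs_dons[1:]:
        best = max(best, running_max - value)
        running_max = max(running_max, value)
    return best
```

Tracking the running maximum of AbsDON over earlier NAL units avoids comparing every pair: for each unit j, the largest AbsDON(i) - AbsDON(j) with i before j is obtained from the largest AbsDON seen so far.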
1641 max-rcmd-nalu-size: 1642 TBD 1644 sar-understood: 1645 TBD 1647 sar-supported: 1648 TBD 1650 Encoding considerations: 1651 This type is only defined for transfer via RTP (RFC 3550). 1653 Security considerations: 1654 See Section 8 of RFC XXXX. 1656 Public specification: 1657 Please refer to Section 13 of RFC XXXX. 1659 Additional information: 1660 None 1662 File extensions: none 1664 Macintosh file type code: none 1666 Object identifier or OID: none 1668 Person & email address to contact for further information: 1670 Thomas Schierl, ts@thomas-schierl.de 1672 Intended usage: COMMON 1674 Author: 1676 Thomas Schierl, ts@thomas-schierl.de 1678 Change controller: 1679 IETF Audio/Video Transport Payloads working group delegated 1680 from the IESG. 1682 7.2 SDP Parameters 1684 7.2.1 Mapping of Payload Type Parameters to SDP 1686 TBD 1688 7.2.2 Usage with the SDP Offer/Answer Model 1690 The media type video/H265 string is mapped to fields in the Session 1691 Description Protocol (SDP) [RFC4566] as follows: 1693 o The media name in the "m=" line of SDP MUST be video. 1695 o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the 1696 media subtype). 1698 o The clock rate in the "a=rtpmap" line MUST be 90000. 1700 o The OPTIONAL parameters "profile-level-id" and "packetization-mode", 1701 when present, MUST be included in the "a=fmtp" line of SDP. 1702 These parameters are expressed as a media type string, in the 1703 form of a semicolon-separated list of parameter=value pairs. 1705 o The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level- 1706 parameter-sets", when present, MUST be included in the "a=fmtp" 1707 line of SDP or conveyed using the "fmtp" source attribute as 1708 specified in section 6.3 of [RFC5576]. For a particular media 1709 format (i.e., RTP payload type), a "sprop-parameter-sets" or 1710 "sprop-level-parameter-sets" parameter MUST NOT be both included in the 1711 "a=fmtp" line of SDP and conveyed using the "fmtp" source 1712 attribute.
When included in the "a=fmtp" line of SDP, these 1713 parameters are expressed as a media type string, in the form of a 1714 semicolon-separated list of parameter=value pairs. When conveyed 1715 using the "fmtp" source attribute, these parameters are only 1716 associated with the given source and payload type as parts of the 1717 "fmtp" source attribute. 1719 Informative note: Conveyance of "sprop-parameter-sets" and 1720 "sprop-level-parameter-sets" using the "fmtp" source attribute 1721 allows for out-of-band transport of parameter sets in 1722 topologies like Topo-Video-switch-MCU [TBD]. 1724 An example of media representation in SDP is as follows: 1726 m=video 49170 RTP/AVP 98 1727 a=rtpmap:98 H265/90000 1728 a=fmtp:98 profile-level-id=UVWXYZ; 1729 packetization-mode=1; 1730 sprop-parameter-sets= 1732 7.2.3 Usage with SDP Offer/Answer Model 1734 TBD 1736 7.2.4 Usage in Declarative Session Descriptions 1738 TBD 1740 7.2.5 Signaling of Parallel Processing 1742 TBD 1744 7.3 Examples 1746 TBD 1748 7.4 Parameter Set Considerations 1750 TBD 1752 8. Security Considerations 1754 TBD 1756 9. Congestion Control 1758 TBD 1760 10. IANA Considerations 1762 A new media type, as specified in Section 7.1 of this memo, should 1763 be registered with IANA. 1765 11. Informative Appendix: Application Examples 1767 11.1 Introduction 1769 TBD 1771 11.2 Streaming 1773 TBD 1775 11.3 Videoconferencing (Unicast to MANE, Unicast to Endpoints) 1777 TBD 1779 11.4 Mobile TV (Multicast to MANE, Unicast to Endpoint) 1781 TBD 1783 12. Acknowledgements 1785 TBD 1787 This document was prepared using 2-Word-v2.0.template.dot. 1789 13. References 1791 13.1 Normative References 1793 [HEVC] JCT-VC, "High-Efficiency Video Coding (HEVC) text 1794 specification Working Draft 6", JCTVC-H1003, February 1795 2012. 1797 [H.264] ITU-T Recommendation H.264, "Advanced video coding for 1798 generic audiovisual services", March 2010. 1800 [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R.
Jesup, "RTP 1801 Payload Format for H.264 Video", RFC 6184, May 2011. 1803 [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A. 1804 Eleftheriadis, "RTP Payload Format for Scalable Video 1805 Coding", RFC 6190, May 2011. 1807 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1808 Requirement Levels", BCP 14, RFC 2119, March 1997. 1810 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1811 With Session Description Protocol (SDP)", RFC 3264, June 1812 2002. 1814 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 1815 Encodings", RFC 4648, October 2006. 1817 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, 1818 V., "RTP: A Transport Protocol for Real-Time 1819 Applications", STD 64, RFC 3550, July 2003. 1821 [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 1822 Description Protocol", RFC 4566, July 2006. 1824 [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific 1825 Media Attributes in the Session Description Protocol", RFC 1826 5576, June 2009. 1828 13.2 Informative References 1830 [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error 1831 Correction", RFC 5109, December 2007. 1833 14. Authors' Addresses 1835 Thomas Schierl 1836 Fraunhofer HHI 1837 Einsteinufer 37 1838 D-10587 Berlin 1839 Germany 1840 Phone: +49-30-31002-227 1841 EMail: ts@thomas-schierl.de 1843 Stephan Wenger 1844 Vidyo, Inc. 1845 433 Hackensack Ave., 7th floor 1846 Hackensack, N.J. 07601 1847 USA 1848 Phone: +1-415-713-5473 1849 EMail: stewe@stewe.org 1851 Ye-Kui Wang 1852 Qualcomm Incorporated 1853 5775 Morehouse Drive 1854 San Diego, CA 92121 1855 USA 1856 Phone: +1-858-651-8345 1857 EMail: yekuiw@qualcomm.com 1859 Miska M. Hannuksela 1860 Nokia Corporation 1861 P.O. Box 1000 1862 33721 Tampere 1863 Finland 1864 Phone: +358-7180-08000 1865 EMail: miska.hannuksela@nokia.com