idnits 2.17.1 draft-ietf-avt-rtp-pfap-00.txt: ** The Abstract section seems to be numbered -(83): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(84): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(182): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(188): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(192): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding -(253): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == There are 25 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. 
** The abstract seems to contain references ([2]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 2001) is 8229 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: '4' is defined on line 556, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. '1' -- Possible downref: Non-RFC (?) normative reference: ref. '2' == Outdated reference: A later version (-04) exists of draft-ietf-avt-mpeg4-multisl-00 -- Possible downref: Normative reference to a draft: ref. '3' ** Obsolete normative reference: RFC 1889 (ref. '4') (Obsoleted by RFC 3550) Summary: 7 errors (**), 0 flaws (~~), 4 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force AVT WG 2 Internet-Draft Ostermann/Rurainsky/Civanlar 3 draft-ietf-avt-rtp-pfap-00.txt AT&T Labs - Research 4 Expires: April 2002 October 2001 6 RTP Payload Format for Phoneme/Facial Animation Parameter (PFAP) 7 Streams 9 1. 
Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2. Abstract

   This document describes a Real-Time Transport Protocol (RTP) payload
   format for transporting phoneme and facial animation parameter (PFAP)
   streams over the Internet according to the TtsFAPInterface that is
   defined as an internal interface of an MPEG-4 client in ISO/IEC
   14496-3 (MPEG-4 Audio, Subpart 6: Text-to-Speech Interface,
   TtsFAPInterface) [2]. A recovery strategy for loss-tolerant
   transmission of such streams is described.

RTP Payload Format for                                      October 2001
Phoneme/Facial Animation Parameter (PFAP)

Table of Contents

   1.  Status of this Memo
   2.  Abstract
   3.  Introduction
   4.  Requirements language
   5.  The MPEG-4 class TtsFAPInterface
   6.  Payload Format
       6.1.  Packet descriptor
       6.2.  Phoneme descriptor
       6.3.  FAP descriptor
       6.4.  Recovery information, type 1
   7.  RTP header fields usage:
   8.  Recovery Strategy
   9.  Security Considerations
   10. References
   11. Author's Addresses

3. Introduction

   Animated talking heads based on MPEG-4 [1] may be implemented on a
   client that renders the head and synthesizes the speech using a
   Text-to-Speech (TTS) application on the client. The MPEG-4 standard
   defines only the input interface and two output interfaces for a
   compliant TTS application. The output interfaces are intended to be
   internal to the MPEG-4 client; thus, no transport protocol is
   defined for transmitting the output data. However, advanced
   TTS servers may need to be implemented on network-based machines and
   shared by many users. In order to animate talking heads on a client
   using a network-based TTS server, it will be necessary to stream the
   outputs of the TTS server to the client.

   The input to an MPEG-4 compliant TTS server is the "MPEG-4 audio
   text-to-speech payload" [2] defined for transmitting text to a TTS
   server. The TTS server synthesizes speech as an audio signal from the
   text. The text may contain bookmarks that enable the control of the
   talking head with facial animation parameters (FAP) synchronized with
   the speech. FAPs may define facial expressions like joy and disgust,
   head orientation, and other deformations of flexible parts of the
   head. Bookmarks do not influence the synthesized speech.
   The "MPEG-4 audio text-to-speech payload" may also transport
   optional TTS control information like Gender, Age, and Speech_Rate.
   The "MPEG-4 audio text-to-speech payload" may be transported using
   the MPEG-4 payload format as specified in [3].

   One of the outputs of the TTS server is the audio stream. This audio
   stream with the related timing information is handed to the
   compositor of the MPEG-4 client. The compositor enables synchronized
   playback of MPEG-4 supported media. With a network-based TTS server,
   the compositor is located at the client side and the audio
   stream produced by the TTS server needs to be transmitted to the
   client. Several RTP payload formats for audio streams already exist
   and may be used in this context.

   The other output of the TTS server is the TTS markup information.
   MPEG-4 defines the class TtsFAPInterface that holds the TTS markup
   information [2]. This class is used to hand the TTS markup
   information from the TTS server to the face renderer within the
   compositor of the MPEG-4 client. The TTS markup information enables
   an MPEG-4 client to create the animation of the talking head such
   that the head produces visual speech (mainly lip motion) synchronized
   with the audio. The TTS markup information contains phonemes,
   bookmarks, and related timing information.

   A phoneme is the basic spoken unit in a language. Pronouncing a
   phoneme involves coordinating movements of the lungs, vocal cavities,
   larynx, lips, tongue, and teeth. The TTS server translates the text
   to be synthesized into phonemes. Furthermore, the TTS server computes
   the start time and duration of each phoneme in the synthesized
   speech.

   A bookmark is the exact copy of the bookmark in the text sent to the
   TTS server.
   MPEG-4 specifies that the start time of a FAP in a
   bookmark is the start time of the first phoneme of the first word
   following the bookmark in the current sentence. If there is no word
   after the bookmark in the current sentence, the start time of the FAP
   is the same as the start time of the last phoneme of the previous
   word. Hence, the start time of a FAP always coincides with a phoneme.
   MPEG-4 allows up to 40 consecutive bookmarks that can be used to
   render complicated expressions.

   In order to enable networked TTS servers to be used with MPEG-4, a
   novel payload format for TTS markup information needs to be defined.

   This document defines an RTP payload format for transporting
   Phoneme/FAP (PFAP) streams over the Internet. The payload
   format is based on the TtsFAPInterface defined in Subpart 6 of the
   ISO/IEC International Standard 14496-3 [2] and outlined in Section 5
   of this document. The payload format includes packet loss recovery
   information.

4. Requirements language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [5].

5. The MPEG-4 class TtsFAPInterface

   In this section, we describe the class TtsFAPInterface, its
   parameters, and its usage, since it is the basic structure carried by
   the new payload format proposed in this document. The class
   TtsFAPInterface is used to hand the TTS markup information from the
   TTS server to the face renderer within the compositor of the MPEG-4
   client. This class holds one phoneme and related information, namely
   PhonemeSymbol, PhonemeDuration, f0Average, Stress, WordBegin,
   Bookmark, and Starttime.

   PhonemeSymbol:
        This field identifies a phoneme using an 8 bit unsigned integer
        (PhonemeSymbol). A language usually uses around 50 phonemes.
        Phonemes may be specified by Unicode. Since MPEG-4 uses the
        class TtsFAPInterface only internally in a client, it does not
        specify the mapping of a phoneme specified in Unicode to this 8
        bit PhonemeSymbol.

   PhonemeDuration:
        This field identifies the duration of the PhonemeSymbol in units
        of milliseconds using a 12 bit unsigned integer.

   f0Average:
        This field defines the frequency of the synthesized audio signal
        for this phoneme in units of 2 Hz using an 8 bit unsigned
        integer.

   Stress:
        Stress indicates a stressed phoneme using 1 bit.

   Bookmark:
        This field is a string that contains one or more bookmarks that
        are associated with the current PhonemeSymbol. A definition of
        the bookmark structure is given in [1], Annex C. A bookmark
        starts with "". Between the start and end
        strings of a bookmark, there are four fields defined: n (FAP
        number 2<=n<=68), FAPfield (see below), T (transition time), and
        C (time curve for computation of the amplitude during the
        transition time).

        In case of n=2, FAPfield holds the four numbers "e1 a1 e2 a2",
        with the two facial expressions e1 and e2 and their target
        amplitudes a1 and a2, respectively. There are six different
        facial expressions (1<=e1,e2<=6) defined in Annex C of [1]. In
        case of 3<=n<=68, FAPfield holds only the target amplitude "a"
        for FAP n.

        Amplitudes are given in different units. The unit of an
        amplitude is determined by the FAP n. The amplitude is signed,
        with a maximum magnitude of 2529600. This maximum may be reached
        for head and eye rotations. In these cases, the unit is AU
        (Angle Units, 0.00001 RAD), and the maximum value corresponds to
        25.296 RAD.

        The transition time T is specified in ms; no upper limit is
        defined.

        The field C can be 1, 2, or 3, which is an identifier for a time
        curve equation defined in [1], Annex C.
        The time curve describes
        the transition of the FAP amplitude from its current amplitude
        to the target amplitude a (a1 and a2 in case of n=2) of the FAP
        at the end of the transition time T. The amplitude of the FAP at
        the beginning of the transition depends on the previous
        bookmarks and can be equal to:
          - 0 if no bookmark with the same FAP number was used before.
          - a of the previous bookmark with the same FAP number if
            a time longer than the previous transition time T has
            elapsed between these two FAP bookmarks.
          - The amplitude actually reached due to a previous
            bookmark with the same FAP number if a time shorter than
            the previous transition time T has elapsed between the
            previous bookmark and the current one.
        At the end of the transition time T, target amplitude a is
        maintained until another bookmark gives a new target amplitude.
        To reset a FAP, a bookmark with the same FAP number with a=0 is
        included in the text.

        In case of C=1, the face renderer will linearly change the
        amplitude of FAP n from its current amplitude to the target
        amplitude within the transition time T. In case of C=2, a
        triangle function is used which linearly changes the amplitude
        of FAP n from its current value to the target amplitude a within
        the transition time T/2. After that, the amplitude is linearly
        changed back to the value prior to encountering the bookmark
        within the transition time T/2. In case of C=3, a spline
        function is used to change the amplitude from its current
        amplitude to the target amplitude a within the transition time
        T.

        Bookmarks with n=2 change the facial expression of the
        face (joy, anger, etc.), and bookmarks with n in the range of 3
        to 68 animate parts of the head (lips, eyebrow, etc.).
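
   The linear (C=1) and triangle (C=2) time curves described for the
   Bookmark field, together with the three cases for the starting
   amplitude, can be sketched as follows. This is a minimal
   illustration and not part of the payload format: the function names
   fap_amplitude and start_amplitude are hypothetical, and the spline
   curve (C=3) is left out because its equation is given only in [1],
   Annex C. The sketch also assumes that a triangle curve leaves the
   FAP at its pre-bookmark value once T has elapsed.

```python
def fap_amplitude(start, target, T, C, t):
    """Amplitude of one FAP, t ms after its bookmark takes effect.

    start  -- amplitude at the moment the bookmark takes effect
    target -- target amplitude a from the bookmark
    T      -- transition time in ms
    C      -- time curve: 1 = linear, 2 = triangle (3 = spline, not sketched)
    """
    if C not in (1, 2):
        raise ValueError("only C=1 and C=2 are sketched here")
    if t <= 0:
        return float(start)
    if C == 1:
        # Linear ramp; the target is held once T has elapsed.
        if T <= 0 or t >= T:
            return float(target)
        return start + (target - start) * t / T
    # C == 2: ramp to the target within T/2, then back within T/2.
    if T <= 0 or t >= T:
        return float(start)
    half = T / 2
    if t <= half:
        return start + (target - start) * t / half
    return target + (start - target) * (t - half) / half


def start_amplitude(prev, elapsed):
    """Starting amplitude for a new bookmark with the same FAP number.

    prev    -- None, or (start, target, T, C) of the previous bookmark
    elapsed -- ms between the previous bookmark and the new one
    """
    if prev is None:
        return 0.0  # no bookmark with this FAP number was used before
    # Covers both remaining cases: after T this yields the held value,
    # before T it yields the amplitude actually reached so far.
    return fap_amplitude(*prev, elapsed)


if __name__ == "__main__":
    print(fap_amplitude(0, 100, 1000, 1, 500))
    print(fap_amplitude(0, 100, 1000, 2, 750))
    print(start_amplitude((0, 100, 1000, 1), 250))
```

   Evaluating the previous bookmark's curve at the elapsed time gives
   the starting amplitude for the next bookmark, so the second and
   third cases in the list need no separate handling.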
   Starttime:
        Start time for this PhonemeSymbol with respect to the start of
        the MPEG-4 session in ms using a long int. MPEG-4 computes the
        duration of the phonemes by subtracting the start times of
        consecutive phonemes. In the PFAP payload format, we transmit
        time durations with each phoneme.

6. Payload Format

   The PFAP payload consists of three types of information: phoneme
   descriptor, FAP descriptor, and recovery information. Each payload
   starts with a "packet descriptor" field followed by optional recovery
   information. Phoneme descriptors and FAP descriptors may follow the
   packet descriptor or the recovery information if available. FAPs are
   associated with phonemes to determine their timing in a sentence (see
   section 3, or [2]). The start time of a FAP is the same as the start
   time of the first phoneme following the FAP(s). If the
   input to the TTS server ends with a bookmark, the server could send
   these bookmarks as FAPs prior to the last phoneme of the previous
   word. Alternatively, the server could create a short silence phoneme
   that is sent after the final FAP. Therefore, a packet MUST end with a
   phoneme if it contains any information other than recovery
   information.

   The following sections define the specific formats for the packet
   descriptor and each of the three information types.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Pkt descriptor |            (optional)Recovery Info            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |(optional)Recovery Info, (optional)((optional)FAP and Phoneme),|
   |..., (optional)((optional)FAP and Phoneme)                     |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|

                        Figure 1 - PFAP Payload

6.1. Packet descriptor

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |C| T | PP  |IB |
   +-+-+-+-+-+-+-+-+

   Figure 2 - Packet descriptor

   Complete (C): 1 bit
        Distinguishes between dynamic and complete recovery information.
        Zero stands for dynamic, and one for complete recovery
        information. In case of complete recovery information, the
        packet MUST only contain recovery information. Recovery
        information is defined in Section 6.4, "Recovery information,
        type 1".

   Type (T): 2 bits
        This field identifies the structure of recovery information with
        the following meaning:
           00   no recovery information
           01   recovery information (defined in Section 6.4,
                "Recovery information, type 1")
           10   reserved
           11   reserved

   prevPackets (PP): 3 bits
        For dynamic recovery (C=0), this field defines the number of
        previous packets that can be recovered with the following
        recovery information. For complete recovery information (C=1),
        this field can be ignored. The interpretation of these three
        bits is given as follows:
        (Every packet counts.)
           000  reserved
           001  one previous packet is covered
           010  two previous packets are covered
           011  four previous packets are covered
           100  seven previous packets are covered
           101  15 previous packets are covered
           110  25 previous packets are covered
           111  40 previous packets are covered

   InfoBits (IB): 2 bits
        Indicate the type of the descriptor following the recovery
        information:
           00   a Phoneme descriptor follows
           01   a FAP descriptor follows
           10   end of packet
           11   reserved

6.2. Phoneme descriptor

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PhonemeSymbol |    PhonemeDuration    |   f0Average   |S|W|IB |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 3 - Phoneme descriptor

   PhonemeSymbol: 8 bits
        This field identifies each phoneme from a phoneme alphabet. The
        mapping of phonemes to this 8 bit number is signaled out of
        band.

   PhonemeDuration: 12 bits
        This field identifies the duration of the PhonemeSymbol in units
        of milliseconds.

   f0Average: 8 bits
        This field defines the frequency of the synthesized audio signal
        for this phoneme in units of 2 Hz.

   Stress (S): 1 bit
        S=1 indicates a stressed phoneme.

   WordBegin (W): 1 bit
        W=1 indicates the beginning of a word.

   InfoBits (IB): 2 bits
        These bits identify the following descriptor (phoneme, FAP) in
        the stream or indicate the end of text, which can be after a
        sentence or a paragraph. The meanings of the binary combinations
        are:
           00   a Phoneme descriptor follows
           01   a FAP descriptor follows
           10   end of packet
           11   end of text, which implies end of packet. End of text
                can be at the end of a sentence or paragraph. The
                renderer/client should expect a pause of undefined
                length prior to the next utterance.

6.3. FAP descriptor

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   FAPind    |s|              Amplitude (22 bits)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Transition (14 bits)  | C |IB |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4 - FAP descriptor

   FAPind: 7 bits
        This field identifies FAPs in the range of 3 to 74.
        Facial expressions are indicated using FAP numbers larger than
        68. For FAP numbers larger than 68, (FAP number - 68) gives the
        facial expression number e1 or e2. Amplitude, transition, and
        curve are not mapped; they stay the same.
        Example:
        bookmark sequence for expression:

        transformed bookmark sequence:

   Sign (s): 1 bit
        Sign of the FAP target amplitude. 0 stands for plus, and 1 for
        minus.

   Amplitude: 22 bits
        This field holds the target amplitude for this FAP. The maximum
        possible target amplitude is 2529600.

   Transition: 14 bits
        Holds the desired transition time during which the target
        amplitude of the FAP has to be reached. The maximum transition
        time is not specified in MPEG-4. In this payload format, it is
        limited to 16383 ms.

   Curve (C): 2 bits
        Describes the time curve (1, 2, or 3) used for computation of
        the FAP amplitude.

   InfoBits (IB): 2 bits
        These bits identify the following descriptor (phoneme, FAP) in
        the stream. The meanings of the binary combinations are:
           00   a Phoneme descriptor follows
           01   a FAP descriptor follows
           10   reserved
           11   reserved

6.4. Recovery information, type 1

   Only FAPs can be recovered with the recovery information. In case of
   complete recovery information, only FAPs with nonzero amplitudes are
   specified. In case of dynamic recovery, only FAPs from bookmarks that
   were specified during the prevPackets packets and still have an
   effect on the FAPs are specified. This might include FAPs with a
   target amplitude of 0. As an example, if a FAP is changed during a
   previous packet using a triangle function (C=2) and the transition
   time is already in the past, the FAP is not included in the recovery
   bit structure.
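
   The field widths of the FAP descriptor in Section 6.3, which the
   recovery information reuses, add up to 48 bits (7 + 1 + 22 + 14 +
   2 + 2). The following minimal sketch packs and unpacks one such
   descriptor. It assumes that the fields are laid out most significant
   bit first, in the order listed, without padding; the figures do not
   pin down the exact bit boundaries, so this layout is an assumption.
   All function and constant names are hypothetical.

```python
MAX_AMPLITUDE = 2529600     # maximum target amplitude (Section 6.3)
MAX_TRANSITION_MS = 16383   # largest value of the 14 bit Transition field


def expression_fapind(e):
    """Map facial expression number e (1..6) to its FAPind (69..74)."""
    if not 1 <= e <= 6:
        raise ValueError("expression number out of range")
    return 68 + e


def pack_fap_descriptor(fapind, amplitude, transition_ms, curve, info_bits):
    """Pack one FAP descriptor into 6 bytes (assumed MSB-first layout)."""
    if not 3 <= fapind <= 74:
        raise ValueError("FAPind out of range")
    if abs(amplitude) > MAX_AMPLITUDE:
        raise ValueError("amplitude out of range")
    if not 0 <= transition_ms <= MAX_TRANSITION_MS:
        raise ValueError("transition time out of range")
    if curve not in (1, 2, 3) or not 0 <= info_bits <= 3:
        raise ValueError("bad curve or InfoBits value")
    sign = 1 if amplitude < 0 else 0          # 0 = plus, 1 = minus
    word = fapind                             # 7 bits
    word = (word << 1) | sign                 # 1 bit
    word = (word << 22) | abs(amplitude)      # 22 bits
    word = (word << 14) | transition_ms       # 14 bits
    word = (word << 2) | curve                # 2 bits
    word = (word << 2) | info_bits            # 2 bits
    return word.to_bytes(6, "big")


def unpack_fap_descriptor(data):
    """Inverse of pack_fap_descriptor; reads the first 6 bytes of data."""
    word = int.from_bytes(data[:6], "big")
    info_bits = word & 0x3
    curve = (word >> 2) & 0x3
    transition_ms = (word >> 4) & 0x3FFF
    magnitude = (word >> 18) & 0x3FFFFF
    sign = (word >> 40) & 0x1
    fapind = (word >> 41) & 0x7F
    amplitude = -magnitude if sign else magnitude
    return fapind, amplitude, transition_ms, curve, info_bits


if __name__ == "__main__":
    raw = pack_fap_descriptor(expression_fapind(1), -1200, 500, 1, 0)
    print(raw.hex(), unpack_fap_descriptor(raw))
```

   Carrying the sign in its own bit, rather than in two's complement,
   follows the separate Sign (s) field of the descriptor.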
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   FAPind    |s|              Amplitude (22 bits)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Transition (14 bits)  | C |IB |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5 - Recovery information, type 1

   FAPind: 7 bits
        See Section 6.3, "FAP descriptor".

   Sign (s): 1 bit
        See Section 6.3, "FAP descriptor".

   Amplitude: 22 bits
        See Section 6.3, "FAP descriptor".

   Transition: 14 bits
        Holds the transition time of each transmitted FAP, adjusted for
        the moment of sending. This new transition time should be set
        to the greater of 0 and the end time of the transition minus
        the timestamp of the packet.

   Curve (C): 2 bits
        See Section 6.3, "FAP descriptor".

   InfoBits (IB): 2 bits
        These bits describe the data that follows.
        The meanings of the binary combinations are:
           00   recovery information, type 1 follows
           01   reserved
           10   reserved
           11   indicates the end of recovery information

7. RTP header fields usage:

   Payload Type: The assignment of an RTP payload type for this payload
   format is outside the scope of this document, and will not be
   specified here. It is expected that the RTP profile for a particular
   class of applications will assign a payload type for this format, or
   if that is not done then a payload type in the dynamic range shall be
   chosen.

   M bit: A marker bit equal to one indicates the start of a sentence
   with the first phoneme in the current packet. This non-speech related
   information is to be used by the renderer.

   Timestamp: Represents the presentation time of the first phoneme in
   this packet based on a 44.1 kHz clock unless specified otherwise
   out-of-band.
   For packets without phonemes (complete recovery), the
   timestamp specifies the time when the state of the bookmarks was
   sampled.

8. Recovery Strategy

   Recovery information is sent using the format defined in Section
   6.4, "Recovery information, type 1". Complete recovery information
   MAY be sent between two regular data packets. Dynamic recovery
   information MAY be sent with each regular data packet. Dynamic
   recovery information contains FAPs that were transmitted during the
   recovery period prevPackets. Complete recovery information contains
   only FAPs with nonzero amplitudes. Complete recovery packets are
   only sent for new clients/users or burst losses exceeding the limits
   of dynamic recovery.

9. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [4], and any appropriate profile. This implies that
   confidentiality of the media streams is achieved by encryption.
   Because the data encoding used with this payload format is applied
   end-to-end, encryption may be performed after encoding so there is no
   conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   receiver side decoding. The attacker can inject pathological
   datagrams into the stream, which are complex to decode and cause the
   receiver to be overloaded. The decoder software should consider this
   possibility and take the necessary precautions.

   As with any IP-based protocol, in some circumstances, a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired. Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.
   In a multicast
   environment, pruning of specific sources may be implemented in future
   versions of IGMP [6] and in multicast routing protocols to allow a
   receiver to select which sources are allowed to reach it.

10. References

   [1] ISO/IEC International Standard 14496-2; "Generic coding of
       audio-visual objects - Part 2: Visual", 1998.

   [2] ISO/IEC International Standard 14496-3; "Generic coding of
       audio-visual objects - Subpart 6: Text-to-Speech Interface",
       1998.

   [3] Avaro, et al., "RTP Payload Format for MPEG-4 Streams", IETF
       work in progress, draft-ietf-avt-mpeg4-multisl-00.txt, June 2001.

   [4] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications", RFC 1889,
       January 1996.

   [5] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

   [6] Deering, S., "Host Extensions for IP Multicasting", STD 5, RFC
       1112, August 1989.

11. Author's Addresses

   Joern Ostermann
   AT&T Labs - Research, Rm A5-4E02
   200 Laurel Ave South           Phone: 1-732-420-9116
   Middletown, NJ 07748 USA       Email: osterman@research.att.com

   Juergen Th. Rurainsky
   AT&T Labs - Research, Rm A5-4F27
   200 Laurel Ave South           Phone: 1-732-420-9138
   Middletown, NJ 07748 USA       Email: jru@research.att.com

   M. Reha Civanlar
   AT&T Labs - Research, Rm A5-4D04
   200 Laurel Ave South           Phone: 1-732-420-9170
   Middletown, NJ 07748 USA       Email: civanlar@research.att.com