Network Working Group J. van der Meer
Request for Comments: 3640 Philips Electronics
Category: Standards Track D. Mackie
Apple Computer
V. Swaminathan
Sun Microsystems Inc.
D. Singer
Apple Computer
P. Gentric
Philips Electronics
November 2003
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
The Moving Picture Experts Group (MPEG) Committee (ISO/IEC JTC1/SC29 WG11) is a working group in ISO that produced the MPEG-4 standard. MPEG defines tools to compress content such as audio-visual information into elementary streams. This specification defines a simple but generic RTP payload format for transport of any non-multiplexed MPEG-4 elementary stream.
1. Introduction
2. Carriage of MPEG-4 Elementary Streams Over RTP
2.1. Signaling by MIME Format Parameters
2.2. MPEG Access Units
2.3. Concatenation of Access Units
2.4. Fragmentation of Access Units
2.5. Interleaving
2.6. Time Stamp Information
2.7. State Indication of MPEG-4 System Streams
2.8. Random Access Indication
2.9. Carriage of Auxiliary Information
2.10. MIME Format Parameters and Configuring Conditional Fields
2.11. Global Structure of Payload Format
2.12. Modes to Transport MPEG-4 Streams
2.13. Alignment with RFC 3016
3. Payload Format
3.1. Usage of RTP Header Fields and RTCP
3.2. RTP Payload Structure
3.2.1. The AU Header Section
3.2.1.1. The AU-header
3.2.2. The Auxiliary Section
3.2.3. The Access Unit Data Section
3.2.3.1. Fragmentation
3.2.3.2. Interleaving
3.2.3.3. Constraints for Interleaving
3.2.3.4. Crucial and Non-Crucial AUs with MPEG-4 System Data
3.3. Usage of this Specification
3.3.1. General
3.3.2. The Generic Mode
3.3.3. Constant Bit Rate CELP
3.3.4. Variable Bit Rate CELP
3.3.5. Low Bit Rate AAC
3.3.6. High Bit Rate AAC
3.3.7. Additional Modes
4. IANA Considerations
4.1. MIME Type Registration
4.2. Registration of Mode Definitions with IANA
4.3. Concatenation of Parameters
4.4. Usage of SDP
4.4.1. The a=fmtp Keyword
5. Security Considerations
6. Acknowledgements
APPENDIX: Usage of this Payload Format
Appendix A. Interleave Analysis
A. Examples of Delay Analysis with Interleave
A.1. Introduction
A.2. De-interleaving and Error Concealment
A.3. Simple Group Interleave
A.3.1. Introduction
A.3.2. Determining the De-interleave Buffer Size
A.3.3. Determining the Maximum Displacement
A.4. More Subtle Group Interleave
A.4.1. Introduction
A.4.2. Determining the De-interleave Buffer Size
A.4.3. Determining the Maximum Displacement
A.5. Continuous Interleave
A.5.1. Introduction
A.5.2. Determining the De-interleave Buffer Size
A.5.3. Determining the Maximum Displacement
References
Normative References
Informative References
Authors' Addresses
The MPEG Committee, Working Group 11 (WG11) in ISO/IEC JTC1 SC29, specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4 standards [1]. The MPEG-4 standard specifies compression of audio-visual data into, for example, an audio or video elementary stream. In the MPEG-4 standard, these streams take the form of audio-visual objects that may be arranged into an audio-visual scene by means of a scene description. Each MPEG-4 elementary stream consists of a sequence of Access Units; examples of an Access Unit (AU) are an audio frame and a video picture.
This specification defines a general and configurable payload structure to transport MPEG-4 elementary streams, in particular MPEG-4 audio (including speech) streams, MPEG-4 video streams and also MPEG-4 systems streams, such as BIFS (BInary Format for Scenes), OCI (Object Content Information), OD (Object Descriptor) and IPMP (Intellectual Property Management and Protection) streams. The RTP payload defined in this document is simple to implement and reasonably efficient. It allows for optional interleaving of Access Units (such as audio frames) to increase resilience to packet loss.
Some types of MPEG-4 elementary streams include "crucial" information whose loss cannot be tolerated. However, RTP does not provide reliable transmission, so receipt of that crucial information is not assured. Section 3.2.3.4 specifies how stream state is conveyed so that the receiver can detect the loss of crucial information and cease decoding until the next random access point has been received. Applications transmitting streams that include crucial information, such as OD commands, BIFS commands, or programmatic content such as MPEG-J (Java) and ECMAScript, should include random access points, at a suitable periodicity depending upon the probability of loss, in order to reduce stream corruption to an acceptable level. An example is the carousel mechanism as defined by MPEG in ISO/IEC 14496-1 [1].
with due consideration to congestion control. Another solution that may be appropriate for some applications is to carry RTP over TCP (such as in RFC 2326 [8], section 10.12). At the network layer, resource allocation or preferential service may be available to reduce the probability of loss. For a general description of methods to repair streaming media, see RFC 2354 [9].
Configuration of the payload is provided to accommodate the transportation of any MPEG-4 stream at any possible bit rate. However, for a specific MPEG-4 elementary stream typically only very few configurations are needed. So as to allow for the design of simplified, but dedicated receivers, this specification requires that specific modes be defined for transport of MPEG-4 streams. This document defines modes for MPEG-4 CELP and AAC streams, as well as a generic mode that can be used to transport any MPEG-4 stream. In the future, new RFCs are expected to specify additional modes for the transportation of MPEG-4 streams.
The RTP payload format defined in this document specifies carriage of system-related information that is often equivalent to the information that may be contained in the MPEG-4 Sync Layer (SL) as defined in MPEG-4 Systems [1]. This document does not prescribe how to transcode or map information from the SL to fields defined in the RTP payload format. Such processing, if any, is left to the discretion of the application. However, to anticipate the need for the transportation of any additional system-related information in the future, an auxiliary field can be configured that may carry any such data.
With this payload format, a single MPEG-4 elementary stream can be transported. Information on the type of MPEG-4 stream carried in the payload is conveyed by MIME format parameters, as in an SDP [5] message or by other means (see section 4). These MIME format parameters specify the configuration of the payload. To allow for simplified and dedicated receivers, a MIME format parameter is
For carriage of compressed audio-visual data, MPEG defines Access Units. An MPEG Access Unit (AU) is the smallest data entity to which timing information is attributed. In the case of audio, an Access Unit may represent an audio frame and, in the case of video, a picture. MPEG Access Units are octet-aligned by definition. If, for example, an audio frame is not octet-aligned, up to 7 zero-padding bits MUST be inserted at the end of the frame to achieve octet-aligned Access Units, as required by the MPEG-4 specification. MPEG-4 decoders MUST be able to decode AUs in which such padding is applied.
Consistent with the MPEG-4 specification, this document requires that each MPEG-4 part 2 video Access Unit include all the coded data of a picture, any video stream headers that may precede the coded picture data, and any video stream stuffing that may follow it, up to but not including the startcode indicating the start of a new video stream or the next Access Unit.
Frequently it is possible to carry multiple Access Units in one RTP packet. This is particularly useful for audio; for example, when AAC is used for encoding a stereo signal at 64 kbit/s, AAC frames contain approximately 200 octets on average. On a LAN with a 1500 octet MTU, this would allow an average of 7 complete AAC frames to be carried per RTP packet.
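The estimate of 7 frames per packet can be checked with a little arithmetic. The following is a minimal sketch, assuming 40 octets of IPv4, UDP and RTP headers without options, a 2-octet AU-headers-length field, and a 2-octet AU-header per frame. These overhead figures and the function name are illustrative assumptions, not requirements of this payload format.

```python
def frames_per_packet(mtu, frame_octets, au_header_octets=2,
                      headers_length_octets=2, ip_udp_rtp_octets=40):
    """Return how many whole AUs of frame_octets fit in one packet.

    All overhead values are assumptions chosen for illustration.
    """
    # Octets left for AU-headers and AU data after fixed overhead.
    budget = mtu - ip_udp_rtp_octets - headers_length_octets
    # Each AU costs its own data plus one AU-header.
    return budget // (frame_octets + au_header_octets)


print(frames_per_packet(1500, 200))
```

With these assumptions, seven frames consume 1414 of the 1458 available octets, and an eighth frame would not fit.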
Access Units may have a fixed size in octets, but a variable size is also possible. To facilitate parsing in the case of multiple concatenated AUs in one RTP packet, the size of each AU is made known to the receiver. When concatenating in the case of a constant AU size, this size is communicated "out of band" through a MIME format parameter. When concatenating in case of variable size AUs, the RTP payload carries "in band" an AU size field for each contained AU.
MPEG allows for very large Access Units. Since most IP networks have significantly smaller MTU sizes, this payload format allows for the fragmentation of an Access Unit over multiple RTP packets. Hence, when an IP packet is lost after IP-level fragmentation, only an AU fragment may get lost instead of the entire AU. To simplify the implementation of RTP receivers, an RTP packet SHALL either carry one or more complete Access Units or a single fragment of one AU, i.e., packets MUST NOT contain fragments of multiple Access Units.
When an RTP packet carries a contiguous sequence of Access Units, the loss of such a packet can result in a "decoding gap" for the user. One method of alleviating this problem is to allow for the Access Units to be interleaved in the RTP packets. For a modest cost in latency and implementation complexity, significant error resiliency to packet loss can be achieved.
To support optional interleaving of Access Units, this payload format allows for index information to be sent for each Access Unit. After informing receivers about buffer resources to allocate for de-interleaving, the RTP sender is free to choose the interleaving pattern without propagating this information a priori to the receiver(s). Indeed, the sender could dynamically adjust the interleaving pattern based on the Access Unit size, error rates, etc. The RTP receiver does not need to know the interleaving pattern used; it only needs to extract the index information of the Access Unit and insert the Access Unit into the appropriate sequence in the decoding or rendering queue. An example of interleaving is given below.
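As a hedged illustration of how a sender might form an interleaving pattern and derive the index information, the sketch below spreads a group of nine AUs over three packets. The function names are hypothetical, and the sketch assumes that an index difference of d between successive AUs in a packet is signaled as a delta of d - 1, so that a delta of 0 means "next AU in decoding order".

```python
def group_interleave(au_indices, stride):
    """Split a group of AU indices into `stride` packets.

    Packet k carries every stride-th AU, starting at position k.
    """
    return [au_indices[k::stride] for k in range(stride)]


def index_fields(packet):
    """Return (index of first AU, deltas for the non-first AUs).

    Assumption: a delta of 0 means the AU directly follows the
    previous one in decoding order, so delta = difference - 1.
    """
    deltas = [b - a - 1 for a, b in zip(packet, packet[1:])]
    return packet[0], deltas


for packet in group_interleave(list(range(9)), 3):
    print(packet, index_fields(packet))
```

With this pattern, the loss of one packet removes every third AU from the group instead of three consecutive AUs, which is generally easier to conceal.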
The RTP time stamp MUST carry the sampling instant of the first AU (fragment) in the RTP packet. When multiple AUs are carried within an RTP packet, the time stamps of subsequent AUs can be calculated if the frame period of each AU is known. For audio and video, this is possible if the frame rate is constant. However, in some cases it is not possible to make such a calculation (for example, for variable frame rate video, or for MPEG-4 BIFS streams carrying composition information). To support such cases, this payload format can be configured to carry a time stamp in the RTP payload for each contained Access Unit. A time stamp MAY be conveyed in the RTP payload only for non-first AUs in the RTP packet, and SHALL NOT be conveyed for the first AU (fragment), as the time stamp for the first AU in the RTP packet is carried by the RTP time stamp.
MPEG-4 defines two types of time stamps: the composition time stamp (CTS) and the decoding time stamp (DTS). The CTS represents the sampling instant of an AU, and hence the CTS is equivalent to the RTP time stamp. The DTS may be used in MPEG-4 video streams that use bi-directional coding, i.e., when pictures are predicted in both forward and backward direction by using either a reference picture in the past, or a reference picture in the future. The DTS cannot be carried in the RTP header. In some cases, the DTS can be derived from the RTP time stamp using frame rate information; this requires deep parsing in the video stream, which may be considered objectionable. If the video frame rate is variable, the required information may not even be present in the video stream. For both reasons, the capability has been defined to optionally carry the DTS in the RTP payload for each contained Access Unit.
ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to convey state information when transporting MPEG-4 system streams, this payload format allows for the optional carriage in the RTP payload of the stream state for each contained Access Unit. Stream states are used to signal "crucial" AUs that carry information whose loss cannot be tolerated and are also useful when repeating AUs according to the carousel mechanism defined in ISO/IEC 14496-1.
Random access to the content of MPEG-4 elementary streams may be possible at some but not all Access Units. To signal Access Units where random access is possible, a random access point flag can optionally be carried in the RTP payload for each contained Access Unit. Carriage of random access points is particularly useful for MPEG-4 system streams in combination with the stream state.
This payload format defines a specific field to carry auxiliary data. The auxiliary data field is preceded by a field that specifies the length of the auxiliary data, so as to facilitate the skipping of data without parsing it. The coding of the auxiliary data is not defined in this document; instead, the format, meaning and signaling of auxiliary information is expected to be specified in one or more future RFCs. Auxiliary information MUST NOT be transmitted until its format, meaning and signaling have been specified and its use has been signaled. Receivers that have knowledge of the auxiliary data MAY decode the auxiliary data, but receivers without knowledge of such data MUST skip the auxiliary data field.
To support the features described in the previous sections, several fields are defined for carriage in the RTP payload. However, their use strongly depends on the type of MPEG-4 elementary stream that is carried. Sometimes a specific field is needed with a certain length, while in other cases such a field is not needed. To be efficient in either case, the fields to support these features are configurable by means of MIME format parameters. In general, a MIME format parameter defines the presence and length of the associated field. A length of zero indicates absence of the field. As a consequence, parsing of the payload requires knowledge of MIME format parameters. The MIME format parameters are conveyed to the receiver via SDP [5] messages, as specified in section 4.4.1, or through other means.
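Because parsing of the payload depends on these parameters, a receiver typically extracts them before handling any packet. The following is a minimal sketch of such an extraction from an SDP "a=fmtp" line, assuming the usual "name=value" pairs separated by semicolons. The function names are hypothetical, and the sample line is illustrative and incomplete; a real description carries the full set of parameters required by section 4.1.

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<payload type> name=value; ...' into (pt, dict)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, params = line[len(prefix):].partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Parameter names are compared case-insensitively here.
        result[name.strip().lower()] = value.strip()
    return int(payload_type), result


def field_length(params, name):
    """Length in bits of a configurable field; zero means absent."""
    return int(params.get(name.lower(), "0"))


# Illustrative sample only; names and values are assumptions.
SAMPLE = "a=fmtp:96 mode=generic; sizeLength=13; indexLength=3; indexDeltaLength=3"
pt, params = parse_fmtp(SAMPLE)
print(pt, field_length(params, "sizeLength"), field_length(params, "CTSDeltaLength"))
```

A parameter that is not signaled yields a length of zero, which matches the rule that a zero length indicates absence of the field.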
+---------+-----------+-----------+---------------+
| RTP | AU Header | Auxiliary | Access Unit |
| Header | Section | Section | Data Section |
+---------+-----------+-----------+---------------+
The first data section is the AU (Access Unit) Header Section, which contains one or more AU-headers; however, each AU-header MAY be empty, in which case the entire AU Header Section is empty. The second section is the Auxiliary Section, containing auxiliary data; this section MAY also be configured empty. The third section is the Access Unit Data Section, containing either a single fragment of one Access Unit or one or more complete Access Units. The Access Unit Data Section MUST NOT be empty.
While it is possible to build fully configurable receivers capable of receiving any MPEG-4 stream, this specification also allows for the design of simplified, but dedicated receivers, that are for example, capable of receiving only one type of MPEG-4 stream. This is achieved by requiring that specific modes be defined in order to use this specification. Each mode may define constraints for transport of one or more types of MPEG-4 streams, for instance on the payload configuration.
The applied mode MUST be signaled. Signaling the mode is particularly important for receivers that are only capable of decoding one or more specific modes. Such receivers need to determine whether the applied mode is supported, so as to avoid problems with processing of payloads that are beyond the capabilities of the receiver.
In this document several modes are defined for the transportation of MPEG-4 CELP and AAC streams, as well as a generic mode that can be used for any MPEG-4 stream. In the future, new RFCs may specify other modes of using this specification. However, each mode MUST be in full compliance with this specification (see section 3.3.7).
This payload can be configured as nearly identical to the payload format defined in RFC 3016 [12] for the MPEG-4 video configurations recommended in RFC 3016. Hence, receivers that comply with RFC 3016 can decode such RTP payload, provided that additional packets containing video decoder configuration (VO, VOL, VOSH) are inserted in the stream, as required by RFC 3016 [12]. Conversely, receivers that comply with the specification in this document SHOULD be able to decode payloads, names and parameters defined for MPEG-4 video in RFC 3016 [12]. In this respect, it is strongly RECOMMENDED that the implementation provide the ability to ignore "in band" video decoder configuration packets that may be found in streams conforming to the RFC 3016 video payload.
Note that the "out of band" availability of the video decoder configuration is optional in RFC 3016 [12]. To achieve maximum interoperability with the RTP payload format defined in this document, applications that use RFC 3016 to transport MPEG-4 video (part 2) are recommended to make the video decoder configuration available as a MIME parameter.
Timestamp: Indicates the sampling instant of the first AU contained in the RTP payload. This sampling instant is equivalent to the CTS in the MPEG-4 time domain. When using SDP, the clock rate of the RTP time stamp MUST be expressed using the "rtpmap" attribute. If an MPEG-4 audio stream is transported, the rate SHOULD be set to the same value as the sampling rate of the audio stream. If an MPEG-4 video stream is transported, it is RECOMMENDED that the rate be set to 90 kHz.
According to RFC 3550 [2] (section 5.1), it is RECOMMENDED that RTP time stamps start at a random value for security reasons. This is not an issue for synchronization of multiple RTP streams. However, when streams from multiple sources are to be synchronized (for example one stream from local storage, another from an RTP streaming server), synchronization may become impossible if the receiver only knows the original time stamp relationships. In such cases the time stamp relationship required for obtaining synchronization may be provided by out of band means. The format of such information, as well as methods to convey such information, are beyond the scope of this specification.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
|AU-headers-length|AU-header|AU-header| |AU-header|padding|
| | (1) | (2) | | (n) | bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
The AU-headers are configured using MIME format parameters and MAY be empty. If the AU-header is configured empty, the AU-headers-length field SHALL NOT be present and consequently the AU Header Section is empty. If the AU-header is not configured empty, then the AU-headers-length is a two octet field that specifies the length in bits of the immediately following AU-headers, excluding the padding bits.
For each contained Access Unit (fragment), there is exactly one AU-header. Within the AU Header Section, the AU-headers are bit-wise concatenated in the order in which the Access Units are contained in the Access Unit Data Section. Hence, the n-th AU-header refers to the n-th AU (fragment). If the concatenated AU-headers consume a non-integer number of octets, up to 7 zero-padding bits MUST be inserted at the end in order to achieve octet-alignment of the AU Header Section.
Each AU-header may contain the fields given in Figure 3. The length in bits of the fields, with the exception of the CTS-flag, the DTS-flag and the RAP-flag fields, is defined by MIME format parameters; see section 4.1. If a MIME format parameter has the default value of zero, then the associated field is not present. The number of bits for fields that are present and that represent the value of a parameter MUST be chosen large enough to correctly encode the largest value of that parameter during the session.
+---------------------------------------+
| AU-size |
+---------------------------------------+
| AU-Index / AU-Index-delta |
+---------------------------------------+
| CTS-flag |
+---------------------------------------+
| CTS-delta |
+---------------------------------------+
| DTS-flag |
+---------------------------------------+
| DTS-delta |
+---------------------------------------+
| RAP-flag |
+---------------------------------------+
| Stream-state |
+---------------------------------------+
AU-size: Indicates the size in octets of the associated Access Unit in the Access Unit Data Section in the same RTP packet. When the AU-size is associated with an AU fragment, the AU size indicates the size of the entire AU and not the size of the fragment. In this case, the size of the fragment is known from the size of the AU data section. This can be exploited to determine whether a packet contains an entire AU or a fragment, which is particularly useful after losing a packet carrying the last fragment of an AU.
AU-Index: Indicates the serial number of the associated Access Unit (fragment). For each (in decoding order) consecutive AU or AU fragment, the serial number is incremented by 1. When present, the AU-Index field occurs in the first AU-header in the AU Header Section, but MUST NOT occur in any subsequent (non-first) AU-header in that Section. To encode the serial number in any such non-first AU-header, the AU-Index-delta field is used.
If the AU-Index field is present in the first AU-header in the AU Header Section, then the AU-Index-delta field MUST be present in any subsequent (non-first) AU-header. When the AU-Index-delta is coded with the value 0, it indicates that the Access Units are consecutive in decoding order. An AU-Index-delta value larger than 0 signals that interleaving is applied.
The CTS-flag field MUST be present in each AU-header if the length of the CTS-delta field is signaled to be larger than zero. In that case, the CTS-flag field MUST have the value 0 in the first AU-header and MAY have the value 1 in all non-first AU-headers. The CTS-flag field SHOULD be 0 for any non-first fragment of an Access Unit.
Stream-state: Specifies the state of the stream for an AU of an MPEG-4 system stream; each state is identified by a value of a modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams use the AU_SequenceNumber to signal stream states. When the stream state changes, the value of the stream-state MUST be incremented by one.
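The bit-wise layout of the AU Header Section can be illustrated with a small parser. This is a minimal sketch under stated assumptions: only the AU-size and AU-Index / AU-Index-delta fields are configured, bits are read most significant bit first, and an AU-Index-delta of d advances the index by d + 1. The class and function names are hypothetical.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def parse_au_header_section(payload, size_len, index_len, index_delta_len):
    """Return ([(au_size, au_index), ...], octets consumed).

    Assumes a non-empty AU-header configuration with only the size
    and index fields present.
    """
    if size_len + index_len == 0 or size_len + index_delta_len == 0:
        raise ValueError("AU-header would be empty")
    headers_length = int.from_bytes(payload[:2], "big")  # length in bits
    reader = BitReader(payload[2:])
    headers = []
    index = None
    while reader.pos < headers_length:
        size = reader.read(size_len)
        if index is None:
            index = reader.read(index_len)             # first AU-header
        else:
            index += reader.read(index_delta_len) + 1  # non-first AU-header
        headers.append((size, index))
    # Padding bits round the section up to a whole number of octets.
    consumed = 2 + (headers_length + 7) // 8
    return headers, consumed


# Two AUs of 200 and 180 octets; 13-bit size, 3-bit index fields.
section = bytes([0x00, 0x20, 0x06, 0x40, 0x05, 0xA0])
print(parse_au_header_section(section, 13, 3, 3))
```

The returned octet count tells the receiver where the next section of the payload begins.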
The Auxiliary Section consists of the auxiliary-data-size field followed by the auxiliary-data field. Receivers MAY (but are not required to) parse the auxiliary-data field; to facilitate skipping of the auxiliary-data field by receivers, the auxiliary-data-size field indicates the length in bits of the auxiliary-data. If the concatenation of the auxiliary-data-size and the auxiliary-data fields consumes a non-integer number of octets, up to 7 zero padding bits MUST be inserted immediately after the auxiliary data in order to achieve octet-alignment. See Figure 4.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
| auxiliary-data-size | auxiliary-data |padding bits |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
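A receiver without knowledge of the auxiliary data only needs the size field to step over the section. The following is a minimal sketch, assuming the auxiliary-data-size field is read most significant bit first and that its configured length in bits is passed in as aux_size_len (a hypothetical argument name; zero is taken to mean that the section is absent).

```python
def skip_auxiliary_section(payload, offset, aux_size_len):
    """Return the offset of the first octet after the Auxiliary Section.

    payload      -- RTP payload as bytes
    offset       -- octet offset at which the Auxiliary Section starts
    aux_size_len -- length in bits of the auxiliary-data-size field
    """
    if aux_size_len == 0:
        return offset  # section configured empty
    nbytes = (aux_size_len + 7) // 8
    if offset + nbytes > len(payload):
        raise ValueError("payload too short for auxiliary-data-size")
    raw = int.from_bytes(payload[offset:offset + nbytes], "big")
    # Keep only the leading aux_size_len bits of the octets read.
    aux_bits = raw >> (nbytes * 8 - aux_size_len)
    total_bits = aux_size_len + aux_bits
    # Padding bits round the section up to a whole number of octets.
    return offset + (total_bits + 7) // 8


# 8-bit size field announcing 12 bits of data: 20 bits, padded to 3 octets.
print(skip_auxiliary_section(bytes([0x0C, 0xAB, 0xC0, 0xFF]), 0, 8))
```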
The Access Unit Data Section contains an integer number of complete Access Units or a single fragment of one AU. The Access Unit Data Section is never empty. If data of more than one Access Unit is present, then the AUs are concatenated into a contiguous string of octets. See Figure 5. The AUs inside the Access Unit Data Section MUST be in decoding order, though not necessarily contiguous in the case of interleaving.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(1) |
+ |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |AU(2) |
+-+-+-+-+-+-+-+-+ |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | AU(n) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|AU(n) continued|
|-+-+-+-+-+-+-+-+
When multiple Access Units are carried, the size of each AU MUST be made available to the receiver. If the AU size is variable, then the size of each AU MUST be indicated in the AU-size field of the corresponding AU-header. However, if the AU size is constant for a stream, this mechanism SHOULD NOT be used; instead, the fixed size SHOULD be signaled by the MIME format parameter "constantSize"; see section 4.1.
A packet SHALL carry either one or more complete Access Units, or a single fragment of an Access Unit. Fragments of the same Access Unit have the same time stamp but different RTP sequence numbers. The marker bit in the RTP header is 1 on the last fragment of an Access Unit, and 0 on all other fragments.
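The fragmentation rule can be sketched as follows. This is a minimal illustration with hypothetical function names: fragment_au splits one AU into pieces that each travel in a separate packet, with the marker bit set only on the last piece, and is_fragment shows how a receiver might use an AU-size field that signals the size of the entire AU to recognize that a packet holds only a fragment.

```python
def fragment_au(au, max_payload):
    """Split one AU into (fragment, marker_bit) pairs.

    Every fragment goes into its own RTP packet; all packets share
    the same RTP time stamp and have consecutive sequence numbers.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not au:
        raise ValueError("an AU must not be empty")
    pieces = [au[i:i + max_payload] for i in range(0, len(au), max_payload)]
    last = len(pieces) - 1
    return [(piece, 1 if i == last else 0) for i, piece in enumerate(pieces)]


def is_fragment(au_size_field, au_data_section_octets):
    """True when the signaled size of the entire AU exceeds the data
    actually present in the Access Unit Data Section."""
    return au_size_field > au_data_section_octets


fragments = fragment_au(bytes(3000), 1400)
print([(len(piece), marker) for piece, marker in fragments])
```

The size comparison remains usable when the packet carrying the last fragment is lost, because it does not depend on seeing the marker bit.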
When a sender interleaves Access Units, it needs to provide sufficient information to enable a receiver to unambiguously reconstruct the original order, even in the case of out-of-order packets, packet loss or duplication. The information that senders need to provide depends on whether or not the Access Units have a constant time duration. Access Units have a constant time duration, if:
If the "constantDuration" parameter is not present, then senders MAY signal AUs of constant duration by coding the AU-Index with zero in each RTP packet. In the absence of the constantDuration parameter receivers MUST conclude that the AUs have constant duration if the AU-index is zero in two consecutive RTP packets.
When transmitting Access Units of variable duration, the "constantDuration" parameter MUST NOT be present, the transmitter MUST use the AU-Index to encode the index information required for re-ordering, and the receiver MUST use that value to determine the index of each AU in the RTP packet. The number of bits of the AU-Index field MUST be chosen so that valid index information is provided for the applied interleaving scheme, without causing problems due to roll-over of the AU-Index field. In addition, the CTS-delta MUST be coded in the AU header for each non-first AU in the RTP packet, so that receivers can place the AUs correctly in time.
When interleaving is applied, a de-interleave buffer is needed in receivers to put the Access Units in their correct logical consecutive decoding order. This requires the computation of the time stamp for each Access Unit. In case of a constant time duration per Access Unit, the time stamp of the i-th access unit in an RTP packet with RTP time stamp T is calculated as follows:
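A minimal sketch of such a calculation is given here, assuming that an AU-Index-delta of d means that the AU lies d + 1 positions after the previous AU of the packet in decoding order, and that time stamps wrap modulo 2^32 like the RTP time stamp. The function and argument names are hypothetical.

```python
def au_timestamps(rtp_timestamp, index_deltas, au_duration):
    """Time stamps of all AUs in one RTP packet.

    rtp_timestamp -- RTP time stamp T of the packet (first AU)
    index_deltas  -- AU-Index-delta values of the non-first AUs
    au_duration   -- constant AU duration in RTP clock ticks
    """
    wrap = 2 ** 32
    stamps = [rtp_timestamp % wrap]
    steps = 0
    for delta in index_deltas:
        steps += delta + 1  # assumed: delta 0 means the next AU
        stamps.append((rtp_timestamp + steps * au_duration) % wrap)
    return stamps


# Three AUs interleaved with a stride of 3, 1024 ticks per AU.
print(au_timestamps(1000, [2, 2], 1024))
```

For consecutive AUs all deltas are zero, and the time stamps simply advance by one AU duration each.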
AUs enter the decoder in decoding order. The de-interleave buffer is used to re-order a stream of interleaved AUs back into decoding order. When interleaving is applied, the decoding of "early" AUs has to be postponed until all AUs that precede it in decoding order are present. Therefore, these "early" AUs are stored in the de-interleave buffer. As an example in Figure 6, the interleaving pattern from section 2.5 is considered.
+--+--+--+--+--+--+--+--+--+--+--+-
Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+--+--+--+--+--+--+--+--+--+--+--+-
Storage of "early" AUs 3 3 3 3 3 3
6 6 6 6 6 6
4 4 4
7 7 7
12 12
AU(3) is to be delivered to the decoder after AU(0), AU(1) and AU(2); of these AUs, AU(2) arrives from the network last and hence AU(3) needs to be stored until AU(2) is present in the pattern. Similarly, AU(6) is to be stored until AU(5) is present, while AU(4) and AU(7) are to be stored until AU(2) and AU(5) are present, respectively. Note that the fullness of the de-interleave buffer varies in time. In Figure 6, the de-interleave buffer contains at most 4 AUs, and often fewer.
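The buffer behavior described for Figure 6 can be reproduced with a short simulation. This is a minimal sketch with a hypothetical function name; it assumes no packet loss, duplication or index roll-over, and it counts the number of stored AUs, not their size in octets.

```python
def deinterleave(arrivals, first_index=0):
    """Return (decoding order, maximum number of stored AUs).

    arrivals -- AU indices in the order they arrive from the network
    """
    stored = set()
    output = []
    expected = first_index
    max_stored = 0
    for au in arrivals:
        if au != expected:
            # "Early" AU: keep it until its predecessors are present.
            stored.add(au)
            max_stored = max(max_stored, len(stored))
            continue
        output.append(au)
        expected += 1
        # Release stored AUs that are now next in decoding order.
        while expected in stored:
            stored.remove(expected)
            output.append(expected)
            expected += 1
    return output, max_stored


order, peak = deinterleave([0, 3, 6, 1, 4, 7, 2, 5, 8])
print(order, peak)
```

For the first nine AUs of the pattern, the AUs leave the buffer in decoding order 0 through 8, and the buffer never holds more than 4 AUs, in agreement with the figure.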
So as to give a rough indication of the resources needed in the receiver for de-interleaving, the maximum displacement in time of an AU is defined. For any AU(j) in the pattern, each AU(i) with i