
The RTP protocol: description and purpose. Fixed RTP header fields

RTP and RSVP protocols

http://www.isuct.ru/~ivt/books/NETWORKING/NET10/269/pa.html

Modern applications cannot tolerate their packets arriving late. Two protocols, RTP and RSVP, are designed to support timely delivery with quality of service.

The continued growth of the Internet and private networks places new demands on bandwidth. Client-server applications transfer far more data than Telnet ever did. The World Wide Web has led to a huge increase in graphics traffic. Today, in addition, voice and video applications place their own specific requirements on already overloaded networks.

Simply increasing network capacity is not enough to satisfy all these demands. What is really needed are smart, efficient methods of traffic management and congestion control.

Historically, IP-based networks have provided all applications with only the simplest data delivery service possible. However, needs have changed over time. Organizations that have spent millions of dollars installing an IP-based network to transfer data between local networks now find that such configurations cannot efficiently support new real-time multicast multimedia applications.

ATM is the only network technology that was designed from the outset to support normal TCP and UDP traffic along with real-time traffic. However, moving to ATM means either building a new network infrastructure for real-time traffic or replacing an existing IP-based configuration, both of which are very expensive.

Therefore, there is an urgent need to support multiple types of traffic with different quality of service requirements within the TCP/IP architecture. Two key tools are intended to solve this problem: Real-Time Transport Protocol (RTP) and Resource Reservation Protocol (RSVP).

RTP provides for the delivery of real-time data to one or more destinations and carries the timing information the receiver needs to replay that data in real time. RTP does not by itself guarantee delivery or bound the delay; it relies on the underlying network for that. RSVP allows end systems to reserve network resources to obtain the required quality of service, especially resources for real-time traffic carried over RTP.

The most widely used transport layer protocol is TCP. Although TCP can support a wide variety of distributed applications, it is not suitable for real-time applications.

In real-time applications, the sender generates a data stream at a constant rate, and the receiver(s) must provide that data to the application at the same rate. Such applications include audio and video conferencing, live video distribution (for immediate playback), shared workspaces, medical remote diagnostics, computer telephony, distributed interactive simulation, games, and real-time monitoring.

Using TCP as the transport protocol for these applications is not feasible for several reasons. First, TCP establishes a connection only between two endpoints and is therefore not suitable for multicasting. Second, it retransmits lost segments, which then arrive at a time when the real-time application is no longer waiting for them. In addition, TCP has no convenient mechanism for associating timing information with segments, which is also a requirement for real-time applications.

Another widely used transport layer protocol, UDP, does not have the first two restrictions (point-to-point connection and retransmission of lost segments), but it does not provide the critical timing information either. Thus, UDP by itself does not offer any general-purpose tools for real-time applications.

While each real-time application may have its own mechanisms to support real-time transmission, they share many common features that make defining a single protocol highly desirable. The standard protocol of this kind is RTP, defined in RFC 1889.

In a typical real-time environment, the sender generates packets at a constant rate. They are sent at regular intervals, travel through the network, and are received by the receiver, which plays back the data in real time as it arrives.

However, due to the variation in latency as packets travel across the network, they arrive at irregular intervals. To compensate for this effect, incoming packets are buffered, held for a while, and then provided at a constant rate to the output generating software. To make this scheme work, each packet is timestamped so that the receiver can replay the incoming data at the same speed as the sender.
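The buffering scheme described above can be sketched in Python. This is only an illustration under stated assumptions, not something RTP prescribes: the function name `playout_times`, the 8000 Hz timestamp clock, and the fixed 60 ms playout delay are all chosen for the example.

```python
def playout_times(packets, clock_rate=8000, playout_delay=0.060):
    """Schedule packets given as (timestamp, arrival_time_in_seconds).

    The first packet is held for playout_delay seconds. Every later packet
    is scheduled relative to it using the sender's timestamps, so the
    original spacing is restored regardless of arrival jitter. Packets that
    arrive after their scheduled time are dropped as too late.
    """
    if not packets:
        return []
    first_ts, first_arrival = packets[0]
    base = first_arrival + playout_delay
    schedule = []
    for ts, arrival in packets:
        play_at = base + (ts - first_ts) / clock_rate
        if arrival <= play_at:
            schedule.append((ts, play_at))
    return schedule


# Packets generated every 20 ms (160 ticks at 8 kHz) but arriving irregularly.
received = [(0, 0.100), (160, 0.135), (320, 0.138), (480, 0.250)]
schedule = playout_times(received)
```

In this example the first three packets are played 20 ms apart, while the fourth arrives after its slot and is discarded. A real receiver would usually adapt the delay to the observed jitter rather than keep it fixed.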

RTP supports real-time data transfer between multiple participants in a session. (A session is a logical relationship between two or more RTP users that is maintained for the duration of the data transfer. The process of opening a session is outside the scope of RTP.)

While RTP can also be used for real-time unicast, its strength lies in its multicast support. To do this, each RTP data block contains a sender identifier indicating which participant is generating the data. The RTP data blocks also contain a timestamp so that the data can be played back at the correct intervals by the receiving end.

In addition, RTP identifies the payload format of the transmitted data. Directly related to this is the concept of synchronization, which is partly the responsibility of the mixer, an RTP-level relay. Upon receiving streams of RTP packets from one or more sources, the mixer combines them and sends a new stream of RTP packets to one or more recipients. The mixer can simply combine the data, or it can also change its format.

An example of a mixer application is combining multiple audio sources. Suppose that some of the systems in a given audio session each generate their own RTP stream. Most of the time, only one source is active, although sometimes several sources "talk" at the same time.

If a new system wants to participate in the session, but its link to the network does not have sufficient capacity to carry all the RTP streams, then the mixer receives all these streams, merges them into one, and passes the merged stream to the new session member. When multiple streams arrive at the same time, the mixer adds their PCM values. The RTP header generated by the mixer includes the identifier(s) of the sender(s) whose data is present in the packet.
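The mixing step can be illustrated with a short Python sketch. It assumes 16-bit linear PCM samples and clips the sum to the 16-bit range; the dictionary layout, the function name `mix_packets`, and the identifier values are hypothetical and serve only the example.

```python
def mix_packets(packets, mixer_ssrc):
    """Mix packets given as dicts with 'ssrc' and 'samples' (16-bit linear PCM).

    Samples are added position by position and clipped to the 16-bit range
    (clipping is an assumption of this sketch). The result carries the
    mixer's own identifier plus the list of contributing senders.
    """
    length = max(len(p["samples"]) for p in packets)
    mixed = [0] * length
    for p in packets:
        for i, sample in enumerate(p["samples"]):
            mixed[i] += sample
    mixed = [max(-32768, min(32767, s)) for s in mixed]
    return {
        "ssrc": mixer_ssrc,
        "csrc": [p["ssrc"] for p in packets],
        "samples": mixed,
    }


a = {"ssrc": 0x1111, "samples": [1000, -2000, 30000]}
b = {"ssrc": 0x2222, "samples": [500, 500, 10000]}
out = mix_packets([a, b], mixer_ssrc=0x9999)
```

Here `out["samples"]` holds the sums, with the last value clipped, and `out["csrc"]` lists both original senders.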

A simpler device creates one outgoing RTP packet for each incoming RTP packet. This mechanism, called a translator, can change the format of the data in the packet or use a different set of low-level protocols to transfer data from one domain to another. For example, a potential recipient may not be able to process the high-speed video signal used by other participants in the session. The translator then converts the video to a lower quality format that requires a lower bit rate.

Each RTP packet has a main (fixed) header and possibly additional application-specific fields. Fig. 4 illustrates the structure of the main header. The first 12 octets consist of the following fields:

version field (2 bits): the current version is 2;
padding field (1 bit): this field signals the presence of padding octets at the end of the payload. (Padding is applied when the application requires the payload size to be a multiple of, for example, 32 bits.) In this case, the last octet indicates the number of padding octets;
header extension field (1 bit): when this bit is set, the main header is followed by an additional header extension, used in experimental RTP extensions;
sender count field (4 bits): this field contains the number of identifiers of the contributing senders (CSRC) whose data is in the packet; the identifiers themselves follow the main header;
marker field (1 bit): the meaning of the marker bit depends on the payload type. The marker bit is typically used to indicate boundaries in the data stream. In the case of video, it marks the end of a frame. In the case of voice, it marks the start of speech after a period of silence;
payload type field (7 bits): this field identifies the payload type and data format, including compression and encryption. In steady state, the sender uses only one payload type per session, but it can change it in response to changing conditions if signaled by the Real-Time Transport Control Protocol;
sequence number field (16 bits): each source starts numbering packets from an arbitrary number, then increments the number by one with each RTP data packet sent. This makes it possible to detect packet loss and to determine the order of packets with the same timestamp. Several consecutive packets may have the same timestamp if they are logically generated at the same instant (e.g., packets belonging to the same video frame);
timestamp field (32 bits): records the instant at which the first octet of the payload data was generated. The units of this field depend on the payload type. The value is derived from the sender's local clock;
synchronization source (SSRC) identifier field (32 bits): a randomly generated number that uniquely identifies the source within a session.

The main header may be followed by one or more identifier fields for the senders whose data is present in the payload. These identifiers are inserted by the mixer.
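The field layout listed above can be decoded with a few bit operations. The following Python sketch is a minimal parser for the 12-octet main header and the sender identifiers that follow it; the function name `parse_rtp_header` and the dictionary keys are assumptions made for this example, and the sketch does not handle header extensions or padding.

```python
import struct


def parse_rtp_header(data):
    """Decode the 12-octet main RTP header plus any sender identifiers."""
    if len(data) < 12:
        raise ValueError("the main RTP header is 12 octets long")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    count = b0 & 0x0F                      # sender count (4 bits)
    end = 12 + 4 * count
    if len(data) < end:
        raise ValueError("packet too short for the announced sender count")
    return {
        "version": b0 >> 6,                # 2 bits
        "padding": (b0 >> 5) & 1,          # 1 bit
        "extension": (b0 >> 4) & 1,        # 1 bit
        "marker": b1 >> 7,                 # 1 bit
        "payload_type": b1 & 0x7F,         # 7 bits
        "sequence_number": seq,            # 16 bits
        "timestamp": timestamp,            # 32 bits
        "ssrc": ssrc,                      # 32 bits
        "csrc": list(struct.unpack("!%dI" % count, data[12:end])),
    }


# Version 2, one sender identifier, marker set, payload type 0.
packet = struct.pack("!BBHIII", 0x81, 0x80, 1000, 160, 0xDEADBEEF, 0x11111111)
header = parse_rtp_header(packet)
```

Network byte order (`!`) is used throughout, since the header fields are transmitted most significant octet first.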

RTP is used only to transfer user data, usually by multicast, to all participants in the session. A separate protocol, the Real-Time Transport Control Protocol (RTCP), also works with multiple destinations and provides feedback to RTP data senders and other session participants.

RTCP uses the same basic transport protocol as RTP (usually UDP), but a different port number. Each session participant periodically sends an RTCP packet to all other session participants. RFC 1889 describes three functions performed by RTCP.

The first function is to provide feedback on quality of service and congestion. Since RTCP packets are multicast, all participants in the session can evaluate how well the other participants are sending and receiving. Sender reports allow recipients to estimate the data rate and transmission quality. Receiver reports contain information about the problems receivers are experiencing, including packet loss and excessive jitter. For example, an audio/video application may reduce its bit rate if the link does not provide the desired quality of service at the current bit rate.
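As an illustration of the kind of value a receiver can report, the sketch below computes a running jitter estimate from packet timestamps and arrival times. It follows the smoothed estimator given in the RTP specification (each new deviation is weighted by 1/16); the function name and the input format are assumptions of this example, and arrival times are assumed to be already expressed in timestamp units.

```python
def interarrival_jitter(packets):
    """Estimate jitter from (rtp_timestamp, arrival_time) pairs.

    Both values are in timestamp units. For each pair of consecutive
    packets, d is the difference between their spacing on arrival and
    their spacing at the sender; the estimate moves 1/16 of the way
    toward |d| each time.
    """
    jitter = 0.0
    previous = None
    for ts, arrival in packets:
        if previous is not None:
            d = (arrival - previous[1]) - (ts - previous[0])
            jitter += (abs(d) - jitter) / 16
        previous = (ts, arrival)
    return jitter


steady = interarrival_jitter([(0, 100), (160, 260), (320, 420)])
bursty = interarrival_jitter([(0, 100), (160, 276)])
```

With perfectly regular arrivals the estimate stays at zero; a single packet that is 16 units late raises it to 1.0.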

Recipient feedback is also important for diagnosing distribution faults. By analyzing messages from all participants in a session, a network administrator can determine whether a given problem concerns one participant or is of a general nature.

The second main function of RTCP is sender identification. RTCP packets contain a standard textual description of the sender. They provide more information about the sender of data packets than a randomly selected synchronization source ID. In addition, they help the user to identify streams related to different sessions. For example, they allow the user to determine that separate audio and video sessions are open at the same time.

The third function is session sizing and scaling. To ensure quality of service and feedback to control congestion, as well as to identify the sender, all participants periodically send RTCP packets. The frequency of transmission of these packets decreases as the number of participants increases.

With a small number of participants, each participant sends one RTCP packet at most once every five seconds. RFC 1889 describes an algorithm by which participants limit the rate of RTCP packets based on the total number of participants. The goal is to keep RTCP traffic below 5% of the total session traffic.
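The scaling rule can be approximated by a simplified calculation. The sketch below is not the full algorithm from RFC 1889: it ignores the split of the RTCP bandwidth between senders and receivers and the randomization of the interval, and the 100-byte average packet size is an assumed value. It only shows how the 5% budget and the five-second minimum interact.

```python
def rtcp_interval(members, session_bw_bps, avg_rtcp_size_bytes=100,
                  rtcp_fraction=0.05, min_interval=5.0):
    """Simplified RTCP reporting interval in seconds for one participant.

    The RTCP budget is a fixed fraction of the session bandwidth, shared
    by all members; the interval never drops below min_interval.
    """
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8
    shared = members * avg_rtcp_size_bytes / rtcp_bw_bytes_per_s
    return max(min_interval, shared)


small_group = rtcp_interval(members=4, session_bw_bps=64000)
large_group = rtcp_interval(members=100, session_bw_bps=64000)
```

For a 64 kbit/s session the RTCP budget is 400 bytes per second, so four members stay at the five-second minimum, while a hundred members each report roughly every 25 seconds.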

The purpose of any network is to deliver data to the recipient with a guaranteed quality of service, including throughput, delay, and the allowable delay variation limit. As the number of users and applications grows, it becomes more and more difficult to ensure the quality of services.

Just responding to overload is no longer enough. A tool is needed to avoid congestion altogether, that is, to make it possible for applications to reserve network resources in accordance with the required quality of service.

Preventive measures are useful for both unicast and multicast. In unicast, two applications agree on a specific quality of service level for a given session. If the network is heavily loaded, it may not be able to provide the required quality of service. In this situation, applications will have to postpone the session until better times or try to reduce the quality of service requirements, if possible.

The solution in this case is for unicast applications to reserve resources to provide the required level of service. Then the routers on the intended path allocate resources (for example, a place in the queue and part of the capacity of the outgoing line). If the router is unable to allocate resources due to previous commitments, then it notifies the application. In this case, the application may try to initiate another session with lower quality of service requirements or reschedule it to a later date.

Multicast makes resource reservation a much more challenging task. It can generate huge volumes of network traffic, for example with applications such as video or with a large and dispersed group of recipients. However, traffic from a multicast source can in principle be significantly reduced.

There are two reasons for this. First, some members of a group may not need data delivered from a particular source during a particular period of time. For example, members of one group may be offered information on two channels (from two sources) simultaneously, while a given recipient is interested in receiving only one channel.

Second, some members of the group are able to process only part of the information transmitted by the sender. For example, a video stream may consist of two components: one with low picture quality and the other with high picture quality. A number of video compression algorithms produce this format: they generate a base component with a low-quality picture and an additional component with higher resolution.

Some recipients may not have enough processing power to handle the high-resolution components, or may be connected to the network through a subnet or link that does not have enough capacity to carry the full signal.

Resource reservation allows routers to determine in advance whether they can deliver multicast traffic to all recipients.

In previous attempts to implement resource reservations and in the approaches adopted in frame relay and ATM, the necessary resources are requested by the source of the data flow. This method is sufficient in the case of unicast transmission, because the transmitting application transmits data at a certain rate, and the required level of quality of service is inherent in the transmission scheme.

However, this approach cannot be used for multicasting. Different group members may have different resource requirements. If the original stream can be divided into substreams, then some members of the group may well want to receive only one of them. In particular, some receivers will only be able to process the low resolution video component. Or if several senders broadcast to the same group, then the recipient can choose only one sender or some subset of them. Finally, the quality of service requirements of different recipients may vary depending on the output equipment, processor power, and channel speed.

For this reason, resource reservations by the recipient are seen as preferable. Senders can provide routers with general characteristics of the traffic (such as data rate and variability), but receivers must determine the level of quality of service required. Routers then aggregate requests for resource allocations at common parts of the distribution tree.

RSVP is based on three concepts regarding data flows: session, flow specification, and filter specification. A session is a data stream identified by its destination. Note that this concept is different from that of an RTP session, although RSVP and RTP sessions may have a one-to-one correspondence. After a router reserves resources for a particular destination, it treats this as the start of a session and allocates resources for the duration of that session.

A reservation request from the destination end system, called a flow descriptor, consists of a flow specification and a filter specification. The flow specification defines the required quality of service and is used by the node to set the parameters of the packet scheduler. The router then transmits packets with a given set of preferences based on the current flow specification.

The filter specification defines the set of packets for which resources are requested. Together with the session, it defines the set of packets (or flow) for which the required quality of service is to be provided. Any other packets addressed to that destination are handled on a best-effort basis.

RSVP does not define the content of the flow specification; it simply carries the request. A flow specification typically includes a service class, an Rspec (R stands for reserve), and a Tspec (T stands for traffic). The latter two parameters are each a set of numbers. The Rspec parameter defines the required quality of service, and the Tspec parameter describes the data flow. The contents of Rspec and Tspec are opaque to RSVP.

In principle, a filter specification describes an arbitrary subset of packets from a single session (that is, those packets whose destination is determined by that session). For example, a filter specification might define only specific senders, or define protocols or packets whose protocol header fields match those specified.

Fig. 3 illustrates the relationship between session, flow specification, and filter specification. Each incoming packet belongs to at most one session and is handled according to the logical flow for that session. If the packet does not belong to any session, it is delivered on a best-effort basis, insofar as there are free resources.
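The relationship between the three concepts can be shown with a small classifier. In this Python sketch the session is identified by a destination address, the filter specification is modeled as a set of allowed senders, and the flow specification is kept as an opaque string. The class and function names, the addresses, and the string describing the service are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    session_dest: str           # session: identified by its destination
    allowed_senders: frozenset  # filter specification: which senders qualify
    flowspec: str               # flow specification: opaque QoS description


def classify(packet, reservations):
    """Return the flowspec that applies to a packet, or None for best effort."""
    for r in reservations:
        if packet["dst"] == r.session_dest and packet["src"] in r.allowed_senders:
            return r.flowspec
    return None


reservations = [Reservation("224.1.1.1", frozenset({"S1"}), "64 kbit/s, low delay")]
reserved = classify({"src": "S1", "dst": "224.1.1.1"}, reservations)
filtered_out = classify({"src": "S2", "dst": "224.1.1.1"}, reservations)
no_session = classify({"src": "S1", "dst": "10.0.0.7"}, reservations)
```

Only the first packet matches both the session and the filter; the other two fall back to best-effort handling.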

The main difficulty with RSVP is related to multicasting. An example of a multicast configuration is shown in Fig. 6. This configuration consists of four routers. The link between any two routers, represented by a line, can be either a direct link or a subnet. Three hosts, G1, G2 and G3, are in the same group and receive datagrams with the corresponding multicast address. Data is sent to this address by two hosts, S1 and S2. The red line corresponds to the routing tree for S1 and this group, and the blue line to the tree for S2 and this group. Arrow lines indicate the direction of packets from S1 (red) and from S2 (blue).

The figure shows that all four routers must be aware of each recipient's resource reservation. Thus, resource allocation requests propagate backward through the routing tree.

RSVP uses two main message types: Resv and Path. Resv messages are generated by recipients and propagate up the tree, with each node along the way merging and packing requests from different recipients when possible. These messages cause the router to enter a resource reservation state for that session (multicast address). Eventually all the merged Resv messages reach the sending hosts. Based on the information received, the hosts set the appropriate traffic control parameters for the first hop.

Fig. 7 shows the Resv message flow. Note that the messages are merged, so only one message is sent up any branch of the combined delivery tree. However, these messages must be resent periodically to extend the resource reservation period.
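The effect of merging can be sketched as follows. The topology below is hypothetical (it is not a reproduction of the figure), the requests are plain bandwidth numbers, and merging is modeled as taking the largest request on each shared link, which is an assumption suited to receivers that want different amounts of the same flow.

```python
def merge_reservations(parent, requests):
    """Compute the bandwidth reserved on each link of a delivery tree.

    parent maps each node to its upstream node (None at the sender).
    requests maps each receiver to the bandwidth it asks for (kbit/s).
    Returns a dict mapping (node, upstream_node) to the reserved bandwidth.
    """
    reserved = {}
    for receiver, bandwidth in requests.items():
        node = receiver
        while parent[node] is not None:
            link = (node, parent[node])
            reserved[link] = max(reserved.get(link, 0), bandwidth)
            node = parent[node]
    return reserved


# Hypothetical tree: S1 - R1 - R2, then R2 - R3 - G1 and R2 - R4 - {G2, G3}.
parent = {"G1": "R3", "G2": "R4", "G3": "R4", "R3": "R2",
          "R4": "R2", "R2": "R1", "R1": "S1", "S1": None}
reserved = merge_reservations(parent, {"G1": 64, "G2": 128, "G3": 256})
```

Although three receivers send requests, each upstream link carries a single merged reservation, so the link from R2 to R1 reserves 256 kbit/s rather than the sum of all three.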

The Path message is used to propagate reverse route information. All modern multicast routing protocols maintain only the forward route, in the form of a distribution tree (down from the sender). But Resv messages must be sent back through all intermediate routers to all sending hosts.

Since the routing protocol does not provide reverse route information, RSVP carries it in Path messages. Any host that wants to be a sender sends a Path message to all members of the group. Along the way, each router and each destination host enters the path state, which records the hop from which the Path message was received; Resv messages for this sender are forwarded to that hop. Fig. 5 shows that the Path packets are sent over the same paths as the data packets.
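A minimal Python sketch of this mechanism is shown below. It models a single sender and a single receiver on a hypothetical route; the function names are assumptions of the example. Each node remembers the previous hop when the Path message passes, and the Resv message later follows those stored hops in reverse.

```python
def install_path_state(route):
    """Record, for each node after the sender, the hop the Path message came from.

    route lists the nodes from the sender to one receiver in forwarding order.
    """
    return {node: previous for previous, node in zip(route, route[1:])}


def resv_route(path_state, receiver, sender):
    """Follow the stored previous hops from the receiver back to the sender."""
    hops = [receiver]
    while hops[-1] != sender:
        hops.append(path_state[hops[-1]])
    return hops


state = install_path_state(["S1", "R1", "R2", "R4", "G3"])
upstream = resv_route(state, "G3", "S1")
```

The resulting `upstream` list is the forward route reversed, which is exactly the information the multicast routing protocol itself does not supply.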

Consider the operation of the RSVP protocol. From the host's point of view, the operation of the protocol consists of the following steps (the first two steps in this sequence are sometimes reversed).

The recipient joins the multicast group by sending an IGMP message to the neighboring router.
The potential sender sends a Path message to the multicast address of the group.
The recipient receives a Path message identifying the sender.
Now that the receiver has information about the reverse path, it can send Resv messages with flow descriptors.
The Resv messages propagate through the network toward the sender.
The sender starts transmitting data.
The receiver starts receiving data packets.

Yesterday's methods of handling large volumes of traffic are completely unsuitable for modern systems. Without new tools, it is impossible to meet the growing demands on data transmission caused by growing traffic volumes, the spread of real-time applications, and multicast distribution. RTP and RSVP provide a solid foundation for next-generation networks.

An example of the real application of these protocols is VoIP (Voice over IP), the transmission of voice over IP networks, as described in the H.323 standard, which provides for the transmission of audio, video, and data over an IP network. In this case, RTP is used to carry the real-time media streams, and RSVP can be used to reserve network resources.

This section considers some aspects of carrying RTP packets over network and transport protocols. Unless otherwise specified by the specifications of other protocols, the following basic rules apply when transmitting packets.

RTP relies on lower layer protocols to separate RTP data streams from RTCP control information. For UDP and similar protocols, RTP uses an even port number, and the corresponding RTCP stream uses the next higher (odd) port number.
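The port convention is simple enough to express in a few lines of Python. The helper name `rtp_rtcp_ports` is hypothetical, and rejecting an odd port outright is a choice made for this sketch.

```python
def rtp_rtcp_ports(rtp_port):
    """Return the (RTP, RTCP) port pair for a given even RTP port."""
    if not 0 < rtp_port < 65535:
        raise ValueError("port out of range")
    if rtp_port % 2:
        raise ValueError("the RTP port must be even")
    # RTCP uses the next higher (odd) port.
    return rtp_port, rtp_port + 1


ports = rtp_rtcp_ports(5004)
```

An application would bind one UDP socket to each port of the pair, keeping data and control traffic apart without any marker inside the packets.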

RTP information packets do not contain any length field, hence RTP relies on the underlying protocol to provide a length indication as well. The maximum length of RTP packets is only limited by lower layer protocols.

Several RTP packets can be transmitted in one lower layer protocol data unit, for example, in a UDP packet. This reduces header overhead and can simplify synchronization between different streams.

9. List of protocol constants

This section contains a list of constants defined in the RTP protocol specification.

Traffic type (PT, payload type) constants for RTP are defined in profiles. However, the RTP header octet that contains the marker bit(s) and the traffic type field must not contain the reserved values 200 and 201 (decimal), so that RTP packets remain distinguishable from RTCP SR and RR packets. For the standard format with one marker bit and a seven-bit traffic type field, this restriction means that traffic types 72 and 73 should not be used.

Values of RTCP packet types (see Table 1) are chosen in the range from 200 to 204 to make it easier to check the validity of an RTCP packet header against RTP packets. When the RTCP packet type field is compared with the corresponding octet of the RTP header, this range corresponds to a marker bit of one (which is not normally the case in information packets) and a most significant bit of one in the standard traffic type field (whereas statically defined traffic types usually have PT values with a zero in the most significant bit). This range was also chosen to be well away from the values 0 and 255, since fields consisting entirely of zeros or ones are common in data.
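The arithmetic behind these reserved values can be checked directly. The Python sketch below shows why the octet values 200 and 201 correspond to traffic types 72 and 73 with the marker bit set, and how a receiver might use the second octet to tell the two packet kinds apart. The function names are assumptions of the example.

```python
RTCP_PACKET_TYPES = range(200, 205)  # 200 to 204 inclusive


def looks_like_rtcp(packet):
    """Guess whether a packet is RTCP by inspecting its second octet."""
    return len(packet) >= 2 and packet[1] in RTCP_PACKET_TYPES


def split_marker_and_pt(octet):
    """Split the second RTP header octet into (marker bit, traffic type)."""
    return octet >> 7, octet & 0x7F


sr_view = split_marker_and_pt(200)   # how an RTP parser would read SR
rr_view = split_marker_and_pt(201)   # how an RTP parser would read RR
is_rtcp = looks_like_rtcp(bytes([0x80, 200]))
is_rtp = not looks_like_rtcp(bytes([0x80, 0x80]))
```

Since 200 is 128 + 72 and 201 is 128 + 73, an RTP packet using traffic type 72 or 73 with the marker bit set would be indistinguishable from an SR or RR packet in this octet.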

Other types of RTCP packets are assigned by IANA. Developers can register the values they require for experimental research and then cancel the registration when those values are no longer needed.

Valid types of items in the SDES packet are presented in Table 2. Other SDES item types are assigned by IANA. Developers can register the values they need for experimental studies and then cancel the registration when those values are no longer needed.

10. Description of the traffic profile and format

As noted above (see Section 2), a complete description of the RTP protocol for a specific application requires additional documents of two types: a description of the profile and the traffic format.

RTP can be used for many classes of applications with widely differing requirements. Flexibility to adapt to these requirements is ensured by using different profiles (see ). Typically an application uses only one profile, and there is no explicit indication of which profile is currently in use.

An optional document of the second type, the traffic format specification, defines how a particular type of traffic (e.g., H.261 encoded video) should be carried in RTP. The same traffic format may be used with multiple profiles and may therefore be defined independently of the profile. Profile documents are only responsible for mapping this format to a PT value.

The profile description may define the following items, but this list is not exhaustive.

Header of the RTP data packet. The octet in the header of the RTP data packet that contains the marker bit and the traffic type field can be redefined by the profile to meet different requirements, for example to provide more or fewer marker bits (Section 3.3).

Traffic types. A profile typically defines a set of traffic formats (e.g., media encoding algorithms) and a default static mapping of these formats to PT values. Some of the traffic formats may be defined by reference to individual traffic format descriptions. For each defined type of traffic, the profile must specify the required RTP timestamp clock rate to use (Section 3.1).

RTP data packet header additions. If an application class profile requires additional functionality that is independent of the traffic type, additional fields can be appended to the fixed header of the RTP data packet.

RTP data packet header extensions. The contents of the first 16 bits of the RTP data packet header extension structure shall be specified if the use of this mechanism is allowed by the profile.

Types of RTCP packets. New, application-class specific types of RTCP packets may be defined (and registered with IANA).

RTCP reporting interval. The profile must define the values used in calculating the RTCP reporting interval: the RTCP session bandwidth fraction, the minimum reporting interval, and the bandwidth split between senders and receivers.

SR/RR packet extension. If there is additional information about a sender or receiver that is to be transmitted regularly, an extension section can be defined for RTCP SR and RR packets.

Using SDES. The profile may define relative priorities for RTCP SDES items to be transmitted or excluded (see Section 4.2.2); an alternative syntax or semantics for the CNAME item (Section 4.4.1); the LOC item format (Section 4.4.5); the semantics and use of the NOTE item (Section 4.4.7); and new SDES items to be registered with IANA.

Security. A profile may define which security services and algorithms applications should use and may provide guidance on their use (Section 7).

Password-to-key mapping. The profile can define how the password entered by the user is converted into an encryption key.

The underlying protocol. The transmission of RTP packets may require the use of a particular underlying network or transport layer protocol.

Transport mapping. A mapping of RTP and RTCP to transport layer addresses, such as UDP ports, other than the standard mapping specified in Section 8 may be defined.

Encapsulation. An encapsulation of RTP packets may be defined to allow multiple RTP information packets to be transmitted in a single underlying protocol data unit (Section 8).

Not every new application should require a new profile. It is preferable to extend an existing profile within the same class of applications rather than create a new one. This makes it easier for applications to interoperate, since each application typically runs under only one profile. Simple extensions, such as defining additional PT values or RTCP packet types, can be made by registering them with IANA and publishing their descriptions in a profile specification or traffic format specification.

11. RTP profile for audio and video conferencing with minimal control

RFC 1890 describes a profile for using version 2 of the real-time transport protocol RTP and its associated control protocol RTCP within a group audio or video conference, the so-called RTP Profile for Audio and Video Conferences with Minimal Control. This profile defines aspects of RTP not specified in the RTP version 2 specification (RFC 1889). Minimal control means that no support for parameter negotiation or membership control is required (e.g., when using static traffic type mappings and the membership indications provided by RTCP). The main provisions of this profile are considered below.

11.1. RTP and RTCP packet formats and protocol parameters

This section contains a description of a number of items that can be defined or modified in a profile.

The header of the RTP information packet. The standard fixed header format of RTP information packets (with one marker bit) is used.

Traffic types. Static values for traffic types are defined in sections 11.3 and 11.4.

RTP information packet header additions. No additional fixed fields are appended to the RTP information packet headers.

RTP information packet header extensions. No RTP information packet header extensions are defined, but applications using this profile may use such extensions. That is, applications should not assume that the X bit of the RTP header is always zero, and must be prepared to ignore a header extension. If a header extension is defined in the future, the contents of its first 16 bits must be specified so that different extensions can be told apart.

Types of RTCP packets. No additional RTCP packet types are defined in this profile specification.

RTCP reporting interval. When calculating the RTCP reporting interval, the constants proposed in RFC 1889 shall be used.

SR/RR packet extensions. Extensions for RTCP SR and RR packets are not defined.

Using SDES. Applications can use any of the described SDES items. While the canonical name (CNAME) information is sent in every reporting interval, the other items need only be sent in every fifth reporting interval.

Security. The default RTP security services are also the default under this profile.

Password-to-key mapping. The password entered by the user is converted using the MD5 algorithm into a 16-octet digest. An N-bit key is obtained from the digest by taking its first N bits. It is recommended that the password contain only ASCII letters, digits, hyphens, and spaces, to reduce the chance of corruption when passwords are passed on by phone, fax, telex, or email. The password may be preceded by an encryption algorithm specification: any characters up to the first forward slash (ASCII code 0x2f) are taken as the name of the encryption algorithm. If there is no forward slash, the default encryption algorithm is DES-CBC.

The password entered by the user is converted to its canonical form before the hash algorithm is applied. To do this, the password is converted to the ISO 10646 character set using UTF-8 encoding as defined in Annex P of ISO/IEC 10646-1:1993 (ASCII characters do not require any conversion); spaces at the beginning and end of the password are removed; runs of two or more spaces are replaced with a single space (ASCII or UTF-8 0x20); and all letters are converted to lowercase.

Underlying protocol. The profile defines the use of RTP over UDP in unicast and multicast mode.

Transport mapping. The standard mapping of RTP and RTCP to transport layer addresses is used.

Encapsulation. Encapsulation of RTP packets is not defined.

11.2. Registering traffic types

This profile defines the standard encoding types used with RTP. Other encoding types must be registered with IANA before use. When registering a new coding type, the following information must be provided:

a name for the coding type and the RTP timestamp clock frequency (names should be three or four characters long to allow a compact representation if needed);
an indication of who has change control over the encoding type (for example, ISO, CCITT/ITU, other international standards organizations, a consortium, a particular company or group of companies);
any operating parameters;
references to available descriptions of the encoding algorithm, e.g. (in order of preference) an RFC, a published article, a patent filing, a technical report, codec source code, or a reference;
for proprietary encoding types, contact information (postal address and email address);
the traffic type value for this profile, if necessary (see below).

Note that not all encoding types to be used with RTP need to be statically assigned. A dynamic mapping between a traffic type (PT) value in the range 96 to 127 and an encoding type can be established by "non-RTP means" not covered in this article.

The space of available traffic type values is quite small. New traffic types are assigned statically (permanently) only if the following conditions are met:

the coding is of great interest to the Internet community;
it offers benefits compared to existing encodings and/or is required for interoperability with existing, widely used conferencing or multimedia systems;
the description is sufficient to build a decoder.

11.3. Audio coding

For applications that do not send packets during silence, the first packet of a talkspurt (the first packet after a pause) is distinguished by setting the marker bit in the RTP information packet header to one. Applications without silence suppression set this bit to zero.

The RTP clock used to generate the RTP timestamp is independent of the number of channels and of the coding type; its rate equals the number of sampling periods per second. For N-channel coding (stereo, quad, etc.), each sampling period (say, 1/8000 second) generates N samples. The total number of samples generated per second is therefore the product of the sample rate and the number of channels.
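A small Python sketch may make the distinction concrete: the timestamp advances by sampling periods, while the number of samples also scales with the channel count. The function names are hypothetical, and the 20 ms packet duration is just an illustrative value.

```python
def timestamp_increment(sample_rate_hz, packet_ms):
    """RTP timestamp ticks covered by one packet: one tick per sampling
    period, regardless of how many channels are carried."""
    return sample_rate_hz * packet_ms // 1000


def samples_in_packet(sample_rate_hz, channels, packet_ms):
    """Total samples in the packet: every sampling period yields one
    sample per channel."""
    return timestamp_increment(sample_rate_hz, packet_ms) * channels


# 20 ms of 8000 Hz audio advances the timestamp by 160 ticks whether the
# stream is mono or stereo; stereo simply carries twice as many samples.
mono_ticks = timestamp_increment(8000, 20)
stereo_samples = samples_in_packet(8000, 2, 20)
```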

When multiple audio channels are used, they are numbered from left to right, starting with the first. In RTP audio packets, data from lower-numbered channels precedes data from higher-numbered channels. For more than two channels, the following notation is used:

l - left;
r - right;
c - center;
S - surround;
F - front;
R - rear.

Number of channels	System name	Channel 1	Channel 2	Channel 3	Channel 4	Channel 5	Channel 6
2	stereo	l	r
3		l	r	c
4	quad	Fl	Fr	Rl	Rr
4		l	c	r	S
5		Fl	Fr	Fc	Sl	Sr
6		l	lc	c	r	rc	S

The samples of all channels belonging to the same sampling instant must be within the same packet. The interleaving of samples from different channels depends on the type of coding.

The sample rate should be chosen from the set 8000, 11025, 16000, 22050, 24000, 32000, 44100 and 48000 Hz. (Apple Macintosh computers have native sample rates of 22254.54 and 11127.27 Hz, which can be converted to 22050 and 11025 Hz with acceptable quality by skipping four or two samples in a 20-ms frame.) However, most audio coding algorithms are defined for a more limited set of sample rates. Receivers should be prepared to accept multi-channel audio, but may choose to play only a single channel.

For packetized audio, the default packetization interval is 20 ms unless the encoding description specifies otherwise. The packetization interval sets a lower bound on the end-to-end delay. Longer packets spend a relatively smaller share of bytes on headers, but they increase delay and make each packet loss more noticeable. For non-interactive applications such as lectures, or for links with severe bandwidth constraints, a longer packetization interval may be acceptable. A receiver should accept packets carrying between 0 and 200 ms of audio. This limit keeps the buffer size required at the receiver reasonable.

In sample-based encodings, each signal sample is represented by a fixed number of bits. Within compressed audio data, individual sample codes may cross octet boundaries. The duration of the signal transmitted in the audio packet is determined by the number of samples in the packet.

For sample-based encoding types producing one or more octets per sample, samples from different channels sampled at the same instant are packed into adjacent octets. For example, for stereo encoding, the sequence of octets is: left channel, first sample; right channel, first sample; left channel, second sample; right channel, second sample; and so on. In multi-octet encodings, the most significant octet is transmitted first. The packing of sample-based encodings producing less than one octet per sample is determined by the encoding algorithm.
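The interleaving rule can be illustrated with a short Python sketch. It assumes 16-bit signed linear samples purely as an example of a multi-octet encoding; the function name `pack_interleaved` is hypothetical.

```python
import struct


def pack_interleaved(channels):
    """Pack per-channel lists of 16-bit signed samples into one payload.

    Samples taken at the same instant are placed in adjacent octets,
    lower-numbered channels first, most significant octet first.
    """
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must contain the same number of samples")
    payload = bytearray()
    for instant in zip(*channels):      # one tuple per sampling instant
        for sample in instant:          # channel 1, channel 2, ...
            payload += struct.pack(">h", sample)  # big-endian 16-bit
    return bytes(payload)


left = [1, 2]
right = [256, -1]
payload = pack_interleaved([left, right])
# payload == b"\x00\x01\x01\x00\x00\x02\xff\xff"
```

The octet order is left first sample, right first sample, left second sample, right second sample, matching the sequence given in the text.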

A frame-based coding algorithm converts a fixed-length block of audio into another block of compressed data, usually also of fixed length. For frame-based encodings, the sender may combine several such frames into a single packet.

For frame-based codecs, the channel order is defined for the whole block. That is, for stereo audio, the samples for the left and right channels are encoded independently, and the coded frame for the left channel precedes the frame for the right channel.

All frame-oriented audio codecs must be able to encode and decode multiple consecutive frames carried within a single packet. Since the frame size for frame-oriented codecs is specified, there is no need for a separate designation for the same encoding with a different number of frames per packet.

Table 3 shows the traffic type (PT) values defined by this profile for audio signals, their names, and the main characteristics of the coding algorithms.

11.4. Video encoding

Table 4 shows the traffic type (PT) values, names, and technical characteristics of the video coding algorithms defined by this profile, as well as the unassigned, reserved, and dynamically assigned PT values.

Traffic type values in the range 96 to 127 can be defined dynamically through a conference control protocol, which is not covered in this article. For example, the session directory may specify that, for a given session, traffic type 96 denotes PCMU coding, two channels, at 8000 Hz. The range of traffic type values marked "reserved" is not used, so that RTCP and RTP packets can be reliably distinguished.

An RTP source only emits one type of traffic at any given time; interleaving of different types of traffic in one RTP session is not allowed. Multiple RTP sessions can be used in parallel to carry different types of traffic. The traffic types defined in this profile refer to either audio or video, but not both. However, it is possible to define combined traffic types that combine, for example, audio and video, with appropriate separation in the traffic format.

Audio applications using this profile must, at a minimum, be able to send and receive traffic types 0 (PCMU) and 5 (DVI4). This allows interoperability without format negotiation.

11.5. Port Assignment

As defined in the RTP protocol description, RTP data must be carried on an even-numbered UDP port, and the corresponding RTCP packets must be carried on the next higher (odd-numbered) port.

Applications operating under this profile may use any such pair of UDP ports. For example, the port pair may be assigned at random by the session management program. A single fixed pair of port numbers cannot be required because in some cases multiple applications using this profile must run correctly on the same host, and some operating systems do not allow multiple processes to use the same UDP port with different multicast addresses.

However, port numbers 5004 and 5005 can serve as the default pair. Applications that support multiple profiles can use this pair of ports as an indicator of this profile. Applications may also require that the port pair be specified explicitly.
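The port pairing rule is simple enough to state as code. The following Python sketch uses a hypothetical helper name, `rtcp_port_for`, and takes the default pair 5004/5005 from the text above.

```python
DEFAULT_RTP_PORT = 5004  # default pair from the text: 5004 (RTP), 5005 (RTCP)


def rtcp_port_for(rtp_port=DEFAULT_RTP_PORT):
    """Return the RTCP port that goes with an RTP data port.

    RTP must use an even UDP port; RTCP uses the next higher (odd) port.
    """
    if not 0 < rtp_port < 65535:
        raise ValueError("RTP port out of range")
    if rtp_port % 2 != 0:
        raise ValueError("RTP data port must be even")
    return rtp_port + 1
```

A session manager that picks a random even port for RTP can call `rtcp_port_for(port)` to obtain the matching control port.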

12. List of used terms and abbreviations

ASCII (American Standard Code for Information Interchange) - a seven-bit code for representing text, used with some modifications in most computing systems
CBC (cipher block chaining) - cipher block chaining, a mode of operation of the DES data encryption standard
CELP (code-excited linear prediction) - a type of audio coding using code-excited linear prediction
CNAME (canonical name) - canonical name
CSRC (contributing source) - contributing source. A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts into the RTP packet header a list of the SSRC identifiers of the sources that contributed to that packet. This list is called the CSRC list. Example: in audio conferencing, the mixer lists the identifiers of the participants currently speaking whose speech was combined to produce the outgoing packet, so the receiver can tell who the current talker is even though all audio packets carry the same SSRC identifier (that of the mixer)
DES (Data Encryption Standard) - data encryption standard
IANA (Internet Assigned Numbers Authority) - Internet Assigned Numbers Authority
IMA (Interactive Multimedia Association) - Interactive Multimedia Association
IP (Internet Protocol) - internet protocol, network layer protocol, datagram protocol. Allows packets to cross multiple networks on their way to their destination
IPM (IP Multicast) - multicast using the IP protocol
LD-CELP (low-delay code excited linear prediction) - a speech coding algorithm using code-excited linear prediction with low delay
LPC (linear predictive encoding) - linear prediction coding
NTP (Network Time Protocol) - network time protocol. An NTP timestamp counts seconds relative to 0h on January 1, 1900. The full NTP timestamp format is an unsigned 64-bit fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. In some cases a more compact representation is used, in which only the middle 32 bits of the full format are taken: the low 16 bits of the integer part and the high 16 bits of the fractional part
RPE/LTP (residual pulse excitation/long term prediction) - a speech coding algorithm with residual pulse excitation and long-term prediction
RTCP (Real-Time Control Protocol) - real-time communication control protocol
RTP (Real-Time Transport Protocol) - real-time transport protocol
SSRC (synchronization source) - synchronization source. The source of a stream of RTP packets, identified by a 32-bit numeric SSRC identifier carried in the RTP header so as not to depend on the network address. All packets from a synchronization source belong to the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. Examples of synchronization sources: the sender of a stream of packets derived from a signal source such as a microphone or video camera, or an RTP mixer. A synchronization source may change its data format, such as the audio coding, over time. The SSRC identifier is a randomly chosen value meant to be globally unique within a particular RTP session. A teleconference participant need not use the same SSRC identifier for all RTP sessions in a multimedia session; the binding of SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from multiple cameras, each stream must be identified by a separate SSRC
TCP (Transmission Control Protocol) is a transport layer protocol used in conjunction with the IP protocol
UDP (User Datagram Protocol) is a transport layer protocol without establishing a logical connection. UDP only provides for sending a packet to one or more stations on the network. Checking the correctness and ensuring the integrity (assured delivery) of data transmission is carried out at a higher level
ADPCM - adaptive differential pulse code modulation
jitter - deviations of the phase or frequency of a signal; in IP telephony, irregularity of datagram delays in the network
ZPD - data link (the second layer of the Open Systems Interconnection Reference Model)
IVS - information and computing networks
mixer - an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets into a new RTP packet, and then forwards it. Since multiple signal sources are generally not synchronized, the mixer adjusts the timing of the component streams and generates its own timing for the combined stream. Thus, all data packets generated by a mixer are identified as having the mixer as their synchronization source.
monitor - an application that receives RTCP packets sent by RTP session participants, in particular reception reports, and estimates the current quality of service for distribution monitoring, fault diagnosis, and long-term statistics. Normally the monitor function is built into the applications used in the session, but a monitor can also be a separate application that does not otherwise participate and neither sends nor receives RTP information packets. Such applications are called third-party monitors.
ITU-T - Telecommunication Standardization Sector of the International Telecommunication Union
end system - an application that generates the content transmitted in RTP packets and/or consumes the content of received RTP packets. An end system may act as one or more (but usually only one) synchronization sources in each RTP session.
RTCP packet - a control packet consisting of a fixed header part, similar to that of RTP information packets, followed by structured elements that vary depending on the type of RTCP packet. Typically, multiple RTCP packets are sent together as a compound RTCP packet in a single packet of the underlying protocol; this is made possible by the length field in the fixed header of each RTCP packet
RTP packet - a protocol data unit consisting of the fixed RTP header, a possibly empty list of contributing sources, an optional extension, and the traffic (payload). Typically, one packet of the underlying protocol contains one RTP packet, but there may be several
port - an abstraction used by transport layer protocols to distinguish between multiple destinations within a single host computer. A port is identified by its number. The port number thus identifies the specific application for which the forwarded data is intended. This number, together with an indication of which protocol (for example, TCP or UDP) is used at the layer above, is carried among other control information in datagrams sent over the Internet. The transport selectors (TSELs) used by the OSI transport layer are equivalent to ports
profile - a set of RTP and RTCP protocol parameters for a class of applications that determines how they operate. The profile defines the use of the marker bit and traffic type fields in the RTP data packet header, the traffic types, RTP data packet header extensions, the first 16 bits of the RTP data packet header extension, RTCP packet types, the RTCP reporting interval, SR/RR packet extensions, the use of SDES packets, the services and algorithms for communication security, and the use of the underlying protocol
RTP session - the association among a set of participants communicating through RTP. For each participant, a session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). The destination transport address pair may be common to all participants (as in the case of IPM) or different for each (an individual network address and a common pair of ports, as in unicast communication). In a multimedia session, each type of traffic is carried in a separate RTP session with its own RTCP packets. Multicast RTP sessions are distinguished by different port pair numbers and/or different multicast addresses
non-RTP means - protocols and mechanisms that may be needed in addition to RTP to provide a usable service. In particular, for multimedia conferencing, a conference control application may distribute multicast addresses and encryption keys, negotiate the encryption algorithm to be used, and define dynamic mappings between RTP traffic type values and the traffic formats they represent (for formats that have no predefined traffic type value). For simple applications, email or a conference database can also be used
translator - an intermediate system that forwards RTP packets without changing their synchronization source identifier. Examples of translators: devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls
transport address - A combination of network address and port number that identifies a transport endpoint, such as an IP address and a UDP port number. Packets are transmitted from the source transport address to the destination transport address
RTP traffic - multimedia data transmitted in an RTP protocol packet, such as audio samples or compressed video data
PSTN - Public Switched Telephone Networks
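
The compact NTP representation described in the NTP entry above amounts to a shift and a mask. The following Python sketch is an illustration only: the function names are hypothetical, the conversion from Unix time assumes a non-negative timestamp, and 2208988800 is the number of seconds between January 1, 1900 and January 1, 1970.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def ntp_timestamp(unix_seconds):
    """Build the full 64-bit NTP timestamp from non-negative Unix time:
    integer seconds in the high 32 bits, binary fraction in the low 32."""
    whole = int(unix_seconds)
    fraction = int((unix_seconds - whole) * 2**32)
    seconds = (whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    return (seconds << 32) | (fraction & 0xFFFFFFFF)


def ntp_compact(ntp64):
    """Take the middle 32 bits: the low 16 bits of the integer part and
    the high 16 bits of the fractional part."""
    return (ntp64 >> 16) & 0xFFFFFFFF


full = (0x12345678 << 32) | 0x9ABCDEF0
middle = ntp_compact(full)  # 0x56789ABC
```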

RTP protocol

The main transport protocol for multimedia applications has become the Real-Time Transport Protocol (RTP), designed to carry packets of coded speech signals over an IP network. RTP packets are carried over UDP, which in turn runs over IP (Fig. 1.5).

Fig. 1.5.

In fact, the layer to which RTP belongs is not defined as unambiguously as Fig. 1.5 shows and as the literature usually describes. On the one hand, the protocol does run on top of UDP, is implemented by application programs, and by all indications is an application protocol. On the other hand, as stated at the beginning of this paragraph, RTP provides transport services independently of multimedia applications and is, from this point of view, simply a transport protocol. The best definition: RTP is a transport protocol implemented at the application layer.

To transmit voice (multimedia) traffic, RTP uses packets, the structure of which is shown in Fig. 1.6.

The RTP header consists of at least 12 bytes. The first two bits of the RTP header (the version field, V) indicate the version of the RTP protocol (currently version 2).

Clearly, with this header structure, at most one more RTP version is possible. The version field is followed by two single-bit fields: the P bit, which indicates whether padding octets have been added at the end of the payload field (they are usually added when the transport protocol or encoding algorithm requires fixed-size blocks), and the X bit, which indicates whether an extended header is used.

Fig. 1.6.

If an extended header is used, its first word contains the total length of the extension. Next, the four CC bits give the number of CSRC fields at the end of the RTP header, i.e. the number of sources contributing to the stream. The marker bit M marks what the standard calls significant events, for example, the beginning of a video frame or the beginning of a word in an audio channel. It is followed by the 7-bit payload type (PT) field, whose code identifies the contents of the payload field (the application data), for example, uncompressed 8-bit audio, MP3, and so on. From this code the application learns how to decode the data.

The rest of the fixed-length header consists of a Sequence Number field, a Timestamp field recording when the first octet of the packet's payload was sampled, and the synchronization source (SSRC) field, which identifies the source. A source may be a single device with only one network address, or there may be multiple sources representing different media (audio, video, etc.) or different streams of the same medium. Since sources can be on different devices, the SSRC identifier is chosen randomly so that the chance of two sources in an RTP session ending up with the same identifier is minimal. A mechanism for resolving such collisions, should they arise, is also defined. The fixed part of the RTP header can be followed by up to 15 separate 32-bit CSRC fields that identify the contributing sources.
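The header layout described above can be made concrete with a small parser. This Python sketch decodes only the fixed 12-byte header and the CSRC list; it does not interpret the extension header or padding. The names `RtpHeader` and `parse_rtp_header` are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet):
    """Decode the fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    # Network byte order: two flag octets, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F                  # CC: 4 bits, so at most 15
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,                    # V: top 2 bits
        padding=bool(b0 & 0x20),            # P bit
        extension=bool(b0 & 0x10),          # X bit
        marker=bool(b1 & 0x80),             # M bit
        payload_type=b1 & 0x7F,             # PT: 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

For a packet whose first octet is 0x81, the parser reports version 2, no padding, no extension, and one CSRC entry.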

RTP is supported by the Real-Time Transport Control Protocol (RTCP), which generates additional reports containing information about RTP sessions. Recall that neither UDP nor RTP provides QoS (Quality of Service). RTCP gives senders feedback and gives stream receivers some means of improving QoS: information about packets (loss, delay, jitter) and about the user (application, stream). For flow control there are two types of reports: those generated by senders and those generated by receivers. For example, the fraction of packets lost and the cumulative number of losses tell a sender receiving the report that congestion may be preventing receivers from getting the packet streams they expect. The sender can then lower its coding rate to reduce congestion and improve reception. The sender report records when the last RTP packet was generated, as both the internal (RTP) timestamp and real time. This information lets a receiver coordinate and synchronize multiple streams, such as video and audio. If a stream is directed to several recipients, each of them sends its own stream of RTCP packets. To limit the bandwidth this consumes, the rate at which each recipient generates RTCP reports is made inversely proportional to the number of recipients.

It should be noted that although RTCP operates separately from RTP, the RTP/UDP/IP chain by itself already carries significant overhead in the form of headers. The G.729 codec generates frames of 10 bytes (80 bits every 10 ms). The 12-byte RTP header alone is larger than this entire frame. On top of it come an 8-byte UDP header and a 20-byte IP header (in IPv4), giving a combined header four times the size of the transmitted data.
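The arithmetic behind this comparison is easy to check. The Python sketch below uses the header sizes from the text and, as in the text, assumes one 10-byte G.729 frame per packet; the function names are hypothetical.

```python
RTP_HEADER = 12   # bytes, fixed RTP header without CSRC entries
UDP_HEADER = 8    # bytes
IPV4_HEADER = 20  # bytes, without IP options


def header_overhead(payload_bytes):
    """Return (total header bytes, ratio of headers to payload)."""
    headers = RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return headers, headers / payload_bytes


def wire_bitrate(payload_bytes, interval_ms):
    """Bits per second on the wire when one packet is sent per interval."""
    headers, _ = header_overhead(payload_bytes)
    return (headers + payload_bytes) * 8 * 1000 // interval_ms


headers, ratio = header_overhead(10)   # 40 bytes of headers, ratio 4.0
codec_rate = 10 * 8 * 1000 // 10       # 8000 bit/s produced by the codec
total_rate = wire_bitrate(10, 10)      # 40000 bit/s including headers
```

Under these assumptions an 8 kbit/s speech stream occupies 40 kbit/s at the IP level, before any link-layer framing is counted.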

One of the most important trends in the evolution of modern telecommunications is the development of IP telephony: a set of new technologies for transmitting multimedia messages (voice, data, video) over information and computing networks (IVS) built on the Internet Protocol (IP), including local, corporate, and global computer networks and the Internet. The concept of IP telephony includes Internet telephony, which makes it possible to set up telephone calls between Internet subscribers, between subscribers of public switched telephone networks (PSTN) over the Internet, and between PSTN and Internet subscribers.

IP telephony has a number of undeniable advantages that drive its rapid development and the growth of the computer telephony market. It benefits end users, who get telephone service at a fairly low per-minute rate. For companies with remote branches, IP technology makes it possible to carry voice over existing corporate IP networks, so that one network is used instead of several. Another clear advantage of IP telephony over an ordinary phone is the ability to provide additional services by using a multimedia computer and various Internet applications. Thus, with IP telephony, businesses and individuals can expand their communications capabilities with videoconferencing, application sharing, whiteboard-type tools, and more.

Which international standards and protocols govern the main parameters and operating algorithms of the hardware and software communication tools used in IP telephony? As the name suggests, the technology is based on the IP protocol, which, however, is used not only for telephony: it was originally developed for transmitting digital data in packet-switched IVS.

In networks that do not provide a guaranteed quality of service (these include networks built on the basis of the IP protocol), packets may be lost, the order of their arrival may change, the data transmitted in packets may be distorted. Various transport layer procedures are used to ensure reliable delivery of transmitted information under these conditions. When transmitting digital data, the TCP protocol (Transmission Control Protocol) is used for this purpose. This protocol provides reliable data delivery and restores the original packet order. If an error is detected in a packet or the packet is lost, the TCP procedures send a retransmission request.

For audio and video conferencing applications, packet delays affect signal quality much more than occasional data corruption does. Variations in delay can lead to pauses. Such applications require a different transport layer protocol, one that provides packet resequencing, delivery with minimum delay, real-time playback at precisely specified moments, traffic type recognition, and multicast or unicast communication. That protocol is the Real-Time Transport Protocol (RTP). It governs the transmission of multimedia data in packets through the IVS at the transport layer and is supplemented by the Real-Time Control Protocol (RTCP). RTCP, in turn, provides control over the delivery of multimedia data, quality of service monitoring, information about the participants in the current session, and control and identification functions; it is sometimes considered part of RTP.

Many publications on IP telephony note that most of the network equipment and special software for this technology is developed on the basis of Recommendation H.323 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T), including TAPI 3.0, NetMeeting 2.0, and others. How does H.323 relate to RTP and RTCP? H.323 is a broad conceptual framework that includes many other standards, each dealing with a different aspect of information transfer. Most of these standards, such as the audio and video codec standards, are widely used outside IP telephony as well. The RTP/RTCP protocols, by contrast, form the basis of the H.323 standard, are aimed specifically at IP technology, and underlie the organization of IP telephony. This article is devoted to these protocols.

2. Basic concepts

The RTP real-time transport protocol provides end-to-end real-time transmission of multimedia data such as interactive audio and video. This protocol implements traffic type recognition, packet sequence numbering, work with timestamps and transmission control.

The operation of RTP essentially comes down to assigning each outgoing packet a timestamp. On the receiving side, the packet timestamps indicate in what sequence and with what delays the packets should be played back. Support for RTP and RTCP allows the receiving host to arrange the received packets in the proper order, reduce the effect of network delay jitter on signal quality, and restore synchronization between audio and video so that users hear and see the incoming information correctly.

Note that RTP itself has no mechanism to guarantee timely delivery or quality of service; it relies on underlying services for that. It does not prevent out-of-order delivery, nor does it assume that the underlying network is fully reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence.

RTP supports both unicast communication and data transfer to a group of destinations, if multicast is supported by the underlying network. RTP is intended to provide the information required by individual applications, and in most cases is integrated into the application itself.

Although RTP is considered a transport layer protocol, it usually functions on top of another transport layer protocol, UDP (User Datagram Protocol). Both protocols contribute to the functionality of the transport layer. It should be noted that RTP and RTCP are independent of the underlying transport and network layers, so the RTP/RTCP protocols can be used with other suitable transport protocols.

RTP/RTCP protocol data units are called packets. Packets generated in accordance with RTP and used to carry multimedia data are called information packets or data packets, while packets generated in accordance with RTCP and used to carry the service information required for reliable teleconferencing are called control packets or service packets. An RTP packet includes a fixed header, an optional variable-length header extension, and a data field. An RTCP packet starts with a fixed part (similar to the fixed part of RTP information packets) followed by variable-length structured elements.

In order for RTP to be more flexible and applicable to a variety of applications, some of its parameters are intentionally left undefined; instead, the protocol provides for the concept of a profile. A profile is a set of RTP and RTCP parameters for a specific class of applications that determines how they operate. The profile defines the use of individual packet header fields, traffic types, header additions and header extensions, packet types, communication security services and algorithms, the use of the underlying protocol, and so on (one example is the RTP profile for audio and video conferencing with minimal control). Each application usually works with only one profile, and the profile is chosen by selecting the appropriate application. There is no explicit indication of the profile by port number, protocol identifier, or the like.

Thus, a complete RTP specification for a particular application must include additional documents, which include a profile description, as well as a traffic format description that defines how a particular type of traffic, such as audio or video, will be processed in RTP.

Features of multimedia data transmission during audio and video conferences are discussed in the following sections.

2.1. Group audio conferencing

Group audio conferencing requires a multi-user group address and two ports. In this case, one port is required for the exchange of audio data, and the other is used for control packets of the RTCP protocol. The group address and port information is sent to the intended teleconference participants. If privacy is required, then the information and control packets may be encrypted as defined in Section 7.1, in which case the encryption key must also be generated and distributed.

The audio conferencing application used by each conference participant sends audio data in small bursts, such as 20 ms. Each piece of audio data is preceded by an RTP header; the RTP header and data are in turn formed (encapsulated) into a UDP packet. The RTP header indicates which type of audio coding (eg, PCM, ADPCM, or LPC) was used to form the data in the packet. This makes it possible to change the coding type during the conference, for example, when a new participant arrives who uses a low bandwidth connection, or during network congestion.

On the Internet, as in other packet-switched networks, packets are occasionally lost or reordered, and they are delayed by varying amounts of time. To cope with these impairments, the RTP header contains a timestamp and a sequence number that allow receivers to reconstruct the timing of the source so that, for example, chunks of audio are played out continuously by the speaker every 20 ms. This timing reconstruction is performed separately and independently for each source of RTP packets in the teleconference. The sequence number can also be used by the receiver to estimate the number of lost packets.
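As a rough illustration of the last point, the sketch below estimates the number of lost packets from the sequence numbers of the packets that did arrive. The function name count_lost is hypothetical, and the sketch assumes that packets from a single source arrive in order; a real receiver would also have to handle reordering.

```python
def count_lost(seq_numbers):
    """Estimate lost packets from the 16-bit sequence numbers of received packets.

    Assumption: packets from one source arrive in order (no reordering).
    """
    lost = 0
    prev = None
    for seq in seq_numbers:
        if prev is not None:
            # The distance modulo 2**16 handles the wrap from 65535 back to 0.
            gap = (seq - prev) & 0xFFFF
            # gap == 1 means no loss; gap == 0 is a duplicate and counts as no loss.
            lost += max(gap - 1, 0)
        prev = seq
    return lost
```

For example, if a receiver sees the sequence numbers 65534, 65535, 0, 2, 3, the only gap is between 0 and 2, so one packet is counted as lost.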

Since participants can join and leave a teleconference while it is in progress, it is useful to know who is currently taking part and how well they are receiving the audio data. For this purpose, each instance of the audio application periodically multicasts a reception report, together with its user name, on the control (RTCP) port to the applications of all other participants. The reception report indicates how well the current speaker is being received and can be used to control adaptive encoders. In addition to the user name, other identifying information may be included, subject to control bandwidth limits. When leaving the conference, a site sends an RTCP BYE packet.

2.2. Videoconferencing

If both audio and video are used in a teleconference, they are transmitted separately. To carry each type of traffic independently of the other, the protocol specification introduces the concept of an RTP session (see the list of abbreviations and terms used). A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). Packets for each type of traffic are transmitted using two different pairs of UDP ports and/or multicast addresses. There is no direct coupling at the RTP level between the audio and video sessions, except that a user participating in both sessions must use the same canonical name in the RTCP packets for both, so that the sessions can be associated.

One reason for this separation is that some conference participants need to be allowed to receive only one type of traffic if they wish to. Despite the separation, synchronous playback of source media data (audio and video) can be achieved using the timing information that is carried in the RTCP packets for both sessions.

2.3. The concept of mixers and translators

Not all sites are always able to receive multimedia data in the same format. Consider the case where participants in one area are connected through a low-speed link to the majority of the conference participants, who have broadband network access. Instead of forcing everyone to use lower-bandwidth, lower-quality audio coding, an RTP-level relay called a mixer can be placed near the low-bandwidth area. The mixer resynchronizes the incoming audio packets to restore the original 20 ms spacing, mixes the reconstructed audio streams into a single stream, encodes it with a low-bandwidth audio codec, and forwards the resulting packet stream over the low-speed link. These packets may be unicast to a single recipient or multicast on a different address to a group of recipients. So that receiving endpoints can correctly indicate who is speaking, the RTP header includes a means for mixers to identify the sources that contributed to a mixed packet.

Some participants in the audio conference may be connected by broadband links but may not be directly reachable via IP multicast (IPM). For example, they may be behind an application-level firewall that does not let any IP packets pass. In such cases mixers are not needed; instead, a different type of RTP-level relay, called a translator, is used. Two translators are installed, one on each side of the firewall. The outer one forwards all multicast packets it receives over a secure connection to the translator inside the firewall. The translator inside the firewall sends them out again as multicast packets to a multicast group restricted to the site's internal network.

Mixers and translators can be designed for a variety of purposes. An example of a mixer is a video mixer that scales the images of individual people in separate video streams and composites them into a single video stream to simulate a group scene. Examples of translation include connecting a group of IP/UDP-only hosts to a group of ST-II-only hosts, or transcoding video packet by packet from individual sources without resynchronization or mixing. The details of how mixers and translators work are discussed in Section 5.

2.4. Byte order, alignment, and timestamp format

All fields of RTP and RTCP packets are transmitted in network byte order, that is, most significant byte (octet) first. All header data is aligned to its natural length: 16-bit fields are aligned on even offsets, and 32-bit fields on offsets divisible by four. Octets designated as padding have the value zero.

Absolute time (wallclock time) in RTP is represented using the NTP (Network Time Protocol) timestamp format, which counts seconds relative to 0h UTC on January 1, 1900. The full NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. In some fields where a more compact representation is appropriate, only the middle 32 bits are used: the low 16 bits of the integer part and the high 16 bits of the fractional part.
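The two representations can be sketched as follows. The function names are hypothetical, and the input is assumed to be a non-negative Unix time in seconds; the constant is the number of seconds between January 1, 1900 and January 1, 1970.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def unix_to_ntp64(unix_seconds: float) -> int:
    """Convert a non-negative Unix time to a 64-bit NTP timestamp."""
    whole = int(unix_seconds)
    seconds = (whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF          # integer part, 32 bits
    fraction = int((unix_seconds - whole) * (1 << 32)) & 0xFFFFFFFF  # fractional part, 32 bits
    return (seconds << 32) | fraction


def ntp64_to_middle32(ntp64: int) -> int:
    """Keep the low 16 bits of the integer part and the high 16 bits of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF
```

In the compact form the resolution is 1/65536 of a second, and the value wraps around roughly every 18 hours, which is sufficient for measuring short intervals such as round-trip delays.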

The next two sections of this article (3 and 4) discuss the packet formats and features of the functioning of the RTP and RTCP protocols, respectively.

3. RTP data transfer protocol

3.1. Fixed RTP header fields

As noted above, an RTP packet includes a fixed header, an optional variable-length header extension, and a data field. The fixed header of an RTP packet consists of the fields described below.

The first twelve octets are present in every RTP packet, while the list of contributing source (CSRC) identifiers is present only when inserted by a mixer. The fields have the following purposes.

Version (V): 2 bits. This field identifies the RTP version. This article focuses on version 2 of the RTP protocol (value 1 was used in the first draft version of RTP).

Padding (P): 1 bit. If the padding bit is set to one, the packet contains one or more additional padding octets at the end that are not part of the traffic. The last padding octet contains a count of how many padding octets should be ignored, including itself. Padding may be required by some encryption algorithms with fixed block sizes, or to carry several RTP packets in a single data unit of the underlying protocol.
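A receiver can remove the padding with a few lines of code. The sketch below uses a hypothetical function name, and its sanity check assumes a packet with no CSRC list and no header extension, so that the header occupies exactly twelve octets.

```python
def strip_padding(packet: bytes) -> bytes:
    """Return the packet without trailing padding octets (P bit left unchanged)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-octet fixed header")
    if not packet[0] & 0x20:
        # P bit is the third most significant bit of the first octet.
        return packet
    pad = packet[-1]  # the last octet counts the padding octets, itself included
    # Assumption: a 12-octet header, so padding cannot exceed len(packet) - 12.
    if pad == 0 or pad > len(packet) - 12:
        raise ValueError("invalid padding count")
    return packet[:-pad]
```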

Extension (X): 1 bit. If the extension bit is set, the fixed header is followed by exactly one header extension, whose format is described later in this article under the RTP header extension.

CSRC counter (CC): 4 bits. The CSRC counter contains the number of contributing source (CSRC) identifiers (see the list of abbreviations and terms used) that follow the fixed header.

Marker (M): 1 bit. The interpretation of the marker is defined by the profile. It is intended to allow significant events (e.g. video frame boundaries) to be marked in the packet stream. The profile may introduce additional marker bits or specify that there is no marker bit by changing the number of bits in the traffic type field.

Traffic type (payload type, PT): 7 bits. This field identifies the format of the RTP traffic and determines how the application interprets it. A profile defines a default static mapping of PT values to traffic formats. Additional traffic type codes can be defined dynamically through non-RTP means. The sender of an RTP packet emits a single RTP traffic type value at any given time; this field is not intended for multiplexing separate media streams.

Sequence number: 16 bits. The sequence number is incremented by one for each RTP data packet sent and can be used by the receiver to detect lost packets and to restore the original packet order. The initial value of the sequence number is chosen randomly to make known-plaintext attacks on encryption more difficult (this applies even if the source itself does not encrypt, since the packets may pass through a translator that does).

Timestamp: 32 bits. The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculation (see Section 4.3.1). The resolution of the clock should be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is usually not enough). The clock frequency depends on the format of the transmitted traffic and is set statically in the profile or traffic format specification, or can be set dynamically for traffic formats defined through non-RTP means. If RTP packets are generated periodically, the nominal sampling instant determined by the sampling clock should be used, not a reading of the system clock. For example, for fixed-rate audio, the timestamp clock would normally increment by one for each sampling period. If an audio application reads blocks of 160 samples from the input device, the timestamp is incremented by 160 for each such block, regardless of whether the block is transmitted in a packet or dropped as silence. The initial value of the timestamp, like the initial value of the sequence number, is random. Several consecutive RTP packets may have equal timestamps if they are logically generated at the same time, e.g. if they belong to the same video frame. Consecutive RTP packets may also contain non-monotonic timestamps if the data is not transmitted in sampling order, as is the case with interpolated MPEG video frames (the sequence numbers of the packets as transmitted will still be monotonic).
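The 160-sample example can be sketched as follows. The function name and the voiced_flags parameter are hypothetical; a block size of 160 samples corresponds to 20 ms only under the assumption of an 8000 Hz sampling rate, which the text above does not state explicitly.

```python
SAMPLES_PER_BLOCK = 160  # assumption: 20 ms of audio sampled at 8000 Hz


def block_timestamps(initial_ts, voiced_flags):
    """Return the RTP timestamps of the blocks that are actually transmitted.

    voiced_flags[i] is True if block i is sent and False if it is dropped as silence.
    The timestamp advances for every block, sent or not, and wraps at 32 bits.
    """
    sent = []
    ts = initial_ts & 0xFFFFFFFF
    for voiced in voiced_flags:
        if voiced:
            sent.append(ts)
        ts = (ts + SAMPLES_PER_BLOCK) & 0xFFFFFFFF
    return sent
```

With an initial timestamp of 1000 and a silent second block, the two transmitted packets carry the timestamps 1000 and 1320, so the receiver can see that 40 ms separate them even though their sequence numbers are consecutive.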

SSRC: 32 bits. The SSRC (synchronization source) field identifies the synchronization source (see the list of abbreviations and terms used). The identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session have the same SSRC identifier. Although the probability of several sources choosing the same identifier is low, all RTP implementations must be prepared to detect and resolve such collisions. Section 6 discusses the probability of collisions together with a mechanism for resolving them and for detecting RTP-level loops based on the uniqueness of the SSRC identifier. If a source changes its source transport address, it must also choose a new SSRC identifier so that it is not interpreted as a looped source.

CSRC list: 0 to 15 items, 32 bits each. The contributing source (CSRC) list identifies the sources that contributed to the traffic contained in the packet. The number of identifiers is given by the CC field. If there are more than fifteen contributing sources, only 15 of them can be identified. CSRC identifiers are inserted by mixers, which use the SSRC identifiers of the contributing sources. For example, for audio packets, the SSRC identifiers of all sources that were mixed together to create the packet are listed in the CSRC list, so that the recipient can correctly indicate who is speaking.
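The field descriptions above translate into a short parser. This is a minimal sketch using Python's standard struct module; the names RtpHeader and parse_rtp_header are hypothetical, and the sketch does not process the header extension or padding.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                       # CSRC counter: low 4 bits of octet 0
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,                 # V: top 2 bits
        padding=bool(b0 & 0x20),         # P
        extension=bool(b0 & 0x10),       # X
        csrc_count=cc,
        marker=bool(b1 & 0x80),          # M: top bit of octet 1
        payload_type=b1 & 0x7F,          # PT: low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )
```

The payload (or the header extension, if the X bit is set) starts at offset 12 + 4 * csrc_count.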

3.2. RTP sessions

As mentioned above, RTP requires different types of traffic to be transmitted separately, in different RTP sessions. A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). For example, in a teleconference composed of separately encoded audio and video, each type of traffic should be sent in a separate RTP session with its own destination transport address. Audio and video are not meant to be carried in the same RTP session and demultiplexed based on the traffic type or SSRC fields. Interleaving packets with different traffic types but the same SSRC would cause several problems:

If one of the traffic types changed during a session, there would be no general way to determine which of the old values the new one replaced.
The SSRC identifies a single timing space and a single sequence number space. Interleaving multiple types of traffic would require different timing spaces if the clock rates of the streams differ, and different sequence number spaces to tell which type of traffic suffered packet loss.
The RTCP sender and receiver reports (see Section 4.3) can describe only one timing space and one sequence number space per SSRC and do not carry a traffic type field.
An RTP mixer would not be able to combine interleaved streams of different types of traffic into a single stream.
Carrying multiple types of traffic in a single RTP session precludes the following: using different network paths or network resource allocations where appropriate; receiving only a subset of the media when desired, such as audio only if video would exceed the available bandwidth; and receiver implementations that use separate processes for the different types of traffic, whereas separate RTP sessions permit both single-process and multiple-process implementations.

By using different SSRCs for each type of traffic, but sending them in the same RTP session, the first three problems can be avoided, but the last two cannot be avoided. Therefore, the specification of the RTP protocol requires each type of traffic to use its own RTP session.

3.3. Profile-defined RTP header changes

The existing RTP data packet header is considered complete for the set of functions required in common by all classes of applications that RTP might support. However, to adapt it better to specific tasks, the header can be tailored through modifications or additions defined in the profile specification.

The marker bit and traffic type field carry profile specific information, but are located in a fixed header as many applications are expected to need them. The octet containing these fields may be redefined by the profile to meet different requirements, for example with more or less marker bits. If any marker bits are present, they should be placed in the high-order bits of the octet, since profile-independent monitors may be able to observe a correlation between the packet loss pattern and the marker bit.

Additional information that is required for a particular traffic format (e.g. the video coding type) must be carried in the data field of the packet. It can be placed at a fixed position at the beginning of the data or inside the data itself.

If a particular class of applications needs additional functionality independent of the traffic format, then the profile that those applications operate with must define additional fixed fields to be placed immediately after the SSRC field of the existing fixed header. These applications will be able to quickly access additional fields directly, while profile-independent monitors or recorders will still be able to process RTP packets by interpreting only the first twelve octets.

If it turns out that additional functionality is needed in common by all profiles, a new version of RTP should be defined to make a permanent change to the fixed header.

3.4. RTP header extension

To allow individual implementations to experiment with new traffic-format-independent features that require additional information to be carried in the information packet header, RTP provides a packet header extension mechanism. This mechanism is designed so that the header extension can be ignored by other cooperating applications that do not require it.

If the X bit in the RTP header is set to one, a variable-length header extension is appended to the fixed RTP header (following the CSRC list, if any). Note that this header extension is intended for limited use only. The extension begins with a four-octet extension header: 16 bits whose use is defined by the profile, followed by a 16-bit length field.

The extension contains a 16-bit length field that gives the number of 32-bit words in the extension, excluding the four-octet extension header (so zero is a valid length). Only one extension can be appended to the fixed RTP data packet header. To allow multiple cooperating implementations to experiment independently with different header extensions, or to allow a particular implementation to experiment with more than one type of header extension, the use of the first 16 bits of the extension is left open for distinguishing identifiers or parameters. The format of these 16 bits must be defined by the profile specification under which the applications operate.
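Because the length field is always present, a receiver that does not understand an extension can still skip over it. The sketch below shows one way to do this; the function name is hypothetical, and the input is assumed to start immediately after the CSRC list of a packet whose X bit is set.

```python
import struct


def parse_header_extension(data: bytes):
    """Split data into (profile_defined_16_bits, extension_body, remaining_payload)."""
    if len(data) < 4:
        raise ValueError("truncated extension header")
    profile_defined, length_words = struct.unpack("!HH", data[:4])
    # The length counts 32-bit words and excludes the 4-octet extension header.
    end = 4 + 4 * length_words
    if len(data) < end:
        raise ValueError("truncated extension body")
    return profile_defined, data[4:end], data[end:]
```

An application that ignores the extension simply discards the first two elements of the result and continues with the remaining payload.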


When we talk on an IP phone and hear the other person's voice in the receiver, or use a video conferencing system to communicate with colleagues and relatives, we exchange a continuous stream of data. When transmitting streaming data such as voice and video over a packet network, it is very important to use mechanisms that solve the following tasks:

Eliminating the effect of packet loss
Restoring the order of packets and tracking their sequence
Smoothing delay variation (jitter)

RTP (Real-time Transport Protocol), the subject of this article, was developed for these purposes. The protocol was developed by the IETF Audio-Video Transport Working Group and is described in RFC 3550.

As a rule, RTP runs on top of UDP (User Datagram Protocol), because for multimedia data timely delivery matters more than guaranteed delivery, and UDP does not delay the stream with retransmissions.

RTP includes the ability to determine the type of payload and assign a sequence number of the packet in the stream, as well as the use of timestamps.

On the transmitting side, each packet is marked with a timestamp. The receiving side uses it to determine the total delay of each packet, then calculates the difference between the delays of successive packets and derives the jitter from it. This makes it possible to impose a constant playout delay on packet delivery and thereby reduce the effect of jitter.
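The calculation described above can be sketched as follows. The smoothing factor of 1/16 is the one used for the interarrival jitter estimate in RFC 3550; the function name is hypothetical, and arrival times are assumed to be already expressed in the same units as the RTP timestamps.

```python
def interarrival_jitter(packets):
    """Running jitter estimate over (rtp_timestamp, arrival_time) pairs.

    Both values must be in the same clock units (assumption of this sketch).
    """
    jitter = 0.0
    prev_transit = None
    for rtp_ts, arrival in packets:
        transit = arrival - rtp_ts            # relative transit time of this packet
        if prev_transit is not None:
            d = abs(transit - prev_transit)   # change in transit time between packets
            jitter += (d - jitter) / 16.0     # smoothed estimate with gain 1/16
        prev_transit = transit
    return jitter
```

If every packet takes equally long to arrive, the estimate stays at zero; a single packet that arrives 16 units later than expected raises it to 1.0.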

Another function of RTP concerns possible packet loss in the IP network, which shows up as short pauses in the conversation. Sudden silence in the handset usually has a very negative effect on the listener, so, using the capabilities of the RTP protocol, such periods of silence are filled with so-called “comfort noise”.

RTP works in conjunction with another IETF protocol, RTCP (Real-time Transport Control Protocol), which is also described in RFC 3550. RTCP is designed to collect statistical information, to assess the quality of service (QoS), and to synchronize the media streams of an RTP session.

The main function of RTCP is to establish feedback with the application to report on the quality of the information received. Participants in an RTCP session exchange information about the number of received and lost packets, jitter value, delay, etc. Based on the analysis of this information, a decision is made to change the transmission parameters, for example, to reduce the compression ratio of information in order to improve the quality of its transmission.

To perform these functions, RTCP sends special messages of certain types:

SR - Sender Report - a report from the source with statistical information about the RTP session
RR - Receiver Report - a report from the recipient with statistical information about the RTP session
SDES - Source Description - describes the source parameters, including the CNAME (canonical name of the user)
BYE - signals the end of a participant's membership in the group
APP - application-defined functions
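Each of these messages starts with the same four-octet header, from which the message type can be read. The sketch below uses the packet type numbers assigned in RFC 3550 (200 to 204); the function and dictionary names are hypothetical.

```python
import struct

# Packet type numbers assigned in RFC 3550.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_common_header(packet: bytes) -> dict:
    """Decode the first four octets shared by all RTCP packet types."""
    if len(packet) < 4:
        raise ValueError("RTCP packet shorter than 4 octets")
    b0, packet_type, length_words = struct.unpack("!BBH", packet[:4])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        # Report or source count; for APP packets these 5 bits hold a subtype.
        "count": b0 & 0x1F,
        "type": RTCP_TYPES.get(packet_type, "unknown"),
        # The length field counts 32-bit words minus one, header included.
        "length_bytes": 4 * (length_words + 1),
    }
```

Several RTCP packets are usually sent together in one UDP datagram, so a receiver would call such a function repeatedly, advancing by length_bytes each time.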

RTP is a unidirectional protocol, so two-way communication requires two RTP sessions, one on each side.

An RTP session is defined by the IP addresses of the participants and by a pair of unreserved UDP ports, typically taken from a range such as 16384 to 32767. In addition, to provide feedback to the application, a two-way RTCP session must also be established. RTCP uses the port numbered one higher than the RTP port. For example, if port 19554 is selected for RTP, the RTCP session will use port 19555. The formation of an RTP/RTCP session is shown in the figure below.
