[jdev] Re: Question about Proxy65 support Voice Chat

Thu Apr 21 04:45:43 CDT 2005

Hi Nolan!

Nolan Eakins wrote:

> You probably would want to use TCP for voice. You would not want to have
> an audio stream with someone saying "George Bush has never lied" to end
> up with a few packets lost along the way saying "George Bush has lied".
> UDP would be good for video though where you can get away with dropping
> half a frame or even a whole one since the information essentially
> replaces the previous information.
>
> Voice and audio is not like that. You will need to transfer it reliably.

No ... TCP would take far too long to resend a dropped frame. You
can use TCP for internet radio streaming, and that is why your streaming
client typically buffers about 15 seconds of audio data before
playing it: it is hoping that TCP will retransmit lost packets within
that time. But even for internet radio, TCP is not used because it is
well suited for audio, but because HTTP is used in order to get through
proxies, and HTTP is carried over TCP.
For internet telephony you cannot buffer 15 seconds of data. In the era
of the traditional telephone networks, a maximum delay of 10 to 20 ms
was considered acceptable. Nowadays we accept about 100 to 200 ms of
delay. (We accept this especially because internet telephony systems
have no echo, so you don't hear your own voice.) But with a buffer of
15 seconds, it would take half a minute to get your answer back from
the other person.
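
To put rough numbers on this, here is a small Python sketch of the
arithmetic. The 50 ms network delay and the 60 ms jitter buffer are
illustrative assumptions of mine, not measurements; only the 15 second
buffer comes from the radio example above.

```python
# Rough conversational-delay arithmetic. All figures are illustrative.

def turnaround_seconds(buffer_s, network_one_way_s=0.05):
    """Time from the end of my question until I start hearing the answer,
    ignoring the time the other person needs to think."""
    one_way = buffer_s + network_one_way_s  # my voice reaching their ear
    return 2 * one_way                      # plus their answer coming back

radio_style = turnaround_seconds(buffer_s=15.0)  # internet-radio style buffer
voip_style = turnaround_seconds(buffer_s=0.06)   # assumed small jitter buffer

print(f"15 s buffer:         about {radio_style:.1f} s per exchange")
print(f"60 ms jitter buffer: about {voip_style:.2f} s per exchange")
```

With these assumed figures the first case comes out at roughly half a
minute and the second at roughly a fifth of a second.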

> TCP is what SIP/SDP has selected (there's another protocol at that layer
> too that's been drafted for use). Their model of getting around NATs and
> firewalls is also a lot like proxy65 in that a third party server is used.

SIP is not a protocol for audio transfer, but for session establishment.
It is the protocol that calls the other side and reports back to the
originator that the called party's phone is ringing. This does not need
to be realtime, that's true. But even so, SIP is specified to run over
either TCP or UDP, and if you check out the hardware phones you can buy,
you will see that they typically use SIP over UDP.
SDP is the Session Description Protocol. It is used as a payload of SIP
to signal which codecs the sender supports and to send back which codec
the receiver accepted. In the SDP you can also select which protocol is
used to carry the audio traffic, but it will nearly always be RTP
(Real-time Transport Protocol), which in practice runs over UDP.
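
As a sketch of what that looks like on the wire, the following Python
snippet parses a made-up SDP offer and pulls out the media transport and
the offered codecs. The session values (user name, address, port) are
invented for illustration, and parse_audio_media is a hypothetical
helper of mine, not part of any SIP library.

```python
# A hand-written SDP offer; every value here is invented for illustration.
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

def parse_audio_media(sdp):
    """Return port, transport and codec map of the audio media section."""
    info = {"port": None, "transport": None, "codecs": {}}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # m=<media> <port> <transport> <payload types...>
            _, port, transport, *payload_types = line[2:].split()
            info["port"] = int(port)
            info["transport"] = transport
            for pt in payload_types:
                info["codecs"].setdefault(int(pt), None)
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][int(pt)] = encoding
    return info

offer = parse_audio_media(SDP_OFFER)
print(offer["transport"], offer["port"], offer["codecs"])
```

The "RTP/AVP" token on the m= line is where the offer names RTP as the
carrier for the audio.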

SIP phones use STUN (Simple Traversal of User Datagram Protocol
(UDP) through Network Address Translators (NATs)). STUN servers do not
work like proxy65 at all. A client just uses the STUN server to
detect the type of NAT it is behind and how that NAT assigns external
IP addresses and port numbers to it. With most NAT implementations you
get a fixed external IP and port if you send your packets from a fixed
source port, no matter where you send them to. So with the help of the
STUN server you can discover the address on which you can directly
receive UDP traffic from other hosts on the internet. The actual UDP
transfer of voice data happens directly between the two endpoints.
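
A minimal Python sketch of the address discovery step might look like
the code below. Note the assumptions: it uses the STUN message layout
with the fixed "magic cookie" from the later RFC 5389 revision, it
handles IPv4 only, it does no NAT type detection, and the function names
are my own. Treat it as an illustration, not a complete STUN client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442        # fixed value in the RFC 5389 header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001
ATTR_XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request():
    """Return (20-byte request, transaction id); the request has no attributes."""
    txn_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id

def parse_mapped_address(response, txn_id):
    """Return (ip, port) as seen by the server, or None if not found."""
    if len(response) < 20:
        return None
    msg_type, length, _cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or response[8:20] != txn_id:
        return None
    pos, end = 20, min(20 + length, len(response))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        is_address = attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS)
        if is_address and len(value) >= 8 and value[1] == 0x01:  # IPv4
            port, addr = struct.unpack("!HI", value[2:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    return None

def discover_public_address(server, local_port=0, timeout=2.0):
    """Ask a STUN server (host, port) how it sees our UDP source address."""
    request, txn_id = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", local_port))
        sock.settimeout(timeout)
        sock.sendto(request, server)
        response, _ = sock.recvfrom(2048)
    return parse_mapped_address(response, txn_id)
```

You would call something like discover_public_address(("stun.example.org",
3478), local_port=5004), where the host name is a placeholder for a
server you actually have access to. The point is that the phone would
then send its RTP from that same local port, so the mapping it learned
stays valid.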

It is also possible to relay the voice data through one of the SIP
servers that routed the call, but this is not intended for passing NAT;
it is meant to offer PBX-like features for which it might be necessary
that the "PBX" stays in the so-called "media path".

See you,
     Matthias