[JDEV] Videoconferencing with jabber / Re:[speex-dev]Videoconferencing with speex and jabber

Tijl Houtbeckers thoutbeckers at splendo.com
Tue Dec 2 07:57:55 CST 2003


On Tue, 2 Dec 2003 10:50:36 -0000, Richard Dobson <richard at dobson-i.net> 
wrote:

> I didn't intend to say that we should not have server-based
> conferencing at all; all I originally meant to object to was putting
> that into a client rather than just using a dedicated server, and in
> cases where you can, you should use p2p. Now I can see that p2p won't
> cover all cases, but I am also unconvinced server-based will cover all
> uses either, so what I suggested at the bottom of the email is a hybrid
> c/s/p2p mechanism that can take the best of both worlds, cover more
> situations than either one of them alone, and be more flexible.

I would suggest still using the c/s model as a basis for the spec, since 
you can use it as the basis for person-to-person over a direct link, 
person-to-person via a server, conferencing over direct links (with each 
"peer" acting as server, as described in an additional spec) and, of 
course, conferencing on a server.

However, it should also allow for an additional spec where there is more 
than one server hosting the conference.

There are different approaches you can take for this: you could create a 
persistent network of "public" "supernodes", but this of course brings 
its own security issues (since you can't encrypt if there has to be 
mixing on the supernodes), or you could create a network of "supernodes" 
for each conversation.

Ideally a client that only implements the "base" c/s spec should be able 
to work transparently with such an "upgraded" network. For example, I can 
imagine the "clients" will have to disco the "server" for which server 
address to connect their audio stream to (whether this is a JID or IP I 
wouldn't know yet). The server could also provide them with fallback 
servers (which could of course be updated during the conference as new 
"clients" connect that are able/willing to take the role of such a 
"supernode"). This distributes bandwidth and CPU load amongst servers. 
The spec would mostly focus on how "supernodes" regulate the network 
amongst themselves (including things like in-sync mixing).
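To make the client side of that idea concrete, here is a minimal sketch 
of the discovery-plus-fallback logic. It assumes a hypothetical disco 
answer shaped as a dict with "primary" and "fallbacks" keys; no such 
stanza format exists in any spec, it is purely illustrative of the 
"server hands the client a stream host plus backups" flow described 
above.

```python
def pick_stream_host(disco_result):
    """Split a hypothetical disco answer into (primary, fallbacks).

    disco_result is assumed to look like:
    {"primary": "node1.example.org:7078",
     "fallbacks": ["node2.example.org:7078", "node3.example.org:7078"]}
    """
    primary = disco_result["primary"]
    fallbacks = list(disco_result.get("fallbacks", []))
    return primary, fallbacks


def connect_with_fallback(disco_result, try_connect):
    # try_connect(host) -> bool; walk the list until one host accepts us,
    # so a dying supernode transparently fails over to the next one.
    primary, fallbacks = pick_stream_host(disco_result)
    for host in [primary] + fallbacks:
        if try_connect(host):
            return host
    raise ConnectionError("no supernode reachable")
```

A client implementing only the "base" c/s spec would simply always get a 
one-entry list, which is what makes the upgrade transparent.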

However, I think there are still a lot of cases where at least I myself 
would just prefer to host everything on my own connection (if that's the 
fastest one available). It's also important that person-to-person 
conversations over direct links work. This still leaves the problem of 
NAT traversal, which is really what Skype is all about.

>> Skype uses UDP NAT traversal based on getting its IP from someone
>> outside the NAT (at least, so it was suggested either here or on
>> SJIG), which is currently being rejected by the jabber server folks
>
> I was under the impression it was asking your server for the IP
> address that they are objecting to, not other methods of obtaining it.

What I heard suggested was UPnP and SOCKS settings. I'm just pointing out 
the difference between that approach (relying on the client to figure it 
out) and Skype's (whatever way will work, use it; and the jabber server 
telling you your address will cover a lot of cases to start with).

>> SOCKS5 is hardly integrated with an existing mechanism
>
> Well it is using an existing protocol rather than baking our own,

As I understand it, it basically just uses it for the connection 
initiation part so it can pass some meta-data.

> I was suggesting that maybe we could do the same in this case and see
> if there is an existing protocol/system that we could integrate with
> rather than possibly duplicating effort, saving us time.

I already named JXTA a few times. Though I have to admit it's probably 
been years since I looked at it, it will probably support opening binary 
streams over its decentralized network (whether it supports UDP I don't 
know).

But I think all existing solutions will depend on running a decentralized 
peer network independent of Jabber. Whether this is something we want, I 
don't know :)


>> Well, I agree that, just like with the bandwidth requirements,
>> demands on the server will be higher than on a node in a direct-link
>> conference. Just not THAT much higher
>
> Sorry, but it is much higher because of the fact that in p2p clients
> do not need to do any re-encoding at all, plus the fact that you need
> to individually mix the streams before re-encoding them to send out.
> This IS much more of a requirement, since using p2p you don't
> necessarily even have to do the mixing step, let alone the re-encoding
> step (which will be the most CPU-intensive part). Plus, remember that
> to prevent echoing you really need to mix/encode each of the outgoing
> streams individually.

Unless you use client-side echo removal (which of course will put some 
extra burden on the client, which is indeed what I'm trying to avoid all 
along, but it's still a compromise). But agreed, this is one of the main 
advantages of using a direct-link based conference. Again, you'll benefit 
the most (and the disadvantages will be the least obvious) if all clients 
have somewhat equal specs, while in most cases I doubt this is true for 
Joe Consumer.
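The per-recipient mixing being discussed here can be sketched in a few 
lines: to stop a participant hearing their own voice echoed back, the 
server builds, for each recipient, a mix of every incoming stream except 
that recipient's own, and only then re-encodes it. This is an 
illustrative sketch, not anyone's actual implementation; samples are 
plain integer lists, where a real mixer would work on decoded PCM frames 
and would have to clip/normalise the sums.

```python
def mix_for_recipient(frames_by_sender, recipient):
    """Sum one frame from every sender except the recipient themselves."""
    others = [f for s, f in frames_by_sender.items() if s != recipient]
    if not others:
        return []
    length = len(others[0])
    # Naive additive mix; real code would also clamp to the sample range.
    return [sum(f[i] for f in others) for i in range(length)]


frames = {"alice": [1, 2], "bob": [10, 20], "carol": [100, 200]}
# What bob hears: alice + carol, but never his own frame back.
mix_for_recipient(frames, "bob")  # [101, 202]
```

Doing this N times (once per participant), followed by N separate 
encodes, is exactly why the server's CPU load grows so much faster than a 
p2p node's.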

>>
>> Of course you still have to mix when you use DirectX ;)
>
> Playing two (or more) streams simultaneously is not mixing as far as I
> am concerned, so no, you don't necessarily even have to mix.

Of course the streams are mixed, just by DirectX/ALSA/whatever instead of 
by you. If your soundcard has DirectX-compatible hardware for this, the 
soundcard can do the mixing. (Which is still true in many cases if you 
use a server that uses DirectX for mixing too.)

>> Servers can use existing technology too of course.. Servers
>> (components) specializing in hosting this kind of thing for companies
>> or paying customers could even use DSP hardware and such.
>
> Sure but using hardware such as that will be out of reach for the vast
> majority of people.

The vast majority of people won't need to mix several thousand streams at 
once either ;)

> I doubt it would be very clean, and it would be pretty hard to do; the
> only way I can see it really working is by individually encoding each
> outgoing stream on the server.

Then we both don't know ;) But most implementations probably won't be so 
advanced, if this is even possible (and you made a good point about 
re-encoding, which I more seriously doubt you can optimize much).

> Ok, but isn't that really an issue with you consuming so much of your
> outgoing bandwidth that the TCP replies are having a hard job getting
> back, which slows down the particular TCP socket? I would expect that
> we would use a UDP transport like RTP, which wouldn't have this problem
> as there are no replies to return in UDP; the packets will just get
> dropped if there is a problem and the audio transport should be able
> to transparently handle this easily.

No, packets will just be sent out slower, which means latency will 
increase. If you want to send 100 bytes and you have 20 bytes/sec 
available, it will take 5 sec. If you have 10 bytes/sec, it will take 
10 sec. Which means your latency just increased by 5 seconds, whether you 
use UDP or TCP. UDP is more effective for audio (especially if you're 
willing to tolerate lost packets), since you can build a more effective 
stream control mechanism (one is already built in for TCP). Especially in 
the case of IP packet loss on the connection this will indeed help 
decrease latency a lot, but it doesn't change the fact that as available 
bandwidth shrinks, latency will jump.. a lot!
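The arithmetic above is just payload divided by available bandwidth, and 
it holds regardless of transport; a trivial sketch:

```python
def send_delay(payload_bytes, bytes_per_sec):
    """Seconds needed to push a payload through the available bandwidth.

    Transport-agnostic: UDP avoids TCP's retransmit/ack overhead, but it
    cannot make the pipe wider, so this delay applies either way.
    """
    return payload_bytes / bytes_per_sec


send_delay(100, 20)  # 5.0 seconds
send_delay(100, 10)  # 10.0 seconds: halving bandwidth doubles the delay
```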

>> This problem doesn't occur when you make long distance
>> phonecalls..??? How could it? It doesn't even happen in a long
>> distance *conference* call!
>
> Because you will hear yourself echo from the other end; that is what I
> thought you meant. But overall I doubt syncing is something we really
> need to concentrate on or worry about too much.

Out-of-sync mixing is *the* biggest annoyance about direct-link based 
conferencing, if you ask me. Especially when participants have severely 
different connections; I find this unbearable to work with.

> If you think supernodes are good then you must like my compromise
> solution below, since in the supernode model each supernode
> communicates p2p with other supernodes, and clients talk to the
> supernodes.

True, and I think the idea is interesting. But I still think in most 
cases one "supernode" will do (the host), and in many cases it won't be a 
problem to find one (broadband with a somewhat recent budget CPU will do, 
I think).

>> In a previous email I already briefly touched on the subject, and
>> some in this email. I definitely think most of this could be handled
>> in the SI layer though (with a little cheating); a c/s based spec
>> will not rule this out at all.
>
> I'm not sure how my solution above can be handled in the SI layer,

I wasn't exactly clear that I was talking about bandwidth here (where you 
can use some kind of peer network that you connect to with SI, which acts 
as a proxy/multicaster for your stream in a direct-link conference). 
Since then you've pointed out CPU requirements might be a bigger issue 
than I assumed at first.

> it's a modified version of your model where instead of having a single
> server you could have as many as possible, which communicate with each
> other via p2p, but anyone who is incapable, due to platform issues,
> bandwidth, CPU, firewall etc., of acting as a server itself connects to
> one of these servers. This provides the benefit of more evenly load
> balancing the CPU/bandwidth use of the chat over several nodes rather
> than concentrating it on one, provides instant fallback to another
> server if one has problems or leaves, and provides your primary
> benefit of being able to support dialup users, simple clients like
> Pocket PCs, or people who can't go p2p because of firewalls.

I think my idea as described nearer the beginning of this email and yours 
here are pretty much alike. Are you suggesting a persistent network of 
these servers (which is p2p, I suppose) or a per-conference network 
(which I'd rather just call clustering of the servers)?

Do you feel such a system should be part of the "base" spec for (audio) 
conferencing or an extension? And what do you feel should be done for NAT 
traversal in person-to-person? In case you suggest a persistent network 
of these nodes, is that what should be used for that?



More information about the JDev mailing list