[JDEV] Videoconferencing with jabber / Re: [speex-dev] Videoconferencing with speex and jabber

Richard Dobson richard at dobson-i.net
Tue Dec 2 04:50:36 CST 2003


> More advanced clients are likely to also implement a server that supports
> hosting a conference with more than 2 people. Or they'll implement a
> direct link conferencing extension (still based on the same protocol
> of course). Those two are complementary, not competitive. But as pointed
> out by you, direct link person to person is definitely needed most, and
> as pointed out by others, server based conferencing is needed most too.
>
> That doesn't mean there are no use cases left for direct link based
> conferencing, but IMHO not enough to justify a spec that will miss out on
> server based conferencing when you can get that practically for free, and
> will complicate the spec and raise the requirements for conferencing.
> Again, it's not impossible.

I didn't intend to say that we should not have server based conferencing at
all; all I originally meant to object to was putting that into a client
rather than just using a dedicated server, and to say that in cases where
you can, you should use p2p. Now I can see that p2p won't cover all cases,
but I am also unconvinced that server based will cover all uses either, so
what I suggested at the bottom of the email is a hybrid c/s and p2p
mechanism that can take the best of both worlds, cover more situations than
either of them alone, and be more flexible.

> Skype uses UDP NAT traversal based on getting its IP from someone outside
> the NAT (at least, so it was suggested either here or on SJIG), which is
> currently being rejected by the jabber server folks

I was under the impression it was asking your server for the IP address
that they are objecting to, not other methods of obtaining it.
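
For what it's worth, the technique as I understand it is simply asking a
host outside the NAT what source address it observed. Here is a minimal
sketch of that idea, assuming a hypothetical UDP "reflector" that answers a
probe with the translated "ip:port" it saw, as plain text (this is not any
existing Jabber or STUN mechanism, just an illustration):

/* Hypothetical reflector-based address discovery: send a UDP probe to a
 * publicly reachable host and read back the address it saw us come from.
 * The reflector and its reply format are assumptions for illustration. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int discover_public_address(const char *reflector_ip, int reflector_port,
                            char *reply, size_t reply_len)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    ssize_t n;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(reflector_port);
    inet_pton(AF_INET, reflector_ip, &dst.sin_addr);

    /* The outgoing probe also punches a hole in the NAT, so the reply
     * can make it back through the same mapping. */
    sendto(sock, "probe", 5, 0, (struct sockaddr *)&dst, sizeof dst);
    n = recvfrom(sock, reply, reply_len - 1, 0, NULL, NULL);
    close(sock);

    if (n < 0)
        return -1;
    reply[n] = '\0';   /* e.g. "203.0.113.7:31337" */
    return 0;
}

The point is just that nothing in this requires the Jabber server itself to
hand out the address; any cooperating host outside the NAT would do.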

> SOCKS5 is hardly integrated with an existing mechanism, it just uses part
> of the same spec. Using SI you can integrate other solutions, almost
> transparently, and fall back on others if they don't work. That doesn't
> eliminate the need for a spec for setting up these things, and I see no
> good reason not to use a c/s architecture there.

Well, it is using an existing protocol rather than rolling our own. I was
suggesting that maybe we could do the same in this case and see if there is
an existing protocol/system that we could integrate with, rather than
possibly duplicating effort, which would save us time.

> Agreed, when creating such a spec based on c/s, attention should be paid
> to allowing a direct-link conference style solution from the start. For
> that matter, it should also allow for things such as distributed hosting
> of a conference (a sort of hybrid between direct links and c/s) or any
> other things people can come up with. It should just be as generic as
> possible.

Good

> > Yes, but the server has to do more than simply mix the streams; it also
> > has to re-encode the mixed streams. Also, if you want to remove echoes
> > as you suggest below, or be able to ignore participants as someone has
> > already suggested would be useful functionality, you need to re-mix and
> > re-encode all outgoing streams individually, which I would expect to be
> > quite a CPU drain. But in p2p mode, clients using available technologies
> > (DirectX or the equivalent) don't even need to mix the streams, as you
> > can play simultaneous WAVE streams at the same time; also, the client
> > doesn't need to re-encode the stream to send out again.
>
> Well, I agree that, just like with the bandwidth requirements, demands on
> the server will be higher than on a node in a direct link conference. Just
> not THAT much higher

Sorry, but it is much higher, because in p2p the clients do not need to do
any re-encoding at all, and because you need to individually mix the
streams before re-encoding them to send out. This IS much more of a
requirement, since using p2p you don't necessarily even have to do the
mixing step, let alone the re-encoding step (which will be the most CPU
intensive part). Plus, remember that to prevent echoing you really need to
mix/encode each of the outgoing streams individually.
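
To make the cost concrete, here is a rough sketch of what a mixing server
has to do on every frame tick in an N-party Speex conference (narrowband
assumed, one frame per participant per tick, jitter buffering and packet
parsing omitted, and the encoders/decoders/bits assumed to be already set
up with speex_encoder_init() / speex_decoder_init() / speex_bits_init()).
The second loop is the point: one full Speex encode per listener per frame,
which a p2p client never has to do at all.

/* Per-listener mixing and re-encoding on a conference server (sketch). */
#include <speex/speex.h>

#define MAX_PARTY  8
#define MAX_PACKET 200

typedef struct {
    void     *dec, *enc;            /* one decoder and one encoder per party */
    SpeexBits dbits, ebits;
    char      in_pkt[MAX_PACKET];   /* latest compressed frame from this party */
    int       in_len;
    char      out_pkt[MAX_PACKET];  /* personalised mix to send back */
    int       out_len;
} party_t;

static void mix_tick(party_t *p, int n, int frame_size) /* 160 for narrowband */
{
    spx_int16_t pcm[MAX_PARTY][640];
    spx_int16_t mix[640];
    int i, j, s;

    /* 1. Decode every incoming stream once: N decodes per tick. */
    for (i = 0; i < n; i++) {
        speex_bits_read_from(&p[i].dbits, p[i].in_pkt, p[i].in_len);
        speex_decode_int(p[i].dec, &p[i].dbits, pcm[i]);
    }

    /* 2. For every listener, mix everyone *except* themselves (that is what
     *    suppresses the echo) and re-encode the result: N encodes per tick. */
    for (i = 0; i < n; i++) {
        for (s = 0; s < frame_size; s++) {
            int acc = 0;
            for (j = 0; j < n; j++)
                if (j != i)
                    acc += pcm[j][s];
            if (acc >  32767) acc =  32767;     /* clip to 16-bit range */
            if (acc < -32768) acc = -32768;
            mix[s] = (spx_int16_t)acc;
        }
        speex_bits_reset(&p[i].ebits);
        speex_encode_int(p[i].enc, mix, &p[i].ebits);
        p[i].out_len = speex_bits_write(&p[i].ebits, p[i].out_pkt, MAX_PACKET);
    }
}

A p2p client, by contrast, encodes its own microphone once and decodes N-1
incoming streams; it never re-encodes anyone else's audio.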

>, unless you want some more advanced features. There are
> always trade-offs between the two solutions, and at times you could prefer
> yours over the other. But the point I'm making is that we can have *all*
> of them, relatively simply with a c/s based architecture, even if a p2p
> spec might be just a *little* easier to work with in your case, or at
> least sound more logical when reading the spec.
>
> Of course you still have to mix when you use DirectX ;)

Playing two (or more) streams simultaneously is not mixing as far as I am
concerned, so no, you don't necessarily have to mix at all.

> Servers can use
> existing technology too, of course. Servers (components) specializing in
> hosting this kind of thing for companies or paying customers could even
> use DSP hardware and such.

Sure, but hardware like that will be out of reach for the vast majority of
people.

> > Not sure how you would suppress the echo of what someone said without
> > re-encoding the streams individually to exclude that person on their own
> > incoming listening stream.
>
> Well, aside from the fact that you can suppress it client side... (which
> would raise the requirements for our poor pocketPC clients a little too
> much) I'm not an expert on audio technology, but I'd imagine there are
> some heavy optimizations possible when making different mixes based on
> the same streams? I could be wrong of course..

I doubt it would be very clean, and it would be pretty hard to do; the only
way I can see it really working is by individually encoding each outgoing
stream on the server.
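
The one optimization I can think of (purely a sketch, not something from
any existing spec) is that the per-listener mixes share almost all of their
arithmetic: you can accumulate everyone into one running total and then
produce each listener's frame as "total minus their own contribution". That
saves most of the mixing work, but the per-listener Speex encodes, which
are the expensive part, still remain.

/* Shared-mix trick: one accumulation pass, then total-minus-self per
 * listener.  Each of the n output frames still has to be Speex-encoded
 * separately afterwards. */
#include <speex/speex.h>

void build_mixes(spx_int16_t pcm[][160], spx_int16_t out[][160],
                 int n, int frame_size)
{
    int i, s;
    for (s = 0; s < frame_size; s++) {
        int total = 0;
        for (i = 0; i < n; i++)
            total += pcm[i][s];
        for (i = 0; i < n; i++) {
            int v = total - pcm[i][s];   /* exclude the listener's own voice */
            if (v >  32767) v =  32767;  /* clip to 16-bit range */
            if (v < -32768) v = -32768;
            out[i][s] = (spx_int16_t)v;
        }
    }
}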

> >> Latency is an interesting case, but in practice the results would
> >> probably surprise you. Because on low-bandwidth nodes the bandwidth
> >> requirements dramatically drop when they act as a client rather than a
> >> node in the direct link conference, latency will actually improve in a
> >> lot of cases!
> >
> > That's good, but do you have any real evidence of this?
>
> I assume you have no problems with the idea that latency is lower on
> low-bandwidth connections when the bandwidth used is lower too? If not..
> just play an online game, then exit it, turn on some filesharing network,
> and play the game again ;) That's just simple maths!
>
> Even on my old "broadband" connection, where I had 15 KB/s upstream
> available, latency would jump from about 25-40ms to 50-400ms if I used
> only 10KB/s of it for different purposes.
>
> Gaming provides another example.. in the old days when I played Quake, it
> would be a lot faster for me to play on my ISP's server with someone than
> for either of us to host the server (latency would be higher and less
> reliable there). Experience using the old ICQ protocol gave me the same
> idea, even though the amounts of data are *very* limited there.
>
> If latency is your main point for choosing direct link conferencing, I'd
> be very careful if I were you, because the result might disappoint in
> many cases.

OK, but isn't that really an issue of consuming so much of your outgoing
bandwidth that the TCP replies have a hard job getting back, which slows
down that particular TCP socket? I would expect that we would use a UDP
transport like RTP, which wouldn't have this problem, as there are no
replies to return in UDP; the packets will just get dropped if there is a
problem, and the audio transport should be able to handle this
transparently and easily.
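
For illustration, here is a minimal sketch of pushing a Speex frame out as
RTP over UDP (fixed 12-byte RFC 3550 header; the payload type 96 is an
assumed dynamic value that would really be agreed during session setup, and
there is no jitter buffer or RTCP here). There is no acknowledgement
anywhere in this path, so a lost packet just gets concealed by the decoder
instead of backing the stream up the way a stalled TCP window does.

/* Send one compressed Speex frame as an RTP packet over UDP (sketch). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct rtp_header {           /* fixed 12-byte header from RFC 3550 */
    uint8_t  vpxcc;           /* version(2) | padding | extension | CSRC count */
    uint8_t  mpt;             /* marker bit | payload type */
    uint16_t seq;
    uint32_t timestamp;
    uint32_t ssrc;
};

int send_speex_frame(int sock, const struct sockaddr_in *dst,
                     const char *frame, int frame_len,
                     uint16_t seq, uint32_t ts, uint32_t ssrc)
{
    char pkt[12 + 1500];
    struct rtp_header h;

    if (frame_len > 1500)
        return -1;

    h.vpxcc     = 0x80;       /* version 2, no padding/extension/CSRCs */
    h.mpt       = 96;         /* assumed dynamic payload type for Speex */
    h.seq       = htons(seq);
    h.timestamp = htonl(ts);
    h.ssrc      = htonl(ssrc);

    memcpy(pkt, &h, sizeof h);
    memcpy(pkt + sizeof h, frame, frame_len);

    /* Fire and forget: if the network drops it, there is no retransmission
     * to stall behind, the receiver just conceals the missing frame. */
    return sendto(sock, pkt, sizeof h + frame_len, 0,
                  (const struct sockaddr *)dst, sizeof *dst);
}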

> >> Now let's talk about out-of-sync mixing. With direct-link based
> >> conferences every client will produce a different "mix" based on the
> >> latency / bandwidth of their connections, and that of the other nodes.
> >> This means when we're in a meeting, for me it can sound like 3 people
> >> were talking at once, while for you it can sound like they didn't at
> >> all. (That means I didn't hear what they said and I'll ask them to
> >> repeat it, while you'll be annoyed with me (even more ;) because for
> >> you it sounded like I could have heard perfectly.)
> >
> > Sure, that could be a problem, but it's a problem people will be used
> > to if they have ever made long distance phone calls; this sort of thing
> > is the least of our worries IMO.
>
> This problem doesn't occur when you make long distance phone calls..? How
> could it? It doesn't even happen in a long distance *conference* call!

Because you will hear yourself echo from the other end; that is what I
thought you meant. But overall I doubt syncing is something we really need
to concentrate on or worry about too much.

> That's not anything like what I am proposing. To start with, practically
> all person to person communication would be over direct links. Secondly,
> conferences would not be held on some gigantic server, rather there will
> be small clusters spread all over the place.
>
> As you might know, many p2p networks have made this same change, relying
> more on the stronger, better clients and letting them take on some roles
> that traditionally were meant for servers: peer caches, supernodes etc.

If you think supernodes are good then you must like my compromise solution
below, since in the supernode model each supernode communicates p2p with
other supernodes, and clients talk to the supernodes.

> > Maybe what we actually need, to solve the low bandwidth problem of dial
> > up users and the reliability problem of having a single point of
> > failure, is to have a hybrid client/server and p2p system where the
> > people with sufficient bandwidth run as both servers and p2p between
> > each other (like the idea of a supernode) and the low bandwidth users
> > connect to one of those servers. It solves the low bandwidth user
> > problem, and the reliability problem by having multiple servers users
> > can switch to if one goes down, and also the CPU usage problem by not
> > having too many people all connected to one server.
>
> In a previous email I already briefly touched on this subject, and some
> more in this email. I definitely think most of this could be handled in
> the SI layer though (with a little cheating); a c/s based spec will not
> rule this out at all.

I'm not sure how my solution above can be handled in the SI layer. It is a
modified version of your model where, instead of having a single server,
you could have as many as possible which communicate with each other via
p2p, while anyone who is incapable of acting as a server themselves (due to
platform issues, bandwidth, CPU, firewall etc.) connects to one of these
servers (a rough sketch of the relay rule follows below). This provides the
benefit of more evenly load balancing the CPU/bandwidth use of the chat
over several nodes rather than concentrating it on one, provides instant
fallback to another server if one has problems or leaves, and provides your
primary benefit of being able to support dialup users, simple clients like
Pocket PCs, and people who can't go p2p because of firewalls.
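
To make the relay rule concrete, here is a rough sketch (the structures and
names are purely illustrative, not from any existing spec): a frame from
one of a supernode's own clients is fanned out to its other local clients
and to its peer supernodes, while a frame that arrives from a peer
supernode only goes to local clients, so nothing is ever relayed twice and
no loops can form.

/* Supernode fan-out rule for the hybrid c/s + p2p model (illustrative). */
#include <stdio.h>
#include <string.h>

#define MAX_CLIENTS 16
#define MAX_PEERS   8

typedef struct {
    const char *clients[MAX_CLIENTS];  /* low-bandwidth / firewalled users */
    int         n_clients;
    const char *peers[MAX_PEERS];      /* other supernodes, reached p2p */
    int         n_peers;
} supernode;

/* from_peer: 1 if the frame arrived from another supernode,
 *            0 if it came from one of our own clients. */
static void relay_frame(const supernode *sn, const char *origin, int from_peer)
{
    int i;
    for (i = 0; i < sn->n_clients; i++)
        if (strcmp(sn->clients[i], origin) != 0) /* never echo back to sender */
            printf("  send to local client %s\n", sn->clients[i]);

    if (!from_peer)                              /* only the first hop fans out */
        for (i = 0; i < sn->n_peers; i++)
            printf("  send to peer supernode %s\n", sn->peers[i]);
}

int main(void)
{
    supernode a = { { "alice", "bob" }, 2, { "supernode-b" }, 1 };

    printf("frame from alice (a local client):\n");
    relay_frame(&a, "alice", 0);

    printf("frame arriving from supernode-b:\n");
    relay_frame(&a, "supernode-b", 1);
    return 0;
}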

Richard



