[JDEV] Videoconferencing with jabber / Re:[speex-dev]Videoconferencing with speex and jabber

Mon Dec 1 04:55:20 CST 2003

> Having one user assume the role of server, and one that of client, is really
> no harder than a model in which you assume both are equal peers. It's simply a
> matter of different roles. If you can think of any reason why this is not
> true, please share it with the rest of us!

I don't dispute that it is no harder (for one-to-one); my point is simply
that using a client/server model when a p2p model is more appropriate can,
IMO, create more problems than it solves.

> However, using a client/server model will allow you to participate in a
> conference on a server with more people *with no extra effort at all*. Yet
> you still state you don't believe it will be easier?

Yes, it is easier to implement because you don't need the extra p2p code, but
IMO it's not really that much more work to implement p2p, as you will already
have a large amount of the necessary code in place once you have created a
client with a built-in server.

> What I *am* saying, that an entirely p2p based conferencing model (with
> more than 2 persons involved) is a lot more complex than a client/server
> model. Even more so, if you only have to implement the client portion.
> That's why this allows "thin" clients to still participate. It was you
> yourself who argued against mixing and bandwidth req. on thin clients such
> as a pocket PC.

Yes, if you only implement the client portion, it will be a lot more work to
add server or p2p support. But if everyone does that (to save time and
effort), your proposed system will fall apart because there will be no
servers for people to connect to.

As Mats Bengtsson suggests, I think you should take a look at
http://www.skype.com/skype_p2pexplained.html. Their solution looks rather
good (although it goes further than I have been suggesting). Maybe what we
really need to do, rather than concocting our own solution, is defer to the
even greater experience of someone else and just try to integrate with an
existing mechanism, just like we did with SOCKS5 for the bytestreams
mechanism.

> I think from the discussion it's pretty obvious what's needed/wanted most
> are 2 things:
> - person to person over a direct link
> - conferencing with multiple persons on a server

As you realise, I don't think you need to use a server to talk with a small
group of people.

> Both can be handled, without overlap, with a simple JEP based on a
> c/s model. P2P won't cover this, nor will it be any simpler.

Sorry, but p2p can handle it, as I have clearly shown. It won't be any
simpler, but IMO it's not much harder if you already have client/server code
in place, and it is far more reliable.

> Conferencing over individual direct links between persons is interesting
> too, but too complex to be included in the basic JEP if you ask me.

As you know, I don't think it's really all that much harder.

> Conferencing over direct links doesn't have to be p2p either. You can base
> it on the c/s JEP with every individual participant acting as a server.
> Not that much more complex than doing this on a p2p based model.

But that is p2p, is it not?

> With conferencing the requirement of a (fast enough) server is way more
> reasonable than for a person to person conversation (I completely agree
> with you there, a direct link should be used when possible!).

Good good.

> However, by
> going with a c2s model you'll still provide a fallback method for when a
> direct link fails, by using a component that hosts a conference.

Sure, and if you read my emails, I never said we shouldn't have a server-based
mode as a fallback. In fact, I proposed a hybrid client/server and p2p
structure.

> The total amount of bandwidth used in a c/s conference is always smaller
> than in a conference based on direct links between all participants. For
> obvious reasons of course, which I don't need to explain here.
> Another difference is that with c/s you'll need very little bandwidth on all
> machines, except for the server.

Sure, the total bandwidth will be less, but IMO that is irrelevant because it
does not really impact the clients themselves.
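
As a rough illustration of the figures being argued over, here is a minimal
sketch that counts audio streams per machine and in total for the two
layouts. It rests on assumptions that are not from this thread: every
participant sends one stream continuously (no silence detection, which would
lower the direct-link numbers in practice), the c/s host is itself one of the
participants, and the host sends a single mixed stream to each client. The
function names are made up for the example.

```python
def mesh_load(n):
    """Direct links: every participant sends to and receives from n - 1 peers."""
    return {
        "per_node_up": n - 1,
        "per_node_down": n - 1,
        "total": n * (n - 1),  # directed streams in the whole conference
    }


def star_load(n):
    """Client/server: one participant hosts, the other n - 1 are plain clients."""
    return {
        "client_up": 1,        # each client sends only its own voice
        "client_down": 1,      # and receives one mixed stream
        "host_up": n - 1,      # host sends one mix per client
        "host_down": n - 1,    # host receives one stream per client
        "total": 2 * (n - 1),
    }


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, mesh_load(n), star_load(n))
```

Under these assumptions the totals only diverge above two participants, and
the per-machine load in the c/s case is concentrated on whichever participant
hosts.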

> So let's apply this to some real world situations. In how many cases do
> all the clients have about the same available bandwidth, CPU, etc.? With Joe
> Consumer this is unlikely; it's a mix of dialup and broadband users. If
> I'd want to talk to my mother, sister and brother at the same time, I have
> a 1 mbit link, 1 will have a cheap DSL account, and the other 2 will be on
> dialup most likely.

I can see that this is a problem on dialup, but as I detail below, working
out the correct machine to run the server from (available bandwidth, CPU
speed, etc.) can be complex. This really needs to be automatic, or we will
make it so much harder for normal users that they might well not bother and
continue using MSN etc. instead. We must make sure we offer something that is
at least as easy to use as MSN Messenger and the like. So whichever way we
go, be it client/server, p2p or both, all of that needs to be hidden from the
user; all they should need to do is select the people they wish to chat to
and click "chat".

> Again I don't think direct-link style conferencing is uninteresting or
> unneeded, but it's a much more specific application than c/s conferencing.
> And *again*, a c/s style approach will not prevent this from being an
> extension.

Good, but once we have a client/server system in clients we will have 90% of
the code needed to implement p2p. IMO it would be a mistake not to consider
from day one how to include p2p in the protocol we create; otherwise, when we
extend it later, we could end up with either a messy protocol or a lot of
duplicated effort.

> > also I would dispute that it would be 10 times as much
> > bandwidth for the rest, adding silence detection (which you seem to have
> > oddly put aside and ignored) reduces the p2p bandwidth use massively,
>
> Hopefully I addressed this now to your liking.

Yes, thank you.

> > also
> > as I have shown previously the mixing requirements are less on p2p
> > clients
> > than on the "server client".
>
> And how's that? When 4 people talk at once, *all* clients will have to mix
> 4 streams in the case of direct links. In the case of c/s only the server
> will have to mix 4 streams. Explain..

Yes, but the server has to do more than simply mix the streams; it also has
to re-encode the mixed streams. And if you want to remove echoes as you
suggest below, or to be able to ignore participants (which someone has already
suggested as useful functionality), you need to re-mix and re-encode every
outgoing stream individually, which I expect would be quite a CPU drain. In
p2p mode, clients using available technologies (DirectX or the equivalent)
don't even need to mix the streams themselves, as they can play several WAVE
streams simultaneously, and the client doesn't need to re-encode anything to
send out again.

> (only thing I could think of is if you want to create a separate mix for
> each client, without their own channel in it to prevent echo. Rather than
> mixing new streams for each client you should just suppress echo for each
> client. Admittedly, it increases demands on the server if you want this,
> but not as bad as having to mix a new stream for each client)

I'm not sure how you would suppress the echo of what someone said without
re-encoding the streams individually, so that each person is excluded from
their own incoming stream.
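
To make the per-listener cost concrete, here is a minimal sketch of the
"mix without your own voice" idea on decoded samples. It is an illustration
under assumptions, not a proposal: samples are plain integers, all streams
are the same length and already time-aligned, clipping is ignored, and the
function names are invented for the example. The point it shows is that each
listener ends up with a different output, so a server doing this would have
one distinct stream per participant to encode.

```python
def mix(streams):
    """Sum equally long sample lists into one mixed stream."""
    return [sum(samples) for samples in zip(*streams)]


def mix_minus(inputs):
    """Return, for each participant, the mix of everyone except themselves.

    inputs maps a participant name to their decoded samples.
    """
    total = mix(list(inputs.values()))
    # Subtracting a participant's own samples from the full mix leaves only
    # the other voices; the result differs for every listener.
    return {
        name: [t - s for t, s in zip(total, own)]
        for name, own in inputs.items()
    }


if __name__ == "__main__":
    decoded = {"a": [1, 2], "b": [10, 20], "c": [100, 200]}
    for listener, samples in mix_minus(decoded).items():
        print(listener, samples)
```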

> Yes, when the server quits the conference the others will get booted. If
> this is a big issue for you, you could devise a fallback system to another
> server (one of the clients for example) and still have a massively less
> complex system than direct-link based conferencing. Since servers are most
> likely to be the best machines with the best connections this isn't such a
> big problem, but it's still easily solved if you want.

Good, this would have to be there for me to support it. The problem, though,
is that adding this sort of thing brings us even closer to the requirements of
just using a p2p system. We would also have to make it easy for normal users
to start chats, so the system needs to determine automatically which machine
in the group is best suited to be the server and set it up as such, without
the user needing to do that themselves. There is also a problem with falling
back in this situation: what if there is no machine left with enough
bandwidth etc. to maintain the chat? It will go down, which it shouldn't in
p2p, because all nodes require the same amount of bandwidth to maintain the
chat and it should keep going.
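
A minimal sketch of what such automatic selection might look like, including
the failure case where nobody can host. Everything here is hypothetical: the
"uplink_kbps" and "cpu_score" fields, the idea that each peer can report
them, and the rule that a host needs enough uplink for one outgoing stream
per other participant are assumptions made for the example, not part of any
JEP.

```python
def pick_host(peers, stream_kbps):
    """Pick the peer best suited to host, or None if no peer can cope.

    peers is a list of dicts with hypothetical keys "jid", "uplink_kbps"
    and "cpu_score"; stream_kbps is the assumed bitrate of one audio stream.
    """
    # Assumption: the host sends one stream to every other participant.
    needed = (len(peers) - 1) * stream_kbps
    capable = [p for p in peers if p["uplink_kbps"] >= needed]
    if not capable:
        return None  # the conference cannot continue in c/s mode
    # Prefer the fattest uplink, breaking ties on CPU.
    best = max(capable, key=lambda p: (p["uplink_kbps"], p["cpu_score"]))
    return best["jid"]


if __name__ == "__main__":
    peers = [
        {"jid": "a@example.org", "uplink_kbps": 33, "cpu_score": 1},
        {"jid": "b@example.org", "uplink_kbps": 128, "cpu_score": 2},
        {"jid": "c@example.org", "uplink_kbps": 33, "cpu_score": 3},
        {"jid": "d@example.org", "uplink_kbps": 33, "cpu_score": 1},
    ]
    print(pick_host(peers, stream_kbps=24))
    # If the only capable host leaves, the remaining dialup peers cannot host.
    remaining = [p for p in peers if p["jid"] != "b@example.org"]
    print(pick_host(remaining, stream_kbps=24))
```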

> When there are a few clients with bad connections in the conversation
> reliability will probably improve a bit too. Bad connection <-> Good
> connection <-> bad connection is generally more reliable than bad
> connection <-> bad connection. Especially when you consider bandwidth usage
> drops too.

Yup, but there is no real way to make sure the server is on a reliable
connection without user intervention, and we need to make it as easy as
possible, otherwise normal people would not know what to do.

> Latency is an interesting case, but in practice the results would probably
> surprise you. Because on low-bandwidth nodes the bandwidth requirements
> drop dramatically when they act as a client rather than a node in the
> direct link conference, latency will actually improve in a lot of
> cases!

That's good, but do you have any real evidence of this?

> So you can have the situation where a node in a direct-link
> conference with 3 persons talking is barely able to keep up, with horrible
> latency. While a client with the exact same quality connection is enjoying
> a conference where 6 people are talking with lower latency! (it wouldn't
> even be able to participate when 6 people are talking in a direct link
> conference).

You would have to have very low bandwidth not to be able to talk to those 6
people in p2p, though. But yes, that could be a problem. Even so, in the c/s
case one of the people still needs to be on a good connection.

> Now lets talk about out-of-sync mixing. With direct-link based conferences
> every client will produce a different "mix" based on the latency /
> bandwidth of their connections, and that of the other nodes. This means
> when we're in a meeting, for me it can sound like 3 people were talking at
> once, while for you it can sound like they didn't at all. (that means I
> didn't hear what they said and I'll ask them to repeat, while you'll be
> annoyed with me (even more ;) cause for you it sounded like I could have
> heard perfectly).

Sure, that could be a problem, but it's a problem people will be used to if
they have ever made long distance phone calls. This sort of thing is the
least of our worries IMO.

> Of course there is a solution for this, syncing the mixes between nodes.
> But then you lose all latency advantages, you'll be as slow as the
> "weakest link". (and the weakest link will be a lot more stressed than it
> would be in a c/s model). Of course compromises are possible..

Sure

> You'll always have problems with out of sync mixes if you don't do
> something about it, but there are cases where it's less likely to occur or
> just not so important. For example when it's only about a game anyway ;)
> and all clients have about the same bandwidth and CPU available.. :)

Sure

> I've presented many reasons for you. Maybe you don't agree with them (then
> I wonder what you think of Jabber and its client/server architecture),
> but I'd appreciate it if you do not refer to them as "strange".

Sorry about that, I was getting annoyed and tired, but thanks for finally
responding to my concerns. Also, just because I think p2p is better in this
case does not mean I think Jabber should be p2p; they are two entirely
different things. IM should be client/server IMO, because you need a reliable
central place from which to contact people, and a permanent contact point,
which requires a server to act for you. But p2p chats should not need a
server IMO, because they are short-lived sessions for which you will already
have located the other members of the chat by another means (your Jabber
session). Please bear in mind that client/server systems are not always the
best solution: just think, if the file sharing systems all went through
central servers, the bandwidth use would be unsustainable for the server
admins.

> Especially considering I didn't just make them up either, they are well
> known issues with audioconferencing (hardly "strange" issues), and if you'd
> have looked into it a little yourself you'd know that. (For example, I'm
> not on the speex list, but way at the beginning of the discussion someone
> already mentioned these things have been discussed to death there, I guess
> he didn't take the bait and I did ;)

There is, though, the fact that current audio chat systems are mostly p2p,
e.g. XBox Live, MSN Messenger, AIM, Yahoo Messenger, H.323, SIP. We need to
be careful not to dismiss all the research, development and reasoning that
went into these people's decision to go p2p.

Maybe what we actually need, to solve both the low-bandwidth problem of
dial-up users and the reliability problem of a single point of failure, is a
hybrid client/server and p2p system. The people with sufficient bandwidth run
as servers and talk p2p between each other (like the idea of a supernode),
and the low-bandwidth users connect to one of those servers. That solves the
low-bandwidth user problem; it solves the reliability problem by having
multiple servers that users can switch to if one goes down; and it solves the
CPU usage problem by not having too many people all connected to one server.
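
A minimal sketch of how such a hybrid layout might be assigned and repaired,
purely to illustrate the idea. The bandwidth threshold, the "fewest clients
first" balancing rule, the JIDs and both function names are assumptions made
for this example; nothing here is taken from Skype or from an existing JEP.

```python
def build_hybrid(peers, supernode_min_kbps):
    """Split peers into supernodes and the clients attached to each.

    peers maps a JID to its (hypothetical) available bandwidth in kbps.
    Returns {supernode_jid: [client_jid, ...]} or None if nobody qualifies.
    Supernodes are assumed to link p2p among themselves.
    """
    supernodes = sorted(j for j, kbps in peers.items() if kbps >= supernode_min_kbps)
    if not supernodes:
        return None
    layout = {s: [] for s in supernodes}
    for jid in sorted(j for j in peers if j not in layout):
        # Attach each low-bandwidth user to the least loaded supernode.
        target = min(supernodes, key=lambda s: len(layout[s]))
        layout[target].append(jid)
    return layout


def drop_supernode(layout, failed):
    """Move the clients of a failed supernode to the remaining ones."""
    orphans = layout.pop(failed)
    if not layout:
        return None  # no supernode left to fall back to
    for jid in orphans:
        target = min(sorted(layout), key=lambda s: len(layout[s]))
        layout[target].append(jid)
    return layout


if __name__ == "__main__":
    peers = {"u1@example.org": 33, "u2@example.org": 33, "u3@example.org": 33,
             "s1@example.org": 128, "s2@example.org": 1000}
    layout = build_hybrid(peers, supernode_min_kbps=100)
    print(layout)
    print(drop_supernode(layout, "s1@example.org"))
```

Spreading clients by count is the simplest possible rule; a real system
would presumably weigh bandwidth and CPU as well.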

Richard