[JDEV] Ramblings - feel free to join in :-)

Dennis Noordsij dennis.noordsij at wiral.com
Thu Dec 14 04:28:08 CST 2000


Hi,

I have had two things on my mind for a while and would like to take the 
opportunity to hear what other people think would and wouldn't work, or 
perhaps come up with a better idea or implementation.

The first one concerns bandwidth vs. horsepower. I think we can pretty safely 
assume that:
- in our own jabber server farm bandwidth is plentiful, and the only thing we 
are worried about is the raw power of our servers. Any worthwhile optimization 
is one that gets more messages routed in the same time, even if it costs a 
little more bandwidth between jabber components (think of the main JSM talking 
to transports and xdb databases, all on a small LAN).

- with regard to the "outside", i.e. users connecting via TCP/IP over the 
internet, we value bandwidth much more. It is all right if the client has to 
do a little more work, as long as it means fewer bytes are needed to get a 
message across.

How can we do this without touching the jabber server code at all, and the 
clients only minimally?

Why not bzip2 the XML stream? The client would simply pass the stream through 
a bzip2 function before sending it out over the socket, which would be quite 
easy to implement in clients. On the server side, since any serious setup will 
use jpolld multiplexing machines, only jpolld has to know about bzip2; by the 
time the XML reaches the jabber server it is plain-text XML again. Likewise, 
why not pass the stream through an SSL layer (with compression)? Once again it 
would make no difference on the client side, and on the server side the 
jpollds could be linked against an SSL library, making use of that hardware 
SSL acceleration board I see in every issue of Linux Journal :-) 
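
A minimal sketch of the per-connection compression that jpolld (or a local 
client-side proxy) would do, assuming Python and swapping zlib's deflate in 
for bzip2: Python's `bz2` compressor can only flush by ending the stream, 
whereas `zlib`'s `Z_SYNC_FLUSH` sends out everything buffered so far and keeps 
the stream open, so each chat message goes out immediately instead of waiting 
for a block to fill. `DeflateSender` and `DeflateReceiver` are hypothetical 
names; the sockets themselves are left out.

```python
import codecs
import zlib


class DeflateSender:
    """Compresses outgoing XML text on one long-lived deflate stream."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj()

    def pack(self, xml: str) -> bytes:
        data = self._comp.compress(xml.encode("utf-8"))
        # Z_SYNC_FLUSH pushes out everything buffered so far without ending
        # the stream, so the peer can decode this stanza right away while
        # later stanzas still reuse the history built from earlier ones.
        return data + self._comp.flush(zlib.Z_SYNC_FLUSH)


class DeflateReceiver:
    """Turns compressed bytes read from a socket back into XML text."""

    def __init__(self) -> None:
        self._decomp = zlib.decompressobj()
        # An incremental decoder copes with a UTF-8 character that is split
        # across two socket reads.
        self._text = codecs.getincrementaldecoder("utf-8")()

    def unpack(self, data: bytes) -> str:
        return self._text.decode(self._decomp.decompress(data))


sender, receiver = DeflateSender(), DeflateReceiver()
stanza = '<message to="dennis@jabber.com/work"><body>hi</body></message>'
print(receiver.unpack(sender.pack(stanza)) == stanza)
```

The jabber server behind jpolld never sees any of this; it only ever gets the 
plain XML that `unpack` returns.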

Even without SSL, bzip2ing the stream would help tremendously, as XML is 
basically text and compresses quite well. Only jpolld would have to be fitted 
with a bzip2 component (similar to xstream), and clients could even use a 
local proxy that does it for them. Wouldn't the bandwidth savings be 
substantial enough to warrant implementing this? This way we can keep using 
the original protocol instead of resorting to small proprietary binary tags, 
as someone else suggested, and so keep everything open.
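
To get a feel for the savings, the sketch below compares raw XML against one 
long-lived deflate stream flushed after every stanza, and against compressing 
each stanza separately. It makes the same zlib-for-bzip2 substitution, and the 
traffic is made up; real numbers will depend on real messages. The point it 
illustrates is that most of the gain comes from later stanzas referring back 
to tags and JIDs already sent, so the compression belongs on the stream rather 
than on individual messages.

```python
import zlib


def stream_vs_separate(stanzas):
    """Return (raw, one_stream, per_message) byte counts for a list of stanzas."""
    comp = zlib.compressobj()
    raw = one_stream = per_message = 0
    for stanza in stanzas:
        data = stanza.encode("utf-8")
        raw += len(data)
        # One long-lived stream, flushed after every stanza (what jpolld would send).
        one_stream += len(comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH))
        # Each stanza compressed on its own, for comparison.
        per_message += len(zlib.compress(data))
    return raw, one_stream, per_message


# Hypothetical chat traffic: the same tags and JIDs over and over.
sample = [
    '<message to="dennis@jabber.com/work" from="harry@jabber.com/school" '
    f'type="chat"><body>message number {i}</body></message>'
    for i in range(200)
]
raw, one_stream, per_message = stream_vs_separate(sample)
print(f"raw XML:            {raw} bytes")
print(f"one deflate stream: {one_stream} bytes")
print(f"per-message zlib:   {per_message} bytes")
```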



My second thought is about the scalability of the core of the jabber server. 
We can already farm out incoming connections to several jpolld multiplexers 
and database lookups to a farm of xdb caching lookuppers (yeah, that's really 
a word! :), but the central JSM for a domain (assuming, by the way, that I 
want all users in the same domain, i.e. user@mydomain.fi) is still limited by 
how fast that one machine can route packets, perhaps a few hundred per second? 
Forgive me if there already is a much more elegant solution than this :-) 
Here goes:

Our domain is jabber.com

Machine 1) internal name Apple
 Connected to Jpolld1-A and Jpolld1-B

Machine 2) internal name Orange
 Connected to Jpolld2-A and Jpolld2-B

User dennis@jabber.com logs in, and round-robin DNS puts him on Jpolld1-B.
My real JID is dennis@jabber.com/work.
Internally, Machine 1 also knows me as dennis@jpolld1-B    (already done today)
Machine 1 now propagates "dennis@jabber.com/work - dennis@apple/work" to all 
other machines (each machine is connected to every other machine).
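
The propagation step can be simulated in a few lines. This is an in-memory 
sketch, not jabberd code: `Farm` and `login` are hypothetical names, each 
table maps a full JID to an internal address, and the loop stands in for the 
machine-to-machine announcements.

```python
class Farm:
    """In-memory simulation of the session-propagation scheme (no networking)."""

    def __init__(self, *machines: str) -> None:
        # machine name -> {full JID: internal address}
        self.tables = {m: {} for m in machines}

    def login(self, jid: str, home: str, jpolld: str) -> None:
        """Record a new session on its home machine and announce it to every peer.

        Assumes a full JID (user@domain/resource).
        """
        user = jid.split("@", 1)[0]
        resource = jid.split("/", 1)[1]
        for machine, table in self.tables.items():
            if machine == home:
                # The home machine routes straight to the jpolld holding the socket.
                table[jid] = f"{user}@{jpolld}"
            else:
                # Peers only learn which machine owns the session, not the session data.
                table[jid] = f"{user}@{home}/{resource}"


farm = Farm("apple", "orange")
farm.login("dennis@jabber.com/work", home="apple", jpolld="jpolld1-B")
for machine, table in farm.tables.items():
    print(machine, table)
```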

Now every single machine in our farm has a hashtable entry that says 
"dennis@jabber.com/work - dennis@apple/work", except for Machine 1, which has 
"dennis@jabber.com/work - dennis@jpolld1-B".

Keeping every session on every machine sounds expensive, but the amount of 
memory needed to store one entry is so small that this would still work, AND 
we could dedicate the storing of these entries to a special machine, with each 
server machine simply fetching an entry from it and caching it for a while. 
Note that only this particular string is stored, NOT the actual session data; 
that is stored only on the "home" server, i.e. the one you actually connected 
to.
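
A sketch of the cache a server machine might keep in front of that dedicated 
directory machine, assuming a simple time-to-live: `DirectoryCache` and 
`fetch` are hypothetical, `fetch` stands in for the network lookup, and the 
30-second TTL is an arbitrary placeholder. The TTL also bounds how long a 
stale entry (say, after a user logs out) keeps being used, so it trades lookup 
traffic against freshness.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DirectoryCache:
    """Caches JID -> home-machine strings fetched from a directory machine.

    Only the routing string is cached; session data stays on the home server.
    """

    def __init__(self, fetch: Callable[[str], Optional[str]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical lookup on the directory machine
        self._ttl = ttl
        self._clock = clock      # injectable so the expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, jid: str) -> Optional[str]:
        now = self._clock()
        hit = self._entries.get(jid)
        if hit is not None and hit[1] > now:
            return hit[0]        # still fresh: no network round trip
        route = self._fetch(jid)
        if route is None:
            self._entries.pop(jid, None)   # unknown or logged out
        else:
            self._entries[jid] = (route, now + self._ttl)
        return route


directory = {"dennis@jabber.com/work": "dennis@apple/work"}
cache = DirectoryCache(directory.get)
print(cache.lookup("dennis@jabber.com/work"))
```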

Now Harry connects, and round robin puts him on Jpolld1-A.

Propagation takes place:
Machine 1)
 dennis@jabber.com/work - dennis@jpolld1-B
 harry@jabber.com/school - harry@jpolld1-A

Machine 2)
 dennis@jabber.com/work - dennis@apple/work
 harry@jabber.com/school - harry@apple/school

Harry sends a message to Dennis; Machine 1 looks in its hashtable, sees that 
the message has to be delivered to jpolld1-B, and does so.


Now Suzan connects, and round robin puts her on Jpolld2-B.

Propagation:
Machine 1)
 dennis@jabber.com/work - dennis@jpolld1-B
 harry@jabber.com/school - harry@jpolld1-A
 suzan@jabber.com/home - suzan@orange/home

Machine 2)
 dennis@jabber.com/work - dennis@apple/work
 harry@jabber.com/school - harry@apple/school
 suzan@jabber.com/home - suzan@jpolld2-B

Now Suzan sends a message to Dennis. It goes from Suzan's client via 
jpolld2-B to her machine, Orange.

Orange looks and sees that dennis@jabber.com has one session, which is 
dennis@apple/work. Machine 2 (Orange) now sends the message to Machine 1 
(Apple); Apple receives it, sees that this session is managed by jpolld1-B, 
and sends it to dennis@jpolld1-B.
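
The routing decision in both walkthroughs (Harry to Dennis within Apple, Suzan 
to Dennis via Orange) can be sketched as a lookup that either hands the packet 
to a local jpolld or forwards it to the machine named in the entry. The tables 
below are the ones listed above; `sessions` and `route` are hypothetical 
helpers, and choosing among several resources of one user is left out.

```python
tables = {
    "apple": {
        "dennis@jabber.com/work": "dennis@jpolld1-B",
        "harry@jabber.com/school": "harry@jpolld1-A",
        "suzan@jabber.com/home": "suzan@orange/home",
    },
    "orange": {
        "dennis@jabber.com/work": "dennis@apple/work",
        "harry@jabber.com/school": "harry@apple/school",
        "suzan@jabber.com/home": "suzan@jpolld2-B",
    },
}


def sessions(table: dict, bare_jid: str) -> list:
    """All full JIDs in one machine's table that belong to a bare JID."""
    return [jid for jid in table if jid.split("/", 1)[0] == bare_jid]


def route(tables: dict, start: str, to_jid: str) -> list:
    """Follow the tables from `start` and return the hops a packet takes.

    An address whose host part is another machine means "forward it there";
    anything else is taken to be a jpolld on the current machine.
    """
    hops = [start]
    machine = start
    while True:
        address = tables[machine][to_jid]
        host = address.split("@", 1)[1].split("/", 1)[0]
        if host in hops:
            raise RuntimeError(f"routing loop for {to_jid}: {hops + [host]}")
        hops.append(host)
        if host not in tables:
            return hops      # local jpolld: hand the packet to the client socket
        machine = host       # forward over the machine-to-machine link


for jid in sessions(tables["orange"], "dennis@jabber.com"):
    print(route(tables, "orange", jid))
```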


If this would work, how much traffic would it save? By the way, if it already 
works like that, please tell me how :-) Would it be hard to implement? Are 
there issues I have totally missed that would make this impossible?

Using this technique with a number of servers, each having for example two 
jpolld multiplexers, you can also implement load balancing. I don't remember 
exactly, but I believe there was a redirect stream error so that a server can 
redirect a client to a different IP? Based on statistics about load, message 
flow between components, sockets/bandwidth usage per jpolld, and CPU/memory 
consumption per server, an intelligent redirection policy could be maintained 
dynamically.
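
One possible shape for such a policy, assuming a weighted load score: pick 
the least-loaded machine, and only redirect a new client when that machine is 
clearly better than the current one, so clients don't bounce back and forth 
between machines with similar load. The metrics chosen, the weights, and the 
0.15 margin are all made up for illustration, and how the redirect itself 
reaches the client is not shown.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """One machine's recent statistics (which ones to collect is an assumption)."""
    machine: str
    cpu: float          # fraction of CPU in use, 0.0-1.0
    memory: float       # fraction of RAM in use, 0.0-1.0
    sockets: int        # client connections open on its jpollds
    max_sockets: int    # connections its jpollds are configured to accept


# Hypothetical weights for CPU, memory and socket usage.
WEIGHTS = (0.5, 0.3, 0.2)


def score(s: LoadSample) -> float:
    w_cpu, w_mem, w_sock = WEIGHTS
    return w_cpu * s.cpu + w_mem * s.memory + w_sock * s.sockets / s.max_sockets


def pick_target(samples: list) -> LoadSample:
    return min(samples, key=score)


def should_redirect(current: str, samples: list, margin: float = 0.15) -> bool:
    """Redirect only when another machine is clearly less loaded than this one."""
    here = next(s for s in samples if s.machine == current)
    best = pick_target(samples)
    return best.machine != current and score(here) - score(best) > margin


samples = [LoadSample("apple", 0.9, 0.7, 900, 1000),
           LoadSample("orange", 0.2, 0.3, 200, 1000)]
print(pick_target(samples).machine, should_redirect("apple", samples))
```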


Then again, maybe I am just rambling :-))

Hope to hear some ideas,
cheers!
Dennis

PS On a side note, I managed to turn what I initially started as a jabberd 
component (see the "Transport different approach" thread) into a standalone 
executable, using jpolld as a reference and the libjabber and libxode 
libraries. It doesn't depend on etherx for its connections, allows pthreading, 
and basically rocks :) libxode is a very nice library... kudos, guys.



