[jdev] The future of Jabber/XMPP?

Fri Aug 27 11:36:55 CDT 2010

On Fri, Aug 27, 2010 at 8:42 PM, Evgeniy Khramtsov <xramtsov at gmail.com> wrote:
> 28.08.2010 01:18, Matthew Wild wrote:
>>
>> On 27 August 2010 16:12, Evgeniy Khramtsov<xramtsov at gmail.com>  wrote:
>>
>>>
>>> Good move, Remko. Now ejabberd will violate your synthetic rules for
>>> sure.
>>> I'm completely disappointed in XSF: no one cares about implementation
>>> feedback anymore; it is much more fun to wage implementation wars
>>> instead
>>> of making all implementations happy.
>>>
>>> So we ended up where we started: PEP doesn't scale.
>>>
>>>
>>
>> Do you have a better solution that doesn't have the issues your
>> implementation has? All we want are working specifications, and that's
>> what we're aiming to develop.
>>
>
> The question is which is better: increasing traffic or increasing the
> server's memory? I think it is better to increase traffic a bit. This is
> not fatal, since all modern client implementations have PEP support, so
> actually you don't need to filter anything.
>
>> The only cries I've heard that PEP doesn't scale seem to be coming
>> from folk involved in ejabberd. I'm not sure why that is.
>>
>
> Because writing XEPs where a server should store foreign servers' info is
> not the way to go. In fact, closely connected servers will duplicate each
> other's data: presences and resources. You can imagine the amount of data
> if server1 has 1M users online and server2 has 2M users online. Do you
> remember any other technology where this happens? HTTP, SMTP, SIP, etc.?
> *Nowhere*. PEP's design is flawed.
>

Let's see: 1M PEP nodes with, say, 1K subscribers each, and server2
has at most 2M resources.

On top, let's assume 10K different client configurations (i.e., 10K
caps hashes).

Here's what it might look like in Prosody:

1. A table of strings with all JIDs (2M resources * 3KB maximum JID
size = 6GB maximum).
2. A table of local accounts mapped to a list of subscriptions. Each
subscription is just two pointers (one to a shared caps hash table,
and one to a JID). 1M * 64 bytes = 64MB (64 bytes is just a rough
pick).
3. The caps hash table, mapping caps hashes to a list of +notify
namespaces. Assuming one caps hash and its list take 4KB, we have
4KB * 10K = 40MB (strings are shared, so this is likely a lot less).
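
For illustration, the three tables above might be sketched roughly as
follows. This is Python rather than Lua, and it is not Prosody's actual
code: the class and method names, the example JIDs and the caps hash
strings are all made up for the sketch. sys.intern stands in for Lua's
string interning.

```python
import sys


class SubscriptionTable:
    """Hypothetical sketch of the three tables described above."""

    def __init__(self):
        # Table 3: caps hash -> tuple of "+notify" namespaces (shared by all
        # resources that advertise the same caps hash).
        self.caps = {}
        # Table 2: local account -> list of (subscriber JID, caps hash) pairs.
        # Each pair is just two references, not copies of the strings.
        self.subscriptions = {}

    def subscribe(self, account, subscriber_jid, caps_hash, notify_namespaces):
        # Table 1: interning stores each distinct JID / hash string only once.
        jid = sys.intern(subscriber_jid)
        caps_hash = sys.intern(caps_hash)
        # Keep the first namespace list seen for a given caps hash.
        self.caps.setdefault(caps_hash, tuple(notify_namespaces))
        self.subscriptions.setdefault(account, []).append((jid, caps_hash))

    def recipients(self, account, node):
        # A subscriber gets notifications for `node` only if its caps
        # advertise "<node>+notify".
        wanted = node + "+notify"
        return [jid for jid, caps_hash in self.subscriptions.get(account, [])
                if wanted in self.caps[caps_hash]]


table = SubscriptionTable()
tune = "http://jabber.org/protocol/tune"
table.subscribe("juliet@example.com", "romeo@example.net/orchard",
                "hashA", [tune + "+notify"])
table.subscribe("juliet@example.com", "nurse@example.net/phone",
                "hashB", [])
print(table.recipients("juliet@example.com", tune))
# ['romeo@example.net/orchard']
```

The point of the layout is that per-subscription cost stays small and
roughly constant, while the bulky strings (JIDs, namespace lists) are
stored once and shared.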

This is a simplistic view; the string table, for example, is really
just the strings interned by Lua. I've tried to pick larger values than
I actually expect to see in the wild (e.g., I assumed everyone has 3KB
JIDs).

But still, that's only a bit more than 6GB for a massive number of
nodes and subscriptions. I could even try simulating this, and I don't
expect it to kill a server with hardware like jabber.org's.

I made the assumption that there are 10K different caps hashes. Let's
drop that now and suppose every resource has a unique caps hash (they
are malicious or crazy or something). That's 2M * 4KB = 8GB of caps
data. The total (about 14GB) is still under the 16GB of physical RAM
jabber.org has :)
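
The arithmetic for both scenarios can be checked with a few lines of
Python. This is only a sketch of the estimate above: the function and
parameter names are made up, and it uses decimal units (1KB = 1000
bytes) to match the figures in this mail.

```python
KB, MB, GB = 10**3, 10**6, 10**9  # decimal units, as used in the estimate


def estimate_bytes(resources, local_accounts, caps_hashes,
                   jid_bytes=3 * KB, subscription_bytes=64,
                   caps_entry_bytes=4 * KB):
    """Sum the three tables: JID strings, subscription lists, caps table."""
    jid_table = resources * jid_bytes
    subscription_table = local_accounts * subscription_bytes
    caps_table = caps_hashes * caps_entry_bytes
    return jid_table + subscription_table + caps_table


# Scenario 1: 10K distinct caps hashes shared among 2M remote resources.
shared = estimate_bytes(2_000_000, 1_000_000, 10_000)
# Scenario 2: every one of the 2M resources has a unique caps hash.
unique = estimate_bytes(2_000_000, 1_000_000, 2_000_000)

print(f"shared caps: {shared / GB:.3f} GB")  # shared caps: 6.104 GB
print(f"unique caps: {unique / GB:.3f} GB")  # unique caps: 14.064 GB
```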

Mind you, this was only a quick back-of-the-envelope calculation :)

--
Waqas Hussain