[JDEV] Re: Large scale Jabber bots

Jens Alfke jens at mac.com
Fri May 25 11:55:45 CDT 2001


[I've retitled this thread...]

On Friday, May 25, 2001, at 08:50 AM, David Waite wrote:

> IMO, mailservers should break if you try to send to more than ~20 people
> at a time. Not crash, but refuse to send.

What, all mailservers?! What about the ones that run mailing lists or 
[voluntary] announcements? I've been on music mailing lists with 
thousands of subscribers, and I get very useful "what's new" mailings 
from Amazon that must have tens of thousands of readers.

> Jabber should probably work
> the same way in this case (200 users max in a roster or something
> configurable like that)

Why on earth should that be a requirement? First off, it penalizes 
people for something that's not their fault. What happens when I can't 
add a new buddy because I happen to be on 195 people's rosters already?

Secondly, it ignores very useful aspects of Jabber for information 
delivery. Stock price agents, auction agents, news agents, etc. Like I 
said before, companies are slavering over the potential of this and 
having a viable open IM network would make it much easier to do. These 
things are not spam, they're voluntary since you have to subscribe to 
them. And they could scale to zillions of users. Just read this article:

http://biz.yahoo.com/prnews/010424/nytu114.html
"Capitol Records and Radiohead Create First Instant Message 'Buddy' in 
Music History"
"... The Radiohead agent will reside on a user's Instant Messenger buddy 
contact list. The agent will be able to recognize and respond to natural 
language questions and requests for information about the band and 
Amnesiac. Tour dates, song lists, artists' bios, album credits, 
purchasing information, contact information, current web site 
information, and other album related material will be available."

So how many Radiohead fans do you think are going to subscribe to this 
bot? What happens when ActiveBuddy builds ones for N'Sync or Eminem? Are 
you saying this kind of thing is inherently wrong and should not be 
supported?

> Hypothetically, if you had a 10,000 user roster, that would generate
> about 5 MB of XML traffic through the server it was running on everytime
> the bot came online.

5MB is not really a lot of traffic for any site with a decent size pipe. 
It won't be happening that often since bots by design tend to stay 
online all the time.

> Even if all of those users are on the same machine, that
> would be 10,000 user rosters it would have to load up via XDB and parsed
> (since the roster is also basically the presence ACL).

Yes, but you're talking about a machine hosting 10,000 users, which is 
going to be hellaciously busy no matter what. Presumably if a user is 
online their roster is already parsed, and if they're not online you 
don't need to do anything (since presence packets are not 
stored/forwarded.)

> Now imagine this is a portal with a quarter of a million users, and the
> bot is added by default to everyone's roster. Not only would that roster
> be about 25MB, there would be at least a 35MB memory image for the DOM
> tree created.

Those numbers aren't specific to having a bot, only to the size of the 
portal itself. It makes no difference whether all 250,000 users have the 
same bot in their rosters, or if their rosters all have different jids 
in them. In other words, you're saying that for a portal of this size, 
there is an average 35MB memory hit per roster entry per user -- so if 
the average portal user has 20 people in their roster, that's 700MB just 
for rosters.

(That's a scary number but on the other hand 700MB of RAM is chump 
change for a company big enough to run a portal this size. Something 
like $400? And anyway, wouldn't the load be spread across a whole farm 
of servers, not just one?)

If you're saying is that the present Jabber server is not scalable to 
this size portal, that's sort of bad news for Jabber, it sounds like, 
since no large scale provider would adopt it.

In any case, this is a bogus scenario. Everyone seems to keep forgetting 
that Jabber is supposed to be a distributed system; while there will be 
large portals with large numbers of users, there will be large numbers 
of smaller servers, as well as special purpose servers for bots. The 
likely scenario is that a major bot would run on its own server (or 
perhaps there would be a small number of bots) hosted by the company 
that owns it. There would not be any appreciable number of actual users 
on this server. So there's 25MB for the bot's roster; that's about $20 
worth of RAM I think. The other side of the overhead is distributed 
among the host servers of all the subscribers, and has the same effect 
as of all the subscribers adding one more friend to their buddy list.

> Moral of the story: if you try to solve every problem with a hammer and
> a crowbar, you just end up breaking a lot of things ;-)

To be blunt, the real "hammer and crowbar" here seems to be the server's 
usage of in-memory DOM structures rather than some kind of actual 
database engine. Commercial databases like Oracle have no problem with 
the kinds of scale you're saying is impossible. (And for someone from 
jabber.com to be saying this sort of thing is impractical is sort of 
damning for the claims made about your server, btw.)

—Jens
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 5650 bytes
Desc: not available
URL: <https://www.jabber.org/jdev/attachments/20010525/62ec2ddf/attachment-0002.bin>


More information about the JDev mailing list