[JDEV] Re: Large scale Jabber bots
Jens Alfke
jens at mac.com
Fri May 25 11:55:45 CDT 2001
[I've retitled this thread...]
On Friday, May 25, 2001, at 08:50 AM, David Waite wrote:
> IMO, mailservers should break if you try to send to more than ~20 people
> at a time. Not crash, but refuse to send.
What, all mailservers?! What about the ones that run mailing lists or
[voluntary] announcements? I've been on music mailing lists with
thousands of subscribers, and I get very useful "what's new" mailings
from Amazon that must have tens of thousands of readers.
> Jabber should probably work
> the same way in this case (200 users max in a roster or something
> configurable like that)
Why on earth should that be a requirement? First off, it penalizes
people for something that's not their fault. What happens when I can't
add a new buddy because I happen to be on 195 people's rosters already?
Secondly, it ignores very useful aspects of Jabber for information
delivery. Stock price agents, auction agents, news agents, etc. Like I
said before, companies are slavering over the potential of this and
having a viable open IM network would make it much easier to do. These
things are not spam; they're voluntary, since you have to subscribe to
them. And they could scale to zillions of users. Just read this article:
http://biz.yahoo.com/prnews/010424/nytu114.html
"Capitol Records and Radiohead Create First Instant Message 'Buddy' in
Music History"
"... The Radiohead agent will reside on a user's Instant Messenger buddy
contact list. The agent will be able to recognize and respond to natural
language questions and requests for information about the band and
Amnesiac. Tour dates, song lists, artists' bios, album credits,
purchasing information, contact information, current web site
information, and other album related material will be available."
So how many Radiohead fans do you think are going to subscribe to this
bot? What happens when ActiveBuddy builds ones for N'Sync or Eminem? Are
you saying this kind of thing is inherently wrong and should not be
supported?
> Hypothetically, if you had a 10,000 user roster, that would generate
> about 5 MB of XML traffic through the server it was running on everytime
> the bot came online.
5MB is not really a lot of traffic for any site with a decent-sized
pipe. And it won't happen that often, since bots by design tend to stay
online all the time.
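
As a rough sanity check on that claim, here is the arithmetic using only
the two figures quoted above (a 10,000-entry roster and about 5MB of XML).
The link speeds are illustrative assumptions, not numbers from this thread,
and protocol overhead is ignored.

```python
# Figures quoted in this thread (decimal megabytes assumed).
ROSTER_SIZE = 10_000             # contacts on the bot's roster
LOGIN_TRAFFIC_BYTES = 5_000_000  # "about 5 MB of XML traffic"

# Implied average XML per contact when the bot comes online.
bytes_per_contact = LOGIN_TRAFFIC_BYTES / ROSTER_SIZE


def seconds_to_send(total_bytes, link_bits_per_second):
    """Time to push total_bytes through a link, ignoring protocol overhead."""
    return total_bytes * 8 / link_bits_per_second


# Illustrative link speeds (assumptions, not taken from the thread).
for name, bps in [("T1 (1.544 Mbit/s)", 1_544_000), ("10 Mbit/s", 10_000_000)]:
    print(f"{name}: {seconds_to_send(LOGIN_TRAFFIC_BYTES, bps):.1f} s")
```

On those assumptions the burst works out to roughly 500 bytes per contact
and lasts seconds, not minutes, each time the bot logs in.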
> Even if all of those users are on the same machine, that
> would be 10,000 user rosters it would have to load up via XDB and parsed
> (since the roster is also basically the presence ACL).
Yes, but you're talking about a machine hosting 10,000 users, which is
going to be hellaciously busy no matter what. Presumably if a user is
online their roster is already parsed, and if they're not online you
don't need to do anything (since presence packets are not
stored/forwarded.)
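
A minimal sketch of that behaviour, assuming a server that keeps a table of
live sessions keyed by jid. The function and variable names are hypothetical
and this is not how the actual Jabber server is written; it only models the
point that offline contacts cost nothing when presence is broadcast.

```python
def broadcast_presence(roster, online_sessions, presence):
    """Deliver a presence packet to roster contacts that have a live session.

    roster          -- iterable of jids on the sender's roster
    online_sessions -- dict mapping jid -> list acting as that user's inbox
    presence        -- the presence packet (any object; a string here)

    Returns the list of jids that actually received the packet.
    """
    delivered = []
    for jid in roster:
        session = online_sessions.get(jid)
        if session is None:
            # Offline contact: presence is not stored or forwarded,
            # so there is no roster to load and nothing to queue.
            continue
        session.append(presence)
        delivered.append(jid)
    return delivered


# Hypothetical example: three contacts, only one of them online.
sessions = {"alice@portal.example": []}
roster = ["alice@portal.example", "bob@portal.example", "carol@portal.example"]
print(broadcast_presence(roster, sessions, "<presence/>"))
```

The work done is proportional to the number of contacts who are online at
that moment, not to the full size of the roster.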
> Now imagine this is a portal with a quarter of a million users, and the
> bot is added by default to everyone's roster. Not only would that roster
> be about 25MB, there would be at least a 35MB memory image for the DOM
> tree created.
Those numbers aren't specific to having a bot, only to the size of the
portal itself. It makes no difference whether all 250,000 users have the
same bot in their rosters, or if their rosters all have different jids
in them. In other words, you're saying that for a portal of this size,
each roster entry costs about 35MB once every user has one -- so if
the average portal user has 20 people in their roster, that's 700MB just
for rosters.
(That's a scary number but on the other hand 700MB of RAM is chump
change for a company big enough to run a portal this size. Something
like $400? And anyway, wouldn't the load be spread across a whole farm
of servers, not just one?)
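
The 700MB figure can be reproduced from the numbers quoted in this thread
(250,000 users, 35MB of DOM for one roster entry per user, 20 entries on
average). The server count at the end is a made-up value, there only to
show how the total divides across a farm.

```python
# Figures quoted in this thread (decimal megabytes assumed).
USERS = 250_000
DOM_BYTES_ONE_ENTRY_EACH = 35_000_000  # 35MB when every user holds one entry
AVG_ROSTER_ENTRIES = 20

# Implied in-memory cost of a single roster entry for a single user.
dom_bytes_per_entry = DOM_BYTES_ONE_ENTRY_EACH // USERS


def roster_memory_bytes(users, entries_per_user, bytes_per_entry):
    """Total memory if every user's whole roster is held in memory."""
    return users * entries_per_user * bytes_per_entry


total = roster_memory_bytes(USERS, AVG_ROSTER_ENTRIES, dom_bytes_per_entry)
print(f"per entry: {dom_bytes_per_entry} bytes, total: {total // 1_000_000}MB")

# Hypothetical farm size (an assumption, not a number from the thread).
SERVERS = 10
print(f"per server: {total // SERVERS // 1_000_000}MB")
```

Note that the total depends only on users times entries per user; whether
those entries all point at the same bot or at different jids does not change it.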
If what you're saying is that the present Jabber server does not scale
to a portal of this size, that sounds like bad news for Jabber, since no
large-scale provider would adopt it.
In any case, this is a bogus scenario. Everyone seems to keep forgetting
that Jabber is supposed to be a distributed system; while there will be
large portals with large numbers of users, there will be large numbers
of smaller servers, as well as special purpose servers for bots. The
likely scenario is that a major bot would run on its own server (or
perhaps there would be a small number of bots) hosted by the company
that owns it. There would not be any appreciable number of actual users
on this server. So there's 25MB for the bot's roster; that's about $20
worth of RAM, I think. The other side of the overhead is distributed
among the host servers of all the subscribers, and has the same effect
as each of those subscribers adding one more friend to their buddy list.
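
To make the split concrete, here is a small sketch of where the cost lands
in that scenario. The 100 bytes per entry is derived from the 25MB for
250,000 entries quoted in this thread; the jids and host names are invented
for illustration.

```python
from collections import Counter

# Derived from the thread: 25MB roster for 250,000 entries.
BOT_ROSTER_BYTES_PER_ENTRY = 25_000_000 // 250_000


def extra_entries_per_host(subscriber_jids):
    """Count the extra roster entries each subscriber's home server carries.

    Each subscriber adds exactly one entry (the bot) to their own roster,
    so a host's share is simply the number of its users who subscribed.
    """
    return Counter(jid.split("@", 1)[1] for jid in subscriber_jids)


# Hypothetical subscribers spread over two hosts.
subscribers = [
    "ann@portal.example",
    "bill@portal.example",
    "cat@smallserver.example",
]

bot_roster_bytes = len(subscribers) * BOT_ROSTER_BYTES_PER_ENTRY
print("bot server roster:", bot_roster_bytes, "bytes")
print("extra entries per host:", dict(extra_entries_per_host(subscribers)))
```

The bot's server carries one entry per subscriber, while every other host
carries only as many extra entries as it has subscribing users.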
> Moral of the story: if you try to solve every problem with a hammer and
> a crowbar, you just end up breaking a lot of things ;-)
To be blunt, the real "hammer and crowbar" here seems to be the server's
usage of in-memory DOM structures rather than some kind of actual
database engine. Commercial databases like Oracle have no problem with
the kind of scale you're saying is impossible. (And for someone from
jabber.com to be saying this sort of thing is impractical is sort of
damning for the claims made about your server, btw.)
—Jens