[JDEV] Re: Large scale Jabber bots

David Waite dwaite at jabber.com
Fri May 25 12:35:29 CDT 2001


Jens Alfke wrote:

> [I've retitled this thread...]
>
> On Friday, May 25, 2001, at 08:50 AM, David Waite wrote:
>
> > IMO, mailservers should break if you try to send to more than ~20 people
> > at a time. Not crash, but refuse to send.
> >
> >
> What, all mailservers?! What about the ones that run mailing lists or
> [voluntary] announcements? I've been on music mailing lists with thousands
> of subscribers, and I get very useful "what's new" mailings from Amazon that
> must have tens of thousands of readers.

I said IMO, and I didn't say I was going to go out and start trying to force
people to impose quotas on their users. If I ran an open system such as
Hotmail, I wouldn't want people to be able to create accounts and send out
email to 400-500 people at a time; there just isn't any good reason to do it.

> > Jabber should probably work
> > the same way in this case (200 users max in a roster or something
> > configurable like that)
> >
> >
> Why on earth should that be a requirement? First off, it penalizes people
> for something that's not their fault. What happens when I can't add a new
> buddy because I happen to be on 195 people's rosters already?
>

You get rid of them? Why do you keep people on your roster if you don't know
who they are?
Again, I didn't say it would be something that people were forced to do, but
there just isn't a good reason for someone to have a thousand-user roster;
at that point you should be implementing the logic some other way.
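
To make the "configurable" part concrete, here is a rough sketch of an
administrator-set roster cap. It is illustrative only: the Roster class, the
max_entries setting, and the RosterLimitExceeded error are hypothetical names
made up for this example, not part of any Jabber server's code, and the
default of 200 is just the number floated above.

```python
class RosterLimitExceeded(Exception):
    """Raised when adding a contact would exceed the configured cap."""


class Roster:
    """Hypothetical per-user roster with an administrator-configurable cap."""

    def __init__(self, max_entries=200):
        # max_entries is an assumed config value; 200 is only the figure
        # suggested in this thread, not a real server default.
        self.max_entries = max_entries
        self._jids = set()

    def add(self, jid):
        # Re-adding an existing contact is a no-op and never hits the cap.
        if jid in self._jids:
            return
        if len(self._jids) >= self.max_entries:
            # Refuse the request rather than failing in some worse way.
            raise RosterLimitExceeded(
                f"roster is limited to {self.max_entries} entries"
            )
        self._jids.add(jid)

    def __len__(self):
        return len(self._jids)
```

A server operator who wants large rosters for a known service would simply
raise max_entries for that account; everyone else gets the default.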

> Secondly, it ignores very useful aspects of Jabber for information delivery.
> Stock price agents, auction agents, news agents, etc. Like I said before,
> companies are slavering over the potential of this and having a viable open
> IM network would make it much easier to do. These things are not spam,
> they're voluntary since you have to subscribe to them. And they could scale
> to zillions of users. Just read this article:

I'm very familiar with bots. I also know that the majority of bots do not
require presence information at all; all users do is query their services.

> So how many Radiohead fans do you think are going to subscribe to this bot?
> What happens when ActiveBuddy builds ones for N'Sync or Eminem? Are you
> saying this kind of thing is inherently wrong and should not be supported?
>

Yes, as a standard client, and as any service not known to the administrator of
the box. If you are running a server, do you want some user to connect,
advertise their client as a bot, and start taking up memory and CPU usage to
the scale described below?

> > Now imagine this is a portal with a quarter of a million users, and the
> > bot is added by default to everyone's roster. Not only would that roster
> > be about 25MB, there would be at least a 35MB memory image for the DOM
> > tree created.
> >
> >
> Those numbers aren't specific to having a bot, only to the size of the
> portal itself. It makes no difference whether all 250,000 users have the
> same bot in their rosters, or if their rosters all have different jids in
> them. In other words, you're saying that for a portal of this size, there is
> an average 35MB memory hit per roster entry per user -- so if the average
> portal user has 20 people in their roster, that's 700MB just for rosters.
>

35 MB per roster entry per user? Where does that come from?
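
For reference, the arithmetic behind the figures in this exchange can be
sketched as below. This is a back-of-the-envelope calculation only. It assumes
1 MB = 10^6 bytes and that the 25 MB and 35 MB figures are totals for a single
roster with 250,000 entries; the constant and function names are made up for
the example.

```python
USERS = 250_000               # portal size quoted in the thread
ROSTER_BYTES = 25 * 10**6     # ~25 MB roster for the bot (assumes 1 MB = 10^6 B)
DOM_BYTES = 35 * 10**6        # ~35 MB DOM tree for that same roster


def bytes_per_entry(total_bytes, entries):
    """Average size of one roster entry, given a total and an entry count."""
    return total_bytes // entries


def portal_wide_bytes(users, avg_roster_size, per_entry):
    """Memory for all rosters on the portal, summed over every user."""
    return users * avg_roster_size * per_entry


roster_entry = bytes_per_entry(ROSTER_BYTES, USERS)  # bytes per stored entry
dom_entry = bytes_per_entry(DOM_BYTES, USERS)        # bytes per in-memory entry

# The 700 MB figure: 20 entries per user, summed over all 250,000 users.
all_rosters = portal_wide_bytes(USERS, 20, dom_entry)

print(roster_entry, dom_entry, all_rosters)
```

Under those assumptions the 35 MB works out to roughly 140 bytes per entry in
memory, not 35 MB per entry. The 700 MB total is spread over 250,000 separate
user sessions, whereas the bot's 35 MB belongs to one session.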

>
> (That's a scary number but on the other hand 700MB of RAM is chump change
> for a company big enough to run a portal this size. Something like $400? And
> anyway, wouldn't the load be spread across a whole farm of servers, not just
> one?)
>

No, because this is connected as a user. How is one user session spread out
over a farm of machines? You would probably *want* a farm with a service this
popular, but you can't do that if you connect as a plain user.

> If what you're saying is that the present Jabber server is not scalable to this
> size portal, that's sort of bad news for Jabber, it sounds like, since no
> large scale provider would adopt it.

I'm saying that you can't take protocol decisions made for an average roster
size of 10 and scale them to a roster size of a quarter million. I'm not saying
this is impossible; I'm saying that you have to approach it a different way.

Do you really think that normal clients should be allowed to grow to a roster
of several thousand users?

> In any case, this is a bogus scenario. Everyone seems to keep forgetting
> that Jabber is supposed to be a distributed system; while there will be
> large portals with large numbers of users, there will be large numbers of
> smaller servers, as well as special purpose servers for bots. The likely
> scenario is that a major bot would run on its own server (or perhaps there
> would be a small number of bots) hosted by the company that owns it. There
> would not be any appreciable number of actual users on this server. So
> there's 25MB for the bot's roster; that's about $20 worth of RAM I think.
> The other side of the overhead is distributed among the host servers of all
> the subscribers, and has the same effect as of all the subscribers adding
> one more friend to their buddy list.

If you are giving out a user address as the bot, it is a user; it cannot be
farmed, it cannot be distributed. Again, it has to be attacked another way.

-David Waite



