[JDEV] Re: Large scale Jabber bots

Thomas Charron tcharron at ductape.net
Fri May 25 15:23:43 CDT 2001


From: "David Waite" <dwaite at jabber.com>
Subject: Re: [JDEV] Re: Large scale Jabber bots
> > Secondly, it ignores very useful aspects of Jabber for information
> > delivery. Stock price agents, auction agents, news agents, etc. Like I
> > said before, companies are slavering over the potential of this and
> > having a viable open IM network would make it much easier to do. These
> > things are not spam, they're voluntary since you have to subscribe to
> > them. And they could scale to zillions of users. Just read this article:
> I'm very familiar with bots. I also know that the majority of bots do not
> require presence information at all, all users do is query their services.

    Sounds like an iq request to a transport to me..  8-)
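
    A minimal sketch of what such a query could look like on the wire,
built here with Python's standard library.  The service JID
(stocks.example.com), the query namespace, and the <symbol/> element are
all made up for illustration; none of them comes from this thread or from
any Jabber spec.  The point is only that a one-shot iq "get" needs no
roster entry and no presence subscription.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for an imaginary stock-quote service.
QUOTE_NS = "http://example.com/protocol/stock-quote"


def build_quote_query(service_jid, symbol, request_id="q1"):
    """Build an <iq type='get'> stanza addressed to a service/transport."""
    iq = ET.Element("iq", {"type": "get", "to": service_jid, "id": request_id})
    # The payload lives in a namespaced <query/> child, as iq payloads do.
    query = ET.SubElement(iq, "query", {"xmlns": QUOTE_NS})
    ET.SubElement(query, "symbol").text = symbol
    return ET.tostring(iq, encoding="unicode")


stanza = build_quote_query("stocks.example.com", "RHT")
print(stanza)
```

    The client sends the stanza, the service answers with an iq "result",
and nobody's roster grows by an entry.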

> > So how many Radiohead fans do you think are going to subscribe to this
> > bot? What happens when ActiveBuddy builds ones for N'Sync or Eminem?
> > Are you saying this kind of thing is inherently wrong and should not be
> > supported?
> Yes, as a standard client, and as any service not known to the
> administrator of the box. If you are running a server, do you want some
> user to connect, advertise their client as a bot, and start taking up
> memory and CPU usage to the scale described below?

    That's a good point.  Should servers enforce limitations for the sake of
actually running?  There are probably good points on both sides of the
fence.  It's a good point because the server will send out presence to
everyone, which really bypasses the entire idea of a karma limit on the
socket.  Death by buddy entries..  8-)

> > Those numbers aren't specific to having a bot, only to the size of the
> > portal itself. It makes no difference whether all 250,000 users have the
> > same bot in their rosters, or if their rosters all have different jids
> > in them. In other words, you're saying that for a portal of this size,
> > there is an average 35MB memory hit per roster entry per user -- so if
> > the average portal user has 20 people in their roster, that's 700MB just
> > for rosters.
> 35 MB per roster entry per user? where does that come from?

    No, 35 Meg per 250k roster entries, one per user.
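
    To spell the arithmetic out, here is a back-of-the-envelope sketch
using only the figures quoted in this thread (35 MB for one roster entry
across 250,000 users, and 20 entries per user).  It assumes decimal
megabytes and that memory grows linearly with entries; both are
simplifications, not measurements.

```python
MB = 1_000_000  # assumption: decimal megabytes

USERS = 250_000          # portal size quoted in the thread
MB_PER_ENTRY_LAYER = 35  # memory for one roster entry for every user


def bytes_per_roster_entry(total_mb=MB_PER_ENTRY_LAYER, users=USERS):
    """Memory cost of a single roster entry for a single user."""
    return total_mb * MB / users


def roster_memory_mb(entries_per_user, mb_per_layer=MB_PER_ENTRY_LAYER):
    """Total roster memory if every user holds `entries_per_user` entries."""
    return entries_per_user * mb_per_layer


print(bytes_per_roster_entry())  # 140.0 bytes per entry per user
print(roster_memory_mb(20))      # 700 MB for 20 entries per user
```

    So the cost is on the order of 140 bytes per entry per user, and the
700MB figure is the total for the whole portal at 20 entries each.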

> > (That's a scary number but on the other hand 700MB of RAM is chump
> > change for a company big enough to run a portal this size. Something
> > like $400? And anyway, wouldn't the load be spread across a whole farm
> > of servers, not just one?)

    I know mass didn't write this part, but after rereading it, I have to
ask, as this is a development list: how would the original author solve
this?  *NOT ALLOW ROSTERS?*  Not allow presence?  Guess who their friends
are?  Kill the server's bandwidth by sending all of this data through the
sockets?  Memory is cheaper than bandwidth, and can be scaled across
machines, as mass notes below..

> No, because this is connected as a user. How is one user session spread
> out over a farm of machines? You would probably *want* a farm with a
> service this popular, but you can't do that if you connect as a plain
> user.
> > If you're saying is that the present Jabber server is not scalable to
> > this size portal, that's sort of bad news for Jabber, it sounds like,
> > since no large scale provider would adopt it.
> I'm saying that you can't take protocol decisions made for an average
> roster size of 10 and scale it to a roster size of a quarter million. I'm
> not saying this is impossible, I'm saying that you have to approach it a
> different way.

    And it's DEFINITELY not an 'architecturally superior' decision.

> > In any case, this is a bogus scenario. Everyone seems to keep forgetting
> > that Jabber is supposed to be a distributed system; while there will be
> > large portals with large numbers of users, there will be large numbers
> > of smaller servers, as well as special purpose servers for bots. The
> > likely scenario is that a major bot would run on its own server (or
> > perhaps there would be a small number of bots) hosted by the company
> > that owns it. There would not be any appreciable number of actual users
> > on this server. So there's 25MB for the bot's roster; that's about $20
> > worth of RAM I think. The other side of the overhead is distributed
> > among the host servers of all the subscribers, and has the same effect
> > as of all the subscribers adding one more friend to their buddy list.

    Once again, after a second read, I need to respond.  Bots are used on
systems where an end user is *always* assumed to be a human.  That's why
they call them bots.  In the case of Jabber, the entire idea of a
non-person resource is already taken care of.  NOT everything here is an
individual person.  Hence, the need for a bot is negated, unless you wish
to allow users to run rogue server applications at their slightest whim.
This would be unsafe, and to say that the client protocol and system is
unsound because it's not built for it is rubbish.  Go ahead and try to send
an SMTP email to 250k users in one email.  Watch your bandwidth get eaten
alive.  Bandwidth ain't cheap, and sure can't be farmed or upgraded.  Now,
once the servers really start chewing on those 250k entries, see how much
memory is consumed.  Oh, and the disk space in /var/spool/mqueue.

    While you're looking at it, think about how much nicer it'd have been if
that mail had been sent from a designated spam mailer, meant for this sort
of thing.  (Assuming you don't just have 250k close friends, this is spam..
8-P)  That's what transports and services are *FOR*.  Individuals should
not be providing broad-based services.  Transports and services should be
doing this.

    Oh, and go ahead and run that test with sendmail and see what happens.
The server will reject you.  Man, that sendmail must suck, and must be so
architecturally unsound for not letting you do it, huh?



