[JDEV] Re: Large scale Jabber bots

Al Sutton al at alsutton.com
Fri May 25 16:20:34 CDT 2001


The roster issue is one I've avoided by not requiring people to monitor the
presence of personalbuddy at jabber.com to get notifications. I looked over the
protocol docs and saw this as a potential weakness in terms of server load.

I think there should be some effort to standardise a subscription message
that doesn't require server intervention in order to facilitate systems that
act as bots. I prefer to have personalbuddy act as a client on the basis
that I currently can't afford a 24/7 connected server so I need to move the
system between machines during testing.
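
A rough sketch of that idea, not personalbuddy's actual code: the bot keeps its own subscriber list, driven by ordinary message stanzas, so nobody's roster or presence is involved. The `subscribe`/`unsubscribe` body commands, the `NotificationBot` class and the example.org addresses are all made up for illustration; nothing here is a standardised format.

```python
import xml.etree.ElementTree as ET


class NotificationBot:
    """Tracks subscribers itself instead of relying on roster presence."""

    def __init__(self, bot_jid):
        self.bot_jid = bot_jid
        self.subscribers = set()

    def handle_message(self, sender_jid, body):
        """Process a plain message body and return the bot's reply text."""
        command = body.strip().lower()
        if command == "subscribe":      # hypothetical command word
            self.subscribers.add(sender_jid)
            return "subscribed"
        if command == "unsubscribe":    # hypothetical command word
            self.subscribers.discard(sender_jid)
            return "unsubscribed"
        return "unknown command"

    def build_notifications(self, text):
        """Build one <message/> stanza per subscriber, ordered by JID."""
        stanzas = []
        for jid in sorted(self.subscribers):
            msg = ET.Element("message", {"to": jid, "from": self.bot_jid})
            ET.SubElement(msg, "body").text = text
            stanzas.append(ET.tostring(msg, encoding="unicode"))
        return stanzas


bot = NotificationBot("bot@example.org")
bot.handle_message("alice@example.org", "subscribe")
for stanza in bot.build_notifications("New item available"):
    print(stanza)
```

The server only routes messages in this arrangement; the subscriber list lives in the bot process, which also makes it easy to move between machines.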

Al.

----- Original Message -----
From: "David Waite" <dwaite at jabber.com>
To: <jdev at jabber.org>
Sent: Friday, May 25, 2001 6:35 PM
Subject: Re: [JDEV] Re: Large scale Jabber bots


> Jens Alfke wrote:
>
> > [I've retitled this thread...]
> >
> > On Friday, May 25, 2001, at 08:50 AM, David Waite wrote:
> >
> > > IMO, mailservers should break if you try to send to more than ~20
> > > people at a time. Not crash, but refuse to send.
> > >
> > >
> > What, all mailservers?! What about the ones that run mailing lists or
> > [voluntary] announcements? I've been on music mailing lists with
> > thousands of subscribers, and I get very useful "what's new" mailings
> > from Amazon that must have tens of thousands of readers.
>
> I said IMO, and I didn't say I was going to go out and start trying to
> force people to quota their users. If I had an open system such as
> hotmail, I wouldn't want people to be able to create accounts and send
> out email to 400-500 people at a time - there just isn't any good reason
> to do it.
>
> > > Jabber should probably work
> > > the same way in this case (200 users max in a roster or something
> > > configurable like that)
> > >
> > >
> > Why on earth should that be a requirement? First off, it penalizes
> > people for something that's not their fault. What happens when I can't
> > add a new buddy because I happen to be on 195 people's rosters already?
> >
>
> You get rid of them? Why do you keep people on your roster if you don't
> know who they are?
> Again, I didn't say it would be something that people were forced to do,
> but there just isn't a good reason for someone to just have a thousand
> user roster; at that point you should be implementing the logic some
> other way.
>
> > Secondly, it ignores very useful aspects of Jabber for information
> > delivery. Stock price agents, auction agents, news agents, etc. Like I
> > said before, companies are slavering over the potential of this and
> > having a viable open IM network would make it much easier to do. These
> > things are not spam, they're voluntary since you have to subscribe to
> > them. And they could scale to zillions of users. Just read this article:
>
> I'm very familiar with bots. I also know that the majority of bots do not
> require presence information at all; all users do is query their services.
>
> > So how many Radiohead fans do you think are going to subscribe to this
> > bot? What happens when ActiveBuddy builds ones for N'Sync or Eminem? Are
> > you saying this kind of thing is inherently wrong and should not be
> > supported?
> >
>
> Yes, as a standard client, and as any service not known to the
> administrator of the box. If you are running a server, do you want some
> user to connect, advertise their client as a bot, and start taking up
> memory and CPU usage to the scale described below?
>
> > > Now imagine this is a portal with a quarter of a million users, and
> > > the bot is added by default to everyone's roster. Not only would that
> > > roster be about 25MB, there would be at least a 35MB memory image for
> > > the DOM tree created.
> > >
> > >
> > Those numbers aren't specific to having a bot, only to the size of the
> > portal itself. It makes no difference whether all 250,000 users have the
> > same bot in their rosters, or if their rosters all have different jids
> > in them. In other words, you're saying that for a portal of this size,
> > there is an average 35MB memory hit per roster entry per user -- so if
> > the average portal user has 20 people in their roster, that's 700MB just
> > for rosters.
> >
>
> 35 MB per roster entry per user? Where does that come from?
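
One way to check where the disputed figures could come from is to redo the arithmetic on the numbers quoted above. This is only a sketch under two assumptions that nobody in the thread states: 1 MB is taken as 10**6 bytes, and memory use is assumed to scale linearly with the number of roster entries. The function name `dom_total_mb` is made up for the example.

```python
USERS = 250_000             # "a quarter of a million users"
ROSTER_BYTES = 25_000_000   # "about 25MB" for the bot's roster (1 MB = 10**6 bytes assumed)
DOM_BYTES = 35_000_000      # "at least a 35MB" DOM tree image (same assumption)

# Cost of a single roster entry implied by the quoted totals.
roster_bytes_per_entry = ROSTER_BYTES / USERS
dom_bytes_per_entry = DOM_BYTES / USERS


def dom_total_mb(users, entries_per_user):
    """DOM memory in MB if every user holds entries_per_user entries (linear scaling assumed)."""
    return users * entries_per_user * dom_bytes_per_entry / 1_000_000


print(roster_bytes_per_entry)    # 100.0 bytes per entry
print(dom_bytes_per_entry)       # 140.0 bytes per entry
print(dom_total_mb(USERS, 1))    # 35.0 MB: one entry for each of the 250,000 users
print(dom_total_mb(USERS, 20))   # 700.0 MB: twenty entries for each user
```

Read this way, the 35MB is the total for one entry across all 250,000 users (about 140 bytes each), and the 700MB is that same total multiplied by 20 entries per user.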
>
> >
> > (That's a scary number but on the other hand 700MB of RAM is chump
> > change for a company big enough to run a portal this size. Something
> > like $400? And anyway, wouldn't the load be spread across a whole farm
> > of servers, not just one?)
> >
>
> No, because this is connected as a user. How is one user session spread
> out over a farm of machines? You would probably *want* a farm with a
> service this popular, but you can't do that if you connect as a plain
> user.
>
> > If what you're saying is that the present Jabber server is not scalable
> > to this size portal, that's sort of bad news for Jabber, it sounds like,
> > since no large scale provider would adopt it.
>
> I'm saying that you can't take protocol decisions made for an average
> roster size of 10 and scale it to a roster size of a quarter million. I'm
> not saying this is impossible, I'm saying that you have to approach it a
> different way.
>
> Do you really think that normal clients should be allowed to grow to a
> roster of several thousand users?
>
> > In any case, this is a bogus scenario. Everyone seems to keep forgetting
> > that Jabber is supposed to be a distributed system; while there will be
> > large portals with large numbers of users, there will be large numbers
> > of smaller servers, as well as special purpose servers for bots. The
> > likely scenario is that a major bot would run on its own server (or
> > perhaps there would be a small number of bots) hosted by the company
> > that owns it. There would not be any appreciable number of actual users
> > on this server. So there's 25MB for the bot's roster; that's about $20
> > worth of RAM I think. The other side of the overhead is distributed
> > among the host servers of all the subscribers, and has the same effect
> > as all of the subscribers adding one more friend to their buddy list.
>
> If you are giving out a user address as the bot, it is a user; it cannot
> be farmed, it cannot be distributed. Again, it has to be attacked another
> way.
>
> -David Waite
>



