[jdev] Ok, here's my tiny little Perl script

Gaspar, Al (EES) Al.Gaspar at lrn.va.gov
Fri Sep 30 07:14:42 CDT 2005


I'm coming in at the tail end of this thread, but I had problems with UTF-8
and some Microsoft encoding in a Perl script that had nothing to do with
Jabber, and it sounded like my fix could help.  I was querying a SQL database
and writing out an RSS feed.  I ended up using the Encode module.  Here are
the relevant code fragments and comments from that script.  I hope they are
useful.

Cheers--

Al

...

use Encode;           # Module to handle Unicode and UTF-8

...

	#
	# General data--it is possible that the title and description fields
	# could contain data that is WINDOWS-1252 (Microsoft Code Page 1252).
	# decode() translates the WINDOWS-1252 code to the appropriate (Perl
	# internal) UTF8 code.  By decoding these fields we ensure that the
	# proper UTF-8 code is used for these characters rather than any
	# default produced by opening our output files as :utf8 in
	# write_feeds().
	#
	title	=> decode ('WINDOWS-1252', $title),
	description => decode ('WINDOWS-1252', $description),
	link => "http://vaww.sites.lrn.va.gov/vacatalog/cu_detail.asp?id=$id",

...

       #
       # We open the channel to output in UTF-8 encoding.  This eliminates
       # any "Wide character" warnings and ensures that our feeds are
       # completely UTF-8 just in case our decode() back in build_feed()
       # missed something.
       #
       open( CHANNEL,'>:utf8', "Channel$channel.rss") || die "Cannot open file Channel$channel.rss for write: $!";
       print CHANNEL ${"rss_".$channel}->as_string;
       close CHANNEL;
	
...
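
To make the two steps above concrete, here is a minimal, self-contained
sketch of decode-then-write.  It is not taken from the original script: the
sample string and the file name example_channel.rss are made up for
illustration, and it uses lexical filehandles rather than the bareword
CHANNEL handle above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the bytes a database might hand back for a title
# typed in a Microsoft application: curly quotes (0x93, 0x94) and an
# en dash (0x96), all Windows-1252 code points.
my $raw = "\x93Jabber\x94 \x96 news";

# decode() turns the Windows-1252 bytes into a Perl character string.
my $title = decode('WINDOWS-1252', $raw);

# Write the character string through a UTF-8 output layer.
my $file = 'example_channel.rss';    # made-up file name
open(my $out, '>:utf8', $file) or die "Cannot open file $file for write: $!";
print {$out} $title;
close($out) or die "Cannot close file $file: $!";

# Read the raw bytes back to see what actually landed on disk.
open(my $in, '<:raw', $file) or die "Cannot open file $file for read: $!";
my $bytes = do { local $/; <$in> };
close($in);
unlink $file;

# For comparison: without decode(), a UTF-8 layer treats byte 0x93 as the
# Latin-1 control character U+0093 and writes it as C2 93, not a curly quote.
my $undecoded = encode('UTF-8', "\x93");

printf "characters: %d, bytes on disk: %d\n", length($title), length($bytes);
```

The decoded title is 15 characters long but takes 21 bytes on disk, because
each of the three Microsoft characters becomes a three-byte UTF-8 sequence.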

> -----Original Message-----
> From: jdev-bounces at jabber.org [mailto:jdev-bounces at jabber.org] On Behalf
> Of John Talbot
> Sent: Thursday, September 29, 2005 1:18 PM
> To: Jabber software development list
> Subject: Re: [jdev] Ok, here's my tiny little Perl script
> 
> Tijl Houtbeckers wrote:
> > On Thu, 29 Sep 2005 17:57:04 +0200, John Talbot <jtalbot at proionta.gr>
> > wrote:
> >
> >>
> >> That is very surprising. Since Perl probably has nothing to do with the
> >> unicode here, the culprit has to be jabberd then. I'll try to upgrade
> >> (though I use the apt-get system for which the most recent versions
> >> don't always exist).
> >
> > Back up a second there ;) Perl is notorious for being bad with
> > Unicode; whether that's a reputation still deserved I don't know, but it
> > was justified in the past. And afaik never in the history of jabberd
> > has there been such a serious problem with UTF-8 handling.
> 
> You are absolutely right. I tried using another public jabber server
> before installing one myself, and the same malfunction happened.
> I even tried Psi, and that didn't make it work either.
> 
> > I think you should consider first:
> > - is the file UTF-8? (you seem to have this covered)
> 
> Yes.
> 
> > - is your version of Perl configured right to read Unicode UTF-8 files...
> 
> I'm not sure about this... Perl can get configured? During compile time
> you mean? Also I didn't think that Perl could have anything to do with
> this, because the libraries (Net::XMPP::etc and Net::Jabber::etc) don't
> contain the string 'utf8' anywhere, so I was guessing that these
> libraries were just passing whatever data they found inside the <body>
> tags without regard for utf8 compliance... but I guess I was wrong?
> 
> > - *and* to use Unicode for string handling by default
> 
> No, it doesn't have that (and 5.8.6 has that?). It has got to be Perl's
> fault, but perhaps there's a way to avoid installing a second version of
> Perl on my system (I've got to keep the old one too, so many .debs are
> dependent on it in some way) - is this advice you're giving accurate?
> i.e. are newer versions of Perl handling unicode by default? If so,
> maybe I can just plug a 'use utf8;' command or type-in some CLI switch
> and make it all happen...
> 
> And what do you mean by Perl being able to read unicode files right?
> Aren't utf8 files just a series of bytes in Perl's eyes, just like any
> other file?
> 
> > - do you have the most recent Net::Jabber
> 
> Yes, and there are only two versions of Net::Jabber (0.1 and 1.0), so
> everyone has the same one.
> 
> > Not criticizing you or anything, but I'm worried you'll lose a lot of
> > time accomplishing very little trying to find a UTF-8 bug in jabberd.
> > Jabberd's Unicode handling is independent of how your system is
> > set up, how you build it, etc. The same cannot be said for Perl. If
> > you're still not convinced it's not jabberd, try another server (a
> > public one).
> 
> Yes, and thanks, you did save me. Installing jabberd 1.4.4 was the next
> thing I was going to do... but it's Perl's fault actually :-)


