[jdev] Ok, here's my tiny little Perl script
Gaspar, Al (EES)
Al.Gaspar at lrn.va.gov
Fri Sep 30 07:14:42 CDT 2005
I'm kind of at the tail end of this, but I had problems with UTF-8 and some
Microsoft encoding in a Perl script that had nothing to do with Jabber, and
it sounded like my experience could help. I was querying a SQL database and
writing out an RSS feed. I ended up using the Encode module. Here are the
relevant code fragments and comments from that script. I hope they are useful.
Cheers--
Al
...
use Encode; # Module to handle Unicode and UTF-8
...
#
# General data--it is possible that the title and description fields
# could contain data that is WINDOWS-1252 (Microsoft Code Page 1252).
# decode() translates the WINDOWS-1252 code to the appropriate (Perl
# internal) UTF8 code. By decoding these fields we ensure that the
# proper UTF-8 code is used for these characters rather than any
# default produced by opening our output files as :utf8 in
# write_feeds().
#
title => decode ('WINDOWS-1252', $title),
description => decode ('WINDOWS-1252', $description),
link => "http://vaww.sites.lrn.va.gov/vacatalog/cu_detail.asp?id=$id",
...
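To illustrate the decode() step on its own, here is a minimal sketch. The sample string and the variable names are made up for the example and are not from the original script; the only assumption is that the input really is WINDOWS-1252.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: raw WINDOWS-1252 bytes
# (0xE9 = e-acute, 0x93/0x94 = curly double quotes).
my $raw_title = "caf\xE9 \x93quoted\x94";

# decode() turns the WINDOWS-1252 bytes into a Perl character string.
my $title = decode('WINDOWS-1252', $raw_title);

# encode() turns the characters into UTF-8 bytes, as an output layer would.
my $utf8_bytes = encode('UTF-8', $title);

printf "bytes in: %d, characters: %d, UTF-8 bytes out: %d\n",
    length($raw_title), length($title), length($utf8_bytes);
```

Run as-is, this prints "bytes in: 13, characters: 13, UTF-8 bytes out: 18": the three non-ASCII characters grow from one byte each to two or three bytes in UTF-8.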
#
# We open the channel to output in UTF-8 encoding. This eliminates
# any "wide character" warnings and ensures that our feeds are
# completely UTF-8 just in case our decode() back in build_feed()
# missed something.
#
open( CHANNEL, '>:utf8', "Channel$channel.rss" )
    || die "Cannot open file Channel$channel.rss for write: $!";
print CHANNEL ${"rss_".$channel}->as_string;
close CHANNEL;
...
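The output side can be tried in isolation with something like the sketch below. It is not the original write_feeds() code: it assumes a lexical filehandle, the stricter ':encoding(UTF-8)' layer in place of ':utf8', and a temporary file from File::Temp. The feed text is a made-up stand-in.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the RSS text; \x{2019} is a right single quote.
my $feed_text = "It\x{2019}s a feed\n";

# Temporary file used only for this sketch; removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# The layer encodes characters to UTF-8 as they are printed,
# so no "Wide character in print" warning is raised.
open(my $out, '>:encoding(UTF-8)', $path)
    or die "Cannot open file $path for write: $!";
print {$out} $feed_text;
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what was actually written.
open(my $in, '<:raw', $path)
    or die "Cannot open file $path for read: $!";
my $bytes = do { local $/; <$in> };
close $in;

printf "%d characters written as %d bytes\n",
    length($feed_text), length($bytes);
```

The byte count comes out larger than the character count because the single non-ASCII character is written as a multi-byte UTF-8 sequence.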
> -----Original Message-----
> From: jdev-bounces at jabber.org [mailto:jdev-bounces at jabber.org] On Behalf
> Of John Talbot
> Sent: Thursday, September 29, 2005 1:18 PM
> To: Jabber software development list
> Subject: Re: [jdev] Ok, here's my tiny little Perl script
>
> Tijl Houtbeckers wrote:
> > On Thu, 29 Sep 2005 17:57:04 +0200, John Talbot <jtalbot at proionta.gr>
> > wrote:
> >
> >>
> >> That is very surprising. Since Perl probably has nothing to do with the
> >> unicode here, the culprit has to be jabberd then. I'll try to upgrade
> >> (though I use the apt-get system for which the most recent versions
> >> don't always exist).
> >
> > Back up a second there ;) Perl is notorious for being bad with
> > unicode, whether that's a reputation still deserved I don't know, but it
> > was justified in the past. And afaik never in the history of jabberd
> > has there been such a serious problem with UTF-8 handling.
>
> You are absolutely right. I tried using another public jabber server,
> before installing one myself, and the same malfunction happened.
> I even tried Psi, and that didn't make it work either.
>
> > I think you should consider first:
> > - is the file UTF-8? (you seem to have this covered)
>
> Yes.
>
> > - is your version of Perl configured right to read unicode UTF-8 files..
>
> I'm not sure about this... Perl can get configured? During compile time
> you mean? Also I didn't think that Perl could have anything to do with
> this, because the libraries (Net::XMPP::etc and Net::Jabber::etc) don't
> contain the string 'utf8' anywhere, so I was guessing that these
> libraries were just passing whatever data they found inside the <body>
> tags without regard for utf8 compliance... but I guess I was wrong?
>
> > - *and* to use unicode for string handling by default
>
> No, it doesn't have that (and 5.8.6 has that?). It has got to be Perl's
> fault, but perhaps there's a way to avoid installing a second version of
> Perl on my system (I've got to keep the old one too, so many .debs are
> dependent on it in some way) - is this advice you're giving accurate?
> i.e. are newer versions of Perl handling unicode by default? If so,
> maybe I can just plug a 'use utf8;' command or type-in some CLI switch
> and make it all happen...
>
> And what do you mean by Perl being able to read unicode files right?
> Aren't utf8 files just a series of bytes in Perl's eyes, just like any
> other file?
>
> > - do you have the most recent Net::Jabber
>
> Yes, and there are only two versions of Net::Jabber (0.1 and 1.0) so all
> have the same.
>
> > Not criticizing you or anything, but I'm worried you'll lose a lot of
> > time accomplishing very little trying to find a UTF-8 bug in jabberd.
> > Jabberd's unicode handling is independent of how your system is
> > set up, how you build it, etc. The same cannot be said for Perl. If
> > you're still not convinced it's not jabberd, try another server (a
> > public one).
>
> Yes, and thanks, you did save me. Installing jabberd 1.4.4 was the next
> thing I was going to do... but it's Perl's fault actually :-)
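On the quoted question of whether UTF-8 files are "just a series of bytes in Perl's eyes": without a decoding layer or an explicit decode(), that is how Perl treats the data, which is why string functions can give surprising results. A minimal sketch, with a made-up two-letter Greek sample standing in for data read from a file or socket:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two Greek letters (alpha, beta) as UTF-8 bytes, as they would arrive
# from a file or socket with no decoding applied. Hypothetical sample.
my $bytes = "\xCE\xB1\xCE\xB2";

# Undecoded, Perl sees four one-byte characters.
my $byte_count = length($bytes);

# Decoded, Perl sees two characters.
my $chars = decode('UTF-8', $bytes);
my $char_count = length($chars);

print "undecoded length: $byte_count, decoded length: $char_count\n";
```

This prints "undecoded length: 4, decoded length: 2". Note that 'use utf8;' only declares that the script's own source text is UTF-8; it does not decode data read at run time.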