[JDEV] utf8

Glen jdev at empireenterprises.com
Thu Dec 4 10:48:49 CST 2003


Dude!  

Finally figured out this utf8 perl nonsense.  

As it turns out, perl can store its strings internally as utf8; however,
there is a utf8 "flag" on each string that is not turned on by default.
Raw bytes (like the content LWP hands back) stay unflagged until they
are decoded.

I was using the is_utf8 function in the Encode module to test whether
the string was utf8, but it only checks that flag, not the actual bytes,
and the flag has to be set explicitly (e.g. by decoding).  >:(
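Here's a minimal sketch of the difference. It assumes Perl 5.8 or later with the core Encode module, and `looks_like_utf8` is a hypothetical helper name of mine, not part of Encode. It shows that `is_utf8` reports the flag rather than whether the bytes are valid UTF-8, and checks validity instead by attempting a strict decode on a copy:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Raw bytes as they might arrive from LWP: "café" encoded as UTF-8.
my $bytes = "caf\xc3\xa9";
printf "raw bytes: is_utf8=%d, length=%d\n", (is_utf8($bytes) ? 1 : 0), length $bytes;

# Decoding turns the bytes into a character string and sets the flag.
my $chars = decode("UTF-8", $bytes);
printf "decoded:   is_utf8=%d, length=%d\n", (is_utf8($chars) ? 1 : 0), length $chars;

# Hypothetical helper: checks whether the *bytes* are valid UTF-8,
# regardless of the flag, by attempting a strict decode on a copy.
sub looks_like_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK value may modify its input
    return eval { decode("UTF-8", $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

printf "valid UTF-8? %d\n", looks_like_utf8($bytes);            # C3 A9 is a valid sequence
printf "valid UTF-8? %d\n", looks_like_utf8("caf\xe9 au lait");  # lone E9 (Latin-1 é) is not
```

Run as-is, it should report is_utf8=0 for the raw bytes (length 5) and is_utf8=1 after decoding (length 4), even though both hold the same text.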

Basically, since I'm getting my content through LWP, I'm checking the
content for the character set.  I search for /charset=utf-?8/i; if that
doesn't match, I convert to UTF-8 using the Encode module:

use Encode qw(encode);
# encode() treats each byte of an unflagged string as a Latin-1 character
$string = encode("UTF-8", $string);
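Calling `encode` directly on unflagged bytes treats each byte as a Latin-1 character, which is why it happens to work for ISO-8859-1 content. A slightly more careful sketch decodes from whatever charset was declared first. It assumes the charset comes from the Content-Type header and defaults to ISO-8859-1 when none is given (both assumptions on my part), and `body_to_utf8` is a hypothetical name. Note that `decode` dies on a charset name Encode doesn't know:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: given a Content-Type header value and the raw body
# bytes, return the body as UTF-8 bytes. Assumes ISO-8859-1 when no charset
# is declared (an assumption, not something LWP does for you).
sub body_to_utf8 {
    my ($content_type, $body) = @_;
    my ($charset) = $content_type =~ /charset\s*=\s*"?([\w.:-]+)"?/i;
    $charset = 'ISO-8859-1' unless defined $charset;

    # Already UTF-8: pass the bytes through untouched to avoid double-encoding.
    return $body if $charset =~ /^utf-?8$/i;

    # Otherwise decode from the declared charset into characters,
    # then encode those characters as UTF-8 bytes.
    my $chars = decode($charset, $body);
    return encode("UTF-8", $chars);
}

my $latin1 = "caf\xe9";    # "café" in ISO-8859-1
my $utf8   = body_to_utf8("text/plain; charset=ISO-8859-1", $latin1);
printf "%vX\n", $utf8;     # hex of each byte: 63.61.66.C3.A9
```

Content already declared as UTF-8 is returned unchanged, so it never gets encoded twice.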

My code was previously crashing as well whenever it received a funky
character that ISO-8859-1 didn't recognize, but this has taken care of
it.

Hope this helps you...

-g

--search keywords: 
utf8
perl
fixed
help
--






On Wed, 2003-12-03 at 22:22, Jeremy Nickurak wrote:
> > On Mon, 2003-12-01 at 20:48, jdev at empireenterprises.com wrote:
> > I found the Encode module, which includes a utf8 checking function, "is_utf8". 
> > According to this, my utf8 conversion functions are not working properly, as
> > is_utf8 is always returning false whenever I get content from LWP::UserAgent. 
> > 
> > 
> > I've tried using both Unicode::MapUTF8 & Encode modules, to no avail.  I'll keep
> > looking for perl utf8 information.  
> > 
> > -g
> > 
> > Quoting Nicholas Perez <nick at jabberstudio.org>:
> > 
> > > Depending on your Perl version, all strings should already be unicode 
> > > enabled. You should `man perluniintro` or `man perlunicode` for further 
> > > information.
> > > 
> > > 
> > > Glen wrote:
> > > 
> > > >Hmm.  
> > > >Any ideas on how I would determine whether a string is UTF-8 encoded or
> > > >not?  
> > > >
> > > >-g
> > > >
> > > >
> > > >
> > > >On Mon, 2003-12-01 at 18:19, Justin Karneges wrote:
> > > >  
> > > >
> > > >>Make sure you don't double-encode your data.  Your XML library probably 
> > > >>supports unicode already, and so there should be no need to explicitly
> > > >>encode anything yourself.
> > > >>
> > > >>-Justin
> > > >>
> > > >>On Monday 01 December 2003 02:27 pm, Glen wrote:
> > > >>    
> > > >>
> > > >>>general public,
> > > >>>
> > > >>>I'm attempting to send multiple languages in a jabber message.
> > > >>>I'm using Net::Jabber to send, & I'm encoding content into UTF-8 with
> > > >>>Unicode::MapUTF8; however, I'm receiving gibberish in the client.
> > > >>>
> > > >>>I don't know much about Unicode, but from what I understand, there isn't
> > > >>>much to it.  My client (PSI on linux) supposedly supports UTF-8 - is
> > > >>>there something that I'm missing, or is there a direction anyone can
> > > >>>point me in?
> > > >>>
> > > >>>-g
> > > >>>
> > > >>>
> > > >>>
> > > >>>_______________________________________________
> > > >>>jdev mailing list
> > > >>>jdev at jabber.org
> > > >>>http://mailman.jabber.org/listinfo/jdev
> > > >>>      
> > > >>>
> > > >
> > > 
> > 
> > 
> > 
> > 
> 
> I had no end of problems with UTF8 in perl writing janchor. I never did
> find any solutions, unfortunately. If you ever do find a solution, I'd
> be very interested in hearing it, as it's still a constant problem that
> causes crashes frequently.
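Justin's double-encoding warning in the quoted thread is easy to reproduce, and it produces exactly the kind of gibberish described at the start of the thread. A minimal sketch using only the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "café" as UTF-8 bytes, i.e. already encoded once.
my $once = encode("UTF-8", "caf\x{e9}");

# Encoding again treats each of those bytes as a separate Latin-1
# character, so C3 A9 becomes C3 83 C2 A9 (shown by clients as "Ã©").
my $twice = encode("UTF-8", $once);

printf "once:  %vX\n", $once;     # 63.61.66.C3.A9
printf "twice: %vX\n", $twice;    # 63.61.66.C3.83.C2.A9
```

So if the XML library already encodes outgoing text, handing it bytes that are already UTF-8 gets them encoded a second time.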



