[JDEV] look for help about unicode in jabber system
Dave
dave at dave.tj
Sun Aug 18 08:55:20 CDT 2002
Making a typedef isn't "extending" the language ... come on. C was
designed even before Unicode's dad (ASCII) became the de facto standard.
There's no way you can expect a built-in type to work with Unicode chars
(although you could use an int array, with an appropriate output filter).
It's worth noting, incidentally, that UTF-8 is substantially more compact
for most machine-carried text in today's world (although when China gets
with it and embraces Capitalism that may change), and it never produces a
zero byte except for U+0000 itself, so using NUL as a string terminator
ain't too bad, after all, anyway.
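
A minimal sketch of the "int array with an output filter" idea, assuming
the code points are held in a 0-terminated uint32_t array. The names
utf8_encode and utf8_from_codepoints are hypothetical, not part of any
standard or Jabber library. The result is an ordinary NUL-terminated C
string holding UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out (room for
   4 bytes). Returns the number of bytes written, or 0 if cp cannot be
   encoded (a surrogate, or above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 0xxxxxxx: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx plus three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* The "output filter" (hypothetical name): turn a 0-terminated array of
   code points into a NUL-terminated UTF-8 string. Returns the byte count
   excluding the terminator, or (size_t)-1 if dst is too small or a code
   point is invalid. */
static size_t utf8_from_codepoints(const uint32_t *src, char *dst, size_t dst_size)
{
    unsigned char *out = (unsigned char *)dst;
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != 0; src++) {
        unsigned char tmp[4];
        size_t n = utf8_encode(*src, tmp);
        if (n == 0 || used + n >= dst_size)    /* keep one byte for the NUL */
            return (size_t)-1;
        for (size_t i = 0; i < n; i++)
            out[used++] = tmp[i];
    }
    if (used >= dst_size)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Every byte of a multi-byte sequence has its high bit set, so the only zero
byte in the output is the terminator, and strlen() and friends keep working
on the result.
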
I've never worked on an OpenVMS Alpha, so I have no clue what kind of
supercompilers they've got. All I know is that the conventional x86
and Apple compilers I worked with didn't know the meaning of the word
"optimize." If there's anything I like about PASCAL that I felt was
missing in C, it'd have to be nested functions (which GNU CC supports,
anyway).
I guess it's useful to have strict interfaces documented (and maybe even
have the compiler warn by default when you violate them), but I hate
compilers that try to prevent you from doing what they think amounts to
shooting yourself in the foot. (They're close cousins of Word processors
that don't let you do what they think you don't want to do.)
- Dave
Timothy Carpenter wrote:
>
> Apologies about the <NULL> tag jibe - it was late and I have never forgiven
> C for having to be extended to get around what I saw as ox-headed string
> handling.
>
> Well, I cannot speak for other ex-PASCAL programmers, but when I used it on
> 64-bit OpenVMS Alphas we had awareness of 64-bit processing, quad
> pipelining, hits due to call stacks, local and remote jumps, indirection,
> L1&2 cache behaviour, register use, soft and hard page faults and compiler
> optimisation strengths and weaknesses. The AXP compiler was red hot and took
> care to make fast code out of PASCAL and C alike. I dug in to the assembler
> to see how on occasion and to compare programming styles for future
> reference.
>
> So, PASCAL programmers concerned with efficiency did exist, as now do
> reliable and robust C programmers, which I notice in abundance here, in the
> Jabber world (and why I feel at home).
>
> <crosspost type="Warning" list="jig">
> I see the problems of UTF-8 and binary headers as very similar - both are
> bit-packed conditionally-sized data. Thus, if we can handle UTF-8 properly,
> we can handle binary headers properly. It is up to awareness in design to
> avoid placing data across obvious boundaries. I would even go so far as to
> say that we need to be careful of embedded devices, so assuming 64-bit
> registers may still be optimistic at this time.
>
> My admittedly crude point about PASCAL vs. C was we should seek out and use
> systematic and 'tight' practices, e.g. interfaces, strong typing or
> libraries.
> </crosspost>
>
> Tim
>
>
> On 16/08/2002 10:50 pm, "Dave" <dave at dave.tj> wrote:
>
> > C doesn't require NUL-terminated strings. It's just that the standard
> > C string library assumes that strings end in NUL (since that method's
> > proven to be very effective for many applications). There are plenty
> > of counted-string (length-prefixed) libraries for C, and because
> > strings aren't built into the language, those libraries can be every
> > bit as efficient as
> > the standard C routines (but then again, PASCAL people don't really
> > care much about efficiency, anyway ... if they did, they wouldn't be
> > PASCAL programmers, now, would they?). If anything, one of C's sons
> > (that bastard created by Mr. Stroustrup) makes it ridiculously easy
> > to use Unicode in the full UCS-4 format (or any of the other formats,
> > for that matter), by creating a new character data type, and using the
> > should've-been-in-STL basic_string template with that new UCS32Char type.
> > If you'd prefer to avoid leaving C (a very wise choice, IMHO), you can
> > use a wchar_t array ... or you can just stick with the extraordinarily
> > simple (and very compatible) UTF-8 :-)
> >
> > As for alignment of structure elements, anything like that is guaranteed
> > to cause portability headaches. If you really want to do it in C, you can
> > either fake it using character arrays, or use an inline assembly block.
> > Be aware that neither C nor PASCAL provides sufficient portability
> > when you try to do that kind of stuff, because that requirement by
> > definition violates any hopes of portability (which is not necessarily
> > bad, but it's worth considering nonetheless). Also, the primary reason
> > for system-dependent alignment is efficiency. If your 64-bit CPU has
> > to fetch two separate 64-bit words just to get a 2-bit value, you're
> > losing lots of potential speed.
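
One way to read "fake it using character arrays" is to build the packed
layout by hand with shifts and masks, so the byte order and bit positions
no longer depend on the compiler. The sketch below assumes a made-up 16-bit
header (4-bit version, 2-bit flags, 10-bit length, most significant bit
first); the layout and the names header_pack and header_unpack are
hypothetical and not taken from any Jabber specification:

```c
#include <assert.h>
#include <stdint.h>

/* Unpacked, compiler-aligned view of the hypothetical header. */
struct header {
    unsigned version;   /* 4 bits on the wire  */
    unsigned flags;     /* 2 bits on the wire  */
    unsigned length;    /* 10 bits on the wire */
};

/* Pack the fields into two bytes: vvvvffll llllllll. */
static void header_pack(const struct header *h, unsigned char out[2])
{
    assert(h->version < 16 && h->flags < 4 && h->length < 1024);
    uint16_t word = (uint16_t)((h->version << 12) | (h->flags << 10) | h->length);
    out[0] = (unsigned char)(word >> 8);      /* high byte first */
    out[1] = (unsigned char)(word & 0xFF);
}

/* Reverse of header_pack: rebuild the word, then mask each field out. */
static struct header header_unpack(const unsigned char in[2])
{
    uint16_t word = (uint16_t)((in[0] << 8) | in[1]);
    struct header h;
    h.version = word >> 12;
    h.flags   = (word >> 10) & 0x3;
    h.length  = word & 0x3FF;
    return h;
}
```

The in-memory struct stays naturally aligned for speed, and only the two
wire bytes are bit-packed, which keeps the portability cost in one place.
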
> >
> > - Dave
> >
> >
> > Timothy Carpenter wrote:
> >>
> >> I do not think CHAR to UNICODE is the answer. CHAR is 8 bit, but UTF-8 is a
> >> way of sending UNICODE without breaking 'text' streams with data that looks
> >> like CR, LF, EOF, EOLN, etc. RCSU is another mechanism that makes very
> >> intelligent use of packing, processing and compromising between ASCII and
> >> full 16-bit character sets, but I cannot recall if this protects text stream
> >> handlers from shocks. UTF-8 is less compact, but simpler, with no sliding
> >> windows.
> >>
> >> To convert is not a huge task, to my memory - just a little masking and bit
> >> shuffling...shame no one uses PASCAL, as apart from not using <NULL> end
> >> tags for strings (yeah!), you can define structures to have conditional
> >> contents nailed down to the bit position, and even crossing
> >> byte/word/longword boundaries. Thus the data slots in without too much math
> >> nonsense all over the place.
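
The "masking and bit shuffling" for the receiving side might look roughly
like this in C. It is a sketch under the assumption that the input is a byte
buffer of known length; utf8_decode is a hypothetical name, and the function
rejects malformed, overlong and surrogate sequences rather than trying to
repair them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes
   available) into *cp. Returns the number of bytes consumed, or 0 if the
   sequence is truncated or malformed. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t value, min;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    /* The lead byte says how long the sequence is and carries the top bits. */
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; value = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; value = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; value = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                      /* stray continuation or invalid byte */
    }
    if (n < len)
        return 0;

    /* Each continuation byte is 10xxxxxx and contributes six more bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        return 0;

    *cp = value;
    return len;
}
```
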
> >>
> >> Maybe this is why many C programmers quail at the thought of binary
> >> bit-packed headers and say they are unmaintainable. They probably are...in
> >> C. ;-)
> >>
> >> Tim
> >>
> >> On 17/08/2002 12:38 pm, "ÕÅ Æé" <jabberjaist at hotmail.com> wrote:
> >>
> >>> Does the Jabber system support East Asian glyphs (Chinese, Japanese
> >>> and Korean)? I want my Jabber server to support East Asian Unicode
> >>> text, but I have run into trouble. My friend told me the following.
> >>> Is it right, or is there a better way to resolve the problem?
> >>>
> >>>
> >>> - Jabber uses UTF-8 encoding.
> >>> - We have not been facing any problems because we have been operating in
> >>>   the ASCII domain, which is a subset of UTF-8.
> >>> - We need to find some kind of encoding algorithm/API which converts
> >>>   Unicode to UTF-8 before we send out strings to the server, and some kind
> >>>   of decoding algorithm/API which does the opposite when we receive strings.
> >>> - We need some kind of rendering mechanism to make the mapping from
> >>>   Unicode to the actual character.
> >>>
> >>> - There are a couple of Microsoft APIs called MultiByteToWideChar and
> >>>   WideCharToMultiByte.
> >>> - There is an MLang API from Microsoft which has functions like
> >>>   ConvertStringToUnicode and ConvertStringFromUnicode (I think this is our
> >>>   best bet. If we read this thoroughly we might be able to solve the problem.)
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> jdev mailing list
> >>> jdev at jabber.org
> >>> http://mailman.jabber.org/listinfo/jdev
> >>
> >> __________________________________________________
> >> Do You Yahoo!?
> >> Everything you'll ever need on one web page
> >> from News and Sport to Email and Music Charts
> >> http://uk.my.yahoo.com
> >>
> >
>
>