[jdev] What to do with Invalid XML Characters
Matthias Wimmer
m at tthias.eu
Sun Aug 12 08:05:43 CDT 2007
Norman Rasmussen wrote:
> XML defines the list of valid characters to be:
> #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>
> Which of the following should an IM application perform if the user
> (attempts to) enter characters outside of this range?
What might the user enter outside this range? I guess that the user is
not able to accidentally enter characters outside this range.
> 1) Reject the entry at the UI level - have to check both keypresses,
> and clipboard paste
> 2) UI should filter invalid chars before sending data to xmpp object layer
I'd check for invalid characters in the backend methods that take data
from the UI into the application backend. But I would not filter; I would
reject any function/method call whose arguments contain invalid characters.
This lets you reuse the check if your UI changes, and it keeps the
backend, which will most likely represent the data in an XML-DOM-like
manner, in a state where only characters allowed by XML are present.
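A minimal sketch of such a backend-side check in Python, assuming the backend receives text as ordinary Unicode strings. The function name check_xml_chars is hypothetical; the character class is the XML 1.0 Char production quoted above, and the function rejects rather than filters:

```python
import re

# Matches any single character that is NOT allowed by the XML 1.0
# Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
# | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml_chars(text):
    """Return text unchanged, or raise ValueError on the first invalid character.

    Hypothetical helper: a backend method would call this on every string
    argument before storing it in its XML-DOM-like representation.
    """
    match = _INVALID_XML10.search(text)
    if match is not None:
        raise ValueError(
            f"invalid XML 1.0 character U+{ord(match.group()):04X} "
            f"at index {match.start()}"
        )
    return text
```

Because the check raises instead of silently dropping characters, the UI can decide how to report the problem to the user.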
> 3) xmpp object layer should filter/reject data
> 4) xmpp stream layer should filter/reject xmpp object
An alternative way to handle the characters from #x0 - #x1F
(excluding #x9, #xA and #xD) is to substitute them with the characters
from #x2400 - #x241F (the Unicode Control Pictures block).
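The substitution could be sketched like this in Python. The function name substitute_controls is hypothetical, and the sketch only covers the control characters mentioned above; other invalid characters (surrogates, #xFFFE, #xFFFF) would still need separate handling:

```python
# Map each control character in #x0 - #x1F, except tab (#x9),
# line feed (#xA) and carriage return (#xD), to the character at
# the same offset in the #x2400 - #x241F range.
_CONTROL_SUBSTITUTES = {
    cp: 0x2400 + cp
    for cp in range(0x20)
    if cp not in (0x09, 0x0A, 0x0D)
}


def substitute_controls(text):
    """Replace disallowed C0 control characters with visible stand-ins."""
    return text.translate(_CONTROL_SUBSTITUTES)
```

Note that this changes the data: the receiver sees the substitute characters, not the original control characters, unless it maps them back.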
... or you could use XML 1.1, where the set of allowed characters is less
restrictive:
Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
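For comparison, a rough Python sketch of a check against the XML 1.1 Char production quoted above. The function name is_valid_xml11_text is hypothetical. The sketch tests only the Char production; XML 1.1 additionally requires most control characters to be written as character references rather than literally, which is not checked here:

```python
import re

# Matches any single character outside the XML 1.1 Char production:
# [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML11 = re.compile(
    r"[^\x01-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_valid_xml11_text(text):
    """Return True if every character is in the XML 1.1 Char production."""
    return _INVALID_XML11.search(text) is None
```

Under this production #x1 is accepted, while #x0, the surrogates, #xFFFE and #xFFFF are still rejected.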
But I am not sure whether XMPP allows the use of XML 1.1. I could not find
anything on that at my first look at RFC3920 / RFC3920bis. It seems to
be undefined.
Matthias