[jdev] What to do with Invalid XML Characters

Matthias Wimmer m at tthias.eu
Sun Aug 12 08:05:43 CDT 2007


Norman Rasmussen wrote:
> XML defines the list of valid characters to be:
>    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> 
> Which of the following should an IM application perform if the user
> (attempts to) enter characters outside of this range?

What might the user enter outside this range? I would guess that a user
cannot accidentally enter characters outside this range.

> 1) Reject the entry at the UI level - have to check both keypresses,
> and clipboard paste
> 2) UI should filter invalid chars before sending data to xmpp object layer

I'd check for invalid characters when converting data from the UI to the
application backend, in the methods of the backend. But I would not
filter; I would reject any function/method call whose data contains
invalid characters.

This lets you reuse the check if your UI changes, and it keeps the
backend, which will most likely represent the data in an XML-DOM-like
manner, in a state where only characters allowed by XML are present.
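
A minimal sketch of such a rejecting check, in Python. The function name
require_valid_xml_text is hypothetical and not part of any XMPP library;
the character class is the XML 1.0 Char production quoted above.

```python
import re

# Matches any single character that is NOT allowed by the XML 1.0
# Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
# | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def require_valid_xml_text(text: str) -> str:
    """Return text unchanged, or raise ValueError if it contains a
    character that XML 1.0 does not allow (hypothetical helper)."""
    match = _INVALID_XML10.search(text)
    if match:
        raise ValueError(
            "invalid XML character U+%04X at index %d"
            % (ord(match.group()), match.start())
        )
    return text
```

A backend setter could call this on every string it receives from the UI
and let the exception propagate, so the UI decides how to report the
problem to the user.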

> 3) xmpp object layer should filter/reject data
> 4) xmpp stream layer should filter/reject xmpp object

An alternative way to handle the characters from #x0 - #x1F
(excluding #x9, #xA and #xD) is to substitute them with the characters
from #x2400 - #x241F.
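
The substitution above could be sketched like this in Python. The name
to_control_pictures is hypothetical. Each control character is mapped to
the code point 0x2400 higher, and tab, line feed and carriage return are
left alone because XML already allows them.

```python
# Map every C0 control character except #x9, #xA and #xD to the
# character 0x2400 code points above it (#x0 -> #x2400, ... #x1F -> #x241F).
_CONTROL_PICTURES = {
    cp: cp + 0x2400 for cp in range(0x20) if cp not in (0x9, 0xA, 0xD)
}


def to_control_pictures(text: str) -> str:
    """Replace disallowed control characters with visible stand-ins
    (hypothetical helper)."""
    return text.translate(_CONTROL_PICTURES)
```

Note that this changes the data: the receiver sees the substitute
characters, not the original control characters, unless it reverses the
mapping.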

... or you could use XML 1.1, where the set of allowed characters is
less restrictive:

   Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
   /* any Unicode character, excluding the surrogate blocks, FFFE, and
      FFFF. */
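
One caveat if XML 1.1 were an option: its RestrictedChar production
still requires most control characters to be written as character
references rather than literally, and #x0 remains forbidden. A rough
Python sketch of that escaping step follows. The name
escape_restricted_xml11 is hypothetical, and the function assumes that
the usual escaping of markup characters such as & and < happens
elsewhere.

```python
import re

# XML 1.1 RestrictedChar:
# [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
_RESTRICTED_XML11 = re.compile(r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]")


def escape_restricted_xml11(text: str) -> str:
    """Write restricted characters as hexadecimal character references
    (hypothetical helper; does not escape & or <)."""
    return _RESTRICTED_XML11.sub(
        lambda m: "&#x%X;" % ord(m.group()), text
    )
```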

But I am not sure whether XMPP allows the use of XML 1.1. I could not
find anything on that at first glance in RFC3920 / RFC3920bis. It seems
to be undefined.


Matthias


