RE: RE: [xml] Parsing invalid characters or entities ref

I agree with you that an application that takes a "bugged" XML
document as input MUST inform the user that the document is corrupted (what
you call "yelling").
But some non-critical applications could give the user the choice to
process the XML despite the "character error".

For example: an XML editor.
If the invalid character is in CharData (cf. the XML spec), the XML document
could be considered well-formed.
In the case of an XML editor, the choice is: do we force the user to
correct the document beforehand in a text editor, or afterwards in the XML
editor that does the parsing?

Another example: a database that takes an XML description of the data to be
imported.
If the data are identifiable and the invalid characters are in the
information values, the question is whether to let the user import the data
and change the values afterwards, or to correct the XML document in a text
editor first.

More examples: I can't think of any, but... ;)

Reminder (from the XML 1.0 spec):
[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
Where are the invalid characters specified, other than in the encoding
specs (ASCII, UTF-8, etc.)?
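For reference, XML 1.0 lists the allowed characters, independently of the encoding, in production [2] Char: `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. As a minimal Python sketch of how an editor or importer might locate the offending characters and show them to the user instead of just refusing the file, consider the following. The helpers `is_xml_char` and `find_invalid_chars` are hypothetical names for illustration, not part of libxml or any other library, and the input is assumed to be already decoded into a Python string.

```python
# Production [2] Char from XML 1.0:
# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def is_xml_char(cp: int) -> bool:
    """Return True if the code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_chars(text: str) -> list[tuple[int, int, int]]:
    """Return (line, column, code point) for every character that is not a Char.

    Lines and columns are 1-based. Hypothetical helper: an editor could use
    this list to point the user at the broken spots before (or instead of)
    rejecting the document.
    """
    problems = []
    line, col = 1, 1
    for ch in text:
        cp = ord(ch)
        if not is_xml_char(cp):
            problems.append((line, col, cp))
        # Track the position so the report can be shown to the user.
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
    return problems


if __name__ == "__main__":
    doc = "<note>\n  <body>bad\x01char</body>\n</note>\n"
    for line, col, cp in find_invalid_chars(doc):
        print(f"line {line}, column {col}: invalid character U+{cp:04X}")
```

On the sample document this prints `line 2, column 12: invalid character U+0001`, which is the kind of report an editor or database importer could present before letting the user decide what to do.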

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Wednesday, November 5, 2003 11:48
To: Morus Walter
Cc: xml gnome org
Subject: Re: RE: [xml] Parsing invalid characters or entities ref

On Wed, Nov 05, 2003 at 11:42:15AM +0100, Morus Walter wrote:
> Daniel Veillard writes:
> > On Wed, Nov 05, 2003 at 11:11:37AM +0100, GARNIER Pierre wrote:
> > > In fact I was wondering whether it would be appropriate to offer,
> > > in the APIs (as an option), the possibility to replace invalid
> > > characters, where possible, with a given replacement character
> > > and so continue parsing.
> >
> >   Well, clearly this violates the behaviour stated by the XML spec.
> > So I don't think it's proper to add this for any kind of general
>
> Couldn't he just use the I/O layer to do some prefiltering? That would
> remove the problem before the parser sees it.

  Or the encoding layer; yes, those would be two possibilities. But in the
face of broken XML, the right thing to do 99% of the time is to yell at
whoever provided something broken so that they fix the problem on their side.
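The prefiltering idea can be sketched outside the parser: clean the text before the parser ever sees it, replacing each character that fails the XML 1.0 Char production with a replacement character, and count the replacements so the application can still warn the user. This is an assumption-laden illustration using Python's standard `xml.etree.ElementTree` fed chunk by chunk, not libxml's own I/O or encoding hooks. The names `sanitize_chunk` and `parse_leniently`, and the choice of U+FFFD as the replacement, are hypothetical. Note that such a filter also rewrites characters inside markup, not only in CharData.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Assumption: U+FFFD is used as the replacement; it is itself a valid XML Char.
REPLACEMENT = "\uFFFD"


def is_xml_char(cp: int) -> bool:
    """XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_chunk(chunk: str, replacement: str = REPLACEMENT) -> Tuple[str, int]:
    """Replace every non-Char code point in chunk; return (clean text, count)."""
    count = 0
    out = []
    for ch in chunk:
        if is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append(replacement)
            count += 1
    return "".join(out), count


def parse_leniently(chunks: Iterable[str]) -> Tuple[ET.Element, int]:
    """Pass already-decoded text chunks through the filter into the parser.

    Returns the root element and the number of characters replaced, so the
    caller can still tell the user that the input was broken.
    """
    parser = ET.XMLParser()
    replaced = 0
    for chunk in chunks:
        clean, n = sanitize_chunk(chunk)
        replaced += n
        parser.feed(clean)  # the parser only ever sees filtered text
    return parser.close(), replaced


if __name__ == "__main__":
    broken = ["<item><name>AB", "\x01C</name></item>"]
    root, replaced = parse_leniently(broken)
    if replaced:
        print(f"warning: replaced {replaced} invalid character(s)")
    # ascii() keeps the output printable on any console.
    print(ascii(root.find("name").text))
```

Returning the replacement count, rather than hiding it, keeps the "yell at whoever provided something broken" option open even when the application chooses to continue.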


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
xml mailing list, project page xml gnome org
