[xml] Problem with an old SGML



I have a problem with an old SGML file.
It has many format errors; the two most annoying are:

1- It has more than one root element:
     //Start of file
     <reuters> ........ </reuters>
     <reuters> ........ </reuters>
etc.
This file is 7 years old, but I need to parse it :(
Is there a way to parse it without adding a wrapper node from the start
of the file to the end?
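
To make the question concrete, here is a minimal sketch of one possible workaround that avoids a wrapper node: cut the text into separate <reuters> blocks and parse each one as its own document. It is a preprocessing idea in Python's standard library, not a libxml2 feature. The sample string and the <title> child are hypothetical, and it assumes <reuters> elements are never nested.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the real file contents.
data = (
    "<reuters><title>A</title></reuters>\n"
    "<reuters><title>B</title></reuters>\n"
)

# Match each top-level <reuters>...</reuters> block.
# Assumption: <reuters> elements are never nested inside each other.
RECORD = re.compile(r"<reuters\b.*?</reuters>", re.DOTALL)

def iter_records(text):
    """Yield one parsed element per <reuters> block found in text."""
    for match in RECORD.finditer(text):
        yield ET.fromstring(match.group(0))

titles = [rec.findtext("title") for rec in iter_records(data)]
```

Each block becomes a small well-formed document of its own, so a broken record only affects itself and not the rest of the file.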

2- There are also some character references like &#31; that obviously are
not recognised and generate errors. Is there a way to avoid the errors
and make the parser treat them as TEXT, either by skipping the call to
xmlParseCharRef or by making that function not generate an error? (I
haven't found an option for this ^_^)
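
For illustration, the behaviour I am after could be approximated by a preprocessing pass like the sketch below. It is only an assumption about one way to do it, written with Python's standard library rather than libxml2: any numeric reference to a code point outside the XML 1.0 Char range gets its ampersand escaped, so the parser sees it as plain text. The helper names and the sample string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Numeric character references: decimal (&#31;) or hexadecimal (&#x1F;).
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

def is_xml_char(cp):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def escape_bad_refs(text):
    """Turn references to disallowed characters into literal text."""
    def fix(match):
        body = match.group(1)
        cp = int(body[1:], 16) if body[0] == "x" else int(body)
        if is_xml_char(cp):
            return match.group(0)  # valid reference, leave untouched
        # Escape the ampersand so "&#31;" survives as the text "&#31;".
        return "&amp;" + match.group(0)[1:]
    return CHAR_REF.sub(fix, text)

# Hypothetical sample: one invalid reference and one valid one.
sample = "<reuters>a&#31;b &#65;</reuters>"
cleaned = escape_bad_refs(sample)
root = ET.fromstring(cleaned)
```

Valid references such as &#65; are still resolved by the parser, while the invalid one ends up in the text content unchanged.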

Thanks in advance, and sorry for my English :)


