Re: [xml] Problem with an old SGML



On Tue, 2005-11-08 at 00:44 +0100, Kail wrote:
I have a problem with an old SGML file.
It has many format errors; the two most annoying are:

1- It has more than one element at the root level

SGML does not allow this.

     //Start of file
     <reuters> ........ </reuters>
     <reuters> ........ </reuters>

As Daniel has suggested, this must be an external entity.
You are missing a main or "driver" file.
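
A rough sketch of what such a driver file could look like, generated
from Python. This assumes the data has already been converted to
well-formed XML; the root name "collection", the entity name "body" and
the file name "reuters.sgm" are all made up for illustration and are not
taken from the original data set.

```python
def make_driver(entity_path, root="collection"):
    """Return the text of a driver document that pulls in entity_path.

    The root element name and the entity name "body" are hypothetical.
    """
    return (
        f"<!DOCTYPE {root} [\n"
        f'  <!ENTITY body SYSTEM "{entity_path}">\n'
        "]>\n"
        f"<{root}>&body;</{root}>\n"
    )


# "reuters.sgm" is a placeholder for the file holding the <reuters> elements.
driver_text = make_driver("reuters.sgm")
print(driver_text)
```

The data file itself stays untouched: the single root element lives only
in the driver, and the parser has to be told to load and substitute
external entities for the content to show up in the tree.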

etc.
This file is 7 years old, but I need to parse it :(

Maybe use osx to convert it to XML -- it's part of
OpenJade I think these days.

Is there a way to parse it without adding a node that wraps the file
from start to end?

2- There are also some character references like &#31; that obviously
are not recognised and generate errors. Is there a way to avoid the
errors and make the parser treat them as text, either by avoiding the
call to xmlParseCharRef or by making that function not generate an
error? (I haven't found an option for this ^_^)

There should be an SGML Declaration which says which characters
are allowed in that SGML document.  It's often considered to be
part of the SGML DTD.
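
If the file ends up being fed to an XML parser anyway, one possible
workaround (not something the parser offers as an option, as far as I
know) is to pre-process the text so that references to characters XML
1.0 does not allow, such as &#31;, come through as literal text. The
sketch below is only that: the function names are made up, and it
assumes the input is already decoded to a string.

```python
import re

# Matches decimal (&#31;) and hexadecimal (&#x1F;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml_char(cp):
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_bad_char_refs(text):
    """Escape the '&' of references to characters XML 1.0 forbids.

    Valid references are left alone; invalid ones such as &#31; become
    &amp;#31; so that a parser sees them as ordinary text.
    """
    def fix(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_xml_char(cp):
            return match.group(0)
        return "&amp;" + match.group(0)[1:]

    return CHAR_REF.sub(fix, text)


print(escape_bad_char_refs("price&#31;up &#65;"))
```

This changes the data, of course: the reference is kept as visible text
rather than as the control character it named.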

Typically you give something like osx the SGML declaration, the
DTD file, and the document, all in one stream.
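
A small sketch of the "all in one stream" idea, in Python. It only
builds the stream; how it is then handed to osx (standard input, or the
three file names in this order on the command line) is left out, since
that depends on the installation. The function name and the placeholder
contents are assumptions, and the DTD part is assumed to already carry
its own DOCTYPE wrapper.

```python
def build_stream(*chunks):
    """Join byte chunks into one stream, one chunk after another.

    A newline is added after any chunk that lacks one, so that the
    pieces do not run together.
    """
    out = []
    for chunk in chunks:
        out.append(chunk if chunk.endswith(b"\n") else chunk + b"\n")
    return b"".join(out)


# Placeholders standing in for the three real files, in the order
# declaration, DTD, document. In practice each would be read from disk.
declaration = b"(SGML declaration goes here)"
dtd = b"(DOCTYPE declaration with the DTD goes here)\n"
document = b"(document instance goes here)"

stream = build_stream(declaration, dtd, document)
print(stream.decode("ascii"))
```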

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org





