Re: [xml] DTDs and null SystemID/ExternalID ?



On Sat, Jul 07, 2007 at 04:35:20PM -0400, Stefan Jeglinski wrote:
I think you've already answered by saying that the ParseDTD functions 
are only for external subsets.

  Well, that's by definition: if you can parse the DTD independently
of a containing document, that means you're loading a DTD file, and
hence what would be an external subset if referenced.

There's where I perhaps went wrong. 
I'm creating the dtd content in memory in preparation for writing it 
out as part of an xml file that has an internal subset. I hope to be 
directed to that part of the API that should be used to do this. 

  You can try to take the declarations under the DTD, unlink them,
and reattach them under the DTD of the new document.


 > I've boxed myself into a corner. Snooping the libxml2 source
 > (xmlsave.c, xmlDtdDumpOutput), I see that I can create the 3rd type
 > only if SystemID and ExternalID are NULL.

  Sorry, that premise sounds wrong to me. Please explain! SystemID
 and ExternalID where?

The xmlDtd record structure, which contains an entry for both. Please 
refer to xmlsave.c, in xmlDtdDumpOutput (as quoted above in my OP). 
If you just read the code, you see that if SystemID and ExternalID 
are NULL, neither SYSTEM nor PUBLIC are inserted. At least that's the 
way it reads to me. Maybe I'm stupid.

  No, and that's normal: if you have just an internal subset then
you won't serialize SYSTEM or PUBLIC, and will go straight to serializing the
internal subset, which should be the set of children under that DTD element.
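
The branching described here can be modeled in a few lines of stand-alone C. This is only a simplified sketch of the behaviour discussed in the thread, not the libxml2 source: `struct dtd_ids` and `doctype_prefix` are hypothetical names invented for illustration, and the real serializer works on an xmlDtd and an output buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the identifier fields of an xmlDtd. */
struct dtd_ids {
    const char *name;         /* DOCTYPE name */
    const char *external_id;  /* PUBLIC identifier, or NULL */
    const char *system_id;    /* SYSTEM identifier, or NULL */
};

/*
 * Write the start of a DOCTYPE declaration into buf.
 * PUBLIC or SYSTEM is emitted only when the matching identifier
 * is non-NULL; with both NULL, only the name is written and the
 * internal subset would follow. Returns the snprintf() result.
 */
int doctype_prefix(const struct dtd_ids *d, char *buf, size_t len)
{
    if (d->external_id != NULL) {
        if (d->system_id != NULL)
            return snprintf(buf, len, "<!DOCTYPE %s PUBLIC \"%s\" \"%s\"",
                            d->name, d->external_id, d->system_id);
        return snprintf(buf, len, "<!DOCTYPE %s PUBLIC \"%s\"",
                        d->name, d->external_id);
    }
    if (d->system_id != NULL)
        return snprintf(buf, len, "<!DOCTYPE %s SYSTEM \"%s\"",
                        d->name, d->system_id);
    return snprintf(buf, len, "<!DOCTYPE %s", d->name);
}
```

In this model, a record whose identifiers are both NULL yields just `<!DOCTYPE name`, while a record whose three fields were all set to the string "none" yields `<!DOCTYPE none PUBLIC "none" "none"`, which is the symptom described in this thread.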


 > But this is directly at
 > odds with xmlIOParseDTD, which ultimately allocates both and fills
 > them in with "none".

  I don't think I ever use "none"; it may be NULL, and that's normal
since it's not part of the file being parsed.

Again I must be way off base, but in parser.c, in the xmlIOParseDTD 
routine, you call xmlNewDtd, and pass '"none"' not once but 3 times. 

I didn't expect xmlIOParseDTD to create the DTD node in the tree (but
to develop relatively simple external subsets programmatically).
Use xmlNewDtd and move the child nodes of what you obtained by parsing
the external subset.

What can I say here? I mean, that's what the libxml2 code does. I 
can't explain why you say "none" is never used when it is, or that a 
premise about SystemID and ExternalID being NULL is wrong, when there 
are explicit conditionals testing for NULL. I can only conclude that 
my approach is *so* naive and off-base that it never occurred to you 
that anyone would ever try to look at it in so ridiculous a fashion 
:-)

  Well the normal way to create an internal subset is to call the routines
creating entities, element and attribute declarations.
  There are things which can exist in an external subset and don't need to be
serialized (like conditional sections) precisely because the external subset
is not saved as part of the document. They can't appear in an internal subset.
DTDs are far less structured than the element tree itself and can't
always be represented that way; in libxml2 you get only the parsing result,
which may or may not be sufficient for your usage.


I am sorry to be so confusing. This has been discussed on the list 
before and you have participated in the threads - that's where I got 
the idea.

I really can't remember, it must be some time ago. I'm not saying it's
impossible, but this can be non-trivial, or may not fully represent what
you had in your external subset. I can't tell, because I don't know or
remember why you're doing this, and for some DTDs you just can't: as I said,
some constructs which are fine in external subsets aren't in internal
subsets, or just can't be represented in the libxml2 tree representation
(because, being external, they don't need to show up in the tree and
they are just not structured). This may work or not; hopefully, if
your DTD is simple, that should work.

My question boils down to that - if I want to create an xml file with 
embedded dtd according to my case #3, which is legal as you say, an 
internal subset, then both SYSTEM and PUBLIC are correctly missing. 
But they're not missing if they have been initialized as an xmlChar* 
to "none". Apparently I need to use a different part of the API. I 
may in fact stumble on that in the list archive or by googling if you 
decline to answer... but help is always appreciated.

Answer:
    Don't move the DTD node with those attributes in the doc,
    unlink and move the children to a new DTD in the doc. Then 
    call the routine to free the DTD.

Freeing the strings and replacing them by NULL should probably work too,
as they can't come from a dictionary. You say it crashes, but I can't
guess why nor where, so I suggest an alternative.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


