Re: [xml] DTDs and null SystemID/ExternalID ?



I think you've already answered by saying that the ParseDTD functions are only for external subsets. That's perhaps where I went wrong. I'm creating the DTD content in memory in preparation for writing it out as part of an XML file that has an internal subset. I hope to be directed to the part of the API that should be used to do this. Please read further for my specific responses to your points:


 Yes that's legal, that's called an internal subset, and no it's not
the last third case as you can combine SYSTEM or PUBLIC identifiers and
extend their definitions with an internal subset too.

  Spec is there
    http://www.w3.org/TR/REC-xml/#NT-doctypedecl
please don't paraphrase especially when being wrong. take the time to read
the spec it will help being understood and avoid this kind of errors.

A more-than-fair criticism. The rest of my answers are without the benefit of me having absorbed and understood the spec as well as I should. That reading is on my list of TDD.


  the ParseDTD functions are there to load an external subset, not to
try to tweak things like that. What you're attempting may work if you
understand the internal representation of libxml2 and be careful though.

OK. See my opening comment. Hopefully I can also learn how to be careful. Please note that the reason my OP actually referred to the libxml2 source, is *precisely* because I'm trying to learn more about its internal representation.


 > I've boxed myself into a corner. Snooping the libxml2 source
 > (xmlsave.c, xmlDtdDumpOutput), I see that I can create the 3rd type
 > only if SystemID and ExternalID are NULL.

  Sorry, that premise sounds wrong to me. Please explain ! SystemID
 and ExternalID where ?

The xmlDtd record structure, which contains an entry for both. Please refer to xmlsave.c, in xmlDtdDumpOutput (as quoted above in my OP). If you just read the code, you see that if SystemID and ExternalID are NULL, neither SYSTEM nor PUBLIC are inserted. At least that's the way it reads to me. Maybe I'm stupid.


 > But this is directly at
 > odds with xmlIOParseDTD, which ultimately allocates both and fills
 > them in with "none".

  I don't think I ever use "none", it may be NULL, and that's normal
it's not part of the file being parsed.

Again I must be way off base, but in parser.c, in the xmlIOParseDTD routine, you call xmlNewDtd, and pass '"none"' not once but 3 times. If you refer to xmlNewDtd in tree.c, you will see that the last 2 "nones" are for SystemID and ExternalID. Bottom line, neither SystemID nor ExternalID are initialized to NULL.

What can I say here? I mean, that's what the libxml2 code does. I can't explain why you say "none" is never used when it is, or that a premise about SystemID and ExternalID being NULL is wrong, when there are explicit conditionals testing for NULL. I can only conclude that my approach is *so* naive and off-base that it never occurred to you that anyone would ever try to look at it in so ridiculous a fashion :-)
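
To make the point concrete, here is a small self-contained model of the branching as I read it. It is a sketch only, not the libxml2 source: format_doctype and append are hypothetical helpers I made up for illustration, and the internal subset is passed as a plain string rather than the real list of declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out if it still fits; always return the length that would be
 * needed, so a result >= cap signals truncation. Hypothetical helper. */
static size_t append(char *out, size_t cap, size_t len, const char *s)
{
    size_t n = strlen(s);
    if (len + n < cap) {              /* room for s plus the terminator */
        memcpy(out + len, s, n + 1);
    }
    return len + n;
}

/* Simplified model of the DOCTYPE branching described above (NOT libxml2
 * code): PUBLIC is written when external_id is non-NULL, SYSTEM when only
 * system_id is non-NULL, and neither keyword when both are NULL. */
size_t format_doctype(char *out, size_t cap, const char *name,
                      const char *external_id, const char *system_id,
                      const char *internal_subset)
{
    size_t len = 0;

    assert(out != NULL && cap > 0 && name != NULL);
    out[0] = '\0';

    len = append(out, cap, len, "<!DOCTYPE ");
    len = append(out, cap, len, name);

    if (external_id != NULL) {
        len = append(out, cap, len, " PUBLIC \"");
        len = append(out, cap, len, external_id);
        len = append(out, cap, len, "\"");
        if (system_id != NULL) {
            len = append(out, cap, len, " \"");
            len = append(out, cap, len, system_id);
            len = append(out, cap, len, "\"");
        }
    } else if (system_id != NULL) {
        len = append(out, cap, len, " SYSTEM \"");
        len = append(out, cap, len, system_id);
        len = append(out, cap, len, "\"");
    }

    if (internal_subset != NULL) {    /* declarations between [ and ] */
        len = append(out, cap, len, " [\n");
        len = append(out, cap, len, internal_subset);
        len = append(out, cap, len, "\n]");
    }
    return append(out, cap, len, ">");
}
```

In this model, passing NULL for both identifiers together with a subset string yields a DOCTYPE with no SYSTEM or PUBLIC keyword, which is my case #3. Passing "none" for both yields PUBLIC "none" "none", which is the output I am trying to avoid.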


 > You can't naively set them to NULL after the fact, because you
 > will die in xmlFreeDoc.

  I'm not sure I understand, explain !

Hopefully my above explanation helps clear this up.


  First read the spec it will help understanding the code and the
terminology. Second I don't understand what you're trying to achieve.
 "create the dtd in memory" how ? The DOM tree for the DTD ? The
serialization of a DTD to a string ?

  Very confusing ...

I am sorry to be so confusing. This has been discussed on the list before and you have participated in the threads - that's where I got the idea. I allocate a buffer in memory, and write the contents of the DTD (element names etc.) to that buffer. I then create the XML nodes and their content and write it all out to disk. I may be guilty of not understanding the spec as I should, but there is plenty of documentation out there about creating an XML file that has the DTD embedded in it.

My question boils down to that - if I want to create an xml file with embedded dtd according to my case #3, which is legal as you say, an internal subset, then both SYSTEM and PUBLIC are correctly missing. But they're not missing if they have been initialized as an xmlChar* to "none". Apparently I need to use a different part of the API. I may in fact stumble on that in the list archive or by googling if you decline to answer... but help is always appreciated.



Stefan Jeglinski


