Re: [xml] receiving entity nodes with xmlTextReaderRead



Replying to my own post: I've tracked this down to what may be a bug in parser.c.  In my source tree, which is libxml2 2.7.8, it's at line 3883.

    /*
     * This may look absurd but is needed to detect
     * entities problems
     */
    if ((ent->etype != XML_INTERNAL_PREDEFINED_ENTITY) &&
        (ent->content != NULL)) {
        rep = xmlStringDecodeEntities(ctxt, ent->content,
                                      XML_SUBSTITUTE_REF, 0, 0, 0);
        if (rep != NULL) {
            xmlFree(rep);
            rep = NULL;
        }
    }


What is this code doing?  What "entities problems" is it avoiding?  Shouldn't it check ctxt->replaceEntities before replacing the entities?  Note that this only affects entities embedded in attribute values.

-Jonah

On Dec 1, 2010, at 2:51 PM, Jonah Petri wrote:

Hello,

I'm trying to use the xmlreader API to receive entity nodes (un-substituted) so I can do my own evaluation as I stream the document.  I need to do so because the names of the entities are significant, as well as their decoded values.
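
For illustration only, here is a minimal sketch of what "doing my own evaluation" could look like once an unsubstituted value is in hand. It is plain C and does not use libxml2; `next_entity_ref` and `is_name_char` are hypothetical helpers invented for this sketch, and the name check is a simplified ASCII-only approximation of XML names rather than the full grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified, ASCII-only test for characters allowed inside an XML name. */
static int is_name_char(unsigned char c)
{
    return isalnum(c) || c == '_' || c == '-' || c == '.' || c == ':';
}

/*
 * Hypothetical helper (not part of libxml2): find the next general entity
 * reference of the form "&name;" in a raw, unsubstituted string.
 * Character references such as "&#65;" and stray '&' characters are skipped.
 * On success the name is copied into `name` (NUL-terminated) and a pointer
 * just past the ';' is returned. Returns NULL when no reference remains.
 * Names that do not fit in `cap` bytes are skipped.
 */
const char *next_entity_ref(const char *s, char *name, size_t cap)
{
    assert(s != NULL && name != NULL && cap > 0);

    while ((s = strchr(s, '&')) != NULL) {
        const char *start = s + 1;
        const char *end = start;
        size_t n;

        /* A name must start with a letter, '_' or ':' in this sketch. */
        if (isalpha((unsigned char)*start) || *start == '_' || *start == ':') {
            while (is_name_char((unsigned char)*end))
                end++;
        }

        n = (size_t)(end - start);
        if (n > 0 && *end == ';' && n < cap) {
            memcpy(name, start, n);
            name[n] = '\0';
            return end + 1;
        }
        s = start; /* not a usable reference: keep scanning after this '&' */
    }
    return NULL;
}
```

Calling this in a loop on a value such as `&kConstPI;` yields the name `kConstPI`, which the application can then look up in its own table. It only helps if the reader hands back the attribute value with the reference still intact.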

My XML document has an inline DTD at the top, which defines the entities I'm concerned about, something like:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE Constants [
  <!ENTITY kConstPI "3.14">
]>
<Doc>
<Thing val="&kConstPI;" />
</Doc>

My code, boiled down, looks like:

    m_pReader= xmlReaderForMemory( pUTF8XMLData, uLength, strBaseURL, NULL, XML_PARSE_NONET );
    xmlTextReaderSetParserProp(m_pReader, XML_PARSER_SUBST_ENTITIES, 0);
    int uResult = xmlTextReaderRead( m_pReader );
    while (uResult == 1) {
     
        switch( xmlReaderTypes(xmlTextReaderNodeType(m_pReader)) ) {
                
            case XML_READER_TYPE_ELEMENT:
            case XML_READER_TYPE_END_ELEMENT:
            case XML_READER_TYPE_TEXT:
            case XML_READER_TYPE_ENTITY:
            case XML_READER_TYPE_ENTITY_REFERENCE:
            case XML_READER_TYPE_DOCUMENT:
            case XML_READER_TYPE_NONE:
            case XML_READER_TYPE_ATTRIBUTE:
            case XML_READER_TYPE_CDATA:
            case XML_READER_TYPE_PROCESSING_INSTRUCTION:
            case XML_READER_TYPE_COMMENT:
            case XML_READER_TYPE_DOCUMENT_TYPE:
            case XML_READER_TYPE_DOCUMENT_FRAGMENT:
            case XML_READER_TYPE_NOTATION:
            case XML_READER_TYPE_WHITESPACE:
            case XML_READER_TYPE_SIGNIFICANT_WHITESPACE:
            case XML_READER_TYPE_END_ENTITY:
            case XML_READER_TYPE_XML_DECLARATION:
                printf("found: %s type %d\n", xmlTextReaderConstName(m_pReader), xmlTextReaderNodeType(m_pReader));
                break;
        }
        
        uResult = xmlTextReaderRead(m_pReader);
    }

I never see any XML_READER_TYPE_ENTITY or XML_READER_TYPE_ENTITY_REFERENCE nodes coming through.  I must be doing something wrong, as this technique is specifically called out in http://xmlsoft.org/xmlreader.html, but I'm at a loss.

Any help would be appreciated!

-Jonah



