[xml] libxml newbie question on htmlParseChunk function



Hi all,
This is my very first post to this mailing list :)

OK, I'm trying to unhtmlize some text (strip the markup and keep
only the character data) using the SAX interface.

Here is how I initialize the parser:

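#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <libxml/HTMLparser.h>

/* SAX characters callback. The string is not NUL-terminated,
   so only print the first length bytes. */
void unhtmlizeHandleCharacters(void *user_data, const xmlChar *string,
                               int length)
{
    fprintf(stderr, "string = %.*s", length, (const gchar *)string);
    /* process string here... */
}

void unhtmlize(const char *text)
{
    htmlSAXHandler *sax_p;
    htmlParserCtxtPtr ctxt;

    sax_p = g_new0(htmlSAXHandler, 1);
    sax_p->characters = unhtmlizeHandleCharacters;
    /* buffer is the user data handed to the callbacks (declared elsewhere) */
    ctxt = htmlCreatePushParserCtxt(sax_p, buffer, text, (int)strlen(text),
                                    "", XML_CHAR_ENCODING_UTF8);
    htmlParseChunk(ctxt, text, 0, 1);   /* size 0, terminate = 1 */
    htmlFreeParserCtxt(ctxt);
    g_free(sax_p);
}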


What's interesting is that this works with 'normal' text.
However, if
text = "abc < xyz"

then the debug output from unhtmlizeHandleCharacters shows that
it only receives "abc " as the string; everything from
the '<' character onwards is omitted.

So unhtmlize("abc < xyz") gives "abc " as the
result.

How can I overcome this? Any reply is much appreciated.
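
For anyone trying to reproduce this, here is a minimal sketch of one
possible workaround. It assumes the input is mostly plain text in which
a '<' that cannot begin a tag is meant literally, and it rewrites such
characters as "&lt;" before the text is handed to the parser. The helper
name escape_stray_lt is hypothetical and not part of libxml2, and the
rule for what "can begin a tag" (a following letter, '/', '!' or '?')
is a simplifying assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function.
   Returns a newly allocated copy of text in which every '<' that is
   not followed by a letter, '/', '!' or '?' is replaced by "&lt;".
   The caller frees the result; NULL is returned if malloc fails. */
char *escape_stray_lt(const char *text)
{
    size_t len = strlen(text);
    /* Worst case: every byte is '<' and grows to the 4 bytes of "&lt;". */
    char *out = malloc(len * 4 + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        /* For the last byte, text[i + 1] is the terminating NUL. */
        unsigned char next = (unsigned char)text[i + 1];

        if (text[i] == '<' && !isalpha(next) &&
            next != '/' && next != '!' && next != '?') {
            memcpy(p, "&lt;", 4);
            p += 4;
        } else {
            *p++ = text[i];
        }
    }
    *p = '\0';
    return out;
}
```

The escaped copy would be passed to htmlCreatePushParserCtxt in place of
text. The parser should then decode "&lt;" back to '<' and report it
through the characters callback, possibly split across several calls.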

Thanks in advance,
TranVan Hoang


