[xml] Possible loss of TEXT node with htmlParseDoc



Hi,

Below is the document dump I get after calling the
htmlParseDoc() function.

Here is an extract of the source code:

  ...
  htmlDocPtr doc;

  doc = htmlParseDoc((xmlChar *)"Hello <a href=\"world\">world</a> !",
                     "HTML");
  if (!doc)
    return;

  xmlDebugDumpDocument(stderr, doc);
  ...

HTML DOCUMENT
standalone=true
  DTD(HTML), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
                        <-- where is the TEXT node "Hello" ?
      ELEMENT a
        ATTRIBUTE href
          TEXT
            content=http://world.org
        TEXT
          content=world
      TEXT
        content= !

I see this bug (?) only when the input string contains HTML tags.
It works, however, if the string begins with an HTML tag.

...
doc = htmlParseDoc((xmlChar *)"<br>Hello <a href=\"world\">world</a> !",
                   "HTML");
...

HTML DOCUMENT
standalone=true
  DTD(HTML), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
      ELEMENT br
      TEXT                  <-- Ok
        content=Hello 
      ELEMENT a
        ATTRIBUTE href
          TEXT
            content=http://world.org
        TEXT
          content=world
      TEXT
        content= !

I found this problem in libxml 2.2.10, and 2.3.5 doesn't seem to fix it.
Has anyone else run into this problem?

Thanks in advance for any help.

Regards,

-- 
---------------------------------------
Frédéric Gicquel

rnr easybusiness fr
---------------------------------------
