Re: [xml] Possible loss of TEXT node with htmlParseDoc

From: Daniel Veillard <veillard redhat com>
To: Frédéric GICQUEL <frederic gicquel easybusiness fr>
Cc: xml gnome org
Subject: Re: [xml] Possible loss of TEXT node with htmlParseDoc
Date: Sun, 22 Apr 2001 04:48:25 -0400

On Thu, Apr 05, 2001 at 04:46:33PM +0200, Frédéric GICQUEL wrote:

Hi,

See below the result of a document dump after
a call to the htmlParseDoc() function.

Here is an extract of the source code:

  ...
  htmlDocPtr doc;

  doc = htmlParseDoc((xmlChar *)"Hello <a href=\"world\">world</a> !", "HTML");
  if (!doc)
    return;

  xmlDebugDumpDocument(stderr, doc);
  ...

HTML DOCUMENT
standalone=true
  DTD(HTML), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
                      <-- where is the TEXT node "Hello" ?
      ELEMENT a
        ATTRIBUTE href
          TEXT
            content=http://world.org
        TEXT
          content=world
      TEXT
        content= !

I see this bug (?) only when the input string contains some HTML tags.
But it works if there is an HTML tag at the beginning.


  Well, I can't reproduce this with the current version:

orchis:~/XML -> cat tst.html
Hello <a href=\"world\">world</a> !
orchis:~/XML -> ./testHTML --debug tst.html
HTML DOCUMENT
URL=tst.html
standalone=true
  DTD(HTML), PUBLIC -//W3C//DTD HTML 4.0 Transitional//EN, SYSTEM http://www.w3.org/TR/REC-html40/loose.dtd
  ELEMENT html
    ELEMENT body
      ELEMENT p
        TEXT
          content=Hello 
        ELEMENT a
          ATTRIBUTE href
            TEXT
              content=\"world\"
          TEXT
            content=world
        TEXT
          content= ! 
orchis:~/XML -> 

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Follow-Ups:
- Re: [xml] Possible loss of TEXT node with htmlParseDoc
  - From: Frédéric Gicquel

References:
- [xml] Possible loss of TEXT node with htmlParseDoc
  - From: Frédéric GICQUEL
