
Re: [xml] libxml2 add <p> tag



On Mon, Oct 07, 2002 at 10:08:56AM +0200, Morus Walter wrote:
> Hi,
> > 
> > I've found the following problem with libxml2 2.4.16
> > 
> > [manu xx yy]$ cat /tmp/test.html 
> > <gsweb name="UploadFile">Upload</gsweb>/<gsweb name="ProcessFile">Processing</gsweb>
> > 
> > [manu xx yy]$ xmllint  /tmp/test.html  --html
> > /tmp/test.html:1: error: Tag gsweb invalid
> > <gsweb name="UploadFile">Upload</gsweb>/<gsweb name="ProcessFile">Processing</g
> >                        ^
> > /tmp/test.html:1: error: Tag gsweb invalid
> > <gsweb name="UploadFile">Upload</gsweb>/<gsweb name="ProcessFile">Processing</g
> >                                                                 ^
> > <?xml version="1.0" standalone="yes"?>
> > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
> > <html><body><gsweb name="UploadFile">Upload</gsweb><p>/<gsweb name="ProcessFile">Processing</gsweb></p></body></html>
> >                                                     ^
> > ____________________________________________________|
> > 
> > 
> > I get the same thing with a file containing:
> > <?xml version="1.0" standalone="yes"?>
> > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
> > <html><body><a name="UploadFile">Upload</a>/<a name="ProcessFile">Processing</a></body></html>
> > 
> > 
> > I've found nothing about this problem in the list. Is it a fixed bug, or how can I avoid this <p> addition?
> > 
> This is a characteristic of libxml's HTML parser.
> The parser has built-in "knowledge" about allowed structures and adds
> paragraphs if it finds content in contexts where it "thinks" they are
> required.
> It might be considered a bug that the paragraph is not inserted before
> the first <a> element in your second example (the <p> insertion seems to
> be triggered by PCDATA content and not by inline elements like <a>,
> which is not very consistent).
> OTOH tag soup parsing is always a mess...
> And there is no way to guess the right behaviour for user-created elements
> like gsweb.
> 
> Apart from modifying libxml, there is AFAIK no way of preventing the
> <p> addition in the html parser.

  Yup, there is an associated bug report which I haven't closed, though
it's in the fixed state:
  http://bugzilla.gnome.org/show_bug.cgi?id=87235

  I should probably apply your last suggestion, though it would change
the output of nearly all regression tests.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
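
For readers who want to see the behaviour described in the thread in isolation, here is a toy model in Python. It is not libxml2's code and does not call libxml2. It only imitates the heuristic Morus describes: a <p> is opened at the first bare text (PCDATA) found directly under <body>, while inline elements alone do not trigger it. The function name, the regex tokenizer, and the choice to ignore whitespace-only text are all assumptions made for this sketch.

```python
import re

# Toy tokenizer (an assumption of this sketch): the fragment is a flat
# sequence of non-nested elements like <x ...>text</x> and runs of bare text.
TOKEN = re.compile(r"<(\w+)[^>]*>.*?</\1>|[^<]+")


def wrap_like_html_parser(fragment):
    """Imitate the <p> insertion reported in the thread (toy model only)."""
    out = []
    in_paragraph = False
    for match in TOKEN.finditer(fragment):
        token = match.group(0)
        is_text = not token.startswith("<")
        # Only bare, non-blank text opens the paragraph; an element does not.
        if is_text and not in_paragraph and token.strip():
            out.append("<p>")
            in_paragraph = True
        out.append(token)
    if in_paragraph:
        out.append("</p>")
    return "<html><body>" + "".join(out) + "</body></html>"


if __name__ == "__main__":
    source = ('<gsweb name="UploadFile">Upload</gsweb>/'
              '<gsweb name="ProcessFile">Processing</gsweb>')
    print(wrap_like_html_parser(source))
```

Run on the one-line test file from the original report, the sketch places the <p> after the first gsweb element and before the "/", which is the placement shown in the xmllint output quoted above. A fragment made only of elements, with no bare text between them, gets no <p> in this model.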


