[xml] How to parse HTML files with ampersands in URIs not encoded as "&amp;"?


I'm pretty new to this list and, guess what, sorry for my English :)

I have an issue parsing HTML files with the HTMLparser API.
The web page has tag attributes containing URIs with ampersands
that are not encoded as "&amp;".
As expected, the parser (with the HTML_PARSE_RECOVER option) reports an error:
htmlParseEntityRef: expecting ';'

The resulting xmlDoc is missing many elements.

So, is there a way to parse such HTML files with libxml?
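One possible workaround, sketched here as an assumption rather than a known libxml feature: pre-process the document and escape bare ampersands before handing it to the parser. The helper names `escape_bare_ampersands` and `entity_ref_len` are hypothetical, and the entity check is only a heuristic. It leaves `&name;`, `&#123;` and `&#x1F;` untouched and rewrites every other `&` as `&amp;`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the length of an entity/character reference
 * starting at p (p points at '&'), or 0 if p does not start one.
 * Heuristic only: recognises &name;  &#123;  &#x1F;  (not the full HTML rules). */
static size_t entity_ref_len(const char *p)
{
    const char *q = p + 1;
    if (*q == '#') {
        q++;
        if (*q == 'x' || *q == 'X') {
            q++;
            if (!isxdigit((unsigned char)*q))
                return 0;
            while (isxdigit((unsigned char)*q))
                q++;
        } else {
            if (!isdigit((unsigned char)*q))
                return 0;
            while (isdigit((unsigned char)*q))
                q++;
        }
    } else {
        if (!isalpha((unsigned char)*q))
            return 0;
        while (isalnum((unsigned char)*q))
            q++;
    }
    /* Only count it as a reference if it is terminated by ';'. */
    return (*q == ';') ? (size_t)(q - p + 1) : 0;
}

/* Hypothetical helper: copy `in`, replacing every '&' that does not begin
 * an entity reference with "&amp;". Caller frees the result.
 * Returns NULL on allocation failure. */
char *escape_bare_ampersands(const char *in)
{
    size_t n = strlen(in);
    /* Worst case: every character is a bare '&' that grows to 5 bytes. */
    char *out = malloc(n * 5 + 1);
    char *o = out;
    if (out == NULL)
        return NULL;
    while (*in) {
        if (*in == '&') {
            size_t len = entity_ref_len(in);
            if (len > 0) {          /* existing reference: copy as-is */
                memcpy(o, in, len);
                o += len;
                in += len;
            } else {                /* bare ampersand: escape it */
                memcpy(o, "&amp;", 5);
                o += 5;
                in++;
            }
        } else {
            *o++ = *in++;
        }
    }
    *o = '\0';
    return out;
}
```

The escaped buffer could then be passed to `htmlReadMemory(buf, (int)strlen(buf), NULL, NULL, HTML_PARSE_RECOVER)`. Whether this restores the missing elements in this particular case is untested. Note also that a query string such as `?a=1&b;` would be mistaken for an entity by this heuristic.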


PS: I apologize in advance if I have missed an explanation in
previous posts.
