[xml] htmlParser questions



Hello all,
First of all, I'd like to say that I'm a new member of this list. I started working with libxml2 three days ago and so far I'm very happy with it. I'm using it on Windows with Borland's C++ Builder.
 
My main use is the htmlParser. I'm processing thousands of HTML pages and running them through libxslt to get the output I need. I have a few problems that I hope you can help me with:
 
1) Right now I'm simply using htmlParseDoc with encoding=NULL to build the tree I need for the XSL engine. This function gives me a well-formed tree, but not a valid one at all. Is there an option that makes the htmlParser build a valid document as well?
 
2) Is there any way to speed up the htmlParser? I'm not using any options, only calling htmlParseDoc. What worries me is that I've also tested a separate library, HtmlAgilityPack, which is managed code, and it processes an HTML file faster than libxml2's HTML parser AND outputs a well-formed and valid tree. In my tests libxml2 performs amazingly well on XML and XSL files, so I don't understand how managed, marshalled code can do better and faster. I must be doing something wrong. Maybe the htmlParser is not intended to produce valid trees, which is fine with me, but I'd at least like it to be faster.
 
I really hope to get some answers. I fell in love with this tool and want to use it, but performance is my main issue here and I'd hate to switch to alternatives.
 
Thank you very much
Liron

