Re: [xml] HTML Parser

Hi all,
        Hi Marco

I'd like to use the libxml2 HTML parser to parse a web page and extract information.

Reading the docs, I see that htmlParseFile accepts two parameters: the file to parse and the encoding.
But I can't know the web page's encoding before parsing it.

If I pass NULL, does libxml2 detect the web page's encoding?

        AFAIK, no. The W3C HTML Recommendation recommends that authors
specify an encoding. What browsers do is make a guess, comparing the
page charset against a small internal database.
        If you *really* need to discover it, you can do something
like stripping out the HTML tags and trying to figure out the encoding
of the remaining content...
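
        A rough sketch of one way to guess, simpler than analysing the
stripped text: look for a byte-order mark or a "charset=" declaration
near the top of the page before calling the parser. Note that
sniff_charset and copy_name are hypothetical helpers, not part of
libxml2, and the 1024-byte scan window is an arbitrary assumption. This
is only a heuristic and not what any particular browser does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a NUL-terminated name into out, truncating. */
void copy_name(char *out, size_t outsz, const char *name)
{
    strncpy(out, name, outsz - 1);
    out[outsz - 1] = '\0';
}

/* Hypothetical helper, NOT a libxml2 function: guess the encoding of an
 * HTML buffer. Writes the name into out and returns 1 if a hint was
 * found, or returns 0 and leaves out empty otherwise. */
int sniff_charset(const unsigned char *buf, size_t len,
                  char *out, size_t outsz)
{
    static const char key[] = "charset";
    const size_t klen = sizeof key - 1;
    size_t limit, i, j, n;

    assert(buf != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    /* 1. A byte-order mark is the strongest hint. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        copy_name(out, outsz, "UTF-8");
        return 1;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        copy_name(out, outsz, "UTF-16LE");
        return 1;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        copy_name(out, outsz, "UTF-16BE");
        return 1;
    }

    /* 2. Otherwise scan the first bytes for charset=NAME (assumed window). */
    limit = len < 1024 ? len : 1024;
    for (i = 0; i + klen <= limit; i++) {
        for (j = 0; j < klen; j++)
            if (tolower(buf[i + j]) != key[j])
                break;
        if (j < klen)
            continue;                     /* no "charset" at this offset */

        j = i + klen;
        while (j < limit && isspace(buf[j]))
            j++;
        if (j >= limit || buf[j] != '=')
            continue;
        j++;
        while (j < limit && isspace(buf[j]))
            j++;
        if (j < limit && (buf[j] == '"' || buf[j] == '\''))
            j++;                          /* skip an opening quote */

        /* Copy the encoding name: letters, digits and - _ . : only. */
        n = 0;
        while (j < limit && n + 1 < outsz &&
               (isalnum(buf[j]) || buf[j] == '-' || buf[j] == '_' ||
                buf[j] == '.' || buf[j] == ':'))
            out[n++] = (char)buf[j++];
        out[n] = '\0';
        if (n > 0)
            return 1;
    }
    return 0;
}
```

        If it returns 1, the name in out could be passed as the encoding
argument of htmlParseFile; a return of 0 means no hint was found and
you are back to guessing from the content.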


Lucas Brasilino
brasilino recife pe gov br
Emprel -        Empresa Municipal de Informatica (pt_BR)
                Municipal Computing Enterprise (en_US)
Recife - Pernambuco - Brasil
Fone: +55-81-34167078
