Re: [xml] encoding question..

On Mon, Sep 20, 2004 at 01:58:38PM +0300, Manos Moschous wrote:

I do

$ ./testHTML.exe --encode ISO-8859-7 htmlSites/index.html
htmlSites/index.html:12: error: Input is not proper UTF-8, indicate encoding

  That means the document contains byte sequences that are not valid UTF-8.
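
  As a rough illustration of why the parser complains, here is a small Python sketch (not libxml code). The sample word is an assumption chosen for the example, not taken from the file in question. Greek text saved as ISO-8859-7 uses single bytes in the upper half of the byte range, and those bytes do not form valid UTF-8 sequences.

```python
# Hypothetical sample: a Greek word encoded the way an ISO-8859-7 page stores it.
greek_bytes = "Καλημέρα".encode("iso-8859-7")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The ISO-8859-7 bytes are rejected as UTF-8, much like the parser error above.
print(is_valid_utf8(greek_bytes))

# Decoding with the real encoding recovers the text.
print(greek_bytes.decode("iso-8859-7"))
```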

the index.html is the greek version of with

What do I have to do to parse the HTML document normally?

  When you save a page to a file, for example, you lose the HTTP headers, which
may indicate the encoding of the original.
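
  A minimal sketch of keeping that information, in Python rather than libxml: it reads the charset parameter from a Content-Type header value so it can be stored or passed along with the saved bytes. The helper name and the sample header values are assumptions made up for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Extract the charset parameter from a Content-Type header value.

    Hypothetical helper: returns the charset in lowercase, or `default`
    when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


# Assumed header, as a server might send it for a Greek page.
charset = charset_from_content_type("text/html; charset=ISO-8859-7")
print(charset)

# With the charset known, the saved bytes can be decoded correctly.
body = "Καλημέρα".encode("iso-8859-7")
print(body.decode(charset))
```

  If the header has no charset parameter, the helper returns the default, and the encoding has to come from somewhere else, such as a declaration inside the document or an explicit option to the parser.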

  Anyway, parsing Google HTML is pointless; there is a direct XML interface.
Jumping into code and asking a question every 5 minutes is not the proper way
to build correct code. You should look at the problem more globally and read
about the issues first. I already pointed you at the encoding page! Please
refrain from asking at every little step along the way; read the docs, and think!


Daniel Veillard      | Red Hat Desktop team
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
