Re: [xml] encoding question..



On Mon, Sep 20, 2004 at 01:58:38PM +0300, Manos Moschous wrote:
Hi,

I ran:

$ ./testHTML.exe --encode ISO-8859-7 htmlSites/index.html
htmlSites/index.html:12: error: Input is not proper UTF-8, indicate encoding !

  That means the document contains non-UTF-8 strings.
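
  As a rough illustration of that error (a Python sketch, not libxml2 code;
the sample word and the helper name is_valid_utf8 are made up for the
example), bytes written in ISO-8859-7 generally do not form valid UTF-8
sequences, so a parser that assumes UTF-8 rejects them until they are
converted:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: a Greek word stored as ISO-8859-7 bytes,
# the way the saved page would be stored on disk.
sample = "Αναζήτηση"
raw = sample.encode("iso-8859-7")

# The raw ISO-8859-7 bytes are not a valid UTF-8 sequence.
print(is_valid_utf8(raw))

# Decoding with the correct charset and re-encoding yields valid UTF-8.
converted = raw.decode("iso-8859-7").encode("utf-8")
print(is_valid_utf8(converted))
```

  The point is only that the encoding has to be known from somewhere; the
bytes alone do not say which charset they are in.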

The index.html is the Greek version of www.google.com.gr, with
encoding type ISO-8859-7.

What do I have to do to parse the HTML document normally?

  When saving to a file, for example, you lose the HTTP headers, which
may indicate the encoding of the original.
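
  A minimal sketch of keeping that information around, assuming you still
have the Content-Type header value at the time you save the page. It is
written in Python with the standard library only; the function name
charset_from_content_type is hypothetical:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: Optional[str],
                              default: Optional[str] = None) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value.

    Returns `default` when the header is missing or carries no charset.
    """
    if not header_value:
        return default
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or the
    # fallback value when no charset parameter is present.
    return msg.get_content_charset(default)


# Example header value, as a server might send for a Greek page.
print(charset_from_content_type("text/html; charset=ISO-8859-7"))
# No charset parameter: the caller has to fall back to something else.
print(charset_from_content_type("text/html"))
```

  The name obtained this way can be stored next to the saved file and
handed to the parser as the declared encoding later, for instance as the
encoding argument of libxml2's htmlReadFile().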

  Anyway, parsing Google's HTML is pointless; there is a direct XML interface.
Jumping into the code and asking a question every 5 minutes is not the proper
way to build correct code. You should look at things a bit more globally and
read about the issues first. I already pointed you at the encoding page!
Please refrain from asking at every little step along the way; read the docs,
and think!

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


