RE: [xml] How do I read German Umlaute - entities from an XML-File using libxml?



You should also be able to use the routine xmlUTF8Strpos(utf, 1) to get the 
position of the following character, instead of re-inventing the wheel. :)

Bill Moseley wrote:

So I should be able to pass the string to xmlUTF8Strlen() to get the
length of the UTF-8 character, skip that many bytes, and pass it back to
UTF8Toisolat1 to continue processing.  Sound right?

No, you confuse two things here.

First, xmlUTF8Strlen() only knows the UTF-8 encoding. It knows nothing
about Latin-1 or other encodings. As your string is a valid UTF-8
encoded string, xmlUTF8Strlen() will return the length of the entire
string -- not the number of characters which cannot be encoded in
Latin-1, which would be necessary for your scheme to work.

Second, xmlUTF8Strlen() returns the number of characters, not the number
of bytes.
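
To illustrate the character/byte difference, here is a minimal sketch in plain C. The helper utf8_char_count() is hypothetical and not part of libxml2; it only mimics the kind of count xmlUTF8Strlen() reports, and it assumes the input is valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): counts UTF-8 characters
 * by counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Lead bytes and ASCII bytes each start a new character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "B\xC3\xA4" "r" (the word "Bär" in UTF-8), utf8_char_count() gives 3 while strlen() gives 4, because the umlaut occupies two bytes. Skipping "that many bytes" based on a character count would therefore land in the wrong place.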

What you need to do is to skip ahead one UTF-8 encoded character yourself,
and then continue processing. Have a look at the code for xmlUTF8Strlen()
to get inspiration on how to skip one UTF-8 encoded character.
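
Skipping one character by hand could look roughly like the sketch below. Both function names are made up for illustration and are not libxml2 API; the code derives the sequence length from the lead byte and, as an assumption about how one might want to handle bad input, resynchronises instead of running past a truncated sequence. If you link against libxml2 anyway, xmlUTF8Strpos(utf, 1), mentioned at the top of this message, is the ready-made alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence, judged from its
 * lead byte. Returns 0 if the byte cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII        */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx               */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx               */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx               */
    return 0;                             /* continuation or invalid */
}

/* Hypothetical helper: returns a pointer just past the first UTF-8
 * character of s. At the end of the string it returns s unchanged. */
const char *utf8_skip_char(const char *s)
{
    size_t len, i;

    assert(s != NULL);
    if (*s == '\0')
        return s;

    len = utf8_seq_len((unsigned char)*s);
    if (len == 0)
        return s + 1;  /* invalid lead byte: step one byte to resync */

    for (i = 1; i < len; i++) {
        /* Stop early if the sequence is truncated, so we never read
         * or skip past the terminating NUL. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return s + i;
    }
    return s + len;
}
```

The idea would be to call utf8_skip_char() on the input pointer at the position where the Latin-1 conversion stopped, emit whatever replacement you want for the unconvertible character, and then resume the conversion from the returned pointer.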





