Re: How to deal with different encodings ?



I have no idea how it determines whether data is in a non-UTF-8 encoding.
I found this with Google:
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

The Linux file command can also be used, but it is not fully reliable: sometimes it reports English text even when the text is Danish, and in the run below it misidentifies u.txt as MPEG audio.

# file a.txt
a.txt: ISO-8859 text, with no line terminators
# file u.txt
u.txt: MPEG ADTS, layer I, v1, 192 kBits, 48 kHz, Stereo
# file ub.txt
ub.txt: Unicode text, UTF-16, big-endian
# file u8.txt
u8.txt: Unicode text, UTF-8
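
If you would rather check this in code than shell out to file, the usual first steps are to look for a byte-order mark and then test whether the bytes are valid UTF-8. Below is a minimal Python sketch of that idea. The function name guess_encoding and the ISO-8859-1 fallback are my own assumptions, not something taken from the Mozilla detector or from file. My guess (unverified) is that u.txt is UTF-16 little-endian, since its FF FE byte-order mark looks like an MPEG audio sync word, and this sketch checks for that mark explicitly.

```python
import codecs

# Byte-order marks, longest first, so that a UTF-32 LE mark (FF FE 00 00)
# is not mistaken for a UTF-16 LE mark (FF FE).
# The codec names on the right consume the mark when decoding.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess Python codec name for data (hypothetical helper)."""
    # 1. A byte-order mark is the strongest hint.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Pure 7-bit data is valid in almost every encoding.
    if data.isascii():
        return "ascii"
    # 3. Text in a legacy 8-bit encoding is very rarely valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Assumed fallback: ISO-8859-1 accepts every byte sequence,
        #    so this is a guess, not a detection.
        return "iso-8859-1"
```

Usage would be something like guess_encoding(open("a.txt", "rb").read()). It cannot tell the ISO-8859 variants apart, and it will not recognise UTF-16 without a byte-order mark; that takes statistical detection of the kind the Mozilla page describes.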

/knr
