Re: How to deal with different encodings ?



Thanks for the pointers. If someone can implement them in C#, or at
worst find a C library, then we can give it a shot.

The Linux "file" command, version 4: ftp://ftp.astron.com:21/pub/file/file-4.24.tar.gz
uses a library, libmagic (header magic.h).
The library and the header file are installed by default on my FC6.

There are some caveats though,
....
- This works mostly for data, but for metadata, which are, say,
20-character strings, there is no way other than to have a default
encoding.
FYI: The test in the previous response was run on test files containing only 4 words.
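
To make the default-encoding caveat concrete, here is a minimal sketch in C of how a short metadata string could be handled. It is only an illustration, not what libmagic does: the function names (is_valid_utf8, guess_encoding, guess_string_encoding) and the "ISO-8859-1" fallback used in the example are my assumptions. It only separates pure 7-bit text and well-formed UTF-8 from everything else, and returns the caller's default for the rest. Note that 7-bit encodings such as ISO-2022 or HZ would be reported as ASCII by this check, and the UTF-8 validation is simplified (it does not reject surrogates or every overlong form).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..len) has a well-formed UTF-8 byte structure.
   Simplified: rejects lead bytes 0xC0, 0xC1 and anything above 0xF4,
   but does not check surrogates or overlong 3/4-byte sequences. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;                       /* continuation bytes expected */

        if (c < 0x80)                   extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;                      /* invalid lead byte */

        if (extra > len - i - 1)
            return 0;                       /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */
        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper: guess the encoding of a short byte string.
   Returns "ASCII" for pure 7-bit input, "UTF-8" for well-formed UTF-8
   with at least one multi-byte sequence, otherwise the caller-supplied
   default encoding (fallback). */
const char *guess_encoding(const void *data, size_t len, const char *fallback)
{
    const unsigned char *buf = data;
    int has_high_byte = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80) {
            has_high_byte = 1;
            break;
        }
    }
    if (!has_high_byte)
        return "ASCII";
    if (is_valid_utf8(buf, len))
        return "UTF-8";
    return fallback;                        /* nothing reliable to go on */
}

/* Convenience wrapper for NUL-terminated strings. */
const char *guess_string_encoding(const char *s, const char *fallback)
{
    return guess_encoding(s, strlen(s), fallback);
}
```

For example, guess_string_encoding("Bj\xF6rk", "ISO-8859-1") falls through to the default, because a single byte 0xF6 says nothing about which 8-bit encoding was used. That is exactly the situation with short metadata strings.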

The file man page only talks about ASCII, ISO-8859-x, UTF-8, extended-ASCII, UTF-16 and EBCDIC.

Shift-JIS, EUC-JP, GB2312, Big5, EUC-TW, EUC-KR, ISO2022-XX, and HZ are not mentioned.

/Karsten
