Re: dealing with utf8 filenames



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2/22/2006 10:47 AM, Christian Neumair wrote:
> On Wednesday, 22.02.2006, 08:17 -0800, Alan M. Evans wrote:
>> On Wed, 2006-02-22 at 07:59, Christian Neumair wrote:
>>> For the sake of readability, I'd rather use the following code:
>>>
>>> char **str;
>>>
>>> /* str[0]: basename
>>>    str[1]: extension */
>>> str = g_strsplit (filename, ".", 2);
>>>
>>> g_strfreev(str);
>>
>> Surely that won't work if there is more than one dot in the name!
>
> Why? :)

Well, it'll "work" in the sense that it will return valid data in the
first two array indices of 'str'.

However, the first dot you find isn't necessarily the one that starts
the extension.  Take "Letter to Mr. Johnson.odf": splitting at the
first dot gives you:

str[0] = "Letter to Mr";
str[1] = " Johnson.odf";
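
To make the difference concrete, here is a minimal sketch in plain C
(no GLib, so it compiles on its own).  The two helper names are made up
for illustration; strchr() stands in for what ends up in str[1] after
g_strsplit(filename, ".", 2), and strrchr() for a search from the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: text after the FIRST dot, or NULL if the name
   has no dot.  This is the same text that lands in str[1] when the
   name is split at the first dot only. */
const char *suffix_after_first_dot(const char *name)
{
    const char *dot = strchr(name, '.');
    return dot ? dot + 1 : NULL;
}

/* Hypothetical helper: text after the LAST dot, or NULL if the name
   has no dot. */
const char *suffix_after_last_dot(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot ? dot + 1 : NULL;
}
```

For "Letter to Mr. Johnson.odf" the first helper yields " Johnson.odf"
and the second yields "odf".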

To get back to the original point of this discussion, about finding file
extensions in UTF-8-encoded filenames, what's wrong with this:

gchar *filename = get_file_name();  /* or whatever */
gchar *ext = g_strrstr(filename, ".");  /* last dot, or NULL if none */
if (ext) {
    ++ext;  /* step past the dot */
    do_whatever();
}

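IIRC, you can safely search for ASCII substrings in UTF-8 strings
without having to care that the string is UTF-8: every byte of a
multi-byte UTF-8 character has its high bit set, so an ASCII byte such
as '.' can never show up in the middle of one.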
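
A small sketch of why the byte-wise search is safe, again in plain C
rather than GLib so that it stands alone.  Both function names are
invented for this example, and strrchr() is used here in place of
g_strrstr().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if byte b is part of a multi-byte
   UTF-8 sequence (lead or continuation byte), 0 if it is a plain
   ASCII character.  All such bytes are 0x80 or above. */
int in_multibyte_sequence(unsigned char b)
{
    return b >= 0x80;
}

/* Hypothetical helper: byte-wise search for the last '.' in a UTF-8
   string.  Since '.' is 0x2E, below 0x80, a hit is always a real dot
   and never a fragment of some other character. */
const char *utf8_last_dot(const char *utf8_name)
{
    return strrchr(utf8_name, '.');
}
```

With the name "caf\xc3\xa9.xci" (an e-acute encoded as the two bytes
C3 A9), utf8_last_dot() returns a pointer to ".xci", and both bytes of
the accented letter are reported as belonging to a multi-byte sequence.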

Now, if you need to match non-ASCII extensions, you'd need to do a
casefold on 'ext' (and on the string you compare it with) and then
match based on that.  If you're just matching "xci" as in your example,
you shouldn't need to bother: a simple
g_ascii_strcasecmp(ext, "xci") == 0 should do the trick.  If 'ext' is
anything other than "xci" (in any mix of ASCII upper and lower case),
then it won't match.
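
Putting the pieces together, something along these lines should work.
It is only a sketch in plain C: has_extension() and ascii_lower() are
names I made up, and the hand-written loop is a stand-in for
g_ascii_strcasecmp() so that the example builds without GLib.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lower-case only the ASCII letters A-Z and leave
   every other byte alone, so bytes of multi-byte UTF-8 characters are
   never altered. */
unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: returns 1 if the text after the last '.' in
   name equals wanted, ignoring ASCII case; returns 0 otherwise,
   including when name contains no dot at all. */
int has_extension(const char *name, const char *wanted)
{
    const char *ext = strrchr(name, '.');
    if (ext == NULL)
        return 0;
    ++ext;  /* step past the dot */
    while (*ext != '\0' && *wanted != '\0') {
        if (ascii_lower((unsigned char)*ext) !=
            ascii_lower((unsigned char)*wanted))
            return 0;
        ++ext;
        ++wanted;
    }
    /* Match only if both strings ended at the same time. */
    return *ext == '\0' && *wanted == '\0';
}
```

So has_extension("movie.XCI", "xci") is true, while "movie.xcix" and a
name with no dot are both rejected.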

        -brian

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)

iD8DBQFD/LY/6XyW6VEeAnsRAuoCAKDWcMJWSkgi8VaNZip81fquyaxYBgCfeJcj
GIPx7ged6iP9azMXn7eKGOo=
=zAr9
-----END PGP SIGNATURE-----


