Re: Exposing get_filename_charset



On Mon, 2004-11-01 at 10:21 +0100, Alexander Larsson wrote:
> On Sun, 2004-10-31 at 20:19 +0000, Tor Lillqvist wrote:
> > Alexander Larsson writes:
> >  > The local files part of nautilus_file_get_display_name currently goes
> >  > like:
> > 
> > If I understood that code correctly, it patches the display name
> > together from valid UTF-8 snippets in the string and question marks? I
> > think instead of question marks it would be more useful to use
> > something like g_strescape() of the whole string. I don't think file
> > names that are partially in UTF-8 and partially in something else
> > occur very often, so using the portions of the string that happen to
> > be valid UTF-8 as such is probably wrong, and it would be better to
> > just output all of the non-ASCII bytes in octal or hex.
> >
> > Would this be OK:
> 
> Not trying to convert from various likely encodings makes this fail for
> many common cases. The eel fallback code with question marks might not
> be ideal, but in reality it isn't hit that often. I'm not sure that
> showing escaped characters in the user interface is any better though.


Alex, does this look like a reasonable first attempt?
It is based on the nautilus code you posted earlier, and has only
been very superficially tested:

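/* Return a newly allocated copy of name in which every byte that is
 * not part of valid UTF-8 is replaced by '?'. If anything had to be
 * replaced, " (invalid encoding)" is appended to the result.
 */
static gchar *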
make_valid_utf8 (const gchar *name)
{
  GString *string;
  const gchar *remainder, *invalid;
  gint remaining_bytes, valid_bytes;
  
  string = NULL;
  remainder = name;
  remaining_bytes = strlen (name);
  
  while (remaining_bytes != 0) 
    {
      if (g_utf8_validate (remainder, remaining_bytes, &invalid)) 
	break;
      valid_bytes = invalid - remainder;
    
      if (string == NULL) 
	string = g_string_sized_new (remaining_bytes);

      g_string_append_len (string, remainder, valid_bytes);
      g_string_append_c (string, '?');
      
      remaining_bytes -= valid_bytes + 1;
      remainder = invalid + 1;
    }
  
  if (string == NULL)
    return g_strdup (name);
  
  g_string_append (string, remainder);
  g_string_append (string, " (invalid encoding)");

  g_assert (g_utf8_validate (string->str, -1, NULL));
  
  return g_string_free (string, FALSE);
}

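/* Return a newly allocated, valid UTF-8 string for displaying filename:
 * the name itself if UTF-8 is the first filename charset and the name
 * validates, else the first successful conversion from the remaining
 * filename charsets, else make_valid_utf8 (filename).
 */
gchar *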
g_filename_display_name (const gchar *filename)
{
  gint i;
  const gchar **charsets;
  gchar *display_name = NULL;
  gboolean is_utf8;
 
  is_utf8 = g_get_filename_charsets (&charsets);

  if (is_utf8)
    {
      if (g_utf8_validate (filename, -1, NULL))
	display_name = g_strdup (filename);
    }
  
  
  if (!display_name)
    {
      /* Try to convert from the filename charsets to UTF-8.
       * Skip the first charset if it is UTF-8.
       */
      for (i = is_utf8 ? 1 : 0; charsets[i]; i++)
	{
	  display_name = g_convert (filename, -1, "UTF-8", charsets[i], 
				    NULL, NULL, NULL);

	  if (display_name)
	    break;
	}
    }
  
  /* If all conversions failed, replace each invalid UTF-8 byte
   * with a question mark.
   */
  if (!display_name) 
    display_name = make_valid_utf8 (filename);

  return display_name;
}
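
To make the fallback behaviour concrete, here is a rough standalone
sketch of what make_valid_utf8 does to a name. It is not the patch
itself: it uses plain C so it compiles without GLib, and both
utf8_seq_len and sketch_make_valid_utf8 are hypothetical names, with
utf8_seq_len being a simplified hand-written stand-in for
g_utf8_validate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (stand-in for g_utf8_validate): return the byte
 * length of the valid UTF-8 sequence starting at s, with n >= 1 bytes
 * available, or 0 if no valid sequence starts there.
 */
static size_t
utf8_seq_len (const unsigned char *s, size_t n)
{
  size_t len, i;
  unsigned long cp, min;

  if (s[0] < 0x80)
    return 1;
  else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
  else
    return 0;               /* stray continuation byte or invalid lead */

  if (len > n)
    return 0;               /* sequence truncated by end of string */

  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates and out-of-range code points. */
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  return len;
}

/* Hypothetical plain-C analogue of make_valid_utf8: copy valid
 * sequences, replace each offending byte with '?', and append a
 * marker if anything was replaced. The caller frees the result
 * with free().
 */
static char *
sketch_make_valid_utf8 (const char *name)
{
  static const char suffix[] = " (invalid encoding)";
  const unsigned char *p = (const unsigned char *) name;
  size_t n, i = 0, out = 0, step;
  int replaced = 0;
  char *result;

  assert (name != NULL);
  n = strlen (name);

  /* One '?' per bad byte, so the body never outgrows the input. */
  result = malloc (n + sizeof suffix);
  if (result == NULL)
    return NULL;

  while (i < n)
    {
      step = utf8_seq_len (p + i, n - i);
      if (step == 0)
        {
          result[out++] = '?';
          i++;
          replaced = 1;
        }
      else
        {
          memcpy (result + out, name + i, step);
          out += step;
          i += step;
        }
    }

  result[out] = '\0';
  if (replaced)
    strcpy (result + out, suffix);

  return result;
}
```

With this sketch, a Latin-1 name such as "caf\xe9.txt" comes out as
"caf?.txt (invalid encoding)", while a name that is already valid
UTF-8 is returned unchanged. As in the patch, only one byte is
replaced per failure, so a broken multi-byte sequence can produce
several question marks in a row.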




