Re: auto-upgrading strings to utf8




On Jun 5, 2005, at 9:48 AM, Quentin wrote:

Ideally the glib filename functions should be fixed. Quite a few
glib-based programs have had problems with filenames becoming utf8 even
though the locale is, say, iso-8859-15.

I think we should provide a filename helper of some sort, either as a
function that takes a perl string and returns a filename suited for the
locale, or by handling the conversion in the wrappers for the
functions that access the filesystem.
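One way such a helper could look, sketched in C: decode the UTF-8 text and re-encode it into the filename's charset. This sketch assumes the filesystem uses iso-8859-15 (the locale from the example above) and hard-codes that table instead of asking the locale; a real helper would look up the codeset (for instance with nl_langinfo(CODESET)) and convert with iconv(), or defer to Glib's own filename conversion. The function names are hypothetical and not part of Glib or the Perl bindings.

```c
#include <assert.h>
#include <stddef.h>

/* Map a Unicode code point to its ISO-8859-15 byte, or -1 if the
   character has no representation in that charset. */
static int latin9_from_codepoint(unsigned long cp)
{
    switch (cp) {
    case 0x20AC: return 0xA4;   /* EURO SIGN */
    case 0x0160: return 0xA6;   /* LATIN CAPITAL LETTER S WITH CARON */
    case 0x0161: return 0xA8;   /* LATIN SMALL LETTER S WITH CARON */
    case 0x017D: return 0xB4;   /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0x017E: return 0xB8;   /* LATIN SMALL LETTER Z WITH CARON */
    case 0x0152: return 0xBC;   /* LATIN CAPITAL LIGATURE OE */
    case 0x0153: return 0xBD;   /* LATIN SMALL LIGATURE OE */
    case 0x0178: return 0xBE;   /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    /* Latin-1 characters whose slots ISO-8859-15 reassigned. */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return -1;
    default:
        return cp <= 0xFF ? (int)cp : -1;
    }
}

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into an
   ISO-8859-15 filename in out[0..outlen).  Returns 0 on success, -1 on
   malformed UTF-8, unrepresentable characters, or a too-small buffer.
   (Overlong UTF-8 forms are not rejected; this is only a sketch.) */
static int filename_from_utf8_latin9(const char *utf8, char *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t n = 0;

    assert(utf8 != NULL && out != NULL && outlen > 0);
    while (*p) {
        unsigned long cp;
        int extra, byte;

        /* Decode one UTF-8 sequence, starting from its lead byte. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;
        p++;
        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80)    /* also stops at a premature NUL */
                return -1;
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Re-encode as a single ISO-8859-15 byte, leaving room for NUL. */
        byte = latin9_from_codepoint(cp);
        if (byte < 0 || n + 1 >= outlen)
            return -1;
        out[n++] = (char)byte;
    }
    out[n] = '\0';
    return 0;
}
```

With this, "caf\xc3\xa9" (utf8 "café") becomes the four bytes "caf\xe9", and the euro sign comes out as 0xA4; a string containing a character iso-8859-15 cannot represent is rejected rather than silently mangled.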

There is already Glib->filename_from_unicode. The problem here is that
gstreamer puts a filename in a glib object property of type string, and
all string properties are auto-upgraded to utf8.
The function called is $source->set(location => $file), which is a
generic glib function for setting a property.

To clarify:

   # filename from command line is in proper filename encoding.
   $filename = $ARGV[$n];

   $source->set (location => $filename);

invokes Glib::Object::set(), which contains something like this:

     foreach key/val pair:
         SV * key = ST (i);    # i-th item on the stack, "location"
         SV * val = ST (i+1);  # (i+1)-th item on the stack, $filename

         # look up the "location" property on this class to find the
         # value type
         pspec = g_object_class_find_property (class, key);

         # initialize the GValue container to hold that type:
         g_value_init (gvalue, G_PARAM_SPEC_VALUE_TYPE (pspec));

         # the location property is G_TYPE_STRING, so the GValue
         # is now prepared to hold gchar* strings.

         # marshal the SV into the GValue:
         gperl_value_from_sv (gvalue, val);
         # this function contains a great big switch on GType, and
         # for G_TYPE_STRING, it does this:
               g_value_set_string (value, SvGChar (sv));
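         # SvGChar() upgrades the sv to utf8.
         # now the GValue contains a utf8-encoded version of
         # $filename, which isn't what we actually need.

         # finally, hand the GValue to the object's set_property handler:
         g_object_set_property (object, key, gvalue);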


The GstFileSrcElement's set_property handler does nothing special to the string: it just copies it, and later passes it, unaltered, to open(). So it expects the string to already be a valid filename.

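But since it was passed through a G_TYPE_STRING property, and we consider G_TYPE_STRING to mean "utf8 text", we upgraded it to utf8 and thereby mangled it: every non-ASCII byte of an iso-8859-15 filename gets re-encoded, so open() ends up being asked for a different name.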
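To make the mangling concrete, here is a plain C sketch of what the upgrade does to the bytes of a filename: each byte is read as a Latin-1 character and written back out as UTF-8. It assumes an ASCII platform and an iso-8859-15 filename such as "été.ogg" (bytes E9 74 E9 2E 6F 67 67); the function name is hypothetical and only mimics the effect of SvGChar()'s upgrade, not its actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Re-encode a byte string the way a utf8 upgrade treats it: each byte
   is taken as a Latin-1 character (U+0000..U+00FF) and written out as
   UTF-8.  Returns the output length, or -1 if out is too small. */
static long upgrade_to_utf8(const char *in, char *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    /* Worst case: every byte >= 0x80 becomes two bytes, plus the NUL. */
    if (outlen < 2 * strlen(in) + 1)
        return -1;

    for (; *p; p++) {
        if (*p < 0x80) {
            out[n++] = (char)*p;                    /* ASCII: unchanged */
        } else {
            out[n++] = (char)(0xC0 | (*p >> 6));    /* UTF-8 lead byte */
            out[n++] = (char)(0x80 | (*p & 0x3F));  /* continuation byte */
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

Fed "\xe9t\xe9.ogg", it produces "\xc3\xa9t\xc3\xa9.ogg": nine bytes instead of seven. A pure-ASCII filename comes through untouched, which is why the bug only shows up with accented names.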


I can think of three fixes for this:

a) if it's the case that a G_TYPE_STRING really is supposed to be utf8, then GstFileSrcElement is broken, and should do something like g_filename_from_utf8() on the string it gets from the location property; the bindings would not need to change at all. this risks breaking C programs, but GStreamer is still at a not-yet-stable version...

b) turn off auto-upgrading, and push the burden of ensuring utf8-ness of text onto perl developers. this risks breaking lots of existing code.

c) add infrastructure to the bindings to allow per-property overrides for marshaling. this would slow down the general case and take up even more memory (another hash table and lookup per property), but would allow problems like this (and G_TYPE_POINTER properties) to be fixed.
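Option c) might look roughly like the following sketch: before marshaling a G_TYPE_STRING value, the bindings would consult a per-property override table and fall back to today's utf8 upgrade when no entry exists. The enum, struct, function names, and the "GstFileSrc" key are all hypothetical; the linear scan stands in for the hash table mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marshaling policies for a G_TYPE_STRING property. */
enum string_policy {
    POLICY_UTF8_TEXT,   /* default: upgrade the SV to utf8, as now */
    POLICY_RAW_BYTES    /* override: pass the bytes through unchanged */
};

/* Hypothetical override entry, keyed by type name and property name. */
struct prop_override {
    const char *type_name;
    const char *prop_name;
    enum string_policy policy;
};

/* Illustrative table; a real implementation would more likely use a
   hash table (e.g. a GHashTable) than a linear scan. */
static const struct prop_override overrides[] = {
    { "GstFileSrc", "location", POLICY_RAW_BYTES },
};

/* The extra lookup that option c) adds to every property set. */
static enum string_policy lookup_policy(const char *type_name,
                                        const char *prop_name)
{
    size_t i;

    assert(type_name != NULL && prop_name != NULL);
    for (i = 0; i < sizeof overrides / sizeof overrides[0]; i++) {
        if (strcmp(overrides[i].type_name, type_name) == 0 &&
            strcmp(overrides[i].prop_name, prop_name) == 0)
            return overrides[i].policy;
    }
    return POLICY_UTF8_TEXT;    /* no override: current behaviour */
}
```

Every property set pays for this lookup, which is the general-case slowdown described in c). The sketch matches exact type names only; because properties are inherited, a real version would also need to check the ancestors of the object's type.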


--
Holy crap, dude, we have kids!
    -- Elysse, six days after giving birth to twins



