Re: [PATCH] sms_decode_text(): Sanitize 8-bit data so that it is UTF8-clean.



On Tue, 2011-09-27 at 14:55 -0400, Nathan Williams wrote:
> 
> 
> On Tue, Sep 27, 2011 at 2:18 PM, Dan Williams <dcbw redhat com> wrote:
>         On Mon, 2011-09-26 at 18:29 -0400, Nathan Williams wrote:
>         > This keeps ModemManager from crashing deep in the DBus libraries
>         > when a SMS Get() or List() DBus operation finds a message that
>         > isn't valid UTF-8 and/or has embedded NUL characters.
>         >
>         > I'll be putting up a separate patch as a proposal for how to
>         > avoid this problem in the new API.
>
>         Sounds fine; though in general we know the encoding that the
>         message comes in with, and we know we need to convert to UTF-8
>         for D-Bus (and really, everything should be UTF-8 at the
>         boundaries, it would be just horrid to expose any charset
>         encoding details to clients and I don't think we have to).  So
>         we should be able to convert to UTF-8 without any real loss of
>         fidelity when reading the message from the modem, and we should
>         be able to convert from UTF-8 to a suitable charset (whatever
>         we've selected from CSCS) when sending messages too.
>
>         In what cases would we want to send or receive essentially
>         binary data via SMS?  AFAIK most of these cases show up as
>         base64 or hex-string SMS if they aren't intended for human
>         consumption.
> 
> We do do that conversion to UTF-8 when we know the transmission
> character set, GSM-7 or UCS2. The one fly in this ointment is that one
> of the possible encodings is, in fact, "8-bit data" (TP-DCS value of
> 04 or f4) with no associated character set. The particular case that
> brought this to my attention was a test SMS from a carrier that was
> supposed to contain, I believe, a polyphonic ringtone for some Nokia
> handset.

Ok, I suppose we could also expose the data as a byte array in the Get()
method call along with the 'text' argument, since it seems like we can
probably tell whether it's supposed to be a string or not.
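
For illustration only, here is a minimal sketch of how the two pieces
discussed above could fit together: detecting the "8-bit data" case from
the TP-DCS octet, and producing a UTF-8-clean 'text' while the raw bytes
stay available for a byte-array field. This is not the code from the
patch. The names dcs_is_8bit_data() and sanitize_8bit() are hypothetical,
the DCS check is a simplified reading of the coding groups in
3GPP TS 23.038, and replacing every non-printable byte with '?' is just
one possible sanitizing policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the TP-DCS octet selects "8-bit data".
 * Simplified: only the general group (00xx xxxx) and the data coding /
 * message class group (1111 xxxx) are examined. */
int dcs_is_8bit_data(uint8_t dcs)
{
    if ((dcs & 0xC0) == 0x00)          /* general data coding group */
        return (dcs & 0x0C) == 0x04;   /* alphabet bits 3..2 == 01 */
    if ((dcs & 0xF0) == 0xF0)          /* data coding / message class */
        return (dcs & 0x04) != 0;      /* bit 2 set means 8-bit data */
    return 0;
}

/* Hypothetical helper: copy len bytes from data into out, replacing every
 * byte outside printable ASCII (0x20..0x7E) with '?'. The result is plain
 * ASCII, so it is valid UTF-8 and has no embedded NULs.
 * out must have room for len + 1 bytes. Returns the number of bytes that
 * were replaced. */
size_t sanitize_8bit(const uint8_t *data, size_t len, char *out)
{
    size_t replaced = 0;

    assert(out != NULL);
    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (data[i] >= 0x20 && data[i] <= 0x7E) {
            out[i] = (char)data[i];
        } else {
            out[i] = '?';
            replaced++;
        }
    }
    out[len] = '\0';
    return replaced;
}
```

A caller could check dcs_is_8bit_data() first, pass the sanitized string
as 'text', and hand the untouched input buffer to the byte-array field so
that nothing is lost for clients that want the binary payload.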

Dan




