Re: CORBA_char vs CORBA_wchar



Hi:

Again I am replying to myself :-)

Michael and I, along with Mark McLoughlin, had a discussion about this 
today which sent us rummaging through OMG docs.

It appears that this is a non-problem, at least with ORBs that comply 
with recent OMG specs.  Once upon a time 'char' and 'string' meant ISO 
8859-1 (i.e. Latin-1) characters exclusively, but nowadays there should 
be no such assumption about narrow chars.
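
To make the difference concrete, here is a small Python sketch (not ORBit 
code; the sample string is only an illustration) of the same narrow-char 
bytes read under the two assumptions:

```python
# Illustrative only: one byte sequence, two interpretations.
text = "caf\u00e9"                      # "cafe" with an acute accent on the e
wire_bytes = text.encode("utf-8")       # what a UTF-8 sender puts in a 'string'

as_utf8 = wire_bytes.decode("utf-8")         # receiver agrees on UTF-8
as_latin1 = wire_bytes.decode("iso-8859-1")  # receiver assumes ISO 8859-1

print(len(text), len(wire_bytes))  # the accented char takes two bytes in UTF-8
print(as_utf8 == text)             # True: round trip is lossless
print(as_latin1 == text)           # False: each UTF-8 byte became one Latin-1 char
```

The bytes themselves pass through unchanged in both cases; only the 
receiver's assumption about the code set decides whether the text survives.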

Of course there are several levels of char encoding going on: client, 
server, and transmission (wire) encoding.  The native encodings may 
differ from the transmission code set (TCS), and ORBs that use different 
TCS conventions may negotiate so that they can interoperate.  What 
mostly concerns us is that a given ORB also has one or more conversion 
code sets (CCS) that it can convert to and from.  Conversions can occur 
on either the client or the server side, depending on the environment.  
More detail can be found in the CORBA 2.3 spec, section 13.7.*
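
As I read the spec, the TCS selection goes roughly as in the Python sketch 
below.  This is a simplified illustration, not ORBit or Java ORB code: the 
names CodeSetInfo, choose_tcs and CodeSetIncompatible are made up for the 
example, the 'compatible' flag stands in for the spec's character-set 
compatibility check, and taking the first common conversion code set is an 
arbitrary choice on my part.

```python
from dataclasses import dataclass, field


@dataclass
class CodeSetInfo:
    """Hypothetical holder for one side's native and conversion code sets."""
    native: str
    conversion: list = field(default_factory=list)


class CodeSetIncompatible(Exception):
    """Raised when no usable transmission code set can be agreed on."""


def choose_tcs(client, server, fallback="UTF-8", compatible=True):
    """Pick a transmission code set (TCS) for narrow chars."""
    if client.native == server.native:
        return client.native              # no conversion needed
    if server.native in client.conversion:
        return server.native              # client converts
    if client.native in server.conversion:
        return client.native              # server converts
    common = [cs for cs in server.conversion if cs in client.conversion]
    if common:
        return common[0]                  # both sides convert
    if compatible:
        return fallback                   # last resort, e.g. UTF-8 for char
    raise CodeSetIncompatible("no common code set")


# Made-up configurations, purely for illustration.
utf8_orb = CodeSetInfo("UTF-8", ["ISO-8859-1"])
latin1_orb = CodeSetInfo("ISO-8859-1", ["UTF-8"])
print(choose_tcs(utf8_orb, utf8_orb))     # same native code set on both sides
print(choose_tcs(utf8_orb, latin1_orb))   # client converts to the server's native set
```

The point for us is the first branch: if both ORBs advertise UTF-8 as their 
native char code set, no conversion should be needed on the wire.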

When ORBs interoperate they use the IOR Multi-Component Profile 
structure to convey, among other bits of info, the server's native and 
conversion char and wchar code sets.  If the 'char' code set is not 
specified, ISO 8859-1 is assumed for backwards compatibility, but if 
UTF-8 is specified then the two ORBs should be able to correctly 
interpret narrow chars and convert appropriately to the native char 
sets, provided these native char sets can be converted to and from UTF-8 
(which is true of both Gnome/GTK+ and Java strings).
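
A rough Python sketch of that rule, under my own assumptions: decode_narrow 
is a made-up helper rather than any ORB's API, and the UTF-16 step merely 
stands in for a Java-side string.

```python
def decode_narrow(payload, char_code_set=None):
    """Decode a narrow 'string' payload.

    If the sender advertised no char code set, fall back to ISO 8859-1,
    mirroring the backwards-compatibility default described above.
    """
    return payload.decode(char_code_set or "iso-8859-1")


# UTF-8 bytes as a Gnome-side sender would produce them (sample text only).
payload = "Gr\u00fc\u00dfe".encode("utf-8")

advertised = decode_narrow(payload, "utf-8")  # code set was specified
assumed = decode_narrow(payload)              # nothing specified: Latin-1 default

print(advertised == "Gr\u00fc\u00dfe")   # True
print(assumed == "Gr\u00fc\u00dfe")      # False: the text is misread

# Stand-in for the Java side: hold the text as UTF-16, then convert back.
utf16 = advertised.encode("utf-16-be")
print(utf16.decode("utf-16-be").encode("utf-8") == payload)  # True: lossless
```

So the conversion itself is lossless; what has to be right is the code set 
information the two ORBs exchange.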

The spec recommends wstring/wchar for interoperability with some 
'generic' language/runtime environments, but this does not seem to apply 
to our situation.

Eventually we will need to test this - in the meantime Michael will be 
looking at implementing Multi-component profiles for ORBit2...

Corrections/clarifications are welcome!

Regards,

Bill


>Hi list:
>
>I am forwarding a reply to a conversation between Michael and myself.  
>The short background is this:
>
>The accessibility SPI needs to deal in internationalized strings.  
>However Michael believes ORBit's wchar/wstring may be broken; at the 
>least it is untested, and it is inconvenient given that the strings 
>Gnome uses internally are UTF-8.
>
>So Michael suggests using CORBA_string and CORBA_char.  This works fine 
>for ORBit since the strings are all UTF-8 anyhow and the marshalling 
>process doesn't know anything about this.  But the accessibility SPI 
>needs to interoperate with accessible Java apps and the Java ORB, and 
>Java uses UCS-2.  My concern is that the use of 'char' and 'string' for 
>UTF-8 is nonstandard in CORBA and there is no guarantee that the Java 
>ORB will correctly convert between UTF-8 and UCS-2 internally when 
>transmitting and receiving (non-wide) strings over the wire protocol.
>
>Does anyone out there have any hard info about this, or about whether 
>the use of CORBA 'string' for UTF-8 is nonstandard/broken/dangerous?
>
>Thanks,
>
>Bill

[trailing discussion removed]

------
Bill Haneman x19279
Gnome Accessibility / Batik SVG Toolkit
Sun Microsystems Ireland 




