GB18030 clarification



Hello Everyone,

I was wondering if anyone could clarify this for
me.  When converting from GB18030 multi-byte to
a wide character, what does one do with a two-byte
sequence?

I believe with a one-byte sequence, it is a direct
assignment, and with a four-byte sequence, it is
temp_wc = (((byte1 - 0x81) * 10 + (byte2 - 0x30)) * 126 + (byte3 - 0x81)) * 10 + (byte4 - 0x30);
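
To make the four-byte case concrete, here is a minimal sketch in C. It
assumes each byte is first reduced by the start of its range (0x81 for the
first and third bytes, 0x30 for the second and fourth), which is what the
0x81 and 0x30 constants in the reverse direction suggest. The function name
is hypothetical, and the result is only a linear index into the four-byte
space, not necessarily the final wide character value.

```c
/* Linear index of a GB18030 four-byte sequence.
   Returns -1 if any byte is outside the four-byte ranges
   (0x81..0xFE, 0x30..0x39, 0x81..0xFE, 0x30..0x39). */
long gb18030_four_byte_index(const unsigned char b[4])
{
    if (b[0] < 0x81 || b[0] > 0xFE || b[1] < 0x30 || b[1] > 0x39 ||
        b[2] < 0x81 || b[2] > 0xFE || b[3] < 0x30 || b[3] > 0x39)
        return -1;

    /* Mixed-radix number with digit sizes 126, 10, 126, 10. */
    return (((long)(b[0] - 0x81) * 10 + (b[1] - 0x30)) * 126
            + (b[2] - 0x81)) * 10 + (b[3] - 0x30);
}
```

With this, 0x81 0x30 0x81 0x30 gives index 0 and 0x90 0x30 0x81 0x30 gives
index 189000.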

So, what does one do with the two-byte sequence?
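
As far as I can tell, the two-byte sequences have no arithmetic formula and
are converted through the mapping table that accompanies the standard. What
can be done without a table is telling the sequence lengths apart by the
second byte. A minimal sketch, with a hypothetical function name, that only
inspects the first two bytes and does not validate bytes three and four:

```c
#include <stddef.h>

/* Returns 1, 2 or 4 for the length of the sequence starting at s,
   or 0 if the bytes do not start a well-formed sequence or too few
   bytes are available. Only the first two bytes are inspected. */
int gb18030_sequence_length(const unsigned char *s, size_t avail)
{
    if (avail == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;                       /* single byte */
    if (s[0] < 0x81 || s[0] > 0xFE || avail < 2)
        return 0;                       /* not a valid lead byte */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return avail >= 4 ? 4 : 0;      /* digit second byte: four-byte */
    if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
        return 2;                       /* two-byte, needs table lookup */
    return 0;
}
```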

Secondly, when converting from a wide character back
to multi-byte, anything <= 0x80 would be easy, and
producing a four-byte sequence would use:

result[3]=0x30+temp_wc%10;  temp_wc/=10;
result[2]=0x81+temp_wc%126; temp_wc/=126;
result[1]=0x30+temp_wc%10;  temp_wc/=10;
result[0]=0x81+temp_wc;
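
The same steps wrapped in a function, as a minimal sketch. It assumes the
input is a linear four-byte index in the range 0..1587599 (126 * 10 * 126 * 10
possible values) rather than a wide character; the function name and the
range check are my additions.

```c
/* Writes the four-byte sequence for a linear index into out.
   Returns 0 on success, -1 if the index is out of range. */
int gb18030_index_to_four_bytes(long index, unsigned char out[4])
{
    if (index < 0 || index >= 126L * 10 * 126 * 10)
        return -1;

    out[3] = (unsigned char)(0x30 + index % 10);  index /= 10;
    out[2] = (unsigned char)(0x81 + index % 126); index /= 126;
    out[1] = (unsigned char)(0x30 + index % 10);  index /= 10;
    out[0] = (unsigned char)(0x81 + index);
    return 0;
}
```

For example, index 189000 comes out as 0x90 0x30 0x81 0x30.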

BUT, how do you determine whether the result will be
a four-byte or a two-byte sequence?  What algorithm
does one use to "make" the two-byte sequence?

Any insight would be _greatly_ appreciated, and if I
should post this somewhere else, let me know.

Thank You.


