[gjs: 6/16] byte array: Fix endianness in toString()

From: Cosimo Cecchi <cosimoc src gnome org>
To: commits-list gnome org
Cc:
Subject: [gjs: 6/16] byte array: Fix endianness in toString()
Date: Sat, 13 Oct 2018 21:27:30 +0000 (UTC)
commit 8c030d42110a5fccb832c54a2d6d6141a608b000
Author: Philip Chimento <philip chimento gmail com>
Date:   Sun Sep 16 20:25:38 2018 -0700

    byte array: Fix endianness in toString()
    
    The reason why there was a byte order mark (BOM) in the encoded string
    was not due to GLib, but instead due to iconv. We specified the encoding
    "UTF-16", and iconv will in that case write bytes with a system-default
    UTF-16 endianness, starting with a BOM to tell you which one it chose.
    
    To make matters worse, this iconv default is little-endian on Linux and
    big-endian on macOS (despite the processor being little-endian; maybe
    this is for compatibility with files from the m68k days). So, on macOS
    the "abcd" test was coming out completely garbled.
    
    The correct thing to do, since we are interpreting the bytes as a
    char16_t array in memory, is to make sure that iconv writes them into
    memory in the processor's endianness, not the system's default UTF-16
    endianness.
    
    This fixes a regression from !232 on macOS (and probably on all
    big-endian machines as well).

 gjs/byteArray.cpp                   | 14 ++++++++++----
 installed-tests/js/testByteArray.js |  5 ++---
 2 files changed, 12 insertions(+), 7 deletions(-)
---
diff --git a/gjs/byteArray.cpp b/gjs/byteArray.cpp
index 2dc6faf9..af238537 100644
--- a/gjs/byteArray.cpp
+++ b/gjs/byteArray.cpp
@@ -83,10 +83,16 @@ static bool to_string_impl(JSContext* context, JS::HandleObject byte_array,
         GError *error;
 
         error = NULL;
-        GjsAutoChar u16_str =
-            g_convert(reinterpret_cast<char*>(data), len, "UTF-16", encoding,
-                      nullptr, /* bytes read */
-                      &bytes_written, &error);
+        GjsAutoChar u16_str = g_convert(reinterpret_cast<char*>(data), len,
+        // Make sure the bytes of the UTF-16 string are laid out in memory
+        // such that we can simply reinterpret_cast<char16_t> them.
+#if G_BYTE_ORDER == G_LITTLE_ENDIAN
+                                        "UTF-16LE",
+#else
+                                        "UTF-16BE",
+#endif
+                                        encoding, nullptr, /* bytes read */
+                                        &bytes_written, &error);
         if (u16_str == NULL) {
             /* frees the GError */
             gjs_throw_g_error(context, error);
diff --git a/installed-tests/js/testByteArray.js b/installed-tests/js/testByteArray.js
index fd3fd610..3c537603 100644
--- a/installed-tests/js/testByteArray.js
+++ b/installed-tests/js/testByteArray.js
@@ -47,9 +47,8 @@ describe('Byte array', function () {
     it('can be converted to a string of encoded characters even with a 0 byte', function () {
         const a = Uint8Array.of(97, 98, 99, 100, 0);
         const s = ByteArray.toString(a, 'LATIN1');
-        expect(s.length).toEqual(5);
-        expect(s).toEqual('\uFEFFabcd');
-        // GLib puts a BOM in the string, who knows why
+        expect(s.length).toEqual(4);
+        expect(s).toEqual('abcd');
     });
 
     it('stops converting to a string at an embedded 0 byte', function () {
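
The reasoning in the commit message can be illustrated without GLib or iconv. The following is a minimal standalone sketch, not the gjs code: latin1_to_utf16_bytes() is a hypothetical stand-in for a converter that writes UTF-16 in an explicitly chosen byte order, and reinterpret_as_u16() stands in for reading those bytes back as a char16_t array. All function names here are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: detects at run time whether the host stores the
// least significant byte of a 16-bit value first.
bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical stand-in for the converter: encodes Latin-1 input as UTF-16
// bytes in the requested byte order, without a byte order mark.
std::vector<unsigned char> latin1_to_utf16_bytes(const std::string& in,
                                                 bool little_endian) {
    std::vector<unsigned char> out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        // Latin-1 maps to U+0000..U+00FF, so the high byte is always zero.
        if (little_endian) {
            out.push_back(c);
            out.push_back(0);
        } else {
            out.push_back(0);
            out.push_back(c);
        }
    }
    return out;
}

// Reads the bytes back as char16_t code units in the host's memory order,
// which is what treating the buffer as a char16_t array amounts to.
std::u16string reinterpret_as_u16(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % 2 == 0);
    std::u16string s(bytes.size() / 2, u'\0');
    if (!bytes.empty())
        std::memcpy(&s[0], bytes.data(), bytes.size());
    return s;
}
```

When the byte order handed to the converter matches host_is_little_endian(), the result compares equal to u"abcd". With the opposite order, every code unit has its bytes swapped, so 'a' (0x0061) is read as 0x6100, which corresponds to the garbled output described for macOS.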