[xml] Another encoding problem when not using iconv



There is a problem in the UTF8ToUTF16xx conversion routines in encoding.c. The last parameter is a pointer to 
the length of the input, which xmlCharEncOutFunc expects to be updated on return to the number of input bytes 
that have been processed. Other routines such as UTF8Toisolat1 do this, but the UTF8ToUTF16xx routines end up 
setting it to 0, because they compute it as processed - in after in itself has been advanced through the 
input. This means that xmlCharEncOutFunc doesn't remove the processed input from the input buffer.
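
To make the pointer arithmetic concrete, here is a minimal sketch. It is not the libxml2 code: toy_convert and 
its use_start flag are hypothetical, and the "conversion" just steps over each byte. It only shows why 
subtracting the moving pointer yields 0 while subtracting a saved start pointer yields the consumed count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a converter (not a libxml2 function).
 * It "consumes" every input byte by advancing `in`, then reports the
 * consumed count through *inlen. use_start selects which pointer is
 * subtracted at the end. */
static int toy_convert(const unsigned char *in, int *inlen, int use_start)
{
    const unsigned char *const instart = in;  /* fixed copy of the start */
    const unsigned char *processed = in;
    const unsigned char *inend;

    assert(in != NULL && inlen != NULL);
    inend = in + *inlen;

    while (in < inend) {
        in++;             /* pretend one byte was converted */
        processed = in;   /* everything before `in` is done */
    }

    /* With use_start == 0, `in` has caught up with `processed`,
     * so the difference is 0 no matter how much was consumed. */
    *inlen = (int)(processed - (use_start ? instart : in));
    return 0;
}
```

Called with a 5-byte input, the use_start == 0 variant leaves *inlen at 0 and the use_start == 1 variant 
leaves it at 5.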

The problem can be reproduced by running the attached XML document (slashdot.xml from the test suite, 
converted to UTF-16) through a version of xmllint built without iconv: it never terminates.
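
The hang can be modelled with a simplified caller loop. This is only a sketch under assumptions: drain, 
conv_ok and conv_zero are hypothetical names, and the real xmlCharEncOutFunc is more involved. The point is 
that a caller which drops "consumed" bytes from its buffer makes no progress when the converter reports 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter signature: *inlen is the available length on
 * entry and the number of bytes consumed on return. */
typedef int (*conv_fn)(const unsigned char *in, int *inlen);

/* Honours the contract: leaves *inlen as-is, i.e. all bytes consumed. */
static int conv_ok(const unsigned char *in, int *inlen)
{
    (void)in;
    (void)inlen;
    return 0;
}

/* Reports zero bytes consumed, like the behaviour described above. */
static int conv_zero(const unsigned char *in, int *inlen)
{
    (void)in;
    *inlen = 0;
    return 0;
}

/* Returns the number of passes needed to empty the buffer, or -1 if a
 * pass fails or consumes nothing. */
static int drain(conv_fn conv, const unsigned char *buf, int len)
{
    int passes = 0;

    assert(conv != NULL && buf != NULL && len >= 0);
    while (len > 0) {
        int used = len;

        if (conv(buf, &used) < 0)
            return -1;
        if (used <= 0)
            return -1;   /* without this guard the loop never ends */
        buf += used;     /* drop the consumed bytes */
        len -= used;
        passes++;
    }
    return passes;
}
```

Here drain(conv_ok, buf, 3) finishes in one pass, while drain(conv_zero, buf, 3) hits the guard and returns -1. 
Without the guard, the second call would loop forever, which matches the symptom seen with xmllint.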

A patch is attached containing changes for UTF8ToUTF16LE and UTF8ToUTF16BE.



<?xml version="1.0" encoding="utf-16"?>
<ultramode>
 <story>
    <title>100 Mbit/s on Fibre to the home</title>
    
<url>http://slashdot.org/articles/99/06/06/1440211.shtml</url>
    <time>1999-06-06 14:39:59</time>
    <author>CmdrTaco</author>
    <department>wouldn't-it-be-nice</department>
    <topic>internet</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>Gimp 1.2 Preview</title>
    
<url>http://slashdot.org/articles/99/06/06/1438246.shtml</url>
    <time>1999-06-06 14:38:40</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>gimp</topic>
    <comments>12</comments>
    <section>articles</section>
    <image>topicgimp.gif</image>
  </story>
 <story>
    <title>Sony's AIBO robot Sold Out</title>
    
<url>http://slashdot.org/articles/99/06/06/1432256.shtml</url>
    <time>1999-06-06 14:32:51</time>
    <author>CmdrTaco</author>
    <department>stuff-to-see</department>
    <topic>tech</topic>
    <comments>10</comments>
    <section>articles</section>
    <image>topictech2.jpg</image>
  </story>
 <story>
    <title>Ask Slashdot: Another Word for 
"Hacker"?</title>
    
<url>http://slashdot.org/askslashdot/99/06/05/1815225.shtml</url>
    <time>1999-06-05 20:00:00</time>
    <author>Cliff</author>
    <department>hacker-vs-cracker</department>
    <topic>news</topic>
    <comments>385</comments>
    <section>askslashdot</section>
    <image>topicnews.gif</image>
  </story>
  <story>
    <title>100 Mbit/s on Fibre to the home</title>
    
<url>http://slashdot.org/articles/99/06/06/1440211.shtml</url>
    <time>1999-06-06 14:39:59</time>
    <author>CmdrTaco</author>
    <department>wouldn't-it-be-nice</department>
    <topic>internet</topic>
    <comments>20</comments>
    <section>articles</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>Gimp 1.2 Preview</title>
    
<url>http://slashdot.org/articles/99/06/06/1438246.shtml</url>
    <time>1999-06-06 14:38:40</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>gimp</topic>
    <comments>12</comments>
    <section>articles</section>
    <image>topicgimp.gif</image>
  </story>
 <story>
    <title>Sony's AIBO robot Sold Out</title>
    
<url>http://slashdot.org/articles/99/06/06/1432256.shtml</url>
    <time>1999-06-06 14:32:51</time>
    <author>CmdrTaco</author>
    <department>stuff-to-see</department>
    <topic>tech</topic>
    <comments>10</comments>
    <section>articles</section>
    <image>topictech2.jpg</image>
  </story>
 <story>
    <title>Ask Slashdot: Another Word for 
"Hacker"?</title>
    
<url>http://slashdot.org/askslashdot/99/06/05/1815225.shtml</url>
    <time>1999-06-05 20:00:00</time>
    <author>Cliff</author>
    <department>hacker-vs-cracker</department>
    <topic>news</topic>
    <comments>385</comments>
    <section>askslashdot</section>
    <image>topicnews.gif</image>
  </story>
<story>
    <title>Corel Linux FAQ</title>
    
<url>http://slashdot.org/articles/99/06/05/1842218.shtml</url>
    <time>1999-06-05 18:42:06</time>
    <author>CmdrTaco</author>
    <department>stuff-to-read</department>
    <topic>corel</topic>
    <comments>164</comments>
    <section>articles</section>
    <image>topiccorel.gif</image>
  </story>
 <story>
    <title>Upside downsides MP3.COM.</title>
    
<url>http://slashdot.org/articles/99/06/05/1558210.shtml</url>
    <time>1999-06-05 15:56:45</time>
    <author>CmdrTaco</author>
    <department>stuff-to-think-about</department>
    <topic>music</topic>
    <comments>48</comments>
    <section>articles</section>
    <image>topicmusic.gif</image>
  </story>
 <story>
    <title>2 Terabits of Bandwidth</title>
    
<url>http://slashdot.org/articles/99/06/05/1554258.shtml</url>
    <time>1999-06-05 15:53:43</time>
    <author>CmdrTaco</author>
    <department>faster-porn</department>
    <topic>internet</topic>
    <comments>66</comments>
    <section>articles</section>
    <image>topicinternet.jpg</image>
  </story>
 <story>
    <title>Suppression of cold fusion 
research?</title>
    
<url>http://slashdot.org/articles/99/06/04/2313200.shtml</url>
    <time>1999-06-04 23:12:29</time>
    <author>Hemos</author>
    <department>possibly-probably</department>
    <topic>science</topic>
    <comments>217</comments>
    <section>articles</section>
    <image>topicscience.gif</image>
  </story>
 <story>
    <title>California Gov. Halts Wage Info 
Sale</title>
    
<url>http://slashdot.org/articles/99/06/04/235256.shtml</url>
    <time>1999-06-04 23:05:34</time>
    <author>Hemos</author>
    <department>woo-hoo!</department>
    <topic>usa</topic>
    <comments>16</comments>
    <section>articles</section>
    <image>topicus.gif</image>
  </story>
 <story>
    <title>Red Hat Announces IPO</title>
    
<url>http://slashdot.org/articles/99/06/04/0849207.shtml</url>
    <time>1999-06-04 19:30:18</time>
    <author>Justin</author>
    <department>details-sketchy</department>
    <topic>redhat</topic>
    <comments>155</comments>
    <section>articles</section>
    <image>topicredhat.gif</image>
  </story>
</ultramode>
# diff -c encoding_orig.c encoding.c
*** encoding_orig.c     Wed Jul 30 14:45:20 2003
--- encoding.c  Wed Jul 30 14:47:14 2003
***************
*** 824,829 ****
--- 824,830 ----
  {
      unsigned short* out = (unsigned short*) outb;
      const unsigned char* processed = in;
+     const unsigned char *const instart = in;
      unsigned short* outstart= out;
      unsigned short* outend;
      const unsigned char* inend= in+*inlen;
***************
*** 858,864 ****
        else if (d < 0xC0) {
            /* trailing byte in leading position */
          *outlen = (out - outstart) * 2;
!         *inlen = processed - in;
          return(-2);
        } else if (d < 0xE0)  { c= d & 0x1F; trailing= 1; }
        else if (d < 0xF0)  { c= d & 0x0F; trailing= 2; }
--- 859,865 ----
        else if (d < 0xC0) {
            /* trailing byte in leading position */
          *outlen = (out - outstart) * 2;
!         *inlen = processed - instart;
          return(-2);
        } else if (d < 0xE0)  { c= d & 0x1F; trailing= 1; }
        else if (d < 0xF0)  { c= d & 0x0F; trailing= 2; }
***************
*** 866,872 ****
        else {
        /* no chance for this in UTF-16 */
        *outlen = (out - outstart) * 2;
!       *inlen = processed - in;
        return(-2);
        }
  
--- 867,873 ----
        else {
        /* no chance for this in UTF-16 */
        *outlen = (out - outstart) * 2;
!       *inlen = processed - instart;
        return(-2);
        }
  
***************
*** 920,926 ****
        processed = in;
      }
      *outlen = (out - outstart) * 2;
!     *inlen = processed - in;
      return(0);
  }
  
--- 921,927 ----
        processed = in;
      }
      *outlen = (out - outstart) * 2;
!     *inlen = processed - instart;
      return(0);
  }
  
***************
*** 1035,1040 ****
--- 1036,1042 ----
  {
      unsigned short* out = (unsigned short*) outb;
      const unsigned char* processed = in;
+     const unsigned char *const instart = in;
      unsigned short* outstart= out;
      unsigned short* outend;
      const unsigned char* inend= in+*inlen;
***************
*** 1069,1075 ****
        else if (d < 0xC0)  {
            /* trailing byte in leading position */
          *outlen = out - outstart;
!         *inlen = processed - in;
          return(-2);
        } else if (d < 0xE0)  { c= d & 0x1F; trailing= 1; }
        else if (d < 0xF0)  { c= d & 0x0F; trailing= 2; }
--- 1071,1077 ----
        else if (d < 0xC0)  {
            /* trailing byte in leading position */
          *outlen = out - outstart;
!         *inlen = processed - instart;
          return(-2);
        } else if (d < 0xE0)  { c= d & 0x1F; trailing= 1; }
        else if (d < 0xF0)  { c= d & 0x0F; trailing= 2; }
***************
*** 1077,1083 ****
        else {
            /* no chance for this in UTF-16 */
          *outlen = out - outstart;
!         *inlen = processed - in;
          return(-2);
        }
  
--- 1079,1085 ----
        else {
            /* no chance for this in UTF-16 */
          *outlen = out - outstart;
!         *inlen = processed - instart;
          return(-2);
        }
  
***************
*** 1128,1134 ****
        processed = in;
      }
      *outlen = (out - outstart) * 2;
!     *inlen = processed - in;
      return(0);
  }
  
--- 1130,1136 ----
        processed = in;
      }
      *outlen = (out - outstart) * 2;
!     *inlen = processed - instart;
      return(0);
  }
  
# 

