[xml] Patch to fix ICU flush and pivot buffer



Hi,

The Chromium team recently found, through fuzz testing, a bug in the libxml / ICU integration where UTF-8 characters can be decoded incorrectly.  See http://crbug.com/722420.

The root cause is that libxml calls ICU's ucnv_convertEx with incorrect parameters: it always sets flush to TRUE.  This parameter should be TRUE only on the last call for a given input.

Also, when calling ucnv_convertEx multiple times (with flush=FALSE), the caller must provide a pivot buffer which is maintained between calls.
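To illustrate why both points matter, here is a minimal sketch in plain C.  It does not use ICU: ToyDecoder, toy_decode and decode_in_two_chunks are hypothetical names made up for this example, and the toy decoder does no overlong or range validation.  It only shows the general contract, namely that state has to survive between calls and that flush belongs on the final call only, using a two-byte UTF-8 sequence that is split across two chunks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy state: plays the role of the converter state and
 * pivot buffer that must be kept between calls.  Not an ICU type. */
typedef struct {
    uint32_t cp;     /* partially assembled code point */
    int      needed; /* continuation bytes still expected */
} ToyDecoder;

/* Store one code point if there is room; return the new count. */
static size_t put(uint32_t *out, size_t cap, size_t n, uint32_t cp)
{
    if (n < cap)
        out[n++] = cp;
    return n;
}

/* Decode one chunk of UTF-8 into code points.  If flush is non-zero the
 * caller promises no more input follows, so a pending partial sequence
 * is reported as U+FFFD.  Returns the number of code points written. */
static size_t toy_decode(ToyDecoder *d, const unsigned char *in, size_t len,
                         int flush, uint32_t *out, size_t cap)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned char b = in[i];

        if (d->needed > 0) {
            if ((b & 0xC0) == 0x80) {
                d->cp = (d->cp << 6) | (uint32_t)(b & 0x3F);
                if (--d->needed == 0)
                    n = put(out, cap, n, d->cp);
                continue;
            }
            /* Broken sequence: report it, then treat b as a new lead. */
            n = put(out, cap, n, 0xFFFD);
            d->needed = 0;
        }

        if (b < 0x80) {
            n = put(out, cap, n, b);
        } else if ((b & 0xE0) == 0xC0) {
            d->cp = b & 0x1F; d->needed = 1;
        } else if ((b & 0xF0) == 0xE0) {
            d->cp = b & 0x0F; d->needed = 2;
        } else if ((b & 0xF8) == 0xF0) {
            d->cp = b & 0x07; d->needed = 3;
        } else {
            n = put(out, cap, n, 0xFFFD); /* stray continuation byte */
        }
    }

    if (flush && d->needed > 0) {
        /* End of input in the middle of a sequence. */
        n = put(out, cap, n, 0xFFFD);
        d->needed = 0;
    }
    return n;
}

/* Feed the input as two chunks split at `split`.  With flush_every_call
 * set, the first call also flushes (the buggy pattern); otherwise only
 * the last call does. */
static size_t decode_in_two_chunks(const unsigned char *in, size_t len,
                                   size_t split, int flush_every_call,
                                   uint32_t *out, size_t cap)
{
    ToyDecoder d = { 0, 0 };
    size_t n;

    if (split > len)
        split = len;
    n = toy_decode(&d, in, split, flush_every_call, out, cap);
    n += toy_decode(&d, in + split, len - split, 1, out + n, cap - n);
    return n;
}
```

With the bytes 'a', 0xC3, 0xA9, 'b' split after the second byte, flushing only on the last call yields 'a', U+00E9, 'b'.  Flushing on every call yields 'a', U+FFFD, U+FFFD, 'b', because the first call gives up on the incomplete sequence and the second call then sees an orphaned continuation byte.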

This patch fixes those issues.  The patch includes test/icu_parse_test.xml to reproduce the error.

./configure --with-icu --with-iconv=no
make runtest
./runtest

The test sets the encoding to "UTF8-".  This encoding is chosen because libxml does not recognize it, which forces the decoding to be done by ICU, and ICU recognizes it as UTF-8.  Unfortunately, the test always fails when using iconv, since iconv does not recognize this encoding.  Since iconv is the default in libxml, I understand that it may not make sense to always run this testcase, or to include it at all.

This patch has been reviewed by Markus Scherer (maintainer of ICU) and by Jungshik on the Chromium team, who worked on the original integration of libxml and ICU in 2007.  See http://crosreview.com/729616.  The patch is ready to submit against the Chromium local copy of libxml, but it would be preferable to have it accepted into libxml if you are happy with it, so that we can then take the patch from libxml and stay in sync.

If it is any help, I can send this patch as a pull request to the libxml GitHub repo, or I can create a libxml bug.  It looks like the libxml preference is to take patches on the mailing list.

Thanks,
Joel

Attachment: libxml.icu.flush.pivot.patch
Description: Text Data


