Re: [xml] Potential problem with 2.7.4



On Tue, Sep 15, 2009 at 10:09:15PM +0200, Mike Hommey wrote:
On Tue, Sep 15, 2009 at 08:28:19PM +0200, Daniel Veillard wrote:
On Tue, Sep 15, 2009 at 07:57:17PM +0200, Mike Hommey wrote:
On Tue, Sep 15, 2009 at 07:43:54PM +0200, Daniel Veillard wrote:
  Some application don't deliver sufficient data at start of parsing
conflicting with the fix I made for 

  https://bugzilla.gnome.org/show_bug.cgi?id=566012 

I commited a fix which solves the problem while preserving the EBCDIC
parsing fix:

http://git.gnome.org/cgit/libxml2/commit/?id=9d3d141c412baa5c713ad3df48f1a4d179e07b05

+     * than just the first line, unless the amount of data is really
+     * too small to hold "<?xml version="1.0" encoding="foo"
      */
+    if ((ctxt->input->end - ctxt->input->cur) < 35) {
+       GROW;
+    }

Can't there be another similar problem later in the code when that GROW
still doesn't get enough to hold "<?xml version="1.0" encoding="foo" ?

  It's a trade-off, basically the first line is nasty because you have
to guess an encoding and only at the end of the first line you can
actually be sure you use the right decoder. If you push too agressively
the risk is to corrupt the first bytes of the content as 566012
demonstrated.

  basically if in 2 calls to GROW you didnt got enough for the first
line well yes you're at risk, but IMHO the client side is fairly broken
then.

But there's nothing that needs the input callback to give back enough
input. It could even give back data a byte at a time. Wouldn't a better
solution to revert the current fix and to have xmlParserInputBufferGrow
actually fill len bytes by looping over in->readcallback until the
callback either gives enough data or says there's nothing more ?

Something like the attached patch seems to work (at least, inkscape starts
up, and the test suite passes[1]), and from my POV, should be safe once and
for all, whatever xmlIO consumers are doing with their callbacks, be it
stupid or not.

Mike

1. it passes as well as plain git, which fails on some tests, as you
can see below:
## XML regression tests
## XML regression tests on memory
## XML entity subst regression tests
## XML Namespaces regression tests
## xml:id regression tests
## Error cases regression tests
## Error cases stream regression tests
## Running the API regression tests this may take a little while
Total: 1162 functions, 291389 tests, 0 errors
## Reader regression tests
## Reader on memory regression tests
## Walker regression tests
## Reader entities substitution regression tests
## SAX1 callbacks regression tests
## SAX2 callbacks regression tests
## XML push regression tests
## HTML regression tests
## Push HTML regression tests
## HTML SAX regression tests
## Push HTML SAX regression tests
## Valid documents regression tests
## Validity checking regression tests
## General documents valid regression tests
## URI module regression tests
## Pattern regression tests
## XPath regression tests
## XPointer regression tests
## XInclude regression tests
## XInclude xmlReader regression tests
## C14N and XPath regression tests
./test/c14n/1-1-without-comments/xmlspace-prop-1.xml:9: parser warning : Invalid value "true" for xml:space : 
"default" or "preserve" expected
     <ietf:e1 xml:space="true">
                              ^
./test/c14n/1-1-without-comments/xmlspace-prop-2.xml:9: parser warning : Invalid value "true" for xml:space : 
"default" or "preserve" expected
     <ietf:e1 xml:space="true">
                              ^
./test/c14n/1-1-without-comments/xmlspace-prop-3.xml:9: parser warning : Invalid value "true" for xml:space : 
"default" or "preserve" expected
     <ietf:e1 xml:space="true">
                              ^
./test/c14n/1-1-without-comments/xmlspace-prop-4.xml:9: parser warning : Invalid value "true" for xml:space : 
"default" or "preserve" expected
     <ietf:e1 xml:space="true">
                              ^
## Scripts regression tests
## Some of the base computations may be different if srcdir != .
set3 result
1c1 < ./test/scripts/set3.xml:1: parser warning : xmlns: URI bar is not absolute --- > 
./test/scripts/set3.xml:1: namespace warning : xmlns: URI bar is not absolute
## Catalog regression tests
## Add and del operations on XML Catalogs
## Regexp regression tests
bug316338 result
3c3 < C 433 12: Fail --- > C 433 12: Ok
## Formal expresssions regression tests
## Automata regression tests
## Schemas regression tests
any4_0_0 result
1d0 < ./test/schemas/any4_0.xml validates 0a1 > ./test/schemas/any4_0.xsd:14: element complexType: Schemas 
parser error : local complex type: The content model is not determinist.
diff: ./result/schemas/regexp-char-ref_0_0.err: No such file or directory
diff: ./result/schemas/regexp-char-ref_1_0.err: No such file or directory
## Relax-NG regression tests
## Relax-NG streaming regression tests
## Schematron regression tests
## Threaded regression tests
/bin/sh: line 2: 15846 Segmentation fault      ./testThreads
## generating dba100000.xml
## Timing tests to try to detect performance
## as well a memory usage breakage when streaming
## 1/ using the file interface
## 2/ using the memory interface
## 3/ repeated DOM parsing
## 4/ repeated DOM validation
Parsing took 265 ms
Parsing took 261 ms
100 iterations took 348 ms
100 iterations took 784 ms
## Module tests
Success!
make[1]: Entering directory `/tmp/libxml2/doc/examples'
## examples regression tests
4c4
<   <list xml:base="sql://select_name_from_people"><people>a</people><people>b</people></list>
---
  <list><people>a</people><people>b</people></list>
make[1]: Leaving directory `/tmp/libxml2/doc/examples'

Attachment: diff
Description: Text document



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]