RE: [xml] RAW && NXT with strncmp()



[CC'd reply back to the list because it looks like Igor sent it to me alone]

On Thu, 2 Oct 2003, Igor Izvarin wrote:

Hi community,

Good chat!!!

At the end of this message I see question #2: how can I be sure that
during this long check the whole keyword is present in the buffer?

For example: if I check (from the parser.c)
if ((RAW == 'S') && (NXT(1) == 'Y') && (NXT(2) == 'S') && (NXT(3) ==
'T') && (NXT(4) == 'E') && (NXT(5) == 'M')) {
....
}
How can I be sure that buffer ctxt->input->cur contains all the chars
'S', 'Y', 'S', 'T', 'E', 'M'? And not only some of them and then buffer
ends!??

It looks like a potential bug. Yes???? Or No???

I was surprised too, but Daniel's reply, which arrived an hour or so
after I posted my comment (<20031001145439 R21529 redhat com>), said:

"One of the reasons I was using CUR/NXT is that I used to do some "buffer
grows if needed" code handling in it. Not the case anymore which allows
the optimisation."

which I think satisfies me that the buffer contents should be treated as
safe up to the end of the <> section (or the next ; if we're working on a
character or entity reference).

If the buffer were not guaranteed then using the push parser with one
character at a time would (probably) result in the NXT going off the end
of the buffer. A quick investigation shows a number of checks on the
available input in xmlParseTryOrFinish (eg 'if (avail < 4)...') which
would guarantee the buffer to be safe within certain specific limits, as
Daniel stated in the cited posting (and in the comment around the
definition of NXT and CUR - 'one often need to make assumption on the
context to use them').
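
To illustrate the kind of extent check being discussed, here is a minimal
sketch of a keyword test that refuses to look past the available data. It
is not code from libxml2: the helper name starts_with_keyword and its
cur/end parameters are hypothetical, and the real parser relies on the
'avail' checks mentioned above rather than anything like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libxml2.
 * Returns  1 if the keyword kw is fully present at cur,
 *          0 if enough bytes are present but they do not match,
 *         -1 if fewer than strlen(kw) bytes remain before end, so the
 *            caller should wait for more input instead of reading on. */
static int starts_with_keyword(const unsigned char *cur,
                               const unsigned char *end,
                               const char *kw)
{
    size_t len = strlen(kw);
    size_t avail = (size_t)(end - cur);

    if (avail < len)
        return -1;                  /* not enough data in the buffer yet */
    return memcmp(cur, kw, len) == 0;
}
```

The three-way result matters for a push parser: "not enough data" has to
be kept distinct from "no match", otherwise a keyword split across two
chunks would be misread as a mismatch.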

However, as I was looking at this section I noticed a slightly odd thing -
in XML_PARSER_START state the parser could conceivably read off the end of
the buffer if the character set was initially set and only a single byte
of data was passed. The clause in xmlParseTryOrFinish which checks
XML_PARSER_START checks for 1 byte being available (outside the state
check) then for 4 bytes if there is no encoding known yet. Iff the charset
had been set by the application then the code would roll on without
checking that sufficient data was present.

If there are other checks present which guarantee that this condition
cannot occur, I cannot see them (although I've only checked the path very
quickly).

The following patch against libxml2-2.5.11 should fix this, I think...

--- parser.c    Tue Sep  9 11:02:47 2003
+++ parser-jf.c Thu Oct  2 16:24:10 2003
@@ -8438,6 +8438,8 @@
                    break;
                }

+               if (avail < 2)
+                   goto done;
                cur = ctxt->input->cur[0];
                next = ctxt->input->cur[1];
                if (cur == 0) {

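As a rough model of what the patched check is meant to achieve (a
simplified sketch of the logic as I described it above, not the actual
libxml2 code; the function name and parameters are made up for
illustration), the decision in the START state becomes:

```c
#include <stddef.h>

/* Simplified model, not libxml2 code.
 * Returns 1 if the START state may safely read cur[0] and cur[1],
 * 0 if it must stop and wait for more data.
 * encoding_known is nonzero when the application has already set the
 * charset, in which case the 4-byte detection check is skipped. */
static int start_state_can_proceed(size_t avail, int encoding_known)
{
    if (avail < 1)                      /* general check, outside the state */
        return 0;
    if (!encoding_known && avail < 4)   /* need 4 bytes to detect encoding */
        return 0;
    if (avail < 2)                      /* the check added by the patch */
        return 0;
    return 1;
}
```

With the charset preset and a single byte pushed, this model returns 0,
which is the case the unpatched path would let through to the cur[1] read.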

There may be others like this around the same area, although I couldn't
see any apart from that one. I hope that's helpful.

-----Original Message-----
From: xml-admin gnome org [mailto:xml-admin gnome org] On Behalf Of
Justin Fletcher
Sent: Wednesday, October 01, 2003 10:38 PM
To: xml gnome org
Subject: Re: [xml] RAW && NXT with strncmp()


On Wed, 1 Oct 2003, Daniel Veillard wrote:

On Wed, Oct 01, 2003 at 09:55:53AM -0600, Steve Williams wrote:
Hi,

[snip - stuff about optimisation in parser.c with reference to the access
to buffer values]


I'm slightly surprised that there aren't any extent checks to see
whether the NXT runs off the end of the buffer, but I guess that there
is a guarantee that the data is sufficiently 'complete' within these
blocks.

[snip]

-- 
Gerph {djf0-.3w6e2w2.226,6q6w2q2,2.3,2m4}
URL: http://www.movspclr.co.uk/
... Yet more stuff happens.


