Fw: [xml] HTML push interface



----- Original Message -----
From: "Nilo S. Mismetti" <nilo newpos com>
To: <xml xmlsoft org>
Sent: Thursday, October 18, 2001 18:08
Subject: Re: [xml] HTML push interface


Team,

From MSDN, about "fread":

"The fread function reads up to count items of size bytes from the input
stream and stores them in buffer. The file pointer associated with stream
(if there is one) is increased by the number of bytes actually read. If
the given stream is opened in text mode, carriage return-linefeed pairs
are replaced with single linefeed characters. The replacement has no
effect on the file pointer or the return value."

This means that the "res" value still counts the \r characters that fread
strips, so the poor "htmlParseChunk" tries to parse more characters than
fread actually transferred into the buffer.

One solution: replace fread with fgets and call strlen on the result to
obtain the real number of characters.
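
A minimal sketch of the fgets plus strlen suggestion above, not a tested
fix. The names feed_lines, chunk_fn and count_bytes are hypothetical and
not part of libxml; in real code the callback would presumably wrap
htmlParseChunk(ctxt, buf, len, 0). One assumption to keep in mind: strlen
stops at the first NUL byte, so this only works for input without
embedded NULs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type standing in for a call such as
 * htmlParseChunk(ctxt, buf, len, 0). */
typedef void (*chunk_fn)(void *user, const char *buf, int len);

/* Example callback: only accumulates the number of bytes it was given. */
static void count_bytes(void *user, const char *buf, int len)
{
    (void)buf;
    *(long *)user += len;
}

/* Read the stream line by line and pass each line to `feed`, using
 * strlen to measure what is really in the buffer instead of trusting
 * a count returned by the read call.
 * Returns the total number of bytes handed to `feed`. */
static long feed_lines(FILE *f, chunk_fn feed, void *user)
{
    char line[4096];
    long total = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        int len = (int)strlen(line);  /* bytes actually stored in `line` */
        if (len > 0) {
            feed(user, line, len);
            total += len;
        }
    }
    return total;
}
```

With this shape the length passed to the parser can never exceed what is
in the buffer, at the cost of one call per line rather than one per 4096
byte block.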

Nilo
----- Original Message -----
From: "Marc Sanfacon" <sanm copernic com>
To: <xml xmlsoft org>
Sent: Tuesday, August 1, 2000 16:36
Subject: [xml] HTML push interface


Hi there,
I am new to libxml (I've been using it for less than a week).  I have
written a C++ interface on top of it.  It is not finished yet, but it
includes most of the features I need for now.  BTW, I am working under
Windows 2000 using MSVC 6.0 SP3.

I have tried to parse a file using the HTML push interface and got
strange results.

Here is the code:

FILE *f = fopen(CGL::ConvertString(p_FileName).c_str(), "r");
if (f != NULL) {
    int res, size = 4096;
    char chars[4096];
    htmlParserCtxtPtr ctxt;

    res = fread(chars, 1, 4, f);
    if (res > 0) {
        ctxt = htmlCreatePushParserCtxt(NULL, NULL,
            chars, res, 0, static_cast<xmlCharEncoding>(0));
        InitContext(ctxt);
        while ((res = fread(chars, 1, size, f)) > 0) {
            htmlParseChunk(ctxt, chars, res, 0);
        }
        htmlParseChunk(ctxt, chars, 0, 1);
        pDoc = ctxt->myDoc;
        htmlFreeParserCtxt(ctxt);
    }
    fclose(f);
}

This is essentially the code presented in 'testHTML.c' from the package,
except that I use a bigger buffer.  In my tests, one strange thing
happened: when the buffer is large enough to hold one of my documents in
full, the result of the parsing is incomplete.  So far I have only one
document that shows this behavior, and I have attached it to this email.

For example, the document is 2001 bytes long.  When reading with fread,
the '\r' characters are stripped, which gives a total of 1971 bytes.
When I set the buffer size to 1967 (1971 - 4 bytes for the header) or
more, I get the error: a big chunk of my document is skipped.  If I set
it to 1966 or less, the document is parsed OK.
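
The numbers above imply 2001 - 1971 = 30 carriage return-linefeed pairs
in the document. A minimal sketch for checking such a count on a buffer
that was read in binary mode (so the '\r' bytes are still present); the
helper names count_crlf_pairs and text_mode_size are hypothetical and
not part of libxml or the C library.

```c
#include <stddef.h>

/* Count "\r\n" pairs in a raw buffer of `len` bytes. */
static size_t count_crlf_pairs(const char *buf, size_t len)
{
    size_t pairs = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            pairs++;
            i++;  /* skip the '\n' of the pair just counted */
        }
    }
    return pairs;
}

/* Size a text-mode reader would see: one byte less per CR-LF pair. */
static size_t text_mode_size(const char *buf, size_t len)
{
    return len - count_crlf_pairs(buf, len);
}
```

For a 2001 byte file containing 30 such pairs, text_mode_size would give
1971, matching the figure quoted above.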

I even modified 'testHTML.c' to use a buffer of 1967 bytes to make sure,
and I got the same error using: testHTML -debug -repeat -push doc2.htm

Can anyone help me?

Regards,

Marc.

 <<doc2.htm>>

---------------------------------------------------------------------
 "If you choose not to decide, you still have made a choice."
Neil Peart
---------------------------------------------------------------------
Marc Sanfacon, Software developer Copernic.com
e-mail: sanm copernic com R&D Group
Tel   : (418) 527-0528 ext 1212 ICQ #7355101







