[xml] [Bug 143739] - incorrect line numbers in well-formedness error message

I filed bug 143739 about a month ago and Daniel suggested I discuss the
problem on the list. I'm a month late, but I finally found time to
subscribe to the list and will try to summarize the problem.

First some background: I'm an ODP editor and I maintain a C program used
to check the ODP data dump for errors. The data dump is a set of large
(about 2GB) XML files containing the structure and content of the entire
directory. It is exported once a week and used by other sites, such as
Google, to generate a customized site directory. The ODP data dumps have
a reputation for containing all sorts of bad data: illegal XML chars,
corrupt UTF-8 sequences, well-formedness errors, you name it. To help
solve these problems I wrote a simple program that processes each dump
and generates a list of errors, which the ODP developers can use to
track down and fix the code that generates the dump.

My problem is that after recently upgrading from libxml2 v2.5.x to
v2.6.9, we lost the ability to display the line numbers of
well-formedness errors. Before the upgrade, the error messages appeared
in this form:

Well-formedness Error [line 29664810]: Char 0xD96B out of allowed range

Now all error messages display the same, incorrect line number:

Well-formedness Error [line 65535]: Char 0xD96B out of allowed range

Daniel said this was due to a change in the code that now restricts the
line number stored in each node to a 16-bit value, which is why every
message reports 65535. So features of libxml2 that rely on line numbers
no longer work on files longer than 65535 lines. It sounds like there's
no general solution to this problem without breaking the new node
structure. But in an email Daniel suggested there might be a fix that
would work in reader-based programs (like mine):

On Sat, 2004-06-05 at 04:05, Daniel Veillard wrote:
> In the specific case of a program based on the reader I
> may be able to get a solution for this because we have a
> parser context hence access to the ctxt->input->line which
> is an int.

It sounds like 2.6.10 is about to be released, so I need to find out
whether the problem is still there and, if so, what I can do to help
get a fix or a change that restores the ability to determine the line
number of errors.
