[xml] libxml2 is dividing contiguous non-whitespace characters



Forgive me if this is a rookie user error, but I ran across an odd situation with the libxml2 parser. When it parses a series of numerical values separated by whitespace (e.g. "... 293.18 218.92 289.13 ..."), it can grab just a portion of the series, so that a number is separated from the rest of its digits.

For example, the series above is the data portion of a table (rows, columns, data values - see attached file). The parser may grab everything from the beginning of the list through 293.18 and 218 (but not ".92") and pass it through the 'cdata' argument to characterHandler(). My code then extracts each of the values, with "218" being the last one. Then libxml2 grabs the rest of the series, beginning with ".92 289.13 ..." through the end, and passes it to characterHandler() again, where my code extracts each value starting with ".92". The result is an extra value parsed from the series ("218" and ".92" instead of "218.92") and an error in my code (the number of data values doesn't match the product of rows and columns).

Whether the split happens within a value or not appears to be completely random, except that the array ends with character number 250 (suspiciously close to 256). The more whitespace characters there are, the less likely the split is to fall within a value, so I can reformat the numbers and make the error go away. However, this doesn't leave me with a very comfortable feeling. Plus, I have a hard time believing that we wouldn't have seen this a long time ago if it were truly this random.

Is this something with which you are familiar? I searched the web but didn't see any results that seem to address this situation. I'm wondering if there's a way to instruct libxml2 not to separate contiguous non-whitespace characters. I hope I don't have to try to reassemble separated values myself.
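
To make the question concrete, the reassembly I'd rather avoid would look roughly like the sketch below. It is only an illustration under assumptions: TableState, on_characters() and parse_values() are hypothetical names, not anything from libxml2 or from my actual code. on_characters() merely has the same shape as a SAX character callback (context pointer, chunk, chunk length), and the wiring into the parser is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-parse state; the parser would hand it back as ctx. */
typedef struct {
    char  *text;  /* accumulated character data, NUL-terminated */
    size_t len;   /* bytes currently stored, excluding the NUL */
    size_t cap;   /* allocated size of text */
} TableState;

/* Same shape as a SAX character callback. The chunk is not
 * NUL-terminated and may end in the middle of a number, so it is
 * only appended here, never parsed. */
static void on_characters(void *ctx, const unsigned char *ch, int len)
{
    TableState *st = ctx;
    assert(len >= 0);
    if (st->len + (size_t)len + 1 > st->cap) {
        size_t cap = (st->len + (size_t)len + 1) * 2;
        char *p = realloc(st->text, cap);
        if (p == NULL)
            return;  /* sketch only: real code should report the failure */
        st->text = p;
        st->cap = cap;
    }
    memcpy(st->text + st->len, ch, (size_t)len);
    st->len += (size_t)len;
    st->text[st->len] = '\0';
}

/* Meant to be called when the element ends, after every chunk has
 * arrived. Stores up to max values in out, returns how many were
 * stored, and empties the buffer for the next element. */
static size_t parse_values(TableState *st, double *out, size_t max)
{
    size_t n = 0;
    char *p = st->text;
    if (p == NULL)
        return 0;
    while (n < max) {
        char *end;
        double v = strtod(p, &end);  /* skips leading whitespace */
        if (end == p)
            break;                   /* nothing numeric left */
        out[n++] = v;
        p = end;
    }
    st->len = 0;
    st->text[0] = '\0';
    return n;
}
```

Feeding this the two pieces from my example ("... 293.18 218" followed by ".92 289.13 ...") should yield 218.92 as a single value, because nothing is converted until the whole series has been collected.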

Any help is appreciated.
Thanks,
-Dan
-- 
Dan McRae
Software Engineering Manager
Comet Solutions, Inc.
505.323.2525
505.353.2635

Attachment: Table.xml
Description: Text Data


