*From*: Vincent Lefevre <vincent+gnome vinc17 org>
*To*: xslt gnome org
*Subject*: Re: [xslt] XPath number formatting
*Date*: Fri, 7 Apr 2006 16:31:44 +0200

On 2006-04-07 07:50:49 -0400, Daniel Veillard wrote:
> On Fri, Apr 07, 2006 at 12:13:20PM +0200, Vincent Lefevre wrote:
> > http://bugzilla.gnome.org/show_bug.cgi?id=337565
> >
> > There are 2 problems with the current behavior:
> >
> > 1. A number that is written by xsltproc must be readable by any
> > XSLT/XPath processor, and the "style e" is not a number according to
> > the XPath recommendation. More precisely, converting a string like
> > 1.23456789012e+11 into a number should produce a NaN (this is again
> > a bug in libxml2, and xalan does it right). So, this means that data
>
> Round tripping is mandatory at least internally I agree

(Note that this is implied for a conforming XPath implementation.)
But this is currently not true: a number -> string -> number
conversion sometimes yields a different number with libxml2, as I
showed. The fix in the scientific notation is trivial: print out more
digits (IIRC, 17 significant decimal digits are sufficient).

> > produced by xsltproc won't be readable by other XSLT/XPath processors
> > like xalan, because libxml2 breaks the specs. For an interchange
> > format (one of the main goals of XML), this is not acceptable.
>
> Please read the previous discussions about this like
> http://mail.gnome.org/archives/xml/2001-April/msg00080.html
> This deviation in libxml2 has been there nearly forever.

Unfortunately I wasn't around at that time, but this deviation is
based on wrong assumptions. Bjorn Reese said:

  Very simple. XPath uses ANSI/IEEE Std 754 (1985) for floating point
  numbers. This standard uses a fixed number of digits (bits, actually)
  to represent numbers, which means that it cannot accurately represent,
  say, the number pi as it would require an infinite number of digits.

Up to this point, this is true (53 bits + 1 sign bit for the mantissa).

  It also means that big numbers cannot be represented accurately with
  the regular 9.9 floating point notation. If we were going to try this,
  the least significant digit would be garbage.
The representation required by XPath is *not* a "9.9 floating point
notation". With the representation required by XPath, there is
absolutely no garbage.

  In such cases it is customary to use scientific notation instead, and
  this is what we did. It is not 100% standard compliant, but the
  standard would generate garbage, so we chose the lesser of the two
  evils.

Again, this is wrong. Note that Bjorn didn't give any example for which
there would be a problem (but perhaps he was mistakenly thinking of a
9.9 floating point notation).

FYI, <http://www.w3.org/TR/xpath> says:

------------------------------------------------------------------------
A number is converted to a string as follows

* NaN is converted to the string NaN

* positive zero is converted to the string 0

* negative zero is converted to the string 0

* positive infinity is converted to the string Infinity

* negative infinity is converted to the string -Infinity

* if the number is an integer, the number is represented in decimal
  form as a Number with no decimal point and no leading zeros,
  preceded by a minus sign (-) if the number is negative

* otherwise, the number is represented in decimal form as a Number
  including a decimal point with at least one digit before the decimal
  point and at least one digit after the decimal point, preceded by a
  minus sign (-) if the number is negative; there must be no leading
  zeros before the decimal point apart possibly from the one required
  digit immediately before the decimal point; beyond the one required
  digit after the decimal point there must be as many, but only as
  many, more digits as are needed to uniquely distinguish the number
  from all other IEEE 754 numeric values.
------------------------------------------------------------------------

> > 2. Even inside xsltproc, the current behavior may break things when
> > one wants to do string manipulations like digit extraction (this is
> > my case: I had integers between 0 and 2^53).
>
> I don't think the XPath specs can mandate correct behaviour for such
> integer values.

It does. See above.

> > Isn't the C library sufficient? (BTW, glibc uses GMP for this purpose.)
>
> The divergence from the standard is recorded and present since the
> beginning of libxml2 XPath and libxslt support. At the time clearly
> there was no coherence in implementations. Maybe this needs to be fixed,
> but I won't take a single other processor output as an argument, even
> if it's Saxon...
> Following the spec is important, but number formatting is clearly one
> of the areas where the XPath and XSLT 1.0 were broken.

I don't think it is broken. Perhaps it is not the best choice (at least
for very large and very small numbers). But at least the conversion is
well-defined, and *in practice* the spec seems to be OK (if people don't
like it, there are other ways to manipulate numbers; but it is annoying
to spend hours trying to find out why some program doesn't work when the
problem is that the XSLT processor breaks the spec).

> Now if people want to fix this, I'm not against it, but:
> - I don't want extra library requirement
> - I want a compatible behaviour on all the platforms supported by
> libxml2

Well, the C library may be sufficient (but one should check the standard
and the various implementations), e.g.:

  dixsept:~> printf "%.0f\n" 1111111111111111111111111111111111.0
  1111111111111111114690340241539072
  dixsept:~> printf "%.60f\n" 1e-30
  0.000000000000000000000000000001000000000000000083336420607586

This is easy for very large numbers (1st case); finding the number of
digits is trickier for very small numbers (2nd case). Personally, I
wouldn't mind if numbers like the ones above were not supported. But
[-2^53,2^53] is a natural integer range when the number type is IEEE-754
double precision (and integer types are not available), as in XPath. And
it is important to follow the spec concerning these integers.
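To make the quoted conversion rule and the round-trip remark concrete, here is a minimal sketch in Python. It is an illustration only, not libxml2 code: the names `xpath_number_to_string` and `roundtrips_with_17_digits` are hypothetical, and it assumes that Python floats are IEEE-754 doubles (true on mainstream platforms) and that `repr()` returns the shortest digit string that identifies a double.

```python
import math
from decimal import Decimal


def xpath_number_to_string(x: float) -> str:
    """Hypothetical helper sketching the XPath 1.0 number-to-string rule."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == 0:
        return "0"  # positive and negative zero both give "0"
    if x == math.floor(x):
        # Integer-valued double: int() converts exactly, so all digits are
        # printed, with no decimal point and no exponent.
        return str(int(x))
    # Non-integer: repr() gives the shortest digits that identify this
    # double; Decimal is only used to expand a possible exponent into
    # plain decimal notation.
    return format(Decimal(repr(x)), "f")


def roundtrips_with_17_digits(x: float) -> bool:
    """Check number -> string -> number with 17 significant digits."""
    return float(format(x, ".17g")) == x


if __name__ == "__main__":
    for value in (1.23456789012e+11, 2.0 ** 53, 0.1, 1e-30, -2.5):
        print(xpath_number_to_string(value), roundtrips_with_17_digits(value))
```

With this sketch, 1.23456789012e+11 comes out as 123456789012 and 2^53 as 9007199254740992, so digit extraction on integers in [-2^53,2^53] works on the plain decimal form. Whether a C implementation would take the same route (shortest digits for non-integers, exact expansion for integers) is a design choice left open here.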
> That probably means writing some not so trivial code to handle larger than
> necessary integers/fractional numbers, I think there is something somehow
> similar in the Schemas type support for the Decimal type which requires
> 18 digits of precision at least (we support 24 see xmlSchemaValDecimal in
> xmlschemastypes.c), 52 digits is IMHO totally out of scope for this.

Note that integers in [-2^53,2^53] need at most 16 decimal digits.

-- 
Vincent Lefèvre <vincent vinc17 org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA

