Re: [xslt] XPath number formatting



On 2006-04-07 07:50:49 -0400, Daniel Veillard wrote:
> On Fri, Apr 07, 2006 at 12:13:20PM +0200, Vincent Lefevre wrote:
> >   http://bugzilla.gnome.org/show_bug.cgi?id=337565
> > 
> > There are 2 problems with the current behavior:
> > 
> > 1. A number that is written by xsltproc must be readable by any
> > XSLT/XPath processor, and the "style e" is not a number according to
> > the XPath recommendation. More precisely, converting a string like
> > 1.23456789012e+11 into a number should produce a NaN (this is again
> > a bug in libxml2, and xalan does it right). So, this means that data
> 
>   Round tripping is mandatory at least internally

I agree (note that this is implied for a conforming XPath
implementation). But this is currently not true: a conversion
number -> string -> number sometimes leads to a different number
with libxml2, as I showed. The fix for the scientific notation
is trivial: print out more digits (IIRC, 17 significant decimal
digits are sufficient).
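
To illustrate the round-trip claim, here is a minimal sketch in Python
(not libxml2 code; the helper name round_trips is made up for this
example). It prints a double in scientific notation with a given number
of significant digits and checks whether reading the string back gives
the same double.

```python
def round_trips(x: float, digits: int) -> bool:
    """True if x survives number -> string -> number with `digits` significant digits."""
    # "%.*e" prints 1 digit before the point and (digits - 1) after it.
    text = "%.*e" % (digits - 1, x)
    return float(text) == x


x = 0.1 + 0.2                 # a double that is not the one nearest to 0.3
print(round_trips(x, 15))     # False: 15 digits collapse it to 0.3
print(round_trips(x, 17))     # True: 17 digits identify the double
```

This only shows that too few digits lose information; it says nothing
about how libxml2 chooses its precision.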

> > produced by xsltproc won't be readable by other XSLT/XPath processors
> > like xalan, because libxml2 breaks the specs. For an interchange
> > format (one of the main goals of XML), this is not acceptable.
> 
>   Please read the previous discussions about this like
>   http://mail.gnome.org/archives/xml/2001-April/msg00080.html
> This deviation in libxml2 has been there nearly forever.

Unfortunately I wasn't around at that time, but this deviation is based
on wrong assumptions. Bjorn Reese said:

  Very simple. XPath uses ANSI/IEEE Std 754 (1985) for floating point
  numbers. This standard uses a fixed numbers of digits (bits,
  actually) to represent numbers, which means that it cannot
  accurately represent, say, the number pi as it would require an
  infinite number of digits.

Up to this point, this is true (53 bits for the mantissa, plus 1 sign bit).

  It also means that big numbers cannot be represented accurately with
  the regular 9.9 floating point notation. If we were going to try this,
  the least significant digit would be garbage.

The representation required by XPath is *not* a "9.9 floating point
notation". With the representation required by XPath, there is
absolutely no garbage.

  It such cases it is customary to use scientific notation instead,
  and this is what we did. It is not 100% standard compliant, but the
  standard would generate garbage, so we chose the lesser of the two
  evils.

Again this is wrong. Note that Bjorn didn't give any example for which
there would be a problem (but perhaps he was wrongly thinking of a 9.9
floating point notation).

FYI, <http://www.w3.org/TR/xpath> says:

------------------------------------------------------------------------
A number is converted to a string as follows
    * NaN is converted to the string NaN
    * positive zero is converted to the string 0
    * negative zero is converted to the string 0
    * positive infinity is converted to the string Infinity
    * negative infinity is converted to the string -Infinity
    * if the number is an integer, the number is represented in
      decimal form as a Number with no decimal point and no leading
      zeros, preceded by a minus sign (-) if the number is negative
    * otherwise, the number is represented in decimal form as a Number
      including a decimal point with at least one digit before the
      decimal point and at least one digit after the decimal point,
      preceded by a minus sign (-) if the number is negative; there
      must be no leading zeros before the decimal point apart possibly
      from the one required digit immediately before the decimal
      point; beyond the one required digit after the decimal point
      there must be as many, but only as many, more digits as are
      needed to uniquely distinguish the number from all other IEEE
      754 numeric values.
------------------------------------------------------------------------
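
The rules quoted above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the code of any
XPath processor: the function name xpath_string is made up, the
fractional case relies on Python's repr() returning the shortest digit
string that reads back as the same double, and the integer case prints
the exact integer value of the double, which is one possible reading
of the spec for integers above 2^53.

```python
import math
from decimal import Decimal


def xpath_string(x: float) -> str:
    """Sketch of the XPath 1.0 number-to-string rules (hypothetical helper)."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == 0:
        return "0"                      # covers both positive and negative zero
    if x.is_integer():
        return str(int(x))              # no decimal point, no exponent
    # Non-integer: take the shortest round-tripping digits from repr()
    # and lay them out in plain decimal form, never in "style e".
    return format(Decimal(repr(x)), "f")


print(xpath_string(1.23456789012e+11))  # 123456789012
print(xpath_string(-0.0))               # 0
print(xpath_string(1e-7))               # 0.0000001
```

With such a conversion the output never contains an exponent, so
another processor can read it back as an XPath number.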

> > 2. Even inside xsltproc, the current behavior may break things when
> > one wants to do string manipulations like digit extraction (this is
> > my case: I had integers between 0 and 2^53).
> 
>   I don't think the XPath specs can mandate correct behaviour for such
> integer values.

It does. See above.

> > Isn't the C library sufficient? (BTW, glibc uses GMP for this purpose.)
> 
>   The divergence from the standard is recorded and present since the
> beginning of libxml2 XPath and libxslt support. At the time clearly
> there was no coherence in implementations. Maybe this need to be fixed, 
> but I won't take a single other processor output as an argument, even
> if it's Saxon...
>   Following the spec is important, but number formating is clearly one
> of the areas where the XPath and XSLT 1.0 were broken.

I don't think it is broken. It is perhaps not the best choice (at least
for very large and very small numbers). But at least the conversion
is well-defined, and *in practice* the spec seems to be OK (if people
don't like it, there are other ways to manipulate numbers; but it is
annoying to spend hours trying to find out why some program doesn't
work when the problem is that the XSLT processor breaks the spec).

> Now if people want to fix this, I'm not against it, but:
>    - I don't want extra library requirement
>    - I want a compatible behaviour on all the platforms supported by
>      libxml2

Well, the C library may be sufficient (but one should check the
standard and the various implementations), e.g.:

dixsept:~> printf "%.0f\n" 1111111111111111111111111111111111.0
1111111111111111114690340241539072
dixsept:~> printf "%.60f\n" 1e-30
0.000000000000000000000000000001000000000000000083336420607586

This is easy for very large numbers (1st case); finding the number of
digits is trickier for very small numbers (2nd case).
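
For the second case, one naive way to find the number of digits is to
increase the precision of a "%f"-style conversion until the string
reads back as the same double. The sketch below does this in Python
rather than C; the helper name shortest_fixed and the limit of 1100
places are assumptions made for this example, and the input is assumed
to be a finite number. It is a brute-force search, not an efficient or
fully rigorous algorithm.

```python
def shortest_fixed(x: float, max_places: int = 1100) -> str:
    """Hypothetical helper: fewest digits after the point that read back as x."""
    for places in range(1, max_places + 1):
        text = f"{x:.{places}f}"        # same idea as printf("%.*f", places, x)
        if float(text) == x:
            return text
    raise ValueError("no round-tripping representation found")


big = 1111111111111111111111111111111111.0
print(f"{big:.0f}")                     # exact integer value of the double
print(shortest_fixed(1e-30))            # 29 zeros after the point, then a 1
```

Whether the C library of every supported platform rounds such
conversions correctly is a separate question that would need checking.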

Personally I wouldn't mind if numbers such as the above were not
supported. But [-2^53,2^53] is a natural integer range when the number
type is IEEE-754 double precision (and integer types are not
available), like in XPath. And it is important to follow the spec
concerning these integers.

> That probably mean writing some not so trivial code to handle larger than
> necessary integers/fractional numbers, I think there is something somehow
> similar in the Schemas type support for the Decimal type which requires
> 18 digits of precisions at least (we support 24 see xmlSchemaValDecimal in
> xmlschemastypes.c), 52 digits is IMHO totally out of scope for this.

Note that integers in [-2^53,2^53] are represented with at most 16 digits.
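
A quick check of this range, as a Python sketch (the names LIMIT and
digits_of are made up for the example). It shows that 2^53 has 16
decimal digits, that integers up to 2^53 survive a conversion to
double, and that the digits of such a double can be extracted once it
is printed without an exponent.

```python
LIMIT = 2**53                           # 9007199254740992


def digits_of(x: float) -> list[int]:
    """Decimal digits of an integer-valued double, most significant first."""
    return [int(c) for c in str(abs(int(x)))]


print(len(str(LIMIT)))                  # 16
print(int(float(LIMIT - 1)) == LIMIT - 1)   # True: still exact
print(float(LIMIT + 1) == float(LIMIT))     # True: 2^53 + 1 is not representable
print(digits_of(float(LIMIT)))
```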

-- 
Vincent Lefèvre <vincent vinc17 org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA

