Re: libxml2, scrollkeeper, and commas in tags



On Tue, Mar 19, 2002 at 12:43:42AM -0600, Dan Mueth wrote:
> 
> On Wed, 13 Mar 2002, Daniel Veillard wrote:
> 
> >   Guys, this is frustrating. There is a bug, at least 2 people have seen
> > it, it seems related to libxslt not doing the right thing, I asked for
> > a reproducible test but didn't get anything, not even basic information
> > about the platform being used.
> 
> I wish we could have found a reproducible test case earlier, but I'm not
> sure anybody is worthy of finger-pointing here.  The only two people who
> observed it both thought it was a bug in ScrollKeeper and reported it to
> me.  I couldn't reproduce it, but did what I could to investigate the
> ScrollKeeper stylesheets for a bug.  Meanwhile I was emailing and cc'ing
> some of the emails to you just in case it was a problem with libxml.  So,
> you really haven't missed out on much here.

  Okay, I probably had too much coffee that day :-\

> We do (finally) have a reproducible test case:  It seems that the bug is 
> only seen in certain locales (e.g. sv_SE but not C).
> 
> If you have scrollkeeper-0.3.5 (from
> https://sourceforge.net/project/showfiles.php?group_id=11543&release_id=24049),

  Okay, installed. BTW installing the RPM generated a validity error:

[root paphio tmp]# rpm -U ~veillard/scrollkeeper-0.3.5-1.i386.rpm ~veillard/docbook-dtd412-xml-1.0-1.noarch.rpm 
/usr/share/scrollkeeper/doc/writing_scrollkeeper_omf_files/C/writing_scrollkeeper_omf_files.xml:783: validity error: ID skomf-seriesid already defined
    <sect2 id="skomf-seriesid">
			      ^
[root paphio tmp]# 

> you can do:
> 
> xsltproc /usr/share/scrollkeeper/stylesheets/toc.xsl 
> /usr/share/scrollkeeper/doc/writing_scrollkeeper_omf_files/C/writing_scrollkeeper_omf_files.xml
> 
> and you get good output for both C and sv_SE in LC_ALL.

  Okay

> If you use the ScrollKeeper extraction code, you see the problem:
> 
> scrollkeeper-extract 
> /usr/share/scrollkeeper/doc/writing_scrollkeeper_omf_files/C/writing_scrollkeeper_omf_files.xml 
> /usr/share/scrollkeeper/stylesheets/toc.xsl /tmp/junk
> 
> i.e. for sv_SE, there are commas (,) inserted into the tags in the output 
> file, /tmp/junk.

  Okay, I can reproduce this with the binary from the RPM.
I recompiled scrollkeeper; the way it includes the libxml2 headers
needs to be fixed (cf. my mail to gnome-hackers last week):
   gcc -DHAVE_CONFIG_H -I. -I. -I../.. -I../../libs -I/usr/include/libxml2    -g -O2 -c toc.c
   In file included from toc.c:26:
   toc.h:25:20: parser.h: Filen eller katalogen finns inte
   toc.h:26:29: parserInternals.h: Filen eller katalogen finns inte
   toc.h:27:17: SAX.h: Filen eller katalogen finns inte
   toc.h:28:23: xmlmemory.h: Filen eller katalogen finns inte

 Patch enclosed. (The Swedish compiler messages above mean "No such file
or directory".)

> The main code being used here is (from libs/extract.c):
[...]
> }

  Hmm, the problem was actually deep in the XPath number-to-string
formatting functions.
  It seems that in that locale snprintf does not follow the same
conventions: it postfixes the value with the float separator even when
there is no suffix to be written. (Note that because the stylesheet uses
attribute value templates to propagate the value of toclevel, the XPath
engine is asked to go from the string representation to the float
representation and back many times.)
  
0x08095402 in xmlXPathFormatNumber (number=1, buffer=0xbfffc310 "\bõ\r\bP", 
    buffersize=100) at xpath.c:1115
1163			size = snprintf(work, sizeof(work), "%0.*f",
(gdb) 
(gdb) p work
$18 = "1,", '0' <repeats 14 times>, "\000\000\000\000\000\000ð"
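
  The observation above can be reproduced in isolation. The following is
a minimal sketch, not the libxml2 code: format_in_locale is a hypothetical
helper that switches LC_NUMERIC, calls snprintf with the same "%0.*f"
format seen in the debugger, and restores the previous locale. It assumes
the requested locale (for example sv_SE) is installed; if setlocale
rejects the name, the helper returns 0 and writes nothing.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): format `number` with the
 * given precision while LC_NUMERIC is temporarily set to `locale`.
 * Returns 1 on success, 0 if the locale could not be selected. */
static int format_in_locale(const char *locale, double number, int precision,
                            char *buf, size_t size)
{
    char saved[128];
    const char *current;

    assert(buf != NULL && size > 0);

    /* setlocale may reuse its return buffer, so copy the current name. */
    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL || strlen(current) >= sizeof(saved))
        return 0;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, locale) == NULL)
        return 0;

    /* Same format string as in the gdb session above. */
    snprintf(buf, size, "%0.*f", precision, number);

    /* Restore whatever numeric locale was active before. */
    setlocale(LC_NUMERIC, saved);
    return 1;
}
```

With the C locale and a precision of 14 this yields "1." followed by 14
zeros. With a locale whose decimal separator is a comma, the same call
yields the "1," form shown in the gdb session, so any code that later
strips trailing zeros and a '.' leaves the comma behind.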

  I fixed the XPath implementation in CVS so that it no longer calls the
*printf functions when the number is actually an integer value. The previous
implementation was buggy in this respect, but only when the locale
influenced the behaviour of the libc function. It is still unclear to me
whether glibc is actually at fault here; I prefer not to take the risk of
relying on those functions for integer formatting.

  I still don't understand either why this occurred only in the context of
scrollkeeper; maybe it does some locale-based tweaking. In any case, this
should not influence the XPath serialization of integers.
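
  The idea of the fix, bypassing the *printf family for integer values,
can be sketched as follows. This is an illustration under assumptions,
not the code committed to CVS: format_number is a hypothetical name, the
1e15 bound is an arbitrary choice that keeps the value safely inside a
long long, and the "%g" fallback is only a placeholder for whatever the
non-integer path does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not the libxml2 implementation): write `number`
 * into `buf`. Integer values are emitted digit by digit, so the locale's
 * decimal separator can never appear in them. */
static void format_number(double number, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    /* NaN and infinities fail the range test and take the fallback. */
    if (number > -1e15 && number < 1e15 &&
        (double) (long long) number == number) {
        char digits[32];
        size_t n = 0, out = 0;
        long long value = (long long) number;
        int negative = value < 0;

        if (negative)
            value = -value;
        /* Collect the digits in reverse order. */
        do {
            digits[n++] = (char) ('0' + value % 10);
            value /= 10;
        } while (value != 0);

        /* Copy them out, always leaving room for the terminator. */
        if (negative && out + 1 < size)
            buf[out++] = '-';
        while (n > 0 && out + 1 < size)
            buf[out++] = digits[--n];
        buf[out] = '\0';
        return;
    }

    /* Non-integers still go through snprintf and thus LC_NUMERIC. */
    snprintf(buf, size, "%g", number);
}
```

Because the integer branch never consults the locale, a toclevel of 1
serializes as "1" whatever LC_ALL is set to.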

http://cvs.gnome.org/bonsai/cvsquery.cgi?module=gnome-xml&branch=HEAD&branchtype=match&dir=gnome-xml&file=&filetype=match&who=veillard&whotype=match&sortby=Date&hours=&date=explicit&mindate=03%2F19%2F02+06%3A24&maxdate=03%2F19%2F02+06%3A26&cvsroot=%2Fcvs%2Fgnome

  Thanks for providing the test case!

Daniel

  
-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
*** toc/src/toc.h.orig	Tue Mar 19 09:47:39 2002
--- toc/src/toc.h	Tue Mar 19 09:48:01 2002
***************
*** 22,31 ****
  #ifndef __TOC_H__
  #define __TOC_H__
  
! #include <parser.h>
! #include <parserInternals.h>
! #include <SAX.h>
! #include <xmlmemory.h>
  #include <string.h>
  
  typedef enum ElementIndex {
--- 22,31 ----
  #ifndef __TOC_H__
  #define __TOC_H__
  
! #include <libxml/parser.h>
! #include <libxml/parserInternals.h>
! #include <libxml/SAX.h>
! #include <libxml/xmlmemory.h>
  #include <string.h>
  
  typedef enum ElementIndex {

