[xslt] template breakpoint complexity, release of xsldbg-0.6.4


Here's  some analysis I've done which may be of interest . This has been to 
reduce the risk of poor interfaces/implementation by doing some estimations 
of its performance when debugging docbook xsl.  

Also new version of  xsldbg has been released. at 
sourceforge.net/projects/xsldbg.  The new release requires the latest cvs 
source for both libxml and libxslt .  libxslt must to be patched and 
configured and reinstalled with the "--with-debugger=yes"  option enabled.  
Changes made  are 
	 o upgrade to libxslt-1.0.6
	 o the removal of xsltaddon
	 o splitting og breakpoint API implementation in separate files.
  	 o xsldbg now throws an error if the debugger patch to libxslt has not 
been applied

Daniel : Could I have a flag in xslt.h to indicate if debugging is enabled? I 
currently do a grep on the libraries that xslt-config indicates libxslt 
exports to tell if debugging is enabled.

See xsldbg's ChangeLog for more details.
This version 
	o does not improve the implentation the the breakpoint API
	o suffers from the hard limit of a maximum 40 breakpoints.

Lastly I think that xmlGetLineNo can be improved, see attached diff for 
debugXML.c (apply to libxml). This is not required but is recommended.


Keith Isdale


	The problem

How to ensure that finding a breakpoint takes the minimum amount of time. In 
the worst case scenario there can be as many templates per line (and 
corresponding breakpoint) as there are files(see summary below). 

	A solution
I am starting to view the breakpoint problem as solvable by using a map 
between a line number and a hash table of breakpoints. This means that I can 
use a list (implementated using an array)  to hold these maps. To get a 
breakpoint I could use this algorithm
Line	Code 
---       -----
1:    	lineNumber = xmlGetLineNo(node) 
2:    	URL = node->doc->URL;

3:   	map = getItem(mapList, lineNumber)    
4:   	 if (map != NULL){
5:   	   breakpoint = getEntry(map, URL)
6)  	 }

  mapList is the list of maps ( linenumber to breakpoints) 
  lineNumber is the lineNumber of "xsl instruction" or "xml document node" 
being checked to see it it is a breakpoint 
 <URL>  is the URL of node being looked at , and is the "key" for the map

The first statement, getItem,  would have an order of complexity of 1
The map would likely to have, on average, four entries. (see below in 
summary). So the second statement would have an order of complexity of 4. 

It is assumed that the hash map of breakpionts will not need to be rehashed 
to minimize the time taken to find a breakpoint.  This assumption will need 
to be tested! Test to be conducted is counting the number of collisions in 
hash table (elements in hash table with same hashcode). When xsldbg is loaded 
with maximum number of "template" breakpoints.

If needed possible rehash strategies are 
1)  set hash table to another hashCode function. (hmm what other one's to use)
2)  grow hash table size  
	But this would cause slight delay during rehashing, and use more memory.
     choice a) By a factor of two if table has only a few elements
     choice b) Up to 1.5  times the size of number of elements in hash table 
when there are a large number of elements in hash table (try to reduce memory 

It is also assumed that the xmlGetLineNo function rarely need to "search" for 
a valid parent/previous node with a valid line number. This will need to be 
		Statistics summary  
These are the statistics that I determined from the html version of  docbook 
xsl 1.75

By inspection I found nearly all stylesheets files have a template at the 
start of file. Since I allow breakpoints to be set at all templates the worst 
case for the number of breakpoints per line is the number of files.

In the avarage case it is expected that each line will have a four templates. 
Which means we could expect to have four breakpionts per line. 

		Details of statistics

Files needed for test
cat *.xsl >> xsl_data.txt
ls *.xsl > xsl_filenames.txt

Number of files:44
cmd used : wc  -l xsl_filenames.txt

Total number of lines : 17412 
cmd used : wc -l xsl_data.txt

Average size of file: 395
cmd used: perl -e "print 17412/44"

Size of preamble: 7 (by visual inspection on files)

Average area of file excluding "preamble":388
cmd used: perl -e "print 395 -7"

Total number of templates:1352
cmd used: grep -c "\(\<xsl:template\)" xsl_data.txt 

"called templates" ie <xsl:template name= ...  
Total number of normal templates : 235
cmd used: grep -c "\(xsl:template\([[:space:]]\)\+name\)" xsl_data.txt 

"normal" templates ie <xsl:template match= ...    
Total number of normal templates : 1113
cmd used: grep -c "\(xsl:template\([[:space:]]\)\+match\)" xsl_data.txt 

These don't addup to total number of templates , can't find th reason why.:-(

Max number of templates per file: 372
Min number of templates per file: 0
cmd used :grep -c -e "<xsl:template" *.xsl  

Density of templates per line (average): 3.5
cmd used: perl -e "print 1352/388"

Density of templates per line (in file with most templates 
titlepage.templates.xsl ) :7.2
approximate cmd used :  (wc -l titlepage.templates.xsl)   divided by  grep -c 
"\(\<xsl:template\)"  titlepage.templates.xsl 

*** debugXML.c~	Thu Nov  1 18:38:11 2001
--- debugXML.c	Sat Nov  3 20:41:54 2001
*** 1194,1204 ****
      if (node->type == XML_ELEMENT_NODE)
          result = (long) node->content;
      else if ((node->prev != NULL) &&
!              (node->prev->type == XML_ELEMENT_NODE))
!         result = (long) node->prev->content;
      else if ((node->parent != NULL) &&
!              (node->parent->type == XML_ELEMENT_NODE))
!         result = (long) node->parent->content;
      return result;
--- 1194,1206 ----
      if (node->type == XML_ELEMENT_NODE)
          result = (long) node->content;
      else if ((node->prev != NULL) &&
!              ((node->prev->type == XML_ELEMENT_NODE) ||
! 	      (node->prev->type == XML_TEXT_NODE)))
!         result = xmlGetLineNo(node->prev);
      else if ((node->parent != NULL) &&
!              ((node->parent->type == XML_ELEMENT_NODE) ||
!       	      (node->parent->type == XML_TEXT_NODE)))
!         result = xmlGetLineNo(node->parent);
      return result;

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]