[gtk-doc] TODO: more comments wrt. performance
- From: Stefan Sauer <stefkost src gnome org>
- To: commits-list gnome org
- Cc:
- Subject: [gtk-doc] TODO: more comments wrt. performance
- Date: Sun, 20 May 2018 11:54:36 +0000 (UTC)
commit 8f355c4f432eee45720c3dc7bc42f6061b9879a6
Author: Stefan Sauer <ensonic users sf net>
Date: Sun May 20 13:51:43 2018 +0200
TODO: more comments wrt. performance
TODO | 35 +++++++++++++++++++++++++++--------
gtkdoc/mkhtml2.py | 20 ++++++++++++++++----
2 files changed, 43 insertions(+), 12 deletions(-)
---
diff --git a/TODO b/TODO
index 9fe99fd..7504973 100644
--- a/TODO
+++ b/TODO
@@ -414,10 +414,6 @@ grep "gst_caps_is_always_compatible" tags
0m33.282s 0m29.266s 0m4.012s
- removing the gentext calls for nav-bar alt tags does not help
-
- - try plain docbook xslt to see if maybe we have bad xslt templates in the
- customisation layer (gtk-doc.xsl)
-
- we could do the xinclude processing once and use it for both html and pdf
time /usr/bin/xsltproc 2>../xslt4.log --path /home/ensonic/projects/gnome/gtk-doc/gtk-doc/tests/gobject/docs --nonet --xinclude --stringparam gtkdoc.bookname tester --stringparam gtkdoc.version 1.14 /home/ensonic/projects/gnome/gtk-doc/gtk-doc/gtk-doc.xsl ../tester-docs.xml
real user sys
@@ -454,12 +450,35 @@ grep "gst_caps_is_always_compatible" tags
- unfortunately there is no way to ask xsltproc to pre-transform an xslt, that could
- strip comments
- process xsl:import and xsl:include
- - compile xslt
- http://sourceforge.net/projects/xsltc/
- http://www.xmlhack.com/read.php?item=618
- extra xsltproc options:
--novalid: saves ~ 0.12 sec
-
+
+ - strip DOCTYPES on xincludes
+ - there is a performance bottleneck in libxml, where it parses the DTD for each fragment
+ - we're using the dtd to
+ - validate fragments
+ - inject package name/version etc.
+ - 1) we could provide a flag to gtkdoc-mkdb to not put any doctype on
+   generated files and manually resolve entities in generated files (and
+   expand_content_files)
+ - to get a list of entities:
+ - we could parse entities from the main doc-file header
+ - tricky as with xml/gtkdocentities.ent, they are in an extra file
+ - we could pass entities as args for gtkdoc-mkdb
+ - if the flag is used, we should warn if entities are in the header
+ - 2) if the doctype on the main doc does not contain entities, skip adding
+   the doctype to generated output
+ - 3) if the doctype on the main doc contains entities, only add the doctype
+   if the generated content contains entities ([&%][_a-zA-Z]*;)
+ - a quick check on the gnome modules showed:
+ - quite a few don't use entities
+ - those that use version.xml
+ - seem to mostly use it in the main doc
+ - but a few use it for man pages
+     find . -name "*.xml" -exec grep -Hn "&version;" {} \; | grep -v "\-docs.xml"
+
+find . -name "*.xml" -exec egrep --color -Hn '&[_a-zA-Z]*;' {} \; | egrep -v '&(amp|lt|gt|quot|apos|nbsp);' | egrep --color '&[_a-zA-Z]*;'
+find . -name "*.xml" -exec egrep -o '&[_a-zA-Z]*;' {} \; | sort | uniq -c | sort -n
= python =
- consider switching to this markdown parser
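Option 3) and the entity-counting one-liners in the TODO can be approximated in Python. This is a minimal sketch, not gtk-doc code: the helpers `custom_entities()`, `needs_doctype()` and `count_entities()` are hypothetical, and the sketch assumes that only the five entities predefined by XML (`amp`, `lt`, `gt`, `quot`, `apos`) can be used without a DOCTYPE. Unlike the `egrep -v` filter, it therefore still reports `&nbsp;`, which comes from the DocBook DTD rather than from XML itself. It only looks for `&name;` references, since `%name;` parameter entities are only recognized inside a DTD, and it widens the TODO's `[_a-zA-Z]*` pattern to also allow digits, `-` and `.` after the first character, as XML names do.

```python
import re
from collections import Counter

# Entities predefined by XML itself; they never require a DOCTYPE.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# General entity reference such as &version;. Character references like
# &#169; do not match because the name must start with a letter or '_'.
ENTITY_RE = re.compile(r"&([_a-zA-Z][-._a-zA-Z0-9]*);")


def custom_entities(text):
    """Return the names of all non-predefined entities referenced in text."""
    return [name for name in ENTITY_RE.findall(text) if name not in PREDEFINED]


def needs_doctype(text):
    """Hypothetical check for option 3): emit the DOCTYPE only if needed."""
    return bool(custom_entities(text))


def count_entities(texts):
    """Rough equivalent of `egrep -o ... | sort | uniq -c` over many files."""
    counts = Counter()
    for text in texts:
        counts.update(custom_entities(text))
    return counts
```

For example, `needs_doctype("<para>a &lt; b</para>")` returns `False`, while `needs_doctype("<para>Version &version;</para>")` returns `True`. The regex does not skip comments or CDATA sections, so it can over-report; for this use that errs on the safe side, since the worst case is an unnecessary DOCTYPE.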
diff --git a/gtkdoc/mkhtml2.py b/gtkdoc/mkhtml2.py
index 6256129..fef4876 100644
--- a/gtkdoc/mkhtml2.py
+++ b/gtkdoc/mkhtml2.py
@@ -43,15 +43,25 @@ TODO:
- convert_{figure,table} need counters.
- check each docbook tag if it can contain #PCDATA, if not don't check for
xml.text/xml.tail and add a comment (# no PCDATA allowed here)
-- consider some perf-warnings flag
- - see 'No "id" attribute on'
- find a better way to print context for warnings
- we use 'xml.sourceline', but this all does not help a lot due to xi:include
- consolidate title handling:
- always use the titles-dict
+ - convert_title(): uses titles.get(tid)['title']
+ - convert_xref(): uses titles[tid]['tag'], ['title'] and ['xml']
+ - create_devhelp2_refsect2_keyword(): uses titles[tid]['title']
- there only store what we have (xml, tag, ...)
- when chunking generate 'id's and add entries to titles-dict
- add accessors for title and raw_title that lazily get them
+ - see if any of the other ~10 places that call convert_title() could use this
+   cache
+- performance
+ - consider some perf-warnings flag
+ - see 'No "id" attribute on'
+ - xinclude processing in libxml2 is slow
+ - if we disable it, we get '{http://www.w3.org/2003/XInclude}include' tags
+   and we could try handling them ourselves; in some cases those are subtrees
+   that we extract for chunking anyway
DIFFERENCES:
- titles
@@ -1761,11 +1771,13 @@ def main(module, index_file, out_dir, uninstalled, src_lang, paths):
# 1) load the document
_t = timer()
# does not seem to be faster
- # parser = etree.XMLParser(collect_ids=False)
+ # parser = etree.XMLParser(dtd_validation=False, collect_ids=False)
# tree = etree.parse(index_file, parser)
tree = etree.parse(index_file)
+ logging.warning("1a: %7.3lf: load doc", timer() - _t)
+ _t = timer()
tree.xinclude()
- logging.warning("1: %7.3lf: load doc", timer() - _t)
+ logging.warning("1b: %7.3lf: xinclude doc", timer() - _t)
# 2) copy datafiles
_t = timer()
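The xinclude performance note in mkhtml2.py could start from something like the following. It is a minimal sketch under stated assumptions, not how mkhtml2.py works: it parses a document without resolving includes, times that step in the same style as the `1a`/`1b` log lines, and returns the `href` of every `{http://www.w3.org/2003/XInclude}include` element left in the tree. It uses the standard library's `xml.etree.ElementTree` so it runs without lxml (lxml's `iter()` accepts the same `{namespace}tag` form), and the function name `list_xincludes()` is hypothetical.

```python
import io
import logging
import xml.etree.ElementTree as ET
from timeit import default_timer as timer

# Clark-notation tag of an unresolved XInclude element.
XI_INCLUDE = "{http://www.w3.org/2003/XInclude}include"


def list_xincludes(source):
    """Parse source (a filename or file object) without XInclude processing
    and return the href of every xi:include element, in document order."""
    _t = timer()
    tree = ET.parse(source)
    logging.warning("1a: %7.3f: load doc", timer() - _t)

    _t = timer()
    # No xinclude() call: the include elements stay in the tree, so we only
    # collect their targets instead of splicing the fragments in.
    hrefs = [el.get("href") for el in tree.iter(XI_INCLUDE)]
    logging.warning("1b: %7.3f: scan xincludes", timer() - _t)
    return hrefs


if __name__ == "__main__":
    doc = b"""<book xmlns:xi="http://www.w3.org/2003/XInclude">
  <xi:include href="xml/gobject.xml"/>
  <chapter><xi:include href="xml/gtype.xml"/></chapter>
</book>"""
    print(list_xincludes(io.BytesIO(doc)))  # prints ['xml/gobject.xml', 'xml/gtype.xml']
```

Collecting the targets first would let chunking code decide which fragments to parse on their own, which is the idea behind the note that some included files are subtrees that get extracted for chunking anyway.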