Re: [xml] Character Positions
- From: "Richard Jinks" <cyberthymia yahoo co uk>
- To: <xml gnome org>
- Subject: Re: [xml] Character Positions
- Date: Mon, 14 Jan 2002 15:05:06 -0000
Hi
Just been inspecting the libxml parser code with respect to putting in the
changes required to store start / end character positions, but from what I
can tell, most of the things are already in the code.
(that'll teach me to read the code a little more closely next time... :-) )
How are the xmlParserNodeInfo structs stored? They appear to be kept in a
separate list / tree (xmlParserNodeInfoSeq) as opposed to being "attached"
to the normal DOM tree in some way? I've done a couple of tests (mainly by
setting ctxt->record_info = 1 in parserInternals.c xmlInitParserCtxt(), and
putting printf()'s everywhere), and it looks like libxml already records the
positions of nodes against the input stream.
I'm guessing that this is only present for debugging purposes (hence it
being turned off by default).
Is it only through your debugging and testing that the node info stuff has
been tested?
As this set of functions is already present, it appears that the only
modifications I now need to make are to increase the use of these functions
to record the information for attributes and DTD nodes, and to add functions
to return the data to the calling application. Is this correct?
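For anyone following the thread, the arrangement described above (position records held in their own sequence and looked up by node, rather than stored in the node) can be pictured roughly as follows. This is a simplified stand-in and not libxml's code: the names Node, NodeInfo, NodeInfoSeq, node_info_add, node_info_find and node_info_clear are hypothetical, and the lookup is a plain linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree node; only its address matters here. */
typedef struct Node { const char *name; } Node;

/* One position record, kept outside the node itself. */
typedef struct NodeInfo {
    const Node *node;
    unsigned long begin_pos, end_pos;    /* offsets in the parser's input */
    unsigned long begin_line, end_line;
} NodeInfo;

/* Growable array of records: the "separate list" rather than the tree. */
typedef struct NodeInfoSeq {
    size_t length, maximum;
    NodeInfo *buffer;
} NodeInfoSeq;

/* Append a copy of *info, growing the buffer when full. Returns 0 on success. */
int node_info_add(NodeInfoSeq *seq, const NodeInfo *info)
{
    assert(seq != NULL && info != NULL);
    if (seq->length == seq->maximum) {
        size_t new_max = seq->maximum ? seq->maximum * 2 : 8;
        NodeInfo *tmp = realloc(seq->buffer, new_max * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        seq->buffer = tmp;
        seq->maximum = new_max;
    }
    seq->buffer[seq->length++] = *info;
    return 0;
}

/* Return the record for a node, or NULL if none was stored. */
const NodeInfo *node_info_find(const NodeInfoSeq *seq, const Node *node)
{
    assert(seq != NULL);
    for (size_t i = 0; i < seq->length; i++)
        if (seq->buffer[i].node == node)
            return &seq->buffer[i];
    return NULL;
}

/* Release the buffer and reset the sequence to empty. */
void node_info_clear(NodeInfoSeq *seq)
{
    assert(seq != NULL);
    free(seq->buffer);
    seq->buffer = NULL;
    seq->length = seq->maximum = 0;
}
```

Keeping the records in a side table like this leaves the node structure itself untouched; the cost is a lookup whenever the application wants the positions for a given node.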
Thanks,
Richard
----- Original Message -----
From: "Daniel Veillard" <veillard redhat com>
To: "Richard Jinks" <cyberthymia yahoo co uk>
Cc: <xml gnome org>
Sent: Sunday, January 13, 2002 9:12 PM
Subject: Re: [xml] Character Positions
On Fri, Jan 11, 2002 at 03:54:39PM -0000, Richard Jinks wrote:
Hi
I need to make a couple of modifications to libxml in order to support some
features required on the project I'm currently working on, and wanted to
check their feasibility with regards to reflecting the changes back into
libxml, or just keep our own private modified version.
Okay,
The reason I'm asking is concerning the license - I'm going to be linking
libxml (and libxsl / gdome) into a proprietary app, and wanted to see if the
mods would be useful to give back. (Also to save me remaking the same
changes the next time libxml gets updated ;-) )
Thanks for sharing changes; not required, but welcome.
The main change I need to make is to store the start and end character
positions in the input XML stream of the nodes (including attributes) in the
DOM tree.
Hum, this sounds very weak ...
So I'd have to add a new counter to the _xmlParserInput structure to count
the character position, updating it in the same places the line / col
counters are updated.
I'd also need to modify the _xmlNode structure to add variables to store the
char positions once the node is created.
Finally, I'd have to alter the parser code so that it notes the relative
start and end positions and stores them in the xmlNode when it builds up the
tree.
That's a lot of binary incompatible changes.
As I'm also going to be putting in libxsl and libgdome, I'd need to make
sure that my changes don't break anything outside of libxml. (As far as I'm
Well, they certainly break binary compatibility, especially the
format of an xmlNode structure.
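To make the binary compatibility point concrete, here is a minimal sketch using made-up structures. NodeV1 and NodeV2 are hypothetical and far smaller than the real xmlNode; they only show what happens to field offsets and structure size when position fields are added to a node that existing binaries were compiled against.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout that already-compiled applications were built against. */
typedef struct NodeV1 {
    int type;
    const char *name;
    struct NodeV1 *children;
    struct NodeV1 *next;
} NodeV1;

/* The same node with two position fields added before the tree pointers. */
typedef struct NodeV2 {
    int type;
    const char *name;
    unsigned long begin_pos;
    unsigned long end_pos;
    struct NodeV2 *children;
    struct NodeV2 *next;
} NodeV2;

/* An old binary reads "children" at the V1 offset. Returns nonzero if the
 * V2 layout moved that field, so the old binary would read the wrong bytes. */
int children_field_moved(void)
{
    return offsetof(NodeV1, children) != offsetof(NodeV2, children);
}

/* Returns nonzero if the structure size changed, which matters to any
 * code that allocates or copies nodes using its own compiled-in sizeof. */
int node_size_changed(void)
{
    return sizeof(NodeV1) != sizeof(NodeV2);
}
```

Appending new fields at the end of the structure would keep the existing offsets stable, but the size still changes, so code that allocates or embeds nodes itself would still be affected.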
I know these changes fall outside of the APIs for SAX and DOM, but they are
small additions that don't directly affect anything else, and I can always
put #defines or a flag around the changes if required.
My main objection is that "keeping start and end characters from the
input stream" is a very weak notion and, in my opinion, not a good concept
to build an API on for XML documents. For example, if the input document is
not in UTF-8, libxml will preprocess it using an input conversion filter,
so the "input stream" from the resource is not the input stream seen by
the parser in a lot of cases. The parser must also do some character
normalization, for example at the CR/LF level and in attribute contents.
Here too, the XML specification allows this to be handled at a
preprocessor level. There is also the handling of BOM (Byte Order Mark)
markers, which are expected to be stripped first and used only to
recognize the encoding. In all those cases the notion of character
position in the input stream really depends on the level at which it is
defined. There is a lot of associated complexity in doing that in a truly
generic way, and I'm afraid your patch would not cover them.
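A small illustration of why the level matters, assuming a UTF-8 document that starts with a BOM and uses CR/LF line endings. The functions preprocess and offset_of are hypothetical helpers written for this sketch, not libxml functions, and encoding conversion (for example from UTF-16) is left out entirely; it would shift the offsets further.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy raw into out, dropping a leading UTF-8 BOM and turning CR LF
 * (or a lone CR) into a single LF. out must hold at least raw_len bytes.
 * Returns the number of bytes written. */
size_t preprocess(const unsigned char *raw, size_t raw_len, unsigned char *out)
{
    size_t i = 0, n = 0;
    assert(raw != NULL && out != NULL);
    if (raw_len >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF)
        i = 3;                                   /* skip the BOM */
    while (i < raw_len) {
        if (raw[i] == '\r') {
            out[n++] = '\n';
            i += (i + 1 < raw_len && raw[i + 1] == '\n') ? 2 : 1;
        } else {
            out[n++] = raw[i++];
        }
    }
    return n;
}

/* Byte offset of the first occurrence of needle in buf, or -1 if absent. */
long offset_of(const unsigned char *buf, size_t len, const char *needle)
{
    size_t m = strlen(needle);
    for (size_t i = 0; m > 0 && i + m <= len; i++)
        if (memcmp(buf + i, needle, m) == 0)
            return (long)i;
    return -1;
}
```

For the 18 raw bytes EF BB BF followed by `<a>\r\n<b/>\r\n</a>`, the `<b/>` element starts at byte 8 of the resource but at byte 4 of the 13 bytes left after preprocessing, so a "start position" means different things depending on which stream it is measured against.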
If you think these changes might be useful to keep, then I'd be happy to
submit them back for inclusion in libxml,
For those reasons I'm afraid those changes are not good candidates
"a priori" for inclusion in the libxml source. But providing them may help
others and still lower the cost of maintenance, so I would appreciate it if
you did this.
otherwise can I have confirmation
that all I need to do is to include the credits, source (?), license and a
notice marking the changes to libxml with my application when it ships?
http://xmlsoft.org/FAQ.html#Licence
Yes. Actually, if you're using the W3C IPR Licence, then you don't need
to provide the source of the changes. I personally don't care about the
credit and by far prefer receiving the patch(es) (even if they look
unlikely to be included). The goal is to grow the pool and fix bugs, not
inflate the developers' egos.
Thanks,
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/