Re: [xml] Character Positions



On Fri, Jan 11, 2002 at 03:54:39PM -0000, Richard Jinks wrote:
Hi

I need to make a couple of modifications to libxml in order to support some
features required on the project I'm currently working on, and wanted to
check their feasibility with regards to reflecting the changes back into
libxml, or just keep our own private modified version.

  Okay,

The reason I'm asking is concerning the license - I'm going to be linking
libxml (and libxsl / gdome) into a proprietary app, and wanted to see if the
mods would be useful to give back. (Also to save me remaking the same
changes the next time libxml gets updated ;-) )

  Thanks for offering to share the changes; it's not required, but welcome.

The main change I need to make is to store the start and end character
positions in the input XML stream of the nodes (including attributes) in the
DOM tree.

  Hmm, this sounds like a very weak notion ...

So I'd have to add a new counter to the _xmlParserInput structure to count
the character position, updating it in the same places the line / col
counters are updated.
I'd also need to modify the _xmlNode structure to add variables to store the
char positions once the node is created.
Finally, I'd have to alter the parser code so that it notes the relative
start and end positions and stores them in the xmlNode when it builds up the
tree.

  That's a lot of binary-incompatible changes.

As I'm also going to be putting in libxsl and libgdome, I'd need to make
sure that my changes don't break anything outside of libxml. (As far as I'm

  Well, they certainly break binary compatibility, especially by changing
the format of the xmlNode structure.
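
To illustrate the binary compatibility point, here is a minimal sketch in Python using ctypes. The two structures are hypothetical, heavily simplified stand-ins and do not reflect the real xmlNode layout; the field names start_pos and end_pos are assumptions made up for this example. It only shows that adding fields changes both the structure size and the offsets of existing members, which is what breaks code compiled against the old headers.

```python
import ctypes


class OldNode(ctypes.Structure):
    # Hypothetical simplified node; NOT the real xmlNode definition.
    _fields_ = [
        ("type", ctypes.c_int),
        ("name", ctypes.c_char_p),
        ("children", ctypes.c_void_p),
        ("next", ctypes.c_void_p),
    ]


class NewNode(ctypes.Structure):
    # Same hypothetical node with two position fields added (assumed names).
    _fields_ = [
        ("type", ctypes.c_int),
        ("start_pos", ctypes.c_long),
        ("end_pos", ctypes.c_long),
        ("name", ctypes.c_char_p),
        ("children", ctypes.c_void_p),
        ("next", ctypes.c_void_p),
    ]


old_size = ctypes.sizeof(OldNode)
new_size = ctypes.sizeof(NewNode)
old_name_offset = OldNode.name.offset
new_name_offset = NewNode.name.offset

print("size:", old_size, "->", new_size)
print("offset of name:", old_name_offset, "->", new_name_offset)
```

An application built against the old layout would read name at the old offset and allocate nodes with the old size, so it could not safely share nodes with a library built against the new layout. Appending the fields at the end would keep the existing offsets, but the size would still change.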

I know these changes fall outside of the APIs for SAX and DOM, but they are
small additions that don't directly affect anything else, and I can always
put #defines or a flag around the changes if required.

  My main objection is that "keeping start and end characters from the
input stream" is a very weak notion and, in my opinion, not a good concept to
build an API on for XML documents. For example, if the input document is not
in UTF-8, libxml will preprocess it using an input conversion filter.
So the "input stream" from the resource is not the input stream seen by the
parser in a lot of cases. Also, the parser must do some character
normalization, for example at the CR/LF level and in attribute contents. Here
too, the XML specification allows this to be handled at a preprocessing
level. There is also the handling of BOMs (Byte Order Marks), which
are expected to be stripped first and used only to recognize the
encoding. In all those cases the notion of character position in the
input stream really depends on the level at which it is defined. There is a
lot of associated complexity in doing that in a truly generic way, and I'm
afraid your patch would not cover it.
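
A small Python sketch of the offset problem described above. It does not use libxml; it only mimics the preprocessing steps (BOM stripping, conversion from another encoding, CR/LF normalization) on a made-up document, and all names here are assumptions for illustration.

```python
import codecs

# Hypothetical document as stored on disk: UTF-16 with a BOM and CR/LF line ends.
text_on_disk = '<?xml version="1.0" encoding="UTF-16"?>\r\n<a>\r\n<b/>\r\n</a>'
raw = codecs.BOM_UTF16_LE + text_on_disk.encode("utf-16-le")

# Roughly what a parser works on after preprocessing:
decoded = raw.decode("utf-16")  # the "utf-16" codec detects and consumes the BOM
normalized = decoded.replace("\r\n", "\n").replace("\r", "\n")

# Position of the same element at the two levels.
byte_offset = raw.find("<b/>".encode("utf-16-le"))  # offset in the raw resource
char_offset = normalized.find("<b/>")               # offset in the parser's view

print("byte offset in the resource:", byte_offset)
print("character offset after preprocessing:", char_offset)
```

The two numbers differ, and neither is wrong: they simply answer the question at different levels, which is why a single "character position" is hard to define generically.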

If you think these changes might be useful to keep, then I'd be happy to
submit them back for inclusion in libxml,

  For those reasons I'm afraid those changes are not, a priori, good
candidates for inclusion in the libxml source. But providing them may help
others and still lower the cost of maintenance, so I would appreciate it if
you did this.

otherwise can I have confirmation
that all I need to do is to include the credits, source (?), license and a
notice marking the changes to libxml with my application when it ships?

  http://xmlsoft.org/FAQ.html#Licence

  Yes. Actually, if you're using the W3C IPR Licence, then you don't need
to provide the source of the changes. I personally don't care about the
credit and would much rather receive the patch(es) (even if they look
unlikely to be included). The goal is to grow the pool and fix bugs, not to
inflate the developers' egos.

    Thanks,

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


