[libxml++] libxml++ future


Hi all,

Here are a few features or little technical points I'd like to see in libxml++ 
one day. Some could be included in the 1.0 version, while others will 
certainly wait for a 1.2.

I would appreciate your comments on them: do you think each feature is worth 
having, and do you think it is a good technical choice?

I may have forgotten some things that are important to you; don't hesitate to suggest 

From your observations I will make a first roadmap.

1 - postfix private members instead of prefixing them with an underscore

target version : 1.0

The ISO C++ standard reserves some names with a leading underscore for the 
implementation (all of them at global scope, and any followed by an uppercase 
letter), so it is safer not to use them.
Although there is little risk of a real problem here, I think it would be 
2 - wrap xmlIO.

target version : 1.0

The xmlIO interface allows the creation of our own input/output buffers. Wrapping 
them is an elegant and efficient way to reduce some useless, potentially big strings 

Think about how to send a document to a stream. Currently we have to write:

std::ostream & output = std::cout; // could be any ostream of course
std::string tmp = document.write_to_string();
output << tmp;

In the above code, the entire document is written to a buffer by libxml, then 
copied to a std::string by libxml++, which is finally returned by 
write_to_string(). Even with a COW implementation of std::string, we need 
twice as much memory as the size of the document. With a non-COW 
implementation it is even worse: the document may be copied 3 or 4 times.

I wrote a small wrapper around xmlOutputBuffer and implemented a 
Document::write_to_stream() function. The previous code becomes:

std::ostream & output = std::cout; // std::cout is still an example of course
document.write_to_stream(output);

The advantage is much more than just writing 1 line instead of 2. The entire 
document is never in memory. libxml writes to the buffer in small pieces, which 
are immediately sent to the stream by the wrapper. A patch demonstrating this is 
on the patch manager if you want to experiment with it. The wrapper allows the 
user to define his own OutputBuffer very easily. I modified the dom_build example 
to test it, and it works pretty well.
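
To make the idea concrete, here is a minimal, self-contained sketch of such a wrapper. It is not the code from the patch: OutputBuffer, OStreamOutputBuffer, write_callback and fake_serialize are hypothetical names, and fake_serialize only stands in for libxml, which would call the write callback (registered through xmlOutputBufferCreateIO) once per chunk.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class: the user overrides do_write() to receive chunks.
class OutputBuffer {
public:
  virtual ~OutputBuffer() {}
  // Returns true when the chunk was accepted.
  virtual bool do_write(const char* data, int len) = 0;
};

// Hypothetical adapter that forwards each chunk to a std::ostream.
class OStreamOutputBuffer : public OutputBuffer {
public:
  explicit OStreamOutputBuffer(std::ostream& out) : out_(out) {}
  bool do_write(const char* data, int len) override {
    out_.write(data, len);
    return out_.good();
  }
private:
  std::ostream& out_;
};

// Trampoline with the same shape as libxml2's xmlOutputWriteCallback:
// returns the number of bytes written, or -1 on error.
int write_callback(void* context, const char* buffer, int len) {
  OutputBuffer* self = static_cast<OutputBuffer*>(context);
  return self->do_write(buffer, len) ? len : -1;
}

// Stand-in for libxml's serializer (an assumption made so that the sketch
// compiles without libxml2): emits the document in small pieces through
// the callback instead of building one big string.
int fake_serialize(int (*write)(void*, const char*, int), void* context) {
  const char* chunks[] = {"<?xml version=\"1.0\"?>\n", "<root>", "<child/>",
                          "</root>\n"};
  int total = 0;
  for (const char* chunk : chunks) {
    int len = static_cast<int>(std::char_traits<char>::length(chunk));
    int written = write(context, chunk, len);
    if (written < 0) return -1;
    total += written;
  }
  return total;
}

// Plays the role of Document::write_to_stream() in this sketch.
int write_to_stream(std::ostream& out) {
  OStreamOutputBuffer buffer(out);
  return fake_serialize(&write_callback, &buffer);
}
```

Calling write_to_stream(std::cout) sends each piece to the stream as soon as it is produced. A user-defined buffer (a socket, a compressor, ...) would only need to derive from OutputBuffer and override do_write().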

Another possibility is to wrap xmlInputBuffer. Although we can (and did) 
implement parse_stream without it, it would make it possible to implement 
xmlTextReader.getRemainder() in an elegant way (cf. 3).

3 - wrap xmlTextReader

target version : 1.0 ?

First, some references if you want to know more about what I'm talking about:
* libxml2 xmlTextReader implementation :
* C# xmlTextReader interface :

I know this interface is not part of the XML specification, which is one argument 
for not implementing it.
However, I think it is worth it: it will answer some needs that SAX and DOM 
do not satisfy for many people, and I bet some new users may become 
interested in libxml++ if we implement such a thing.

I think we can give it an API very close to the C# one, thanks to the xmlIO 

4 - wrap xmlTextWriter

target version : it's too early to know

This interface is far less advanced than xmlTextReader. I don't think it's 
time to think seriously about it, but it's a logical step after xmlTextReader. 
An idea to keep for the future?

5 - use a string type which handles UTF-8

target version : 1.2

This point has been discussed in the past. I will just sum up the state of the 
discussion at this time.
The main debate was: do we impose a specific class, or do we turn libxml++ 
into a templated library to let the user choose which class he wants?
This debate ended with a vote for/against templates with a quite balanced 

However, we have an alternative: explicit instantiation. This would 
consist of implementing the lib with templates, but not including the 
implementations in the headers.
Instead, we would explicitly instantiate the template classes in the 
dynamic lib with a chosen string type (very probably Glib::ustring). Programs 
using this default string type wouldn't need to be recompiled at each minor 
release, which is the main argument against templates.
At the same time, users who want to use another string type (QString for 
example, or even std::string or char *) could still do so, at the price of 
recompiling their application at each release of libxml++, even if the API 
doesn't change.
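
As an illustration only, here is a single-file sketch of the mechanism. BasicElement and Element are hypothetical names, and std::string stands in for Glib::ustring so that the example compiles on its own; in the real layout the first part would be the public header and the second part a .cc file of the shared library. The "extern template" declaration is C++11 syntax (older compilers offered it as an extension).

```cpp
#include <cassert>
#include <string>

// --- what would live in the public header ---------------------------------
// Hypothetical class template: only declarations are visible to users.
template <typename String>
class BasicElement {
public:
  explicit BasicElement(const String& name);
  const String& get_name() const;
  void set_attribute(const String& value);
  const String& get_attribute() const;
private:
  String name_;
  String attribute_;
};

// Explicit instantiation declaration: client code must not instantiate
// this specialization itself, the library provides it.
extern template class BasicElement<std::string>;

// Default type most programs would use.
typedef BasicElement<std::string> Element;

// --- what would live in a .cc file of the shared library -------------------
template <typename String>
BasicElement<String>::BasicElement(const String& name) : name_(name) {}

template <typename String>
const String& BasicElement<String>::get_name() const { return name_; }

template <typename String>
void BasicElement<String>::set_attribute(const String& value) {
  attribute_ = value;
}

template <typename String>
const String& BasicElement<String>::get_attribute() const {
  return attribute_;
}

// Explicit instantiation definition: the code for this one string type is
// compiled into the library, so programs using Element only link against it.
template class BasicElement<std::string>;
```

A user who wants another string type would need access to the member definitions (for example through a separate implementation header) and would instantiate BasicElement with that type in his own code, which is where the recompilation cost mentioned above comes from.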

- Is this solution acceptable for you?
- Is there any issue about LGPL with template libraries?

6 - Implement node iterators

target version : ?

This point was also discussed earlier. We couldn't reach a decision on a 
clean API.
Since xmlNode has internal pointers to the other nodes of the tree (next, 
prev, children, parent), we could easily implement iterators that walk 
the tree in different ways:

- children_iterator: explores all the children of a node.
- depth_first_traversal_iterator: explores all nodes with a depth-first 
algorithm, starting from a node, ending when all the subtree has been 
- breadth_first_traversal_iterator: the same, but breadth first.

These iterators could be bidirectional. The question is how to define the 
end() element.
Each of them would have a const version.
I'll try to make something more complete than last time about this. Any idea 
is welcome.
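
As a starting point for the discussion, here is a minimal sketch of a depth-first iterator. It works on a toy Node struct that only mimics the next/prev/children/parent links of xmlNode; Node, append_child, depth_first_iterator and names_depth_first are hypothetical names, not a proposed API. It uses a null pointer as the end() element, which is one possible answer to the question above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy node with the same kind of links the text mentions for xmlNode.
struct Node {
  std::string name;
  Node* parent;
  Node* children;  // first child
  Node* next;      // next sibling
  Node* prev;      // previous sibling
  explicit Node(const std::string& n)
      : name(n), parent(nullptr), children(nullptr), next(nullptr),
        prev(nullptr) {}
};

// Appends child as the last child of parent.
void append_child(Node& parent, Node& child) {
  child.parent = &parent;
  if (!parent.children) { parent.children = &child; return; }
  Node* last = parent.children;
  while (last->next) last = last->next;
  last->next = &child;
  child.prev = last;
}

// Forward-only pre-order iterator over the subtree rooted at root.
// A default-constructed iterator (null current node) plays the role of end().
class depth_first_iterator {
public:
  depth_first_iterator() : root_(nullptr), current_(nullptr) {}
  explicit depth_first_iterator(Node* root) : root_(root), current_(root) {}

  Node& operator*() const { return *current_; }
  Node* operator->() const { return current_; }

  depth_first_iterator& operator++() {
    // Go down first.
    if (current_->children) { current_ = current_->children; return *this; }
    // Otherwise climb until a node with a next sibling is found,
    // without ever leaving the subtree.
    while (current_ != root_ && !current_->next) current_ = current_->parent;
    current_ = (current_ == root_) ? nullptr : current_->next;
    return *this;
  }

  bool operator==(const depth_first_iterator& o) const {
    return current_ == o.current_;
  }
  bool operator!=(const depth_first_iterator& o) const {
    return current_ != o.current_;
  }
private:
  Node* root_;
  Node* current_;
};

// Collects the node names of a subtree in depth-first order.
std::vector<std::string> names_depth_first(Node& root) {
  std::vector<std::string> out;
  for (depth_first_iterator it(&root), end; it != end; ++it)
    out.push_back(it->name);
  return out;
}
```

The null end() keeps the iterator trivial, but it cannot be decremented, so this sketch is forward-only. A bidirectional version would need an end() that remembers the subtree root, for example by comparing both root_ and current_.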

7 - improve XPath support

target version : ?

I'm not very familiar with XPath. I don't know if the current support we have 
is enough for common uses. Any feedback on this would be appreciated.

The end.

If you reached this point, thank you for reading :-)

I'm looking forward to your comments and ideas,

Best regards,


