Re: [xml] Network code in libxml (was: SSL/TLS support)



Hi there,

> I am going to embark on a dangerous mission here. I will raise a possibly
> unpopular question about the network support in libxml. Before anybody
> feels offended by the question, please consider that I have also submitted
> patches for the network code, and thus have invested time in it.

If people expressed their opinions with less fear of offending someone, we
would probably get more good ideas. Besides, you were never one to phrase
your words as an insult :-)

> Do we really wish to extend the HTTP/FTP support in libxml?

No. What I wish for is support for https:// URIs. When I run xsltproc and
the XML document I give it has a processing instruction saying that the
stylesheet is located on an HTTPS server, I would like the tool to be able
to fetch that stylesheet. Any solution that provides this without extending
the libxml core is as good as any other.

> This is not merely a question about the current SSL patches, but a general
> question about the future direction of libxml.
>
> In my view, libxml is a library for handling XML, XPath, XPointer, and
> related technologies, not a library for handling network communication,
> despite the fact that it includes minimal support for HTTP and FTP.

True. It is possible to go this way. One would have to modify the xmlIO
core and provide much more functionality through external libraries, while
actually reducing the complexity of the libxml core. However, this could
break compatibility with previous versions. Ideas for libxml3 are underway,
I see :-)

> Proper support for HTTP, HTTPS, FTP and similar protocols is a full-scale
> project in its own right. There are other libraries which specialize in
> such support. By adding our own network support we are re-implementing
> what these libraries do, and we probably do it less well than they do.
> Furthermore, we have to maintain features once they are added.

However, an XML document can reference an external DTD, Schema, stylesheet
and what not, and those references can very well be HTTP URIs. An XML
processor must fetch DTDs and Schemas in order to validate the document, and
validation is the responsibility of the XML processor. This implies that the
XML processor must be able to fetch data through whatever protocol a URI
specifies, as long as the URI is legal. I don't know which URI schemes are
mandatory to support, or whether http:// belongs to that group, but it is a
very common one. One would hardly start a full-scale project just to let the
XML processor validate XML, which should be part of its functionality anyway.

> We should let libxml excel at what it does well, and be cautious about
> adding features that are irrelevant to the main purpose of libxml.

> This leads me to a stronger version of the above question:
>
> Do we really want network support in libxml?


Some network support must be there; otherwise top-level functions like
xmlParseFile would no longer work on remote documents unless the application
supplied additional callback code to handle the network, and that callback
code would have to be repeated in every application which uses libxml.
Whether this code must be in the libxml core, or whether libxml should rely
on other libraries, is a good question :-) It could very well be reduced to a
proxy which uses another, specialised library.

> That is, should we keep the current HTTP/FTP code as part of the libxml
> core? I am aware that this code is used by some, and we should not simply
> remove it, but how about extracting it as a separate project?

It does not need to be a separate project. One can provide an http:// URI
handler for xmlIO which acts as a proxy and uses a specialised HTTP library
as its backend. This handler could then live in the libxml source, but be
optional rather than part of the core. However, the URI callbacks in xmlIO
must then be replaceable by application code at runtime, to avoid being
locked into one specific backend implementation.

> I know that I can disable these features when I configure libxml, but
> that is beside the point I am trying to raise. My main concern is with
> the portability and continuous support of these features.

Well, if a builtin URI handler does not implement the protocol itself but
relies on an external library, there is not much to support. In fact, other
software does the same with libxml, building simple XML proxies which use a
particular XML processor implementation as a backend.


I agree that the proper place for this HTTP code is a library specialised
for that. But if you simply use some external library, then people must
agree on that library, and I am worried that they won't. Whichever HTTP
library you choose, there will be someone who prefers another one and
perhaps forks their own development branch at some point. I would like to
prevent a myriad of libxml incarnations in the future, not because I think
everyone should be forced to agree on the same backend, but because we are
sure to receive support requests for all those incarnations sooner or later.

> PS: I have pretty much the same opinion about the thread support in libxml,
> which I would have preferred to be implemented as callback functions, so
> that the application would provide the thread-specific code, leaving the
> added complexity out of libxml.

The real problem here was the ability to preempt one thread executing a
libxml function with another thread executing the same function. This could
also be solved by redesigning libxml to use no global variables. That is
fine, but it is a compatibility break and calls for libxml3.

About the SSL code, it is perfectly okay with me not to apply it in libxml2,
but to work towards a more general solution in libxml3.

Ciao
Igor




