Re: [xml] xmllint as filter?



At 10:54 AM 11/23/01 -0500, Daniel Veillard wrote:
  concatenating a set of XML files onto a single stream is in the
majority of the cases I have seen a serious design error.

Ok, maybe it _is_ a design error. I'm working on search engine software that will accept a stream of XML documents as its input for indexing, to allow fast fuzzy and exact indexing within very flexible constraints <plug>soon to be available for free download at http://www.nextrieve.com/</plug>.

The stream of uniformly encoded XML documents represents a batch of documents to be updated or indexed for the first time. The overhead of calling the indexer for each document separately becomes prohibitive for larger numbers of documents. Hence the stream format.

We're considering a front-end for the indexing engine which could accept single XML documents and then create a stream out of these every X minutes or so, but that would still involve a stream of XML documents at that time.
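A rough sketch of what such a front-end could look like, only to make the idea concrete. The class name `StreamBatcher`, the injectable `clock`, and the newline-separated stream layout are all assumptions made for illustration; the actual stream format expected by the indexer is not described here.

```python
import time


class StreamBatcher:
    """Collect single XML documents and hand them out as one stream per interval.

    Hypothetical helper: the name, the interval handling and the
    newline-separated layout are illustrative assumptions.
    """

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock              # injectable so the timing can be tested
        self.pending = []               # documents waiting for the next flush
        self.last_flush = clock()

    def add(self, document):
        """Queue one document (surrounding whitespace is trimmed)."""
        self.pending.append(document.strip())

    def due(self):
        """True when documents are waiting and the interval has elapsed."""
        return bool(self.pending) and self.clock() - self.last_flush >= self.interval

    def flush(self):
        """Return all queued documents as one stream and reset the queue."""
        stream = "\n".join(self.pending)
        if stream:
            stream += "\n"
        self.pending = []
        self.last_flush = self.clock()
        return stream
```

The caller would poll `due()` and pipe the result of `flush()` into the indexer, so the indexer is started once per batch rather than once per document.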

If there is any better way to do this, I'd appreciate knowing about it  ;-)


> It seems that "xmllint" could be used for that, if it were not for the
> <?xml processing directive that is always output.
  Simply filter it out of the xmllint output with a small script; that
sounds much saner.

Will do that for now...
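For reference, a minimal sketch of such a filter, assuming the declaration is the first thing in the xmllint output. The names `strip_xml_declaration` and `filter_stream` are made up for this example, and the regular expression is a pragmatic approximation of the XML declaration syntax, not a full parser.

```python
import re
import sys

# Matches an XML declaration at the very start of the text, optionally
# preceded by a byte order mark or whitespace. The "\s" after "xml" keeps
# other processing instructions such as <?xml-stylesheet ...?> intact.
XML_DECL = re.compile(r"\A\ufeff?\s*<\?xml\s.*?\?>[ \t]*\r?\n?", re.DOTALL)


def strip_xml_declaration(text):
    """Return text without a leading <?xml ...?> declaration, if one is present."""
    return XML_DECL.sub("", text, count=1)


def filter_stream(infile, outfile):
    """Read one whole document from infile and write it out without the declaration."""
    outfile.write(strip_xml_declaration(infile.read()))
```

Saved as a script that calls `filter_stream(sys.stdin, sys.stdout)`, this could sit at the end of a pipe behind xmllint, one invocation per document.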


> I've changed the source of xmllint.c to allow for an extra parameter
> "--noxmlproc", but now find it difficult to actually get that flag setting
> passed to "xmlDocContentDumpOutput" in tree.c.
  The Libxml2 API is currently being frozen. I'm not fond at all of
adding a new global variable, and totally opposed to changing the signature
of the existing routines, especially since, as you noted, the right API
already exists.

Ok, I agree changing the signature is bad.

And if I wanted to add a global variable, I would have to tweak ../include/libxml/globals.h, right?


Elizabeth Mattijsen

P.S. At this point I don't care whether it would be accepted as a patch or not. I would just like to get a little C hacking experience again... ;-)



