Re: [xml] lazy libxml



Daniel Veillard <veillard redhat com> writes:

> The problem is that libxml2 core won't be aware of that subtype.
> And adding a node type to libxml2 is not something I would do
> lightly.

I don't think one would do it like that. See below.


> I don't really understand what it is aiming at honestly,

The reason for having lazy XML is that you can add things to the tree
that are lazy.

Ok, so that was a dumb explanation. Here's my practical use case.

I am developing XRT (http://xsltpages.nongnu.org), which is an HTTP
server-side mechanism using XSLT as the main handling language.

XRT has to handle all types of content though, not just XML. I would
like XRT to be able to handle binary types (like raster images) by
representing them as XML binary objects (as Tim Bray has suggested)
thusly:

  <element xml:binary="base64">
     RUlQOiAwMDczOls8YTAxOThlMGE+XSBDUFU6IDAgTm90IHRhaW50ZWQgRVNQOiAwMDdiOmEwMjNi
     YTcwIEVGTEFHUzogMDAyMDAyNDYKICAgIE5vdCB0YWludGVkCkVBWDogMDAwMDAwMDAgRUJYOiAw
     MDAwMDAwMSBFQ1g6IGEwMjNiYTkwIEVEWDogMDAwMDAwMDAKRVNJOiAwMDAwMDAwOCBFREk6IDAw
     MDAwMDBhIEVCUDogYTAyM2JhNzggRFM6IDAwN2IgRVM6IDAwN2IKYTAyM2I1ZWM6ICBbPGEwMDMx
     YmYzPl0gc2hvd19yZWdzKzB4MTEzLzB4MTQwCmEwMjNiNjBjOiAgWzxhMDA1NmJiNz5dIHNvZnRs
     b2NrdXBfdGljaysweDU3LzB4NjAKYTAyM2I2MmM6ICBbPGEwMDQxNjI3Pl0gZG9fdGltZXIrMHg0
     Ny8weGQwCmEwMjNiNjNjOiAgWzxhMDAxYTY0ND5dIHVtX3RpbWVyKzB4MTQvMHg1MAphMDIzYjY0
     YzogIFs8YTAwNTZlMTM+XSBoYW5kbGVfSVJRX2V2ZW50KzB4MzMvMHg4MAphMDIzYjY3YzogIFs8
     YTAwNTZlYjU+XSBfX2RvX0lSUSsweDU1LzB4YjAKYTAyM2I2YWM6ICBbPGEwMDE1MzUwPl0gZG9f
     SVJRKzB4MzAvMHg0MAphMDIzYjZiYzogIFs8YTAwMWE1YjM+XSB0aW1lcl9pcnErMHgxMTMvMHgx
     NzAKYTAyM2I2ZWM6ICBbPGEwMDFhODJmPl0gdGltZXJfaGFuZGxlcisweDZmLzB4ODAKYTAyM2I3
     MGM6ICBbPGEwMDFmMGI3Pl0gc2lnX2hhbmRsZXJfY29tbW9uX3R0KzB4YjcvMHgxNTAKYTAyM2I3
     NmM6ICBbPGEwMDJkOWU5Pl0gYWxhcm1faGFuZGxlcisweDI5LzB4NjAKYTAyM2I3OGM6ICBbPGZm
     ZmZlNDIwPl0gX2V0ZXh0KzB4NWZlMTg5NzYvMHgwCmEwMjNiYTdjOiAgWzxhMDAxOTVlMj5dIGNo
     YW5nZV9zaWduYWxzKzB4NjIvMHg5MAphMDIzYmIxYzogIFs8YTAwMTk2NDI+XSB1bmJsb2NrX3Np
     Z25hbHMrMHgxMi8weDIwCmEwMjNiYjJjOiAgWzxhMDAzYzg5Mj5dIF9fZG9fc29mdGlycSsweDQy
     LzB4YzAKYTAyM2JiNGM6ICBbPGEwMDNjOTU5Pl0gZG9fc29mdGlycSsweDQ5LzB4NTAKYTAyM2Ji
     NmM6ICBbPGEwMDNjYTM5Pl0gaXJxX2V4aXQrMHg0OS8weDUwCmEwMjNiYjhjOiAgWzxhMDAxYTgw
     OT5dIHRpbWVyX2hhbmRsZXIrMHg0OS8weDgwCmEwMjNiYmFjOiAgWzxhMDAxZjBiNz5dIHNpZ19o
     YW5kbGVyX2NvbW1vbl90dCsweGI3LzB4MTUwCmEwMjNiYzBjOiAgWzxhMDAyZDllOT5dIGFsYXJt
     X2hhbmRsZXIrMHgyOS8weDYwCmEwMjNiYzJjOiAgWzxmZmZmZTQyMD5dIF9ldGV4dCsweDVmZTE4
     OTc2LzB4MAphMDIzYmYyYzogIFs8YTAwMTc3ZDY+XSBkZWZhdWx0X2lkbGUrMHg2Ni8weDcwCmEw
     MjNiZjRjOiAgWzxhMDAxZGM3Yj5dIGluaXRfaWRsZV90dCsweGIvMHgxMAphMDIzYmY1YzogIFs8
     YTAwMTc3ZWI+XSBjcHVfaWRsZSsweGIvMHgxMAphMDIzYmY2YzogIFs8YTAwMTQxNGI+XSByZXN0
     X2luaXQrMHgyYi8weDMwCmEwMjNiZjhjOiAgWzxhMDAwMTY3Mz5dIHN0YXJ0X2tlcm5lbCsweDE3
     My8weDFiMAphMDIzYmZhYzogIFs8YTAwMWRjYjg+XSBzdGFydF9rZXJuZWxfcHJvYysweDM4LzB4
     NTAKYTAyM2JmYmM6ICBbPGEwMDFlODUzPl0gc2lnbmFsX3RyYW1wKzB4YzMvMHgxMTAKYTAyM2Jm
     ZGM6ICBbPGEwMWIyNDRhPl0gY2xvbmUrMHg2YS8weDgwCg==
  </element>

I think this is a pretty neat idea because it will make all sorts of
interesting things possible (from XRT's perspective anyway).

Unfortunately it is completely impractical. The base64 above is a
stack trace of fewer than 20 lines from a Linux VM. Even a small image
is quite large once it is base64 encoded, and doing the base64
conversion for an image file takes significant time.

But if one could do it lazily that would give me all the advantages
whilst only hitting me with the cost when I _need_ to serialize the
data.
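To make the cost argument concrete, here is a minimal sketch in Python rather than in libxml2 itself. The names LazyBase64 and to_xml are hypothetical and exist only for this illustration; the point is just that the base64 work is deferred until the first serialization and then cached.

```python
import base64


class LazyBase64:
    """Holds raw bytes and base64-encodes them only when first serialized."""

    def __init__(self, raw):
        self._raw = raw
        self._encoded = None
        self.encode_calls = 0  # instrumentation for this example only

    def serialize(self):
        # Pay the encoding cost on first use, then reuse the cached text.
        if self._encoded is None:
            self.encode_calls += 1
            self._encoded = base64.b64encode(self._raw).decode("ascii")
        return self._encoded


def to_xml(name, payload):
    """Hypothetical serializer: the only place that forces the encoding."""
    return '<%s xml:binary="base64">%s</%s>' % (name, payload.serialize(), name)


# The first eight bytes of a PNG file stand in for a real image here.
image = LazyBase64(b"\x89PNG\r\n\x1a\n")
```

Creating the object costs nothing beyond holding the bytes; only a call such as to_xml("element", image) triggers the conversion.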

Consider this example of imagined XRT:

   <xsl:variable name="pngfile"  select="lazy.get('adabyron.png')"/>
   
   <xsl:variable name="jpgfile" select="lazy.convert($pngfile, 'jpg')"/>

The idea here is that there is a lazy object representing the
pngfile which is then converted to another lazy object which
represents a jpg file.

Only when I do:

   <xsl:value-of select="$jpgfile"/>

does the import of the png file and its conversion to jpg
actually happen.
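The chaining could be sketched roughly as follows, again in Python and not as real XRT or libxml2 code. Lazy, lazy_get and lazy_convert are hypothetical names mirroring the imagined lazy.get and lazy.convert, and fake_read and fake_convert are stand-ins for real file loading and image conversion.

```python
class Lazy:
    """A deferred computation that runs at most once, when forced."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
            self._compute = None  # drop the closure once it has run
        return self._value


log = []  # records the moment real work happens


def lazy_get(filename, read_file):
    def compute():
        log.append("read " + filename)
        return read_file(filename)
    return Lazy(compute)


def lazy_convert(source, target_format, convert):
    def compute():
        data = source.force()  # forcing the result forces its input first
        log.append("convert to " + target_format)
        return convert(data, target_format)
    return Lazy(compute)


# Stand-ins (assumptions): no real file I/O or image conversion is done.
def fake_read(filename):
    return b"png-bytes-of-" + filename.encode("ascii")


def fake_convert(data, target_format):
    return target_format.encode("ascii") + b":" + data


pngfile = lazy_get("adabyron.png", fake_read)
jpgfile = lazy_convert(pngfile, "jpg", fake_convert)
```

At this point log is still empty. Calling jpgfile.force(), which plays the role of the xsl:value-of, is what runs the read and then the conversion, in that order.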


> You can always add up a function in the _private field
> of a node, but that will only be interpreted by non-libxml2 code by
> definition.

That's right. This would seem to be some scary fundamental change.

What I would have to do is replace all the code which currently
handles the content of a node and make it check whether the node is
delayed/lazy and, if it is, compute the content.
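As a rough illustration of that check, here is a toy tree in Python. Node, node_content and serialize are hypothetical and do not correspond to libxml2's structures or functions; the sketch only assumes that every reader of a node's content goes through one accessor.

```python
class Node:
    """Toy tree node; content is either a str or a zero-argument callable."""

    def __init__(self, name, content="", children=None):
        self.name = name
        self.content = content
        self.children = children or []


def node_content(node):
    """Single access point: compute lazy content on first use and keep it."""
    if callable(node.content):
        node.content = node.content()
    return node.content


def serialize(node):
    # Every consumer of content calls node_content, never node.content.
    inner = node_content(node) + "".join(serialize(c) for c in node.children)
    return "<%s>%s</%s>" % (node.name, inner, node.name)


calls = []  # counts how often the expensive computation actually runs


def expensive():
    calls.append(1)
    return "computed"


doc = Node("root", children=[Node("eager", "plain"), Node("lazy", expensive)])
```

The scary part is exactly what the sketch glosses over: in a real library every piece of code that touches node content would have to be routed through something like node_content.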


FWIW SXML already has an understanding of lazy XML I believe:

  http://okmij.org/ftp/Scheme/SXML.html


-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs


