[xml] Encodings and Memory



Hello,

I have two issues I'd like advice on.

(1) I'm currently using libxml2 to save documents.  These documents can be
read/written in any number of languages, so I'd like to save using Unicode
if at all possible.  Based on my reading of the libxml2 documentation and
the XML standard, there are two feasible possibilities that I see:
        (a) Save all characters < 128 as normal characters (7 bit) and
        encode all characters >= 128 as character references
        (e.g. &#x266f;), or
        (b) Save all characters as UTF-8 encoded.
What other solutions have people come up with for this sort of thing?  I
can see benefits of either solution and I'm curious what other people
could suggest.
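
To make the two options concrete, here is a minimal sketch of option (a)
in plain C. Note that escape_non_ascii is a hypothetical helper written
for illustration, not a libxml2 function, and it only handles the
"characters >= 128" part (markup characters such as & and < would still
need their usual escaping). Option (b) amounts to writing the same UTF-8
bytes out unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libxml2): copy the UTF-8 string `in`
 * to `out`, replacing every code point >= 128 with a hexadecimal
 * character reference such as &#x266F;.
 * Returns 0 on success, -1 if the input is malformed or `out` is too small. */
static int escape_non_ascii(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL);

    while (*p != '\0') {
        unsigned long cp;
        int extra;

        /* Decode the lead byte and work out how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return -1;              /* truncated or malformed sequence */
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }

        if (cp < 0x80) {
            /* 7-bit character: copy as-is, leaving room for the terminator. */
            if (used + 2 > out_size)
                return -1;
            out[used++] = (char)cp;
        } else {
            /* Anything else becomes a numeric character reference. */
            int n = snprintf(out + used, out_size - used, "&#x%lX;", cp);
            if (n < 0 || (size_t)n >= out_size - used)
                return -1;
            used += (size_t)n;
        }
    }

    if (used >= out_size)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With this sketch, the UTF-8 bytes for "C" followed by U+266F (0xE2 0x99
0xAF) come out as C&#x266F;, i.e. nine 7-bit bytes instead of four bytes
of UTF-8. That size difference is one of the trade-offs between (a) and (b).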

(2) Is there any way to decrease the amount of memory consumed while
building the DOM tree?  Memory consumption is pretty small using a SAX
reader, but writing some of my documents takes upwards of 18 MB of RAM
because the entire tree is built in memory first. Is there any way to
streamline this at all?

Thanks,
...Matt




