Re: [xml] setting URL for xmlRelaxNGParserCtxt?
- From: Martijn Faassen <faassen infrae com>
- To: veillard redhat com
- Cc: Kasimier Buchcik <kbuchcik 4commerce de>, xml gnome org
- Subject: Re: [xml] setting URL for xmlRelaxNGParserCtxt?
- Date: Wed, 26 Jan 2005 17:39:56 +0100
Daniel Veillard wrote:
On Wed, Jan 26, 2005 at 03:12:52PM +0100, Martijn Faassen wrote:
So what *is* stored in these dictionaries? I still don't know. Tagnames?
Namespace strings? Text node content? IDs? All of them? I guess I'll
have to study the source to get the answer. :)
Markup tag names, very small text node values, ID/REFs, DTD attribute
default values, and namespace names. With libxslt you also get stylesheet
names.
General text node content is not added; that would make the dictionary
explode and be unusable.
Okay, thanks. Even if that memory is never freed, it isn't too bad. I
think I also understand now why you mention IDs, as they may be globally
unique strings and there might be many of them. Does "namespace names"
mean their prefixes, their hrefs, or both?
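The dictionary described here is essentially a string-interning table:
looking up a string always returns a single canonical copy that all users
share. A minimal Python sketch of the idea (hypothetical names, purely
illustrative, not the libxml2 API):

```python
# Minimal string-interning sketch: each distinct string is stored once,
# and every lookup of equal content returns the same canonical object,
# the way a single char* is shared inside libxml2's dictionary.

class Dict:
    def __init__(self):
        self._table = {}

    def lookup(self, s):
        # Return the canonical copy of s, inserting it on first sight.
        return self._table.setdefault(s, s)

d = Dict()
a = d.lookup(b"para")
b = d.lookup(bytes(bytearray(b"para")))  # equal content, different object
assert a is b  # lookup collapses them to one shared copy
```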
It might be interesting for me to try building something on top of the
dictionary that caches Python unicode strings, so that they don't need to
be regenerated all the time. If I understand correctly, dictionaries
guarantee that there is only a single char* pointer for a given piece of
textual data, so I could use that pointer as a hash key for Python
unicode strings. I'm not sure that would gain me much speedup, as I
already check whether a string is ASCII-only and in that case return it
directly (which is safe in Python).
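A sketch of that proposed caching layer, with id() of the interned object
standing in for the unique char* address (illustrative names, not real
binding code; the cache key is only valid while the dictionary keeps the
interned string alive):

```python
# Cache decoded Python unicode strings keyed on the identity of the
# interned byte string. Because the dictionary guarantees one canonical
# object per distinct string, identical content always yields the same
# key, so decoding happens only once per distinct string.

_unicode_cache = {}

def to_unicode(interned_bytes):
    # id() stands in for the char* address; it is only a safe key while
    # the dictionary keeps interned_bytes alive.
    key = id(interned_bytes)
    cached = _unicode_cache.get(key)
    if cached is None:
        cached = interned_bytes.decode("utf-8")
        _unicode_cache[key] = cached
    return cached
```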
If one blows away a dictionary every once in a while, what happens to
everything still referencing strings inside it?
They will point to freed memory. So don't free the dictionary until it
is no longer in use. Use another one, but you will lose the uniqueness
of strings.
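One way to enforce "don't free it until it is no longer in use" is
reference counting, so the dictionary only really goes away with the last
document sharing it (libxml2 exposes xmlDictReference() for this; the
sketch below is a Python-flavoured illustration with made-up names):

```python
# Reference-counting discipline that avoids dangling pointers: each
# document sharing the dictionary takes a reference, and the string
# table is only destroyed when the last reference is dropped.

class RefCountedDict:
    def __init__(self):
        self.refs = 1       # creator holds the first reference
        self.table = {}
        self.freed = False

    def reference(self):
        # A second document starts sharing the dictionary.
        self.refs += 1

    def free(self):
        # Drop one reference; only the last drop destroys the strings.
        self.refs -= 1
        if self.refs == 0:
            self.table.clear()
            self.freed = True
```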
Hm, that sounds tricky. If I have a bunch of documents that share the
same dictionary, how would I go about cleaning a dictionary up? One way
would be to hunt down all references to the dictionary and replace it
with another one. The other way would be to clean or shrink the
dictionary itself.
Both approaches have a problem I can't seem to figure my way out of: the
strings in the original dictionary (or the strings no longer known to
the dictionary, if it has been 'shrunk') will still be shared between
nodes. If two nodes refer to the same string and both are freed, we'll
get a memory violation (a double free). Normal dictionary use prevents
this, since before freeing a string we check whether it is owned by the
dictionary, but we can't do that check anymore.
The only way I can see to solve this is to hunt down all such strings
first and replace them with unique copies before the dictionary goes
away or is shrunk. That would be a pain to do, too.
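That "unique copies" pass could look roughly like this: walk all nodes
and give each one a private copy of any string the dictionary still owns.
All names below are hypothetical; in libxml2's C API such a pass would
presumably use xmlDictOwns() to test ownership and xmlStrdup() to copy.

```python
# Before a shared dictionary is freed or shrunk, replace every
# dictionary-owned string on every node with a fresh private copy,
# so freeing the dictionary leaves no dangling pointers behind.

class Node:
    def __init__(self, name):
        self.name = name        # may be an interned (dictionary-owned) string
        self.owns_name = False  # True once the node holds its own copy

def detach_strings(nodes, owned_ids):
    """Give each node a private copy of any dictionary-owned string."""
    for node in nodes:
        if id(node.name) in owned_ids:               # like xmlDictOwns()
            node.name = bytes(bytearray(node.name))  # guaranteed fresh copy
            node.owns_name = True                    # node must free it itself

# Two nodes share one interned string; after the pass each holds its own
# copy, so the dictionary can safely go away.
interned = b"para"
n1, n2 = Node(interned), Node(interned)
detach_strings([n1, n2], {id(interned)})
assert n1.name == b"para" and n1.name is not interned
```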
Regards,
Martijn