Re: [xml] setting URL for xmlRelaxNGParserCtxt?

From: Martijn Faassen <faassen infrae com>
To: veillard redhat com
Cc: Kasimier Buchcik <kbuchcik 4commerce de>, xml gnome org
Subject: Re: [xml] setting URL for xmlRelaxNGParserCtxt?
Date: Wed, 26 Jan 2005 17:39:56 +0100

Daniel Veillard wrote:

On Wed, Jan 26, 2005 at 03:12:52PM +0100, Martijn Faassen wrote:
So what *is* stored in these dictionaries? I still don't know. Tagnames? Namespace strings? Text node content? IDs? All of them? I guess I'll have to study the source to get the answer. :)
  markup tag names, very small text node values, ID/REFs, DTD attribute
default values, namespace names. With libxslt you also get stylesheets
names.
  general text node content is not added, this would explode and be unusable.

Okay, thanks. Even if that memory is never freed, it isn't too bad. I think I also understand now why you mention IDs, as they may be globally unique strings and there might be many of them. Does "namespace names" mean their prefixes, the href, or both?

It might be interesting for me to try building something on top of the dictionary that caches Python unicode strings so that they don't need to be regenerated all the time. Basically, if I understand it correctly, dictionaries guarantee that there is only a single char* pointer to a given piece of textual data, so I could use that pointer as a hash key for Python unicode strings. I'm not sure that'd gain me a lot of speedup, as I already check whether a string is ASCII-only and return that directly (which is safe in Python).
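To make the pointer-as-key idea concrete, here is a minimal sketch in plain C. It does not use libxml2 at all: `toy_dict`, `toy_intern`, `ptr_cache`, `cache_get` and `cache_put` are hypothetical names invented for illustration, and the linear-scan interning is only a stand-in for whatever the real dictionary does. The point is just that once equal strings share one pointer, a cache can compare addresses instead of contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 64

/* Toy interning table (illustration only): one stored copy per distinct string. */
typedef struct {
    char *strings[TOY_SLOTS];
    size_t count;
} toy_dict;

/* Return the canonical pointer for s, storing a copy the first time it is seen. */
static const char *toy_intern(toy_dict *d, const char *s) {
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->strings[i], s) == 0)
            return d->strings[i];
    assert(d->count < TOY_SLOTS);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    d->strings[d->count++] = copy;
    return copy;
}

/* Cache keyed on the interned pointer itself; the value stands in for a
 * cached Python object. Open addressing with linear probing. */
typedef struct {
    const char *key[TOY_SLOTS];
    void *value[TOY_SLOTS];
} ptr_cache;

static size_t ptr_slot(const char *p) {
    return (size_t)(((uintptr_t)p >> 3) % TOY_SLOTS);
}

/* Look up by address only: no string comparison is needed. */
static void *cache_get(const ptr_cache *c, const char *p) {
    size_t start = ptr_slot(p);
    for (size_t probes = 0; probes < TOY_SLOTS; probes++) {
        size_t j = (start + probes) % TOY_SLOTS;
        if (c->key[j] == NULL) return NULL;
        if (c->key[j] == p) return c->value[j];
    }
    return NULL;
}

/* Insert or overwrite; returns 0 if the table is full. */
static int cache_put(ptr_cache *c, const char *p, void *v) {
    size_t start = ptr_slot(p);
    for (size_t probes = 0; probes < TOY_SLOTS; probes++) {
        size_t j = (start + probes) % TOY_SLOTS;
        if (c->key[j] == NULL || c->key[j] == p) {
            c->key[j] = p;
            c->value[j] = v;
            return 1;
        }
    }
    return 0;
}
```

Such a cache is only valid while the interned strings stay alive: if the dictionary is freed, an address may later be reused for a different string, so the cache would have to be discarded at the same time.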

If one blows away a dictionary every once in a while, what happens to the things referencing things inside it?
  they will point to freed memory. So don't free the dictionary until
it is not in use anymore. Use another one, but you will lose uniqueness
of strings.
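One common way to honour "don't free it until it is not in use anymore" is reference counting: every document sharing the dictionary holds a reference, and the strings are released only when the last holder lets go. The sketch below is a self-contained toy under that assumption. `shared_pool`, `pool_ref`, `pool_unref` and `pool_intern` are hypothetical names; libxml2 exposes `xmlDictReference()` and `xmlDictFree()` for this purpose, but nothing here calls them or claims to mirror their internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SLOTS 32

/* Hypothetical reference-counted string pool, for illustration only. */
typedef struct {
    char *strings[POOL_SLOTS];
    size_t count;
    int refs;               /* number of documents still using the pool */
} shared_pool;

/* Create a pool owned by one holder. */
static shared_pool *pool_create(void) {
    shared_pool *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->refs = 1;
    return p;
}

/* A second document starts sharing the pool. */
static void pool_ref(shared_pool *p) {
    p->refs++;
}

/* Drop one reference. Returns 1 if this was the last one and the pool
 * (with all its strings) was actually freed, 0 otherwise. */
static int pool_unref(shared_pool *p) {
    if (--p->refs > 0)
        return 0;
    for (size_t i = 0; i < p->count; i++)
        free(p->strings[i]);
    free(p);
    return 1;
}

/* Return the canonical pointer for s, storing a copy on first sight. */
static const char *pool_intern(shared_pool *p, const char *s) {
    for (size_t i = 0; i < p->count; i++)
        if (strcmp(p->strings[i], s) == 0)
            return p->strings[i];
    assert(p->count < POOL_SLOTS);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    p->strings[p->count++] = copy;
    return copy;
}
```

With two holders, the first `pool_unref` leaves every interned pointer valid; only the second one releases the memory. This keeps pointers safe but does not shrink the pool while any document still uses it.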

Hm, that sounds tricky. If I have a bunch of documents that share the same dictionary, how would I go ahead and clean a dictionary up? One way would be to hunt down all references to the dictionary and replace it with another one. The other way would be to clean or shrink the dictionary itself.


Both approaches have a problem I can't seem to figure my way out of:

The strings in the original dictionary (or the strings no longer known to the dictionary, if it has been 'shrunk') will still be shared between nodes. If two nodes refer to the same string and both are freed, the string is freed twice and we'll have a memory violation. Normally the dictionary prevents this: before freeing a string we check whether it is owned by the dictionary. Once the dictionary is gone or shrunk, we can't do that check anymore.

The only way I can see to solve this is to hunt down all such strings first and replace them with unique copies before a dictionary goes away or is shrunk. That'd be a pain to do, too.
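To spell out what that would involve, here is a rough, self-contained sketch of both halves: the ownership check done before freeing, and the "detach" pass that gives each node a private copy before the pool disappears. Everything in it (`name_pool`, `toy_node`, `node_detach`, `node_free_name`) is a hypothetical toy; the ownership test is only analogous in spirit to libxml2's `xmlDictOwns()`, and real nodes carry more than one string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NAME_SLOTS 32

/* Toy string pool and a toy node holding one possibly pool-owned string. */
typedef struct { char *strings[NAME_SLOTS]; size_t count; } name_pool;
typedef struct { const char *name; } toy_node;

static char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Return the canonical pointer for s, storing a copy on first sight. */
static const char *name_pool_intern(name_pool *p, const char *s) {
    for (size_t i = 0; i < p->count; i++)
        if (strcmp(p->strings[i], s) == 0)
            return p->strings[i];
    assert(p->count < NAME_SLOTS);
    p->strings[p->count] = dup_string(s);
    return p->strings[p->count++];
}

/* Pointer membership test: does this exact address belong to the pool? */
static int name_pool_owns(const name_pool *p, const char *s) {
    for (size_t i = 0; i < p->count; i++)
        if (p->strings[i] == s)
            return 1;
    return 0;
}

/* Before the pool goes away: replace a pool-owned name by a private copy. */
static void node_detach(toy_node *n, const name_pool *p) {
    if (n->name != NULL && name_pool_owns(p, n->name))
        n->name = dup_string(n->name);
}

/* Free a node's name unless the pool owns it. Pass p == NULL once the pool
 * is gone, which is only safe if every node was detached beforehand. */
static void node_free_name(toy_node *n, const name_pool *p) {
    if (n->name != NULL && (p == NULL || !name_pool_owns(p, n->name)))
        free((char *)n->name);
    n->name = NULL;
}

/* Release every string held by the pool. */
static void name_pool_free(name_pool *p) {
    for (size_t i = 0; i < p->count; i++)
        free(p->strings[i]);
    p->count = 0;
}
```

After the detach pass, two nodes that used to share one pointer each own a separate copy, so freeing both is safe. The cost is the one described above: every node of every document sharing the pool has to be visited first.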


Regards,

Martijn
