Re: [xml] setting URL for xmlRelaxNGParserCtxt?



On Wed, Jan 26, 2005 at 05:39:56PM +0100, Martijn Faassen wrote:
Daniel Veillard wrote:
On Wed, Jan 26, 2005 at 03:12:52PM +0100, Martijn Faassen wrote:

So what *is* stored in these dictionaries? I still don't know. Tagnames? 
Namespace strings? Text node content? IDs? All of them? I guess I'll 
have to study the source to get the answer. :)


 Markup tag names, very small text node values, ID/REFs, DTD attribute
default values, namespace names. With libxslt you also get stylesheet
names.
 General text node content is not added; the dictionary would explode
 and become unusable.

Okay, thanks. Even if that memory is not freed ever it isn't too bad. I 
think I understand also now why you mention IDs, as they may be globally 
unique strings and there might be many of them. Does "namespace names" 
mean their prefixes or the href, or both?

  both,

It might be interesting for me to try building something on top of the 
dictionary that caches Python unicode strings so that they don't 
need to be regenerated all the time. Basically, if I understand it 
correctly, dictionaries guarantee that there is only a single char* 
pointer to a piece of textual data, so I could use that pointer as a 

  Yes, uniqueness of the pointer returned by the API is the main guarantee.
(Note that ptr+1 may not be unique: "boo" and "foo" are stored at
different locations, so the "oo" tails are two distinct pointers.)
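
The uniqueness guarantee can be illustrated with a toy interning table. This
is only a sketch, not libxml2's implementation: ToyDict, toy_intern, toy_free
and toy_demo are hypothetical names, and the linear search stands in for
whatever hashing the real dictionary uses. In libxml2 itself the corresponding
lookup is xmlDictLookup() from <libxml/dict.h>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy interning table (hypothetical, NOT libxml2 code): a fixed-size
 * array of owned string copies, searched linearly. */
#define TOY_MAX 64

typedef struct {
    char *strings[TOY_MAX];
    size_t count;
} ToyDict;

/* Return the single canonical pointer for `s`, storing a copy on first use.
 * Returns NULL if the table is full or allocation fails. */
const char *toy_intern(ToyDict *d, const char *s)
{
    assert(d != NULL && s != NULL);
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->strings[i], s) == 0)
            return d->strings[i];
    if (d->count == TOY_MAX)
        return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    d->strings[d->count++] = copy;
    return copy;
}

/* Release every stored copy; all pointers handed out become invalid. */
void toy_free(ToyDict *d)
{
    for (size_t i = 0; i < d->count; i++)
        free(d->strings[i]);
    d->count = 0;
}

/* Returns 1 if the properties described above hold for the toy table. */
int toy_demo(void)
{
    ToyDict d = {0};
    const char *boo1 = toy_intern(&d, "boo");
    const char *boo2 = toy_intern(&d, "boo");
    const char *foo = toy_intern(&d, "foo");
    int ok = boo1 != NULL && foo != NULL
        && boo1 == boo2                   /* same text, same pointer */
        && strcmp(boo1 + 1, foo + 1) == 0 /* both tails spell "oo" */
        && boo1 + 1 != foo + 1;           /* yet the tails are distinct */
    toy_free(&d);
    return ok;
}
```

Because equal strings always yield the same pointer, a cache keyed on the
pointer value needs no string comparison. The key is only meaningful for
pointers returned by the lookup itself, not for offsets into them.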

 They will point to freed memory. So don't free the dictionary until
it is no longer in use. You can use another one, but you will lose
uniqueness of strings.

Hm, that sounds tricky. If I have a bunch of documents that share the 
same dictionary, how would I go ahead and clean a dictionary up? One way 
would be to hunt all references to dictionaries and replace the 
dictionary with another one. The other way would be to clean or shrink 
the dictionary itself.

  You can remove the dictionary only when no document references it any more.
Trying to change the dictionary of a document dynamically would be expensive
and very tricky,
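
One way to honour the "only when no document references it" rule is a
reference count on the shared dictionary. The sketch below is a toy model
under that assumption: SharedDict, ToyDoc and every function in it are
hypothetical names, and the `freed` flag exists only so the demo can observe
when storage is released. libxml2 exposes xmlDictReference() and
xmlDictFree() for the real dictionary, which as far as I know follow the same
counting pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a shared dictionary and a document. */
typedef struct {
    int refs;    /* number of holders, including the creator */
    int *freed;  /* demo-only flag, set to 1 when storage is released */
} SharedDict;

typedef struct {
    SharedDict *dict;
} ToyDoc;

/* Create a dictionary held once by the caller. */
SharedDict *shared_dict_new(int *freed_flag)
{
    SharedDict *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->refs = 1;
    d->freed = freed_flag;
    if (freed_flag != NULL)
        *freed_flag = 0;
    return d;
}

/* Drop one reference; the storage goes away only with the last one. */
void shared_dict_release(SharedDict *d)
{
    assert(d != NULL && d->refs > 0);
    if (--d->refs == 0) {
        if (d->freed != NULL)
            *d->freed = 1;
        free(d);
    }
}

/* A document takes its own reference for as long as it lives. */
void toy_doc_attach(ToyDoc *doc, SharedDict *d)
{
    d->refs++;
    doc->dict = d;
}

void toy_doc_close(ToyDoc *doc)
{
    shared_dict_release(doc->dict);
    doc->dict = NULL;
}

/* Returns 1 if the dictionary outlives its creator and is released
 * exactly when the last document closes. */
int refcount_demo(void)
{
    int freed = 0;
    SharedDict *d = shared_dict_new(&freed);
    if (d == NULL)
        return 0;
    ToyDoc a, b;
    toy_doc_attach(&a, d);
    toy_doc_attach(&b, d);
    shared_dict_release(d);          /* creator lets go; two docs remain */
    int alive_with_two = (freed == 0);
    toy_doc_close(&a);
    int alive_with_one = (freed == 0);
    toy_doc_close(&b);               /* last holder: storage released */
    return alive_with_two && alive_with_one && freed == 1;
}
```

With this scheme nobody has to hunt down references: each document simply
drops its own, and the strings stay valid as long as any holder remains.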

Both approaches have a problem I can't seem to figure my way out of:

  So don't do it.

The strings in the original dictionary (or the strings not known to the 
dictionary anymore if the dictionary has been 'shrunk') will still be 

  You can't shrink a dictionary: there is no way to tell whether
a given string needs to be kept or discarded.
  But you can ask a dictionary very quickly whether it owns a string pointer.
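
To show why an ownership test can be fast, here is a toy model in which all
strings live in one contiguous pool, so the check is a plain address-range
comparison. PoolDict, pool_add, pool_owns and owns_demo are hypothetical
names and the single fixed-size pool is an assumption made for brevity; the
real call in libxml2 is xmlDictOwns().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 256

/* Hypothetical dictionary storage: one contiguous character pool. */
typedef struct {
    char pool[POOL_SIZE];
    size_t used;
} PoolDict;

/* Copy `s` into the pool and return its address, or NULL if it does
 * not fit. Deduplication is left out to keep the sketch short. */
const char *pool_add(PoolDict *d, const char *s)
{
    assert(d != NULL && s != NULL);
    size_t n = strlen(s) + 1;
    if (n > POOL_SIZE - d->used)
        return NULL;
    char *dst = d->pool + d->used;
    memcpy(dst, s, n);
    d->used += n;
    return dst;
}

/* Ownership is an address-range test; no string is ever compared.
 * The pointers are converted to integers so that comparing addresses
 * of unrelated objects stays well defined. */
int pool_owns(const PoolDict *d, const char *p)
{
    uintptr_t lo = (uintptr_t)d->pool;
    uintptr_t hi = lo + d->used;
    uintptr_t addr = (uintptr_t)p;
    return addr >= lo && addr < hi;
}

/* Returns 1 if a pooled string is owned and an equal-looking string
 * stored elsewhere is not. */
int owns_demo(void)
{
    PoolDict d = { .used = 0 };
    const char *inside = pool_add(&d, "chapter");
    const char *outside = "chapter"; /* same text, storage outside the pool */
    return inside != NULL
        && pool_owns(&d, inside)
        && !pool_owns(&d, outside);
}
```

The check answers "did this pointer come from this dictionary?", which is
different from "does this dictionary contain this text?". That distinction
matters when deciding whether a string may be freed independently.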

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


