Re: [xml] semantics of XML_PARSE_NONET when reusing parser contexts



On Fri, Jul 27, 2007 at 11:06:36AM +0200, Stefan Behnel wrote:
Hi,

one of the lxml users noticed that libxml2 changes behaviour when you set the
NONET option for xmlCtxtReadFile() and then call it twice on a network URL.
The first time, it parses the external document. The second time, it refuses
to parse it.

The problem lies in the handling of the parser options, which are only set
*after* the first call to xmlLoadExternalEntity(), in the following call to
xmlDoRead(). I think this is ok in general as it allows users to parse from a
URL by passing it in but to avoid additional network access when loading
external entities transitively (DTDs etc.) - is this the intended semantics of
the NONET option?

  Hum, no. The NONET semantics are that any access outside the local filesystem
should generate an error. Note that if you have a catalog remapping external
resources to local ones, then those accesses should proceed without failure.

Now, the thing is, when you reuse the parser context, then the options *stay*
in the context when you use it the second time, so they will be picked up by
the xmlLoadExternalEntity() call when running xmlCtxtReadFile() a second time.

  That's weird.

Depending on how contexts are reused in an application, this can lead to
unpredictable behaviour. In lxml, we can work around this by resetting the
context options after parsing, but I would like to see the intended semantics
of the NONET option cleared up and see reliable behaviour here.

  In general you should always reset the parsing context, as the xmlCtxtRead*
functions do.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


