[xml] semantics of XML_PARSE_NONET when reusing parser contexts



Hi,

One of the lxml users noticed that libxml2 changes its behaviour when you set
the NONET option for xmlCtxtReadFile() and then call it twice on a network URL
with the same parser context. The first time, it parses the external document.
The second time, it refuses to parse it.

The problem lies in the handling of the parser options, which are only applied
*after* the first call to xmlLoadExternalEntity(), in the subsequent call to
xmlDoRead(). I think this is OK in general, as it allows users to parse a
document from a URL by passing the URL in directly, while still avoiding
additional network access when external entities (DTDs etc.) are loaded
transitively. Is this the intended semantics of the NONET option?

Now, the catch is that when you reuse the parser context, the options *stay*
in the context, so the second time you call xmlCtxtReadFile(), they are already
picked up by its initial xmlLoadExternalEntity() call and the network URL is
refused.
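
To make the ordering easier to see, here is a small self-contained toy model
of it. It is NOT libxml2 code: toy_ctxt, toy_load_entity(), toy_read_file()
and TOY_PARSE_NONET are hypothetical stand-ins for the parser context,
xmlLoadExternalEntity(), xmlCtxtReadFile() and XML_PARSE_NONET, and the model
simply assumes what is described above: the main document is loaded first,
the options are stored in the context afterwards, and they persist there.
Treating only http:// and ftp:// URLs as "network" is also a simplification.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names, not the libxml2 API. */
#define TOY_PARSE_NONET 0x1

typedef struct {
    int options;   /* survives between calls, like the real context */
} toy_ctxt;

/* Stand-in for xmlLoadExternalEntity(): refuses network URLs once
 * NONET is already set on the context. Returns 0 on success. */
static int toy_load_entity(const toy_ctxt *ctxt, const char *url) {
    int is_network = strncmp(url, "http://", 7) == 0 ||
                     strncmp(url, "ftp://", 6) == 0;
    return (is_network && (ctxt->options & TOY_PARSE_NONET)) ? -1 : 0;
}

/* Stand-in for xmlCtxtReadFile(): loads the main document first and
 * only then records the caller's options (the xmlDoRead() step).
 * Returns 0 on success, -1 if the main document is refused, and -2 if
 * an optional transitive load (e.g. a DTD) is refused. */
static int toy_read_file(toy_ctxt *ctxt, const char *url,
                         const char *dtd_url, int options) {
    assert(ctxt != NULL && url != NULL);
    if (toy_load_entity(ctxt, url) != 0)
        return -1;                  /* sees options left from last call */
    ctxt->options = options;        /* options take effect only now */
    if (dtd_url != NULL && toy_load_entity(ctxt, dtd_url) != 0)
        return -2;                  /* NONET blocks the transitive load */
    return 0;
}
```

Called twice on the same toy_ctxt with TOY_PARSE_NONET and an http:// URL,
toy_read_file() returns 0 the first time and -1 the second, while a fresh
context that also has to fetch an http:// DTD returns -2, i.e. the main
document is accepted and only the transitive load is refused.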

Depending on how contexts are reused in an application, this can lead to
unpredictable behaviour. In lxml, we can work around this by resetting the
context options after parsing, but I would like to see the intended semantics
of the NONET option clarified and the behaviour made reliable.
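
The workaround can be sketched in the same kind of toy model. Again, this is
not libxml2 or lxml code: toy_ctxt, toy_read_file() and the wrapper
toy_read_file_reset() are hypothetical, and the model assumes that the only
state carried between calls is the options field. The wrapper clears the
options after every parse, so each call on a reused context starts from the
same state as a fresh one.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names, not the libxml2 or lxml API. */
#define TOY_PARSE_NONET 0x1

typedef struct { int options; } toy_ctxt;

/* Refuses http:// and ftp:// URLs once NONET is set on the context. */
static int toy_load_entity(const toy_ctxt *ctxt, const char *url) {
    int is_network = strncmp(url, "http://", 7) == 0 ||
                     strncmp(url, "ftp://", 6) == 0;
    return (is_network && (ctxt->options & TOY_PARSE_NONET)) ? -1 : 0;
}

/* Loads the main document first, then records the options. */
static int toy_read_file(toy_ctxt *ctxt, const char *url, int options) {
    if (toy_load_entity(ctxt, url) != 0)
        return -1;
    ctxt->options = options;
    return 0;
}

/* Workaround: reset the context options after each parse so that a
 * reused context behaves like a fresh one on the next call. */
static int toy_read_file_reset(toy_ctxt *ctxt, const char *url, int options) {
    assert(ctxt != NULL && url != NULL);
    int rc = toy_read_file(ctxt, url, options);
    ctxt->options = 0;
    return rc;
}
```

With the reset in place, repeated calls on the same context with
TOY_PARSE_NONET and an http:// URL all return 0, matching the first call.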

Stefan
