Re: [xml] new PARSER_NO_DISK_ACCESS constant



On Sun, May 08, 2005 at 12:48:36PM +0100, Nuno Lopes wrote:
Hi,

  Hi Nuno

I've filed my request to add a new constant in Bugzilla
(http://bugzilla.gnome.org/show_bug.cgi?id=303342), but Daniel Veillard
asked me to discuss my request here.

  thanks for doing that first step.

I was thinking of a PARSER_NO_DISK_ACCESS constant that would work like
PARSER_NO_NET, but disable disk access instead of network access. This means
that I could load a file, but libxml couldn't access the filesystem to fetch
DTDs, etc. It could still access the internet (if NO_NET wasn't set).

So, why do I think this constant would be useful? Well, as you know,
libxml is now used by PHP as its internal XML parser. PHP is mainly
designed for web applications, so it should be secure. My concern is
about parsing XML entered by the user. How do I know that the user
won't add something that includes local disk files? Of course, some good
regex parsing could do the trick, but as I'm not an XML expert, I'm sure I
would leave some special cases behind, thus potentially opening the file
system to the world.

I hope I made my opinion clearer :)

  yes, and as I said in the bug report, I'm fine working on libxml2 APIs
to make sure we have a framework that is as secure as possible. Note that we
already added a security framework at the libxslt level to check any document
loaded from an XSLT transformation:
   http://xmlsoft.org/XSLT/html/libxslt-security.html

  The problem I see with adding the parser option you requested in the
bugzilla entry is that it is far too narrow to be really useful and may just
end up giving a false belief of security, which is the worst thing one can do
at an API level. There are a number of places where libxml2 may fetch data:
    - DTD and, more generally, schema parsing
    - external parsed entities
    - XInclude
    - catalogs
    - XSLT document()
    - and probably a few others.
  So a solution would have to cover all of them.
The good point is that all external file/entities accesses are centralized,
and you can override the default routine:
  http://xmlsoft.org/html/libxml-parser.html#xmlSetExternalEntityLoader
(with the exception of catalog lookups which are doing direct I/O)

  I'm pretty sure PHP5 already uses that function to control all I/O accesses
made from libxml2 (and libxslt). The problem is that it's global, so if
you share libxml2 with another library, you have a problem.
  So I agree that somewhat finer control over the I/O made by the various
processing steps is needed, with independent control of each step. But it's
not just a parsing flag: the behaviour may be far more complex than allowing
file access or not (see the libxslt and xsltproc security options, where you
may allow read or write access but only to a dedicated subtree), and that
control needs to be passed down to all the various processing steps, not just
to the fetching of external parsed entities during the parsing phase.
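
  To make the "dedicated subtree" idea concrete, here is a small sketch of
the kind of check such a policy needs. It is not the libxslt implementation.
path_in_subtree is a hypothetical helper, the test is purely lexical (it does
not resolve symlinks), and it conservatively rejects any path containing
"..", which also rejects legitimate file names that contain two dots.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true when `path` is `root` itself or lies below it.
 * Both arguments are expected to be absolute, '/'-separated paths. */
bool path_in_subtree(const char *root, const char *path)
{
    assert(root != NULL && path != NULL);

    size_t n = strlen(root);
    if (n == 0)
        return false;

    /* Ignore trailing slashes on the root, but keep a lone "/" intact. */
    while (n > 1 && root[n - 1] == '/')
        n--;

    /* Conservative: refuse anything that could climb out of the subtree. */
    if (strstr(path, "..") != NULL)
        return false;

    if (strncmp(path, root, n) != 0)
        return false;

    /* Root is "/": every absolute path is below it. */
    if (root[n - 1] == '/')
        return true;

    /* The match must end on a component boundary, so that
     * "/data" does not accept "/database". */
    return path[n] == '\0' || path[n] == '/';
}
```

  A check like this would have to be applied at every place that does I/O,
with a separate decision for reads and writes, which is why a single parser
flag is not enough.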

Daniel
  

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


