Re: [xml] new PARSER_NO_DISK_ACCESS constant



Nuno Lopes wrote:

I was thinking of a PARSER_NO_DISK_ACCESS constant that would do for the filesystem what PARSER_NO_NET does for the network, that is, disable disk access. I could still load a file, but libxml couldn't access the filesystem to look up DTDs, etc. It could still access the Internet (if NO_NET wasn't set).

So, why do I think this constant would be useful? Well, as you know, libxml is now used by PHP as its internal XML parser. PHP is mainly designed for web applications, so it should be secure. My concern is parsing XML entered by the user: how do I know that the user won't add something that includes local disk files? Of course, some good regex parsing could do the trick, but as I'm not an XML expert, I'm sure I would leave some special cases behind, potentially opening the filesystem to the world.

I have been looking at a similar situation, but with some different thoughts on this matter.

So basically you are saying that you would allow it to load DTDs off the network if XML_PARSE_NONET wasn't set, which would still leave it open to the same issue. What you really want is to load the XML (local or remote) without allowing ANY external entities, which is what I have been looking at. In a case like this my thoughts were as follows (some relate directly to how it's implemented in PHP, others are more general to libxml):
   - safe mode, open_basedir and allow_url_fopen settings can be used in PHP
   - Load the XML without validating and without external subsets

Now this is where I have been investigating and hadn't yet asked on the list. The above loads the XML fine without accessing external resources, except when the internal subset contains something like an external parameter entity, e.g.:

<?xml version='1.0'?>
<!DOCTYPE root SYSTEM "notfound.dtd" [
<!ENTITY % incent SYSTEM "extern.ent">
%incent;
]>
<root />

Short of building a custom external entity loader (which is being looked at for PHP for other reasons; PHP currently uses a lower-level call for I/O handling, which may still have the same issue presented here), I have yet to find a way to prevent the external entity from being loaded. Once inside the external entity loader, whether the default or a custom one, how can it be determined whether this is the initial call (loading the document) or a subsequent call made while parsing the document? I have yet to find any indication in the parser context: both myDoc and instate are of no help, as both are empty in both cases (the initial load and the attempt to load the external entity).

Rob
