On Mon, 10/31/2011 5:48 PM, Stefan Sauer wrote:
On 09/18/2011 10:24 PM, Glen Hein wrote:

Hello,

My vote is to add a generic XML sanitizer. Presumably it would correct syntax problems, escape special characters, etc. Once the data is syntactically correct, the sanitizer could use a DTD/schema/XSLT to add missing elements or, more importantly, strip unwanted elements.

The obvious application is HTML. A web server could pass untrusted bytes into the sanitizer and get back a result that is both valid and safe. Different levels/rules would be used to achieve different results.

Of course there are existing solutions, but everything I've found so far is written in PHP, Perl, Python, Java, et al., and most are standalone command-line tools. Launching a command-line tool, particularly one that runs atop a virtual machine, is very inefficient and difficult to scale. Having the functionality inside libxml2 means daemons that already use the library could easily sanitize their output and, with relatively little overhead, protect themselves from a number of potential problems.

A secondary goal would be standardizing the DTD/schema/XSLT rules used to sanitize HTML (and other XML-formatted content). Right now every sanitizer uses a different set of rules and looks for a different collection of exploits. If someone discovers a new way to pass harmful data to clients, say by encapsulating it in a form that might be valid but that some clients parse in a "vendor-specific" way, updating the standardized rules would allow all the sanitizers to adapt without changing code...

Just my .02.
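For illustration, the two core operations in the proposal (escaping special characters, and stripping elements that a rule set does not allow) could look roughly like the minimal C sketch below. It is standalone code, not part of libxml2: escape_markup(), element_allowed() and the fixed element list are hypothetical stand-ins for rules that would really come from a DTD/schema/XSLT. In an actual sanitizer the allowlist check would be applied while walking a tree produced by libxml2's recovering HTML parser (e.g. htmlReadMemory() with HTML_PARSE_RECOVER), not to raw strings.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: escape the five characters that are special in
 * XML/HTML text and attribute values, so untrusted bytes cannot open a tag,
 * start an entity reference, or close a quoted attribute.
 * Returns a newly allocated string the caller must free(), or NULL on
 * allocation failure.
 */
char *escape_markup(const char *in)
{
    size_t len = 0;
    const char *p;

    /* First pass: compute the length of the escaped output. */
    for (p = in; *p; p++) {
        switch (*p) {
        case '&':  len += 5; break; /* &amp;  */
        case '<':  len += 4; break; /* &lt;   */
        case '>':  len += 4; break; /* &gt;   */
        case '"':  len += 6; break; /* &quot; */
        case '\'': len += 5; break; /* &#39;  */
        default:   len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting a character reference where needed. */
    char *q = out;
    for (p = in; *p; p++) {
        const char *rep = NULL;
        switch (*p) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}

/*
 * Hypothetical allowlist of element names a rule set might keep; anything
 * not listed would be stripped. Names are assumed to be lowercase already.
 */
static const char *const allowed_elements[] = {
    "p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br", NULL
};

/* Returns 1 if the element name is on the allowlist, 0 otherwise. */
int element_allowed(const char *name)
{
    for (size_t i = 0; allowed_elements[i] != NULL; i++) {
        if (strcmp(name, allowed_elements[i]) == 0)
            return 1;
    }
    return 0;
}
```

Escaping alone turns untrusted text into inert character data; the allowlist is what would let a rule set keep harmless markup such as <b> while dropping <script>. The comparison here is exact and case-sensitive, which is only safe if element names have been normalized first.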