RE: [libxml++] Bug in escaping of characters



 

> From: Murray Cumming [mailto:murrayc murrayc com] 

> > The text. Note that the 5 predefined entities (&, ", <, 
> &qt, TODO: What's the 5th one?) are always resolved, so this 
> content will show their human-readable equivalents.
> 
> This is talking about resolving, which is unescaping, to get 
> the original text, such as " from &quot;.

Sure, but in the part that talks about escaping, the doc refers to this unescaping:

content 	The text. This must be unescaped, meaning that the predefined entities will be created for you. See get_content().



> > But after a test, it appears that :
> > - the second character (") is not escaped
> 
> Here you are talking about escaping, to get the encoded text, 
> such as &quot; from ". That's the opposite direction.

Sorry to have confused you. Since the process is symmetrical, and specified as symmetrical, I used both directions interchangeably in my mail.


 
> A small compilable test case might help us.


Here is the program:

#include "libxml++/libxml++.h"
#include <iostream>

int main()
{
	{
		// Escaping direction: write text containing the five special characters.
		xmlpp::Document doc;
		xmlpp::Element* root = doc.create_root_node("root");
		root->add_child_text("'\"<>&");
		doc.write_to_file("toto.xml");
	}
	{
		// Unescaping direction: read back a file where all five are escaped.
		xmlpp::DomParser parser("titi.xml");
		xmlpp::Element *root = parser.get_document()->get_root_node();
		std::cout << root->get_child_text()->get_content() << std::endl;
	}
}

The file titi.xml contains the following content:
<?xml version="1.0"?>
<root>&apos;&quot;&lt;&gt;&amp;</root>



After execution, the output on screen is correct, meaning get_content() does all the required unescaping. However, toto.xml contains the following data:
<?xml version="1.0"?>
<root>'"&lt;&gt;&amp;</root>

Here we can see that neither ' nor " is escaped, contrary to what the doc describes.
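
For comparison, here is a minimal standalone sketch of what escaping all five predefined entities would look like. It is not libxml++ code and says nothing about how libxml++ or libxml2 is implemented; the helper name escape_predefined is hypothetical and only illustrates the output the doc wording seems to promise.

```cpp
#include <string>

// Hypothetical helper, NOT part of libxml++: replaces each of the five
// predefined XML characters with its entity reference.
std::string escape_predefined(const std::string& text)
{
	std::string out;
	out.reserve(text.size());
	for (char c : text)
	{
		switch (c)
		{
			case '&':  out += "&amp;";  break;
			case '<':  out += "&lt;";   break;
			case '>':  out += "&gt;";   break;
			case '"':  out += "&quot;"; break;
			case '\'': out += "&apos;"; break;
			default:   out += c;        break;
		}
	}
	return out;
}
```

Applied to the string '"<>& used in the test program, this would give the same content as titi.xml. Note that XML 1.0 only requires & and < to be escaped in character data, so toto.xml is still well-formed; the mismatch is with the wording of the doc.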

Best regards,

-- 
Loïc





