
Re: [xml] last version libxml2 2.4.23 problem in international "href" and "src" escaping



On Mon, Jul 29, 2002 at 05:35:17PM +0300, romis wrote:
> Sorry for my bad english.
> 
> I'm using libxml in an Apache module at work. After this change:
> 
> htmlAttrDump(xmlBufferPtr buf, xmlDocPtr doc, xmlAttrPtr cur) {
> ....................................
> ....................................
> if ((xmlStrEqual(cur->name, BAD_CAST "href")) ||
>     (xmlStrEqual(cur->name, BAD_CAST "src"))) {
>     xmlChar *escaped;
>     xmlChar *tmp = value;
> 
>     while (IS_BLANK(*tmp)) tmp++;
> 
>     escaped = xmlURIEscapeStr(tmp, BAD_CAST"@/:=?;#%&");
> ___________^^^^^^^^^^^^^^^^^^^^^^^^^^^________________
>     if (escaped != NULL) {
>         xmlBufferWriteQuotedString(buf, escaped);
>         xmlFree(escaped);
>     } else {
>         xmlBufferWriteQuotedString(buf, value);
>     }
> ....................................
> ....................................
> 
> After the call to xmlURIEscapeStr, strings that were converted from koi8-r
> to UTF-8 through libiconv are dumped as raw UTF-8 sequences, without being
> decoded back.
> 
> For example:
> a string transmitted as %EC%E0%EA%E0%F0%EE%ED (7 bytes in koi8-r)
> becomes %D0%BC%D0%B0%D0%BA%D0%B0%D1%80%D0%BE%D0%BD (14 bytes in UTF-8)
> 
> I don't know on which side this should be changed. Previous versions didn't
> do anything with "href" and "src"; the current behaviour may be more correct.

  I'm not sure I understand the change you want to make.
My understanding of URI escaping is precisely that the string
must first be transcoded to UTF-8 before being escaped with %xx codes.
It's a principle of the Web to keep URLs as context-independent as
possible, and UTF-8 was selected for this. The best reference I can offer
for this is:
  http://www.w3.org/TR/xptr/#uri-escaping
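
  To make the difference concrete, here is a minimal sketch in plain C. It
is not libxml2 code: percent_escape() is a hypothetical helper written only
for this illustration, and unlike the xmlURIEscapeStr call quoted above it
takes no list of characters to leave unescaped. The two byte arrays are the
sequences from the report. The 14-byte one is the UTF-8 encoding of the
7-letter Cyrillic word; the 7-byte one is the single-byte form the reporter
expected (those byte values appear to match windows-1251 rather than
koi8-r, but that is only an observation about the bytes).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of libxml2: percent-escape every byte that
 * is not an ASCII letter, digit, or one of "-_.~". The output is truncated
 * (but always NUL-terminated) if the buffer is too small. */
static void percent_escape(const unsigned char *in, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (outsz == 0)
        return;

    /* Stop while there is still room for one "%XX" triplet plus the NUL. */
    for (; *in != 0 && o + 4 <= outsz; in++) {
        unsigned char c = *in;

        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || strchr("-_.~", c) != NULL) {
            out[o++] = (char) c;          /* unreserved: copy as-is */
        } else {
            out[o++] = '%';               /* everything else: %XX */
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    out[o] = '\0';
}

/* The word from the report as UTF-8 (14 bytes)... */
static const unsigned char word_utf8[] =
    "\xD0\xBC\xD0\xB0\xD0\xBA\xD0\xB0\xD1\x80\xD0\xBE\xD0\xBD";

/* ...and as the 7-byte single-byte sequence the reporter expected. */
static const unsigned char word_legacy[] =
    "\xEC\xE0\xEA\xE0\xF0\xEE\xED";
```

  Calling percent_escape(word_utf8, buf, sizeof buf) with a 64-byte buffer
yields the 14-triplet string from the report, and the same call on
word_legacy yields the 7-triplet one. The escaping step itself is
byte-for-byte; which output you get depends only on the encoding of the
bytes handed to it.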

  Libxml2 does convert strings to UTF-8 internally, so that is taken care
of.  Is the libxml2 HTML serializer broken with respect to %xx escaping? Are
you suggesting a fix to make sure the string is properly %xx escaped where
this was missing?

  Just to be sure I understand the problem and the suggested fix,

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


