Re: PATCH: produce xhtml output and get rid of evilsedhack



Curtis Hovey wrote:

>> Some downsides include:
>>
>>   1. if no <meta http-equiv="content-type"> tag is in the source HTML,
>>      libxml will assume it is latin-1.
>
> The content should be UTF-8 (or at least it was when I made some mass
> updates last year).  Do I need to add the meta tag?  I believe point 2
> from above (and the fact that latin1 is a UTF-8 subset) means that nothing
> needs to be done, save for recent content that might be localized
> improperly.

Latin-1 is not a UTF-8 subset. Anything above codepoint 127 is represented differently, which is the problem. So if a page without the meta tag contains 8-bit characters and it is in UTF-8, it would get mangled (e.g. "®" would come out as "Â®").
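To make the mangling concrete, here is a small Python illustration. It uses Python's own codecs as a stand-in for libxml's Latin-1 fallback; it is not libxml code. The helper name `misread_as_latin1` is made up for the example. It encodes "®" (U+00AE) as UTF-8, then decodes those bytes as if they were Latin-1, which is what happens when a UTF-8 page is read under a Latin-1 assumption.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate reading UTF-8 bytes under a Latin-1 assumption."""
    # UTF-8 encodes code points 128 and above as two or more bytes;
    # Latin-1 then turns each of those bytes into a separate character.
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print("®".encode("utf-8"))    # b'\xc2\xae'  (two bytes in UTF-8)
    print("®".encode("latin-1"))  # b'\xae'      (one byte in Latin-1)
    print(misread_as_latin1("®"))            # Â®
    print(misread_as_latin1("plain ASCII"))  # plain ASCII (unchanged)
```

Only the ASCII range survives the round trip unchanged, which is why pages that stick to 7-bit characters are unaffected even without the meta tag.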

Perhaps the right solution is to just add the meta tag to every .wml file ...
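Adding the tag by hand to every page would be tedious, so a small script could do it. The Python sketch below is only an illustration under assumptions that this thread does not state: it supposes each .wml page contains a literal `<head>` tag and that the pages really are UTF-8 (as Curtis says they should be). The function name `add_charset_meta` is hypothetical. Pages that already declare a content type, or that have no `<head>`, are returned unchanged.

```python
import re

# The tag under discussion; charset=utf-8 assumes the pages really are UTF-8.
META_TAG = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'

# Matches an existing <meta http-equiv="content-type" ...> declaration,
# case-insensitively and with either quote style.
_CONTENT_TYPE_META = re.compile(
    r'<meta\s[^>]*http-equiv\s*=\s*["\']content-type["\']', re.IGNORECASE)

# Matches an opening <head> tag (but not <header>) with optional attributes.
_HEAD_OPEN = re.compile(r'<head\b[^>]*>', re.IGNORECASE)


def add_charset_meta(source: str) -> str:
    """Insert META_TAG right after <head> unless a content-type meta exists.

    Hypothetical helper: assumes the page source has a literal <head> tag;
    pages without one are returned unchanged.
    """
    if _CONTENT_TYPE_META.search(source):
        return source  # already declares an encoding; leave it alone
    match = _HEAD_OPEN.search(source)
    if match is None:
        return source  # no <head> to attach the tag to
    return source[:match.end()] + "\n" + META_TAG + source[match.end():]


if __name__ == "__main__":
    page = "<html><head><title>Press release</title></head><body></body></html>"
    print(add_charset_meta(page))
```

If the head of each page is generated by a shared template rather than written in the .wml source, the tag would more sensibly go in that template instead.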

>>   2. makes the website build depend on libxml2/libxslt
>
> I don't believe this to be a problem.  Will older releases from the past
> year or two suffice?

I don't think I'm using anything new or uncommon. The EXSLT date functions have been in libxslt for a long time, and everything else is standard XSLT.

>>   3. is a little more resource intensive than sed (not enormously though).
>>
>> (1) was an obvious problem for the translated versions of the Gnome 2.10 press release, which I've corrected in the attached patch. I haven't checked to see if other files are affected.
>>
>> While this isn't adding much new right now, it should also make it easier to customise things in the future (it should also result in fewer of those "optimised for standards" links spitting out lots of errors ...)
>
> We use evilsedhack in foundation too.  Do you anticipate portability
> problems if we upgrade all evilsedhacks?

Nope. It would be worth going over any of the warnings printed by xsltproc though -- these usually indicate invalid HTML, which can produce unexpected output if it is particularly bad.

James.


