[xml] Looking for prior art on strange HTML charset conversion



Dear All,

I'm exploring how much effort it would take to write
a markup-aware charset converter for conversions that
require reordering of characters.

In particular, I'm asking whether anybody has written
or seen something like this under any open source license.

I plan to use libxml2, of course, but I'm ready to port existing code.

To give a simple example of what this is about:

Consider a charset conversion that needs
to convert "ab" to "BA", where "a" is the
equivalent of "A" and "b" is the equivalent
of "B". This is moderately easy in plain text.

Now with some HTML mixed in:

a<em>b</em>
should convert to
<em>B</em>A

and 
xa<em>by</em>
should convert to
X<em>B</em>A<em>Y</em>

So while being repositioned, the characters should
drag their markup with them.
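
To make the idea concrete, here is a minimal Python sketch of one possible
approach, not a finished design: flatten the markup into (character, open tags)
pairs, reorder the pairs as if they were plain text, then re-emit tags around
each run of characters that share the same markup. The function names, the toy
tokenizer (simple tags without attributes only) and the "swap a+b and
uppercase" rule are all made up for illustration; a real converter would take
the flattened view from a proper parser such as libxml2.

```python
import re
from itertools import groupby

# Toy tokenizer: matches <tag>, </tag>, or a single text character.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<])")

def flatten(markup):
    """Return a list of (char, tags), tags being the tuple of open elements."""
    stack, pairs = [], []
    for close, name, ch in TOKEN.findall(markup):
        if ch:
            pairs.append((ch, tuple(stack)))
        elif close:
            assert stack and stack[-1] == name, "mismatched close tag"
            stack.pop()
        else:
            stack.append(name)
    return pairs

def reorder(pairs):
    """Stand-in for the real conversion: swap "a" followed by "b", uppercase all."""
    out, i = [], 0
    while i < len(pairs):
        if i + 1 < len(pairs) and pairs[i][0] == "a" and pairs[i + 1][0] == "b":
            out.extend([pairs[i + 1], pairs[i]])  # each char keeps its own tags
            i += 2
        else:
            out.append(pairs[i])
            i += 1
    return [(ch.upper(), tags) for ch, tags in out]

def serialize(pairs):
    """Wrap each run of characters with identical markup in its tags."""
    parts = []
    for tags, run in groupby(pairs, key=lambda p: p[1]):
        text = "".join(ch for ch, _ in run)
        opening = "".join(f"<{t}>" for t in tags)
        closing = "".join(f"</{t}>" for t in reversed(tags))
        parts.append(opening + text + closing)
    return "".join(parts)

def convert(markup):
    return serialize(reorder(flatten(markup)))

print(convert("a<em>b</em>"))    # <em>B</em>A
print(convert("xa<em>by</em>"))  # X<em>B</em>A<em>Y</em>
```

This version closes and reopens every tag at each run boundary, so nested
markup comes out valid but not minimal; merging tags shared by neighbouring
runs would be a later refinement.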

Best Regards,
Peter Jacobi



