[gmime-devel] Unescaped Unicode



Hi Jeff,

Thanks for the prompt fixes! I have another question about FilterHTML, but this one isn't a bug, I swear. As I understand it, FilterHTML either escapes non-ASCII Unicode characters as &#uuuu; or converts them to question marks. Is there some way to let them stay encoded as UTF-8, or even to let all bytes through without checking for Unicode validity? If not, should there be?

I ask because I'm trying to use FilterHTML to take a plain text email from the wire and convert it to HTML for display. There's no need to get the output into a 7-bit encoding, so the escaping doesn't really help. It actually gets in the way a bit: I'm also trying to put <blockquote> tags around the quoted sections. I do this by marking the quoted sections with a flag while unwrapping the flowed text, before sending it through FilterHTML, and then adding the HTML tags afterwards based on the flags. My two top choices for the flag would be a private Unicode control character or a byte that is invalid in Unicode. The second is completely ruled out by the validity checking; the first is still feasible, but is less clear with the escaping going on. (FWIW, I'm using 0x7f, the DEL character, as the flag right now. I'm not expecting any emails to have this in them, but still....)
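
To make the flag-then-wrap idea concrete, here is a minimal sketch of it in Python rather than C. It does not use GMime at all: `html.escape` is only a stand-in for FilterHTML, and the helper names (`mark_quoted`, `wrap_quotes`, `text_to_html`) are hypothetical. It also assumes a single level of ">" quoting and skips the format=flowed unwrapping.

```python
import html

FLAG = "\x7f"  # DEL, used as the quoted-line marker (an assumption of this sketch)


def mark_quoted(lines):
    """Strip one level of '>' quoting and prefix those lines with FLAG."""
    marked = []
    for line in lines:
        if line.startswith(">"):
            body = line[1:]
            if body.startswith(" "):
                body = body[1:]  # drop the optional space after '>'
            marked.append(FLAG + body)
        else:
            marked.append(line)
    return marked


def escape_line(line):
    """Stand-in for the HTML filter: escapes &, <, > and leaves other characters alone."""
    return html.escape(line, quote=False)


def wrap_quotes(lines):
    """Replace runs of FLAG-prefixed lines with <blockquote> ... </blockquote>."""
    out = []
    in_quote = False
    for line in lines:
        flagged = line.startswith(FLAG)
        if flagged and not in_quote:
            out.append("<blockquote>")
            in_quote = True
        elif not flagged and in_quote:
            out.append("</blockquote>")
            in_quote = False
        out.append(line[1:] if flagged else line)
    if in_quote:
        out.append("</blockquote>")  # close a quote that runs to the end
    return out


def text_to_html(text):
    """Mark quoted lines, escape every line, then add the blockquote tags."""
    marked = mark_quoted(text.split("\n"))
    escaped = [escape_line(line) for line in marked]
    return "\n".join(wrap_quotes(escaped))
```

The sketch only works because the stand-in escaper passes the flag character through untouched, which is exactly the property in question for FilterHTML.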

Of course, if I'm going about this in a completely wrong way, please say so.

Thanks,
Robert

