Re: [gmime-devel] Unecscaped Unicode
- From: Jeffrey Stedfast <fejj gnome org>
- To: Robert Schroll <rschroll gmail com>
- Cc: gmime-devel-list gnome org
- Subject: Re: [gmime-devel] Unecscaped Unicode
- Date: Fri, 22 Feb 2013 10:08:39 -0500
On 2/21/2013 8:19 AM, Robert Schroll wrote:
On 02/21/2013 12:24 AM, Jeffrey Stedfast wrote:
Maybe it would make sense to have gmime's html filter optionally add
blockquotes instead of doing things the way you're doing it?
Perhaps, but we're trying to make sure this works well with
format=flowed emails, so this isn't as simple as changing the <font
color=""> tags to <blockquote>s.
Ah, okay.
Can you give me an example of the input and what you want as the output?
Do you want to nest the <blockquote>s according to the line's citation
depth?
Here's a simple example. I'll use ~ to represent space so you can
better see what's going on.
>~This~is~a~line~of~flowed~text~
>~that~has~been~wrapped.
>~But~this~is~a~new~line.
~>~This~looks~like~a~quote,~but~
it~isn't.
should become:
<blockquote>This is a line of flowed text that has been wrapped.
But this is a new line.</blockquote>
&gt; This looks like a quote, but it isn't.
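For illustration, here is a minimal C sketch of the unwrapping step this example implies under RFC 3676 (format=flowed, DelSp=no): count leading '>' characters to get the quote depth, drop one space-stuffing space, and join a line that ends in a space with the next line at the same depth. It is not Geary's FilterFlowed and not a GMime API. The function name flowed_decode, its marker parameter (one marker byte written per quote level), '\n' line endings, and the lack of DelSp=yes support are all assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of RFC 3676 decoding (format=flowed, DelSp=no).
 * Writes one logical line per '\n' to `out`, prefixed by one `marker`
 * byte per quote level. `out` must hold at least strlen(in) + 2 bytes.
 */
void flowed_decode(const char *in, char marker, char *out)
{
    bool in_paragraph = false; /* previous line ended with a soft break */
    int para_depth = 0;        /* quote depth of the paragraph being built */

    assert(in != NULL && out != NULL);

    while (*in) {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        const char *end = in + len;
        const char *p = in;
        int depth = 0;

        /* Leading '>' characters give the quote depth. */
        while (p < end && *p == '>') {
            depth++;
            p++;
        }
        /* Drop one space-stuffing space (also the one after the quote marks). */
        if (p < end && *p == ' ')
            p++;

        size_t text_len = (size_t)(end - p);
        /* A trailing space means a soft break, except on the "-- " signature line. */
        bool is_sig = text_len == 3 && memcmp(p, "-- ", 3) == 0;
        bool flowed = text_len > 0 && p[text_len - 1] == ' ' && !is_sig;

        /* A change in quote depth ends the paragraph even after a soft break. */
        if (in_paragraph && depth != para_depth) {
            *out++ = '\n';
            in_paragraph = false;
        }
        /* Markers are written only at the start of a logical line. */
        if (!in_paragraph) {
            for (int i = 0; i < depth; i++)
                *out++ = marker;
            para_depth = depth;
        }

        /* DelSp=no: the trailing space is kept as the word separator. */
        memcpy(out, p, text_len);
        out += text_len;

        if (flowed) {
            in_paragraph = true;
        } else {
            *out++ = '\n';
            in_paragraph = false;
        }

        in = eol ? eol + 1 : end;
    }
    if (in_paragraph)
        *out++ = '\n';
    *out = '\0';
}
```

Run on the example input with '|' as the marker, this produces three logical lines: "|This is a line of flowed text that has been wrapped.", "|But this is a new line." and "> This looks like a quote, but it isn't.". The space-stuffed last line keeps its literal '>' because the stuffing space, not a quote marker, comes first.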
Not sure how big the can of worms is that my big mouth is offering to
implement, but if it's not too difficult maybe I can add that feature
;-)
You can take a look at our implementation-in-progress, which uses a
filter on either side of the FilterHTML. Prior to the FilterHTML,
FilterFlowed
(https://github.com/rschroll/geary/blob/plaintext/src/engine/rfc822/rfc822-gmime-filter-flowed.vala)
undoes line wrapping and space stuffing, and converts quote symbols to
0x7f. After FilterHTML, FilterBlockquote
(https://github.com/rschroll/geary/blob/plaintext/src/engine/rfc822/rfc822-gmime-filter-blockquotes.vala)
removes the 0x7f flags and inserts <blockquote>s appropriately. This
works fine, unless the email has lines starting with 0x7f. (Which it
won't, so why am I worrying?)
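The second pass can be sketched in the same spirit. The hypothetical insert_blockquotes function below (not the actual rfc822-gmime-filter-blockquotes.vala code) counts leading marker bytes on each line, closes or opens <blockquote> elements until the nesting matches, and copies the rest of the line. It assumes the marker bytes came through the HTML filter unchanged and that the output buffer is large enough; there is no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append the NUL-terminated string s at out; return the new end. */
static char *emit(char *out, const char *s)
{
    size_t n = strlen(s);
    memcpy(out, s, n);
    return out + n;
}

/*
 * Hypothetical post-HTML pass: leading `marker` bytes on each line give
 * its quote depth; <blockquote> elements are opened or closed to match
 * and the markers are stripped. `out` must be big enough (not checked).
 */
void insert_blockquotes(const char *in, char marker, char *out)
{
    size_t open = 0;     /* <blockquote> elements currently open */
    int pending_nl = 0;  /* previous line's '\n', not yet written */

    assert(in != NULL && out != NULL);

    while (*in) {
        const char *eol = strchr(in, '\n');
        size_t len = eol ? (size_t)(eol - in) : strlen(in);
        size_t depth = 0;

        while (depth < len && in[depth] == marker)
            depth++;

        /* Close first, so "</blockquote>" ends the last quoted line. */
        for (; open > depth; open--)
            out = emit(out, "</blockquote>");
        if (pending_nl)
            *out++ = '\n';
        for (; open < depth; open++)
            out = emit(out, "<blockquote>");

        memcpy(out, in + depth, len - depth);
        out += len - depth;

        pending_nl = eol != NULL;
        in = eol ? eol + 1 : in + len;
    }
    for (; open > 0; open--)
        out = emit(out, "</blockquote>");
    if (pending_nl)
        *out++ = '\n';
    *out = '\0';
}
```

Deferring each newline until the next line's depth is known is what lets "</blockquote>" sit right after "But this is a new line.", matching the desired output earlier in the thread.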
Wow, I'm so happy that someone is actually implementing their own filters!
As you can see, the latter is relatively simple, but the former is
non-trivial. Either we'd need all of that in FilterHTML, or
FilterHTML would need a flag to indicate quote levels with something
other than ">". Actually, the second solution might be feasible. FilterHTML could
gain an optional "quote_marker" flag, defaulting to ">", so it would
work automatically with unprocessed text. But people like me could set
it to be something else and do our own preprocessing.
Does that make sense?
Hmm, I think I understand what you're doing better now but I'm not sure
I want to have a flag to define the quote char. That doesn't seem like
the right solution.
Could you, instead of using a magic char, use a magic string? If you
generate a randomish string, you could be fairly confident that it
wouldn't be in the message text. Then you could just look for that same
string in the blockquote filter.
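A minimal sketch of the magic-string idea, using only the C standard library. make_marker and marker_depth are hypothetical helpers, and MARKER_LEN is an arbitrary choice. The generator uses rand() for a "randomish" string and then checks with strstr() that it really is absent from the message; the blockquote pass can then read the quote depth by counting back-to-back copies at the start of a line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MARKER_LEN 16

/*
 * Fill `marker` (MARKER_LEN chars + NUL) with random letters, retrying
 * until the string does not occur anywhere in `text`. rand() is only
 * "randomish"; the strstr() check is what guarantees absence.
 */
void make_marker(const char *text, char marker[MARKER_LEN + 1])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    assert(text != NULL);
    do {
        for (int i = 0; i < MARKER_LEN; i++)
            marker[i] = alphabet[rand() % (int)(sizeof alphabet - 1)];
        marker[MARKER_LEN] = '\0';
    } while (strstr(text, marker) != NULL);
}

/* Quote depth = number of back-to-back copies of `marker` at the start of `line`. */
int marker_depth(const char *line, const char *marker)
{
    size_t n = strlen(marker);
    int depth = 0;

    while (n > 0 && strncmp(line, marker, n) == 0) {
        line += n;
        depth++;
    }
    return depth;
}
```

Because the marker never occurs in the original text, a run of copies at the start of a line cannot be extended by the message's own characters, so the count is exact as long as the HTML filter leaves ASCII letters untouched.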
I could also add a flag like GMIME_FILTER_HTML_ALLOW_UTF8, which would
pass UTF-8 through without encoding it as entities:
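    /* ASCII from 0x20 up to 0x7f is copied through as-is. */
    if (u >= 0x20 && u < 0x80) {
        *outptr++ = (char) (u & 0xff);
    } else if (html->flags & GMIME_FILTER_HTML_ALLOW_UTF8) {
        /* Proposed flag: write the character as raw UTF-8. */
        outptr += g_unichar_to_utf8 (u, outptr);
    } else if (html->flags & GMIME_FILTER_HTML_ESCAPE_8BIT) {
        /* Replace the character with a '?' placeholder. */
        *outptr++ = '?';
    } else {
        /* Default: emit a numeric character reference, e.g. &#233; */
        outptr += sprintf (outptr, "&#%u;", u);
    }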
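To see what each branch would produce outside GMime, here is a standalone sketch of the same decision order. ESCAPE_8BIT and ALLOW_UTF8 are hypothetical local flags standing in for the GMime ones, and utf8_encode is a hand-written replacement for GLib's g_unichar_to_utf8 so the example builds without GLib. It does not escape '<' or '&'; in the real filter those are presumably handled before this branch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the GMime filter flags. */
enum {
    ESCAPE_8BIT = 1 << 0,
    ALLOW_UTF8  = 1 << 1
};

/* Encode code point u (<= 0x10FFFF) as UTF-8; returns the bytes written. */
static int utf8_encode(unsigned u, char *out)
{
    if (u < 0x80) {
        out[0] = (char)u;
        return 1;
    } else if (u < 0x800) {
        out[0] = (char)(0xC0 | (u >> 6));
        out[1] = (char)(0x80 | (u & 0x3F));
        return 2;
    } else if (u < 0x10000) {
        out[0] = (char)(0xE0 | (u >> 12));
        out[1] = (char)(0x80 | ((u >> 6) & 0x3F));
        out[2] = (char)(0x80 | (u & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (u >> 18));
    out[1] = (char)(0x80 | ((u >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((u >> 6) & 0x3F));
    out[3] = (char)(0x80 | (u & 0x3F));
    return 4;
}

/* Same branch order as the snippet: ASCII, raw UTF-8, '?', or an entity. */
char *emit_char(unsigned u, int flags, char *outptr)
{
    if (u >= 0x20 && u < 0x80)
        *outptr++ = (char)(u & 0xff);
    else if (flags & ALLOW_UTF8)
        outptr += utf8_encode(u, outptr);
    else if (flags & ESCAPE_8BIT)
        *outptr++ = '?';
    else
        outptr += sprintf(outptr, "&#%u;", u);
    return outptr;
}
```

For U+00E9 this gives the bytes C3 A9 with ALLOW_UTF8, "?" with ESCAPE_8BIT, and "&#233;" with neither flag. Because ALLOW_UTF8 is tested first, it wins when both flags are set, just as in the snippet.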
Jeff