[gmime-devel] Email parsing functions flawed and not practical to use



Hi,

I'm having trouble with email parsing in gmime. Generally speaking, the
parser copes with real-world mail worse than popular email clients (like
Thunderbird) do. I've identified a few problems that I'm trying to fix:

1. Email address processing is broken.

The worst thing for me is that the current design makes it hard to
modify the parser to accept the malformed addresses that are very common
in practice, because gmime assumes that an email address is well formed
and relies on general word/phrase decoding functions.

- The clearest bug is that gmime decodes an email address and then
  unquotes the result. During the decode phase the information about
  where each quotation mark came from (a literal quote in the header, or
  text produced by decoding an encoded word) is lost, so an address like:

  =?UTF-8?b?RGFtaWFuICJEYXBlciIgUGlldHJhcwo=?= <damian mud>

  Decoded, this is:

  Damian "Daper" Pietras <damian mud>

  But gmime parses it as:

  Damian Daper Pietras <damian mud>

- I also want to parse some broken addresses that gmime cannot handle
  but that other MIME implementations, such as Thunderbird, accept.

  Real-world examples:

  Email: =?UTF-8?Q?agatest123_"test"?= <aga aga>
  Malformed because: a quotation mark is not allowed in a "Q"-encoded
  word within a phrase.
  Gmime parsing: "=?UTF-8?Q?agatest123_ test ?=" <aga aga>
  Thunderbird parsing: agatest123 "test" <agatest123 o2 pl>

  Email: dot.com <dot.com>
  Malformed because: a dot is not allowed in an unquoted phrase; the
  string must be quoted or encoded.
  Gmime parsing: dot.com
  Thunderbird parsing: dot.com <dot.com>

  Email: "=?ISO-8859-2?Q?TEST?=" <p p org>
  Malformed because: an encoded word must not appear inside a quoted
  string.
  Gmime parsing: "=?ISO-8859-2?Q?TEST?=" <p p org>
  Thunderbird parsing: TEST <p p org>
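Two of the Thunderbird results above can be approximated before any decoding happens. The C sketch below is a hypothetical helper, not a gmime function (`split_mailbox` is an illustrative name): it takes the last `<...>` in the header value as the address and everything before it as the raw display name, so an unquoted dot in the name no longer matters, and it unwraps a quoted string that consists only of an encoded word so that a later decoding step sees a bare encoded word. This is an assumption about how a lenient parser could behave, not a description of Thunderbird's actual implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lenient splitter (not a gmime API).  The LAST "<...>" is
 * taken as the address and everything before it as the raw display name,
 * whatever characters it contains (dots included).  Returns 0 if there
 * is no "<...>".  name and addr must each hold strlen(in) + 1 bytes. */
static int split_mailbox(const char *in, char *name, char *addr)
{
    assert(in != NULL && name != NULL && addr != NULL);
    const char *lt = strrchr(in, '<');
    const char *gt = lt ? strchr(lt, '>') : NULL;
    if (gt == NULL)
        return 0;

    size_t alen = (size_t)(gt - lt - 1);      /* text between < and > */
    memcpy(addr, lt + 1, alen);
    addr[alen] = '\0';

    const char *start = in, *end = lt;        /* trim surrounding blanks */
    while (start < end && (*start == ' ' || *start == '\t')) start++;
    while (end > start && (end[-1] == ' ' || end[-1] == '\t')) end--;

    /* Lenient: a quoted string holding only "=?...?=" is unwrapped, so a
     * later decoding step sees a bare encoded word (RFC 2047 forbids
     * encoded words inside quoted strings, but such headers exist). */
    size_t nlen = (size_t)(end - start);
    if (nlen >= 6 && start[0] == '"' && end[-1] == '"' &&
        strncmp(start + 1, "=?", 2) == 0 && strncmp(end - 3, "?=", 2) == 0) {
        start++;
        end--;
        nlen -= 2;
    }
    memcpy(name, start, nlen);
    name[nlen] = '\0';
    return 1;
}
```

With this, `dot.com <dot.com>` splits into the name `dot.com` and the address `dot.com`, and `"=?ISO-8859-2?Q?TEST?=" <p p org>` yields the name `=?ISO-8859-2?Q?TEST?=`, ready for a decoder. Taking the last `<` rather than the first is a guess that tolerates a stray `<` inside a quoted display name; a real parser would track quoting instead.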

I think all of these problems require bigger changes to the parser than a
simple fix. I'm working on that, but anybody who has tried to parse email
knows it's hard :)
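One way to avoid the decode-then-unquote bug is to tokenize the raw phrase first: unquote quoted strings while they are still raw text, decode each encoded word as its own token, and append the decoded bytes without any further unquoting. The following is a minimal C sketch of that ordering, not gmime code; all function names are hypothetical. It handles only "B" and "Q" encoding and deliberately skips charset conversion, so it only yields correct text for UTF-8 or ASCII content. Q text is decoded leniently, so a literal `"` inside it is kept, which matches the Thunderbird result for `agatest123 "test"`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not gmime code: all names are illustrative. */
static int hexval(int c)                   /* hex digit value, or -1 */
{
    if (c >= '0' && c <= '9') return c - '0';
    c |= 0x20;                             /* fold A-F to a-f */
    return (c >= 'a' && c <= 'f') ? c - 'a' + 10 : -1;
}

/* Decode encoded-word text: enc 'B' = base64, 'Q' = RFC 2047 "Q". */
static size_t decode_text(char enc, const char *in, size_t n, char *out)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned int acc = 0;
    int bits = 0, hi, lo;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (enc == 'B') {
            const char *p = strchr(b64, in[i]);
            if (p == NULL || *p == '\0')
                continue;                  /* skip '=' padding */
            acc = (acc << 6) | (unsigned int)(p - b64);
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out[len++] = (char)((acc >> bits) & 0xFF);
            }
        } else if (in[i] == '_') {
            out[len++] = ' ';              /* Q: '_' is a space */
        } else if (in[i] == '=' && i + 2 < n &&
                   (hi = hexval((unsigned char)in[i + 1])) >= 0 &&
                   (lo = hexval((unsigned char)in[i + 2])) >= 0) {
            out[len++] = (char)(hi * 16 + lo);  /* Q: "=XX" */
            i += 2;
        } else {
            out[len++] = in[i];            /* Q: literal, even a '"' */
        }
    }
    return len;
}

/* If s starts with "=?charset?B|Q?text?=", decode it into out, set *outlen
 * and return the bytes consumed, else 0.  ASSUMPTION: charset ignored. */
static size_t decode_encoded_word(const char *s, char *out, size_t *outlen)
{
    if (strncmp(s, "=?", 2) != 0) return 0;
    const char *q = strchr(s + 2, '?');    /* end of the charset */
    if (q == NULL || q[1] == '\0' || q[2] != '?') return 0;
    char enc = (q[1] == 'b' || q[1] == 'B') ? 'B'
             : (q[1] == 'q' || q[1] == 'Q') ? 'Q' : 0;
    const char *end = strstr(q + 3, "?=");
    if (enc == 0 || end == NULL) return 0;
    *outlen = decode_text(enc, q + 3, (size_t)(end - (q + 3)), out);
    return (size_t)(end + 2 - s);
}

/* Decode a raw display name token by token: quoted strings are unquoted
 * while still raw, decoded text is appended untouched, so quotes produced
 * by an encoded word survive.  out must hold strlen(in) + 1 bytes. */
static void decode_phrase(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    char *tok = malloc(strlen(in) + 1);    /* scratch for one token */
    size_t len = 0;
    int prev_encoded = 0;
    if (tok == NULL) { out[0] = '\0'; return; }
    while (*in) {
        int had_ws = 0, encoded = 0;
        size_t toklen = 0, used;
        while (*in == ' ' || *in == '\t') { in++; had_ws = 1; }
        if (*in == '\0') break;
        if ((used = decode_encoded_word(in, tok, &toklen)) > 0) {
            in += used;                    /* encoded word */
            encoded = 1;
        } else if (*in == '"') {           /* quoted string */
            for (in++; *in && *in != '"'; in++) {
                if (*in == '\\' && in[1]) in++;  /* keep escaped char */
                tok[toklen++] = *in;
            }
            if (*in == '"') in++;
        } else {                           /* plain atom */
            while (*in && *in != ' ' && *in != '\t' && *in != '"')
                tok[toklen++] = *in++;
        }
        /* RFC 2047: whitespace between adjacent encoded words is dropped */
        if (had_ws && len > 0 && !(prev_encoded && encoded))
            out[len++] = ' ';
        memcpy(out + len, tok, toklen);
        len += toklen;
        prev_encoded = encoded;
    }
    out[len] = '\0';
    free(tok);
}
```

For the first example in this message, `decode_phrase("=?UTF-8?b?RGFtaWFuICJEYXBlciIgUGlldHJhcwo=?=", out)` yields `Damian "Daper" Pietras` with the quotes intact, followed by a newline because the encoded text itself ends in one. A plain quoted string such as `Damian "Daper" Pietras` in raw form still loses its quotes, as it should, since there they are syntax.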

Any advice? Would such patches be accepted, considering that the (IMHO)
nice gmime code would become harder to read because of the many
workarounds, and that common functions (like unquoting) would be reused
less?

2. The API for creating InternetAddress objects is flawed. If I want to
  compose an email, or just encode an email address to put it in a
  header, it's hard to do because all functions that let me set the name
  part of the address:

  - internet_address_mailbox_new
  - internet_address_set_name
  - g_mime_message_add_recipient

decode the name before setting it in the object:

buf = g_mime_utils_header_decode_phrase (name);
g_mime_utils_unquote_string (buf);

When I'm composing a message, or just creating an InternetAddress
object, it would be more practical to set the name directly, in the form
I want it displayed. Currently I must quote and encode the name before
passing it to gmime, only so that gmime can decode it again. Combined
with the bugs mentioned above, this makes a mess.
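The quoting half of that workaround can be sketched as follows. Both helpers are hypothetical stand-ins, not gmime functions; `unquote_in_place` only models an unquote step that drops bare `"` characters and resolves backslash escapes, which is an assumption about what the library's unquoting does. The sketch covers quoting only; a name with non-ASCII characters would additionally need RFC 2047 encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a gmime function): wrap a display name in
 * double quotes, backslash-escaping '"' and '\\', so that a later
 * unquote step gives back exactly the original text.
 * The caller must free() the result; returns NULL if malloc fails. */
static char *quote_name(const char *name)
{
    assert(name != NULL);
    char *out = malloc(2 * strlen(name) + 3);  /* worst case: all escaped */
    if (out == NULL)
        return NULL;
    char *p = out;
    *p++ = '"';
    for (; *name; name++) {
        if (*name == '"' || *name == '\\')
            *p++ = '\\';                       /* escape special chars */
        *p++ = *name;
    }
    *p++ = '"';
    *p = '\0';
    return out;
}

/* Hypothetical model of the unquote step applied after decoding
 * (ASSUMPTION): drop unescaped '"' and resolve backslash escapes. */
static void unquote_in_place(char *s)
{
    char *w = s;
    for (char *r = s; *r; r++) {
        if (*r == '\\' && r[1])
            *w++ = *++r;       /* keep the escaped character */
        else if (*r != '"')
            *w++ = *r;         /* ordinary character */
        /* bare quotes are dropped */
    }
    *w = '\0';
}
```

Under that model, unquoting the raw name `Damian "Daper" Pietras` gives `Damian Daper Pietras`, while quoting it first gives `"Damian \"Daper\" Pietras"`, which unquotes back to the original. This is why a setter that stored the name as given would be simpler for callers.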

-- 
Damian Pietras

http://www.linuxprogrammingblog.com