Re: [Tracker] Tracker to do list



Jamie McCracken wrote:
Laurent Aguerreche wrote:
On Thursday 07 September 2006 at 12:48 +0100, Jamie McCracken wrote:
I'm posting some to-do items in case any of you lot have some spare time and want to use it hacking on Tracker and helping to speed up development :)
...

C programming:

To pave the way for email indexing we will need mail/mbox handling utilities.

I suggest using GMime.
More info at http://spruce.sourceforge.net/gmime/ and a tutorial at http://spruce.sourceforge.net/gmime/tutorial/

We will need utility functions to:


1) Parse an entire mbox file, extracting the message ID and all other fields into a GHashTable.

2) As (1), but parse only new mails (given the file offset of the last known email). New mails are always appended to an mbox file.

3) Work out whether a mail is marked as deleted or junk (Evolution and Thunderbird use different flags in the email headers to indicate this; google for the exact flags).

4) Extract plain text (we already have an HTML filter in Tracker for HTML).

5) Extract and decode MIME attachments.

All the above should be easy to implement using GMime.
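
A minimal sketch of the offset idea in item (2), written in plain C rather than with GMime: since new mails are only ever appended, a scan can start at the last known offset and look for the "From " separator at the start of a line. The helper names mbox_next_message() and mbox_count_from() are hypothetical, the mbox is assumed to be already loaded into memory, and an unescaped "From " at the start of a body line would be miscounted, so a real implementation would leave the parsing to GMime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the byte offset of the next message
 * separator ("From " at the start of a line) at or after `start`,
 * or `len` if there is none. */
size_t mbox_next_message(const char *buf, size_t len, size_t start)
{
    size_t i = start;

    assert(buf != NULL);
    while (i + 5 <= len) {
        int at_line_start = (i == 0) || (buf[i - 1] == '\n');
        const char *nl;

        if (at_line_start && memcmp(buf + i, "From ", 5) == 0)
            return i;

        /* Not a separator: jump to the start of the next line. */
        nl = memchr(buf + i, '\n', len - i);
        if (nl == NULL)
            break;
        i = (size_t)(nl - buf) + 1;
    }
    return len;
}

/* Hypothetical helper: count the messages that begin at or after
 * `start`, e.g. the offset recorded after the previous indexing run. */
size_t mbox_count_from(const char *buf, size_t len, size_t start)
{
    size_t count = 0;
    size_t pos = mbox_next_message(buf, len, start);

    while (pos < len) {
        count++;
        pos = mbox_next_message(buf, len, pos + 1);
    }
    return count;
}
```

Storing the offset returned for the last message seen would let the next run call mbox_count_from() (or a real parser) from that point instead of re-reading the whole file.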
Hum, it seems interesting. I would like to take a look at that. :-)

great!


But before, I will continue to read and clean code.

Sure no problem

I wonder whether the use of strlen() on UTF-8 is correct; it
shouldn't be... If I remember correctly, Unicode can use arrays filled this
way:
'\0' 'H' '\0' 'E' '\0' 'L' '\0' 'L' '\0' 'O'      ("HELLO")
where a '\0' can be replaced by a value in order to store characters on 2 bytes.
But I don't remember whether that happens with UTF-8. I'll have to check what
happens with strlen() and funky characters.

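UTF-8 is not the 2-byte layout you describe (that is UTF-16/UCS-2); it is a different encoding of Unicode.

In UTF-8, an ASCII character is always 1 byte and is indistinguishable from plain text/ASCII.

A non-ASCII character is always 2-4 bytes (mostly 2 bytes, though).

Also, a multibyte sequence can never contain an ASCII byte: every byte of a multibyte UTF-8 character has its most significant bit set to 1, whereas ASCII is always less than 128 and so has an MSB of 0. That means a '\0' byte only ever appears as the string terminator, so strlen() is safe on UTF-8, but it returns the length in bytes, not in characters.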

for ref: http://en.wikipedia.org/wiki/UTF-8
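
To make the byte/character difference concrete, here is a small sketch in C. The function name utf8_char_count() is hypothetical, and it assumes the input is valid, NUL-terminated UTF-8; it simply counts every byte that is not a continuation byte (10xxxxxx). GLib's g_utf8_strlen() does the same job in real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the characters (code points) in a valid,
 * NUL-terminated UTF-8 string. Continuation bytes have the bit pattern
 * 10xxxxxx, so every other byte starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For pure ASCII such as "HELLO", strlen() and utf8_char_count() agree. For "caf\xC3\xA9" (the word "café" with a 2-byte e-acute), strlen() reports 5 bytes while utf8_char_count() reports 4 characters.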



--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



