[Evolution-hackers] extracting email to taged_file : next step



Hi all,

Thanks for your advice. My script is nearly working (even if it's not a
really nice piece of development, as you can see below).

In the end I chose Python instead of Ruby, because I have a few friends
in that domain :o) who helped me.

Well, I have a problem I didn't anticipate: encoding, of course.
My mbox file contains mails in multiple encodings (nice UTF-8 and
F0##king ISO-8859-1).

I've tried to extract the charset of each email and to encode/decode
ISO-8859-1, but without good results: I can't get a readable body or
subject.
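
For the body, one direction worth trying is to let the email package undo
the transfer encoding and then decode with the charset each part declares.
This is only a minimal sketch, assuming Python 3's standard email package;
body_text and its ISO-8859-1 fallback are hypothetical choices of mine, not
something taken from an existing tool.

```python
import email


def body_text(msg, fallback="iso-8859-1"):
    """Return the first text/plain part of msg as str (hypothetical helper)."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable/base64 and returns bytes
        raw = part.get_payload(decode=True)
        # charset declared in this part's Content-Type, else the assumed fallback
        charset = part.get_content_charset() or fallback
        try:
            return raw.decode(charset, "replace")
        except LookupError:
            # unknown charset label: fall back rather than crash
            return raw.decode(fallback, "replace")
    return ""


# Tiny stand-in for one part of the mbox extract (8bit ISO-8859-1 text)
sample = (b"Content-Type: text/plain; charset=ISO-8859-1\n"
          b"Content-Transfer-Encoding: 8bit\n"
          b"\n"
          b"organis\xe9es \xe0 Brest\n")
text = body_text(email.message_from_bytes(sample))
utf8_bytes = text.encode("utf-8")  # what would be written to a UTF-8 file
```

Once the text is a str, writing it through a file opened with
encoding="utf-8" gives the same output encoding for every message, whatever
charset the mail arrived in.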


So, my question is:

Is it possible to force a single encoding (UTF-8) for all messages in an mbox?
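
For the headers, the Subject in my extract is an RFC 2047 encoded word
(=?iso-8859-1?B?...?=), so it probably has to be decoded as a header before
any charset conversion makes sense. A minimal sketch, assuming Python 3 and
email.header.decode_header; header_to_unicode and the ISO-8859-1 fallback
are hypothetical names and choices for illustration.

```python
from email.header import decode_header


def header_to_unicode(value, fallback="iso-8859-1"):
    """Decode an RFC 2047 header value to str (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or fallback, "replace")
            except LookupError:
                # unknown charset label: use the assumed fallback
                chunk = chunk.decode(fallback, "replace")
        parts.append(chunk)
    return "".join(parts)


# Header values copied from the mbox extract in this mail
subject = header_to_unicode(
    "=?iso-8859-1?B?VGhlIG5leHQgc3RlcCBpbiBvcGVuIGlubm92YXRpb24=?=")
organization = header_to_unicode("=?ISO-8859-1?Q?Universit=E9_Rennes_2?=")
plain = header_to_unicode("[doctinfos ] Doctoriales 2008")

# The tagged line, encoded as UTF-8 whatever the source charset was
line = ("TI  " + subject + "\n").encode("utf-8")
```

Headers without encoded words come back unchanged, so the same helper can be
applied to every Subject in the mbox.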

To be more explicit, here is my Python script, followed by an extract of
2 mails from my mbox.

Thanks to all.
simon

Here is my Python script: __________________________________________



#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import mailbox
import re


def charset_of(content_type):
    # pull the charset=... value out of a raw Content-Type header ('' if absent)
    found = re.search(r'charset="?([^";\s]+)', content_type or '', re.I)
    return found.group(1) if found else ''


# errors='replace' so that an undecodable byte cannot abort the export
fout = open('mailbox_tag', 'w', encoding='utf-8', errors='replace')

mb = mailbox.mbox('test_mailbox')

for message in mb:
    AU = message.get_from()
    TI = str(message['subject'] or '')
    DT = str(message['date'] or '')
    DSt = message['To']
    DSc = message['Cc']
    nameFile = ''   # stays empty when the message has no attachment (PJ)

    # in some cases the message contains several sub-messages;
    # the attachment (PJ) is carried by the second one
    if message.is_multipart():
        listeCT = message.get_payload()

        # the name of the attachment is in the second sub-message
        if len(listeCT) > 1:
            PJ = str(listeCT[1].get('Content-Disposition'))
            found = re.search(r'filename="?([^";]+)', PJ)
            if found:
                nameFile = found.group(1)

        # encoding of the first sub-message
        encoding = charset_of(listeCT[0].get('Content-Type'))
        CT = str(listeCT[0])
    else:
        encoding = charset_of(message.get('Content-Type'))
        CT = message.get_payload()

    print(TI)
    # encoding problem:
    # here, I've tried a lot of things to encode strings!
    if encoding.upper() == 'ISO-8859-1':
        # round-trip attempt; the subject is still an RFC 2047 encoded word
        # (=?iso-8859-1?B?...?=), so this does not make TI readable
        TI = TI.encode('ISO-8859-1', 'replace').decode('UTF-8', 'replace')
        print('encoding   iso :' + TI)

    # remove the line breaks from the content
    NewCT = CT.replace('\n', '')

    # write the file
    fout.write('AU  ' + AU + '\n')
    fout.write('TI  ' + TI + '\n')
    fout.write('DT  ' + DT + '\n')
    fout.write('DS  ' + str(DSt) + '\n    ' + str(DSc))
    fout.write('\n')
    fout.write('CT  ' + NewCT + '\n')
    fout.write('PJ  ' + nameFile + '\n \n  \n')
    fout.write('\n  \n')

fout.close()
    
 _______________________________________________________________
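
For the PJ field, the script above reads the Content-Disposition header by
hand and only looks at the second sub-message. A possibly more robust
variant, sketched here assuming Python 3's email package, walks every part
and asks for its filename; attachment_names is a hypothetical helper name.

```python
import email


def attachment_names(msg):
    """Collect the filename of every part that declares one (hypothetical helper)."""
    names = []
    for part in msg.walk():
        # get_filename() reads Content-Disposition, then Content-Type name=
        name = part.get_filename()
        if name:
            names.append(name)
    return names


# Reduced stand-in for the second mail of the extract below
sample = (b'MIME-Version: 1.0\n'
          b'Content-Type: multipart/mixed; boundary="XX"\n'
          b'\n'
          b'--XX\n'
          b'Content-Type: text/plain; charset=ISO-8859-1\n'
          b'\n'
          b'Cordialement.\n'
          b'--XX\n'
          b'Content-Type: application/msword; name="fiche_inscription_Rennes 2.doc"\n'
          b'Content-Transfer-Encoding: base64\n'
          b'Content-Disposition: inline; filename="fiche_inscription_Rennes 2.doc"\n'
          b'\n'
          b'AAAA\n'
          b'--XX--\n')
names = attachment_names(email.message_from_bytes(sample))
```

This also covers mails where the attachment is not the second part, or
where there is more than one attachment.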
   

And this is my mbox file, with mails in multiple charsets:






>From editorial e mckinseyquarterly com Mon Jun 23 08:47:08 2008
Delivered-To: slbzindep gmail com
Received: by 10.114.179.3 with SMTP id b3cs90226waf; Mon, 23 Jun 2008
	08:47:08 -0700 (PDT)
Received: by 10.100.166.9 with SMTP id o9mr13424632ane.59.1214236026168;
	Mon, 23 Jun 2008 08:47:06 -0700 (PDT)
DomainKey-Status: bad
Received-SPF: softfail (google.com: domain of transitioning
	support e mckinseyquarterly com does not designate 216.33.63.41 as
	permitted sender) client-ip=216.33.63.41;
Received: by 10.34.233.50 with POP3 id f50mf901573pyh.7; Mon, 23 Jun
2008
	08:47:04 -0700 (PDT)
X-Gmail-Fetch-Info: slebayon zindep com 1 webmail.zindep.com 110
slebayon
Return-Path: <support e mckinseyquarterly com>
Delivered-To: 50-slebayon zindep com
Received: (qmail 23686 invoked from network); 23 Jun 2008 17:45:02 +0200
Received: from arm-ei41.bigfootinteractive.com (HELO
	bigfootinteractive.com) (216.33.63.41) by wpc4283.amenworld.com with
SMTP;
	23 Jun 2008 17:45:02 +0200
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=bfi;
	d=e.mckinseyquarterly.com;
	b=chJKA6TXyXDiWg3Bf9/qpDv+sbWVGY37kJOD4MPDDR7rw8/n8nEjI9SnIxTp1+dy;
Reply-To: support W0TH053CA3225565CC8182B5034D90 e mckinseyquarterly com
Bounces_to: support e mckinseyquarterly com
Message-ID:
<W0TH053CA3225565CC8182B5034D90 1487 16750 pimailer53 DumpShot 2 e mckinseyquarterly com>
X-BFI: W0TH053CA3225565CC8182B5034D90
Date: Mon, 23 Jun 2008 11:39:43 EDT
From: =?iso-8859-1?B?VGhlIE1jS2luc2V5IFF1YXJ0ZXJseQ==?=
<editorial e mckinseyquarterly com>
Subject: =?iso-8859-1?B?VGhlIG5leHQgc3RlcCBpbiBvcGVuIGlubm92YXRpb24=?= 
To: slebayon zindep com
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="ABCD-W0TH053CA3225565CC8182B5034D90-EFGH"
X-Evolution-Source: pop://slbzindep%40gmail com pop gmail com/
X-Evolution: 00000002-0010


--ABCD-W0TH053CA3225565CC8182B5034D90-EFGH
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

THE MCKINSEY QUARTERLY

*  *  *
THE NEXT STEP IN OPEN INNOVATION

The creation of knowledge, products, and services by online communities
of companies and consumers is still in its earliest stages. Who knows
where it will lead?
http://e.mckinseyquarterly.com/W0RT0195C0661565CC8182B5034D90 =


*  *  *  =

About This E-mail
You are receiving this mailing because you subscribed to the Information
=
Technology =

list on mckinseyquarterly.com. =


Manage your subscriptions:
http://e.mckinseyquarterly.com/W0RT0195C0463565CC8182B5034D90

Unsubscribe from this mailing:
http://www.mckinseyquarterly.com/unsubscribe.aspx?uid=3D63546828&subcode=3D=
E07

Access our help tool:
http://e.mckinseyquarterly.com/W0RT0195C036D565CC8182B5034D90

Read our privacy policy:
http://e.mckinseyquarterly.com/W0RT0195C046C565CC8182B5034D90

To ensure delivery, add e.mckinseyquarterly.com to your address book.

McKinsey & Company, 21 South Clark Street, Chicago, IL 60603

--ABCD-W0TH053CA3225565CC8182B5034D90-EFGH
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

<html>
<body bgcolor=3D"#ffffff">

<div style=3D"width:605px;padding-left:10px;"><img
src=3D"http://www.mcki=
nseyquarterly.com/image/mail/logo2.gif" alt=3D"The McKinsey Quarterly"
wi=
dth=3D"275" height=3D"30" border=3D"0"></div>

<div style=3D"width:605px;border-width: 0 0 1px
0;border-style:solid;bord=
er-color:grey;height:10px;margin:0px;"></div>
<Br>
<table border=3D"0" cellpadding=3D"10" cellspacing=3D"0" width=3D"605">

<!-- shade -->
<tr>
<td bgcolor=3D"#e5e9f4"
style=3D"font-size:12px;font-family:verdana,arial=
,helvetica,sans-serif;line-height:16px;" width=3D"425">
<!-- Shaded Box (Title and Deck)  -->
<a
href=3D"http://e.mckinseyquarterly.com/W0RH0195C0060565CC8182B5034D90"=
  =

style=3D"color:#0e4985;font-size:14px;font-family:verdana,arial,helvetica=
,sans-serif;"><b>The next step in open innovation</b></a><br><br>
The creation of knowledge, products, and services by online communities
o=
f companies and consumers is still in its earliest stages. Who knows
wher=
e it will lead?


<br>
</td>
<td width=3D"10"></td></table>

<div style=3D"width:605px;border-width: 0 0 1px
0;border-style:solid;bord=
er-color:grey;height:10px;margin:0px;"></div>

<table width=3D"604" cellpadding=3D"5" cellspacing=3D"0"
style=3D"margin-=
top:6px;margin-left:9px;color:#4C4949;font-family:Verdana;font-size:10px;=
">
<tr>
<td>

<h3
style=3D"font-size:12px;font-family:Verdana;font-family:Verdana;font-=
weight:normal;margin-bottom:6px;color:#4C4949;"><strong>About This
E-mail=
</strong></h3>
</td></tr>
<tr>
<td>
<p style=3D"margin-left:9px;font-size:10px;font-family:Verdana;">You are
=
receiving this mailing because you subscribed to the Information
Technolo=
gy list on <a
href=3D"http://e.mckinseyquarterly.com/W0RH0195C046F565CC81=
82B5034D90" style=3D"color:#0e4985">mckinseyquarterly.com</a>.</p>
<p
style=3D"margin-left:9px;text-align:center;font-size:10px;font-family:=
Verdana;"><a
href=3D"http://e.mckinseyquarterly.com/W0RH0195C056E565CC818=
2B5034D90" style=3D"color:#0e4985">Manage your
<strong>subscriptions</str=
ong></a>&nbsp;|&nbsp;
<a
href=3D"http://www.mckinseyquarterly.com/unsubscribe.aspx?uid=3D635468=
28&subcode=3DE07" style=3D"color:#0e4985"><strong>Unsubscribe</strong>
fr=
om this mailing</a></p>
<p
style=3D"margin-left:9px;text-align:center;font-size:10px;font-family:=
Verdana;">
<a
href=3D"http://e.mckinseyquarterly.com/W0RH0195C0268565CC8182B5034D90"=
 style=3D"color:#0e4985">Access our <strong>help</strong>
tool</a>&nbsp;|=
&nbsp;
<a
href=3D"http://e.mckinseyquarterly.com/W0RH0195C086B565CC8182B5034D90"=
 style=3D"color:#0e4985">Read <strong>privacy policy</strong></a></p>
<p
style=3D"margin-left:9px;text-align:center;font-size:10px;font-family:=
Verdana;">To ensure delivery, add e.mckinseyquarterly.com to your
address=
 book.</p>

<p
style=3D"margin-left:9px;text-align:center;font-size:10px;font-family:=
Verdana;">McKinsey &amp; Company, 21 South Clark Street, Chicago, IL
6060=
3</p></td></tr>
</table>
<IMG
SRC=3D"http://e.mckinseyquarterly.com/W0GH053CA3025565CC8182B5034D90=
"></body></html>
--ABCD-W0TH053CA3225565CC8182B5034D90-EFGH--

>From joelle bisson uhb fr Thu Jun 19 03:18:36 2008
Delivered-To: slbzindep gmail com
Received: by 10.114.190.1 with SMTP id n1cs33509waf; Thu, 19 Jun 2008
	03:18:36 -0700 (PDT)
Received: by 10.115.88.1 with SMTP id q1mr2230983wal.122.1213870716553;
	Thu, 19 Jun 2008 03:18:36 -0700 (PDT)
Received-SPF: softfail (google.com: domain of transitioning
	doctinfos-owner listes uhb fr does not designate 193.52.64.50 as
permitted
	sender) client-ip=193.52.64.50;
Received: by 10.114.237.36 with POP3 id k36mf220413wah.21; Thu, 19 Jun
2008
	03:18:34 -0700 (PDT)
X-Gmail-Fetch-Info: slebayon zindep com 1 webmail.zindep.com 110
slebayon
Return-Path: <doctinfos-owner listes uhb fr>
Delivered-To: 50-slebayon zindep com
Received: (qmail 11120 invoked from network); 19 Jun 2008 12:15:43 +0200
Received: from uhbhic.uhb.fr (193.52.64.50) by wpc4283.amenworld.com
with
	SMTP; 19 Jun 2008 12:15:43 +0200
Received: from uhbhic.uhb.fr (localhost [127.0.0.1]) by uhbhic.uhb.fr
	(8.14.1/8.14.1) with ESMTP id m5JAFSaB015014 for <slebayon zindep com>;
	Thu, 19 Jun 2008 12:15:41 +0200
Received: from v-listes-1.v-gen.uhb.fr (v-listes-1.v-gen.uhb.fr
	[172.30.2.31]) by uhbhic.uhb.fr (8.14.1/8.14.1) with ESMTP id
	m5JAFJK6030669; Thu, 19 Jun 2008 12:15:20 +0200
Received: from v-listes-1.v-gen.uhb.fr (localhost [127.0.0.1]) by
	v-listes-1.v-gen.uhb.fr (8.13.8/8.13.8) with ESMTP id m5JAFJuG021627;
Thu,
	19 Jun 2008 12:15:19 +0200
Received: (from sympa localhost) by v-listes-1.v-gen.uhb.fr
	(8.13.8/8.13.1/Submit) id m5JAFIUo021617; Thu, 19 Jun 2008 12:15:18
+0200
X-Sympa-To: doctinfos listes uhb fr
Received: from Uhb.Fr (calypso.uhb.fr [193.52.64.62]) by
	v-listes-1.v-gen.uhb.fr (8.13.8/8.13.8) with ESMTP id m5JA2JoU020532
for
	<doctinfos listes uhb fr>; Thu, 19 Jun 2008 12:02:19 +0200
Received: from uhbhic.uhb.fr (uhbhic.uhb.fr [193.52.64.50]) by Uhb.Fr
	(8.13.8/8.13.8) with ESMTP id m5JA2J1e011730 for <doctinfos uhb fr>;
Thu,
	19 Jun 2008 12:02:19 +0200
Received: from uhbhic.uhb.fr (localhost [127.0.0.1]) by uhbhic.uhb.fr
	(8.14.1/8.14.1) with ESMTP id m5JA2IBm010989 for <doctinfos uhb fr>;
Thu,
	19 Jun 2008 12:02:18 +0200
Received: from [172.20.4.109] (uc5069.affgen.fx-per.uhb.fr
[172.20.4.109])
	by uhbhic.uhb.fr (8.14.1/8.14.1) with ESMTP id m5JA2IHO025117 for
	<doctinfos uhb fr>; Thu, 19 Jun 2008 12:02:18 +0200
Message-ID: <485A2EA9 8080400 uhb fr>
Date: Thu, 19 Jun 2008 12:02:17 +0200
From: JOELLE BISSON <joelle bisson uhb fr>
Organization: =?ISO-8859-1?Q?Universit=E9_Rennes_2?=
User-Agent: Thunderbird 2.0.0.14 (Windows/20080421)
MIME-Version: 1.0
To: doctinfos uhb fr
Content-Type: multipart/mixed;
boundary="------------020906040603080702030809"
X-Miltered: at uhbhic.uhb.fr with ID 485A31B7.003 by Joe's j-chkmail
(http
	: // j-chkmail dot ensmp dot fr)!
X-Miltered: at uhbhic.uhb.fr with ID 485A2EAA.000 by Joe's j-chkmail
(http
	: // j-chkmail dot ensmp dot fr)!
X-j-chkmail-Enveloppe:

485A31B7.003/172.30.2.31/v-listes-1.v-gen.uhb.fr/v-listes-1.v-gen.uhb.fr/<doctinfos-owner listes uhb fr>
X-j-chkmail-Enveloppe:

485A2EAA.000/172.20.4.109/uc5069.affgen.fx-per.uhb.fr/[172.20.4.109]/<joelle bisson uhb fr>
X-j-chkmail-Score: MSGID : 485A31B7.003 on uhbhic.uhb.fr : j-chkmail
score
	: . : R=. U=. O=. B=0.000 -> S=0.000
X-j-chkmail-Score: MSGID : 485A2EAA.000 on uhbhic.uhb.fr : j-chkmail
score
	: . : R=. U=. O=. B=0.000 -> S=0.000
X-j-chkmail-Status: Ham
X-j-chkmail-Status: Ham
X-Validation-by: carole duigou uhb fr
Subject: [doctinfos ] Doctoriales 2008
X-Loop: doctinfos listes uhb fr
X-Sequence: 401
Errors-to: doctinfos-owner listes uhb fr
Precedence: list
Precedence: bulk
X-no-archive: yes
List-Id: <doctinfos.listes.uhb.fr>
List-Archive: <https://listes.uhb.fr/wws/arc/doctinfos>
List-Help: <mailto:sympa listes uhb fr?subject=help>
List-Owner: <mailto:doctinfos-request listes uhb fr>
List-Post: <mailto:doctinfos listes uhb fr>
List-Subscribe: <mailto:sympa listes uhb fr?subject=subscribe%
20doctinfos>
List-Unsubscribe:
	<mailto:sympa listes uhb fr?subject=unsubscribe%20doctinfos>
X-Evolution-Source: pop://slbzindep%40gmail com pop gmail com/
X-Evolution: 00000004-0030

This is a multi-part message in MIME format.
--------------020906040603080702030809
Content-Type: multipart/alternative;
boundary="------------060608020500020502030202"


--------------060608020500020502030202
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

*Doctoriales® Bretagne 2008*


Les Doctoriales Bretagne 2008 organisées par le collège des Grandes 
Ecoles de Bretagne auront lieu du *16 au 21 novembre 2008* (à Brest, du 
16 au 19 novembre  puis à Rennes du 20 au 21 novembre) . L'objectif de 
ces Doctoriales est de faire prendre conscience des atouts d'une 
formation par la recherche, de permettre de réfléchir au projet 
professionnel, de communiquer le dynamisme et l'ouverture d'esprit 
nécessaires pour aborder le monde de l'après thèse, de faire connaître 
la richesse et la diversité de la vie en entreprise.

Les Doctoriales Bretagne 2008 sont accessibles  à tous les doctorants 
inscrits dans un Etablissement d'Enseignement Supérieur de Bretagne.

Les doctorants intéressés pourront consulter dans quelques jours le 
site  www.doctoriales-bretagne.fr pour y trouver la présentation et le 
programme de la manifestation.

L'inscription  engendre la _participation obligatoire_ aux 
pré-formations qui auront lieu selon le calendrier suivant :

/*Pré-formation Posters*/ :
17 sept à Rennes de 14h à 17h ou
18 sept à Brest  de 9h30 à 12h30

/*Pré-formation Réflexion sur le projet professionnel et  personnel : */
7 octobre à rennes  de 14h30 à 14h30 ou
8 octobre à Brest de 9h à 12h

/*Pré-formation Montage financier -business plan et recherches 
d'informations stratégiques : */
21 octobre à Brest de 9h à 16h ou
22 octobre à Rennes de 9h à 16h


Merci de bien vouloir adresser à l'Espace Recherche pour le _*2 juillet 
2008*_ le formulaire d'inscription joint.

Cordialement.

--------------060608020500020502030202
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body link="#0000ee" alink="#ee0000" bgcolor="#ffffcc" text="#000000"
 vlink="#551a8b">
<div align="center"><big><b><font
color="#3366ff"><big><big>Doctoriales&reg;
Bretagne 2008</big></big></font></b></big><br>
</div>
<br>
<br>
Les Doctoriales Bretagne 2008 organis&eacute;es par le coll&egrave;ge
des Grandes
Ecoles de Bretagne auront lieu du <b>16 au 21 novembre 2008</b>
(&agrave;
Brest, du 16 au 19 novembre&nbsp; puis &agrave; Rennes du 20 au 21
novembre) .
L&#8217;objectif
de ces Doctoriales est de faire prendre conscience
des atouts d&#8217;une formation par la recherche, de permettre de
r&eacute;fl&eacute;chir
au
projet professionnel, de communiquer le dynamisme et l&#8217;ouverture
d&#8217;esprit
n&eacute;cessaires pour aborder le monde de l&#8217;apr&egrave;s
th&egrave;se, de faire conna&icirc;tre
la
richesse et la diversit&eacute; de la vie en entreprise.<br>
<br>
Les Doctoriales Bretagne 2008 sont accessibles&nbsp; &agrave; tous les
doctorants
inscrits dans un Etablissement
d'Enseignement Sup&eacute;rieur de Bretagne.<br>
<br>
Les doctorants int&eacute;ress&eacute;s pourront consulter dans quelques
jours le
site&nbsp; <font color="#3333ff"><a class="moz-txt-link-abbreviated"
href="http://www.doctoriales-bretagne.fr";>www.doctoriales-bretagne.fr</a></font> pour y
trouver la pr&eacute;sentation et le programme de la manifestation. <br>
<br>
L'inscription&nbsp; engendre la <u>participation obligatoire</u> aux
pr&eacute;-formations qui auront lieu selon le calendrier suivant : <br>
<br>
<i><b>Pr&eacute;-formation Posters</b></i> :
<br>
17 sept &agrave; Rennes
de 14h &agrave; 17h ou<br>
18 sept &agrave; Brest&nbsp;
de 9h30 &agrave; 12h30<br>
<br>
<i><b>Pr&eacute;-formation R&eacute;flexion sur le projet professionnel
et&nbsp; personnel
:
</b></i><br>
7 octobre &agrave; rennes&nbsp;
de 14h30 &agrave; 14h30 ou<br>
8 octobre &agrave; Brest
de 9h &agrave; 12h<br>
<br>
<i><b>Pr&eacute;-formation Montage financier -business plan et
recherches
d'informations strat&eacute;giques :
</b></i><br>
21 octobre &agrave; Brest
de 9h &agrave; 16h ou<br>
22 octobre &agrave; Rennes
de 9h &agrave; 16h<br>
<br>
<br>
Merci de bien vouloir adresser &agrave; l'Espace Recherche pour le
<u><b>2
juillet 2008</b></u> le formulaire d'inscription joint.<br>
<br>
Cordialement.<br>
</body>
</html>

--------------060608020500020502030202--

--------------020906040603080702030809
Content-Type: application/msword; name="fiche_inscription_Rennes 2.doc"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="fiche_inscription_Rennes 2.doc"







-  
Simon Le Bayon
       
       ZINDEP
       25, rue de l'Ancienne Mairie 
       35230 Bourgbarré
       Tél : 02 99 57 79 73
       Fax : 02 99 57 03 69
       Por : 06 63 40 32 19
       url : http://www.zindep.com
       skype : slebayon



