Re: Problems with beagle-0.3.1 with .htm- files



On Friday, 4 January 2008, you wrote:
> > I am running beagle 0.3.1 (compiled manually) on an openSUSE 10.3 system.
> > I have a directory that contains the online articles of a German computer
> > magazine named "ct".  Most of the files have the extension ".htm" and
> > contain plain HTML text.
> >
> > When I try to index the "ct" directory with beagle, it logs the
> > following message in the file current-IndexHelper:
> >
> > 20080104 09:49:03.9370 17841 IndexH DEBUG: No filter for
> > file:///opt/zeitschriften/ct/html/05/19/220/art.htm
> > (/opt/zeitschriften/ct/html/05/19/220/art.htm)
> > [application/x-mozilla-bookmarks]
>
> This is a problem with recognizing the MIME types of the files. Beagle uses
> the freedesktop.org shared-mime-info spec and its xdgmime implementation to
> determine the type of a file. In this case, shared-mime-info identified
> those files as mozilla-bookmark files. Mozilla-bookmark files have a
> slightly different structure, so beagle does not index them. :(

Thanks for the explanation. However, I am not sure why MIME detection fails.
The head of the file looks like this:

<head>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
<meta http-equiv="Copyright" content="Copyright by Heise Zeitschriften Verlag GmbH & Co. KG, Hannover">
<title>
 Buchkritik
: Mac OS X, SpamAssassin
</title>
</head>

So it actually is text/html. Also, when I query xdg-mime for the MIME type of
this file, it reports text/html as well:

$ xdg-mime query filetype /opt/zeitschriften/ct/html/05/19/220/art.htm
text/html
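
To narrow this down, here is a minimal Python sketch of the two ways a type
can be guessed: from the file name and from the first bytes of the content.
It is only a diagnostic aid and not how beagle or xdgmime actually work. The
function names are made up for illustration, and the marker string is the
DOCTYPE line that bookmark exports start with; whether shared-mime-info
matches on exactly that string is an assumption I have not verified.

```python
import mimetypes

# DOCTYPE line at the top of Netscape/Mozilla bookmark exports.
# Assumption: the detector keys on this string; check the installed
# shared-mime-info database to confirm.
BOOKMARK_MARKER = b"<!DOCTYPE NETSCAPE-Bookmark-file-1>"


def guess_by_name(path):
    """Guess the type from the file name alone (extension lookup)."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime


def guess_by_content(data):
    """Very rough content sniffing on the leading bytes of a file."""
    head = data[:256].lstrip().lower()
    if head.startswith(BOOKMARK_MARKER.lower()):
        return "application/x-mozilla-bookmarks"
    if head.startswith((b"<html", b"<head", b"<!doctype html")):
        return "text/html"
    return None  # unknown to this sketch


def compare(path):
    """Return (name-based guess, content-based guess) for one file."""
    with open(path, "rb") as fh:
        data = fh.read(256)
    return guess_by_name(path), guess_by_content(data)
```

Calling compare() on art.htm and a few neighbouring files should show whether
the name-based and content-based guesses agree for them.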

So my question is: how exactly does beagle determine the MIME type of a file
when it tries to index it? Knowing this could help me fix the problem.

Thanks
Rainer


-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html,     Fax: +49261287 1001312



