RE: TODO list for the next version



> catdoc is rather popular, and if you have never seen word2x either, there
> is no difference. Besides, catdoc is already in mc.ext, you just
> have to move the "#". You can skip the part about Excel, but I'd still
> like catdoc to be the default.

OK, I have applied a patch based on yours. It tries "catdoc", then "word2x",
and finally "strings". Excel files are handled by "xls2csv", again with
"strings" as the fallback.
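
For illustration only, the same fallback order can be written as a shell
function. This is a minimal sketch: "view_doc" is a hypothetical name, not
something shipped with mc, and the converter arguments are simply the ones
used in the mc.ext entries.

```shell
# Hypothetical helper, not part of mc: print a Word file as text.
# Each converter runs only if the previous one exited with a non-zero
# status (including "command not found").
view_doc() {
    catdoc -w "$1" ||
    word2x -f text "$1" - ||
    strings "$1"
}
```

It could be called as "view_doc report.doc | less". Note that a converter
which prints partial output and then fails will have that output followed
by the output of the next command in the chain.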

By the way:

1) Spaces should be escaped with a backslash in the "type" directives. The
MS Word entry had this wrong, and I assume that you haven't tested type
recognition for Excel.

2) I removed "Document" and "Worksheet" from the end of the type patterns.
My "file" command (Red Hat 7.1) sometimes prints "Microsoft Excel 5.0
Worksheet", which wouldn't match. Even worse, it prints "Microsoft Office
Document" for my MS Word files, but fortunately they all have the .doc
extension.
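
A quick way to see why the shorter patterns are safer, assuming the "type"
pattern is searched for as a regular expression anywhere in the description
printed by "file" (grep stands in for mc's matcher here, and "type_matches"
is a hypothetical helper). The backslash before the space is mc.ext syntax
only, so the patterns below use a plain space.

```shell
# type_matches PATTERN DESCRIPTION
# Succeeds if PATTERN is found in DESCRIPTION (extended regex, unanchored).
type_matches() {
    printf '%s\n' "$2" | grep -q -E -- "$1"
}

desc='Microsoft Excel 5.0 Worksheet'

# The old, longer pattern fails because "5.0" sits between the words.
type_matches 'Microsoft Excel Worksheet' "$desc" || echo "long pattern: no match"

# The shortened pattern still matches.
type_matches 'Microsoft Excel' "$desc" && echo "short pattern: match"
```

Neither pattern helps with a description like "Microsoft Office Document",
which is why the extension rule is still needed for those files.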

The resulting entries in mc.ext are:

# Microsoft Word Document
regex/\.([Dd]o[ct]|DO[CT]|[Ww]ri|WRI)$
	View=%view{ascii} catdoc -w %f || word2x -f text %f - || strings %f
type/Microsoft\ Word
	View=%view{ascii} catdoc -w %f || word2x -f text %f - || strings %f

# Microsoft Excel Worksheet
regex/\.([Xx]l[sw]|XL[SW])$
	View=%view{ascii} xls2csv %f || strings %f
type/Microsoft\ Excel
	View=%view{ascii} xls2csv %f || strings %f
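
To check which file names the Word extension rule accepts, the pattern can
be tried with grep. This is only a sketch: it assumes mc interprets the
pattern the way grep -E does, and "matches_word_ext" and the file names are
made up for the test.

```shell
# Hypothetical check of the Word extension pattern from mc.ext.
matches_word_ext() {
    printf '%s\n' "$1" | grep -q -E '\.([Dd]o[ct]|DO[CT]|[Ww]ri|WRI)$'
}

for name in report.doc Report.Doc LETTER.DOT notes.wri odd.dOC report.doc.bak; do
    if matches_word_ext "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

The first four names match. "odd.dOC" does not, because only lowercase,
capitalized and all-uppercase spellings are listed, and "report.doc.bak"
does not because the pattern is anchored at the end of the name.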

-- 
Regards,
Pavel Roskin




