Re: Indexing mail attachments & files inside Zip/tar.



On Mon, 2004-10-18 at 16:31 +0200, Michael Levy wrote:
> I'd thought about indexing the contents of archives a while ago and
> discussed it briefly on this list. I have since completely run out of
> time, and would love to see this happen!

Let's make it happen!! :-)

> 
>         The basic idea was to use existing code (SharpZipLib) to access
> each of the documents contained in an archive as a stream for indexing,
> and then to use a URI to reference each one (something like
> tar://the/tar/file.tar?entry=the_contained_file).

Well... I haven't really looked into the actual implementation details;
the idea just came to me while I was archiving my backup. ;-)
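
To make the quoted idea concrete, here is a rough sketch of walking an
archive and handing out one stream plus one URI per entry. It is only an
illustration: it uses Python's standard zipfile and tarfile modules rather
than SharpZipLib, and the names iter_archive_entries and entry_uri are made
up for this example. The URI layout follows the
tar://the/tar/file.tar?entry=the_contained_file form suggested above.

```python
import tarfile
import zipfile
from urllib.parse import quote, urlencode


def entry_uri(scheme, archive_path, entry_name):
    """Build a URI such as tar://the/tar/file.tar?entry=the_contained_file."""
    # Hypothetical helper: the archive path is percent-quoted and the entry
    # name is form-encoded so that unusual characters stay unambiguous.
    return f"{scheme}://{quote(archive_path)}?{urlencode({'entry': entry_name})}"


def iter_archive_entries(path):
    """Yield (uri, stream) for every regular file inside a zip or tar archive.

    Each stream is only valid until the next item is requested, so the
    caller should read (or index) it before moving on.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # directories have no content to index
                with zf.open(info) as stream:
                    yield entry_uri("zip", path, info.filename), stream
    elif tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            for member in tf:
                if not member.isfile():
                    continue  # skip directories, links, devices
                with tf.extractfile(member) as stream:
                    yield entry_uri("tar", path, member.name), stream
    else:
        raise ValueError(f"not a zip or tar archive: {path}")
```

An indexer could loop over iter_archive_entries("backup.tar") and feed each
stream to whatever filter matches the entry, storing the URI as the hit's
location. Nothing is unpacked to disk along the way.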

> 
>         The problems I ran across seem trivial but caused me to run out
> of time. Basically the problem is this: it is easy to know the MIME type
> of the archive (tar or zip files, for example), but it gets a little
> tougher to know the MIME type of the contained documents. I ran across
> some memory corruption issues when trying to use the existing MIME
> type-sniffing functions from gnome-vfs.

Maybe we need to look into identifying the MIME type from the URI itself,
or something along those lines?
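
As a first approximation, the type could be guessed from the entry name
carried in the URI, without touching the data at all. The sketch below
assumes the ?entry=... URI form from the quoted text and uses Python's
standard mimetypes table in place of gnome-vfs; guess_entry_mime is a
hypothetical name for this example.

```python
import mimetypes
from urllib.parse import parse_qs, urlsplit


def guess_entry_mime(uri, default="application/octet-stream"):
    """Guess a MIME type from the entry name in an archive-entry URI.

    Only the file extension is consulted, so a misnamed entry will be
    misidentified; the default is returned when nothing matches.
    """
    names = parse_qs(urlsplit(uri).query).get("entry", [])
    if not names:
        return default  # the URI names no entry at all
    mime, _encoding = mimetypes.guess_type(names[0])
    return mime or default
```

For example, guess_entry_mime("tar://the/tar/file.tar?entry=notes.txt")
should come back as text/plain. Entries without a useful extension fall
through to the default, so this can only be a first pass.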

> The problems were most likely due to my poor understanding of
> marshalling in .NET, so others will probably have an easier go at it.
> Also, I noticed that the functions behave poorly when attempting to get
> the MIME type of a non-existing file (i.e. a file which is not actually
> on disk, but rather is an "entry" in an archive).

Hmmm... I'll have to really dig into it. Let me do that and "bug" the list
with questions. ;-)
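
One way around the "file is not on disk" problem would be to sniff the first
few bytes of the entry's stream instead of asking for a path. What follows
is a deliberately tiny stand-in, not the gnome-vfs sniffer: the signature
table covers only a handful of well-known formats, and sniff_mime is a
hypothetical name.

```python
import codecs
import io

# A few well-known file signatures; a real sniffer would know many more.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_mime(stream, sample_size=512):
    """Guess a MIME type from the leading bytes of an in-memory stream."""
    head = stream.read(sample_size)
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    if b"\x00" in head:
        return "application/octet-stream"  # NUL bytes suggest binary data
    try:
        # final=False tolerates a multi-byte character cut off by the sample
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```

Since it only needs a readable stream, it works just as well on an archive
entry as on io.BytesIO(b"hello world\n"). Note that it consumes the sampled
bytes, so the caller would need to buffer them or reopen the entry before
indexing.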

> I can try to find the C code I wrote to test the functions if someone is
> interested.

Yes.  I am interested!!

> 
>         Indexing archives would be great, even for archives which are
> not mail attachments.
> 
Yes.. but at the same time, we have to make sure that we don't choke the
CPU!!

Thanks,

V. Varadhan.



