Re: Magic is useless!
- From: Daniel Veillard <veillard redhat com>
- To: Sander Vesik <Sander Vesik Sun COM>
- Cc: Maciej Stachowiak <mjs noisehavoc org>, Seth Nickell <snickell stanford edu>, Raphael Bosshard <whistler x-files ch>, "Gnome 2.0 List" <gnome-2-0-list gnome org>
- Subject: Re: Magic is useless!
- Date: Mon, 7 Jan 2002 12:24:14 -0500
On Mon, Jan 07, 2002 at 02:07:00PM +0000, Sander Vesik wrote:
> On Mon, 7 Jan 2002, Maciej Stachowiak wrote:
> > > Most major file formats already have a detectable magic byte
> > > signature, though some of the now prospering "human readable"
> > > formats using XML or whatever are a thorn in the side (particularly
> > > if they are compressed, grr). But this would be a good complement.
Been there for 2.5 years in the standardization process. XML totally
outgrows the Mime-Type mechanism. Mime-Type is unfortunately obsoleted
in that work. Want an example?
Okay, the Mime-Type for SMIL could be application/smil
(I'm too lazy to check). You can also get a Mime-Type for SVG,
something like graphic/svg. Now both the SMIL and the SVG specs expect
to mix elements from each other in a single document (using namespaces
to do the coupling). Now tell me the Mime-Type of the document...
Sorry, that won't work anymore. Forget about it.
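To make the point concrete, here is a minimal sketch (Python, standard
library only) that lists the namespaces actually used by the elements of
a mixed document. The namespace URIs and the helper name
namespaces_used are made up for illustration; they are not the real
SMIL or SVG namespace names.

```python
import xml.etree.ElementTree as ET

def namespaces_used(xml_text):
    """Return the namespace URIs of all elements, in first-seen order."""
    seen = []
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag
        # ElementTree spells qualified names as "{namespace-uri}localname".
        if isinstance(tag, str) and tag.startswith("{"):
            uri = tag[1:].split("}", 1)[0]
            if uri not in seen:
                seen.append(uri)
    return seen

# Hypothetical namespace URIs, used only for this example.
doc = """<smil xmlns="http://example.org/ns/smil">
  <body>
    <svg xmlns="http://example.org/ns/svg"><circle r="1"/></svg>
  </body>
</smil>"""

print(namespaces_used(doc))
```

The result is a list with two entries, one per vocabulary, which is
exactly what a single Mime-Type string has no way to express.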
> > XML is reasonably OK, actually, since a proper XML document has a DTD
> > declaration at the top that you can look for. The main problem is
Bing, wrong. XML does not require DTD support. And DTD support is
slowly disappearing in favor of better validation mechanisms. DOCTYPE is
not really proper for this.
> > compressed files and archives, since you need to either look inside
> > the compression/archiving, or allow suffixes to take precedence for
> > those types (returning to the bad old suffix-based world for those
> > kinds of files). We really should come up with a solution to this in
> > gnome-vfs, since it is a frequent user complaint.
> It makes sense to put composite xml/non-xml (or really, even composite xml
> made out of many independent parts) documents into a container. Zip is a
and even into a single entity
> good format as it allows for compressed and uncompressed storage (so you
> don't waste time trying to re-compress those .jpgs) and internal directory
Right, exactly the approach taken by the OpenOffice group.
> A pure magic number system cannot even cope with .jar files and these are
> also quite widespread in the real world.
Yep, IMHO Mime-Types are slowly falling into obsolescence due to
composition and the fact that it's such a f...g pain to update that
registry that some new formats tend to not even bother.
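A small sketch of the .jar problem, assuming nothing beyond the Python
standard library: a plain zip and a jar-like archive start with the same
magic bytes, so you have to look inside the container to tell them
apart. The helper names and the manifest check are only a heuristic
picked for this example, not a complete jar detector.

```python
import io
import zipfile

def make_archive(members):
    """Build a zip archive in memory from a {name: content} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()

def looks_like_jar(data):
    """Heuristic: treat a zip that carries a manifest as a jar."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return "META-INF/MANIFEST.MF" in zf.namelist()

plain_zip = make_archive({"notes.txt": "hello"})
jar_like = make_archive({
    "META-INF/MANIFEST.MF": "Manifest-Version: 1.0\n",
    "Hello.class": b"\xca\xfe\xba\xbe",  # placeholder bytes, not a real class
})

# Both archives begin with the zip local file header signature.
print(plain_zip[:4] == jar_like[:4])
print(looks_like_jar(plain_zip), looks_like_jar(jar_like))
```

The leading bytes compare equal, while the member listing is what
separates the two.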
The goal of the Mime-Type was to associate processing tools with
resources. Unfortunately that's too limited a view to cope with most
of the complex formats. A better approach is a list/hierarchy of
strings and not a single one: the list of namespace names of
an XML document, the list of mimetypes of a zip. To take the example
of a ZIP format it could be:
(zip (image/gif, xml/docbook, image/gif))
to use a LISP like syntax. For a compound compressed XML document
(gzip (application/xml (http://www.w3.org/2001/svg http://www.w3.org/)))
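As a rough sketch of how such a descriptor could be computed for the
zip case, assuming a toy content sniffer: the functions sniff and
describe are hypothetical names for this example, and the sniffer only
knows GIF and the XML declaration.

```python
import io
import zipfile

def sniff(data):
    """Very rough, hypothetical content sniffer for the demo."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.lstrip().startswith(b"<?xml"):
        return "application/xml"
    return "application/octet-stream"

def describe(data):
    """Return a LISP-like type descriptor, recursing into zip containers."""
    if data.startswith(b"PK\x03\x04"):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            parts = [describe(zf.read(name)) for name in zf.namelist()]
        return "(zip (" + ", ".join(parts) + "))"
    return sniff(data)

# Build a small archive in memory; member contents are dummy data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.gif", b"GIF89a...")
    zf.writestr("doc.xml", b"<?xml version='1.0'?><book/>")
    zf.writestr("b.gif", b"GIF89a...")

print(describe(buf.getvalue()))
# (zip (image/gif, application/xml, image/gif))
```

The same recursion could be extended to the XML case by replacing the
single application/xml string with the list of namespaces found in the
document.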
I'm pretty sure you can find examples even outside of the XML world.
Don't limit yourself to known-to-be-getting-obsolete mechanisms when defining
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/