Re: Magic is useless!

On Mon, Jan 07, 2002 at 02:07:00PM +0000, Sander Vesik wrote:
> On Mon, 7 Jan 2002, Maciej Stachowiak wrote:
> > > Most major file formats already have a detectable magic byte
> > > signature, though some of the now prospering "human readable"
> > > formats using XML or whatever are a thorn in the side (particularly
> > > if they are compressed, grr). But this would be a good complement.

    Been there for 2.5 years in the standardization process. XML totally
outgrows the Mime-Type mechanism; Mime-Type is unfortunately obsoleted
by that work. Want an example?
   Okay, the Mime-Type for SMIL could be application/smil
   (I'm too lazy to check). The Mime-Type for SVG would be something
   like graphic/svg. Now both the SMIL and the SVG specs expect you to
   mix elements from each other in a single document (using namespaces
   to do the coupling). Now tell me the Mime-Type of that document...
   Sorry, it won't work anymore. Forget about it.
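
   To make the point concrete, here is a minimal Python sketch that lists
   the namespaces declared in a mixed document. The sample document and
   the helper name namespaces_in are made up for illustration, and the
   namespace URIs are only meant as plausible examples. What comes back
   is a list of names, not one type string.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical mixed document: a SMIL root embedding SVG elements.
SAMPLE = b"""<?xml version="1.0"?>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <svg:svg width="10" height="10"><svg:rect width="10" height="10"/></svg:svg>
  </body>
</smil>"""

def namespaces_in(xml_bytes):
    """Return the distinct namespace URIs declared in the document."""
    seen = []
    # "start-ns" events carry a (prefix, uri) pair for each declaration.
    for _event, (_prefix, uri) in ET.iterparse(BytesIO(xml_bytes),
                                               events=("start-ns",)):
        if uri not in seen:
            seen.append(uri)
    return seen

for uri in sorted(namespaces_in(SAMPLE)):
    print(uri)
```

   Neither URI on its own describes the document; a tool has to look at
   the whole list to decide how to process it.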

> > XML is reasonably OK, actually, since a proper XML document has a DTD
> > declaration at the top that you can look for. The main problem is
    Bing, wrong. XML does not require DTD support, and DTD support is
slowly disappearing in favor of better validation mechanisms. DOCTYPE is
not really suitable for this.

> > compressed files and archives, since you need to either look inside
> > the compression/archiving, or allow suffixes to take precedence for
> > those types (returning to the bad old suffix-based world for those
> > kinds of files). We really should come up with a solution to this in
> > gnome-vfs, since it is a frequent user complaint.
> > 
> It makes sense to put composite xml/non-xml (or really, even composite xml
> made out of many independent parts) documents into a container. Zip is a

  and even into a single entity

> good format as it allows for compressed and uncompressed storage (so you
> don't waste time trying to re-compress those .jpgs) and internal directory
> structure.

  Right, exactly the approach taken by the OpenOffice group.

> A pure magic number system cannot even cope with .jar files and these are
> also quite widespread in the real world.

  Yep, IMHO Mime-Types are slowly falling into obsolescence due to
composition and the fact that it's such a f...g pain to update that
registry that some new formats tend to not even bother.

  The goal of the Mime-Type was to associate processing tools with
resources. Unfortunately that's too limited a view to cope with most
of the complex formats. A better approach is a list/hierarchy of
strings rather than a single one: the list of namespace names of
an XML document, the list of Mime-Types of the members of a zip. To take
the example of a ZIP format it could be:

       (zip (image/gif,   xml/docbook,   image/gif))

   to use a LISP-like syntax. For a compound compressed XML document:

(gzip (application/xml (
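
  A rough Python sketch of how such a nested description could be built
by looking inside containers. The tiny magic table in sniff(), the
function names and the demo data are all assumptions made up for
illustration, and the output format only approximates the notation
above (it does not go on to list namespaces inside the XML).

```python
import gzip
import io
import zipfile

def sniff(data):
    """Guess a type label from the leading bytes (hypothetical, tiny table)."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    if data.startswith(b"\x1f\x8b"):
        return "gzip"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.lstrip().startswith(b"<"):
        return "application/xml"
    return "application/octet-stream"

def describe(data):
    """Return a LISP-like nested description, recursing into containers."""
    kind = sniff(data)
    if kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            parts = [describe(zf.read(name)) for name in zf.namelist()]
        return "(zip (" + ", ".join(parts) + "))"
    if kind == "gzip":
        return "(gzip " + describe(gzip.decompress(data)) + ")"
    return kind

def make_demo_zip():
    """Build a small in-memory zip holding two fake GIFs and one XML file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.gif", b"GIF89a" + b"\x00" * 4)
        zf.writestr("doc.xml", b"<?xml version='1.0'?><book/>")
        zf.writestr("b.gif", b"GIF89a" + b"\x00" * 4)
    return buf.getvalue()

print(describe(make_demo_zip()))            # (zip (image/gif, application/xml, image/gif))
print(describe(gzip.compress(b"<doc/>")))   # (gzip application/xml)
```

  The point of the sketch is only that the result is a structure, so a
consumer can match on the outer container, the inner types, or both.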

  I'm pretty sure you can find examples even outside of the XML world.
Don't limit yourself to mechanisms known to be getting obsolete when
defining new interfaces.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml Gnome XML XSLT toolkit | Rpmfind RPM search engine
