Re: Request: Test suite for EFS.



On Wed, Feb 16, 2000 at 03:57:36PM -0500, Nat Friedman wrote:
> 
> Daniel Veillard writes:
>  > 
>  >   I'm afraid you are going to object that you don't have time to pursue
>  > the issue, that it's too complicated and you have a product to ship. I
>  > already got this argument for not switching gtkHtml to the libxml/DOM code.
>  > I feel concerned about this. Until now Gnome was very nice in the sense
>  > that people took the time to Do The Right Thing even if it was a bit more
>  > painful/slower.
> 
>     Dude, this is silly.  I sent mail saying "let's figure out how to
> do XML" and you respond with "You are not considering XML because
> you're eeeeeeevil!"  A few deep breaths on either side seem like a

  I didn't say evil at all, I said busy, and I guess it's true :-)

> good idea to me :-).

  Well, it's 7 am and I didn't manage to sleep much. Maybe I should get
a long night's sleep before going on.

>     For the record: I do want to use XML.  Now, let's work out how we
> can do that.

  XML won't do for everything. If you have a GIF resource, embedding
it *inside* an XML document is effectively a bad idea. On the other hand

   <compound mime-type="image/gif" href="bonobo.gif"/>

 makes perfect sense for that kind of resource. It makes even more
sense to do the following in some cases:

   <compound mime-type="sound/mp3" href="http://www.bonobo.org/" />

Now the href can be local or point somewhere in Web space.
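
To make the local-versus-Web distinction concrete, here is a minimal Python sketch of how a reader might resolve the href of such an element against the location of the enclosing document. The function name resolve_compound and the base URL used in the usage note are assumptions made for illustration; nothing here is existing Bonobo or libxml API.

```python
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET


def resolve_compound(xml_text, base_url):
    """Parse one <compound> element and resolve its href against base_url.

    Returns (mime_type, absolute_url, scheme). A relative href such as
    "bonobo.gif" is resolved next to the enclosing document; an absolute
    href is returned unchanged.
    """
    element = ET.fromstring(xml_text)
    target = urljoin(base_url, element.get("href"))
    return element.get("mime-type"), target, urlparse(target).scheme
```

For example, with a hypothetical document stored at http://tux.w3.org/~dv/paper/bonobo/doc.xml, the href "bonobo.gif" resolves to a sibling of that document, while "http://www.bonobo.org/" is kept as is.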

>  >   The Right Way to do this in XML is to use namespaces.
>  > I *strongly* suggest you read :
>  > 
>  >   http://www.w3.org/TR/REC-xml-names/
>  >     
>  > It starts with:
>  >     "We envision applications of Extensible Markup Language (XML) where a
>  >     single XML document may contain elements and attributes (here referred
>  >     to as a "markup vocabulary") that are defined for and used by multiple
>  >     software modules. One motivation for this is modularity"
> 
>     Yeah, namespaces do seem like a really good idea in the case where 
> the embedded component can persist itself to XML.  I do wonder whether 
> or not we want to require this, though.

  For SVG or other formats that serialize naturally to XML, it is useful:

<embedding>
   <author>
      <name>dv</name>
      <email>veillard@w3.org</email>
   </author>
   <compound mime-type="application/xml" xmlns:xyz="http://www.gnome.org/gnumeric/">
      <xyz:sheet>
        ....
      </xyz:sheet>
   </compound>
   <compound mime-type="graphic/svg"
             origin="http://tux.w3.org/~dv/paper/bonobo/bonobo.svg"
             xmlns:svg="http://www.w3.org/Graphics/SVG/SVG-19991203.dtd">
      <svg:svg width="5cm" height="8cm">
        ...
      </svg:svg>
   </compound>
   <compound mime-type="image/gif"
             origin="http://tux.w3.org/~dv/paper/bonobo/bonobo.gif"
             href="bonobo.gif"/>
</embedding>
   

   If the gnumeric or gill parser is smart enough, it could directly handle
the parts it knows without even requiring a deserializer.
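
As a rough illustration of "handling the parts it knows", the Python sketch below picks out only the elements that live in one namespace and ignores everything else. The sample document, the xyz:cell element and the helper names are assumptions made for the example; a real gnumeric or gill parser would of course do more than collect elements.

```python
import xml.etree.ElementTree as ET

GNUMERIC_NS = "http://www.gnome.org/gnumeric/"

# Illustrative document; the cell content is made up for this sketch.
DOC = """\
<embedding>
  <author><name>dv</name></author>
  <compound mime-type="application/xml"
            xmlns:xyz="http://www.gnome.org/gnumeric/">
    <xyz:sheet><xyz:cell>E3 + 12</xyz:cell></xyz:sheet>
  </compound>
  <compound mime-type="image/gif" href="bonobo.gif"/>
</embedding>
"""


def known_elements(xml_text, namespace):
    """Return, in document order, every element whose tag is in `namespace`."""
    root = ET.fromstring(xml_text)
    prefix = "{" + namespace + "}"  # ElementTree spells tags as {uri}local
    return [el for el in root.iter() if el.tag.startswith(prefix)]


def local_name(element):
    """Strip the {uri} part from an ElementTree tag."""
    return element.tag.split("}", 1)[-1]
```

Note that the match is on the namespace URL, not on the prefix: the document says xyz:, the application may say gmr:, and both mean the same thing.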

>  >   I did send mail to Miguel about how namespaces should be used in bonobo
>  > to switch/detect compounds in documents serialized in XML. This required
>  > registering the XML namespace URL for each application, making sure
>  > that every serialization declares the namespace, and building XML analyzers
>  > flexible enough to skip elements not in your "own"
>  > namespace.
> 
>     Can you please resend this mail to the list?

 I guess the really relevant part is the last paragraph.

------------------------------------
From: Daniel Veillard <Daniel.Veillard@w3.org>
To: Miguel de Icaza <miguel@nuclecu.unam.mx>, tlewis@mindspring.net
Cc: gnumeric-list@gnome.org, Daniel.Veillard@w3.org, gwp@rufus.w3.org
Subject: Re: [tlewis@mindspring.net: Where in the heck is the DTD for gnumeric's file format?!? (fwd)]
In-Reply-To: <199903201142.FAA04605@erandi.nuclecu.unam.mx>; from Miguel de Icaza on Sat, Mar 20, 1999 at 05:42:06AM -0600

> - ---------- Forwarded message ----------
> Date: Thu, 18 Mar 1999 19:07:39 -0500 (EST)
> From: Todd Graham Lewis <tlewis@mindspring.net>
> To: gnumeric-list@gnome.org
> Subject: Where in the heck is the DTD for gnumeric's file format?!?
> 
> I am really struggling with figuring out what a gnumeric xml doc
> looks like.  Where in the heck is the dtd for the darned thing?  I am used
> to seeing something like:
> 
>       <?xml version="1.0"?>
>       <!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
>       <advert>
>         <headline>...<pic/>...</headline>
>         <text>...</text>
>       </advert>
>       (...)
> 
> in the XML FAQs and whatnot, but gnumeric just says:
> 
>       <?xml version="1.0"?>
>       <gmr:Workbook xmlns:gmr="http://www.gnome.org/gnumeric/">
>         <gmr:Geometry Width="607" Height="556"/>
>       (...)
> 
> Does gnumeric just dump its internal structure in XML format and trust
> that the next program to pick it up will understand the flags, complete
> with their typing, etc?  I am afraid that I do not understand (as usual).

  Okay, I guess I need to give a bit of explanation here.
First, you are right, the XML produced by Gnumeric doesn't reference a
DTD. Having a DTD helps check that the document structure conforms
to a set of rules. However, it does not help at all to "understand the
flags". The semantics of an XML file cannot be coded in a DTD. For example,
it is possible to define that a Workbook will only accept children of type
Sheet, and that Cells are of type text. But it's impossible to express
that, for example, "E3 + 12" in a Cell means "add the content
of cell E3 and the value 12".
  So basically a huge amount of the semantics associated with an XML
file is still hardcoded in the application. I don't think the DTD would
really help.

  But we are using namespaces: the gmr: prefix and the xmlns definition
associate every element prefixed by gmr: with the URL given
in the xmlns. When upgrading to a new encoding I strongly suggest
also adding versioning information to the namespace, e.g.
  xmlns:gmr="http://www.gnome.org/gnumeric/1.1/"
and using this to dispatch the handling of those elements to the new
code.
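
A minimal Python sketch of that dispatch might look like the following. The loader functions and what they return are placeholders, and treating the unversioned URL as the "1.0" format is an assumption made for the example, not something Gnumeric defines.

```python
import xml.etree.ElementTree as ET


def load_v1_0(root):
    # Placeholder for the old decoder; just reports what it saw.
    return ("1.0", len(root))


def load_v1_1(root):
    # Placeholder for the new decoder.
    return ("1.1", len(root))


# Namespace URL -> loader. The unversioned entry is assumed to mean 1.0.
HANDLERS = {
    "http://www.gnome.org/gnumeric/": load_v1_0,
    "http://www.gnome.org/gnumeric/1.1/": load_v1_1,
}


def namespace_of(element):
    """Return the namespace URL of an element, or "" if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def load_workbook(xml_text):
    """Pick the loader from the root element's namespace and run it."""
    root = ET.fromstring(xml_text)
    namespace = namespace_of(root)
    if namespace not in HANDLERS:
        raise ValueError("unknown workbook namespace: %r" % namespace)
    return HANDLERS[namespace](root)
```

The point of the design is that old and new code can coexist: an unknown version fails loudly instead of being misread by the wrong decoder.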
  Basically namespaces could be used as a serialisation format for Bonobo
documents (if I understand the concept correctly), i.e. when embedding
a Gnumeric cell in a GWP document, the top level of the XML tree would
define the GWP document version namespace and all elements related to GWP
would be prefixed by this namespace. However, when dumping the content
of the embedded Gnumeric sheet, the xmlns:gmr= ... namespace definition
should be saved at the top of the subtree of the Gnumeric document, with
all Gnumeric elements prefixed by the gnumeric namespace prefix.
  Add an infrastructure to Gnome where a service registers namespaces
per application, plus a standard set of CORBA interfaces for it, and I
guess it would be possible to handle document composition and versioning
when serializing documents to XML for storage by using namespaces in an
intelligent way.
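
To sketch what such a registration service could do, the Python below keeps a table from namespace URL to application and uses it to find where each application's subtree starts in a composed document. The class and function names and the GWP namespace URL are hypothetical, and the CORBA side is left out entirely; this only shows the lookup.

```python
import xml.etree.ElementTree as ET


class NamespaceRegistry:
    """Maps namespace URLs to the application that owns them."""

    def __init__(self):
        self._owners = {}

    def register(self, namespace, application):
        self._owners[namespace] = application

    def owner_of(self, element):
        """Return the owning application, or None for unregistered markup."""
        tag = element.tag
        namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        return self._owners.get(namespace)


def split_by_owner(root, registry):
    """List (application, element) for each point where ownership changes."""
    found = []

    def walk(element, parent_owner):
        owner = registry.owner_of(element)
        if owner is not None and owner != parent_owner:
            found.append((owner, element))
        for child in element:
            # Unregistered elements inherit the owner of their parent.
            walk(child, owner if owner is not None else parent_owner)

    walk(root, None)
    return found
```

With a GWP document that embeds a Gnumeric sheet, this yields the GWP root and the embedded sheet as the two hand-off points, which is where each application's own deserializer would take over.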

   Hope this helps,

Daniel
----------------------------------------------------

>  > >     What OLE2 does (as I've described before on this list) is just
>  > > write the activation ID in straight (this is especially easy since
>  > > their activation IDs are fixed-length) and then dump in whatever
>  > > stream the component spits out untouched.
>  > 
>  >   How are activation IDs allocated ? Is that a registry ? Who
>  > maintains it ?
> 
>     The activation IDs are just GUIDs that a component registers into
> the naming service.  They're the equivalent of GOAD IDs.

  Ok, are they temporary ? If not, how did the component choose its name ?

>  > >     So using XML makes me think "scary buffering."  But maybe I'm just 
>  > > being timid.
>  > 
>  >   I don't understand. Are you just considering the memory requirements ?
>  > I'm considering designing a serialization model which will be stable over time
>  > and still readable in 20 years by generic tools.
> 
>     I'm just contrasting the simplicity of the OLE2 compound doc
> storage model to the internal complexity of using XML -- with OLE2,
> they don't even have to buffer any of the streams.  Compound document
> persistence looks like this in the OLE2 world:
> 
>     foreach embedding in embedding_list
> 
>         activation_id = embedding.get_activation_id ();
>         write_to_persist_stream (output_stream, activation_id);
> 
>         embedding.persist_to_stream (embedding, output_stream);
> 
>     next embedding
> 
> Depersistence looks like this:
> 
>     while not eof
> 
>         activation_id = read_from_persist_stream (input_stream,
>                                                   activation_id_length);
> 
>         embedding = activate_component_with_id (activation_id);
> 
>         embedding.depersist_from_stream (embedding, input_stream);
> 
>     end
> 
> This is just incredibly simple to implement.  And with XML it's
> harder.  This is a tradeoff; XML has many advantages over the OLE2
> approach.  I was just pointing out one issue.

  Ok, I understand better. I will just point out that this scheme
doesn't really work for real streaming data, i.e. audio or video.
Even if the size is known in advance it's hard to do an efficient save.
If instead you accept the idea of a resource possibly being fragmented
across multiple components, this becomes feasible.
  Also, what happens if you have a multimedia presentation with video,
audio, etc. [1] and want to save the document after fixing a typo
in the title? Do you really have to rewrite the full data? Does EFS
allow efficient partial writes?
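
For reference, the flat scheme quoted above can be sketched in a few lines of Python. This is only an illustration under assumptions: the TextComponent class, its activation ID and the length-prefixed payload are invented for the example, and a 16-byte UUID stands in for the fixed-length activation ID.

```python
import io
import struct
import uuid


class TextComponent:
    # Hypothetical fixed-length activation ID (a 16-byte GUID).
    ACTIVATION_ID = uuid.UUID("00000000-0000-0000-0000-000000000001")

    def __init__(self, text=""):
        self.text = text

    def persist_to_stream(self, out):
        # The component must delimit its own data; here, a 4-byte length.
        data = self.text.encode("utf-8")
        out.write(struct.pack(">I", len(data)))
        out.write(data)

    def depersist_from_stream(self, inp):
        (size,) = struct.unpack(">I", inp.read(4))
        self.text = inp.read(size).decode("utf-8")


# Stand-in for the naming service: activation ID -> component class.
REGISTRY = {TextComponent.ACTIVATION_ID: TextComponent}


def persist(embeddings, out):
    for embedding in embeddings:
        out.write(embedding.ACTIVATION_ID.bytes)  # always 16 bytes
        embedding.persist_to_stream(out)


def depersist(inp):
    embeddings = []
    while True:
        raw = inp.read(16)
        if not raw:  # clean end of stream
            break
        if len(raw) != 16:
            raise ValueError("truncated activation ID")
        embedding = REGISTRY[uuid.UUID(bytes=raw)]()
        embedding.depersist_from_stream(inp)
        embeddings.append(embedding)
    return embeddings
```

The sketch also shows the concern: every component sits at an offset that depends on the sizes of all the components before it, so changing the length of the first one shifts every later byte in the stream.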

Daniel

[1] http://www.w3.org/TR/REC-smil/

-- 
Daniel.Veillard@w3.org | W3C, INRIA Rhone-Alpes  | Today's Bookmarks :
Tel: +33 476 615 257  | 655, avenue de l'Europe | Linux XML libxml WWW
Fax: +33 476 615 207  | 38330 Montbonnot FRANCE | Gnome rpm2html rpmfind
 http://www.w3.org/People/all#veillard%40w3.org  | RPM badminton Kaffe


