On Thu, 2004-01-29 at 17:32, Xavier Bestel wrote:

> Ah well ... not really:
> 1) Only the first read matters, otherwise nautilus (or something else)
> will cache the mime-type (or the first bytes) by itself anyway. Just try
> opening nautilus for the first time on a crowded directory, then close
> and reopen it. Feel the difference. So no point here.

Once EAs are widespread, the probability that EAs are cached will be higher than that of the files' contents. Meanwhile, the probability of their being cached is the same as that of the data currently used to determine file types. But read on.

> 2) Uh ?

If you don't know, that's okay. EAs make lots of things possible. E.g. you could use EAs to store file notes, descriptions, unique file identifiers (to track your ever-moving MP3 collection, from unsorted to sorted), song rankings, links to covers, personalized icons, and much more. All this info is expected to be read upon directory read, so placing MIME types in EAs makes sense. To tell you the truth, plain files could also be used, but a) there are problems with locking; b) security problems arise; c) consistency problems arise. EAs are the architecturally correct, trustworthy solution to all the problems of older, non-integrated approaches to metadata. Think of all the wonderful things Mac OS can do with files, and you begin to comprehend how useful EAs become.

> 3) When you read one single byte on disk, you in fact read a bunch of
> them (how many exactly depends on the drive, driver, etc.). Moreover,
> the cost of seeking to this byte is so high that the "system" will read
> a bunch more sectors (many many bytes) just in case they're needed
> later. No point there too.

You've just confirmed what I said in my last email. Thanks to readahead, reading EAs would be faster than reading the files themselves to get at their contents.

Today, Nautilus readdir()s a directory and uses either file contents or file extensions to infer the file type (note I say INFER, not determine; the file type is already determined). Reading file contents is *dog slow* (relatively). Using file extensions is inaccurate, can lead to security compromises, creates usability problems, and is plain wrong (the file type shouldn't be a function of the file name), but it's fast.

Oh well, let's move on. If every (every!) application saved files WITHOUT extensions (naysayers, please hold on to your hats, because this might be too radical for you), and they stored the file type in a special, standard, agreed-upon "file", you would have a reliable way of determining file types (and perhaps many other things, such as file notes, icons, etcetera). Wanna know the file type? Just look in the "filetype file". The problem is that you would all have to agree 100% on where to store that information (location) and how (format), and solve all the technical hurdles:

* does my current user have permission to save to the filetype file?
* what if another process is currently writing to the file?
* what if some user moves his files around?
* how does the filetype file get updated?

Now you begin to understand the complexity of the problem.

It's a fact of life that we associate data, and one of the consequences of this fact is that we tend to label, categorize and tag. These tags, labels and categories are called "metadata", and one way to categorize and sort files is to define file types. File types *belong* in the system used to categorize and tag files: the metadata. File types literally are *data about data*: by definition, metadata.
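To make this concrete, here is a minimal sketch (not part of any existing library) of what storing and reading back a MIME type in an EA looks like on Linux, using the plain setxattr()/getxattr() calls from <sys/xattr.h> (<attr/xattr.h> on older systems). The attribute name "user.mime_type" and the file name are illustrative only; the tentative name suggested further down is simply "mimetype".

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>   /* <attr/xattr.h> on older systems */

int main(void)
{
    const char *path = "song.mp3";       /* hypothetical file */
    const char *mime = "audio/mpeg";

    /* Tag the file with its type once, e.g. when the application saves it. */
    if (setxattr(path, "user.mime_type", mime, strlen(mime), 0) != 0) {
        perror("setxattr");              /* the FS may lack EA support */
        return 1;
    }

    /* Any later reader (a file manager, a search service) gets the type
     * back without touching the file's contents at all. */
    char buf[256];
    ssize_t len = getxattr(path, "user.mime_type", buf, sizeof(buf) - 1);
    if (len < 0) {
        perror("getxattr");
        return 1;
    }
    buf[len] = '\0';
    printf("stored type: %s\n", buf);
    return 0;
}

The same calls work for notes, icons or any other per-file metadata; only the attribute name changes.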
As long as we are human, metadata will always exist, and you can't do anything to get rid of it. It's a byproduct of thought, and a useful one. You need to store it. Period. Now, how? Or rather, how best?

The technical solution is to devise a simple, transactional, atomic way of storing information *along with* files. Atomic to guarantee consistency, and "along with files" so that the information (file types, attached notes, etcetera) goes WITH the files when they are moved. Then every application can agree on where to store file types, because the system provides a clean, secure, robust and uniform way to do so.

But who will ever write such a piece of software? What form will it take? Is it a daemon? Is it a service? Is it a command that runs every night? No. Due to the properties required of such a system, it has to work inside the kernel, ensuring that when files are moved, copied and so on, the information stored along with them (the metadata) goes with them. Any other system, outside the kernel, would have to follow the user's actions and files like a nanny follows a naughty baby.

So EAs were defined, standardized, and developed. Today, most file systems available for Linux and Solaris support EAs. In fact, every modern system supports them in one way or another (NT calls them streams; the Mac calls it the resource fork, of which there is only one, and Apple has standardized ways to store stuff there). They exist because there is a valid need for them. E.g. you could label a folder with a "sticky note" so that when your coworker opens the folder, he sees the note.

...Rewind 20 years... Since there was no system to store "metadata", when MS-DOS users multiplied (perhaps even before), people started using file extensions: .DOC, .XLS, .MP3. It's become so ingrained that nowadays people think of extensions as the real deal, "file types", unknowingly stretching the definition. To everyone who is interested, here's the memo: file extensions are NOT file types. NOT. NOT. Whoever uses the terms "file extension" and "file type" interchangeably is an ignorant moron. File extensions are just one way (and a terrible way at that) to distinguish file types. The fact that they are a bad thing shows up in inconsistencies (two files apparently named the same, because extensions are hidden) and mass-mailer viruses (a file posing as a zip file is actually an executable). I've said it and I'll say it again: using extensions to do anything meaningful is a terrible design decision and should be abandoned ASAP. Sure, they solved the need to discern file types for MS-DOSers and Windowsers, for a while. Perhaps the fact that even Hollywood has used them in movie titles shows how bad they are =).

...Return to 2004... Today, we're mostly stuck using either the extension or the contents of a file to discern its type. Since the file type is a function of the file contents, it's logical that a description of the file's type be stored along with the file. But not directly in the file, because the file type is not data that belongs in the file: it's data about the file. Metadata again. But now we have the metadata store that MS-DOS didn't have. A metadata store that is standard, will get augmented with search technology, and is guaranteed to be stable. Someone has written it for us. We didn't have to agree on anything. Well, we have to agree to start using it now, but that's about it. I think it's about time we started taking advantage of it.
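Here is a rough sketch, under my own assumptions, of how a file manager (or a hypothetical libmimetype helper) could take advantage of it: prefer the type cached in the EA, and only sniff the contents when the EA is missing, writing the result back so later directory reads never open the file again. The attribute name, the helper names, and the trivial one-magic-number sniffer are placeholders, not real library calls.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

#define MIME_XATTR "user.mime_type"      /* illustrative attribute name */

/* Trivial stand-in for real content sniffing (shared-mime-info, file(1)):
 * peek at the first bytes and recognise a single magic number. */
static char *mime_sniff(const char *path)
{
    unsigned char head[4] = {0};
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t n = fread(head, 1, sizeof(head), f);
    fclose(f);
    if (n == 4 && memcmp(head, "%PDF", 4) == 0)
        return strdup("application/pdf");
    return strdup("application/octet-stream");
}

/* Returns a malloc'd MIME type, or NULL; the caller frees it. */
static char *get_mime_type(const char *path)
{
    char buf[256];
    ssize_t len = getxattr(path, MIME_XATTR, buf, sizeof(buf) - 1);
    if (len > 0) {                       /* fast path: type already in the EA */
        buf[len] = '\0';
        return strdup(buf);
    }

    char *mime = mime_sniff(path);       /* slow path: sniff once... */
    if (mime != NULL)                    /* ...then cache it in the EA */
        (void)setxattr(path, MIME_XATTR, mime, strlen(mime), 0);
    return mime;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        char *mime = get_mime_type(argv[i]);
        printf("%s: %s\n", argv[i], mime ? mime : "(unknown)");
        free(mime);
    }
    return 0;
}

The first run over a directory pays the sniffing cost; every run after that is a cheap EA read, which is exactly the point.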
Since Nautilus and Konqueror (the two most prominent file managers) both play such a central role, their support is crucial. This is the way it could work:

1) Applications start tagging each file with an EA entry (tentatively named mimetype) that contains the MIME type of the file. Portable libraries would need to be written (libmimetype? libmetadata? a "libmimetype" on top of "libmetadata"? KFileMetaInfo extensions?) to automate this job and even deduce and apply more-or-less functional fallbacks when EAs can't be used (this has to be easy for developers, perhaps a one-liner in C).

2) File managers start tagging files with the file type they determine for each file on the first visit to its directory. From that point on, file managers never again use the file contents or extension to ascertain the file type; they use the EA mimetype entry directly. For those who have looked into WinFS, this is akin to "promoting" files into WinFS.

This has advantages:

1) OpenOffice documents won't ever be detected as Zip files again =)
2) Using EAs as file type stores is much faster than sniffing each file (I concede extensions are even faster, but since they are BAD, they are disqualified).
3) New things become possible (file notes? icons? etcetera... it's only a matter of standardizing them).
4) Downloading a file from the Internet isn't a problem either (the MIME type is transmitted by the most frequently used protocols, and file managers can "promote" the file whenever they "see" it for the first time).

For 2005:

* Faster Linux file managers
* More accurate file types (since every app writes file types along with files, there's much more certainty)
* New features (per-file icons, notes on files, ACLs)
* Medusa-like/Storage-like search services that take advantage of EAs (instead of sniffing files for metadata, the EA store could become the primary source for it, and when files are moved around or sent over the Internet, metadata is reintegrated back into the file whenever possible, as with MP3 files)
* Linux leading the pack

I see this plan as the way to stop importing idiocy and start exporting innovations.

> > Once you've determined the file type and stored it in an EA, subsequent
> > reads would be faster than sniffing the files, for all the
> > aforementioned reasons.

--
Manuel Amador (Rudd-O)
GPG key ID: 0xC1033CAD at keyserver.net