What needs doing in the long run



The table layout and database aspects of Storage have been our focus
lately, presumably because they are most relevant to Sutra (which is a
more immediate goal). However, Marco has pointed out that both from a
self-motivation and PR standpoint it is a good idea to provide a
working, demo-able "desktop storage" at the same time. 

So with that in mind, I'm drafting a broad list (with commentary) of the
various tasks and activities that need doing in the long run for
Storage. These are at a wide range of granularity. The short story is
that there are a lot of interesting and hard problems remaining, many of
which, done right, could really shake up the specifics of the store. So
we should get stuff ready for Sutra, but at the same time we shouldn't
assume that what we use for Sutra will be right long term.

1) Multi-user support
There are a lot of aspects to this, and a lot of ways to approach it.
Sub-issues include permissions, authentication (see #9), and using
something like ACLs. I'm pretty sure we need a system daemon to make
this work. We could either have a PostgreSQL server that's protected by
views installed as root and allows different functions, OR we could have
our own daemon that sits between PostgreSQL and libstorage. Long term,
our own daemon would allow us to go beyond PostgreSQL's idea of accounts
and to use other naming services and authentication systems. It *might*
add some latency? OTOH, we could possibly optimize query results in
there to be well suited to libstorage Item objects. A storage daemon
could also manage the store, cleaning out dead/invalid nodes
periodically, optimizing table layout: whatever automated tasks need to
be done.
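To make the "our own daemon" option concrete, here's a minimal sketch of
an intermediary that checks per-node permissions before forwarding
requests to the backend. Everything here (StorageDaemon, FakeBackend,
the ACL shape) is invented for illustration, not real libstorage API;
the real daemon would sit in front of PostgreSQL, not a dict.

```python
# Hypothetical sketch: a daemon between clients and the SQL backend
# that enforces an ACL before any node data is returned.

class AccessDenied(Exception):
    pass

class StorageDaemon:
    def __init__(self, backend):
        self.backend = backend          # in reality, a PostgreSQL connection
        self.acls = {}                  # node_id -> {user: set of perms}

    def grant(self, node_id, user, perm):
        self.acls.setdefault(node_id, {}).setdefault(user, set()).add(perm)

    def _check(self, node_id, user, perm):
        if perm not in self.acls.get(node_id, {}).get(user, set()):
            raise AccessDenied("%s lacks %r on node %s" % (user, perm, node_id))

    def read_node(self, user, node_id):
        self._check(node_id, user, "read")
        return self.backend.fetch(node_id)

# Stand-in backend so the sketch runs without a database.
class FakeBackend:
    def __init__(self, nodes):
        self.nodes = nodes
    def fetch(self, node_id):
        return self.nodes[node_id]
```

The point of this layering is that the ACL model is entirely ours, so it
could later consult LDAP or any other naming service (see #9) instead of
PostgreSQL accounts.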

2) Port NL parser to use MRS (minimal recursion semantics) directly
I'm working on this slowly but surely. It's a very hard problem,
especially because MRS doesn't have enough descriptive papers written
about it. Looks like a new one is coming out "soon" that might be really
helpful.

3) Figure out how to provide accurate "file sizes" 
Is this needed? Still, we should be able to give people some idea how
much of their precious disk resources are consumed by a big file. Easy
to do with blobs, very hard to do with things broken apart into nodes.
:-/
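For the hard case, one plausible approximation is to walk a document's
node tree and sum the payload bytes of each node. This is a sketch under
assumed structure (a node as a dict with "payload" and "children"); the
real table layout would need a SQL aggregate instead.

```python
# Hypothetical sketch: approximate the "file size" of a node-decomposed
# document by recursively summing per-node payload sizes.

def node_size(node):
    size = len(node.get("payload", b""))
    for child in node.get("children", []):
        size += node_size(child)
    return size

doc = {
    "payload": b"root attrs",
    "children": [
        {"payload": b"first paragraph"},
        {"payload": b"second paragraph", "children": [{"payload": b"note"}]},
    ],
}
```

Note this undercounts real disk usage (indexes, per-row overhead), which
is part of why "accurate" sizes are hard for anything that isn't a blob.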

4) Get notification feedback working 
Using PostgreSQL's NOTIFY and LISTEN, which unfortunately ties us to
PostgreSQL for now, but c'est la vie. A storage daemon, as proposed in
(1), could potentially handle notification tasks for SQL servers that
don't support it. It could look at requests to modify nodes and send
notification events, handle registering listeners, etc. There's no
reason this has to be done in the SQL server itself.
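The daemon-side fallback just described is essentially a listener
registry that the daemon consults on every write it forwards. A minimal
sketch of that shape (all names hypothetical):

```python
# Hypothetical sketch: the daemon dispatches change events itself for
# SQL servers that lack NOTIFY/LISTEN. Callbacks are registered per
# node and fired when a modification request passes through.

class NotificationHub:
    def __init__(self):
        self.listeners = {}   # node_id -> list of callbacks

    def listen(self, node_id, callback):
        self.listeners.setdefault(node_id, []).append(callback)

    def node_modified(self, node_id, change):
        # The daemon calls this as it forwards each write to the backend.
        for cb in self.listeners.get(node_id, []):
            cb(node_id, change)

hub = NotificationHub()
events = []
hub.listen(42, lambda nid, change: events.append((nid, change)))
hub.node_modified(42, "attribute-set")
```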

5) Design an API, write code for allowing multiple processes to work
simultaneously on a single node
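One candidate shape for such an API is optimistic concurrency: each node
carries a version counter, and a write succeeds only if the writer saw
the latest version, otherwise it must merge and retry. This is a sketch
of that idea only, not a proposal for the actual libstorage API; all
names here are invented.

```python
# Hypothetical sketch: optimistic concurrency for two processes editing
# the same node. A stale writer gets a Conflict instead of silently
# clobbering the other's change.

class Conflict(Exception):
    pass

class Node:
    def __init__(self, data):
        self.data = data
        self.version = 0

def update(node, new_data, seen_version):
    if seen_version != node.version:
        raise Conflict("node changed since version %d" % seen_version)
    node.data = new_data
    node.version += 1
    return node.version
```

A Hydra-style editor (see #8) would need something finer-grained than
whole-node versions, but the conflict-detection shape is the same.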

6) Figure out how we are going to do revisions 
Probably the hardest remaining issue that we have not investigated at
all. It might be wise to find somebody with lots of experience in this
area, because I know many of the best algorithms are very sophisticated
and hard to implement. *Doing this well could radically alter the table
format*
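Just to pin down the shape of the problem, here is the most naive
possible scheme: full snapshots chained by parent pointers. Real
revision systems use deltas and much more sophisticated algorithms (the
point of finding an expert), and storing this in tables is exactly what
could alter the format; every name below is invented.

```python
# Hypothetical sketch: the simplest revision model, full snapshots with
# parent pointers. Illustrative only; a real design would store deltas.

class RevisionStore:
    def __init__(self):
        self.revs = {}      # rev_id -> (parent_id, data)
        self.next_id = 1

    def commit(self, parent_id, data):
        rev_id = self.next_id
        self.next_id += 1
        self.revs[rev_id] = (parent_id, data)
        return rev_id

    def history(self, rev_id):
        # Walk parent pointers back to the initial revision.
        chain = []
        while rev_id is not None:
            parent_id, data = self.revs[rev_id]
            chain.append(data)
            rev_id = parent_id
        return list(reversed(chain))
```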

7) Increase the size of the semantic grammar about 20x

8) Write some sample applications using libstorage's various features.
The ones that come to mind right now are a programming editor (that
allows people to work simultaneously on code like Hydra, and stores
stuff in a nice searchable object format), a mail reader (so we can show
off the e-mail store capabilities and have something to do when people
double click: this might take the form of an Evolution plugin rather
than a new mail reader), and maybe a simple rich-text editor (ultra-simple word
processor basically, proving that the base concepts are sound for word
processing). Since Storage already has python bindings, these should be
fairly fast to develop using Pygtk when the appropriate features are
implemented in libstorage itself.

9) Figure out how Storage should integrate with naming and
authentication services.
HTTP authentication? LDAP? SSL certificates? etc. I think it's hokey to
rely on people having logins on the PostgreSQL server long term!

10) Implement a better caching and pre-fetching model in libstorage
This could improve performance of storage items by an order of
magnitude.
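A sketch of what "caching and pre-fetching" could mean here: an LRU
cache of nodes where a cache miss also pulls the node's children in, on
the guess that a traversal will want them next. The node shape and
backend interface are invented for illustration.

```python
# Hypothetical sketch: LRU node cache with child pre-fetching. One
# backend round-trip per miss; hot traversals are served from memory.

from collections import OrderedDict

class NodeCache:
    def __init__(self, backend, capacity=128):
        self.backend = backend      # backend.fetch(node_id) -> node dict
        self.capacity = capacity
        self.cache = OrderedDict()
        self.misses = 0

    def _load(self, node_id):
        self.misses += 1
        node = self.backend.fetch(node_id)
        self.cache[node_id] = node
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return node

    def get(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as recently used
            return self.cache[node_id]
        node = self._load(node_id)
        for child_id in node.get("children", []):   # pre-fetch guess
            if child_id not in self.cache:
                self._load(child_id)
        return node
```

The order-of-magnitude claim hinges on traversal patterns being
predictable enough that pre-fetched children are actually used; that
needs measuring against real Item access patterns.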

11) Provide GErrors for all the libstorage functions, and start
implementing much better error returning and handling

12) Automate setting up of stores
This could be done partially with a Storage daemon, as proposed in (1).

13) Write StorageStore
This shouldn't be too bad

14) Design an intermediate query language
Because depending on exact SQL syntax and certain table layouts is
probably a bad idea. Marco and I briefly discussed the option of using
XPath, which seems like it is at least interesting and would be a big
plus if we wanted this to be an "XML store". Another option would be to
write a query API that allowed building queries that were as complex as
our particular table system allowed, and then later build an XPath
interpreter on top of this. That'd probably be smarter.
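A toy illustration of the builder-first approach: the query is a plain
object, and only the final compile step knows the table layout, so the
layout can change (or an XPath front-end can be added) without touching
callers. The Query class and the attrs table it compiles to are entirely
made up; real code would also use parameter binding, not string
interpolation.

```python
# Hypothetical sketch: a query builder that is compiled to SQL as a
# last step, insulating applications from the exact table layout.

class Query:
    def __init__(self):
        self.conditions = []

    def attr_equals(self, name, value):
        self.conditions.append((name, value))
        return self

    def to_sql(self):
        # Invented schema, and unsafe interpolation for brevity only.
        where = " AND ".join(
            "attrs.name = '%s' AND attrs.value = '%s'" % (n, v)
            for n, v in self.conditions
        )
        return "SELECT node_id FROM attrs WHERE " + where

q = Query().attr_equals("mime-type", "text/plain")
```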

15) Implement a storage daemon
Given the number of problems this could solve, this may end up being one
of the first tasks. That means defining a protocol for libstorage to
speak to the daemon, handling compression of returned data, etc. But it
also needs to not slow things down a bunch.
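On the "compression without slowing things down" point, one common trick
is length-prefixed frames where the payload is compressed only when that
actually saves bytes, so small requests skip the overhead. This wire
format is purely an assumption for illustration, not a protocol
proposal.

```python
# Hypothetical sketch: length-prefixed frames with opportunistic zlib
# compression. Small payloads are sent uncompressed.

import struct
import zlib

def pack_frame(payload):
    compressed = zlib.compress(payload)
    if len(compressed) < len(payload):
        body, flag = compressed, 1
    else:
        body, flag = payload, 0
    # Header: 4-byte big-endian body length, 1-byte "is compressed" flag.
    return struct.pack("!IB", len(body), flag) + body

def unpack_frame(frame):
    length, flag = struct.unpack("!IB", frame[:5])
    body = frame[5:5 + length]
    return zlib.decompress(body) if flag else body
```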

-Seth



