The great "kill-scrollkeeper-with-a-blunt-spoon" Proposal



Hi,

Since feature freeze is now upon us, I figured, what better time to
propose something new?

For the last little while, I have been musing on how to kill
scrollkeeper.  I don't know if Shaun has a plan for this or not, but I
got bored enough to write all my thoughts down into a cohesive document.
I've attached this below.  It's fairly long to read through, so I'll
summarise here:
The basic plan would be to create a new package that provided library
access to the list of documents in a system.

This package would also provide a number of utility programs to register
new docs etc. (like scrollkeeper does now).  The internals of the
package would be substantially different to scrollkeeper (read: easier
to follow / understand), but it would be compatible with current
scrollkeeper, allowing easy replacement and a gentle upgrade path.  It
would also provide a migration tool to move from scrollkeeper to the new
system and possibly an emulation mode so it could be used as a drop-in
replacement for scrollkeeper.

Questions, comments, flames all welcome.
Thanks
Don

P.S. Shaun, if you have a plan for this, please let us know, so we don't
waste too much time pondering this proposal ;)  Cheers

The "Kill-scrollkeeper-with-a-blunt-spoon" Proposal
---------------------------------------------------

This is a plan I've been musing about for a little while to replace the ageing scrollkeeper system.  Scrollkeeper is used by Yelp to store the list of registered documents.  It has a number of problems (section 1).  It also appears to be used by KDE in KHelpCenter (although I'm not sure of the extent of this use).

1. The Problems
---------------

* Scrollkeeper is a pain
* Scrollkeeper isn't (even passively) maintained
* It is painfully slow to install new docs
* It is a pest to use, as you have to go through the command line
* If I see one more bug appear in bugzilla caused by scrollkeeper crashing, I will scream

1.1. Restarting the scrollkeeper project
----------------------------------------

Possibly the "simplest" way around the problem is to fork scrollkeeper and restart the project.  This would allow us to fix at least some of the problems outlined above and save having to change any of the client programs.  Due to the structure of scrollkeeper, a lot of problems would still remain - dumping the content to a tmp file and having clients parse it themselves, and clients having to open the omf files themselves.  The code is really quite complex as well.  Have a look through the source tree - the source files are crazy to navigate.  To learn how it works, someone would have to spend a fair amount of time going through it.  There would also be some fairly major architectural changes required.  This time, I feel, would be better spent writing a new package.

2. The Solution
---------------

Kill scrollkeeper.  Replace it with something better, faster, stronger.  Here, I'll call it Spoon (as it'll kill scrollkeeper).  It almost certainly wouldn't be called that if and when it's created - any suggestions for a name will be considered.  This section looks at what would be required in the scrollkeeper replacement.

2.1. Required of Spoon
----------------------

The fundamental requirements that Spoon absolutely must meet to replace scrollkeeper:
* Able to register documents
* Able to cope with different doc types and languages
* Able to provide information to interested apps
* Provide an upgrade-path from scrollkeeper

2.2. Desired Features
---------------------

Things that would be very, very nice to have.

* Platform- and desktop-agnostic [1]
* Cope with both KDE and GNOME style document locations [2]

[1] Ideally, this would allow all the desktop environments to use it, creating peace and harmony throughout the world.  In reality, if we can get GNOME and KDE using it, it would be a massive boost.
[2] Unless we can get Shaun and his counterpart in KDE to sit down and agree on a joint scheme without anyone losing any teeth.

3. Implementation
-----------------

As the package is intended to be desktop-agnostic, the use of glib is ruled out, which causes no end of hassle.

Spoon will be split into a number of utils.  These are outlined here:

3.1. The Library
----------------

The main thrust of Spoon is the library, libspoon.  This is a real, living library, complete with a pkg-config file and everything.  It provides access to all saved information.  It is described in more detail in section 6.

3.2. spoon-register
-------------------

Similar to scrollkeeper-install, scrollkeeper-uninstall and possibly other tools.
spoon-register <file>
spoon-register -u <file> (to uninstall)
Takes in an omf* and either installs it into the doc register (checking it's not already installed), or uninstalls it (if it is installed already and -u is used).  If it is requested to install and the file already exists, 

3.3. spoon-migrate
------------------

Migration tool that would be run on installation.  It looks for the scrollkeeper index and extracts all the info to populate the new spoon-db.

3.4. spoon-rebuilddb
--------------------

Similar functionality to scrollkeeper-rebuilddb.  Rebuilds the internal database from the omf* files.  Useful if (for some reason) the database gets corrupted.  Possible to fold it into spoon-register?

4. omf* files
-------------

To provide extra functionality but maintain backwards compatibility, the omf file used by scrollkeeper may have to be extended slightly.

Currently, there are the following entries in a scrollkeeper omf file:

Name            Required?       Purpose
<Standard Entries>
creator         mandatory       Who wrote the doc originally
maintainer      optional        Who currently looks after the doc
contributor     optional        People who add to the doc
title           mandatory       Document title
date            mandatory       Date (of last edit?)
version         optional        Version of document
subject         optional        Stores which category the doc belongs to
description     optional        The blurb that gets put under the title on the TOC
type            optional        The type of document (User's Guide, Manual etc.)
format          optional        Stores the dtd
identifier      optional        Stores the url location of the doc
source          optional        Heritage of document (?)
language        optional        Which language the doc is in
relation        optional        Stores the seriesid
coverage        optional        ?
rights          optional        What license the doc is distributed under
<sk-specific>
seriesid (attribute in relation)
Category (attribute in subject)
Language Code (attribute in language)
URL (attribute in identifier)
Mime Type (attribute in format)

In addition to these, I propose adding the following:
bugzilla-product     }  as optional attributes to identifier
bugzilla-component   }

This will allow Shaun's snazzy "Suggest an improvement" malarkey to be added to Yelp and may be of use to other people.

I'm also open to further additions.  I'm sure there are a few more useful attributes that could be added, if people put their minds to it.

5. Registering and Storing files
--------------------------------

This is where we start diverging from scrollkeeper.  When a new document is registered, the omf* file is copied into the omf* storage area (as we may need it later to rebuild the db).

The main files are stored in <prefix>/share/Spoon/.  Each language has its own Spoon-db-<langcode>.xml file, which will contain all the data.  The registering process will open and read all the data from the omf* file and determine the language code.

If a Spoon-db-<lc>.xml file exists, it is opened and all the current entries are read in.  If no file exists, a new file will be created.  Now, all the current entries are checked to see if the seriesid matches any current doc.  If it does, the entry is replaced with the new contents.  If there is no match, the new entry is appended.  The resulting XML file can then be saved again.
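
As a rough sketch of the replace-or-append step just described, something like the following could work in plain C.  The names Entry and db_upsert are made up for illustration, the entry is cut down to two fields, and reading and writing the XML file is left out entirely:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down entry: a real one would carry every omf field. */
typedef struct Entry {
    char *seriesid;
    char *title;
    struct Entry *next;
} Entry;

/* Duplicate a string with malloc (strdup is not in ISO C before C23). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Replace the entry whose seriesid matches, or append a new one.
 * Returns 1 if an entry was replaced, 0 if one was appended,
 * and -1 on allocation failure.
 */
int db_upsert(Entry **head, const char *seriesid, const char *title)
{
    Entry **link = head;
    char *new_title = copy_string(title);
    if (new_title == NULL)
        return -1;

    for (; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->seriesid, seriesid) == 0) {
            free((*link)->title);
            (*link)->title = new_title;
            return 1;
        }
    }

    /* No match: link now points at the last "next" pointer in the list. */
    Entry *e = malloc(sizeof *e);
    char *new_id = copy_string(seriesid);
    if (e == NULL || new_id == NULL) {
        free(e);
        free(new_id);
        free(new_title);
        return -1;
    }
    e->seriesid = new_id;
    e->title = new_title;
    e->next = NULL;
    *link = e;
    return 0;
}
```

Walking the list through a pointer-to-pointer means the empty-list case and the append case need no special handling.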

The advantage of this is that no prior knowledge of any languages is needed.  If a new language is encountered, a new XML file will be produced.  This may cause problems if an incorrect language identifier is given, in which case the entry will be stored in the XML file for the incorrect langcode.  This is better than what scrollkeeper currently does (as it fails silently and without even acknowledging its failure), but may not be ideal.

Uninstalling can work in a very similar way.  The language code of the omf* file is determined and the relevant XML file is read in.  The seriesid is compared to all the docs.  If one is found that matches, the entry is removed and the result is saved back to disk.
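
The removal step could be sketched in the same spirit.  Again, Entry, db_push and db_remove are hypothetical names, the entry holds only a seriesid, and the strings are not owned by the list, purely to keep the example short:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down entry; the string is not owned by the node here. */
typedef struct Entry {
    const char *seriesid;
    struct Entry *next;
} Entry;

/* Push a new node on the front of the list.  Returns 0, or -1 on failure. */
int db_push(Entry **head, const char *seriesid)
{
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->seriesid = seriesid;
    e->next = *head;
    *head = e;
    return 0;
}

/*
 * Unlink and free the first entry with a matching seriesid.
 * Returns 1 if an entry was removed, 0 if nothing matched.
 */
int db_remove(Entry **head, const char *seriesid)
{
    for (Entry **link = head; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->seriesid, seriesid) == 0) {
            Entry *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

A return value of 0 gives the tool a chance to tell the user that the doc was never registered, rather than failing silently.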

5.1 Migration
-------------

The migration path calls for a migration tool that will be run on scrollkeeper files.  This tool would take an argument that is the directory where all the content lists reside (in <LANG_CODE> directories).  The tool would iterate through each directory, reading in and parsing the content-list.  From this, the omf file will be found for each manual (along with other bits-n-pieces?).  A list is constructed, full of SpoonEntries, with only the omf field filled in.  The program then iterates through the list, opening each omf file and filling in the struct's values from the values read.  This list would then be saved to the appropriate XML save file.


6. The Library
--------------

Yes, a real library to interact with and everything!  As a first pass at an API, here we go:

void spoon_init (void)
void spoon_init (char * language)

These would initialise the library.  The language code is either determined from the $LANG / $LANGUAGE environment variable or taken from the passed-in language.
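
A minimal sketch of the language lookup, under a few assumptions that are not settled in this proposal: that $LANGUAGE is checked before $LANG, that <langcode> is the locale name with any encoding suffix dropped, and that "C" is used when nothing is set.  The function names are placeholders, and what to do with locale modifiers (the "@..." part) is left open.  Since C has no overloading, the two spoon_init variants above would need different names or a single function that accepts NULL:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Reduce a locale string such as "en_GB.UTF-8" or "de:en" to a bare
 * language code ("en_GB", "de").  Falls back to "C" when the input is
 * NULL, empty, or starts with a separator.  The exact form of
 * <langcode> is an assumption made for this sketch.
 */
void spoon_lang_code(const char *locale, char *out, size_t outlen)
{
    size_t n;

    if (outlen == 0)
        return;
    if (locale == NULL || *locale == '\0')
        locale = "C";

    /* Stop at the first list separator (':') or encoding suffix ('.'). */
    n = strcspn(locale, ":.");
    if (n == 0) {
        locale = "C";
        n = 1;
    }
    if (n >= outlen)
        n = outlen - 1;          /* truncate rather than overflow */
    memcpy(out, locale, n);
    out[n] = '\0';
}

/* Assumed lookup order: $LANGUAGE first, then $LANG. */
void spoon_lang_from_env(char *out, size_t outlen)
{
    const char *value = getenv("LANGUAGE");
    if (value == NULL || *value == '\0')
        value = getenv("LANG");
    spoon_lang_code(value, out, outlen);
}
```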

The library then opens the language XML file (if it exists) and reads the db into a list of structs.  It then reads in the 'C' XML file in the same way, ignoring any entry whose seriesid is already in the list.  This follows the current sk style where C is the fallback.

6.1. The Struct
---------------

struct SpoonEntry {
}

contains all the fields from the omf* file that could ever possibly be of interest.  If something isn't set, it defaults to NULL.

The "type" field may also be replaced by an enum (to save a tiny little amount of memory).  Other types may be replaced by appropriate variable types.
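
Purely as an illustration of what the struct might end up looking like, here is one possible layout.  The field names are taken from the omf table in section 4; the enum values, the omf_path field and the spoon_entry_clear helper are guesses and not part of the proposal proper:

```c
#include <stddef.h>

/* Hypothetical values; the proposal only names "User's Guide, Manual etc." */
typedef enum {
    SPOON_TYPE_UNSET = 0,
    SPOON_TYPE_USERS_GUIDE,
    SPOON_TYPE_MANUAL
} SpoonDocType;

/* One possible layout.  Unset string fields are NULL. */
typedef struct SpoonEntry {
    char *creator;
    char *maintainer;
    char *title;
    char *date;
    char *version;
    char *category;       /* attribute of subject */
    char *description;
    SpoonDocType type;
    char *mime_type;      /* attribute of format */
    char *url;            /* attribute of identifier */
    char *language_code;  /* attribute of language */
    char *seriesid;       /* attribute of relation */
    char *rights;
    char *omf_path;       /* the "omf field" used during migration */
} SpoonEntry;

/* Reset every field to its unset value (NULL or SPOON_TYPE_UNSET). */
void spoon_entry_clear(SpoonEntry *entry)
{
    /* Static storage is zero-initialised, so every pointer here is NULL. */
    static const SpoonEntry empty = { NULL };
    *entry = empty;
}
```

Copying from a static empty struct avoids relying on memset producing NULL pointers, which the C standard does not strictly guarantee.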

</end_6.1>

onward with the API:
SpoonEntry * get_next_struct ()
If it hasn't been called before, it sets an internal variable to point to the head of the list.  If it has been called before, it points to the next SpoonEntry in the list.  It then returns a copy of the struct pointed to.  The user is then responsible for freeing the struct when finished.  If it's at the end of the list, it returns NULL.

void for_each (function_to_call)

Goes through each struct in the list and calls the function given.  Basically, gives an alternative to calling get_next_struct in a while loop.

void reset_list ()

Basically resets the pointer to the start of the list.
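
To make the three calls above a bit more concrete, here is a rough sketch of how they could sit on top of a plain linked list.  The entry is cut down to two fields; Node, spoon_add_for_demo and free_struct are hypothetical helpers, and the callback signature for for_each is only a guess:

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down entry for illustration; a real one would carry every omf field. */
typedef struct SpoonEntry {
    char *seriesid;
    char *title;
} SpoonEntry;

/* The list node lives outside SpoonEntry, so callers never see "next". */
typedef struct Node {
    SpoonEntry *entry;
    struct Node *next;
} Node;

static Node *list_head;  /* would be filled by spoon_init */
static Node *cursor;     /* next node get_next_struct will hand out */
static int started;      /* has get_next_struct been called since reset? */

static char *copy_string(const char *s)
{
    size_t n;
    char *copy;
    if (s == NULL)
        return NULL;
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical stand-in for the XML loader: prepend one entry. */
int spoon_add_for_demo(const char *seriesid, const char *title)
{
    Node *n = malloc(sizeof *n);
    SpoonEntry *e = malloc(sizeof *e);
    if (n == NULL || e == NULL) {
        free(n);
        free(e);
        return -1;
    }
    e->seriesid = copy_string(seriesid);
    e->title = copy_string(title);
    n->entry = e;
    n->next = list_head;
    list_head = n;
    return 0;
}

/* Free a copy handed out by get_next_struct. */
void free_struct(SpoonEntry *e)
{
    if (e == NULL)
        return;
    free(e->seriesid);
    free(e->title);
    free(e);
}

/* Returns a copy of the next entry, or NULL at the end of the list
 * (and, in this sketch, also if the copy cannot be allocated). */
SpoonEntry *get_next_struct(void)
{
    SpoonEntry *copy;

    if (!started) {
        cursor = list_head;
        started = 1;
    }
    if (cursor == NULL)
        return NULL;

    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->seriesid = copy_string(cursor->entry->seriesid);
    copy->title = copy_string(cursor->entry->title);
    cursor = cursor->next;
    return copy;
}

/* The next get_next_struct call starts from the head again. */
void reset_list(void)
{
    started = 0;
}

/* Calls fn on each stored entry; does not touch the cursor. */
void for_each(void (*fn)(const SpoonEntry *entry))
{
    for (Node *n = list_head; n != NULL; n = n->next)
        fn(n->entry);
}
```

One thing the sketch makes obvious: a single internal cursor means two parts of the same program cannot iterate independently, and it is not thread-safe.  That may be acceptable for a first pass, but it is worth keeping in mind.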

Other functions:

SpoonEntry * get_struct_by_name (char *name)
   "         get_struct_by_seriesid (char *id)

Searches the list for a struct with the name / seriesid.  Provides access to structs.  Partially fixes bug #333948 (or provides a way to solve it)

There will almost certainly be more functions added as requirements dictate, but these are the main ones for now that would allow Yelp to be set-up to run properly.

As Spoon is intended to be cross-platform / desktop, it can't take advantage of the coolness of glib.  Instead, it'll do things with a simple linked-list, with each list entry having a SpoonEntry * and a pointer to the next on the list.  This also has the advantage that it stops people accessing things they shouldn't (as opposed to having the next pointer directly in the SpoonEntry).

It also has a problem with XML processing.  This is needed to parse the XML save file, parse the omf* files and save the XML save files again.  Using libxml2 may be a possibility.  There is also the possibility of using TinyXML or another library that could be embedded in the source directory.  This would need to be investigated.

7. Migration path from sk
-------------------------

One of the goals of the project is to provide a smooth upgrade path from scrollkeeper to Spoon.  This is outlined in this section.

When a clean install is done, the install-hook checks for the existence of <prefix>/etc/scrollkeeper.conf.  If this exists, the omf directory is read from the file.  Each omf file is migrated to the new location (or the path is reused as the omf dir?).  The directory <prefix>/var/scrollkeeper is then checked to ensure it exists and the spoon-migrate script is called with <prefix>/var/scrollkeeper as its argument.  This is described in section 3.3.

If either of the scrollkeeper files is missing, or there is a previous Spoon install (i.e. the files <prefix>/share/Spoon/*xml exist), the migration isn't run.

It is also possible to change where it checks for scrollkeeper files at configure time (using the --with-sk-config= and --with-sk-cllocation= or similar).

The package will also supply wrapper scripts for scrollkeeper-install, scrollkeeper-uninstall and scrollkeeper-rebuilddb.  These would print a huge warning to the terminal saying "This script is obsolete, please move to $NEW_TOOL" and then run the equivalent Spoon command with the correct arguments.

This will allow people to move slowly to the new framework and still keep things working.  There is also the potential to create scrollkeeper-get-cl / scrollkeeper-get-content-list compatibility scripts that do what the originals currently do (by loading the appropriate XML save files and dumping them to a /tmp/content.1 file in a format that can be parsed in much the same way as the current one).

After a set amount of time, with constant reminders and bugs filed and moving Yelp etc to the new framework, the wrapper scripts will be disabled by default and the install-hook won't be run any more.

After another amount of time, the wrapper scripts would be removed from the package and the install-hook would be rewritten.  This would also allow the removal of the spoon-migrate script.

8. Conclusions
--------------

So, this is the end of this proposal.  There were a series of requirements listed at the start of the document:
* Able to register documents
This is dealt with using spoon-register (section 3.2), analogous to scrollkeeper-install.

* Able to cope with different doc types and languages
Since the architecture is based around the omf files of scrollkeeper, the doc type requirement is automatically handled.  Languages are coped with because no prior knowledge of any language is needed (section 5), so there is no need to rely on translations.

* Able to provide information to interested apps
The library is small and self-contained.  It will allow access to anyone that's interested.  The database can't be changed through the library, though: the library provides read-only access.  This is intentional, to save random people writing nasty things to it, and also to save having to be run / installed as root.

* Provide an upgrade-path from scrollkeeper
Section 7 deals with this.

Things that would be very, very nice to have.

* Platform- and desktop-agnostic
Since the library (and all the tools) would be written in vanilla C, there shouldn't be any problems with access.

* Cope with both KDE and GNOME style document locations
Implicitly handled by the omf files.  Shouldn't need to worry about this.

Each requirement is met in this.  It should also hopefully fix a few of scrollkeeper's flaws.  It provides library access instead of going through the command line.  It will also remove any "Off-by-one" errors that occur.  If any sneak in, the person responsible will be subject to a lynching.  Scrollkeeper has (potentially) far too many of them.  Hopefully, it will be made quicker than scrollkeeper (for registering docs at least).  The library access also has the absolutely huge bonus that if a crash / invalid free does occur, a stack-trace can be obtained easily, which will actually point to where the problem is.

The End.















