[Gnome-devtools] Parsing/Language Analysis



Hello,
  I've been doing a bit of research lately on parsing and language
analysis. I've agreed with most of the stuff I read, so first off, here
are the references I've looked at:

[1] http://www.cs.berkeley.edu/~harmonia/overview.html - read (or skim)
this first
[2] http://www.cs.berkeley.edu/~twagner/TR.ps.Z - only if you're
interested in the hardcore algorithms

Background:
--------------

To summarize, the basic framework of the Harmonia project looks like the
following:

http://www.cs.berkeley.edu/~harmonia/arch.gif

It comes in three parts:

The "Language Kernel" handles the following:
Structure of the document - "Syntax Tree Abstraction"
Incremental Parsing
Incremental Lexing
Incremental Attribution Support - Not sure exactly what this does
Fine-grain version control
Tree Pattern Matching

All the services above are written in a very general way, so multiple
languages can be supported. 

The "Language Modules":
The actual configuration of the language kernel is done via dynamic
libraries. These dynamic libraries are appropriately called "Language
Modules". The language modules contain the batch lexer, parse tables,
and other language-specific services and data.
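To make the "batch lexer" part of a module a bit more concrete, here is a minimal hand-written sketch for a made-up toy language. The token kinds, the `Token` struct, and `toy_lex()` are hypothetical names for illustration only; a real module would more likely get its scanner from flex.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a hypothetical toy language module. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_SPACE, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    size_t start, len;   /* byte offsets into the source text */
} Token;

/* Batch-lex the whole of `src` into `out` (at most `max` tokens) and
 * return the number of tokens produced. Hand-written for illustration;
 * a real module would probably use a flex-generated scanner. */
size_t toy_lex(const char *src, Token *out, size_t max) {
    assert(src != NULL && out != NULL);
    size_t i = 0, n = 0;
    while (src[i] != '\0' && n < max) {
        size_t start = i;
        unsigned char c = (unsigned char)src[i];
        TokKind kind;
        if (isalpha(c) || c == '_') {
            kind = TOK_IDENT;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
        } else if (isdigit(c)) {
            kind = TOK_NUMBER;
            while (isdigit((unsigned char)src[i])) i++;
        } else if (isspace(c)) {
            kind = TOK_SPACE;    /* whitespace is kept as a token */
            while (isspace((unsigned char)src[i])) i++;
        } else {
            kind = TOK_PUNCT;    /* any other single character */
            i++;
        }
        out[n].kind = kind;
        out[n].start = start;
        out[n].len = i - start;
        n++;
    }
    return n;
}
```

For example, `"x1 = 42;"` lexes into six tokens: an identifier, a space, `=`, a space, a number, and `;`.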

And lastly the user applications:
gIDE, gtkeditor, etc.

Since the framework is very general, a large number of language-based
tools can be built around it. The obvious ones would be an editor, a class
browser, etc., but more advanced tools are possible as well.

Implementation:
-----------------

If we follow the Harmonia framework fairly strictly, I think the Gtk
object system is well suited to the task.

Language Kernel:
Data:
Would most likely be derived from GtkObject.
Would contain the document text in the syntax tree.
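A minimal sketch of what keeping the document text in the syntax tree could look like, assuming the leaves (tokens, including whitespace tokens) hold the text and interior nodes hold only a grammar symbol. Under that assumption the document text is recovered by a left-to-right walk over the leaves. `SyntaxNode` and `tree_get_text()` are hypothetical names, not Harmonia's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical syntax-tree node: interior nodes carry a grammar symbol,
 * leaves (tokens) carry the actual document text. */
typedef struct SyntaxNode {
    const char *symbol;             /* e.g. "expr" or "IDENT" */
    const char *text;               /* token text; NULL for interior nodes */
    struct SyntaxNode **children;   /* NULL for leaves */
    size_t n_children;
} SyntaxNode;

/* Copy the text of every leaf under `n`, left to right, into `buf`
 * starting at offset `pos`; returns the offset just past the copy. */
static size_t collect(const SyntaxNode *n, char *buf, size_t cap, size_t pos) {
    if (n->text != NULL) {
        size_t len = strlen(n->text);
        assert(pos + len < cap);    /* leave room for the final '\0' */
        memcpy(buf + pos, n->text, len);
        return pos + len;
    }
    for (size_t i = 0; i < n->n_children; i++)
        pos = collect(n->children[i], buf, cap, pos);
    return pos;
}

/* Rebuild the document text from the tree; returns its length. */
size_t tree_get_text(const SyntaxNode *root, char *buf, size_t cap) {
    assert(cap > 0);
    size_t len = collect(root, buf, cap, 0);
    buf[len] = '\0';
    return len;
}
```

Because whitespace lives in the leaves too, the walk reproduces the original text exactly rather than a pretty-printed version of it.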

Methods:
I imagine there would be methods for inserting and deleting text, as
well as getting/setting information at the nodes of the syntax tree.
If we decide to support versioning, there would also be methods for
undoing operations and for setting the tree to a particular version.
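A sketch of the insert/delete/undo/set-version methods, assuming the simplest possible versioning scheme: a full copy of the text per version. This is deliberately naive; Harmonia's fine-grain version control works on the tree, not on whole-text snapshots. All names here (`Document`, `doc_insert()`, `doc_undo()`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical document with naive whole-text snapshots per version.
 * Only illustrates the shape of the methods; error handling and
 * freeing the document are omitted. */
#define MAX_VERSIONS 64

typedef struct {
    char *versions[MAX_VERSIONS]; /* versions[i] is the full text at version i */
    int current;                  /* version currently being viewed */
    int latest;                   /* newest version recorded */
} Document;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    memcpy(p, s, n);
    return p;
}

void doc_init(Document *d, const char *text) {
    d->versions[0] = dup_str(text);
    d->current = d->latest = 0;
}

/* Record `text` as a new version after `current`; any versions that
 * were undone past are discarded, as in a typical undo stack. */
static void doc_commit(Document *d, char *text) {
    assert(d->current + 1 < MAX_VERSIONS);
    for (int i = d->current + 1; i <= d->latest; i++)
        free(d->versions[i]);
    d->versions[++d->current] = text;
    d->latest = d->current;
}

void doc_insert(Document *d, size_t pos, const char *s) {
    const char *old = d->versions[d->current];
    size_t len = strlen(old), n = strlen(s);
    assert(pos <= len);
    char *text = malloc(len + n + 1);
    memcpy(text, old, pos);
    memcpy(text + pos, s, n);
    memcpy(text + pos + n, old + pos, len - pos + 1); /* includes '\0' */
    doc_commit(d, text);
}

void doc_delete(Document *d, size_t pos, size_t count) {
    const char *old = d->versions[d->current];
    size_t len = strlen(old);
    assert(pos + count <= len);
    char *text = malloc(len - count + 1);
    memcpy(text, old, pos);
    memcpy(text + pos, old + pos + count, len - pos - count + 1);
    doc_commit(d, text);
}

void doc_undo(Document *d) { if (d->current > 0) d->current--; }

void doc_set_version(Document *d, int v) {
    assert(v >= 0 && v <= d->latest);
    d->current = v;
}

const char *doc_text(const Document *d) { return d->versions[d->current]; }
```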

Algorithms:
See [2] for a description of the incremental algorithms employed.

Signals:
Signals would be emitted primarily from changes in the syntax tree, such
as a new node being created, or a node being deleted, or a node being
changed.
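In the real design these would be GtkObject signals. To keep the sketch self-contained, here is a plain-C stand-in showing the connect/emit pattern for the three tree signals. The signal names, `SignalTable`, `signal_connect()`, and `signal_emit()` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for GtkObject signals; the names are hypothetical. */
typedef enum { NODE_CREATED, NODE_DELETED, NODE_CHANGED, N_SIGNALS } TreeSignal;

typedef struct SyntaxNode SyntaxNode;   /* opaque in this sketch */
typedef void (*TreeHandler)(TreeSignal sig, SyntaxNode *node, void *user_data);

#define MAX_HANDLERS 8

typedef struct {
    TreeHandler handlers[N_SIGNALS][MAX_HANDLERS];
    void *user_data[N_SIGNALS][MAX_HANDLERS];
    int n_handlers[N_SIGNALS];
} SignalTable;

/* Register a handler for `sig`; returns 0 on success, -1 if full. */
int signal_connect(SignalTable *t, TreeSignal sig, TreeHandler h, void *data) {
    assert(sig >= 0 && sig < N_SIGNALS);
    if (t->n_handlers[sig] == MAX_HANDLERS) return -1;
    int i = t->n_handlers[sig]++;
    t->handlers[sig][i] = h;
    t->user_data[sig][i] = data;
    return 0;
}

/* Invoke every handler connected to `sig`, in connection order. */
void signal_emit(SignalTable *t, TreeSignal sig, SyntaxNode *node) {
    for (int i = 0; i < t->n_handlers[sig]; i++)
        t->handlers[sig][i](sig, node, t->user_data[sig][i]);
}

/* Example handler: counts how many times it has been invoked. */
static void count_calls(TreeSignal sig, SyntaxNode *node, void *user_data) {
    (void)sig;
    (void)node;
    ++*(int *)user_data;
}
```

The kernel would call `signal_emit()` from inside its insert/delete methods whenever the incremental parser creates, deletes, or changes a node.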

Language Modules:
These would be implemented using a few different techniques. First, I
think some kind of dynamic object system might be required to support
any kind of language inheritance. The parse tables would most likely be
generated by bison, and the lexer by flex. Other custom code would be
allowed in the language modules as well.
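One way language inheritance could work, sketched under the assumption that a language lists only what it adds and points at the language it extends, so lookups walk up the parent chain much like single-inheritance method lookup. The `Language` struct, the keyword lists, and `language_is_keyword()` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: each language lists only its own keywords and points
 * at the language it extends (NULL for a base language). */
typedef struct Language {
    const char *name;
    const char *const *keywords;      /* NULL-terminated */
    const struct Language *parent;
} Language;

/* Search this language, then each ancestor in turn. */
int language_is_keyword(const Language *lang, const char *word) {
    assert(word != NULL);
    for (; lang != NULL; lang = lang->parent)
        for (const char *const *k = lang->keywords; *k != NULL; k++)
            if (strcmp(*k, word) == 0) return 1;
    return 0;
}

static const char *const c_keywords[] = {"int", "if", "return", NULL};
static const char *const cpp_keywords[] = {"class", "template", NULL};

static const Language lang_c = {"c", c_keywords, NULL};
static const Language lang_cpp = {"c++", cpp_keywords, &lang_c};
```

With this layout the C++ module inherits every C keyword without repeating it, while the C module knows nothing about `class`.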

I suppose the Language Modules would also need an agreed-upon interface,
as well as some standard way of specifying and building them.
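A possible shape for that interface, assuming each module exports a single descriptor struct carrying an ABI version, a name, the file extensions it handles, and function pointers for its services. In practice the kernel might load the descriptor from a shared library (for example with GLib's GModule, via `g_module_open()` and `g_module_symbol()`); here the module is linked statically so the sketch runs on its own. `LanguageModule`, `LANG_MODULE_ABI`, and `find_module()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor every language module would export. */
#define LANG_MODULE_ABI 1

typedef struct {
    int abi_version;                  /* must equal LANG_MODULE_ABI */
    const char *name;                 /* e.g. "c" */
    const char *const *extensions;    /* NULL-terminated, e.g. ".c" */
    int (*is_keyword)(const char *word);  /* one example service */
} LanguageModule;

/* --- a statically linked example module --- */
static int c_is_keyword(const char *w) {
    static const char *const kw[] = {"int", "return", "if", "while", NULL};
    for (size_t i = 0; kw[i] != NULL; i++)
        if (strcmp(w, kw[i]) == 0) return 1;
    return 0;
}

static const char *const c_exts[] = {".c", ".h", NULL};
static const LanguageModule c_module = {LANG_MODULE_ABI, "c", c_exts, c_is_keyword};

/* --- kernel side: choose a module by file extension --- */
const LanguageModule *find_module(const LanguageModule *const *mods, size_t n,
                                  const char *filename) {
    assert(filename != NULL);
    const char *dot = strrchr(filename, '.');
    if (dot == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        if (mods[i]->abi_version != LANG_MODULE_ABI)
            continue;                 /* skip modules built against another ABI */
        for (const char *const *e = mods[i]->extensions; *e != NULL; e++)
            if (strcmp(dot, *e) == 0) return mods[i];
    }
    return NULL;
}
```

The version field is there so the kernel can refuse a module compiled against an older interface instead of crashing on a mismatched struct layout.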

User Applications:
User applications would most likely use the language kernel in the
following manner:

1. Create an instance of the language kernel.
2. Load in a language-specific module.
3. Connect to the appropriate signals.
4. Handle the signals.

Example:
In gtkeditor we want an implementation of syntax highlighting. So first
we would do steps 1 and 2. In step 3 we want to know when a node has
been created, deleted, or changed, so we connect to the appropriate
signals. If versioning were implemented, we could simply look at the
previous version of the document to know which parts to
highlight/unhighlight.
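For the highlighting case, a simple stand-in for "look at the previous version" is to compare the old and new text and trim their common prefix and suffix; what remains is the region to re-highlight. This swaps in a plain-text comparison for the tree-based change tracking described above, so treat it as an approximation. `Range` and `changed_range()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte range [start, end) in the new text that needs re-highlighting. */
typedef struct { size_t start, end; } Range;

/* Compare the previous and current versions of the document and return
 * the region of `new_text` that differs, found by trimming the longest
 * common prefix and then the longest common suffix. An empty range
 * (start == end) means text was only deleted at that point. */
Range changed_range(const char *old_text, const char *new_text) {
    assert(old_text != NULL && new_text != NULL);
    size_t old_len = strlen(old_text), new_len = strlen(new_text);

    size_t prefix = 0;
    while (prefix < old_len && prefix < new_len &&
           old_text[prefix] == new_text[prefix])
        prefix++;

    /* The suffix may not overlap the prefix in either string. */
    size_t suffix = 0;
    while (suffix < old_len - prefix && suffix < new_len - prefix &&
           old_text[old_len - 1 - suffix] == new_text[new_len - 1 - suffix])
        suffix++;

    Range r = {prefix, new_len - suffix};
    return r;
}
```

For instance, going from `int x = 1;` to `int x = 10;` yields the range [9, 10), which covers just the inserted `0`. An editor would then widen that range to whole tokens before re-highlighting.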

Let me know of any comments you have. I think this is a good outline,
but there are a lot of details to fill in.

Mark



