Re: GtkEtext final design [really OT]



From: Christopher Kohnert <cjkohner@brain.uccs.edu>
> Emmanuel DELOGET wrote:
> > 
> > From: Tim Janik <timk@gtk.org>
> > > the main reason i didn't make those configurable in the first place in
> > > GScanner, was that there are issues on whether and how you allow nesting
> > > of multi line comments. so for the time being i made them like C comments.
> > 
> >     I don't think it's up to the scanner to say whether some lines
> >     are a multi-line comment or not - it's more a parser task
> >     [so you should not have to deal with this issue in gscanner]
> > 
> [snip]
> > 
> >     A scanner is just a tokenizer - i.e. it reads input from
> >     a buffer (possibly the keyboard buffer if you do interactive
> >     scanning), breaks the input into words and tries to match each
> >     word to a larger group of tokens.
> > 
> [large snip]
> 
> Erm, a scanner is all relative.  Often things that theoretically should
> go into a parser are put into a lexer for convenience.  As a lexer only
> identifies regular grammars, you need to add things to it to identify
> non-regular strings.  Such as the capability for a nested comment by
> adding a single integer to the scanner.  Symbols are often added at lex
> time not parse time for convenience as well, though symbols apply more
> to a compiler than to generic scanning.  The distinction is not as sharp
> as you draw it, and often things that are theoretically supposed to go
> into a parser are much more suited to go into the lexer both for speed
> and convenience.

    Symbols are available as tokens. For example, the 'if' symbol
    can be represented simply as a keyword token - otherwise you'll
    have to deal with a lot of problems in your parser implementation.
    Symbol determination is really not a parser issue - but their
    correct use is.
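
    To make that concrete, here is a rough sketch (in Python, for
    brevity) of a scanner that decides on its own that 'if' is a
    keyword. The names KEYWORDS, TOKEN_RE and tokenize are made up
    for the illustration; this is not the GScanner API.

```python
import re

# Hypothetical keyword set; a real language would list its own.
KEYWORDS = {"if", "else", "while"}

# A word is an identifier-like name, a run of digits, or any other
# single non-blank character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(text):
    """Lexical pass: return a list of (kind, lexeme) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        lexeme = match.group()
        if lexeme in KEYWORDS:
            kind = "KEYWORD"      # decided here, not in the parser
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "IDENT"
        elif lexeme[0].isdigit():
            kind = "NUMBER"
        else:
            kind = "PUNCT"
        tokens.append((kind, lexeme))
    return tokens
```

    The parser then only has to check that a KEYWORD 'if' shows up
    where an 'if' is allowed; it never compares strings itself.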

> And as far as some of your examples of multi-line tokens... there's no
> reason you couldn't identify a multi-line #define (as it still matches a
> regular expression).

    [yes... figured out that '\\\n' is still acceptable for a scanner :)]
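
    For the record, a sketch of such a regular expression (Python
    syntax; DEFINE_RE and find_defines are names I made up, and the
    pattern ignores the case of a '#define' hidden inside a string
    or a comment):

```python
import re

# A #define directive: everything up to the end of the line, where
# "backslash + newline" counts as part of the same logical line.
DEFINE_RE = re.compile(r"^[ \t]*#[ \t]*define\b(?:\\\n|[^\n])*", re.M)

def find_defines(source):
    """Return every (possibly multi-line) #define found in source."""
    return DEFINE_RE.findall(source)
```

    No counter and no stack are needed here, which is why this one
    stays within what a purely regular scanner can do.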

> 
> I think you're being a bit too formal as to what goes into a scanner, and
> if you do that, it becomes significantly less powerful.
>
> Christopher

    When you try to explain the difference between two different
    things, it is a bad idea to begin your speech with 'they are
    not the same, but they can do the same thing'. Of course,
    a regular grammar is equivalent to a set of regular expressions
    (this means that you do not need a parser to handle such a
    grammar), but regular grammars are not as useful as LALR(1) ones.

    I did some fairly interesting stuff in the past for a school
    project (reimplementing a yacc-like tool without any use
    of global and/or static vars - not very difficult, but a very
    interesting task) and therefore I know what a scanner can
    do. But I chose to describe something else: what a
    scanner should actually do :)

    Moreover, if you want fully reusable scanner code,
    I think it's better not to touch the parser's 'reserved'
    areas. Actually, a scanner does not need to know what
    nested comments are. It should know the words '/*' and
    '*/', but it really doesn't care about their meaning.
    Lexical analysis is not syntactic analysis. The lexical
    pass just deals with the 'word A is correctly spelled'
    issue, while the syntactic pass deals with 'word A is at
    the correct place in the sentence'.
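
    A small sketch of that split, in Python for brevity (scan,
    drop_comments and WORD_RE are names invented for this example,
    nothing from gscanner): the scanner only reports '/*' and '*/'
    as words, and the layer above it decides what they mean.

```python
import re

# For the scanner, '/*' and '*/' are just two more words.
WORD_RE = re.compile(r"/\*|\*/|\w+|\S")

def scan(text):
    """Lexical pass: split the text into words, with no idea of nesting."""
    return WORD_RE.findall(text)

def drop_comments(words):
    """Syntactic pass: decide where comments begin and end (nesting allowed)."""
    depth = 0
    kept = []
    for word in words:
        if word == "/*":
            depth += 1
        elif word == "*/":
            if depth == 0:
                raise SyntaxError("'*/' without a matching '/*'")
            depth -= 1
        elif depth == 0:
            kept.append(word)   # only words outside any comment survive
    if depth:
        raise SyntaxError("unterminated comment")
    return kept
```

    The price is that the text inside a comment still gets cut into
    words, so the scanner must be able to tolerate anything there.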

    That's how I learned English (well, my personal scanner
    and parser still have some bugs and memory leaks :)

    Now, a word about the gscanner comments feature:
    it is clear that a scanner which knows what a comment
    is makes the implementation easier from the parser's point
    of view. But there is another way to do it: running a
    preprocessing tool like cpp does the trick - and if the only
    goal of such a tool is to get rid of comments in the
    source, it's a trivial routine to write (even if
    you want to allow nested multi-line comments).
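
    Such a routine could look roughly like this (Python again; the
    name strip_comments is hypothetical, and the sketch deliberately
    ignores string literals, which a real tool would have to skip):

```python
def strip_comments(source):
    """Remove /* ... */ comments, nesting allowed, before any scanning."""
    out = []
    depth = 0            # how many comments are currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])
            elif source[i] == "\n":
                out.append("\n")   # keep line numbers stable for error messages
            i += 1
    if depth:
        raise ValueError("unterminated comment")
    return "".join(out)
```

    The scanner behind it then never sees a comment at all, nested
    or not.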

    Yours,

    Emmanuel



