Re: [gtk-list] Re: A description format for Gtk features



Marius Vollmer <mvo@zagadka.ping.de> writes:

> > Funny you should ask.  I just wrote a *very* small parser last night
> > in perl for handling a simple scheme-ish syntax (since perl's native
> > support for data files with any hierarchical structure is not so
> > good).
> 
> That is nice to hear.  I think we can expect every developer to have
> Perl installed (for automake, etc).

Here's a rough hack.  This is actually just a sample program that
implements a *really* simple read-eval-print loop for a simple scheme
syntax (actually it reads the whole file, then parses and prints the
result).

It accepts as input a sequence of forms where a form is:

  FORM   := LIST | SCALAR
  LIST   := '(' FORM* ')'
  SCALAR := NUMBER | STRING

Numbers are either floats or integers (e.g. 45 or 23.33) and strings
are double quoted and may contain a double quote via \" and a
backslash via \\.
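
For producing input in this format, here is a minimal sketch of the
inverse of those string rules.  quote_string is a hypothetical helper,
not part of the script below; it only assumes the two escapes
described above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a perl string into a STRING token by
# escaping backslashes and double quotes, then adding the quotes.
sub quote_string {
  my ($s) = @_;
  $s =~ s/(["\\])/\\$1/g;
  return '"' . $s . '"';
}

print quote_string('say "hi"'), "\n";
```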

Each call to read_item takes a reference to the input string, returns
the next form, and chops that form off the front of the string.  The
incoming data is transformed to perl data structures in the following
way:

  LIST => perl reference to list
  SCALAR => perl scalar

You can use the perl "ref" operator to tell lists from scalars, but
you'll have to use "=~ /^\d+(\.\d+)?$/" or something similar to
distinguish numbers and strings since perl doesn't (yuck).  Even then,
a quoted "45" in the input looks the same as the number 45.
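
A minimal sketch of that check, assuming the value came back from
read_item.  classify_item is a hypothetical name, and the pattern is
anchored at both ends so that a string which merely contains digits
isn't taken for a number.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one value as returned by read_item.
sub classify_item {
  my ($item) = @_;
  if (ref($item) eq 'ARRAY') {
    return 'list';
  } elsif ($item =~ /^\d+(\.\d+)?$/) {
    # A quoted "45" in the input also lands here; the parser has
    # already dropped the quotes by the time we see the value.
    return 'number';
  } else {
    return 'string';
  }
}

foreach my $item (45, "23.33", "abc123", [1, 2]) {
  print classify_item($item), "\n";
}
```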

Two notes:

1) It's possible to argue in favor of having everything converted to a
perl reference.  The advantage is that if you get a bunch of really
long strings as input that aren't inside a list, you'll get each one
back as a reference -- avoiding at least one excess copy.  The
disadvantage is that storage space increases for all scalars returned.
I think the latter concern probably outweighs the former in the normal
case, but it's a trivial change to the code to accommodate either.
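
A rough sketch of what the everything-is-a-reference variant might
look like from the caller's side.  wrap_scalar is a hypothetical
stand-in for the scalar return path, not something in the script
below.

```perl
use strict;
use warnings;

# Hypothetical variant: hand back a reference to the parsed scalar
# instead of the scalar itself.
sub wrap_scalar {
  my ($value) = @_;
  return \$value;
}

my $item = wrap_scalar("a really long string ...");

# Callers now test for 'SCALAR' vs 'ARRAY' and dereference with $$.
if (ref($item) eq 'SCALAR') {
  print length($$item), "\n";
}
```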

2) In some cases, it may make more sense to make the returned data
more complex.  Right now, as I mentioned above, you can't easily tell
what was originally a string and what was a number, and it's not clear
how you could extend it to handle things like #f and #t later, if
desired.  One solution is to use a more complex representation on the
perl side so
that something like this

  (1 2 3 4 ("test" #f 5))

translates to 

  [['integer', 1], ['integer', 2], ['integer', 3], ['integer', 4],
   [['string', "test"], ['boolean', 0], ['integer', 5]]]

on the perl side.  This would be unambiguous, but somewhat more
difficult to deal with.  In situations where this complexity is
needed, I might recommend using global vars rather than 'integer',
'boolean', etc to identify the types.  It would be much more
efficient.  Something like 

  my $type_integer = 0;
  my $type_string = 1;
  my $type_boolean = 2;

So that you get 

  [[0, 1], [0, 2], [0, 3], [0, 4], [[1, "test"], [2, 0], [0, 5]]]

internally, and can do things like "if($type == $type_integer)" in
your code.
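
A minimal sketch of walking such a structure.  It assumes each tagged
scalar is stored as its own two-element array reference (parenthesized
pairs would flatten inside [ ]), and describe is a hypothetical helper
used only for illustration.

```perl
use strict;
use warnings;

my $type_integer = 0;
my $type_string  = 1;
my $type_boolean = 2;

# Hand-built result for (1 2 ("test" #f 5)).
my $data = [[$type_integer, 1], [$type_integer, 2],
            [[$type_string, "test"], [$type_boolean, 0],
             [$type_integer, 5]]];

# A tagged scalar is a two-element array whose first element is a
# plain type tag; anything else is a list of further items.
sub describe {
  my ($item) = @_;
  if (@$item == 2 && !ref($item->[0])) {
    my ($type, $value) = @$item;
    return "integer $value"      if $type == $type_integer;
    return "string \"$value\""   if $type == $type_string;
    return "boolean " . ($value ? "#t" : "#f")
                                 if $type == $type_boolean;
    die "Unknown type tag $type";
  }
  return "(" . join(" ", map { describe($_) } @$item) . ")";
}

print describe($data), "\n";
```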

#!/usr/bin/perl -w
# Copyright 1997 Rob Browning <rlb@cs.utexas.edu>
# This code is GPLed.  Knock yourself out.

use strict;
use English;

# Slurp in the whole file and make it a reference to a scalar so it
# doesn't get copied all over the place.
undef $RS;
my $tmp_input = <>;
my $input = \$tmp_input;

sub eat_ws {
  my($input_ref) = @_;
  $$input_ref =~ s/^\s*//mo;
}

sub get_number {
  my($input_ref) = @_;
  $$input_ref =~ s/^((\d+)(\.\d+)?)//o or die "Couldn't parse number";
  return $1;
}

sub get_string {
  # Strings must be enclosed with double quotes, and may contain
  # a double quote via \" and a backslash via \\.

  my($input_ref) = @_;    
  my $done = 0;
  my $result = "";
  $$input_ref =~ s/^\"//mo  or die "Couldn't parse string";
  while(!$done) {
    $$input_ref =~ s/^([^\"\\]*)//mo;
    $result .= $1;
    if($$input_ref =~ s/^\"//o) {
      $done = 1;
    } elsif($$input_ref =~ s/^\\\\//o) {
      $result .= '\\';
    } elsif($$input_ref =~ s/^\\\"//o) {
      $result .= '"';
    } elsif(!length($$input_ref)) {
      die "Unterminated string: $result.";
    } else {
      die "Unknown backslash escape in string $$input_ref:$result.";
    }
  }
  return $result;
}

sub next_lexical_item {
  my($input) = @_;
  eat_ws($input);
  # Test length, not truth: a remaining input of "0" is false in perl.
  length($$input) or return('','');
  
  if($$input =~ s/^\(//o) {
    return('(', '(');
  } elsif($$input =~ s/^\)//o) {
    return(')', ')');
  } elsif($$input =~ /^\d/o) {
    my $value = get_number($input);
    return("number", $value);
  } elsif($$input =~ /^\"/o) {
    my $value = get_string($input);
    return("string", $value);
  } else {
    die "Unknown syntax [" . $$input . "]";
  }
}

sub read_list {
  my($input) = @_;
  my $the_list = [];
  my $item;
  
  while(1) {
    my ($type, $value) = next_lexical_item($input);
    if(!$type) { 
      die "Unexpected end of input inside list.";
    } elsif($type eq '(') {
      $item = read_list($input);
    } elsif ($type eq ')') {
      return $the_list;
    } else { 
      $item = $value;
    }
    push @$the_list, $item;
  }
}

sub read_item {
  my($input) = @_;
  my $result;
  
  my ($type, $value) = next_lexical_item($input);
  if(!$type) { 
    return 0; 
  } elsif($type eq '(') {
    $result = read_list($input);
  } elsif ($type eq ')') {
    die "Found close paren outside list context.";
  } else { 
    $result = $value;
  }
  return $result;
}


sub print_item {
  # Just a hack for testing...
  my ($item) = @_;
  my $type = ref($item);
  
  if(!$type) {
    print $item . "\n";
  } elsif($type eq 'ARRAY') {
    my $sub_item;
    print "(";
    foreach $sub_item (@$item) {
      print " "; 
      print_item($sub_item);      
    }
    print ")\n";
  } else {
    die "Unknown item $item of type $type in argument to print_item.";
  }
}

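# Eat leading whitespace first so that trailing whitespace ends the
# loop, and test length so that an item of 0 isn't taken for EOF.
while(1) {
  eat_ws($input);
  last unless length($$input);
  print_item(read_item($input));
}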

-- 
Rob


