Re: Performance implications of GRegex structure

From: Owen Taylor <otaylor redhat com>
To: Marco Barisione <marco www barisione org>
Cc: GTK Devel <gtk-devel-list gnome org>
Subject: Re: Performance implications of GRegex structure
Date: Sat, 17 Mar 2007 11:08:31 -0400

On Fri, 2007-03-16 at 21:30 +0100, Marco Barisione wrote:
> On Thu, 2007-03-15 at 10:18 -0400, Owen Taylor wrote:
> > But looking over the header file, there is something that puzzles me
> > about the way that it's set up: there is no distinction between a
> > "pattern/regular expression" object and a match/matcher object.
> 
> The internal code in GRegex was deeply modified but the API is quite
> similar to the original one written by Scott Wimer and then modified by
> Matthias Clasen, so I kept a single GRegex object but with lots of
> doubts.
> 
> In the end I decided to keep a single object because I prefer this
> approach when using languages without a garbage collector and because
> QRegExp (the equivalent object in Qt) is a single object.
> 
> This matter was brought up on the mailing list and in Bugzilla but only
> Havoc Pennington and Yevgen Muntyan expressed their opinion saying that
> they prefer a single object.

I apologize for not speaking up on the bugzilla bug. I must admit that
though I saw the discussion, I didn't really pay a lot of attention
until the header file appeared in CVS.

I certainly appreciate the arguments for convenience in C; it's a valid
concern. But I don't think we should let convenience be the overriding
factor over everything else; after all, the user *is* writing in C,
so convenience almost certainly wasn't uppermost in their mind ;-)

If we can identify the most common patterns of usage, I think we can
add convenience functions that make usage of an immutable pattern object
almost as convenient as the current GRegex.

You can have functions like:

 if (g_regex_matches(regex, str, -1, 0))
    ...

 if (g_regex_get_matches(regex, str, -1, 0,
                         0, &whole_match,
                         1, &first_substring,
                         -1))
    ...

 if (g_regex_get_named_matches(regex, str, -1, 0,
                               "firstName", &first_name,
                               "lastName", &last_name,
                               NULL))
    ...

The first two cover 98% of the cases where I've used a regular
expression ... I either want a boolean match / doesn't match, or I
want to match against a pattern and, if it succeeds, do something with
several substrings.

That, to me, would relegate the matcher object to cases where the
annoyance of an extra object is small compared to the complexity of
the operation.
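
To make the idea concrete, here is a minimal sketch of how the first two
convenience functions might sit on top of an immutable compiled pattern.
It uses POSIX <regex.h> purely as a stand-in for the GRegex internals, and
the names sketch_regex_matches(), sketch_regex_get_matches() and
SKETCH_MAX_GROUPS are hypothetical; none of this is proposed GLib API. The
point is only that the per-match state can live on the caller's stack, so
the pattern object itself is never written to.

```c
#include <assert.h>
#include <regex.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit on the number of groups this sketch can report. */
#define SKETCH_MAX_GROUPS 16

/* Boolean "matches / doesn't match" check against a compiled pattern.
 * The pattern is only read, never modified. */
static bool
sketch_regex_matches (const regex_t *re, const char *str)
{
  assert (re != NULL && str != NULL);
  return regexec (re, str, 0, NULL, 0) == 0;
}

/* Match `str` and copy out selected groups. The arguments after `str`
 * are (int index, char **out) pairs, terminated by an index of -1.
 * On a successful match each *out is set to a newly allocated copy of
 * the group (free it with free()), or to NULL if that group did not
 * take part in the match. On failure the outputs are left untouched. */
static bool
sketch_regex_get_matches (const regex_t *re, const char *str, ...)
{
  regmatch_t groups[SKETCH_MAX_GROUPS];
  va_list args;
  int index;

  assert (re != NULL && str != NULL);

  /* Mark every slot as "no match" before the call, so the result is
   * well defined even if the pattern was compiled with REG_NOSUB. */
  for (int i = 0; i < SKETCH_MAX_GROUPS; i++)
    groups[i].rm_so = groups[i].rm_eo = -1;

  /* All match state lives in `groups`, on this function's stack. */
  if (regexec (re, str, SKETCH_MAX_GROUPS, groups, 0) != 0)
    return false;

  va_start (args, str);
  while ((index = va_arg (args, int)) != -1)
    {
      char **out = va_arg (args, char **);

      *out = NULL;
      if (index >= 0 && index < SKETCH_MAX_GROUPS
          && groups[index].rm_so != -1)
        {
          size_t len = (size_t) (groups[index].rm_eo - groups[index].rm_so);

          *out = malloc (len + 1);
          if (*out != NULL)
            {
              memcpy (*out, str + groups[index].rm_so, len);
              (*out)[len] = '\0';
            }
        }
    }
  va_end (args);

  return true;
}
```

A caller would compile the pattern once with regcomp() and could then use
it from any number of call sites, for example
sketch_regex_get_matches (&re, str, 0, &whole_match, 1, &first_substring, -1),
without ever creating a matcher object.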

You could also take the above functions and have the same thing for:

 - Strings (like the current _simple() convenience functions)
 - Something like my GStaticRegex proposal

As always, the question about convenience functions is "where do you
stop?"...

						- Owen
