Re: GRegex



On Tue, 24/10/2006 at 13:17 -0400, Dominic Lachowicz wrote:
> 1) Please don't name variables 'string', as there may be a conflict
> with C++'s std::string

I think they were called "string" in the original version of GRegex
written by Scott Wimer in 1999. PCRE calls the string "subject".

However, this is not a problem in C++. The following program is valid:
#include <string>
#include <iostream>

using namespace std;

int main ()
{
  string string = "hello";
  cout << string << endl;
}

> 2) I noticed that there are g_regex_ref/unref() methods. Why did you
> choose to do this, rather than subclass GObject? You would also then
> have easy GObject-style accessors for the regex's "pattern" and
> "match_options".

The original plan was to include GRegex directly in GLib, so it cannot
depend on GObject. This could change if we decide to ship GRegex in a
separate library.

However, is it really necessary to have a real object?

I added _ref and _unref because the only two programs currently using
my modified version of EggRegex are GtkSourceView and MooEdit. Both
programs need reference counting for regular expressions.

GLib has other structures that are reference counted without being
objects, such as GHashTable, GAsyncQueue, GIOChannel and others.
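
To make the idea concrete, here is a minimal sketch of reference
counting on a plain structure, with no object system involved. The names
Pattern, pattern_new, pattern_ref, pattern_unref and live_patterns are
hypothetical and are not the GRegex API. The counter is a plain int, so
this version is not thread-safe.

```cpp
#include <cassert>
#include <string>

// Hypothetical reference-counted structure; NOT the GRegex API.
struct Pattern {
  std::string source;  // the pattern text
  int ref_count;       // plain (non-atomic) counter: not thread-safe
};

// Number of Pattern structures currently alive. It exists only so the
// demo below can check that the last unref really frees the structure.
static int live_patterns = 0;

// Create a new Pattern owned by the caller (reference count 1).
Pattern *pattern_new (const std::string &source)
{
  ++live_patterns;
  return new Pattern{source, 1};
}

// Take an additional reference and return the same pointer.
Pattern *pattern_ref (Pattern *p)
{
  ++p->ref_count;
  return p;
}

// Drop one reference; free the structure when the count reaches zero.
void pattern_unref (Pattern *p)
{
  if (--p->ref_count == 0)
    {
      --live_patterns;
      delete p;
    }
}

// Two owners share one Pattern; it is freed only after both unref it.
void demo_refcount ()
{
  Pattern *p = pattern_new ("a+b");
  Pattern *shared = pattern_ref (p);

  pattern_unref (p);               // first owner is done
  assert (live_patterns == 1);     // still alive for the second owner
  assert (shared->source == "a+b");

  pattern_unref (shared);          // last owner is done
  assert (live_patterns == 0);     // now it has been freed
}
```

Calling demo_refcount () from main exercises the whole life cycle. This
is the usage GtkSourceView and MooEdit need: several holders keep the
same compiled expression alive without copying it.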

> 3) Should there be a "GRegexMatch" object too? For instance, at least
> Python and Java have a notion of a read-only "Pattern" and a "Match
> Set". Your design combines the two into a single GRegex object. Having
> the pattern be read-only gets around your thread-safety "gotcha"
> comment in the docs.

I know, but using separate pattern and match objects is easier in a
garbage-collected language. The regex class in Qt uses the same approach
as GRegex.

> 4) Python's search() and match() methods have a "start position" and
> an "end position" argument, while your match_full() has only a "start
> position" argument. Is there a reason for this? Could it be
> implemented?

It has a length argument.

> 5) I didn't fully investigate, but Java and Python have a concept of
> "search vs. match" with slightly different semantics. Is this semantic
> distinction easily expressible in your API?
> 
> http://docs.python.org/lib/re-objects.html

In Python, match matches only at the start of the string, while search
matches at any position. You can get the match behavior by adding a "^"
at the beginning of the pattern, or by passing the compile option
G_REGEX_ANCHORED or the match option G_REGEX_MATCH_ANCHORED.

I prefer to have only one function, as I have always found this
distinction in Python a bit confusing.
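
The two ways of anchoring described above can be sketched with C++'s
std::regex, used here only as a stand-in because it needs no extra
library. This is not GRegex code. The "^" prefix corresponds to
anchoring in the pattern, and the std::regex_constants::match_continuous
flag is assumed to play a role similar to G_REGEX_MATCH_ANCHORED. The
function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Unanchored: the pattern may match at any position in the subject
// (the behavior of Python's search).
bool search_anywhere (const std::string &subject,
                      const std::string &pattern)
{
  return std::regex_search (subject, std::regex (pattern));
}

// Anchored through the pattern: prefix it with "^". The non-capturing
// group keeps a pattern such as "a|b" anchored as a whole.
bool match_at_start_caret (const std::string &subject,
                           const std::string &pattern)
{
  return std::regex_search (subject,
                            std::regex ("^(?:" + pattern + ")"));
}

// Anchored through a match option: match_continuous only accepts a
// match that begins at the first character of the subject.
bool match_at_start_flag (const std::string &subject,
                          const std::string &pattern)
{
  return std::regex_search (subject, std::regex (pattern),
                            std::regex_constants::match_continuous);
}

// The same pattern gives different answers depending on anchoring.
void demo_anchoring ()
{
  assert (search_anywhere ("say hello", "hello"));
  assert (!match_at_start_caret ("say hello", "hello"));
  assert (!match_at_start_flag ("say hello", "hello"));
  assert (match_at_start_caret ("hello world", "hello"));
  assert (match_at_start_flag ("hello world", "hello"));
}
```

A single search function plus an anchoring option therefore covers both
of the Python methods.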

> 6) GRegex requires that PCRE be built with UTF-8 support, which some
> existing installations aren't. For reference, Gnumeric and Goffice get
> around this by including a copy of PCRE in their distribution and
> statically link it in. How do you ensure that GRegex finds a version
> of PCRE compiled with UTF-8 support?

The default for GRegex is to use its internal copy of PCRE. This is
automatically patched to use GLib for Unicode and memory management.

If you prefer, you can pass --enable-system-pcre to use the
system-supplied library, but if it was compiled without UTF-8 support,
g_regex_new fails.


-- 
Marco Barisione
http://www.barisione.org/



