Re: EggRegex

From: Marco Barisione <barisione gmail com>
To: gtk-devel-list gnome org
Subject: Re: EggRegex
Date: Thu, 20 Jul 2006 11:57:16 +0200

Matthias Clasen wrote:

When I was last looking at regular expressions for GLib (which
resulted in the current eggregex code), the first decision was to
go for Perl regular expressions, rather than POSIX. That naturally
leads to PCRE. The main gripe with PCRE was (and is) that it
had (and probably still has) relatively limited Unicode support.

The version of eggregex in libegg uses the three-year-old pcre 4.5. Now pcre 6.7 has better support for Unicode.


Now PCRE:
- handles UTF-8
- knows that, in a caseless match, à matches À
- has generic character types for non-ASCII characters, so \p{Lt} matches a title case letter, \p{Sc} matches a currency symbol, and so on


Extended properties such as "Greek" or "InMusicalSymbols" are not supported.
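
To make the two Unicode features above concrete, here is a rough sketch of the semantics being described. It uses Python's standard library, not PCRE or eggregex, so it only illustrates the behaviour; the helper names caseless_match and has_category are made up for this example. Python's re module has no \p{..} syntax, so the general category lookup goes through unicodedata instead.

```python
import re
import unicodedata


def caseless_match(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`, ignoring case.

    For str patterns, Python's re module applies Unicode case rules,
    which is the behaviour the list above describes for caseless matching.
    """
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


def has_category(ch: str, category: str) -> bool:
    """Return True if the single character `ch` has the given Unicode
    general category (for example "Lt" or "Sc")."""
    return unicodedata.category(ch) == category


if __name__ == "__main__":
    # U+00E0 (a with grave) against U+00C0 (A with grave)
    print(caseless_match("\u00e0", "\u00c0"))
    # U+01C5 is a title case letter, general category Lt
    print(has_category("\u01c5", "Lt"))
    # U+20AC (euro sign) is a currency symbol, general category Sc
    print(has_category("\u20ac", "Sc"))
```

Note that this compares precomposed characters only; a decomposed "a" followed by a combining grave accent would need normalization first.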

And it brings its own implementation of the necessary Unicode
data, instead of using the GLib one.

Yes, but it shouldn't be too difficult to port pcre to use glib for Unicode. I can't do it myself because my knowledge of Unicode is very limited.

However, this would mean that we would always have to use the internal PCRE instead of the system-supplied one.


--
Marco Barisione
http://www.barisione.org/

Follow-Ups:
- Re: EggRegex
  - From: Owen Taylor

References:
- EggRegex
  - From: Marco Barisione
- Re: EggRegex
  - From: Hubert Figuiere
- Re: EggRegex
  - From: Behdad Esfahbod
- Re: EggRegex
  - From: Matthias Clasen
