Re: EggRegex



Matthias Clasen wrote:
When I was last looking at regular expressions for GLib (which
resulted in the current eggregex code), the first decision was to
go for Perl regular expression, rather than posix. That naturally
leads to PCRE. The main gripe with PCRE was (and is) that it
had (and probably still has) relatively limited Unicode support.

The version of eggregex in libegg uses PCRE 4.5, which is three years old. The current PCRE 6.7 has better Unicode support.

Now PCRE:
- handles UTF-8
- knows that, in a caseless match, à matches À
- has generic character types for non-ASCII characters, so \p{Lt} matches a title case letter, \p{Sc} matches a currency symbol, and so on

Extended properties such as "Greek" or "InMusicalSymbols" are not supported.
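
To make the list above concrete, here is a rough sketch of what those features mean. It uses Python's standard library rather than PCRE itself, so it only illustrates the behaviour and says nothing about how PCRE implements it. Python's re module has no \p{...} syntax, so the helper has_category (a hypothetical name) looks up the Unicode general category directly; the sample characters are my own choice.

```python
import re
import unicodedata

# Caseless matching beyond ASCII: "à" (U+00E0) should match "À" (U+00C0).
caseless = re.fullmatch("à", "À", re.IGNORECASE) is not None


def has_category(ch: str, category: str) -> bool:
    """Return True if ch belongs to the given Unicode general category.

    Hypothetical helper: it mimics what a \\p{Xx} class tests for a
    single character, it is not part of PCRE or GLib.
    """
    return unicodedata.category(ch) == category


# U+01C5 (Dž) is a title case letter, the kind of character \p{Lt} matches.
is_title_case = has_category("\u01C5", "Lt")

# U+20AC (€) is a currency symbol, the kind of character \p{Sc} matches.
is_currency = has_category("\u20AC", "Sc")

# An ordinary ASCII letter is in neither category.
plain_letter = has_category("a", "Lt") or has_category("a", "Sc")
```

The categories tested here are the two-letter general categories only, which is the level of support described above.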

And it brings its own implementation of the necessary Unicode
data, instead of using the GLib one.

Yes, but it shouldn't be too difficult to port PCRE to use GLib for its Unicode data. I can't do it myself because my knowledge of Unicode is very limited.

However, this would mean always using the internal copy of PCRE instead of the system-supplied one.

--
Marco Barisione
http://www.barisione.org/


