Re: PROP: use of Perl Compatible Regular Expressions (long, sorry)



On Tue, 17 July 10:14 Albrecht Dreß wrote:

I'll give the patch a run later today!

> Problems:
> =========
> 
> I made several tests with this lib on Linux/Intel and Linux/PowerPC and
> could not see problems yet. The performance is not very different from the
> posix stuff.

In my experience, PCRE performs about the same as GNU libc's regex.
It is *much* faster than the Henry Spencer RE code.

> Some caution is necessary as pcre can return empty matches.
> E.g. rewriting the expression above to "\\b[[:alpha:]']*\\b" may return an
> empty string (which is correct). The simple solution is to use
> "\\b[[:alpha:]']+\\b" instead.

Can't beat writing the correct RE :-)
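
For anyone who wants to see the empty-match effect in isolation, here is a
minimal sketch. It is an assumption-laden illustration, not part of the patch:
it uses the POSIX <regex.h> interface rather than PCRE (so it links against
plain libc), it drops the \b anchors because POSIX extended syntax does not
define them, and first_match_length() is a made-up helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the first match of
   `pattern` in `text`, or -1 if the pattern does not compile or
   does not match at all.  Uses POSIX extended syntax, not PCRE. */
int
first_match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int len = -1;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;

  /* Ask for the overall match only (one regmatch_t slot). */
  if (regexec (&re, text, 1, &m, 0) == 0)
    len = (int) (m.rm_eo - m.rm_so);

  regfree (&re);
  return len;
}
```

With the text " hello" (note the leading space), the "*" form
"[[:alpha:]']*" should succeed immediately at offset 0 with a zero-length
match, while the "+" form "[[:alpha:]']+" has to consume at least one
character and so should find "hello".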

> Note that PCRE does not resolve the problems with detecting the national
> characters ("Umlauts") in the [:alpha:] class as it relies on libc.

I wrote a simple test program as follows.

#include <stdio.h>
#include <locale.h>
#include <ctype.h>


int
main (int argc, char **argv)
{
  int c;

  if (argc > 1)
    setlocale (LC_CTYPE, argv[1]);

  printf ("[[:alpha:]]\n");
  for (c = 0; c < 256; c++)
    if (isalpha (c))
      putchar (c);
  putchar ('\n');
  printf ("[[:alnum:]]\n");
  for (c = 0; c < 256; c++)
    if (isalnum (c))
      putchar (c);
  putchar ('\n');
  printf ("[[:upper:]]\n");
  for (c = 0; c < 256; c++)
    if (isupper (c))
      putchar (c);
  putchar ('\n');
  printf ("[[:lower:]]\n");
  for (c = 0; c < 256; c++)
    if (islower (c))
      putchar (c);
  putchar ('\n');
  return 0;
}

Running this a few times gave the following output:

1025 $ ./locale 
[[:alpha:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
[[:alnum:]]
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
[[:upper:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[[:lower:]]
abcdefghijklmnopqrstuvwxyz
1026 $ ./locale en_GB
[[:alpha:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
[[:alnum:]]
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
[[:upper:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
[[:lower:]]
abcdefghijklmnopqrstuvwxyzßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
1027 $ ./locale C    
[[:alpha:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
[[:alnum:]]
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
[[:upper:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[[:lower:]]
abcdefghijklmnopqrstuvwxyz
1029 $ ./locale de_DE
[[:alpha:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
[[:alnum:]]
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz­ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
[[:upper:]]
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
[[:lower:]]
abcdefghijklmnopqrstuvwxyzßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ


Given the above, I'd suspect the problem lies in Balsa's use of setlocale()
and not with GNU libc.
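
To make the setlocale() point concrete, here is a small sketch of the kind of
check I have in mind. It is only an illustration under my own assumptions:
count_alpha() is a made-up helper, and I am not claiming this is what the
application does or should do verbatim. The idea is simply that the <ctype.h>
classification functions stay in the "C" locale until setlocale() succeeds,
so its return value is worth checking.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to `locale_name` and count how
   many of the 256 single-byte values isalpha() accepts.  Returns -1 if
   setlocale() rejects the name, in which case the previous locale is
   left unchanged.  Pass "" to pick the locale up from the environment
   (LC_ALL, LC_CTYPE, LANG). */
int
count_alpha (const char *locale_name)
{
  int c, n = 0;

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;

  for (c = 0; c < 256; c++)
    if (isalpha (c))
      n++;

  return n;
}
```

count_alpha ("C") gives 52, i.e. just A-Z and a-z. A single-byte Latin-1
locale such as en_GB or de_DE should give a larger count, provided that
locale is installed; if setlocale() fails, the helper returns -1 and the
classification silently stays as it was.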

Brian Stafford



