Re: EggRegex



Marco Barisione wrote:
Can someone take a look at pcre/ucptable.c, pcre/ucp.h and pcre/pcre_ucp_searchfuncs.c?

Now the internal PCRE uses glib for Unicode properties.

There is a problem: PCRE allows script names in \p{}, so you can match an Arabic character using \p{Arabic}. But AFAIK glib does not know about scripts.

gucharmap handles this internally, but I can't copy the code because, as far as I know, it's under the GPL and not the LGPL.


However, I think the better solution is to add this directly to glib:

typedef enum
{
  G_UNICODE_SCRIPT_ARABIC,
  G_UNICODE_SCRIPT_ARMENIAN,
  ...
  G_UNICODE_SCRIPT_UGARITIC
} GUnicodeScript;

/* returns the script of c */
GUnicodeScript g_unichar_get_script(gunichar c);

/* returns the (translated?) name of the script */
const gchar *g_unichar_get_script_name(GUnicodeScript script);
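
To make the proposal a bit more concrete, here is a rough, self-contained sketch of how such a lookup might work. Everything in it is an assumption for illustration only: the ex_ names are hypothetical stand-ins for the proposed functions, uint32_t stands in for gunichar so the sketch builds without glib, and the range table holds just three hand-picked letter ranges rather than the real Unicode script data. The binary search over sorted ranges is one possible approach, not necessarily what glib or PCRE would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed GUnicodeScript enum. */
typedef enum
{
  EX_SCRIPT_UNKNOWN,
  EX_SCRIPT_ARABIC,
  EX_SCRIPT_ARMENIAN,
  EX_SCRIPT_UGARITIC
} ExScript;

typedef struct
{
  uint32_t first;   /* first code point of the range (inclusive) */
  uint32_t last;    /* last code point of the range (inclusive) */
  ExScript script;
} ExScriptRange;

/* Deliberately incomplete table, sorted by code point.  A real table
 * would be generated from the Unicode script data. */
static const ExScriptRange ex_ranges[] = {
  { 0x0531,  0x0556,  EX_SCRIPT_ARMENIAN },  /* Armenian capital letters */
  { 0x0621,  0x063A,  EX_SCRIPT_ARABIC },    /* basic Arabic letters */
  { 0x10380, 0x1039D, EX_SCRIPT_UGARITIC }   /* Ugaritic letters */
};

/* Returns the script of c, or EX_SCRIPT_UNKNOWN if c is not in the table. */
ExScript
ex_unichar_get_script (uint32_t c)
{
  size_t lo = 0;
  size_t hi = sizeof ex_ranges / sizeof ex_ranges[0];

  /* Binary search for the range containing c. */
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;

      if (c < ex_ranges[mid].first)
        hi = mid;
      else if (c > ex_ranges[mid].last)
        lo = mid + 1;
      else
        return ex_ranges[mid].script;
    }

  return EX_SCRIPT_UNKNOWN;
}

/* Returns the (untranslated) name of the script. */
const char *
ex_script_name (ExScript script)
{
  switch (script)
    {
    case EX_SCRIPT_ARABIC:   return "Arabic";
    case EX_SCRIPT_ARMENIAN: return "Armenian";
    case EX_SCRIPT_UGARITIC: return "Ugaritic";
    default:                 return "Unknown";
    }
}

/* What a \p{Arabic}-style test could reduce to on the PCRE side. */
int
ex_matches_script (uint32_t c, ExScript wanted)
{
  return ex_unichar_get_script (c) == wanted;
}
```

With something like this in place, matching \p{Arabic} against a character would come down to a call such as ex_matches_script (c, EX_SCRIPT_ARABIC), and the script names accepted inside \p{} could be checked against ex_script_name.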


--
Marco Barisione
http://www.barisione.org/



