Re: Clickable URLs again... new patch



On 2001.06.13 22:32:24 +0100 Jamie Webb wrote:
> On 2001.06.13 11:46:47 -0400 Brian Stafford wrote:
> > On Wed, 13 June 21:31 Jamie Webb wrote:
> > | Shouldn't URLs be detected by a configurable regex, like quoted text?
> > 
> > The syntax for URLs is fixed by RFC 2396.  Assuming the parser is
> > correct, there will be no need to configure it.
> 
> Ah, but you are assuming that we only ever want to match vanilla URLs. I
> might want to match, e.g. email addresses:
> 
> [^\s]+@[^\s]+
> 
> Or ICQ numbers:
> 
> \b(ICQ|icq)[#:\s]*[0-9]+
> 
> ...but maybe both those regexes are too primitive and I'll think of a
> reason to change them later. Plus, as you point out yourself, not all
> things we consider to be URLs are correct according to the RFC.

That's almost, but not quite, what I said. There are certain strings
that are not URLs but which, once matched, can be processed
similarly, e.g. treating email addresses as mailto: URLs or expanding
domain names starting with www. into http: URLs.
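A minimal sketch of that idea in Python, assuming two deliberately rough, hypothetical patterns (EMAIL_RE and WWW_RE are illustrative names, not Balsa code, and neither pattern is RFC-validated). Each match is rewritten into a clickable URL by prefixing mailto: or http://:

```python
import re

# HYPOTHETICAL, deliberately rough patterns; neither is validated against any RFC.
EMAIL_RE = re.compile(r'[^\s@<>()"]+@[^\s@<>()"]+')
# The lookbehind stops "www." matching inside e.g. "user@www.example.org".
WWW_RE = re.compile(r'(?<![\w@./])www\.[^\s<>()"]+')

# Sentence punctuation that usually follows, rather than belongs to, a link.
TRAILING = ".,;:!?"

def special_case_urls(text):
    """Return clickable URLs for bare emails and www. names, in text order."""
    found = []
    for m in EMAIL_RE.finditer(text):
        found.append((m.start(), "mailto:" + m.group(0).rstrip(TRAILING)))
    for m in WWW_RE.finditer(text):
        found.append((m.start(), "http://" + m.group(0).rstrip(TRAILING)))
    found.sort()  # merge the two passes back into document order
    return [url for _, url in found]
```

For example, special_case_urls("Mail bs@example.org or see www.example.org/balsa.") yields the mailto: and http:// forms of the two strings.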

My real point is that it is quite difficult to write a correct RE for
the generic URL syntax, so once one has been written and validated
against the standard, users should not be allowed to change it. My guess
is that by the time the URL pattern is debugged, nobody will understand
how it works any more :-)
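For comparison, RFC 2396 itself (Appendix B) does give a regular expression, but only for breaking a string already known to be a URI reference into its components. It accepts almost any input, which shows how far a splitting pattern is from a validating one. A minimal Python sketch using it (split_uri is a hypothetical helper name):

```python
import re

# Regular expression from RFC 2396, Appendix B. It decomposes a URI
# reference into scheme, authority, path, query and fragment, but does
# NOT check that any of them are well formed.
URI_SPLIT_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI reference; absent components come back as None."""
    # Every part is optional, so match() always succeeds (never None).
    m = URI_SPLIT_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

Note that split_uri("not a url at all") happily returns the whole string as a "path"; validation needs the full grammar.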

> > | Maybe a list of regexes, with different target apps, e.g. browser,
> > | ftp, balsa compose message.
> > 
> > No, a list of REs is worse still; matching them is pretty slow.
> 
> I have to say that Emacs seems to do pretty well colorising much longer
> texts than the average email, using regexes, by doing it once the text is
> already displayed.

I mentioned this because there were complaints on the list recently
that the REs were slowing things down. My own experience of various RE
implementations is that they vary widely in quality and speed.
In Emacs's case, the GNU regex implementation it uses does seem to be one
of the faster ones.

Anyhow, IMO there should be one RE for the generic URL syntax plus a few
extra, simpler REs for special cases. I took your suggestion to mean one
RE per URL scheme.
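One way to get "generic syntax plus a few special cases" without trying a list of REs in turn is to join them into a single alternation scanned once, with named groups recording which case matched. A minimal Python sketch under stated assumptions: the "url" alternative below is a HYPOTHETICAL simplified stand-in for a properly validated RFC 2396 pattern, and the email and www. alternatives are equally rough:

```python
import re

# One combined pattern, scanned once, instead of a list of REs tried in turn.
# Alternatives are tried left to right at each position, so "url" wins over
# the special cases. The "url" branch is a HYPOTHETICAL simplified stand-in.
COMBINED_RE = re.compile(
    r'(?P<url>\b[A-Za-z][A-Za-z0-9+.-]*://[^\s<>"]+)'
    r'|(?P<email>[^\s@<>()"]+@[^\s@<>()"]+)'
    r'|(?P<www>(?<![\w@./])www\.[^\s<>()"]+)'
)

# Sentence punctuation that usually follows, rather than belongs to, a link.
TRAILING = ".,;:!?"

def find_links(text):
    """Return (kind, clickable URL) pairs in the order they appear in text."""
    links = []
    for m in COMBINED_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        target = m.group(kind).rstrip(TRAILING)
        if kind == "email":
            target = "mailto:" + target
        elif kind == "www":
            target = "http://" + target
        links.append((kind, target))
    return links
```

Whether one alternation is actually faster than several passes depends on the RE engine, which, as noted above, varies widely.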

Regards,
Brian
