Re: Prop: make URL's clickable



On 2001.05.15 19:20:26 +0100 Albrecht Dreß wrote:
> Am 14.05.2001 11:15:08 schrieb(en) Carlos Morgado:
> [snipped a long discussion ;-))]
> > you just said it's invalid. foo.bar.some)host is invalid for a host name.
> > 
> > > I think that http://foo.bar.host/file)name may be valid, though.  I'd
> > > need to check the RFC on URI-encoding.
> > 
> > ( and ) are restricted but not reserved in URIs (i.e. valid)
> 
> A really interesting discussion! Meanwhile I checked RFC1738 "Uniform
> Resource Locators", so some additional comments... A "(" or ")" is *not*
> allowed in the hostname part of an ftp/http[s] url, which can only be
> [A-Za-z0-9.-], but it is

yes, i said so too. ( and ) are allowed in a URI, except in the bits where
they aren't ;)

> allowed in the search path. Neither "[" "]" nor "<" ">" are, though. So I
> think the correct regexp would be
> 
> char *url_str = "\\<((ht|f)tps?://[A-Za-z0-9$_.+!*'(),%;:@&=?/-]+)\\>";
> 
> Remember that this does *not* check for cases which are not allowed, e.g.
> http://www.&@!.net.
> 
it doesn't? i'll believe you :)
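
for anyone who wants to try it, here is a rough sketch of how a pattern like
that could be fed to the POSIX regcomp()/regexec() calls. the names
url_pattern and find_url are made up for illustration and are not from the
balsa source, and i left out the \< and \> anchors since they are a GNU
extension, so treat this as an assumption-laden test harness rather than the
proposed code.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Character class taken from the proposed pattern. The \< and \> anchors
 * are left out here because they are a GNU extension (an assumption made
 * for portability, not part of the original proposal). */
static const char *url_pattern =
    "(ht|f)tps?://[A-Za-z0-9$_.+!*'(),%;:@&=?/-]+";

/* Hypothetical helper: copy the first URL-like match found in text into
 * out (truncating if needed). Returns 1 if something matched, else 0. */
static int find_url(const char *text, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    if (outlen == 0 || regcomp(&re, url_pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, text, 1, m, 0) == 0) {
        size_t len = (size_t)(m[0].rm_eo - m[0].rm_so);
        if (len >= outlen)
            len = outlen - 1;
        memcpy(out, text + m[0].rm_so, len);
        out[len] = '\0';
        found = 1;
    }

    regfree(&re);
    return found;
}
```

calling find_url() on a line containing http://www.&@!.net hands back the
whole bogus URL, which is the point made above: the character class only
says which characters may appear, not whether the host part makes sense.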

> The decision if a ")" at the end of a URL belongs to the URL or to the text
> is also not that easy: think of e.g. (see http://www.balsa.net) or
> http://some.domain.org/page(3). IMHO we should not invest too much effort in
...

agreed. having a regex complex and smart enough to sort out () around urls
while going through the mailbox isn't a very good approach. let's KISS and
not make it noticeably slower for the user or fall into the pit of memory
bloat in the regex engine.
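
if we ever did want to handle the two examples above, one cheap option,
purely a sketch and not something anyone has proposed in this thread, is to
keep the regex dumb and do a tiny check on the matched text afterwards: drop
one trailing ")" only when the URL has more ")" than "(". the function name
trim_unbalanced_paren is hypothetical.

```c
#include <stddef.h>

/* Hypothetical post-match step: given a matched URL of length len, return
 * the length to keep. One trailing ')' is dropped when the URL contains
 * more ')' than '(' characters, i.e. the ')' most likely belongs to the
 * surrounding text. */
static size_t trim_unbalanced_paren(const char *url, size_t len)
{
    size_t i, open = 0, close = 0;

    for (i = 0; i < len; i++) {
        if (url[i] == '(')
            open++;
        else if (url[i] == ')')
            close++;
    }

    if (len > 0 && url[len - 1] == ')' && close > open)
        return len - 1;
    return len;
}
```

with this, the match from "(see http://www.balsa.net)" loses its final ")"
while http://some.domain.org/page(3) is left alone. it is one linear pass
over an already matched string, so it should not cost anything noticeable,
but it is still a heuristic and will guess wrong on some inputs.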



-- 
Carlos Morgado - chbm(at)chbm(dot)nu - http://chbm.nu/ -- gpgkey: 0x1FC57F0A
http://wwwkeys.pgp.net/ FP:0A27 35D3 C448 3641 0573 6876 2A37 4BB2 1FC5 7F0A
Software is like sex; it's better when it's free. - Linus Torvalds



