Re: Prop: make URL's clickable



On 14.05.2001 11:15:08, Carlos Morgado wrote:
[snipped a long discussion ;-))]
> you just said it's invalid. foo.bar.some)host is invalid for a host name.
> 
> > I think that http://foo.bar.host/file)name may be valid, though.  I'd
> > need to check the RFC on URI-encoding.
> 
> ( and ) are restricted but not reserved in URIs (i.e. valid)

A really interesting discussion! In the meantime I checked RFC 1738, "Uniform
Resource Locators", so here are some additional comments. A "(" or ")" is *not*
allowed in the hostname part of an ftp/http[s] URL, which may only contain
[A-Za-z0-9.-], but it is allowed in the path and search parts. Neither "[" "]"
nor "<" ">" are allowed there, though. So I think the correct regexp would be

/* needs REG_EXTENDED; \< and \> are GNU word-boundary anchors */
const char *url_str = "\\<((ht|f)tps?://[A-Za-z0-9$_.+!*'(),%;:@&=?/-]+)\\>";
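
For anyone who wants to try the pattern, here is a minimal sketch assuming the
POSIX <regex.h> API (regcomp/regexec) as provided by glibc or musl. The helper
find_first_url() is hypothetical, not existing Balsa code; it just returns the
first URL the regexp finds in a piece of text.

```c
#include <regex.h>
#include <string.h>

/* The pattern proposed above.  REG_EXTENDED is needed for the (ht|f)
 * alternation; \< and \> are GNU word-boundary extensions that glibc and
 * musl accept, but strict POSIX does not guarantee them. */
static const char *url_str =
    "\\<((ht|f)tps?://[A-Za-z0-9$_.+!*'(),%;:@&=?/-]+)\\>";

/* Hypothetical helper: copy the first URL found in `text` into `out`
 * (truncated to outlen-1 chars, always NUL-terminated).
 * Returns 1 on a match, 0 if there is none, -1 on error. */
int find_first_url(const char *text, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: outer group = the URL */
    int found = 0;

    if (outlen == 0 || regcomp(&re, url_str, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;
        memcpy(out, text + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Because \> can only match right after a letter, digit or "_", a trailing ")"
never ends a match: "(see http://www.balsa.net)" yields http://www.balsa.net,
but "http://some.domain.org/page(3)" yields http://some.domain.org/page(3,
dropping the closing parenthesis.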

Note that this does *not* reject URLs which are invalid, e.g.
http://www.&@!.net.
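
If such cases ever need catching, the matched URL could be post-filtered. A
sketch, assuming a simplified form of the RFC 1738 hostname rules
(dot-separated labels of letters, digits and "-", no "-" at either end of a
label); url_host_is_valid() is a hypothetical name, and it does not enforce
the RFC's rule that the top-level label starts with a letter.

```c
#include <string.h>

/* ASCII-only letter/digit test, so the result does not depend on locale */
static int is_alnum_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical post-filter for URLs found by the regexp: returns 1 if the
 * host part consists of dot-separated labels of letters, digits and '-'
 * (no empty labels, no '-' at either end of a label), else 0.
 * "user:password@" and ":port" are skipped before the check. */
int url_host_is_valid(const char *url)
{
    const char *p = strstr(url, "://");
    const char *end, *at, *colon, *q;
    size_t label_len = 0;
    char prev = '.';

    if (p == NULL)
        return 0;
    p += 3;
    end = p + strcspn(p, "/?");          /* host part ends at '/' or '?' */
    at = memchr(p, '@', (size_t)(end - p));
    if (at != NULL)
        p = at + 1;                      /* skip "user:password@" */
    colon = memchr(p, ':', (size_t)(end - p));
    if (colon != NULL)
        end = colon;                     /* drop ":port" */

    for (q = p; q < end; q++) {
        if (*q == '.') {
            if (label_len == 0 || prev == '-')
                return 0;                /* empty label or trailing '-' */
            label_len = 0;
        } else if (*q == '-') {
            if (label_len == 0)
                return 0;                /* leading '-' */
            label_len++;
        } else if (is_alnum_ascii(*q)) {
            label_len++;
        } else {
            return 0;                    /* any other character */
        }
        prev = *q;
    }
    return label_len > 0 && prev != '-';
}
```

For http://www.&@!.net the "@" makes "www.&" the user part, so the host is
"!.net", which the check rejects.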

Deciding whether a ")" at the end of a URL belongs to the URL or to the
surrounding text is not that easy either: compare "(see http://www.balsa.net)"
with "http://some.domain.org/page(3)". IMHO we should not invest too much effort
in developing a hyper-intelligent URL scanner (we could also think about
filtering out malformed URLs, which is even more complicated, as the rules for
ftp and http are different). If someone sends a mail with a weird link, the
user will get weird problems. In that case, the only solution is to send it as
HTML with the links marked up.
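
If a cheap compromise is wanted without building a full scanner, one commonly
used heuristic (an assumption here, not something agreed in this thread) is
parenthesis balancing: a ")" at the end of a URL is kept only if it closes a
"(" inside the URL. A sketch; url_end_balanced() is a hypothetical helper that
adjusts the end offset of a candidate match s[start..end), where end must not
exceed strlen(s).

```c
#include <stddef.h>

/* Hypothetical heuristic: keep a trailing ')' only if it closes a '('
 * inside the URL.  Returns the adjusted end offset of s[start..end). */
size_t url_end_balanced(const char *s, size_t start, size_t end)
{
    long depth = 0;               /* number of '(' minus number of ')' */
    size_t i;

    for (i = start; i < end; i++) {
        if (s[i] == '(')
            depth++;
        else if (s[i] == ')')
            depth--;
    }
    /* drop unmatched trailing ')', e.g. "(see http://www.balsa.net)" */
    while (end > start && s[end - 1] == ')' && depth < 0) {
        end--;
        depth++;
    }
    /* re-attach a ')' that closes an open '(', e.g. ".../page(3)";
     * s[end] is at worst the terminating NUL */
    while (s[end] == ')' && depth > 0) {
        end++;
        depth--;
    }
    return end;
}
```

With the regexp above, whose match for "http://some.domain.org/page(3)" stops
before the final ")", this re-attaches the parenthesis, while a ")" closing
the surrounding text is left out.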

Cheers, Albrecht.

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Albrecht Dreß  -  Monschauer Straße 22  -  D-53121 Bonn (Germany)
      Phone (+49) 228 6199571  -  E-Mail albrecht.dress@arcormail.de
_________________________________________________________________________