Re: Fix for GaimLog.Striptags()



Hi,

On 22/09/04 09:35, Nat Friedman wrote:
One concern here is that we're ignoring everything in between "<" and
">", which is not good. A line like:

------------------------
dude1: e < mc^2+1. I *really* want the text here indexed. e > mc^2+1
dude2: Cool! I want that indexed too!
------------------------

would be reduced to:

------------------------
dude1: e  mc^2+1
dude2: Cool! I want that indexed too! But I can't. I'm so unhappy.
------------------------

Maybe a patch to Gaim to write "<" and ">" when input by the user as
"&lt;" and "&gt;" would be good. Any foreseeable objections from the
Gaim folks?
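
For illustration, here is a minimal Python sketch of the failure mode described above. It is not the actual GaimLog.StripTags() code; `naive_strip_tags` is a hypothetical name, and the regex is only an assumption about how a "drop everything between '<' and '>'" stripper behaves:

```python
import re


def naive_strip_tags(line: str) -> str:
    """Remove every span that starts at '<' and ends at the next '>'.

    Hypothetical helper for illustration only; it cannot tell real markup
    apart from literal '<' and '>' typed by the user.
    """
    return re.sub(r"<[^>]*>", "", line)


line = "dude1: e < mc^2+1. I *really* want the text here indexed. e > mc^2+1"
print(naive_strip_tags(line))  # prints "dude1: e  mc^2+1"
```

Everything between the user's literal "<" and ">" is discarded, which matches the reduced line shown above.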


Yeah, that's a serious problem.  I'll commit this, if you go ahead and
talk to the Gaim folks about outputting more sensible logs.

As far as I know, this is only an issue with old-style gaim logs. I have one instance in my logs where an <a href> tag is split over two lines, which totally messes things up without this patch.

However, the new-style gaim logs can either be plain text or HTML.
In plain-text mode, '<' and '>' aren't stripped out by ImLog. In HTML logging mode, gaim replaces '<' and '>' with "&lt;" and "&gt;", so there's no issue there either.

I just tested this with a bunch of crazy examples and it indexed all the text perfectly each time, including Arun's example above. The only time it won't capture text is if you type something like "<fake>tags</fake>" while using HTML logging, because the entities and the words between them merge together into one big text blob. So replacing "&lt;" and "&gt;" (there are probably others too) with spaces in new-style HTML logs would probably help there.
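
A rough Python sketch of that idea, just to show the effect (this is not Beagle code; `split_entities` is a made-up name, and only "&lt;" and "&gt;" are handled here):

```python
import re


def split_entities(text: str) -> str:
    """Replace "&lt;" and "&gt;" with spaces so neighbouring words separate.

    Hypothetical helper for illustration; other entities are left untouched.
    """
    spaced = re.sub(r"&(?:lt|gt);", " ", text)
    # Collapse the runs of whitespace introduced by the replacement.
    return " ".join(spaced.split())


# What an HTML-mode log would contain after the user types <fake>tags</fake>
logged = "&lt;fake&gt;tags&lt;/fake&gt;"
print(split_entities(logged))  # prints "fake tags /fake"
```

With the entities turned into spaces, "fake" and "tags" come out as separate words instead of one blob.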

Regards,
Chris


