Hi,

On 22/09/04 09:35, Nat Friedman wrote:
> > One concern here is that we're ignoring everything in between "<" and ">", which is not good. A line like:
> >
> > ------------------------
> > dude1: e < mc^2+1. I *really* want the text here indexed. e > mc^2+1
> > dude2: Cool! I want that indexed too!
> > ------------------------
> >
> > would be reduced to:
> >
> > ------------------------
> > dude1: e mc^2+1
> > dude2: Cool! I want that indexed too! But I can't. I'm so unhappy.
> > ------------------------
> >
> > Maybe a patch to Gaim to write "<" and ">", when input by the user, as "&lt;" and "&gt;" would be good. Any foreseeable objections from the Gaim folks?
>
> Yeah, that's a serious problem. I'll commit this, if you go ahead and talk to the Gaim folks about outputting more sensible logs.
As far as I know, this is only an issue with old-style gaim logs. I have one instance in my logs where an <a href> splits over two lines, which totally messes things up if not using this patch.
However, the new-style gaim logs can either be plain text or HTML. As plain text, '<' and '>' aren't stripped out by ImLog. And in HTML logging mode, gaim replaces '<' and '>' with "&lt;" and "&gt;", so there's no issue there either.
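To illustrate the difference, here is a rough Python sketch. It is not the ImLog code; `strip_tags` is a hypothetical stand-in that simply drops everything between "<" and ">", and `html.escape` stands in for what gaim's HTML logging does to the two characters.

```python
import html
import re

# Hypothetical stand-in for the filter: drop everything between "<" and ">".
# [^>] also matches newlines, so a tag split over two lines is removed too.
TAG_RE = re.compile(r"<[^>]*>")

def strip_tags(text):
    return TAG_RE.sub("", text)

raw = "dude1: e < mc^2+1. I *really* want the text here indexed. e > mc^2+1"

# Assumed behaviour of HTML logging: "<" and ">" become "&lt;" and "&gt;".
escaped = html.escape(raw, quote=False)

# With bare "<" and ">", the text between them is lost.
print(strip_tags(raw))

# With entities, nothing looks like a tag, so the whole line survives.
print(html.unescape(strip_tags(escaped)))
```

With the bare characters the middle sentence disappears; with the entities the full line comes back after decoding.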
I just tested this with a bunch of crazy examples and it indexed all the text perfectly each time, including Arun's example above. The only time it won't capture text is if you do "<fake>tags</fake>" while using HTML logging, because all the entities merge together into one big text blob. So replacing "&lt;" and "&gt;" (there are probably others too) with spaces in new-style HTML logs would probably help there.
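A minimal sketch of that idea, again in Python and not taken from the actual code: `separate_entities` is a hypothetical helper, and it only handles the two entities mentioned here.

```python
import re

# Only "&lt;" and "&gt;" are handled; other entities would need adding.
ENTITY_RE = re.compile(r"&(?:lt|gt);")

def separate_entities(text):
    # Replace each entity with a space so neighbouring words stay separate.
    return ENTITY_RE.sub(" ", text)

# What "<fake>tags</fake>" looks like in a new-style HTML log.
logged = "&lt;fake&gt;tags&lt;/fake&gt;"

print(separate_entities(logged).split())
```

Splitting on whitespace afterwards gives separate words instead of one blob.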
Regards,
Chris