Re: [evolution-patches] Optimise vCard parsing



I'm curious, how much difference does this make?

The only real logic change seems to be in the unfolding code.  Which is
pretty over-done since, '\r' in utf8 == '\r' in ascii and cannot
possibly be interpreted any other way.  The same for all the other
characters it is checking for.

i.e. the code could just use trivial char * pointer stuff and no g_utf8*
functions at all.

e.g.

start = p;
while ((c = *p++)) {
	if (c == '\r' || c == '\n') {
		c = *p++;
		if (c == '\r' || c== '\n')
			c = *p++;
		well you get the picture
	}
}


I think you'd find more gains by changing this stuff anyway:

static void
parse (EVCard *evc, const char *str)
{
	char *buf = g_strdup (str);
	char *p, *end;
	EVCardAttribute *attr;

	/* first validate the string is valid utf8 */
	if (!g_utf8_validate (buf, -1, (const char **)&end)) {
		/* if the string isn't valid, we parse as much as we can from it */
		g_warning ("invalid utf8 passed to EVCard.  Limping along.");
		*end = '\0';
	}
	
	buf = fold_lines (buf);

Copying the string is wildly unecessary, validate is unecessary, and
fold_lines could be done differently.

parse could:
 unfold one line, using simple ascii based comparisons
 then parse that one line, ignoring bad chars as it goes (below)
 loop until done

the g_utf8 stuff is pretty messy and hard to use (very easy to overrun
buffers or go into infinite loops), that is why you have to validate the
strings first, because they fail miserably if you pass it bad data.  Not
a well thought-out performance api that one.

The function claims to 'try to recover' as best it can, but it just
stops at the first invalid character.

In camel we face similar problems, we have a few simple functions which
solve this problem in a cleaner way:

c = camel_utf8_getc(&p);

will return the next decoded unicode value, skipping bad ones (not
strict conformance, but it really isn't necessary), or 0 once it's
finished, updating p accordingly.  it doesn't need pre-validation.

It is a trivially small function, and can be renamed and copied into
e-vcard, and even made inlinable.

If your changes make any practical difference, it is probably worth a
closer look.  By just doing one line at a time you should improve
locality of reference even more, it does slightly less work anyway, and
will use less memory as a bonus.

(actually the whole parser looks pretty awful performance wise ...)


On Thu, 2005-08-04 at 17:05 +0100, Ross Burton wrote:
> Hi,
> 
> I did some benchmarking of general EDS addressbook performance and
> string manipulation when loading vCards is way higher in the profile
> than I'd expect.
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=312581 has a patch which
> speeds things up a bit.
> 
> Ross
-- 
adfa(evolution-2.4:20087): gtkhtml-WARNING **: cannot find icon:
'stock_insert-url' in gnome 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]