[Vala] Aho–Corasick string matching algorithm Vapi



I've made a Vapi for the Aho–Corasick string matching algorithm.
The library can be downloaded from: http://sourceforge.net/projects/multifast/files/
The Makefile needs a few modifications to build a shared library:
    add this line at the top:
        SONAME := libahocorasick.so.$(ACVERSION)
    add to CFLAGS:
        -fPIC
    add this target (in the real Makefile the two command lines must start with a tab):
        so: aho_corasick.o node.o
            gcc -shared -Wl,-soname,$(SONAME) -o $(SONAME) aho_corasick.o node.o
            ln -s -f $(SONAME) libahocorasick.so

    then run "make so".

What this library is good for:
Finding which of a large set of substrings occur in a given string (for
example: tags in text, domains/keywords in a URI, etc.)
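
To show the kind of result this gives, here is a minimal sketch in plain
Vala. It does not use the library or the Vapi, and it is not the
Aho–Corasick algorithm: it simply rescans the text once per keyword with
string.index_of, so it is only practical for a handful of keys. The names
Hit and find_all are made up for this illustration.

```vala
// Illustration only: naive multi-keyword search, NOT Aho-Corasick and not
// part of multifast or the Vapi. Hit and find_all are hypothetical names.
public struct Hit {
    public int position;    // byte offset where the keyword starts
    public string keyword;  // the keyword that was found
}

// Returns every occurrence of every keyword, grouped by keyword.
public Hit[] find_all (string text, string[] keywords) {
    Hit[] hits = {};
    foreach (unowned string key in keywords) {
        if (key.length == 0)
            continue;   // an empty key would match everywhere
        int pos = text.index_of (key);
        while (pos >= 0) {
            hits += Hit () { position = pos, keyword = key };
            pos = text.index_of (key, pos + 1);  // also finds overlapping hits
        }
    }
    return hits;
}

public static int main (string[] args) {
    string[] keys = { "example.com", "vala", "news" };
    foreach (var hit in find_all ("http://example.com/news/vala-release", keys))
        stdout.printf ("%d %s\n", hit.position, hit.keyword);
    return 0;
}
```

This is fine for a few keys, but the work grows with the number of keys.
The automaton built by the library is meant to avoid exactly that by
passing over the text once, however many keys were added.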

My benchmarks on an i7:
500,000 substrings (average key length 32 chars): index build time 15 sec,
memory use 1.7 GB.
Checking 10,000 keys: 0.048 sec overall.


Example of use; compile with: --pkg ahocorasick -X -lahocorasick


using AC;

public static int match_handler (Match m, void * param)
{
    uint j;

    // Print the position, id and text of every key matched at this point.
    for (j = 0; j < m.match_num; j++)
    {
        stdout.printf ("%ld ", m.position);
        stdout.printf ("%ld ", m.matched_strings[j].id);
        stdout.printf ("%s ", m.matched_strings[j].str);
        stdout.printf ("\n");
    }

    return 0; /* Find all matches */
}

public void main (string[] args)
{
    var aca = AC.Automata (match_handler);

    var str = AC.String () {
        id = 1,
        str = "test",
        length = "test".length // the length of str must be passed here
    };
    aca.add_string (str);

    // Build the index; search can't be executed before this is done.
    aca.build ();

    // The text to search in.
    var text = AC.String () {
        str = "tes",
        length = "tes".length
    };
    aca.search (text, null);

    aca.reset (); // reset the AC instance
}

Attachment: ahocorasick.vapi
Description: Binary data


