I've made a Vapi for the Aho–Corasick string matching algorithm.

The library can be downloaded from here:
http://sourceforge.net/projects/multifast/files/

The Makefile needs a few modifications to build a shared library.

Add this line at the top:

    SONAME := libahocorasick.so.$(ACVERSION)

Add to CFLAGS:

    -fPIC

Add this section (the recipe lines must start with a tab):

    so: aho_corasick.o node.o
            gcc -shared -Wl,-soname,$(SONAME) -o $(SONAME) aho_corasick.o node.o
            ln -s -f $(SONAME) libahocorasick.so

Then run "make so".

What this library is good for: finding which of many substrings occur in a given string (for example: tags in text, domains/keywords in a URI, etc.)

My benchmarks on an i7:
500 000 substrings, average key length 32 chars: index time 15 sec., 1.7 GB in memory.
Checking 10 000 keys: 0.048 sec. overall.

Example of use (compile with --pkg ahocorasick -X -lahocorasick):

    using AC;

    public static int match_handler (Match m, void * param) {
        uint j;
        // One callback may report several patterns ending at the same position
        for (j = 0; j < m.match_num; j++) {
            stdout.printf ("%ld ", m.position);
            stdout.printf ("%ld ", m.matched_strings[j].id);
            stdout.printf ("%s ", m.matched_strings[j].str);
            stdout.printf ("\n");
        }
        return 0; /* Find all matches */
    }

    public void main (string[] args) {
        var aca = AC.Automata (match_handler);

        var str = AC.String () {
            id = 1,
            str = "test",
            length = 4 // the length of str ("test".length) should be passed here
        };
        aca.add_string (str);

        aca.build (); // builds the index; a search can't be run before this is done

        // The text to search in (renamed: "str" was declared twice)
        var text = AC.String () {
            str = "tes",
            length = "tes".length
        };
        aca.search (text, null);

        aca.reset (); // reset the AC instance
    }
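For readers unfamiliar with the algorithm, here is a rough, library-independent sketch in Python of the two phases the example goes through: building the index once, then searching. It is not multifast's code or API. The function names build and search, the tuple layout, and the way positions are reported (the offset just past the last matched character) are assumptions made for illustration only.

```python
from collections import deque


def build(patterns):
    """Index step: build a trie of the patterns, then add failure links."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> ids of the patterns that end in this state
    for pid, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pid)

    # Breadth-first pass: depth-1 states keep the root as failure target.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state as well.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(automaton, text):
    """Scan text once; return (end_offset, pattern_id) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text, start=1):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pid in out[state]:
            matches.append((pos, pid))
    return matches


if __name__ == "__main__":
    index = build(["he", "she", "his", "hers"])
    print(search(index, "ushers"))           # [(4, 1), (4, 0), (6, 3)]
    print(search(build(["test"]), "tes"))    # []
```

The second call mirrors the example above: with only "test" indexed, searching "tes" reports nothing, so the match handler would not be called. The text is scanned once no matter how many patterns are indexed, which fits the pattern in the benchmark: a slow build followed by fast lookups.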
Attachment: ahocorasick.vapi