Re: adding metadata to documents via web scraping



Updated my Perl script (it somehow had a copy/paste error in the URL); it now seems to work.

Unfortunately the Desktop Search doesn't show snippets for videos -- see attached screenshot: "harrison ford" is found for Indy.avi but it doesn't show where.

Cheers, D Baser

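#!/usr/bin/perl
use strict;
use warnings;

# Usage: pass the path of a video file, e.g. /videos/Indy.avi
my $s = $ARGV[0];
die "usage: $0 FILE.avi\n" unless defined $s;

# Turn the file name into '+'-separated search terms.
$s =~ s{.*/}{};                        # drop the directory part
$s =~ s/\.avi$//i;                     # drop the extension
$s =~ s/[^a-zA-Z0-9-]/+/g;             # anything else becomes a separator
$s =~ s/([a-z])([A-Z0-9])/$1+$2/g;     # split camelCase and letter/digit boundaries

# $s now holds only letters, digits, '-' and '+', so it is safe inside the quotes.
my $c = `lynx -source 'http://www.google.com/search?q=$s+site:www.imdb.com/title&btnI'`;

# Reduce the HTML to plain text.
$c =~ s/<script\b.*?<\/script>/ /gis;  # /s lets . match newlines
$c =~ s/<style\b.*?<\/style>/ /gis;
$c =~ s/<[^>]+>/ /g;                   # remaining tags
$c =~ s/&[a-z#0-9]+;/ /gi;             # character entities

print $c;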
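
To check what search terms a given file name produces without hitting the network, the file-name clean-up can be tried on its own. This is only a rough sketch: the helper name query_terms is made up for illustration and is not part of the script above; it just applies the same four substitutions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): applies the same
# four substitutions and returns the '+'-separated search terms.
sub query_terms {
    my ($s) = @_;
    $s =~ s{.*/}{};                      # drop the directory part
    $s =~ s/\.avi$//i;                   # drop the .avi extension
    $s =~ s/[^a-zA-Z0-9-]/+/g;           # other characters become '+'
    $s =~ s/([a-z])([A-Z0-9])/$1+$2/g;   # split lowercase -> uppercase/digit boundaries
    return $s;
}

# Print the terms for each path given on the command line.
print query_terms($_), "\n" for @ARGV;
```

Run with /videos/RaidersOfTheLostArk.avi as the argument, this should print Raiders+Of+The+Lost+Ark, and "my movie 2.avi" should give my+movie+2.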

Attachment: metadata-from-web.png
Description: PNG image


