Re: adding metadata to documents via web scraping
- From: "D Baser" <dblips gmail com>
- To: dashboard-hackers gnome org
- Subject: Re: adding metadata to documents via web scraping
- Date: Thu, 29 May 2008 15:23:56 +0200
I updated my Perl script (it had a copy/paste error in the URL somehow); it now seems to work.
Unfortunately, the Desktop Search doesn't show snippets for videos -- see the attached screenshot: "harrison ford" is found for Indy.avi, but the result doesn't show where the match is.
Cheers, D Baser
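#!/usr/bin/perl
use strict;
use warnings;

# Build a '+'-separated search query from the video's file name.
@ARGV or die "usage: $0 FILE.avi\n";
my $s = $ARGV[0];
$s =~ s{^.*/}{};                          # strip the directory part
$s =~ s/\.avi$//i;                        # strip the .avi extension
$s =~ s/[^a-zA-Z0-9-]/+/g;                # replace other characters with '+'
$s =~ s/([a-z])([A-Z0-9])/$1+$2/g;        # split camelCase words and digits

# Fetch the first Google hit on imdb.com (btnI = "I'm Feeling Lucky").
my $c = `lynx -source 'http://www.google.com/search?q=$s+site:www.imdb.com/title&btnI'`;

# Reduce the HTML to plain text.
$c =~ s/<script.*?>.*?<\/script>/ /gis;   # drop scripts (non-greedy, spans newlines)
$c =~ s/<style.*?>.*?<\/style>/ /gis;     # drop style sheets
$c =~ s/<[^>]+>/ /g;                      # drop the remaining tags
$c =~ s/&[a-z#0-9]+;/ /gi;                # drop character entities
print $c;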
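To show what the file-name clean-up does before the search is sent, here is a minimal sketch that isolates those four substitutions in a sub. The name query_from_path and the sample paths are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): turn a video path
# into a '+'-separated search query using the same four substitutions.
sub query_from_path {
    my ($s) = @_;
    $s =~ s{^.*/}{};                     # strip the directory part
    $s =~ s/\.avi$//i;                   # strip the .avi extension
    $s =~ s/[^a-zA-Z0-9-]/+/g;           # replace other characters with '+'
    $s =~ s/([a-z])([A-Z0-9])/$1+$2/g;   # split camelCase words and digits
    return $s;
}

# Sample path is an assumption; prints "Indiana+Jones+4".
print query_from_path('/home/me/videos/IndianaJones4.avi'), "\n";
```

A name that already contains spaces, such as 'Raiders of the Lost Ark.AVI', comes out as Raiders+of+the+Lost+Ark, so both naming styles end up as a usable query string.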
Attachment: metadata-from-web.png
Description: PNG image