Re: newbie wants data from google scholar



On Thu, Sep 27, 2012 at 12:54 AM, Rudra Banerjee <bnrj rudra yahoo com> wrote:
Dear friends,
I am a complete newbie at fetching data from websites (beyond using
wget).
My current project (self-assigned, for fun, not homework) needs to
fetch data from Google Scholar. Since the other part of the project is
based on GTK, I am thinking of using libsoup for the purpose.
But the libsoup tutorial in gnome-dev is complete Latin to a newbie.

Can you please provide me a simple program using libsoup for getting and
scraping an HTML page?
Libsoup is not meant for scraping HTML. Its main purpose is to download content over HTTP or HTTPS.

Once you have downloaded a web resource, most likely an HTML page, you're on your own. If you want to scrape HTML you will need another library. If you're using C, I don't know of any library that can do this for you other than WebKit (or Mozilla). You can use WebKit to fetch and render the page (this can be done offscreen) and then use JavaScript to scrape the HTML.

If you're using a scripting language you might have better luck, depending on the language. I know that Perl has lots of libraries for HTML scraping; one of the most popular is WWW::Mechanize [1].

[1] being http://search.cpan.org/perldoc?WWW%3A%3AMechanize
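
To give a rough idea of what the "scraping" half looks like in a scripting language, here is a minimal sketch in Python using only its standard library (html.parser). It is not libsoup or WWW::Mechanize, and the names LinkCollector, extract_links and SAMPLE_HTML are made up for illustration. The sample markup is invented too; it is not what Google Scholar actually returns, so the real page would need its own inspection.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string that was already downloaded by other means."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample markup (an assumption, not real Google Scholar output).
SAMPLE_HTML = """
<html><body>
  <h3><a href="http://example.org/paper1">First <b>paper</b> title</a></h3>
  <h3><a href="http://example.org/paper2">Second paper title</a></h3>
  <a>anchor without href</a>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "-", text)
```

The download step is kept out of the sketch on purpose: whatever fetches the page (libsoup, wget, or anything else) only has to hand the HTML over as a string.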

--
Emmanuel Rodriguez

