Re: newbie wants data from google scholar
- From: Rudra Banerjee <bnrj rudra yahoo com>
- To: Brian Lavender <brian brie com>
- Cc: libsoup-list gnome org
- Subject: Re: newbie wants data from google scholar
- Date: Mon, 01 Oct 2012 17:40:29 +0100
Hello,
Thanks for the response, and sorry for the late reply. I was
trying my hand at Python.
But first of all, neither of the methods you suggested works for
Google Scholar:
PYTHON WAY:
$ python
Python 2.7.3 (default, Jul 24 2012, 10:05:38)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from HTMLParser import HTMLParser
>>> import urllib2
>>> response = urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
>>>
WGET WAY:
$ wget "http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905&btnG=&as_sdt=1%2C5&as_sdtp="
--2012-10-01 17:38:01--  http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905&btnG=&as_sdt=1%2C5&as_sdtp=
Resolving scholar.google.co.uk... 173.194.41.81, 173.194.41.82, 173.194.41.83, ...
Connecting to scholar.google.co.uk|173.194.41.81|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-10-01 17:38:02 ERROR 403: Forbidden.
But of course the link is not really forbidden, since I can open it in a
browser.
What is the reason for this?
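
One thing that may be worth checking, sketched here under an assumption that nothing in this thread confirms: some servers answer 403 to clients whose User-Agent header identifies a script (urllib2 sends "Python-urllib/2.7" by default) while serving browsers normally. The helper names build_request and fetch and the header value are made up for illustration; Request, urlopen and HTTPError are the standard-library names, imported so that the block runs on Python 2 or 3. Whether this changes the answer from Google Scholar is untested here, and automated access may be against the site's terms of use.

```python
try:  # Python 2, as used in this thread
    from urllib2 import Request, urlopen, HTTPError
except ImportError:  # Python 3 moved these names
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

URL = ('http://scholar.google.co.uk/scholar'
       '?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=')


def build_request(url, user_agent='Mozilla/5.0 (X11; Linux x86_64)'):
    """Return a Request with an explicit User-Agent (illustrative value)."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Return (status, body); on an HTTP error, return its code and error page."""
    try:
        response = urlopen(build_request(url))
        return response.getcode(), response.read()
    except HTTPError as err:
        # HTTPError is also a response object, so the error page can be read.
        return err.code, err.read()
```

Calling fetch(URL) returns the status code together with the body, so a 403 can be inspected instead of ending in a traceback.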
On Thu, 2012-09-27 at 09:54 -0700, Brian Lavender wrote:
> Rudra,
>
> First, you need to quote your argument so that the shell does not
> interpret your argument.
>
> $ wget "http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5"
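
To illustrate why the quotes matter: an unquoted & ends a command and runs it in the background, so the shell sees several commands rather than one URL. The snippet below is only a rough model of that behaviour (a plain split on &, not a real shell parser), and split_like_shell is a name invented for this sketch.

```python
def split_like_shell(command_line):
    """Rough model: cut an unquoted command line at every '&'."""
    return [part.strip() for part in command_line.split('&')]


unquoted = ('wget http://scholar.google.co.uk/scholar'
            '?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5')

# Each printed piece is what the shell would treat as a separate command.
for piece in split_like_shell(unquoted):
    print(piece)
```

The first three pieces correspond to the three background jobs in the quoted output further down; wget itself only receives the URL up to %2B1905.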
>
> Second, I suggest you switch to Python. It has a nice command shell in which
> you can try these things out.
>
> The reason the instructions for libsoup look like they are written in
> Latin is because they are written in "Latin". And, therefore you need
> to read "Latin". Before you even start with libsoup, you should start
> with libglib developer guide and write small programs.
>
> http://developer.gnome.org/glib/2.32/
>
> I assume you already know how to code in C. If you don't, then go the
> Python path. Here is an example:
>
> $ python
> >>> from HTMLParser import HTMLParser
> >>> import urllib2
> >>> response = urllib2.urlopen('http://python.org/')
> >>> html = response.read()
> >>> print html
>
> parsing html left as an exercise for you.
> http://docs.python.org/library/htmlparser.html
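
As a starting point for that exercise, here is a minimal sketch of subclassing HTMLParser to pull the link targets out of a page. The class name LinkCollector, the helper extract_links and the sample string are invented for illustration; the import is written so that the block runs on Python 2 or 3.

```python
try:  # Python 2
    from HTMLParser import HTMLParser
except ImportError:  # Python 3
    from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def extract_links(html):
    """Return all href values found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = ('<html><body><a href="http://python.org/">Python</a> '
          '<a name="x">no link</a></body></html>')
print(extract_links(sample))
```

The same extract_links call can be given the string returned by response.read() in the example above.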
>
> You can of course put these in a Python program. I think you will
> get a lot more traction going this path.
>
> On Thu, Sep 27, 2012 at 10:47:13AM +0100, Rudra Banerjee wrote:
> > On Thu, 2012-09-27 at 07:48 +0200, Emmanuel Rodriguez wrote:
> > >
> > > Once you have downloaded a web resource, most likely an HTML webpage
> > > you're on your own.
> > Downloading from Google Scholar has a problem as well:
> > $ wget http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5
> > [1] 18552
> > [2] 18553
> > [3] 18554
> > [2]- Done btnG=
> > [3]+ Done hl=en
> > [rudra@roddur ~]$ --2012-09-27 10:42:29--  http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905
> > Resolving scholar.google.co.uk... 173.194.41.113, 173.194.41.114, 173.194.41.115, ...
> > Connecting to scholar.google.co.uk|173.194.41.113|:80... connected.
> > HTTP request sent, awaiting response... 403 Forbidden
> > 2012-09-27 10:42:30 ERROR 403: Forbidden.
> >
> > ^C
> > [1]+ Exit 8 wget http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905
> >
> > Could you kindly give me a hint about the source of the error?
> >
> > _______________________________________________
> > libsoup-list mailing list
> > libsoup-list gnome org
> > https://mail.gnome.org/mailman/listinfo/libsoup-list
>