Re: newbie wants data from google scholar



Hello,
Thanks for the response, and sorry for the late reply; I was
trying my hand at Python.
But first of all, neither of the methods you suggested works for
Google Scholar:
PYTHON WAY:
$ python
Python 2.7.3 (default, Jul 24 2012, 10:05:38) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from HTMLParser import HTMLParser
>>> import urllib2
>>> response = urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
>>> 
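A likely cause of this 403 (not confirmed in this thread): urllib2 identifies itself with a `User-agent: Python-urllib/2.7` header, whereas a browser sends its own identification string, and Google Scholar appears to refuse clients it recognises as scripts. The sketch below assumes Python 3's `urllib.request` (the successor of `urllib2`; in Python 2.7 the equivalent call is `urllib2.Request(url, headers={...})`). It prints the default header and builds, but does not send, a request with a different User-Agent. The agent string `BROWSER_UA` is a hypothetical placeholder. Even with a changed header, Google may still block automated queries, and scripted access may conflict with Scholar's terms of use.

```python
import urllib.request

# URL from the session above (Google Scholar search for "albert einstein 1905").
URL = ("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905"
       "&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=")

# Hypothetical browser-like identification string (placeholder, not required).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def default_user_agent():
    """Return the User-Agent that urllib.request sends when none is given."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent):
    """Build (but do not send) a GET request carrying a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    print("default:", default_user_agent())
    req = build_request(URL, BROWSER_UA)
    print("custom: ", req.get_header("User-agent"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
    # The server may still answer 403 if it detects automated access.
```

Note that `Request.add_header` normalises header names with `str.capitalize()`, which is why the header is looked up as `"User-agent"`.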


WGET WAY:
$ wget "http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905&btnG=&as_sdt=1%2C5&as_sdtp="
--2012-10-01 17:38:01--  http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905&btnG=&as_sdt=1%2C5&as_sdtp=
Resolving scholar.google.co.uk... 173.194.41.81, 173.194.41.82, 173.194.41.83, ...
Connecting to scholar.google.co.uk|173.194.41.81|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2012-10-01 17:38:02 ERROR 403: Forbidden.
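wget also announces itself by default, with a `User-Agent: Wget/<version>` header, which may explain why it gets the same 403 as urllib2. wget's `--user-agent` (`-U`) option overrides that header. The sketch below is a hedged illustration, not a guaranteed workaround. It drives wget from Python with an argument list, so no shell is involved and the `&` characters in the URL need no quoting. The agent string and the output file name `scholar.html` are hypothetical placeholders, and the command is only built, not run.

```python
import subprocess

URL = ("http://scholar.google.co.uk/scholar?hl=en&q=albert+einstein%2B1905"
       "&btnG=&as_sdt=1%2C5&as_sdtp=")
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"  # hypothetical placeholder


def wget_command(url, user_agent, outfile="scholar.html"):
    """Return a wget argv list; each item reaches wget as exactly one argument."""
    return ["wget", "--user-agent=" + user_agent, "-O", outfile, url]


def fetch(url, user_agent):
    """Run wget without a shell, so '&' in the URL is never interpreted."""
    return subprocess.run(wget_command(url, user_agent)).returncode


if __name__ == "__main__":
    print(wget_command(URL, BROWSER_UA))
    # fetch(URL, BROWSER_UA)  # uncomment to actually download
```

Passing a list instead of a single string is the design choice that matters here: `subprocess.run` then executes wget directly, so shell metacharacters in the URL are passed through unchanged.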

But of course the link is not forbidden, as I can access it with a
browser.

What is the reason here?



On Thu, 2012-09-27 at 09:54 -0700, Brian Lavender wrote:
> Rudra,
> 
> First, you need to quote your argument so that the shell does not
> interpret the & characters in it.
> 
> $ wget  "http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5";
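To see what the quoting protects against: an unquoted `&` ends a shell command and runs it in the background. That is why the unquoted attempt quoted further down printed job numbers such as `[1] 18552` and fetched only the URL up to `%2B1905`. The sketch below uses Python's `shlex.quote` to produce a shell-safe command line. The `background_pieces` helper is a hypothetical, deliberately rough illustration of how bash splits the unquoted line; it ignores `&&` and all other shell syntax.

```python
import shlex

URL = ("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905"
       "&btnG=&hl=en&as_sdt=0%2C5")


def shell_command(url):
    """Return a wget command line with the URL quoted for a POSIX shell."""
    return "wget " + shlex.quote(url)


def background_pieces(cmdline):
    """Rough illustration only: the separate commands bash sees when '&'
    is left unquoted (ignores '&&' and other shell syntax)."""
    return [part.strip() for part in cmdline.split("&") if part.strip()]


if __name__ == "__main__":
    print(shell_command(URL))
    # wget 'http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5'
    for piece in background_pieces("wget " + URL):
        print(piece)
```

The pieces printed by the last loop (`wget http://...%2B1905`, `btnG=`, `hl=en`, `as_sdt=0%2C5`) match the jobs reported in the quoted session below.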
> 
> Second, I suggest you switch to Python. It has a nice command shell in
> which you can try these things out.
> 
> The reason the instructions for libsoup look like they are written in
> Latin is because they are written in "Latin". And, therefore you need
> to read "Latin". Before you even start with libsoup, you should start
> with the GLib developer guide and write small programs.
> 
> http://developer.gnome.org/glib/2.32/
> 
> I assume you already know how to code in C. If you don't, then take
> the Python path. Here is an example:
> 
> $ python
> >>> from HTMLParser import HTMLParser
> >>> import urllib2
> >>> response = urllib2.urlopen('http://python.org/')
> >>> html = response.read()
> >>> print html
> 
> Parsing the HTML is left as an exercise for you.
> http://docs.python.org/library/htmlparser.html
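One possible starting point for that exercise, sketched against Python 3's `html.parser` module (in Python 2 the class lives in the `HTMLParser` module with largely the same handler methods, though the `super()` call below is Python 3 syntax). It collects the target and text of every link. The sample markup is hypothetical and does not reflect Google Scholar's real page structure.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Feed an HTML string to LinkCollector and return the links found."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample markup standing in for a downloaded page.
    sample = ('<p><a href="/paper1">On the Electrodynamics</a> and '
              '<a href="/paper2">Brownian <b>motion</b></a></p>')
    for href, text in extract_links(sample):
        print(href, "->", text)
```

Text inside nested tags (the `<b>` above) is still gathered, because `handle_data` keeps appending until the closing `</a>`.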
> 
> You can of course put these in a Python program. I think you will
> get a lot more traction going down this path.
> 
> On Thu, Sep 27, 2012 at 10:47:13AM +0100, Rudra Banerjee wrote:
> > On Thu, 2012-09-27 at 07:48 +0200, Emmanuel Rodriguez wrote:
> > > 
> > > Once you have downloaded a web resource, most likely an HTML webpage
> > > you're on your own.
> > Downloading from Google Scholar has some problems as well:
> > $ wget http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5
> > [1] 18552
> > [2] 18553
> > [3] 18554
> > [2]-  Done                    btnG=
> > [3]+  Done                    hl=en
> > [rudra@roddur ~]$ --2012-09-27 10:42:29--  http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905
> > Resolving scholar.google.co.uk... 173.194.41.113, 173.194.41.114, 173.194.41.115, ...
> > Connecting to scholar.google.co.uk|173.194.41.113|:80... connected.
> > HTTP request sent, awaiting response... 403 Forbidden
> > 2012-09-27 10:42:30 ERROR 403: Forbidden.
> > 
> > ^C
> > [1]+  Exit 8                  wget http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905
> > 
> > Can you kindly give me a hint about the source of the error?
> > 
> > _______________________________________________
> > libsoup-list mailing list
> > libsoup-list gnome org
> > https://mail.gnome.org/mailman/listinfo/libsoup-list
> 



