Re: Maximum results to return from query



On Mon, Dec 05, 2005 at 07:25:49AM -0700, Elijah Newren wrote:
> On 12/5/05, Olav Vitters <olav bkor dhs org> wrote:
> > Searching for a bug can produce lots of results. Some queries can
> > return all the bugs in the database. The script has a very idiotic way
> > to protect against such queries (nothing after the ? in the URL or just
> > buglist.cgi).
> >
> > A java program was requesting:
> >   http://bugzilla.gnome.org/buglist.cgi?bug_id=
> > This caused buglist.cgi to retrieve all bugs. I've blocked his IP &
> > changed buglist.cgi to reject above query, but the java program already
> > had 3 buglist.cgi processes running on window, each consuming lots of
> > processor time (20min) & memory (180MB+).
> 
> Was their java program running core-bugs-today.cgi or another
> braindead script that we have?  I've requested
>   http://bugzilla.gnome.org/buglist.cgi?bug_id=
> several times myself as well.  I didn't do it intentionally, but
> rather because reports/core-bugs-today.cgi does its own little query
> to get a comma separated list of bugs, and then just appends them to
> that above URL and automatically redirects--if no bugs were found (as
> is the case just after 24:00 UTC), the appended list is merely an
> empty string.  Taking a look at the code, this bug is still there
> although it appears you fixed it when you ported that script to 2.20. 
> We may have other scripts that are similarly braindead.  I think
> boogle had this problem at one point (causing me to request that same
> url...) before I added code to special case that and print an error
> that no bugs were found instead.

Ahh.. that may explain it. Still strange that the USER_AGENT only
contained Java/some_ver. I assumed it was a badly written bot, as
nobody was logged in from that IP (or its /24). I can't say whether it
accessed other URLs, as I do not have access to the server logs (perhaps
ask the sysadmins, but they are currently busy). For the processes I
just read /proc/$PID/environ to find out the information. Reading that
file is also preferable because it is far more precise.
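
For anyone wanting to repeat the /proc/$PID/environ check, here is a
minimal sketch in Python. It assumes a Linux host, permission to read
the file for the process in question, and that the web server passes the
usual CGI variables (REMOTE_ADDR, HTTP_USER_AGENT, QUERY_STRING) to the
script. The function names are made up for illustration.

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE block found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        # Split on the first "=" only, so values such as "bug_id=" survive.
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def cgi_request_info(pid: int) -> dict:
    """Return the request-related variables of a running CGI process."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        env = parse_environ(fh.read())
    wanted = ("REMOTE_ADDR", "HTTP_USER_AGENT", "QUERY_STRING")
    # Variables the server did not set come back as None.
    return {name: env.get(name) for name in wanted}
```

Calling cgi_request_info() with the PID of a stuck buglist.cgi process
would show which client and which query string started it.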

> > Ideally buglist.cgi should contain a better detection of such queries.
> 
> Well, I think special casing an empty query since it has happened so
> many times makes sense.  Something more advanced would be welcome too,
> but this would probably be about as simple as you get and an empty
> query really ought to return an empty list.

Currently it rejects buglist.cgi?bug_id= (added today) and a bare
buglist.cgi. I know I sometimes use buglist.cgi?id=somebug (this will
still hang..). I do not see an easy way to detect such queries, that is
why I'm suggesting the limit.
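
A minimal sketch of the limit idea, in Python with an in-memory SQLite
database so it is self-contained. This is not the buglist.cgi code: the
cap value, the function name and the simplified "bugs" table are
assumptions for illustration. It asks for one row more than the cap so
the caller can tell the user that the list was cut off.

```python
import sqlite3

MAX_RESULTS = 1000  # hypothetical cap; the thread does not name a value


def limited_bug_ids(conn, where_sql="1=1", params=(), max_results=MAX_RESULTS):
    """Return (bug_ids, truncated) with at most max_results ids.

    where_sql must be SQL built by the application itself, never raw user
    input; user-supplied values belong in params.
    """
    sql = f"SELECT bug_id FROM bugs WHERE {where_sql} ORDER BY bug_id LIMIT ?"
    # Fetch one extra row to detect that more matches exist than the cap.
    rows = conn.execute(sql, (*params, max_results + 1)).fetchall()
    truncated = len(rows) > max_results
    return [row[0] for row in rows[:max_results]], truncated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO bugs VALUES (?)", [(i,) for i in range(1, 11)])
    print(limited_bug_ids(conn, max_results=3))
```

With the cap in place, a query that matches every bug stops after
max_results rows instead of pulling the whole table into memory.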

> > Another way would be to limit the number of bugs in the SQL. This isn't
> > perfect as the java process would still return lots of results, but it
> > is easy to implement. This is what I want to do now.
> 
> Sounds like it'd also be a good idea, but I really do think we should
> also special case an empty query and make it return
> nothing--especially since we have caused it so many times ourselves.

Yeah.. I'll look again.. seemed hard to do :-(
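
One possible shape for the empty-query special case, sketched in Python
rather than in the script's own code. The set of known search fields is
hypothetical and far shorter than the real list; the idea is only that a
request counts as a search when at least one recognised field carries a
non-blank value. That covers a bare buglist.cgi, buglist.cgi?bug_id= and
an unrecognised parameter such as ?id=somebug.

```python
from urllib.parse import parse_qs

# Hypothetical subset of recognised search parameters, for illustration only.
KNOWN_SEARCH_FIELDS = {"bug_id", "product", "component", "bug_status", "short_desc"}


def has_search_criteria(query_string, known_fields=KNOWN_SEARCH_FIELDS):
    """True if any recognised field has a non-blank value."""
    # keep_blank_values=False drops parameters such as "bug_id=" entirely.
    params = parse_qs(query_string, keep_blank_values=False)
    return any(
        name in known_fields and any(value.strip() for value in values)
        for name, values in params.items()
    )


if __name__ == "__main__":
    for qs in ("", "bug_id=", "id=somebug", "bug_id=12345"):
        print(repr(qs), has_search_criteria(qs))
```

When has_search_criteria() is false the script could return an empty
list or an error instead of running the search. It does nothing about
queries that are valid but very broad, so it complements the limit on
the number of results rather than replacing it.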


-- 
Regards,
Olav


