As a precursor to this, I did some playing and realised (later than most, it seems) that the GoogleSharing proxy is a plain HTTP/1.1 proxy. A few quick lines of code, plus some help from Andrew Mohawk with trouble caused by gzipped return data, and you have a very simple PHP interface to GoogleSharing:
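<?php
// Minimal search front-end: relay the query to Google through the GoogleSharing HTTP proxy.
$query = isset($_REQUEST['q']) ? $_REQUEST['q'] : '';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.google.com/custom?q=" . urlencode($query));
// ini_set("user_agent") only affects PHP's stream wrappers, so set the User-Agent on the cURL handle itself.
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it directly
curl_setopt($ch, CURLOPT_PROXY, "http://proxy.googlesharing.net");
curl_setopt($ch, CURLOPT_PROXYPORT, 80);
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // ask for gzip and let cURL decompress the response
$x = curl_exec($ch);
if ($x === false) {
    // Report a failed proxy request instead of printing an empty page.
    $x = "Proxy request failed: " . htmlspecialchars(curl_error($ch));
}
curl_close($ch);
print $x;
?>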
My only worry is that I've been down this road before, five years ago, and I want things to happen a little differently this time. Back then, thousands of porn sites hosting malware decided that privacy-enhanced search was just what their customers needed, so Google saw several hundred malware-infested links pointing back to this site. The net result was that I dropped out of Google completely (with no warning or explanation, of course). So my intention is not that you use my search interface. That would be stupid anyway, as you have no reason to trust that I'm not mining your search data. Instead, here is a tarball that can be used to set up your own PHP front-end. You'll need a PHP-enabled web server with curl. The readme has more.
Archived Scroogle Announcement
July 1, 2010: Here we go again...
We regret to announce that our Google scraper may have to be permanently retired, thanks to a change at Google. It depends on whether Google is willing to restore the simple interface that we've been scraping since Scroogle started five years ago. Actually, we've been using that interface for scraping since Google-Watch.org began in 2002.
This interface (here's a sample from years ago) was remarkably stable all that time. During those eight years there were only about five changes that required some programming adjustments. Also, this interface was available at every Google data center in exactly the same form, which allowed us to use 700 IP addresses for Google.
That interface was at www.google.com/ie but on May 10, 2010 they took it down and inserted a redirect to /toolbar/ie8/sidebar.html.
It used to have a search box, and the results it showed were generic during that entire time. It didn't show the snippets unless you moused-over the links it produced (they were there for our program, so that was okay), and it has never had any ads. Our impression was that these results were from Google's basic algorithms, and that extra features and ads were added on top of these generic results. Three years ago Google launched "Universal Search," which meant that they added results from other Google services on their pages. But this simple interface we were using was not affected at all.
It is not possible to continue Scroogle unless we have a simple interface that is stable. Google's main consumer-oriented interface that they want everyone to use is too complex, too bloated, and changes too frequently, to make our scraping operation possible.
After a lot of suggestions from Scroogle users, and a fair amount of publicity, we found a fix and Scroogle was back in 24 hours. This fix was to insert an extra parameter, &output=ie, into the search terms that were relayed to Google. The extra parameter recovered the same interface that we thought was gone forever.
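To make the fix described above concrete, here is a minimal PHP sketch of appending an output=ie parameter to a relayed query. The helper name build_ie_search_url and the base URL are assumptions for illustration only; the announcement does not say which endpoint Scroogle relayed its searches to, and this is not Scroogle's actual code.

```php
<?php
// Hypothetical helper: build a search URL that carries the extra output=ie parameter.
// The default base URL is an assumption; the announcement does not name the endpoint.
function build_ie_search_url($terms, $base = "http://www.google.com/search")
{
    $params = array(
        'q'      => $terms,
        'output' => 'ie',
    );
    // http_build_query() URL-encodes each value; the separator is passed
    // explicitly so the result does not depend on the arg_separator.output setting.
    return $base . '?' . http_build_query($params, '', '&');
}

echo build_ie_search_url("privacy enhanced search"), "\n";
```

Building the query string from an array keeps the encoding in one place, so the extra parameter cannot be corrupted by unusual characters in the search terms.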
Now it seems like it actually might be gone forever. Late on June 30, 2010, the results produced while using this parameter began to shift to the usual busy Google interface with ads and a left-margin sidebar. Scroogle users saw a Scroogle page that said, "Google returned no results for this search," when in fact Google returned results but our scraper was unable to deal with them. Over the next few days we will attempt to contact Google and determine whether the old interface is gone as a matter of policy at Google, or if they simply have it hidden somewhere and will tell us where it is so that we can continue to use it.
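The failure mode described above, where a changed page layout is reported as an empty result set, can be illustrated with a small PHP sketch. Everything here is hypothetical: the function classify_scrape, the link pattern, and the "empty" marker are placeholders rather than Google's real markup or Scroogle's real scraper. The point is only that a scraper can keep "no results" and "layout not recognised" as separate outcomes.

```php
<?php
// Hypothetical scraper check. The pattern and marker used below are made-up
// placeholders, not Google's real markup.
function classify_scrape($html, $resultPattern, $emptyMarker)
{
    // Result links found: the layout is the one the scraper expects.
    if (preg_match_all($resultPattern, $html, $matches) > 0) {
        return array('status' => 'results', 'links' => $matches[1]);
    }
    // No links, but the page explicitly says the search was empty.
    if (strpos($html, $emptyMarker) !== false) {
        return array('status' => 'no_results', 'links' => array());
    }
    // Neither case matched: the page layout has probably changed.
    return array('status' => 'unrecognized_layout', 'links' => array());
}

$pattern = '/<a class="r" href="([^"]+)"/';  // assumed result-link markup
$marker  = 'did not match any documents';    // assumed "empty" message

// A page with a sidebar and ads but none of the expected result links.
$page = '<div id="sidebar"></div><div class="ad"></div>';
$outcome = classify_scrape($page, $pattern, $marker);
echo $outcome['status'], "\n";
```

With a third status, the front-end could tell users that the upstream layout changed rather than claiming the search came back empty.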
Thank you for your support during these past five years. Check back in a week or so; if we don't hear from Google by next week, I think we can all assume that Google would rather have no Scroogle, and no privacy for searchers.
— Daniel Brandt, Public Information Research, scroogle AT lavabit.com