Aug 19
Security

I noticed that Google recently started using some sneaky JavaScript to route clicks on search results through its own servers. This is usually done to gather better usage stats. I don't want Google to have more personal information about me than I already give them, and it is irritating to copy a link and then have to de-googlify it.

For example, clicking the first result of this search takes you to:
http://www.google.co.za/url?sa=t&ct=res&cd=1&\
url=http%3A//www.faqs.org/rfcs/rfc3092.html&ei=6Q0FQ-vAH8a2YLq_tf4J
The above is the result of a Right Click on Link -> Copy Link Location.
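
De-googlifying such a link by hand amounts to pulling out the url parameter and URL-decoding it. Here is a minimal sketch of that step, assuming the redirect always carries the target in a parameter named url, as in the example above. The degooglify() helper is hypothetical and is not part of the sanitiser.

```php
<?php
// Hypothetical helper (not from the original script): recover the real
// destination from a Google redirect link of the form /url?...&url=<target>.
function degooglify(string $link): string
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $link; // no query string, so nothing to unwrap
    }

    parse_str($query, $params); // parse_str() also URL-decodes the values

    if (isset($params['url']) && is_string($params['url']) && $params['url'] !== '') {
        return $params['url'];
    }
    return $link; // not a redirect link, leave it untouched
}

$wrapped = 'http://www.google.co.za/url?sa=t&ct=res&cd=1&'
         . 'url=http%3A//www.faqs.org/rfcs/rfc3092.html&ei=6Q0FQ-vAH8a2YLq_tf4J';

echo degooglify($wrapped), "\n"; // http://www.faqs.org/rfcs/rfc3092.html
```

Links that do not carry a url parameter are returned unchanged, so the helper is safe to run over every link on a page.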

So, because I think tinfoil is in this winter, I hacked up the Google Sanitiser (source). Clicking directly on the link will redirect you to 'real Google'. The PHP script just passes your normal Google request string on to Google, then strips the <script> tags and their contents from the response. It also rewrites some <img> tags. Here is a Mozilla/Firefox search plugin for it; just put it in your ~/.mozilla/searchplugins directory. Now look at the same search, but sanitised. Feel free to host your own sanitiser and save my server bandwidth.
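
The script-stripping step could look roughly like the sketch below. This is an assumption about the approach, not the sanitiser's actual source (which is linked above rather than shown), and stripScripts() is a hypothetical name. A regular expression is not a full HTML parser, so treat it as good enough for a quick hack and no more.

```php
<?php
// Sketch only: the original sanitiser's source is not reproduced in this post.
function stripScripts(string $html): string
{
    // Remove every <script ...>...</script> element, contents included.
    // 'i' = case-insensitive, 's' = let '.' match newlines, '.*?' = non-greedy.
    $clean = preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $html);

    // preg_replace() returns null on error; return nothing in that case
    // rather than passing the unsanitised page through.
    return $clean ?? '';
}

$page = '<a href="http://example.org/">hit</a><SCRIPT type="text/javascript">'
      . "\nrewrite();\n" . '</SCRIPT><p>done</p>';

echo stripScripts($page), "\n"; // <a href="http://example.org/">hit</a><p>done</p>
```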

UPDATE 24th Aug 2005:
Ha ha, Google thinks I am spyware. I am not sure what the legalities of this are, but I can't see there being much to it unless they want to outlaw browsers that don't support JavaScript. My logs show that 16 people have used the script a total of 32 times (61 if you include me). That isn't very much, so I doubt it is some form of anomaly detection (although it could be).

I moved the script over to another webserver and it worked fine and the sitesearch from my blog worked fine. This indicates they are not just blocking on the hostname or IP. I then tried:

  • making all sorts of URL modifications, including changing the path and vhost name
  • removing the <img> tag rewriting, thinking that maybe they were picking up on those requests
  • changing from google.co.za to google.com
  • changing/adding the user-agent string (fopen doesn't send one by default)

None of these worked. So I whipped out netcat and noticed that PHP's fopen() was adding a "From: phpfopen@rucus.ru.ac.za" header. This appears to be what they are using to block the script. To get around this, I stopped using fopen() and now use curl, which allows me to craft my own headers so the request looks like it comes from a web browser. I even decompress the mod_gzip'ed HTML. This works great.
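
For anyone rolling their own, the curl side might be sketched as follows. It assumes the PHP curl extension is loaded. The function names and the header values are illustrative guesses, not the ones my script sends. Setting CURLOPT_ENCODING to an empty string asks curl to advertise every encoding it supports and to decompress the reply itself.

```php
<?php
// Hypothetical helpers: names and header values are illustrative only.
function browserLikeCurlOptions(string $userAgent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent, // fopen() sent no User-Agent by default
        CURLOPT_ENCODING       => '',         // accept gzip/deflate and decode transparently
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html',
            'Accept-Language: en',
        ],
    ];
}

// Returns the response body as a string, or false if the request failed.
function fetchLikeBrowser(string $url, string $userAgent)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, browserLikeCurlOptions($userAgent));
    return curl_exec($ch);
}
```

A call such as fetchLikeBrowser('http://www.google.com/search?q=rfc3092', 'Mozilla/5.0') would then return the page ready for the tag stripping. Only the headers you set yourself, plus curl's own defaults, go out on the wire, so no From header is added.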

Check out the new improved version here. I added a check to make sure you are coming from my netblock, to encourage you to run the script from your own site. Also, if I get any nasty e-mails from Google I will take it down; I am not that interested :)
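
If you host your own copy and want a similar restriction, a netblock check can be sketched like this. It is a rough illustration, not the check in my script: ipInNetblock() is a hypothetical helper, 192.0.2.0/24 is a documentation range standing in for a real netblock, and the code assumes IPv4 addresses on a 64-bit PHP build.

```php
<?php
// Hypothetical helper: true if the IPv4 address $ip lies inside the
// CIDR block $cidr (for example '192.0.2.0/24').
function ipInNetblock(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    $bits  = isset($parts[1]) ? (int) $parts[1] : 32; // bare address means /32

    $ipLong  = ip2long($ip);
    $netLong = ip2long($parts[0]);
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false; // malformed input never matches
    }

    // Build a mask with the top $bits bits set, e.g. /24 -> 0xFFFFFF00.
    $mask = $bits === 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;

    return ($ipLong & $mask) === ($netLong & $mask);
}

var_dump(ipInNetblock('192.0.2.17', '192.0.2.0/24'));   // inside the block
var_dump(ipInNetblock('198.51.100.5', '192.0.2.0/24')); // outside the block
```

In a web script the address to test would normally come from $_SERVER['REMOTE_ADDR'], with a 403 response sent when the check fails.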

Posted by Dominic White

Last modified on 2005-08-24 02:50

3 Comments

  1. Tristan Seligmann says:

    I do have to point out that your sanitiser has exactly the same privacy issues as Google's URL intercepting. Also, they've done this off and on for ages now, but I think the JavaScript cloaking of the interception is new.

  2. Dominic White says:

    They could always track where you are going by checking their webserver logs, and I guess you have no reason to trust me (that's another reason to run your own), but it does prevent Google from tracking the dissemination of the link (if the link isn't de-googlified). Mostly it just gets rid of the nasty URL pasting.

  3. Dominic White says:

    I got a private e-mail with a suggestion of an alternative to the CURL library:

        function sendToHost($host, $method, $path, $data, $useragent = 0) {
            // Default to GET and normalise the method name
            if (empty($method)) $method = 'GET';
            $method = strtoupper($method);
            $fp = fsockopen($host, 80);
            if (!$fp) return 0; // could not connect
            $path .= '?' . $data;
            // HTTP header lines end in CRLF
            fputs($fp, "$method $path HTTP/1.1\r\n");
            fputs($fp, "Host: $host\r\n");
            if ($useragent) fputs($fp, "User-Agent: MSIE\r\n");
            fputs($fp, "Connection: close\r\n\r\n");
            // Echo the raw response, headers included
            while (!feof($fp)) echo fgets($fp, 128);
            fclose($fp);
            return 1;
        }

        sendToHost('www.google.com', 'get', '/search', 'q=php_imlib', 1);

    Thanks Chris.
