I noticed that Google recently started using some sneaky JavaScript to route clicks on search results through their own servers. This is usually done to collect better usage stats. I don't want Google to have more personal information about me than I already give them, and it is irritating to copy a link and then have to de-googlify it.
For example, if you click on the first link in this search, it will take you to:
http://www.google.co.za/url?sa=t&ct=res&cd=1&\
url=http%3A//www.faqs.org/rfcs/rfc3092.html&ei=6Q0FQ-vAH8a2YLq_tf4J
The above is the result of a Right Click on Link -> Copy Link Location.
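De-googlifying a copied link by hand amounts to pulling out the url= parameter and percent-decoding it. The snippet below is a minimal Python sketch of that step, not part of the sanitiser itself (which is PHP); the function name degooglify is made up for illustration, and it only assumes the /url?...&url=... layout shown above.

```python
from urllib.parse import urlsplit, parse_qs

def degooglify(link: str) -> str:
    """Return the real target of a Google /url redirect link.

    Links that do not look like a redirect are returned unchanged.
    (Hypothetical helper, based only on the example link in the post.)
    """
    parts = urlsplit(link)
    if parts.path != "/url":
        return link
    # parse_qs percent-decodes the values, so %3A becomes ':'
    targets = parse_qs(parts.query).get("url")
    return targets[0] if targets else link

redirect = ("http://www.google.co.za/url?sa=t&ct=res&cd=1&"
            "url=http%3A//www.faqs.org/rfcs/rfc3092.html&ei=6Q0FQ-vAH8a2YLq_tf4J")
print(degooglify(redirect))  # prints http://www.faqs.org/rfcs/rfc3092.html
```

The sketch does not check that the host is actually a Google domain, so it will unwrap any link whose path is /url.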
So, because I think tinfoil is in this winter, I hacked up the Google Sanitiser (source). Clicking directly on the link will redirect you to 'real Google'. The PHP script just passes your normal Google request string on to Google, then strips the <script> tags and their contents from the response. It also rewrites some <img> tags. Here is a Mozilla/Firefox search plugin for it; just put it in your ~/.mozilla/searchplugins directory. Now look at the same search, but sanitised. Feel free to host your own sanitiser and save my server bandwidth.
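The core of the stripping step can be sketched in a few lines. This is a rough Python illustration, not the original PHP source; it assumes a simple regular expression is good enough for the markup being filtered, and it leaves out the <img> rewriting because the post does not say what that rewriting does. The names SCRIPT_RE and strip_scripts are invented for the example.

```python
import re

# Match an opening <script ...> tag, everything after it, and the first
# closing </script> tag. DOTALL lets '.' span newlines; IGNORECASE also
# catches <SCRIPT>.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)

def strip_scripts(html: str) -> str:
    """Remove <script> elements and their contents from an HTML string."""
    return SCRIPT_RE.sub("", html)

page = ('<a href="http://www.faqs.org/rfcs/rfc3092.html">RFC 3092</a>'
        '<SCRIPT type="text/javascript">\ntrack();\n</SCRIPT><p>ok</p>')
print(strip_scripts(page))
```

A regular expression is not a real HTML parser, so unusual markup (for example a literal "</script>" inside a JavaScript string) can confuse it.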
UPDATE 24th Aug 2005:
Ha ha, Google thinks I am spyware. I am not sure what the legalities of this are, but I can't see there being much of a case unless they want to outlaw browsers that don't support JavaScript. My logs show that a total of 16 people have used the script a total of 32 times (excluding me, I make it 61). This isn't very much, so I doubt it is some form of anomaly detection (although it could be).
I moved the script over to another webserver and it worked fine, as did the sitesearch from my blog. This indicates they are not just blocking on the hostname or IP. I then tried:
- all sorts of URL modifications including changing the path and vhost name
- removing the <img> tag rewriting, thinking that maybe they were picking up on those requests
- switching from google.co.za to google.com
- changing/adding the user-agent string (fopen doesn't send one by default)
None of these worked. So I whipped out netcat and noticed that PHP's fopen() was adding a "From: phpfopen@rucus.ru.ac.za" header. This appears to be what they are using to block the script. To get around this I stopped using fopen() and now use curl, which lets me craft my own headers so the request looks like it comes from a web browser. I even decompress the mod_gzip'ed HTML. This works great.
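The same idea, sending only hand-picked headers and unpacking a gzip-compressed response, looks roughly like this in Python. It is a sketch under assumptions, not the curl-based PHP from the new version: the header values (including the user-agent string) are illustrative, and the function names are invented for the example.

```python
import gzip
import urllib.request

# Illustrative browser-like headers; the exact values are assumptions.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) "
                  "Gecko/20050716 Firefox/1.0.6",
    "Accept": "text/html",
    "Accept-Encoding": "gzip",
}

def build_request(url: str) -> urllib.request.Request:
    """Build a request carrying only the headers we chose (no From: header)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

def decode_body(body: bytes, content_encoding: str) -> str:
    """Decompress the body if the server gzip-compressed it, then decode it."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8", errors="replace")

def fetch(url: str) -> str:
    """Fetch a page with the browser-like headers and return its HTML."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return decode_body(resp.read(),
                           resp.headers.get("Content-Encoding", ""))
```

Pointing a request like this at a local netcat listener is a quick way to check exactly which headers go out on the wire.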
Check out the new improved version here. I added a check to make sure you are coming from my netblock, to encourage you to run the script from your own site. Also, if I get any nasty e-mails from Google I will take it down; I am not that interested :)
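A netblock check of this kind boils down to testing whether the client's address falls inside one network range. The Python sketch below shows the idea only; the range 192.0.2.0/24 is a documentation placeholder rather than the real netblock, and the names ALLOWED_NETBLOCK and is_local_client are hypothetical.

```python
import ipaddress

# Hypothetical netblock: 192.0.2.0/24 is reserved for documentation.
ALLOWED_NETBLOCK = ipaddress.ip_network("192.0.2.0/24")

def is_local_client(remote_addr: str) -> bool:
    """Return True if the client address lies inside the allowed netblock."""
    try:
        return ipaddress.ip_address(remote_addr) in ALLOWED_NETBLOCK
    except ValueError:
        # Not a valid IP address at all, so refuse it.
        return False

print(is_local_client("192.0.2.17"))    # prints True
print(is_local_client("198.51.100.5"))  # prints False
```

In a web script the address would come from whatever the server exposes as the client's remote address, and requests that fail the check would be turned away.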
Barry Irwin
