October 22nd, 2007, 10:13 AM
ciderpunx
Friend of Wrox
Join Date: Dec 2003
Location: Oxford, United Kingdom.
Posts: 488
Thanks: 0
Thanked 3 Times in 3 Posts

Yeah, it looks like Google disallows some user agents. Using the Firefox User Agent Switcher, I pretended to be wget and got a page warning me:

Quote:
Your client does not have permission to get URL /search?q=http+user+agent&ie=utf-8&oe=utf-8&aq=t&rls=org.debian:en-GB:unofficial&client=iceweasel-a from this server. (Client IP address: 80.68.82.90)

Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html
OK, well, there's some stuff in there about not modifying any results you get. Fair dos. You should read the ToS yourself and make sure that you're not doing anything to breach them.

Anyhow, I hopped back to my user-agent switcher and looked at my default user-agent string. It was:
Code:
Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)
Catchy.

So, I need to set the User-Agent request property. A bit of digging in the Java docs for URL.openStream() tells me that:

Quote:
Opens a connection to this URL and returns an InputStream for reading from that connection.
This method is a shorthand for:
         openConnection().getInputStream()
Right, openConnection() returns a URLConnection object. Now, a URLConnection object has a setRequestProperty(String key, String value) method. We know from reading the HTTP docs (e.g. http://www.w3.org/Protocols/HTTP/HTR...tml#user-agent ) that User-Agent is a header field of an HTTP request. Bingo! All we need is to get ourselves a URLConnection and call setRequestProperty("User-Agent", "Our-Agent/1.0") on it before we start reading from it.
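
As a minimal sketch of that idea, the snippet below sets the header and reads it straight back, without sending anything over the network yet (openConnection() only creates the object; the request goes out later, e.g. on getInputStream()). The class name UserAgentProperty and the helper configuredUserAgent are made up for illustration, and "Our-Agent/1.0" is just a placeholder string.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class UserAgentProperty {
    // Hypothetical helper: builds a connection object for the address, sets
    // the User-Agent request property and reads it back. Nothing is sent
    // over the network here, because the connection is never opened.
    static String configuredUserAgent(String address, String userAgent) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn.getRequestProperty("User-Agent");
    }

    public static void main(String[] args) throws IOException {
        // "Our-Agent/1.0" is a placeholder, not a real browser string.
        System.out.println(
                configuredUserAgent("http://www.google.com/search?q=example", "Our-Agent/1.0"));
    }
}
```

Note that setRequestProperty has to be called before the connection is actually opened; once connected, URLConnection throws an IllegalStateException if you try to change request properties.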

This is the code I made:

Code:
import java.net.*;
import java.io.*;

public class googet
{
        public static void main ( String[] args ) {
                try {
                        URL url = new URL("http://www.google.com/search?q=example");
                        URLConnection conn =  url.openConnection();
                        // Set the header before getInputStream(), which is what sends the request
                        conn.setRequestProperty("User-Agent",
                                        "Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)");
                        // try-with-resources closes the reader even if reading fails
                        try (BufferedReader in = new BufferedReader(
                                new InputStreamReader(conn.getInputStream()))) {
                                String str;

                                while ((str = in.readLine()) != null) {
                                        System.out.println(str);
                                }
                        }
                }
                // Report failures instead of silently swallowing them
                catch (MalformedURLException e) {
                        System.err.println("Bad URL: " + e.getMessage());
                }
                catch (IOException e) {
                        System.err.println("Request failed: " + e.getMessage());
                }
        }
}
And running it gives me:
Code:
maui:/home/charlie/src/java# java googet 
<html><head><meta http-equiv=content-type content="text/html; charset=UTF-8"><title>example - Google Search</title><style>div,td,.n a,.n a:visited{color:#000}.ts td,.tc{padding:0}.ts,.tb{border-collapse:collapse}.f{color:#666}.flc,a.fl{color:#77c}a,.w,.q:visited,.q:active,.q,.b a,.b a:visited,.mblink:visited{color:#00c}a:visited{color:#551a8b}a:active{color:red}.t{background:#d5dff3;color:#000;padding:5px 1px 4px}.bb{border-bottom:1px solid #36c}.bt{border-top:1px solid #36c}.j{width:34em}.h{color:#36c;font-size:14px}.i{color:#a90a08}.a{color:green}.z{display:none}div.n{margin-top:1ex}.n a,.n .i{font-size:10pt}.n .i,.b a{font-weight:bold}.b a{font-size:12pt}#np,#nn,.nr,#logo span,.ch{cursor:pointer;cursor:hand}.ta{padding:3px 3px 3px 5px}#tpa2,#tpa3{padding-top:9px}#gbar{float:left;font-weight:bold;height:22px;padding-left:2px}#gbh{border-top:1px solid #c9d7f1;font-size:0;height:0;position:absolute;right:0;top:24px;width:200%}#gbi{background:#fff;border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;font-size:13px;top:24px;z-index:1000}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}}#gbi,.gb2{display:none;position:absolute;width:8em}.gb2{z-index:1001}#gbar a,#gbar a:active,#gbar a:visited{color:#00c;font-weight:normal}.gb2 a,.gb3 a{text-decoration:none}.gb2 a{display:block;padding:.2em .5em}#gbar .gb2 a:hover{background:#36c;color:#fff}.sl,.r{display:inline;font-weight:normal;margin:0}.sl{font-size:84%}.r{font-size:1em}.e{margin:.75em 0}.sm{display:block;margin:0;margin-left:40px}.slk{margin-left:40px}.slk td{font-size:83%;line-height:16px;padding:5px 40px 0 0;vertical-align:top}.n div,#logo span{background:url(/images/nav_logo3.png) no-repeat;height:26px;overflow:hidden}.n .nr{background-position:-60px 0;width:16px}#np{width:44px}#nf{background-position:-26px 0;width:18px}#nc{background-position:-44px 
0;width:16px}#nn{margin-right:34px;width:66px}#nl{width:46px}#nn,#nl{background-position:-76px 0}#logo{display:block;height:52px;margin:13px 0 7px;overflow:hidden;position:relative;width:150px}#logo span{background-position:0 -26px;height:100%;left:0;position:absolute;top:0;width:100%}body{font-family:arial,sans-serif}.g{margin:1em 0}#sd{font-size:84%;font-weight:bold}#ap{font-size:64%}</style><script>window.google={kEI:"db0cR721CpywQY2m9PwH",kEXPI:"17259,17497,17735",kHL:"en"};window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="/url?sa=T"+(c?"&oi="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei=db0cR721CpywQY2m9PwH"+g}return true};window.gbar={};(function(){;var g=window.gbar,a,f,h;function m(b,e,d){b.display=b.display=="block"?"none":"block";b.left=e+"px";b.top=d+"px"}g.tg=function(b){var e=0,d,c,i,j=0,k=window.navExtra;!f&&(f=document.getElementById("gbar"));!h&&(h=f.getElementsByTagName("span"));(b||window.event).cancelBubble=true;if(!a){a=document.createElement(Array.every||window.createPopup?"iframe":"div");a.frameBorder="0";a.id="gbi";a.scrolling="no";a.src="#";document.body.appendChild(a);if(k)for(var n in k){var l=document.createElement("span");l.appendChild(k[n]);l.className="gb2";f.appendChild(l)}document.onclick=g.close}for(;h[j];j++){c=h[j];i=c.className;if(i=="gb3"){d=c.offsetLeft;while(c=c.offsetParent)d+=c.offsetLeft;m(a.style,d,24)}else if(i=="gb2"){m(c.style,d+1,25+e);e+=20}}a.style.height=e+"px"};g.close=function(b){a&&a.style.display=="block"&&g.tg(b)};})();</script></head><body bgcolor=#ffffff onload="" topmargin=3 marginheight=3><div id=gbar><nobr><span class=gb1>Web</a></span> <span class=gb1><a href="http://images.google.com/images?q=example&um=1&ie=UTF-8&sa=N&tab=wi">Images</a></span> <span class=gb1><a href="http://video.google.com/videosearch?q=example&um=1&ie=UTF-8&sa=N&tab=wv">Video</a></span> <span class=gb1><a 
href="http://news.google.com/news?q=example&um=1&ie=UTF-8&sa=N&tab=wn">News</a></span> <span class=gb1><a href="http://maps.google.com/maps?q=example&um=1&ie=UTF-8&sa=N&tab=wl">Maps</a></span> <span class=gb1><a href="http://mail.google.com/mail?um=1&ie=UTF-8&sa=N&tab=wm">Mail</a></span> <span class=gb3><a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return false">[u]more</u> <span style=font-size:11px>#9660;</span></a></span>
---SNIPPED---
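
If you would rather try the same thing without hitting Google at all, here is a rough sketch that runs against a throwaway local server instead. It uses the JDK's built-in com.sun.net.httpserver.HttpServer. Everything about the server is an assumption made up for the demo: the rule "refuse anything whose User-Agent starts with Wget", the 403 status, the class name LocalUserAgentDemo and the two agent strings. It says nothing about how Google actually filters clients.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class LocalUserAgentDemo {
    // Starts a throwaway server on a free local port. The blocking rule
    // (missing User-Agent, or one starting with "Wget") is invented for
    // this demo only.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/search", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = ua == null || ua.startsWith("Wget");
            byte[] body = (blocked ? "no permission" : "hello " + ua)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends a GET with the given User-Agent and returns the HTTP status code.
    static int statusFor(int port, String userAgent) throws IOException {
        URL url = new URL("http://127.0.0.1:" + port + "/search?q=example");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            // Both agent strings are illustrative placeholders.
            System.out.println("Wget/1.0 -> " + statusFor(port, "Wget/1.0"));
            System.out.println("Our-Agent/1.0 -> " + statusFor(port, "Our-Agent/1.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

With the demo rule above, the first request should come back refused and the second accepted, which mirrors what happened when switching from the wget agent string to the browser one.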

--
Charlie Harvey's website - linux, perl, java, anarchism and punk rock: http://charlieharvey.org.uk