Wrox Programmer Forums

You are currently viewing the Java Basics section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old October 22nd, 2007, 09:09 AM
Registered User
 
Join Date: Oct 2007
Posts: 2
Accessing Google search with Java

I have tried to access Google search using the code below. When I run the program, the server returns HTTP response code 403. I haven't had any problems with other search engines. Does anyone know how to access Google search programmatically?

// Download the text content of a URL and print it to standard output.

import java.net.*;
import java.io.*;

public class Jget
{
  public static void main ( String[] args ) throws IOException
  {
    URL url = new URL("http://www.google.com/search?q=example");

    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream())))
    {
      String str;

      while ((str = in.readLine()) != null)
      {
        System.out.println(str);
      }
    }
    // No empty catch blocks: an IOException (such as the 403 failure)
    // propagates out of main and is reported with a stack trace.
  }
}

 
Old October 22nd, 2007, 10:13 AM
Friend of Wrox
 
Join Date: Dec 2003
Posts: 488

Yeah, it looks like Google disallows some user agents. Using the Firefox user-agent switcher, I pretended to be wget and got a page warning me:

Quote:
Your client does not have permission to get URL /search?q=http+user+agent&ie=utf-8&oe=utf-8&aq=t&rls=org.debian:en-GB:unofficial&client=iceweasel-a from this server. (Client IP address: 80.68.82.90)

Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html
OK, well, there's some stuff about not modifying any results you get. Fair do-s. You should read the ToS yourself and make sure that you're not doing anything to breach them.

Anyhow, I hopped back to my user-agent switcher and looked at my default user-agent string. It was
Code:
Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)
Catchy.

So, I need to set the User-Agent property. A bit of digging in the docs for URL.openStream() tells me that:

Quote:
Opens a connection to this URL and returns an InputStream for reading from that connection.
This method is a shorthand for:
         openConnection().getInputStream()
Right, openConnection() returns a URLConnection object, and a URLConnection has a setRequestProperty(String key, String value) method. We know from reading the HTTP docs (e.g. http://www.w3.org/Protocols/HTTP/HTR...tml#user-agent ) that User-Agent is a header field of an HTTP request. Bingo! All we need to do is get a URLConnection from the URL, call setRequestProperty("User-Agent", "Our-Agent/1.0") on it before the connection is made, and then read from its getInputStream().
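As a quick sanity check of that idea, here is a minimal sketch that sets the property and reads it back without sending any request. The class name UserAgentProperty and the helper withUserAgent are made up for illustration; only openConnection(), setRequestProperty() and getRequestProperty() come from java.net.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class UserAgentProperty {
    // Hypothetical helper: opens (but does not connect) a URLConnection
    // and stores the given User-Agent request property on it.
    public static URLConnection withUserAgent(String address, String userAgent) {
        try {
            URLConnection conn = URI.create(address).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            // Covers MalformedURLException too, since it extends IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn =
                withUserAgent("http://www.google.com/search?q=example", "Our-Agent/1.0");
        // Nothing has been sent yet; this only reads back the stored property.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

The property has to be set before the connection is actually made (that is, before getInputStream() is called); setting it afterwards fails with an IllegalStateException.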

This is the code I made:

Code:
import java.net.*;
import java.io.*;

public class googet
{
        public static void main ( String[] args ) throws IOException {
                URL url = new URL("http://www.google.com/search?q=example");
                URLConnection conn = url.openConnection();
                // Must be set before the connection is made, i.e. before getInputStream().
                conn.setRequestProperty("User-Agent",
                                "Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)");

                // try-with-resources closes the reader even if reading fails.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream())
                )) {
                        String str;

                        while ((str = in.readLine()) != null) {
                                System.out.println(str);
                        }
                }
                // No empty catch blocks: any IOException propagates out of main
                // and is reported instead of being silently swallowed.
        }
}
And running it gives me:
Code:
maui:/home/charlie/src/java# java googet 
<html><head><meta http-equiv=content-type content="text/html; charset=UTF-8"><title>example - Google Search</title><style>div,td,.n a,.n a:visited{color:#000}.ts td,.tc{padding:0}.ts,.tb{border-collapse:collapse}.f{color:#666}.flc,a.fl{color:#77c}a,.w,.q:visited,.q:active,.q,.b a,.b a:visited,.mblink:visited{color:#00c}a:visited{color:#551a8b}a:active{color:red}.t{background:#d5dff3;color:#000;padding:5px 1px 4px}.bb{border-bottom:1px solid #36c}.bt{border-top:1px solid #36c}.j{width:34em}.h{color:#36c;font-size:14px}.i{color:#a90a08}.a{color:green}.z{display:none}div.n{margin-top:1ex}.n a,.n .i{font-size:10pt}.n .i,.b a{font-weight:bold}.b a{font-size:12pt}#np,#nn,.nr,#logo span,.ch{cursor:pointer;cursor:hand}.ta{padding:3px 3px 3px 5px}#tpa2,#tpa3{padding-top:9px}#gbar{float:left;font-weight:bold;height:22px;padding-left:2px}#gbh{border-top:1px solid #c9d7f1;font-size:0;height:0;position:absolute;right:0;top:24px;width:200%}#gbi{background:#fff;border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;font-size:13px;top:24px;z-index:1000}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}}#gbi,.gb2{display:none;position:absolute;width:8em}.gb2{z-index:1001}#gbar a,#gbar a:active,#gbar a:visited{color:#00c;font-weight:normal}.gb2 a,.gb3 a{text-decoration:none}.gb2 a{display:block;padding:.2em .5em}#gbar .gb2 a:hover{background:#36c;color:#fff}.sl,.r{display:inline;font-weight:normal;margin:0}.sl{font-size:84%}.r{font-size:1em}.e{margin:.75em 0}.sm{display:block;margin:0;margin-left:40px}.slk{margin-left:40px}.slk td{font-size:83%;line-height:16px;padding:5px 40px 0 0;vertical-align:top}.n div,#logo span{background:url(/images/nav_logo3.png) no-repeat;height:26px;overflow:hidden}.n .nr{background-position:-60px 0;width:16px}#np{width:44px}#nf{background-position:-26px 0;width:18px}#nc{background-position:-44px 
0;width:16px}#nn{margin-right:34px;width:66px}#nl{width:46px}#nn,#nl{background-position:-76px 0}#logo{display:block;height:52px;margin:13px 0 7px;overflow:hidden;position:relative;width:150px}#logo span{background-position:0 -26px;height:100%;left:0;position:absolute;top:0;width:100%}body{font-family:arial,sans-serif}.g{margin:1em 0}#sd{font-size:84%;font-weight:bold}#ap{font-size:64%}</style><script>window.google={kEI:"db0cR721CpywQY2m9PwH",kEXPI:"17259,17497,17735",kHL:"en"};window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="/url?sa=T"+(c?"&oi="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei=db0cR721CpywQY2m9PwH"+g}return true};window.gbar={};(function(){;var g=window.gbar,a,f,h;function m(b,e,d){b.display=b.display=="block"?"none":"block";b.left=e+"px";b.top=d+"px"}g.tg=function(b){var e=0,d,c,i,j=0,k=window.navExtra;!f&&(f=document.getElementById("gbar"));!h&&(h=f.getElementsByTagName("span"));(b||window.event).cancelBubble=true;if(!a){a=document.createElement(Array.every||window.createPopup?"iframe":"div");a.frameBorder="0";a.id="gbi";a.scrolling="no";a.src="#";document.body.appendChild(a);if(k)for(var n in k){var l=document.createElement("span");l.appendChild(k[n]);l.className="gb2";f.appendChild(l)}document.onclick=g.close}for(;h[j];j++){c=h[j];i=c.className;if(i=="gb3"){d=c.offsetLeft;while(c=c.offsetParent)d+=c.offsetLeft;m(a.style,d,24)}else if(i=="gb2"){m(c.style,d+1,25+e);e+=20}}a.style.height=e+"px"};g.close=function(b){a&&a.style.display=="block"&&g.tg(b)};})();</script></head><body bgcolor=#ffffff onload="" topmargin=3 marginheight=3><div id=gbar><nobr><span class=gb1>Web</a></span> <span class=gb1><a href="http://images.google.com/images?q=example&um=1&ie=UTF-8&sa=N&tab=wi">Images</a></span> <span class=gb1><a href="http://video.google.com/videosearch?q=example&um=1&ie=UTF-8&sa=N&tab=wv">Video</a></span> <span class=gb1><a 
href="http://news.google.com/news?q=example&um=1&ie=UTF-8&sa=N&tab=wn">News</a></span> <span class=gb1><a href="http://maps.google.com/maps?q=example&um=1&ie=UTF-8&sa=N&tab=wl">Maps</a></span> <span class=gb1><a href="http://mail.google.com/mail?um=1&ie=UTF-8&sa=N&tab=wm">Mail</a></span> <span class=gb3><a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return false"><u>more</u> <span style=font-size:11px>&#9660;</span></a></span>
---SNIPPED---
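
If you want to see the effect without hitting Google at all, here is a rough sketch that runs against a throwaway local server. Everything about the server is an assumption for illustration: the rule "answer 403 when the User-Agent is missing or starts with Java/" is a stand-in, not Google's actual logic, and the names LocalAgentCheck and statusFor are made up. It relies on the default User-Agent that Java sends, which is "Java/" followed by the runtime version, and on the com.sun.net.httpserver classes bundled with the JDK.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

public class LocalAgentCheck {
    // Starts a throwaway local server, sends one GET with the given
    // User-Agent (null means: leave Java's default), returns the status code.
    public static int statusFor(String userAgent) {
        HttpServer server = null;
        try {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/search", exchange -> {
                String agent = exchange.getRequestHeaders().getFirst("User-Agent");
                // Stand-in blocking rule (an assumption, not Google's real logic).
                boolean blocked = agent == null || agent.startsWith("Java/");
                exchange.sendResponseHeaders(blocked ? 403 : 200, -1); // -1: no body
                exchange.close();
            });
            server.start();

            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/search?q=example").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            // getResponseCode() reports 4xx codes without throwing.
            int status = conn.getResponseCode();
            conn.disconnect();
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default agent: " + statusFor(null));
        System.out.println("custom agent:  " + statusFor("Our-Agent/1.0"));
    }
}
```

Assuming the http.agent system property has not been overridden, the first call should come back as 403 and the second as 200. Casting to HttpURLConnection and calling getResponseCode() is also a handy way to inspect a failure against a real server, since getInputStream() just throws an IOException on a 403.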

--
Charlie Harvey's website - linux, perl, java, anarchism and punk rock: http://charlieharvey.org.uk
 
Old October 23rd, 2007, 10:38 AM
Registered User
 
Join Date: Oct 2007
Posts: 2

Thanks for a great answer. Your code worked for me too.

Jyrki

 
Old December 15th, 2007, 03:08 AM
jomet
Guest
 
Posts: n/a

Thanks for the answer, ciderpunx.
Very informative post.


jomet.
---------------------------------------------
Once you start working on something,
don't be afraid of failure and don't abandon it.
People who work sincerely are the happiest.





Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.