Wrox Programmer Forums

#1
October 22nd, 2007, 09:09 AM
Accessing Google search with Java

I have tried to access Google search using the code below. When I run the program, the server returns HTTP response code 403. I haven't had any problems with other search engines. Does anyone know how to access Google search programmatically?

// download text content of URL

import java.net.*;
import java.io.*;

public class Jget
{
  public static void main ( String[] args ) throws IOException
  {
    try
    {
        URL url = new URL("http://www.google.com/search?q=example");

        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String str;

        while ((str = in.readLine()) != null)
        {
          System.out.println(str);
        }

        in.close();
    }
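    catch (MalformedURLException e) { System.err.println("Bad URL: " + e.getMessage()); }
    catch (IOException e) { System.err.println("Request failed: " + e.getMessage()); }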
  }
}

#2
October 22nd, 2007, 10:13 AM

Yeah, it looks like Google disallows some user agents. Using the Firefox User Agent Switcher, I pretended to be wget and got a page warning me:

Quote:
Your client does not have permission to get URL /search?q=http+user+agent&ie=utf-8&oe=utf-8&aq=t&rls=org.debian:en-GB:unofficial&client=iceweasel-a from this server. (Client IP address: 80.68.82.90)

Please see Google's Terms of Service posted at http://www.google.com/terms_of_service.html
OK, well, there's some stuff about not modifying any results you get. Fair do-s. You should read the ToS yourself and make sure that you're not doing anything to breach them.
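
The same warning page can be looked at from Java rather than from a browser. The sketch below is only an illustration: the class name StatusProbe, the Result holder and the probe() helper are made up for this example, and it assumes the URL is an http or https one so that the connection can be cast to HttpURLConnection. For a 4xx or 5xx answer getInputStream() throws, so the page has to be read from getErrorStream() instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusProbe {
    // Hypothetical holder for what the server sent back.
    static final class Result {
        final int status;
        final String body;
        Result(int status, String body) { this.status = status; this.body = body; }
    }

    // Read a stream to the end and decode it as UTF-8; null means "no body".
    static String readAll(InputStream in) throws IOException {
        if (in == null) return "";
        try (InputStream stream = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) buf.write(chunk, 0, n);
            return buf.toString("UTF-8");
        }
    }

    // Fetch the URL and return the status code together with whatever page came back.
    static Result probe(URL url) throws IOException {
        // Assumption: url uses http or https, otherwise this cast fails.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode();
        // For 4xx/5xx the page is on the error stream, not the input stream.
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        return new Result(status, readAll(in));
    }

    public static void main(String[] args) throws IOException {
        String target = args.length > 0 ? args[0] : "http://www.google.com/search?q=example";
        Result r = probe(new URL(target));
        System.out.println("HTTP " + r.status);
        System.out.println(r.body);
    }
}
```

Run it with a URL as the first argument, or with no arguments to use the search URL from the first post. What comes back depends on the server at the time you run it.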

Anyhow, I hopped back to my user-agent switcher and looked at my default user-agent string. It was
Code:
Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)
Catchy.

So, I need to set the User-Agent property. A bit of googling in the docs for URL.openStream() tells me that:

Quote:
Opens a connection to this URL and returns an InputStream for reading from that connection.
This method is a shorthand for:
         openConnection().getInputStream()
Right, openConnection() returns a URLConnection object, and a URLConnection has a setRequestProperty(String key, String value) method. We know from reading the HTTP docs (e.g. http://www.w3.org/Protocols/HTTP/HTR...tml#user-agent ) that User-Agent is a header of an HTTP request. Bingo! All we need is to get ourselves a URLConnection, call setRequestProperty("User-Agent", "Our-Agent/1.0") on it before it connects, and then read from its input stream instead of using openStream().

This is the code I made:

Code:
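import java.net.*;
import java.io.*;

public class googet
{
        public static void main ( String[] args ) {
                try {
                        URL url = new URL("http://www.google.com/search?q=example");
                        URLConnection conn = url.openConnection();
                        // Send a browser-like User-Agent; must be set before the connection is opened
                        conn.setRequestProperty("User-Agent",
                                        "Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)");
                        // try-with-resources closes the reader even if reading fails;
                        // the page declares charset=UTF-8, so decode it as such
                        try (BufferedReader in = new BufferedReader(
                                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                                String str;
                                while ((str = in.readLine()) != null) {
                                        System.out.println(str);
                                }
                        }
                }
                catch (MalformedURLException e) {
                        System.err.println("Bad URL: " + e.getMessage());
                }
                catch (IOException e) {
                        // Covers HTTP errors such as a 403 response as well as network failures
                        System.err.println("Request failed: " + e.getMessage());
                }
        }
}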
And running it gives me:
Code:
maui:/home/charlie/src/java# java googet 
<html><head><meta http-equiv=content-type content="text/html; charset=UTF-8"><title>example - Google Search</title><style>div,td,.n a,.n a:visited{color:#000}.ts td,.tc{padding:0}.ts,.tb{border-collapse:collapse}.f{color:#666}.flc,a.fl{color:#77c}a,.w,.q:visited,.q:active,.q,.b a,.b a:visited,.mblink:visited{color:#00c}a:visited{color:#551a8b}a:active{color:red}.t{background:#d5dff3;color:#000;padding:5px 1px 4px}.bb{border-bottom:1px solid #36c}.bt{border-top:1px solid #36c}.j{width:34em}.h{color:#36c;font-size:14px}.i{color:#a90a08}.a{color:green}.z{display:none}div.n{margin-top:1ex}.n a,.n .i{font-size:10pt}.n .i,.b a{font-weight:bold}.b a{font-size:12pt}#np,#nn,.nr,#logo span,.ch{cursor:pointer;cursor:hand}.ta{padding:3px 3px 3px 5px}#tpa2,#tpa3{padding-top:9px}#gbar{float:left;font-weight:bold;height:22px;padding-left:2px}#gbh{border-top:1px solid #c9d7f1;font-size:0;height:0;position:absolute;right:0;top:24px;width:200%}#gbi{background:#fff;border:1px solid;border-color:#c9d7f1 #36c #36c #a2bae7;font-size:13px;top:24px;z-index:1000}#guser{padding-bottom:7px !important}#gbar,#guser{font-size:13px;padding-top:1px !important}@media all{.gb1,.gb3{height:22px;margin-right:.73em;vertical-align:top}}#gbi,.gb2{display:none;position:absolute;width:8em}.gb2{z-index:1001}#gbar a,#gbar a:active,#gbar a:visited{color:#00c;font-weight:normal}.gb2 a,.gb3 a{text-decoration:none}.gb2 a{display:block;padding:.2em .5em}#gbar .gb2 a:hover{background:#36c;color:#fff}.sl,.r{display:inline;font-weight:normal;margin:0}.sl{font-size:84%}.r{font-size:1em}.e{margin:.75em 0}.sm{display:block;margin:0;margin-left:40px}.slk{margin-left:40px}.slk td{font-size:83%;line-height:16px;padding:5px 40px 0 0;vertical-align:top}.n div,#logo span{background:url(/images/nav_logo3.png) no-repeat;height:26px;overflow:hidden}.n .nr{background-position:-60px 0;width:16px}#np{width:44px}#nf{background-position:-26px 0;width:18px}#nc{background-position:-44px 
0;width:16px}#nn{margin-right:34px;width:66px}#nl{width:46px}#nn,#nl{background-position:-76px 0}#logo{display:block;height:52px;margin:13px 0 7px;overflow:hidden;position:relative;width:150px}#logo span{background-position:0 -26px;height:100%;left:0;position:absolute;top:0;width:100%}body{font-family:arial,sans-serif}.g{margin:1em 0}#sd{font-size:84%;font-weight:bold}#ap{font-size:64%}</style><script>window.google={kEI:"db0cR721CpywQY2m9PwH",kEXPI:"17259,17497,17735",kHL:"en"};window.clk=function(b,c,d,e,f,g){if(document.images){var a=encodeURIComponent||escape;(new Image).src="/url?sa=T"+(c?"&oi="+a(c):"")+(d?"&cad="+a(d):"")+"&ct="+a(e)+"&cd="+a(f)+(b?"&url="+a(b.replace(/#.*/,"")).replace(/\+/g,"%2B"):"")+"&ei=db0cR721CpywQY2m9PwH"+g}return true};window.gbar={};(function(){;var g=window.gbar,a,f,h;function m(b,e,d){b.display=b.display=="block"?"none":"block";b.left=e+"px";b.top=d+"px"}g.tg=function(b){var e=0,d,c,i,j=0,k=window.navExtra;!f&&(f=document.getElementById("gbar"));!h&&(h=f.getElementsByTagName("span"));(b||window.event).cancelBubble=true;if(!a){a=document.createElement(Array.every||window.createPopup?"iframe":"div");a.frameBorder="0";a.id="gbi";a.scrolling="no";a.src="#";document.body.appendChild(a);if(k)for(var n in k){var l=document.createElement("span");l.appendChild(k[n]);l.className="gb2";f.appendChild(l)}document.onclick=g.close}for(;h[j];j++){c=h[j];i=c.className;if(i=="gb3"){d=c.offsetLeft;while(c=c.offsetParent)d+=c.offsetLeft;m(a.style,d,24)}else if(i=="gb2"){m(c.style,d+1,25+e);e+=20}}a.style.height=e+"px"};g.close=function(b){a&&a.style.display=="block"&&g.tg(b)};})();</script></head><body bgcolor=#ffffff onload="" topmargin=3 marginheight=3><div id=gbar><nobr><span class=gb1>Web</a></span> <span class=gb1><a href="http://images.google.com/images?q=example&um=1&ie=UTF-8&sa=N&tab=wi">Images</a></span> <span class=gb1><a href="http://video.google.com/videosearch?q=example&um=1&ie=UTF-8&sa=N&tab=wv">Video</a></span> <span class=gb1><a 
href="http://news.google.com/news?q=example&um=1&ie=UTF-8&sa=N&tab=wn">News</a></span> <span class=gb1><a href="http://maps.google.com/maps?q=example&um=1&ie=UTF-8&sa=N&tab=wl">Maps</a></span> <span class=gb1><a href="http://mail.google.com/mail?um=1&ie=UTF-8&sa=N&tab=wm">Mail</a></span> <span class=gb3><a href="http://www.google.com/intl/en/options/" onclick="this.blur();gbar.tg(event);return false"><u>more</u> <span style=font-size:11px>&#9660;</span></a></span>
---SNIPPED---
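
If you want to check which User-Agent a URLConnection really sends without hitting Google at all, one option is to point it at a throwaway local server that echoes the header back. This is just a sketch under a few assumptions: the class AgentEcho and its helper methods are invented for the example, and it relies on the com.sun.net.httpserver.HttpServer class that ships with the standard JDK.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class AgentEcho {
    // Start a local server on a free port that replies with the User-Agent it received.
    static HttpServer startEchoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            byte[] body = String.valueOf(agent).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetch the URL, overriding User-Agent when one is given, and return the first response line.
    static String fetch(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startEchoServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println("default: " + fetch(url, null));
            System.out.println("custom:  " + fetch(url, "Our-Agent/1.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

With no override, the default agent string starts with "Java/" followed by the runtime version, which is presumably what the original program was sending. With the override, the server sees exactly the string passed to setRequestProperty.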

--
Charlie Harvey's website - linux, perl, java, anarchism and punk rock: http://charlieharvey.org.uk
#3
October 23rd, 2007, 10:38 AM

Thanks for a great answer. Your code worked for me too.

Jyrki

#4
December 15th, 2007, 03:08 AM
jomet

Thanks for the answer, ciderpunx.
Very informative post.


jomet.
---------------------------------------------
Once you start working on something,
don't be afraid of failure and don't abandon it.
People who work sincerely are the happiest.