BOOK: Beginning PHP4/PHP 5 ISBN: 978-0-7645-4364-7; v5 ISBN: 978-0-7645-5783-5
This is the forum to discuss the Wrox book Beginning PHP4 by Wankyu Choi, Allan Kent, Chris Lea, Ganesh Prasad, Chris Ullman; ISBN: 9780764543647
You are currently viewing the BOOK: Beginning PHP4/PHP 5 ISBN: 978-0-7645-4364-7; v5 ISBN: 978-0-7645-5783-5 section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
February 13th, 2004, 02:38 PM
Registered User
 
Join Date: Feb 2004
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
Search Engine in PHP

How do I access all of the URLs in the world using PHP so that I can write my own search engine? I already know how to call other search engines (such as Google) using the Snoopy class. How do I get to all of the URLs directly, or by “type of URL” (for whatever “types” there are)?



 
February 13th, 2004, 03:15 PM
richard.york
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts

Well, you could write a PHP bot. Can it be done? Absolutely.

What you need is an HTML parser built on regular expressions to extract URLs from a web page, and then a mechanism to put those URLs in a queue. Finding a balance between excess resource consumption and a healthy robot is quite a challenge. Cron jobs would be the likely solution: schedule the robot to run at intervals, then sleep, then run...
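To make the idea concrete, here is a minimal sketch of the two pieces described above: a regular expression that pulls links out of a page, and a queue that feeds newly found URLs back to the robot. The function names extractLinks and crawlQueue are made up for illustration, the code assumes a reasonably modern PHP (7 or later), and it only keeps absolute http/https links. A regular expression is not a complete HTML parser, so treat this as a starting point rather than a finished robot.

```php
<?php
// Hypothetical helper: pull href values out of raw HTML with a regular expression.
// This is a sketch; it skips relative links and will miss unusual markup.
function extractLinks(string $html): array
{
    // Capture the quoted value of href inside <a ...> tags (single or double quotes).
    preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    $links = [];
    foreach ($matches[2] as $href) {
        $href = trim(html_entity_decode($href));
        // Keep only absolute http(s) URLs.
        if (preg_match('#^https?://#i', $href)) {
            $links[$href] = true; // array keys remove duplicates
        }
    }
    return array_keys($links);
}

// Hypothetical crawl loop: $fetch is any callable that returns the page body
// as a string, or false on failure. Stops after $maxPages pages.
function crawlQueue(array $seedUrls, callable $fetch, int $maxPages): array
{
    $queue = new SplQueue();
    $seen = [];
    foreach ($seedUrls as $url) {
        $queue->enqueue($url);
        $seen[$url] = true;
    }

    $visited = [];
    while (!$queue->isEmpty() && count($visited) < $maxPages) {
        $url = $queue->dequeue();
        $visited[] = $url;

        $html = $fetch($url);
        if ($html === false) {
            continue; // fetch failed; move on to the next URL
        }
        foreach (extractLinks($html) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // never queue the same URL twice
                $queue->enqueue($link);
            }
        }
    }
    return $visited;
}
```

For a real run you could pass 'file_get_contents' as $fetch (this assumes allow_url_fopen is enabled) and keep $maxPages small, so that each cron invocation does a bounded amount of work before the script exits. The queue here lives in memory only; a robot that resumes between cron runs would need to store it in a file or database.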

The question is, how many TERABYTES of free space do you have for all the information that such a bot would gather? If you're doing a Google-like search, your free space will be consumed very quickly.

My suggestion is to look at dmoz.org if you're determined to have your own search. They make their directory freely available to anyone who wants it. Aside from Google's bot, Dmoz is one of Google's primary sources of information.

If you do go to all the trouble of writing a bot, be very careful about how quickly it pulls information from any one website, because one man's bot can quickly become another's denial of service attack!
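One simple way to act on that warning is to remember when the bot last hit each host and sleep before hitting it again. The sketch below assumes a minimum gap of 5 seconds per host, which is an arbitrary number chosen for illustration, and the names secondsToWait and politeFetch are made up. It also assumes allow_url_fopen is enabled so that file_get_contents can read URLs.

```php
<?php
// Hypothetical helper: how many seconds to wait before requesting $host again.
// $lastHit maps host name => timestamp (seconds, as a float) of the last request.
function secondsToWait(array $lastHit, string $host, float $now, float $minDelay): float
{
    if (!isset($lastHit[$host])) {
        return 0.0; // first request to this host, no wait needed
    }
    $elapsed = $now - $lastHit[$host];
    return max(0.0, $minDelay - $elapsed);
}

// Hypothetical wrapper: fetch a URL, but never faster than one request
// per $minDelay seconds for the same host. Returns the body or false.
function politeFetch(string $url, array &$lastHit, float $minDelay = 5.0)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // not a URL we can throttle by host
    }

    $wait = secondsToWait($lastHit, $host, microtime(true), $minDelay);
    if ($wait > 0) {
        usleep((int) round($wait * 1000000)); // usleep takes microseconds
    }

    $lastHit[$host] = microtime(true);
    return @file_get_contents($url); // assumes allow_url_fopen is on
}
```

The waiting logic is kept in its own function so it can be checked without touching the network. The $lastHit array only lasts for one run of the script, so a bot that is restarted by cron would have to save it somewhere to keep the same spacing across runs.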

And then there's always Google. Run a search for PHP Robot; chances are someone's already written it. Writing a robot is a complex undertaking, and I don't think anyone here has the time or energy to go into the details of what it would require.

: )
Rich

:::::::::::::::::::::::::::::::::
Smiling Souls
http://www.smilingsouls.net
:::::::::::::::::::::::::::::::::










Copyright (c) 2020 John Wiley & Sons, Inc.