Wrox Programmer Forums
Pro PHP Advanced PHP coding discussions. Beginning-level questions will be redirected to the Beginning PHP forum.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the Pro PHP section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old January 4th, 2005, 03:41 PM
Friend of Wrox
 
Join Date: Aug 2004
Posts: 117
Thanks: 0
Thanked 2 Times in 2 Posts
Spider

Hi
Does anybody know what web spiders are and what they do?
I did some searching on the web, and for PHP the only thing I found was phpDig (the documentation is not really good).

Can anybody give me some really simple code that creates a spider?

Thanks
regards
Mani_he

 
Old January 5th, 2005, 03:01 PM
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts

A spider is software that is automated to access web pages and gather certain data from those web pages.

A spider is also a fairly complex program, depending on what you intend to do with it, so it is unlikely anyone will write a complete example of one for you here on the fly.

These are also referred to as robots, or bots. You can do some searching on the terms "PHP robot" or "PHP bot" and will likely come up with some examples.

Be very careful with bots! You don't want to access pages from a site too quickly. There is a fine line between being treated as a robot and being treated as a denial-of-service attack.
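To make the points above concrete, here is a minimal sketch (not a full spider) of the core loop: fetch a page, pull out its links, and pause between requests. The function names extractLinks and crawl, the two-second default delay, and the five-page cap are all assumptions made for illustration; none of this comes from phpDig or any other library, and the regular expression is a deliberate simplification of real HTML parsing.

```php
<?php
// Minimal sketch of a polite spider. All names here are hypothetical.

/**
 * Pull href values out of anchor tags with a simple regular expression.
 * A real spider would use a proper HTML parser; this is only an illustration.
 */
function extractLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    return array_values(array_unique($matches[1]));
}

/**
 * Visit pages breadth-first starting from $startUrl, waiting $delaySeconds
 * between requests so the target server is not flooded.
 * $fetch is any callable that takes a URL and returns HTML, or false on failure.
 */
function crawl(string $startUrl, callable $fetch, int $maxPages = 5, int $delaySeconds = 2): array
{
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // already seen this page
        }
        if ($visited !== [] && $delaySeconds > 0) {
            sleep($delaySeconds); // pause before every request after the first
        }
        $html = $fetch($url);
        $visited[$url] = true;
        if ($html === false) {
            continue; // fetch failed, nothing to parse
        }
        foreach (extractLinks($html) as $link) {
            // Only absolute http(s) links are followed in this sketch.
            if (preg_match('#^https?://#i', $link) && !isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($visited);
}
```

Something like crawl('http://www.example.com/', 'file_get_contents') should run it against a live site, assuming allow_url_fopen is enabled. Passing the fetcher in as a callable keeps the delay and queue logic separate from the network code, so the loop can be tried against canned HTML before pointing it at anyone's server.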

Regards,
Rich

--
[http://www.smilingsouls.net]
Mail_IMAP: A PHP/C-Client/PEAR solution for webmail
Author: Beginning CSS: Cascading Style Sheets For Web Design
 
Old January 5th, 2005, 03:23 PM
Friend of Wrox
 

Hi
Thanks, richard.york, for your reply.

So the first page of Yahoo and the Google search engine both work using spiders. Am I right?

So hackers can somehow use spiders to get data from other websites. Am I right?

If I learn how to create a spider and send it, for example, to your website to get some information such as articles and output it on my website, then you can sue me for that. Am I right?

Thanks
Regards
Mani_he


 
Old January 5th, 2005, 03:42 PM
Wrox Author
 

That all depends on where you are in the world and your government's stance on copyright infringement.

In most places, certainly, copyright infringement is a big deal with serious repercussions. In contrast, it is not such a big deal in places like China.

It also depends on how you present the data that you gather. If you present it as your own you're likely to ruffle someone's feathers. It also depends on whether or not you provide the public with a mechanism to remove their data from your index.
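One widely used convention for letting site owners opt out of being indexed is a robots.txt file at the root of the site. It is not mentioned elsewhere in this thread, so treat it as an assumption about what such a mechanism could look like. The sketch below checks a path against the Disallow rules of an already downloaded robots.txt. The function name isPathAllowed is hypothetical, and the parsing is simplified: it handles only User-agent and Disallow lines with plain prefix matching, and it combines every group that matches either "*" or the given agent name instead of preferring the more specific group.

```php
<?php
// Hypothetical helper: simplified robots.txt check (User-agent and Disallow only).
function isPathAllowed(string $robotsTxt, string $path, string $agent = '*'): bool
{
    $applies = false;      // does the current group apply to our agent?
    $inAgentLines = false; // are we inside a run of consecutive User-agent lines?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$inAgentLines) {
                $applies = false; // a new group starts here
            }
            $inAgentLines = true;
            if ($value === '*' || strcasecmp($value, $agent) === 0) {
                $applies = true;
            }
        } else {
            $inAgentLines = false;
            // An empty Disallow value means nothing is disallowed.
            if ($field === 'disallow' && $applies && $value !== '' && strpos($path, $value) === 0) {
                return false;
            }
        }
    }
    return true;
}
```

A spider would call this before each fetch and skip any URL whose path comes back as disallowed.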

Yahoo doesn't use spiders that I am aware of. Their search is a directory. The difference is that each listing is human-reviewed instead of being automatically gathered and indexed by a robot. Google, on the other hand, does use robots.

Regards,
Rich

 
Old January 5th, 2005, 11:09 PM
Friend of Wrox
 

Thanks for your reply.
Do you know of any books that teach how to create one?

Thanks
Regards
Mani_he






Copyright (c) 2020 John Wiley & Sons, Inc.