You don't even need to do any socket stuff if allow_url_fopen is turned
on; just use

    $data = file( "http://www.somesite.com/sompage.html" );

and then you've got everything in the page in the array $data, one line
per element.
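
Here's a rough sketch of how I might wrap that call so a failed fetch doesn't blow up on you. The function name fetch_lines() is made up for illustration, and the type hints assume a current PHP (7.1 or later) rather than whatever you happen to be running:

```php
<?php
// Hypothetical helper (not a PHP built-in): load a page, or any path
// file() can open, as an array of lines. Returns null on failure.
function fetch_lines(string $url): ?array
{
    // Remote URLs only work through file() when allow_url_fopen is enabled.
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    $allowed  = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($isRemote && !$allowed) {
        return null;
    }

    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element;
    // @ suppresses the warning so we can report failure through the return value.
    $data = @file($url, FILE_IGNORE_NEW_LINES);

    return $data === false ? null : $data;
}
```

You'd call it as fetch_lines( "http://www.somesite.com/sompage.html" ) and check for null before searching the array.
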
You could use fopen()/fgets() and just search each line one at a time
until you find what you're looking for, but that means holding the
connection open while you make repeated reads on it. Since PHP doesn't
have any firm upper limit on the size of a string (beyond whatever
memory_limit allows), I'd just use file(), glom it together into a
single string with implode() and search the string -- if you're that
worried about memory, do

    $content = implode( $data );
    unset( $data );
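
For comparison, the line-at-a-time approach might look something like the sketch below. find_first_line() is a made-up name, and I'm assuming a plain substring match is all you need:

```php
<?php
// Hypothetical helper: return the first line that contains $needle,
// or null if the stream can't be opened or nothing matches.
function find_first_line(string $url, string $needle): ?string
{
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return null;
    }

    $found = null;
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $found = rtrim($line, "\r\n");
            break; // stop reading as soon as we have a match
        }
    }

    fclose($handle);

    return $found;
}
```

The one thing this buys you is that it can quit early without keeping the whole page in memory, which only matters if the page is big or the match is near the top.
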
If you're running PHP 4.3 or above, you can get the entire contents of
the file as a single string $content using

    $content = file_get_contents( "http://www.somesite.com/sompage.html" );

and save yourself the intermediate step.
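
Once the page is in one string, searching it is just strpos() and substr(). A minimal sketch, where extract_between() is a hypothetical helper and the marker strings are whatever delimits the bit you're after:

```php
<?php
// Hypothetical helper: return the text between the first $start marker
// and the next $end marker after it, or null if either one is missing.
function extract_between(string $content, string $start, string $end): ?string
{
    $from = strpos($content, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start); // skip past the start marker itself

    $to = strpos($content, $end, $from);
    if ($to === false) {
        return null;
    }

    return substr($content, $from, $to - $from);
}
```

So after checking that file_get_contents() didn't return false, something like extract_between( $content, '<title>', '</title>' ) would pull out the page title.
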
Since your typical Web page doesn't usually get *that* big (maybe
75-100k of text max, if that much?) I really wouldn't obsess over the
memory issue too much. (Not that you should flagrantly waste it,
but no need to cringe over every byte, either!)
You can't inspect any content that you don't somehow load into one or
more variables, if that's what you mean. One thing I would watch so far
as resource management goes: use str_replace() rather than regexps
whenever possible -- regular expressions can eat up a lot of memory in
a hurry if you let them get out of hand.
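
To show what I mean, here's a small sketch with a made-up snippet of markup. The fixed substitutions go through str_replace(), and the regex is saved for the one part that actually varies:

```php
<?php
// Sample markup, invented for illustration.
$html = '<p>Price:&nbsp;<b>42</b></p>';

// Fixed strings: str_replace() takes arrays, so one call handles
// several literal substitutions without any pattern matching.
$plain = str_replace(['&nbsp;', '<b>', '</b>'], [' ', '', ''], $html);

// Variable text: reach for a pattern only when the thing you're
// matching isn't a known literal (here, any remaining tag).
$stripped = preg_replace('/<[^>]+>/', '', $plain);
```
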
Just my 2 cents' Australian, I guess. :)
j.
professional php digest wrote:
>
> Subject: screen scraping with PHP
> From: "Christopher Janney" <E-MAIL REMOVED>
> Date: Sat, 1 Mar 2003 13:03:57 -0800
> X-Message-Number: 2
>
> I'm trying to build a 'screen scraping' class that is general enough to
> access a broad range of sites for the same info. That's for me to figure
> out. The question is what is the best approach to getting the page? Open
> an http socket, request the page, dump the page into an array, search the
> array and pow! done? That sounds like a lot of wasted memory to me, but
> I've only built a shopping cart and dynamic pages in PHP so far. No socket
> stuff yet.
>
>
> TIA,
>
> -ctj
--
jon stephens
<zontar@m...>
http://hiveminds.info/ HiveMinds Group
http://phpuddi.sourceforge.net/ phpUDDI Project
http://www.wrox.com/ Wrox Press "Programmer To Programmer"
http://www.glasshaus.com/ glasshaus "Web Developer To Web Developer"