asp_web_howto thread: Extracting URL from a page?
Message #1 by "David Murphy" <yomommaissofat@h...> on Wed, 16 Oct 2002 11:31:32
I'm currently writing a search for my intranet. I built one that searches
the text in each file, and then moved on to one that searches the text of
each page as it would be served (as a lot of the pages are dynamic).
I want to extend my search engine so that it will spider the site. I've
modified the code and everything is ready to go, EXCEPT that I can't work
out a way of extracting URLs from the pages.
I've got the whole code for the page as a string, strPageCode, and I have
an array into which I'll be inserting the URLs I find on each page,
URLArray. Everything else is ready to go; I just need to extract the
hrefs from strPageCode and insert them into URLArray.
Any thoughts?
David Murphy
NHS Direct Wales
Message #2 by "George Draper" <gdraper@c...> on Wed, 16 Oct 2002 11:38:58 -0400
David,
It's just a matter of looking for "href=" and parsing out the URL that
follows it. I had good luck using the RegExp object to get the location
of each href, such as:
Set regEx1 = New RegExp
regEx1.Pattern = "href=" ' Set pattern.
regEx1.IgnoreCase = True ' Set case insensitivity.
regEx1.Global = True ' Set global applicability.
Then you can For Each through the Matches collection that the Execute
method returns; each Match object gives you the position of one href.
I used the InStr and Mid functions to extract the URL string from there.
- George
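
For reference, the approach George describes could be sketched roughly as below. This is only an illustration, not his actual code: the function name ExtractUrls and the local variable names are made up, and it assumes every href value is wrapped in single or double quotes (unquoted hrefs are simply skipped).

```vbscript
' Hypothetical helper: returns an array of the href values in strPageCode.
' Assumption: each value is quoted with " or '; unquoted hrefs are skipped.
Function ExtractUrls(strPageCode)
    Dim regEx1, colMatches, objMatch
    Dim URLArray(), intCount
    Dim lngStart, lngEnd, strQuote

    Set regEx1 = New RegExp
    regEx1.Pattern = "href="     ' Set pattern.
    regEx1.IgnoreCase = True     ' Set case insensitivity.
    regEx1.Global = True         ' Set global applicability.

    Set colMatches = regEx1.Execute(strPageCode)
    intCount = 0

    For Each objMatch In colMatches
        ' FirstIndex is zero-based, while Mid and InStr are one-based,
        ' so this points at the character right after "href=".
        lngStart = objMatch.FirstIndex + objMatch.Length + 1
        strQuote = Mid(strPageCode, lngStart, 1)

        If strQuote = """" Or strQuote = "'" Then
            ' Find the matching closing quote.
            lngEnd = InStr(lngStart + 1, strPageCode, strQuote)
            If lngEnd > 0 Then
                ReDim Preserve URLArray(intCount)
                URLArray(intCount) = Mid(strPageCode, lngStart + 1, lngEnd - lngStart - 1)
                intCount = intCount + 1
            End If
        End If
    Next

    If intCount = 0 Then
        ExtractUrls = Array()    ' Empty array, so UBound gives -1.
    Else
        ExtractUrls = URLArray
    End If
End Function
```

With this in place, URLArray = ExtractUrls(strPageCode) would fill the array, and a loop from 0 To UBound(URLArray) walks the results. Relative URLs come back exactly as written in the page, so a spider would still need to resolve them against the page's own address before fetching them.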
Message #3 by "David Murphy" <yomommaissofat@h...> on Wed, 16 Oct 2002 16:00:42 +0000
Thank you. I puzzled it out a bit and managed to get my head round it
after a while, but I'm glad to know I'm doing it the right way. It comes
out as a nice small bit of code, which is usually a good pointer :)
-David