asptoday_discuss thread: creating crawler for dead links stored in database
Message #1 by "Bryan Ax" <axb@a...> on Wed, 5 Sep 2001 08:57:22
We have a site designed for middle school students that includes web links
to other sites. However, due to the nature of the Internet, we're finding
that someone has to monitor our links to other sites to make sure
the links haven't gone dead, or changed to something that isn't deemed
appropriate. This is taking a lot of time for this person, and I'd like to
explore a solution that would check the links automatically and report
any dead or "sketchy" links by email.
The links are currently all stored in a SQL database, and dynamically
generated when a user hits the page. Is there any way, using SQL, a report
of some sort, or an ASP script, to walk through the table that stores the
URLs, "go" to each page, and return some information about that page:
whether it's dead, or whether the meta description tag has changed (and
therefore maybe the content of the site has changed)? If nothing has
changed, there's no problem. However, if the page is dead or has changed,
the script could generate an email telling someone to follow up on that
particular link manually. Is something like this possible, and if so,
where do I start? The logic of what needs to happen is clear to me; I
just need some ideas for implementing it.
Sincerely,
Bryan Ax
Developer
AMC Cancer Research Center
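A minimal sketch of the checker Bryan describes, written in Python rather than ASP to keep it short and self-contained. It assumes a hypothetical `links` table with `id`, `url` and `meta_description` columns, any DB-API connection to that database, and a hypothetical `fetch(url)` callable that returns a status code and the page text (so XMLHTTP, urllib or any other HTTP client can be plugged in). A link is reported as dead when there is no response (status 0) or the status is 400 or above, and as changed when its current meta description differs from the stored one. The email is built but not sent.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()


def meta_description(html):
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


def check_links(conn, fetch):
    """Return (url, reason) pairs for links that need a manual follow-up.

    conn is a DB-API connection holding the (hypothetical) links table.
    fetch(url) must return (status_code, page_text), with status 0 when
    no HTTP response arrived at all (unknown host, refused connection).
    """
    problems = []
    cursor = conn.cursor()
    cursor.execute("SELECT url, meta_description FROM links ORDER BY id")
    for url, stored_description in cursor.fetchall():
        status, text = fetch(url)
        if status == 0 or status >= 400:
            problems.append((url, f"dead (status {status})"))
        elif meta_description(text) != stored_description:
            problems.append((url, "meta description changed"))
    return problems


def build_report(problems, sender, recipient):
    """Build (but do not send) an email listing the problem links."""
    msg = EmailMessage()
    msg["Subject"] = f"Link check: {len(problems)} link(s) need attention"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{url}: {reason}" for url, reason in problems]
    msg.set_content("Please check these links manually:\n\n" + "\n".join(lines))
    return msg
```

The message can then be handed to the standard library's `smtplib.SMTP(host).send_message(msg)`, where `host` is whatever mail server the site uses. Updating the stored description after someone has reviewed a changed page keeps the same change from being reported again.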
Message #2 by "Dave Stamper" <davids@s...> on Thu, 27 Sep 2001 13:33:17
Hi Bryan,
Your posting was a while ago, so maybe you've got it sorted already.
I am doing the same thing at the moment. There are numerous ways to
programmatically connect to a web page and get its text and HTTP status
code, and there are some articles on ASP Today about it somewhere.
An easy place to start is the XMLHTTP object, which is part of the
Microsoft XML parser component (MSXML).
Here is a snippet of VB code which uses it to connect to a URL and
return the text of the page and a status code:
----------------------------
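' This is VB code rather than ASP.
' My project has a reference to Microsoft XML, Version 2.0 (I believe
' the latest version is 3.0+).
' Code will need to be modified to work in ASP (VBScript has no "As"
' types, so use Server.CreateObject instead of New), but you get the
' picture. For server-side use, the ServerXMLHTTP object that ships
' with MSXML 3.0 is a better fit than XMLHTTP.
Dim objHTTP As MSXML.XMLHTTPRequest
Dim strURL As String
Dim strResponseText As String
Dim lngStatusCode As Long
strURL = "http://www.mysite.com/mypage.asp"
Set objHTTP = New MSXML.XMLHTTPRequest
' Send a synchronous request to the page. If the host can't be reached
' at all (e.g. the domain no longer exists), send raises a run-time
' error instead of setting a status code, so trap it.
objHTTP.open "GET", strURL, False
On Error Resume Next
objHTTP.send
If Err.Number <> 0 Then
    ' No HTTP response at all: treat the link as dead.
    lngStatusCode = 0
    strResponseText = ""
Else
    strResponseText = objHTTP.responseText
    ' Status code will be 404 for a page not found.
    lngStatusCode = objHTTP.status
End If
On Error GoTo 0
Set objHTTP = Nothing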
----------------------------
Hope this is useful,
Dave
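For comparison, a rough Python equivalent of the XMLHTTP snippet, using only the standard library's `urllib.request`. This is a sketch, not a drop-in for ASP: the function name `fetch` and the convention of returning status 0 when no HTTP response arrives are assumptions made here, chosen to mirror the error trap in the VB code. Like XMLHTTP, it reports a missing page as a 404 status rather than treating it as a failure.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status_code, page_text) for url.

    A missing page comes back as (404, ""). If no HTTP response arrives
    at all (unknown host, refused connection, timeout), the status is 0.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Redirects are followed automatically, so this is the final status.
            # This sketch ignores the declared charset and assumes UTF-8.
            text = response.read().decode("utf-8", errors="replace")
            return response.status, text
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep just the code.
        return err.code, ""
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are both subclasses of OSError.
        return 0, ""
```

HTTPError has to be caught before the general OSError branch because it is itself a subclass of URLError, and therefore of OSError.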