
asptoday_discuss thread: creating crawler for dead links stored in database


Message #1 by "Bryan Ax" <axb@a...> on Wed, 5 Sep 2001 08:57:22
We have a site designed for middle school students that includes web links to other sites. However, due to the nature of the Internet, we're finding that someone has to monitor our links to make sure they haven't gone dead or changed to something that isn't deemed appropriate. This takes a lot of this person's time, and I'd like to explore a solution that would automatically check the links and report any dead or "sketchy" links via email.

The links are currently all stored in a SQL database and dynamically generated when a user hits the page. Is there any way to use SQL, create a report of some sort, or write an ASP script that could walk through the table that stores the URLs, "go" to each page, and return some information about that page: whether it's dead, or whether the meta description tag has changed (and therefore maybe the content of the site has changed)? If nothing has changed, no problem. However, if the page is dead or changed, the script could email someone telling them to manually follow up on that particular link. Is something like this possible, and if so, where do I start? The logic of what needs to happen is clear to me; I just need ideas for implementing it.



Sincerely,

Bryan Ax
Developer
AMC Cancer Research Center
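The checking logic described above can be sketched roughly as follows. This is a minimal illustration in Python rather than VB/ASP, under several assumptions: the table is called `links` with `url` and `meta_description` columns (hypothetical names; an in-memory SQLite database stands in for the real SQL database), the page fetcher is passed in as a function returning `(status, html)`, and any status of 400 or above, or no response at all, counts as dead. It does not try to judge "sketchy" content; a changed meta description is only a hint that a person should take a look.

```python
import sqlite3
from email.message import EmailMessage
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.description = (attr_map.get("content") or "").strip()


def meta_description(html):
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description


def check_links(conn, fetch):
    """Return (url, reason) pairs for links a person should follow up on.

    Assumes a hypothetical table links(url, meta_description) holding the
    description recorded when the link was last approved. fetch(url) must
    return (status, html), with status None when the host is unreachable.
    """
    problems = []
    for url, stored in conn.execute("SELECT url, meta_description FROM links"):
        status, html = fetch(url)
        if status is None or status >= 400:
            problems.append((url, f"dead (status {status})"))
        elif meta_description(html) != stored:
            problems.append((url, "meta description changed"))
    return problems


def build_report(problems, sender, recipient):
    """Build (but do not send) the follow-up email."""
    msg = EmailMessage()
    msg["Subject"] = f"Link check: {len(problems)} link(s) need review"
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(f"{url}: {reason}" for url, reason in problems)
    msg.set_content(body or "All links OK.")
    return msg
```

To actually send the report, something like `smtplib.SMTP("mail.example.org").send_message(msg)` would do, with the host name being a placeholder. Keeping the fetcher as a parameter means the checking logic can be tried against canned pages before pointing it at the real sites, and running the script on a schedule keeps the checks out of the page-request path.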
Message #2 by "Dave Stamper" <davids@s...> on Thu, 27 Sep 2001 13:33:17
Hi Bryan,

Your posting was a while ago, so maybe you have it sorted already.

I am doing the same thing at the moment. There are numerous ways to programmatically connect to a web page and get its text and HTTP status code. There are some articles on this in ASP Today somewhere.

An easy place to start is the XMLHTTP object, which is part of the Microsoft XML parser component.

Here is a snippet of VB code which uses it to connect to a URL and return the text of the page and its status code:





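----------------------------
' This is VB code rather than ASP.
' My project has a reference to Microsoft XML, Version 2.0 (I believe
' the latest version is 3.0+).
' Code will need to be modified to work in ASP, but you get the picture...

Dim objHTTP As MSXML.XMLHTTPRequest
Dim strURL As String
Dim strResponseText As String
Dim lngStatusCode As Long   ' the status property returns a Long

strURL = "http://www.mysite.com/mypage.asp"

Set objHTTP = New MSXML.XMLHTTPRequest

' Send a synchronous GET request (third argument False = wait for the reply).
' If the server can't be reached at all, open/send raise a run-time error
' instead of returning a status code, so trap it.
On Error Resume Next
objHTTP.open "GET", strURL, False
objHTTP.send
If Err.Number <> 0 Then
    ' Dead host or bad URL: there is no HTTP status to read
    lngStatusCode = 0
    Err.Clear
Else
    strResponseText = objHTTP.responseText
    ' Status code will be 404 for a page not found.
    lngStatusCode = objHTTP.status
End If
On Error GoTo 0

Set objHTTP = Nothing
----------------------------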

Hope this is useful,

Dave
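For comparison, here is a sketch of the same request using Python's standard library (an illustration, not a translation of any ASP API). One point worth noting for a link checker: `urllib` raises `HTTPError` for 4xx/5xx responses, which still carries the status code, while an unreachable host raises `URLError` with no status at all, so the two cases have to be handled separately. The `fetch` name, the User-Agent string and the UTF-8 decoding are assumptions.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, text) for url; status is None if no HTTP response came back."""
    try:
        # Hypothetical User-Agent string identifying the checker.
        request = urllib.request.Request(url, headers={"User-Agent": "link-checker"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Assumes UTF-8; undecodable bytes are replaced rather than raising.
            text = response.read().decode("utf-8", errors="replace")
            return response.status, text
    except urllib.error.HTTPError as err:
        # 4xx/5xx: urllib raises, but the status code is still available.
        return err.code, ""
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None, ""
```

Note that `urlopen` follows redirects automatically, so the status reported is that of the final page; comparing `response.geturl()` with the original URL would reveal that a link now redirects somewhere else.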
