
asp_web_howto thread: Parsing news from page on remote server


Message #1 by "Rob" <rmcdonald@t...> on Wed, 7 Nov 2001 23:11:02
I need to write some ASP to request a page from a remote machine, authenticate the request with a username and password, and then strip a table out of the HTML page. We're using Win2K and IIS 5. The news is only available in HTML format, not XML.

The table contains links to news content. What's the easiest way to do this? Do I need a component, or can I build it myself?

Thanks,
R.
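
A minimal Python sketch of the authenticated request, assuming the remote server uses HTTP Basic authentication (the post doesn't say which scheme it uses). In classic ASP this step is usually handled by a server-side HTTP component, for example MSXML's ServerXMLHTTP object, whose open method accepts a username and password; here the standard library's urllib.request stands in for it. The function names, URL, and credentials are hypothetical.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build (but do not send) a GET request carrying HTTP Basic credentials.

    Assumption: the server accepts Basic authentication. A server using
    another scheme (e.g. Windows integrated/NTLM auth) needs a different handler.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_page(url, username, password, timeout=10):
    """Send the request and return the page body as text (performs network I/O)."""
    request = build_authenticated_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to Latin-1 if the server doesn't declare a charset.
        charset = response.headers.get_content_charset() or "iso-8859-1"
        return response.read().decode(charset, errors="replace")
```

Usage would look like fetch_page("http://news.example.com/headlines.htm", "rob", "secret"), with the URL and credentials as placeholders. Keeping request construction separate from sending lets you check the credential handling without contacting the server.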

Message #2 by Kyle Burns <kburns@c...> on Thu, 8 Nov 2001 10:43:54 -0500
The first thing you need to do is make sure that what you want to do is legal. Companies are becoming increasingly irate over others stealing their content, because most sites that display news are paying somebody for the newsfeed. That said, what you need is some good old-fashioned string parsing. Find out what text comes immediately before and immediately after the content you want. Use InStr() to find the positions of these delimiters and Mid() to pull out the text between them.
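
The delimiter technique, sketched in Python for illustration: str.find() plays the role of InStr() and slicing plays the role of Mid(). Note that find() is zero-based and returns -1 when the text is missing, whereas InStr() is one-based and returns 0. The function name and the comment markers in the sample page are hypothetical; in practice you would copy the delimiters from the real page's source.

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between start_marker and end_marker, or None.

    Locate the opening delimiter, then search for the closing delimiter
    only after it, and slice out what lies between. Returns None if either
    delimiter is missing, which usually means the page layout has changed.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip past the opening delimiter itself
    end = html.find(end_marker, start)  # search only after the opening delimiter
    if end == -1:
        return None
    return html[start:end]


page = (
    "<html><body><h1>Site</h1>"
    "<!-- news start --><table><tr><td>Headline</td></tr></table><!-- news end -->"
    "</body></html>"
)
print(extract_between(page, "<!-- news start -->", "<!-- news end -->"))
# prints: <table><tr><td>Headline</td></tr></table>
```

Returning None rather than an empty string makes a missing delimiter easy to detect, so the page can show a fallback instead of half a table.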

If the HTML is well-formed, you could even build an XML document from the table on the fly and use the DOM to parse the information. It's important to have the cooperation of whoever owns the content you are bringing in. Besides the legal implications discussed earlier, even the best string parsing (or screen scraping) can be ruined by the slightest change to the format of the page you are parsing.
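
If the extracted table happens to be well-formed XML, the DOM approach can be sketched with Python's xml.dom.minidom, here pulling out each link's href and text. This assumes markup with properly closed tags and no HTML-only entities such as &nbsp;; ordinary HTML will often make parseString() raise an error, in which case plain string parsing is the fallback. The function name and the sample table are hypothetical.

```python
from xml.dom import minidom


def extract_links(table_markup):
    """Return (href, text) pairs for every <a> element in a well-formed table.

    Raises xml.parsers.expat.ExpatError if the markup is not well-formed XML.
    """
    document = minidom.parseString(table_markup)
    links = []
    for anchor in document.getElementsByTagName("a"):
        # Join the anchor's direct text children to get the link text.
        text = "".join(
            node.data for node in anchor.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        links.append((anchor.getAttribute("href"), text.strip()))
    return links


table = (
    "<table>"
    '<tr><td><a href="/news/101.htm">Rates rise</a></td></tr>'
    '<tr><td><a href="/news/102.htm">Branch opens</a></td></tr>'
    "</table>"
)
print(extract_links(table))
# prints: [('/news/101.htm', 'Rates rise'), ('/news/102.htm', 'Branch opens')]
```

Compared with slicing between delimiters, the DOM route tolerates changes in whitespace and attribute order, but it fails outright on markup that isn't well-formed.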

=================================
Kyle M. Burns, MCSD, MCT
ECommerce Technology Manager
Centra Credit Union
kburns@c...