asp_web_howto thread: Parsing news from page on remote server
Message #1 by "Rob" <rmcdonald@t...> on Wed, 7 Nov 2001 23:11:02
I need to write some ASP to request a page from a remote machine,
authenticate the request with a username and password, and then strip a
table out of the HTML page. We're using Win2K and IIS5. The news is only
available in HTML format, not XML.
The table contains links to news content. What's the easiest way to do
this? Do I need a component, or can I build it myself?
Thanks,
R.
Message #2 by Kyle Burns <kburns@c...> on Thu, 8 Nov 2001 10:43:54 -0500
The first thing that you need to do is ensure that what you would like to do
is legal. Companies are becoming increasingly irate over others stealing
their content, because most sites that display news content are paying
somebody for the newsfeed. That said, what you need is some good
old-fashioned string parsing. Find out what comes immediately before and
immediately after the content that you want. Use InStr() to find the
positions of these delimiters and Mid() to pull out the text between them.
If the HTML is well-formed, you could even build an XML document from the
table on the fly and use the DOM to parse the information. It's important
to have the cooperation of whoever's content you are bringing in. Besides
the legal implications discussed earlier, the best string parsing (or
screen scraping) can be ruined by the slightest change to the format of
the page that you are parsing.
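
The delimiter approach described above can be sketched roughly as follows. This is only an illustration, written in Python rather than ASP/VBScript: str.find stands in for InStr() and slicing stands in for Mid(). The page content, the table id "news", and the helper name extract_between are all made up for the example. The sketch keeps the delimiters in the result so that the extracted table is a complete element that an XML parser can load, which only works if that fragment of HTML happens to be well-formed.

```python
import xml.etree.ElementTree as ET


def extract_between(html, start_marker, end_marker):
    """Return the text from start_marker through end_marker, or None.

    Hypothetical helper: the markers are included in the result so the
    fragment can be handed to an XML parser as a complete element.
    """
    # Like InStr(): locate the opening delimiter.
    start = html.find(start_marker)
    if start == -1:
        return None
    # Search for the closing delimiter only after the opening one.
    end = html.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    # Like Mid(): slice out the span between the two positions.
    return html[start:end + len(end_marker)]


# Made-up page content standing in for the remote news page.
page = (
    "<html><body><h1>Headlines</h1>"
    '<table id="news">'
    '<tr><td><a href="/story/1">First story</a></td></tr>'
    '<tr><td><a href="/story/2">Second story</a></td></tr>'
    "</table><p>footer</p></body></html>"
)

table = extract_between(page, '<table id="news">', "</table>")

links = []
if table is not None:
    # Only safe when the extracted fragment is well-formed XML.
    root = ET.fromstring(table)
    links = [(a.get("href"), a.text) for a in root.iter("a")]

print(links)
```

If the remote site changes the markup around the table, extract_between returns None instead of raising an error, so the calling page can fall back to a "news unavailable" message.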
=================================
Kyle M. Burns, MCSD, MCT
ECommerce Technology Manager
Centra Credit Union
kburns@c...