Classic ASP Professional: For advanced coder questions in ASP 3. NOT for ASP.NET 1.0, 1.1, or 2.0.

December 17th, 2003, 03:51 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts
Extracting text between tags
Hi
I have an application that needs to display two HTML documents from two sources on a single page. The sources are a database and static HTML files.
My question is: how do I extract the text between the <BODY> tags of the HTML page?
-------UPDATE--------
I've gotten this far. The following code does what I want, but it only matches the tags in one exact case. How can I make it case-insensitive? Also, it was a bit of a fudge getting the numbers right. Is this normal?
intStart = instr(strP, "<Body>") + 6
intEnd = instr(strP, "</Body>") - 7
strText = mid(strP, intStart, intEnd)
Response.Write strText
------FURTHER UPDATE--------------
Not very elegant. Can anyone improve on this?
It works for now.
If InStr(strP, "<body>") > 0 Then '1
    intStart = InStr(strP, "<body>")
Else
    If InStr(strP, "<Body>") > 0 Then '2
        intStart = InStr(strP, "<Body>")
    Else
        If InStr(strP, "<BODY>") > 0 Then '3
            intStart = InStr(strP, "<BODY>")
        Else
            intStart = 0
        End If '3
    End If '2
End If '1
If InStr(strP, "</body>") > 0 Then '1
    intEnd = InStr(strP, "</body>")
Else
    If InStr(strP, "</Body>") > 0 Then '2
        intEnd = InStr(strP, "</Body>")
    Else
        If InStr(strP, "</BODY>") > 0 Then '3
            intEnd = InStr(strP, "</BODY>")
        Else
            intEnd = 1000
        End If '3
    End If '2
End If '1
intStart = intStart + 6
intEnd = intEnd - 7
strText = Mid(strP, intStart, intEnd)
Response.Write strText
Thanks in advance
Andy Green
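
On the two questions in this post (case sensitivity and the "fudged" numbers), here is a minimal sketch that may help. It relies on two documented VBScript behaviours: InStr accepts an optional compare argument (vbTextCompare ignores case), and the third argument of Mid is a length, not an end position, which is likely why the offsets needed fudging. The function name GetBodyText is hypothetical and not from the thread, and the sketch assumes a bare <body> tag with no attributes.

```vbscript
' Hypothetical helper (name not from the thread): returns the text between
' <body> and </body>, matching the tags regardless of case.
' Assumption: the opening tag is a bare <body> with no attributes.
Function GetBodyText(strP)
    Const OPEN_TAG = "<body>"
    Const CLOSE_TAG = "</body>"
    Dim intStart, intEnd

    GetBodyText = ""

    ' vbTextCompare makes InStr ignore case. The start position (1)
    ' must be given whenever the compare argument is supplied.
    intStart = InStr(1, strP, OPEN_TAG, vbTextCompare)
    If intStart = 0 Then Exit Function

    ' Move past the opening tag, then search for the closing tag from there.
    intStart = intStart + Len(OPEN_TAG)
    intEnd = InStr(intStart, strP, CLOSE_TAG, vbTextCompare)
    If intEnd = 0 Then Exit Function

    ' Mid takes a start position and a LENGTH, so pass the distance
    ' between the two positions rather than the end position itself.
    GetBodyText = Mid(strP, intStart, intEnd - intStart)
End Function
```

For example, GetBodyText("<html><BODY>Hello</Body></html>") returns "Hello", and the function returns an empty string when either tag is missing.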

December 18th, 2003, 05:40 AM
Friend of Wrox
Join Date: Jun 2003
Location: United Kingdom.
Posts: 1,212
Thanks: 0
Thanked 0 Times in 0 Posts
Andy, do you know anything about regular expressions? They are tailor-made for things like this.

December 19th, 2003, 06:29 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts
Hi
No, but now that I know what to look for, I'll see what I can find out.
Thanks

December 23rd, 2003, 06:53 AM
Friend of Wrox
Join Date: Jun 2003
Location: United Kingdom.
Posts: 1,212
Thanks: 0
Thanked 0 Times in 0 Posts
It's a bit of a tricky regular expression to match a body tag which may or may not have attributes, and which may also be spread over multiple lines. Try this one:
Code:
' The text between the <BODY></BODY> tags will be stored in variable sBodyText
Set re = New RegExp
re.Pattern = "<body(?:.|\n)*?>((?:.|\n)*)</body>"
re.IgnoreCase = True   ' match <body>, <Body>, <BODY>, ...
re.Global = False      ' only the first match is needed
re.Multiline = True    ' only affects ^ and $, which this pattern does not use
Set oMatches = re.Execute(strP)
If Not oMatches Is Nothing Then
    If oMatches.Count > 0 Then
        If Not oMatches(0).SubMatches Is Nothing Then
            sBodyText = oMatches(0).SubMatches(0)
        End If
    End If
End If
I'll try to explain the regexp:
<body - matches the start of the opening body tag
(?:.|\n)*? - is for attributes; it matches any char, including new-line, zero or more times, but as few as possible (the trailing ? makes it non-greedy, so it stops at the first >). The opening ?: prevents the characters here from being stored in the submatches
> - matches the end of the opening body tag
((?:.|\n)*) - same again, except greedy, and this time we want to capture the whole text between the tags in the submatches collection
</body> - matches the ending body tag
hth
Phil
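
For reuse, the approach above could be wrapped in a function along these lines. This is only a sketch under stated assumptions: the name ExtractBody is hypothetical, and the pattern is a variant of the one above, not the same expression. It swaps (?:.|\n) for the character classes [^>] and [\s\S], which also match line breaks, and it makes the capture non-greedy so it stops at the first </body>. It assumes a VBScript 5.5 or later RegExp engine, which supports non-greedy quantifiers.

```vbscript
' Hypothetical wrapper (name not from the thread): returns the text between
' the <body ...> and </body> tags, or an empty string if they are not found.
Function ExtractBody(strHtml)
    Dim re, oMatches

    ExtractBody = ""

    Set re = New RegExp
    ' <body     start of the opening tag
    ' [^>]*     any attributes, up to the closing > (a negated class
    '           also matches line breaks)
    ' ([\s\S]*?) captured body text: any characters including line
    '           breaks, as few as possible
    ' </body>   the closing tag
    re.Pattern = "<body[^>]*>([\s\S]*?)</body>"
    re.IgnoreCase = True
    re.Global = False

    Set oMatches = re.Execute(strHtml)
    If oMatches.Count > 0 Then
        ExtractBody = oMatches(0).SubMatches(0)
    End If
End Function
```

It would be called as Response.Write ExtractBody(strP). Like the original pattern, this variant assumes no attribute value on the body tag contains a literal > character.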

December 24th, 2003, 04:25 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts
Hi
Thanks for this.
I've done some research on regular expressions and played with RegExp.
I got my code to extract text but could not get the syntax correct for all cases of <body......>.
I'll give this a try. I'm now a convert to REs.
Thanks again
Andy G