Wrox Programmer Forums
Classic ASP Professional: for advanced coder questions in ASP 3, not for ASP.NET 1.0, 1.1, or 2.0.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the Classic ASP Professional section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
December 17th, 2003, 03:51 AM
Authorized User
 
Join Date: Jun 2003
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts
Extracting text between tags

Hi

I have an application that needs to display two HTML documents from two sources on a single page; the sources are a database and static HTML files.

My question is: how do I extract the text between the <BODY> tags of an HTML page?

-------UPDATE--------
I've gotten this far. The following code does what I want, but the tag search is case-sensitive (it only finds "<Body>"), so how can I make it case-insensitive? Also, getting the numbers right was a bit of a fudge; is this normal?

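' InStr is case-sensitive by default, so this only finds "<Body>" exactly
intStart = InStr(strP, "<Body>") + Len("<Body>")   ' first character after the opening tag
intEnd = InStr(strP, "</Body>")                    ' position where the closing tag begins
' Mid's third argument is a length, not an end position
strText = Mid(strP, intStart, intEnd - intStart)
Response.Write strText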
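The "fudge" comes from Mid's third argument: it is a length, not an end position, so the correct length is the closing tag's position minus the start position. A fixed subtraction such as `InStr(strP, "</Body>") - 7` only happens to give that length when `<Body>` sits at position 1. Below is a minimal sketch of the arithmetic in JavaScript; the `instr` and `mid` helpers are hypothetical stand-ins written to mimic VBScript's 1-based `InStr` and `Mid`, and the sketch assumes both tags are present.

```javascript
// Hypothetical 1-based stand-in for VBScript's InStr (case-sensitive, 0 when not found).
function instr(haystack, needle) {
  return haystack.indexOf(needle) + 1;
}

// Hypothetical 1-based stand-in for VBScript's Mid: `length` characters from position `start`.
function mid(s, start, length) {
  return s.slice(start - 1, start - 1 + length);
}

// Correct arithmetic: start just after "<Body>", length = closing-tag position - start.
function bodyText(strP) {
  const intStart = instr(strP, "<Body>") + "<Body>".length;
  const intEnd = instr(strP, "</Body>");
  return mid(strP, intStart, intEnd - intStart);
}

// The fixed "- 7" offset, which passes an adjusted end position where a length is expected.
function fudgedBodyText(strP) {
  const intStart = instr(strP, "<Body>") + 6;
  const intEnd = instr(strP, "</Body>") - 7;
  return mid(strP, intStart, intEnd);
}

const page = "<html><Body>Hi</Body></html>";
console.log(bodyText(page));       // Hi
console.log(fudgedBodyText(page)); // Hi</Body
```

With `<Body>` at position 7, the content starts at 13 and `</Body>` begins at 15, so the length is 15 - 13 = 2. The fudged version asks for 15 - 7 = 8 characters and spills into the closing tag; on `"<Body>Hi</Body>"`, where the tag starts at position 1, both versions return `Hi`.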

------FURTHER UPDATE--------------
It's not very elegant; can anyone improve on this?
It works for now.

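' Find the opening tag, trying each of the three common casings in turn
If InStr(strP, "<body>") > 0 Then
    intStart = InStr(strP, "<body>")
ElseIf InStr(strP, "<Body>") > 0 Then
    intStart = InStr(strP, "<Body>")
ElseIf InStr(strP, "<BODY>") > 0 Then
    intStart = InStr(strP, "<BODY>")
Else
    intStart = 0
End If

' Find the closing tag the same way
If InStr(strP, "</body>") > 0 Then
    intEnd = InStr(strP, "</body>")
ElseIf InStr(strP, "</Body>") > 0 Then
    intEnd = InStr(strP, "</Body>")
ElseIf InStr(strP, "</BODY>") > 0 Then
    intEnd = InStr(strP, "</BODY>")
Else
    intEnd = 0
End If

' Skip past the 6-character opening tag, or start at the beginning if there was none
If intStart > 0 Then intStart = intStart + 6 Else intStart = 1
' With no closing tag, read to the end of the string instead of a fixed 1000
If intEnd = 0 Then intEnd = Len(strP) + 1

' Mid's third argument is a length, so subtract the start position from the end
strText = Mid(strP, intStart, intEnd - intStart)
Response.Write strText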
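Rather than trying each casing in turn, the search itself can ignore case. In VBScript that is `InStr(1, strP, "<body>", vbTextCompare)` (the start argument is required when a compare mode is given), which also finds mixed casings such as `<bOdY>`. The sketch below shows the same idea in JavaScript, using case-insensitive regular expressions with `String.prototype.search`; the `extractBody` name is hypothetical, and the sketch only handles a bare `<body>` tag without attributes.

```javascript
// Hypothetical helper: text between <body> and </body>, matching the tags in any casing.
function extractBody(html) {
  const open = html.search(/<body>/i);    // 0-based index of the opening tag, or -1
  const close = html.search(/<\/body>/i); // 0-based index of the closing tag, or -1
  // Skip past the opening tag if it exists, otherwise start at the beginning.
  const start = open >= 0 ? open + "<body>".length : 0;
  // Without a closing tag, read to the end rather than guessing a fixed length.
  const end = close >= 0 ? close : html.length;
  return html.slice(start, end);
}

console.log(extractBody("<HTML><bOdY>Mixed</BODY></HTML>")); // Mixed
console.log(extractBody("<body>no closing tag"));            // no closing tag
```

One case-insensitive search per tag replaces the three-way cascade, and the fallbacks mirror the "start at the beginning / read to the end" behaviour when a tag is missing.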

Thanks in advance

Andy Green
 
December 18th, 2003, 05:40 AM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 1,212
Thanks: 0
Thanked 1 Time in 1 Post

Andy, do you know anything about regular expressions? They are tailor-made for things like this.
 
December 19th, 2003, 06:29 AM
Authorized User
 
Join Date: Jun 2003
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts

Hi

No, but now that I know what to look for, I'll see what I can find out.

Thanks
 
December 23rd, 2003, 06:53 AM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 1,212
Thanks: 0
Thanked 1 Time in 1 Post

It's a slightly tricky regular expression, because the body tag may or may not have attributes and may be spread over multiple lines. Try this one:
Code:
' the text between the <BODY></BODY> tags will be stored in variable sBodyText
Set re = New RegExp
re.Pattern = "<body(?:.|\n)*?>((?:.|\n)*)</body>"
re.IgnoreCase = True
re.Global = False
re.Multiline = True
Set oMatches = re.Execute(strP)
If Not oMatches Is Nothing Then
    If oMatches.Count > 0 Then
        If Not oMatches(0).SubMatches Is Nothing Then
            sBodyText = oMatches(0).SubMatches(0)
        End If
    End If
End If
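For comparison, here is a sketch of the same pattern in JavaScript. One change is needed: in VBScript's RegExp `.` matches anything except `\n`, so `(?:.|\n)` covers every character, but in JavaScript `.` also excludes `\r`, which would break on Windows `\r\n` line endings. The sketch therefore uses the common `[\s\S]` idiom for "any character" (it should also work in the VBScript pattern). The `extractBodyText` name is hypothetical.

```javascript
// Same structure as the VBScript pattern: "<body", lazy attributes up to the first ">",
// then a greedy capture up to the last "</body>". The "i" flag plays the role of IgnoreCase.
const BODY_RE = /<body[\s\S]*?>([\s\S]*)<\/body>/i;

// Hypothetical helper: returns the first submatch, or "" when there is no match.
function extractBodyText(html) {
  const match = BODY_RE.exec(html);
  return match ? match[1] : "";
}

const page =
  "<html>\r\n" +
  '<BODY bgcolor="white"\r\n' +
  '      onload="init()">\r\n' +
  "<p>Hello</p>\r\n" +
  "</Body>\r\n" +
  "</html>";

console.log(JSON.stringify(extractBodyText(page))); // "\r\n<p>Hello</p>\r\n"
```

Like the original, the lazy attribute part stops at the first `>`, so an attribute value that itself contains `>` would cut the tag short. The pattern uses neither `^` nor `$`, so the `Multiline` setting has no effect on it.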
I'll try to explain the regexp:
<body - matches the opening body tag

(?:.|\n)*? - matches the attributes: any character, including a newline, zero or more times, taking as few as possible so that it stops at the first > (the opening ?: prevents the characters here from being stored in the SubMatches collection)

((?:.|\n)*) - the same again, except that here we want to capture the whole text between the tags in the SubMatches collection; this part is greedy, so it extends to the last </body>

</body> - matches the ending body tag
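The roles of `?:` and of the lazy `*?` can be checked directly. This JavaScript sketch, using `[\s\S]` for "any character", runs three variants of the pattern on one tag: with a non-capturing attribute group the body text is the first submatch; turning the attribute group into a capturing one pushes the body text to the second submatch; and making the attribute part greedy lets it run on to a later `>`, leaving an empty body.

```javascript
const html = '<body class="x"><p>Hi</p></body>';

// Non-capturing attribute group, lazy: the body text is submatch 1.
const nonCapturing = /<body(?:[\s\S])*?>([\s\S]*)<\/body>/i.exec(html);

// Capturing attribute group: the attributes become submatch 1, the body text submatch 2.
const capturing = /<body([\s\S]*?)>([\s\S]*)<\/body>/i.exec(html);

// Greedy attribute group: it runs on to the last ">" that still allows a match,
// which here is the end of "</p>", so the captured body text is empty.
const greedy = /<body(?:[\s\S])*>([\s\S]*)<\/body>/i.exec(html);

console.log(nonCapturing.length - 1, nonCapturing[1]); // 1 <p>Hi</p>
console.log(capturing.length - 1, capturing[2]);       // 2 <p>Hi</p>
console.log(JSON.stringify(greedy[1]));                // ""
```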

hth
Phil
 
December 24th, 2003, 04:25 AM
Authorized User
 
Join Date: Jun 2003
Posts: 59
Thanks: 0
Thanked 0 Times in 0 Posts

Hi

Thanks for this.

I've done some research on Regular Expressions and played with RegExp.

I got my code to extract the text, but I could not get the syntax right for every form of <body ...>, such as a tag with attributes.

I'll give this a try. I'm now a convert to REs.

Thanks again

Andy G





Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
extracting multi-line text. | Ceromus | C# | 0 | November 7th, 2008 10:06 PM
Extracting a flexibel amount of text part 2 | jmaronilla | PHP Databases | 0 | July 28th, 2008 08:22 PM
Help extracting text from a regular expression | crazymanju | BOOK: Beginning Regular Expressions | 0 | April 10th, 2007 05:43 AM
Extracting Text | Jeff Mason | BOOK: Beginning Regular Expressions | 1 | October 24th, 2006 06:38 AM
Extracting text between <body> tags | sumedha | HTML Code Clinic | 2 | June 29th, 2005 02:37 PM





Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.