p2p.wrox.com Forums

Classic ASP Professional For advanced coder questions in ASP 3. NOT for ASP.NET 1.0, 1.1, or 2.0.

#1
December 17th, 2003, 03:51 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59

Extracting text between tags

Hi

I have an application that needs to display two HTML documents from two sources on a single page. The sources are a database and static HTML files.

My question is: how do I extract the text between the <BODY> tags of the HTML page?

-------UPDATE--------
I've gotten this far. The following code does what I want, but the tag match is case-sensitive. How can I make it case-insensitive? Also, it was a bit of a fudge getting the numbers right. Is this normal?

intStart = instr(strP, "<Body>") + 6
intEnd = instr(strP, "</Body>") - 7
strText = mid(strP, intStart, intEnd)
Response.Write strText

------FURTHER UPDATE--------------
Not very elegant. Can anyone improve on this?
It works for now.

If InStr(strP, "<body>") > 0 Then '1
    intStart = InStr(strP, "<body>")
Else
    If InStr(strP, "<Body>") > 0 Then '2
        intStart = InStr(strP, "<Body>")
    Else
        If InStr(strP, "<BODY>") > 0 Then '3
            intStart = InStr(strP, "<BODY>")
        Else
            intStart = 0
        End If '3
    End If '2
End If '1


If InStr(strP, "</body>") > 0 Then '1
    intEnd = InStr(strP, "</body>")
Else
    If InStr(strP, "</Body>") > 0 Then '2
        intEnd = InStr(strP, "</Body>")
    Else
        If InStr(strP, "</BODY>") > 0 Then '3
            intEnd = InStr(strP, "</BODY>")
        Else
            intEnd = 1000
        End If '3
    End If '2
End If '1

intStart = intStart + 6
intEnd = intEnd - 7

strText = Mid(strP, intStart, intEnd)
Response.Write strText

Thanks in advance

Andy Green
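
On the two questions in the post above, here is a minimal sketch, assuming strP holds the whole page as a string. InStr takes an optional compare argument, and passing vbTextCompare makes the search case-insensitive, so there is no need to test each spelling of the tag. Mid's third argument is a length rather than an end position, which is probably why the offsets needed fudging. The function name BodyBetweenTags is hypothetical, and the sketch only handles a bare <body> tag with no attributes.

```vbscript
' Hypothetical helper: returns the text between <body> and </body>,
' or an empty string if either tag is missing.
Function BodyBetweenTags(strP)
    Dim intOpen, intStart, intEnd
    BodyBetweenTags = ""

    ' vbTextCompare makes InStr ignore case. The start position (1)
    ' must be supplied whenever the compare argument is used.
    intOpen = InStr(1, strP, "<body>", vbTextCompare)
    If intOpen = 0 Then Exit Function

    ' Position of the first character after the opening tag.
    intStart = intOpen + Len("<body>")

    ' Look for the closing tag only after the opening tag.
    intEnd = InStr(intStart, strP, "</body>", vbTextCompare)
    If intEnd = 0 Then Exit Function

    ' Mid takes (string, start, length), so the length is the
    ' distance between the two positions.
    BodyBetweenTags = Mid(strP, intStart, intEnd - intStart)
End Function
```

For example, Response.Write BodyBetweenTags("<html><BODY>Hello</BODY></html>") would write Hello. A body tag that carries attributes is not matched by this version; that case is what the regular expression replies address.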

#2
December 18th, 2003, 05:40 AM
Friend of Wrox
Join Date: Jun 2003
Location: United Kingdom.
Posts: 1,212

Andy, do you know anything about regular expressions? They are tailor-made for things like this.

#3
December 19th, 2003, 06:29 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59

Hi

No, but now that I know what to look for, I'll see what I can find out.

Thanks

#4
December 23rd, 2003, 06:53 AM
Friend of Wrox
Join Date: Jun 2003
Location: United Kingdom.
Posts: 1,212

It's a bit of a tricky regular expression to match a body tag which may or may not have attributes, and which may also be spread over multiple lines. Try this one:
Code:
' the text between the <BODY></BODY> tags will be stored in variable sBodyText
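Set re = New RegExp
re.Pattern = "<body(?:.|\n)*?>((?:.|\n)*)</body>"
re.IgnoreCase = True   ' match <body>, <Body>, <BODY>, ...
re.Global = False      ' only the first match is needed
re.Multiline = True    ' only changes how ^ and $ behave; this pattern uses neither
Set oMatches = re.Execute(strP)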
If Not oMatches Is Nothing Then
    If oMatches.Count > 0 Then
        If Not oMatches(0).SubMatches Is Nothing Then
            sBodyText = oMatches(0).SubMatches(0)
        End If
    End If
End If
I'll try and explain the regexp:
<body - matches the opening body tag

(?:.|\n)*? - is for attributes. It matches any char, including new-line, zero or more times. The trailing ? makes the match lazy, so it takes as few characters as possible before the closing > (the opening ?: prevents the characters here from being stored in the submatches)

((?:.|\n)*) - same again, except that it is greedy and we want to capture the whole text between the tags in the submatches collection

</body> - matches the ending body tag

hth
Phil
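
A variation on the pattern above, offered as a sketch rather than a replacement. A negated character class such as [^>]* also matches line breaks, so it can cover attributes spread over several lines, and [\s\S]*? captures the body lazily, so the match stops at the first </body> instead of the last. The function name ExtractBody is hypothetical, and the sketch assumes that no attribute value contains a > character.

```vbscript
' Hypothetical helper: returns the text between the body tags,
' or an empty string if no body element is found.
Function ExtractBody(strP)
    Dim re, oMatches
    ExtractBody = ""

    Set re = New RegExp
    ' <body\b   opening tag name, not a longer name that starts with "body"
    ' [^>]*     any attributes, including ones on following lines
    ' ([\s\S]*?) the body text, captured lazily
    ' </body\s*> closing tag, allowing white space before the >
    re.Pattern = "<body\b[^>]*>([\s\S]*?)</body\s*>"
    re.IgnoreCase = True
    re.Global = False

    Set oMatches = re.Execute(strP)
    If oMatches.Count > 0 Then
        ExtractBody = oMatches(0).SubMatches(0)
    End If
End Function
```

Called as sBodyText = ExtractBody(strP), this gives the same kind of result as the code above for a page with a single body element. The two differ only when the text contains more than one </body>: the greedy pattern runs to the last one, the lazy pattern to the first.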

#5
December 24th, 2003, 04:25 AM
Authorized User
Join Date: Jun 2003
Location: Bromsgrove, Worcestershire, United Kingdom.
Posts: 59

Hi

Thanks for this.

I've done some research on Regular Expressions and played with RegExp.

I got my code to extract text but could not get the syntax correct for all cases of <body......>.

I'll give this a try. I'm now a convert to REs.

Thanks again

Andy G
All times are GMT -4.

