Wrox Programmer Forums
HTML Code Clinic Do you have some HTML code you'd like to share and get suggestions from others for tweaking or improving it? This discussion is the place.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the HTML Code Clinic section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
June 29th, 2005, 02:04 PM
Registered User
 
Join Date: Jun 2005
Posts: 2
Extracting text between <body> tags

Hi,

I need to extract the text between the <body></body> tags of an HTML file. I need to collect text from various HTML files this way and concatenate it all into one file, preferably stored in PDF format. I presume this can be done with regular expressions. Is there any software that does this kind of parsing? Or is there a much simpler option? Any suggestions?

Thanks in advance,

Sumedha

 
June 29th, 2005, 02:20 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 425

Do you mean simply strip all the tags? I think many text/HTML editors can do that. I know NoteTab does. You can download the trial or the free Light version and give it a try.

http://notetab.com

(If you download it, what you want is on the Modify menu:
Modify | Strip HTML tags | Preserve URLs)

(o<
//\ =^..^=
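
For anyone who would rather script the tag stripping than use an editor, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not how NoteTab works internally; the class name `TagStripper` and the function `strip_tags` are made up for this illustration, and the "text (URL)" output format for links is an assumption about what "Preserve URLs" might reasonably mean.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text, drops tags, keeps link targets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []           # collected text fragments, in document order
        self.skip_depth = 0       # > 0 while inside <script> or <style>
        self.pending_href = None  # href of the <a> element currently open

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "a":
            self.pending_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.pending_href:
            # Assumed format: append the URL in parentheses after the link text.
            self.parts.append(f" ({self.pending_href})")
            self.pending_href = None

    def handle_data(self, data):
        # Script and style contents are code, not readable text, so skip them.
        if not self.skip_depth:
            self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling `strip_tags('<p>See <a href="http://notetab.com">NoteTab</a> now.</p>')` returns `See NoteTab (http://notetab.com) now.`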
 
June 29th, 2005, 02:37 PM
Registered User
 
Join Date: Jun 2005
Posts: 2

Hi,

No, I don't want to strip the tags. I just want to extract everything (text and images) that sits between the <body></body> tags, and nothing else, and append it to a common file that collects the same output from multiple HTML documents.

Sumedha
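
The request above could be sketched with a regular expression roughly as follows, in Python. This is only one possible approach, not a recommendation from the thread: the names `extract_body` and `combine_bodies` are invented for the example, the files are assumed to be UTF-8, and a regex like this can be fooled by malformed pages or by a literal `</body>` inside a comment or script, so treat it as a starting point.

```python
import re
from pathlib import Path

# Matches <body ...> through the first </body>, case-insensitively,
# capturing everything in between (including newlines).
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the markup between the body tags, or "" if there is no body."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else ""


def combine_bodies(paths, out_path):
    """Write the body content of every input file into one HTML document."""
    sections = []
    for path in paths:
        # Assumption: inputs are UTF-8; undecodable bytes are replaced.
        html = Path(path).read_text(encoding="utf-8", errors="replace")
        sections.append(extract_body(html))
    # Separate the source documents with a horizontal rule.
    combined = "<html>\n<body>\n" + "\n<hr>\n".join(sections) + "\n</body>\n</html>\n"
    Path(out_path).write_text(combined, encoding="utf-8")
    return combined
```

With a hypothetical folder named `pages`, usage would look like `combine_bodies(sorted(Path("pages").glob("*.html")), "combined.html")`. The tags inside each body are kept, so `<img>` elements survive, but relative image paths will only resolve if the combined file sits where those paths still point. Because the result is itself an HTML file, it can be opened in a browser and printed to PDF; the sketch does not produce a PDF on its own.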











Copyright (c) 2020 John Wiley & Sons, Inc.