Wrox Programmer Forums

You are currently viewing the Javascript section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old September 19th, 2006, 10:09 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 139
Thanks: 0
Thanked 0 Times in 0 Posts
Default extract content from file

Hi,

Using xmlHTTP from an HTA, I get back a broken web page that has errors I can't fix (it's outside my control). I do get the text back, but I'd like to know how to extract the section of the text that is not corrupt. In essence, I need to get everything below a comment, then take that part and put it in a div so I can work with it through the DOM.
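
For illustration, "everything below a comment" could be sketched with a plain string search rather than a parser. The function name `textAfterMarker`, the marker text `<!-- DATA START -->`, and the sample page are all hypothetical; the real comment from the page would be substituted.

```javascript
// Return everything after the first occurrence of a marker string,
// or null when the marker is not present.
function textAfterMarker(sText, sMarker) {
    var iPos = sText.indexOf(sMarker);
    if (iPos === -1) {
        return null; // marker not found
    }
    return sText.substring(iPos + sMarker.length);
}

// Hypothetical sample: broken markup above the comment, usable markup below it.
var sPage = "<p>broken <b>markup<!-- DATA START --><table><tr><td>42</td></tr></table>";
var sFragment = textAfterMarker(sPage, "<!-- DATA START -->");

// In the HTA the fragment could then be placed in a container, for example:
// document.getElementById("holder").innerHTML = sFragment; // "holder" is an assumed id
```

Because this never parses the page, the broken markup above the comment is simply discarded.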



 
Old September 19th, 2006, 10:15 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 139
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Hi,

To clarify, the data that I want is in the last table. So if I use a regex (!<table>.*</table>!s), how do I get the last table from the text?

 
Old September 19th, 2006, 10:38 AM
joefawcett's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts
Default

A different approach is to load the text into an HTML DOM document:
Code:
var oDoc = new ActiveXObject("htmlfile"); 
oDoc.open();
var sHtml = "<html><head><title>Test</title></head><body>Hello World</body></html>";
oDoc.write(sHtml);
oDoc.close();
alert(oDoc.documentElement.outerHTML);
Obviously you'll need to find the table; post the actual HTML if you're still stuck.

Using your regexp method, you need to get the second match object after performing the search.

http://msdn.microsoft.com/library/de...b8ed696d0a.asp
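
As a rough sketch of walking the match objects: with the global flag set, repeated calls to `exec` return successive matches, so the last one can be kept by looping until `exec` returns null. The helper name `lastMatch`, the pattern, and the sample HTML are assumptions for illustration, and the pattern does not cope with nested tables.

```javascript
// Return the text of the last match of a regular expression, or null.
function lastMatch(sText, re) {
    var aMatch, sLast = null;
    if (!re.global) {
        // Without the "g" flag exec always restarts, so only one match exists.
        aMatch = re.exec(sText);
        return aMatch ? aMatch[0] : null;
    }
    re.lastIndex = 0; // start from the beginning of the text
    while ((aMatch = re.exec(sText)) !== null) {
        if (aMatch[0] === "") {
            re.lastIndex++; // step past empty matches to avoid looping forever
        }
        sLast = aMatch[0];
    }
    return sLast;
}

// [\s\S] matches any character including newlines; *? keeps each match short.
var reTable = /<table[\s\S]*?<\/table>/ig;
var sSample = "<table><tr><td>first</td></tr></table>\n<p>x</p>\n" +
              "<TABLE id=\"t2\"><tr><td>last</td></tr></TABLE>";
var sLastTable = lastMatch(sSample, reTable);
```

For the second match specifically, a counter inside the same loop would do instead of keeping the last value.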

--

Joe (Microsoft MVP - XML)
 
Old September 19th, 2006, 11:31 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 139
Thanks: 0
Thanked 0 Times in 0 Posts
Default

No, that approach cannot work, as the page that I'm getting back was very poorly written, and there is JavaScript within the <body> that doesn't work and generates errors once it loads. My best solution is to use the regex and call exec to build the array. I need help building the regular expression, as !<table>.*</table>!s does not work at all.

How can I make a regex to grab the tables?

 
Old September 19th, 2006, 12:29 PM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 139
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Ok, I've managed to make this one... which works fine (ish) for the first table.

var myRe = new RegExp("<table((.|\n)*?)</table>","ig");

The problem now is that there are embedded (nested) tables, so this does not work. I just need to get the last table in the text. Any suggestions?
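
One possible way around the nesting problem, offered only as a sketch: a single lazy pattern stops at the first `</table>`, so instead match each opening and closing table tag separately and keep a depth counter. The function name `lastTopLevelTable` and the sample text are hypothetical, and the approach assumes the table tags are balanced and that none appear inside comments or script strings.

```javascript
// Return the last outermost <table>...</table> block, nested tables included,
// or null when no complete table is found.
function lastTopLevelTable(sText) {
    var reTag = /<(\/?)table\b[^>]*>/ig; // matches <table ...> and </table>
    var aTag, iDepth = 0, iStart = -1, sLast = null;
    while ((aTag = reTag.exec(sText)) !== null) {
        if (aTag[1] === "") {
            // Opening tag: remember where an outermost table begins.
            if (iDepth === 0) {
                iStart = aTag.index;
            }
            iDepth++;
        } else if (iDepth > 0) {
            // Closing tag: a table is complete once depth returns to zero.
            iDepth--;
            if (iDepth === 0) {
                sLast = sText.substring(iStart, reTag.lastIndex);
            }
        }
    }
    return sLast;
}

var sNested = "<table id='a'><tr><td>1</td></tr></table><div>" +
              "<table id='b'><tr><td><table id='c'><tr><td>in</td></tr></table>" +
              "</td></tr></table></div>";
var sOuterLast = lastTopLevelTable(sNested);
```

Here `sOuterLast` holds table `b` together with the table `c` nested inside it. If the innermost last table is wanted instead, the start positions could be pushed onto an array and popped on each closing tag.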





