Wrox Programmer Forums
BOOK: Beginning Regular Expressions
This is the forum to discuss the Wrox book Beginning Regular Expressions by Andrew Watt; ISBN: 9780764574894
May 26th, 2008, 03:46 AM
Issue with replace function using regex


I'm working with a CMS tool called Smartsite. I'm not that good with regular expressions yet, though I have found a lot of solutions for my problems.

But I'm having problems with two expressions.

First issue

<td style='BORDER-RIGHT: windowtext 1pt solid; PADDING-RIGHT: 5.4pt; BORDER-TOP: windowtext 1pt solid; PADDING-LEFT: 5.4pt; PADDING-BOTTOM: 0cm; BORDER-LEFT: windowtext 1pt solid; WIDTH: 426.45pt; PADDING-TOP: 0cm; BORDER-BOTTOM: windowtext 1pt solid; BACKGROUND-COLOR: transparent' valign='top' width='569' colspan='2'>

This is what the MS Word converter does with a td tag.

I applied the expression text.replace(/style[^<]*/gi,''); and I am left with only <td> instead of all that crap. But I would like to keep the colspan="2" inside the td tag.
I tried a lot of expressions, but none of them work.
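
One possible way to approach the first issue, as a sketch rather than a tested Smartsite solution: the pattern style[^<]* does not stop at the closing quote of the style value. [^<]* keeps matching through valign, width, colspan and the closing >, up to the next <. Matching the quoted value explicitly avoids that. The function names below (stripStyleAttribute, keepOnlyColspan) are made up for illustration, and the code assumes attribute values never contain a > character.

```javascript
// Hypothetical helper: remove only the style attribute and its quoted value,
// leaving every other attribute (valign, width, colspan, ...) in place.
function stripStyleAttribute(html) {
  return html.replace(/\s+style\s*=\s*(?:'[^']*'|"[^"]*")/gi, '');
}

// Hypothetical helper: rebuild every <td ...> tag so that only colspan survives.
// Assumes no attribute value contains a ">" character.
function keepOnlyColspan(html) {
  return html.replace(/<td\b[^>]*>/gi, function (tag) {
    // Accept single-quoted, double-quoted or unquoted colspan values.
    var m = /\bcolspan\s*=\s*(?:'([^']*)'|"([^"]*)"|([^\s>]+))/i.exec(tag);
    var value = m ? (m[1] || m[2] || m[3] || '') : '';
    return value ? '<td colspan="' + value + '">' : '<td>';
  });
}

var sample = "<td style='BORDER-RIGHT: windowtext 1pt solid; WIDTH: 426.45pt' " +
             "valign='top' width='569' colspan='2'>";
console.log(stripStyleAttribute(sample)); // <td valign='top' width='569' colspan='2'>
console.log(keepOnlyColspan(sample));     // <td colspan="2">
```

Either helper would have to run before the existing style[^<]* replacement, because that replacement has already thrown away colspan by the time it finishes.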

Second issue
I'm working with XHTML, so img tags need to be closed at the end, like <img src=""/>.
The MS Word to HTML converter doesn't do that automatically; it leaves the image tag unclosed.

<img height="301" src="[urlprefix]Docs/daniel/ConvertedWordFile5334240.html_files/image002.jpg" width="567">

Is there a way to keep everything inside the image tag and only add a "/" to close it, something like text = text.replace(/<img *> /gi,'<img* \/>');
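
A sketch of how the second issue might be handled, under the assumption that no attribute value contains a > character. In a regular expression, * is not a wildcard: it repeats the preceding item, so /<img *>/ only matches <img followed by spaces and then >. In the replacement string * is a literal asterisk. A capturing group plus $1 in the replacement carries the attributes over instead. The function name closeImgTags is made up for illustration.

```javascript
// Hypothetical helper: turn <img ...> into <img ... />, keeping all attributes.
// ([^>]*?) captures the attributes lazily; \s*\/?> swallows an optional
// existing " /" so tags that are already closed are not closed twice.
function closeImgTags(html) {
  return html.replace(/<img\b([^>]*?)\s*\/?>/gi, '<img$1 />');
}

var img = '<img height="301" src="[urlprefix]Docs/daniel/' +
          'ConvertedWordFile5334240.html_files/image002.jpg" width="567">';
console.log(closeImgTags(img));
```

Running the function a second time on its own output leaves the tag unchanged, so it should be safe to apply on every save.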

I hope you can help me; I've already been searching for 4 hours.

Thanks for helping.

Regards, Colemonts Peter

This is my code so far. It's used to convert an imported Word document to plain HTML.

   var text = edit().getHTML();
   text = text.replace(/<font[^<]*>/gi,'');
   text = text.replace(/<\/font>/gi,'');
   text = text.replace(/<span[^<]*>/gi,'');
   text = text.replace(/<\/span>/gi,'');
   text = text.replace(/<ins[^<]*>/gi,'');
   text = text.replace(/<\/ins>/gi,'');
   text = text.replace(/style[^<]*/gi,'');
   text = text.replace(/<table[^<]*>/gi,'<table border=1>');
   text = text.replace(/<p[^<]*>/gi,'<p>');
   text = text.replace(/<div[^<]*>/gi,'<div>');
   text = text.replace(/<h1[^<]*>/gi,'<h1>');
   text = text.replace(/<h2[^<]*>/gi,'<h2>');
   text = text.replace(/<h1><b[^<]*>/gi,'<h1>');
   text = text.replace(/<\/b><\/h1>/gi,'</h1>');
   text = text.replace(/<h2[^<]*><b>/gi,'<h2>');
   text = text.replace(/<\/b><\/h2>/gi,'</h2>');
   text = text.replace(/<ul[^<]*>/gi,'<ul>');
   text = text.replace(/<li[^<]*>/gi,'<li>');
   text = text.replace(/<a[^<]*>/gi,'');
   text = text.replace(/<\/a>/gi,'');
   text = text.replace(/.nbsp;/gi,'');
   text = text.replace(/<p>\s*<\/p>/gi,'');
   text = text.replace(/\r+/g,'\r');
   text = text.replace(/\n+/g,'\n');
   text = text.replace(/(\r\n)+/g,'\r\n');
   text = text.replace(/<p>(-|.middot;|.bull;)\s*/gi,'<p>-');
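
As a side note on the chain above, here is a sketch of how a few of those rules could be kept in a list and applied in order. It is not a drop-in replacement for the full script. The names cleanupRules and applyRules are made up, and only a subset of the rules is shown. The \b after each tag name is an addition: without it, a pattern such as <p[^<]*> also matches <pre> and <a[^<]*> also matches <address>.

```javascript
// Hypothetical rule table: each entry is [pattern, replacement].
var cleanupRules = [
  [/<\/?span\b[^>]*>/gi, ''],               // drop opening and closing span tags
  [/<\/?ins\b[^>]*>/gi, ''],                // drop opening and closing ins tags
  [/<(p|div|h1|h2|li)\b[^>]*>/gi, '<$1>'],  // strip attributes, keep the tag
  [/<p>\s*<\/p>/gi, '']                     // remove empty paragraphs
];

// Apply every rule in order and return the cleaned text.
function applyRules(text, rules) {
  for (var i = 0; i < rules.length; i++) {
    text = text.replace(rules[i][0], rules[i][1]);
  }
  return text;
}

var dirty = '<p class="MsoNormal"><span lang="EN">Hi</span></p><pre>x</pre><p> </p>';
console.log(applyRules(dirty, cleanupRules)); // <p>Hi</p><pre>x</pre>
```

Order still matters with this layout, just as it does in the original chain: the empty-paragraph rule only works after the attributes have been stripped from the p tags.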
