Wrox Programmer Forums
March 8th, 2009, 05:58 AM
How can I split/replace the <br> with <Para> tags?

I need to split cell content at each '&lt;br&gt;' and wrap the pieces in <Para> tags. The <br> reaches the XSL as escaped text, not as an element, so it cannot be matched as a tag. How can I split on, or replace, the <br> so that each piece ends up in its own <Para>?

Input XML content:

<table border='1' cellspacing='0'>
<tr>
<td valign="center">first cell &lt;br&gt; after BR <a name='12_0'></a>12.0 end of first td</td>
<td>second cell <a href='#12_4'>12.4</a>&lt;br&gt;after br tag second td end</td>
<td>12.4(22)T1&lt;br&gt;12.4(23a)&lt;br&gt;end of thrid td </td>
</tr>
</table>

Desired output XML:

<table>
<tr>
<td><Para>first cell </Para>
<Para>after BR <XRef name="12_0"/>12.0 end of first td</Para>
</td>
<td>
<Para>second cell <XRef URL="#12_4">12.4</XRef> </Para>
<Para>after br tag second td end</Para>
</td>
<td>
<Para>12.4(22)T1</Para>
<Para>12.4(23a)</Para>
<Para>end of thrid td </Para>
</td>
</tr>
</table>
 
March 8th, 2009, 11:00 AM

Here is an XSLT 2.0 stylesheet that you can run with Saxon 9 or any other XSLT 2.0 processor. Note that it includes the stylesheet http://www.dcarlisle.demon.co.uk/htmlparse.xsl, written by David Carlisle, which parses the escaped HTML (e.g. the escaped <br> markup) into real elements.
Code:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="data:,dpc"
  exclude-result-prefixes="d"
  version="2.0">
  
  <xsl:output method="xml" indent="yes"/>
  
  <xsl:include href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>
  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
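  <xsl:template match="td">
    <!-- Build a temporary tree of the cell content in which each escaped <br> is a real br element -->
    <xsl:variable name="phtml">
      <xsl:apply-templates mode="parse"/>
    </xsl:variable>
    <!-- xsl:copy copies the td element itself, without its attributes -->
    <xsl:copy>
      <!-- Each group ends at a br; the br itself is left out of the Para -->
      <xsl:for-each-group select="$phtml/node()" group-ending-with="br">
        <Para>
          <xsl:copy-of select="current-group()[not(self::br)]"/>
        </Para>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>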

  <xsl:template match="a[@name]" mode="parse">
    <XRef name="{@name}"/>
  </xsl:template>
  
  <xsl:template match="a[@href]" mode="parse">
    <XRef URL="{@href}">
      <xsl:apply-templates/>
    </XRef>
  </xsl:template>
  
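  <!-- d:htmlparse comes from the included htmlparse.xsl and turns the escaped markup in the text into nodes -->
  <xsl:template match="td/text()" mode="parse">
    <xsl:copy-of select="d:htmlparse(., '', true())"/>
  </xsl:template>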
  
</xsl:stylesheet>
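To see how group-ending-with="br" carves a cell into paragraphs, here is a minimal sketch that leaves out the parsing step. The $cell variable is a hypothetical stand-in, modelled on the third td of the sample input after its escaped <br> markup has become real br elements; it is not part of the stylesheet above. Run against any XML input, it should give one Para for each run of nodes that ends at a br, plus one for the text after the last br.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical sample data: text nodes separated by real br elements -->
  <xsl:variable name="cell">
    <xsl:text>12.4(22)T1</xsl:text><br/>
    <xsl:text>12.4(23a)</xsl:text><br/>
    <xsl:text>end of third td </xsl:text>
  </xsl:variable>

  <!-- Matches the root of any input document; only $cell is actually used -->
  <xsl:template match="/">
    <td>
      <!-- A group runs up to and including the next br; the final group holds what follows the last br -->
      <xsl:for-each-group select="$cell/node()" group-ending-with="br">
        <Para>
          <!-- Copy the group's nodes but skip the br that closed the group -->
          <xsl:copy-of select="current-group()[not(self::br)]"/>
        </Para>
      </xsl:for-each-group>
    </td>
  </xsl:template>

</xsl:stylesheet>
```

Because grouping only starts a new group after a br, a cell that ends in a br does not get an empty trailing Para.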
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog
 
March 8th, 2009, 11:11 AM

The HTML parser by David Carlisle might be overkill if you are sure that escaped br elements are the only escaped markup in there. In that case the following XSLT 2.0 stylesheet should suffice:
Code:
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  
  <xsl:output method="xml" indent="yes"/>
  
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
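  <xsl:template match="td">
    <!-- Temporary tree of the cell content, with br elements where the escaped <br> text was -->
    <xsl:variable name="phtml">
      <xsl:apply-templates mode="parse"/>
    </xsl:variable>
    <xsl:copy>
      <!-- Each group ends at a br; the br itself is left out of the Para -->
      <xsl:for-each-group select="$phtml/node()" group-ending-with="br">
        <Para>
          <xsl:copy-of select="current-group()[not(self::br)]"/>
        </Para>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>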

  <xsl:template match="a[@name]" mode="parse">
    <XRef name="{@name}"/>
  </xsl:template>
  
  <xsl:template match="a[@href]" mode="parse">
    <XRef URL="{@href}">
      <xsl:apply-templates/>
    </XRef>
  </xsl:template>
  
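  <xsl:template match="td/text()" mode="parse">
    <!-- The XML parser reads the regex attribute as <br>, so it matches the literal text <br> -->
    <xsl:analyze-string select="." regex="&lt;br&gt;">
      <!-- Every match becomes a real br element -->
      <xsl:matching-substring>
        <br/>
      </xsl:matching-substring>
      <!-- All other text is passed through unchanged -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>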
  
</xsl:stylesheet>
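If the escaped markup might also turn up as <BR>, <br/> or <br /> (an assumption; the sample input only contains <br>), a more tolerant pattern could be used. The following is only a sketch, and it assumes the stylesheet above has been saved as split-br.xsl (a hypothetical file name) in the same directory:

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- Assumption: split-br.xsl is the stylesheet shown above (hypothetical file name) -->
  <xsl:import href="split-br.xsl"/>

  <!-- Overrides the imported td/text() template in mode "parse" -->
  <xsl:template match="td/text()" mode="parse">
    <!-- \s*/? allows an optional slash, with optional whitespace before it; flags="i" ignores case -->
    <xsl:analyze-string select="." regex="&lt;br\s*/?&gt;" flags="i">
      <xsl:matching-substring>
        <br/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Templates in the importing stylesheet have higher import precedence than imported ones, so only the text-splitting rule changes and the rest of split-br.xsl is reused as it is.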
__________________
Martin Honnen




