XSLT: General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

February 3rd, 2010, 07:16 AM
Registered User
Join Date: Feb 2010
Posts: 4
Thanks: 3
Thanked 0 Times in 0 Posts
Escaped CDATA - outputting HTML
Hi, I am trying to consume the XML output from a web service and transform it to HTML using XSLT.
I have run into an issue where the content of the nodes is wrapped in CDATA markup that is itself escaped, as is the content inside it. I would like to remove the CDATA markup and output the contents as unescaped HTML.
Here's a sample of the XML. Any suggestions gratefully received.
Code:
<?xml version="1.0" encoding="UTF-8"?>
<Articles>
<Article>
<ID>{GUID}</ID>
<Title>Article Title</Title>
<DatePublished>2008-11-26 12:19:00</DatePublished>
<Intro>&lt;![CDATA[&lt;strong&gt;Intro text that is surrounded by a strong tag. &lt;/strong&gt;]]&gt;</Intro>
<Summary>&lt;![CDATA[&lt;![CDATA[Summary text that sometimes has a double CDATA tag for some reason. May also have strong or p tags also. ]]&gt;]]&gt;</Summary>
<Url></Url>
</Article>
</Articles>
Thanks,
Matt

February 3rd, 2010, 07:30 AM
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
I'm fairly sure that nested CDATA sections are not well-formed XML, unless one of the CDATA sections is escaped as well.
Saxon contains an extension function called saxon:parse which can be used to parse an element's text as if it were XML. You can then output the result using xsl:copy-of, for example:
<xsl:copy-of select="saxon:parse(//Intro)" xmlns:saxon="http://saxon.sf.net/"/>
Last edited by samjudson; February 3rd, 2010 at 07:37 AM..

February 3rd, 2010, 07:33 AM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
That is not easy to solve as the CDATA section markup is also escaped.
If you had e.g.
Code:
<Intro><![CDATA[<strong>foobar</strong>]]></Intro>
you could simply use disable-output-escaping as in
Code:
<xsl:template match="Intro">
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>
But as your CDATA section markup is also escaped that approach does not work. Which XSLT processor do you use? In case of Saxon 9 you might be able to parse the contents of the 'Intro' or 'Summary' elements with an extension function.
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog

February 3rd, 2010, 07:50 AM
Registered User
Join Date: Feb 2010
Posts: 4
Thanks: 3
Thanked 0 Times in 0 Posts
Hi, thanks for your responses.
I wasn't using any XSLT processor at the moment, just hand coding the transform with basic templates and <xsl:value-of type stuff.
Perhaps I should investigate Saxon? The Parse solution looks good.
All the examples I have found so far online seem to have the CDATA unescaped which makes a lot more sense to me than to escape the lot.
I was wondering if I could do a simple replace on the starting and ending CDATA, replacing with an empty string and then unescape the rest? Does that sound reasonable? It might get around the illogical inclusion of double CDATAs.
Thanks again,
Matt

February 3rd, 2010, 08:06 AM
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
Apologies, since Martin posted I've double checked - and the parse method doesn't work I'm afraid - at least not with the CData escaped as well.

February 3rd, 2010, 08:44 AM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
Quote:
Originally Posted by mattisimo
I was wondering if I could do a simple replace on the starting and ending CDATA, replacing with an empty string and then unescape the rest?
If you use XSLT 2.0 and you know that elements like 'Intro' or 'Summary' start and end with that escaped CDATA section markup, then you could remove it as follows and output the remaining HTML markup with output escaping disabled:
Code:
<xsl:template match="Intro | Summary">
<xsl:value-of select="replace(., '^(&lt;!\[CDATA\[)+|(\]\]&gt;)+$', '')" disable-output-escaping="yes"/>
</xsl:template>
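For context, here is a minimal sketch of how that template might sit in a complete XSLT 2.0 stylesheet for the sample posted earlier. The `html`/`body`/`div` wrapper and the choice of `method="html"` are assumptions for illustration, not something the web service dictates. Note that a literal `<` inside the `select` attribute has to be written as `&lt;`, otherwise the stylesheet itself is not well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Assumed page wrapper; adjust to the real target markup -->
  <xsl:template match="/Articles">
    <html>
      <body>
        <xsl:apply-templates select="Article"/>
      </body>
    </html>
  </xsl:template>

  <!-- One div per article (hypothetical layout) -->
  <xsl:template match="Article">
    <div>
      <h2><xsl:value-of select="Title"/></h2>
      <xsl:apply-templates select="Intro | Summary"/>
    </div>
  </xsl:template>

  <!-- Strip one or more leading "<![CDATA[" and trailing "]]>" markers
       from the string value, then write the rest without escaping -->
  <xsl:template match="Intro | Summary">
    <div>
      <xsl:value-of
          select="replace(., '^(&lt;!\[CDATA\[)+|(\]\]&gt;)+$', '')"
          disable-output-escaping="yes"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats: the pattern only matches markers at the very start and very end of the text, so surrounding whitespace would defeat it, and disable-output-escaping is an optional feature that only has an effect when the XSLT processor itself serializes the result.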
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog
Last edited by Martin Honnen; February 3rd, 2010 at 09:52 AM..
Reason: correcting typo

February 3rd, 2010, 10:48 AM
Registered User
Join Date: Feb 2010
Posts: 4
Thanks: 3
Thanked 0 Times in 0 Posts
Hi again, thanks for your suggestion.
I've tried this approach in Altova XMLSpy 2004 and it said that the replace function wasn't valid.
I tried writing an ASP.NET page to run the transform instead, in case this version of the software didn't support it, and I get the same message:
'replace()' is an unknown XSLT function.
Here's the sample stylesheet:
Code:
<?xml version="1.0" encoding="UTF-16"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://fxsl.sf.net/">
<xsl:output method="xml" encoding="utf-16" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="LiveWellArticle">
<xsl:apply-templates select="Intro"/>
</xsl:template>
<xsl:template match="Intro">
<xsl:param name="text" select="."/>
<br/><xsl:value-of select="replace($text, '^(&lt;!\[CDATA\[)+|(\]\]&gt;)+$', '')" disable-output-escaping="yes"/>
<br/><xsl:value-of select="$text" disable-output-escaping="yes"/>
<br/><xsl:value-of select="." disable-output-escaping="yes"></xsl:value-of>
</xsl:template>
</xsl:stylesheet>
In XML Pad it just seems to ignore the line with the replace as it outputs nothing for that line and outputs the full contents including the CDATA for the other two lines.
Am I missing something with the implementation of <xsl:value-of select="replace(.......)">?
Thanks again for your help with this. As you may have realised I'm new to XSL.

February 3rd, 2010, 10:54 AM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
replace is defined in XPath 2.0, so you need to use an XSLT 2.0 processor with an XSLT 2.0 stylesheet to use that function. As XSLT 2.0 and XPath 2.0 have only existed as W3C Recommendations since the beginning of 2007, an editor with 2004 in its name is not likely to support that function, or XSLT/XPath 2.0 at all.
Try Saxon 9 or the free AltovaXML tools 2010.
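If switching to an XSLT 2.0 processor is not an option (the XSLT classes built into .NET, for instance, implement XSLT 1.0 only), the same stripping can be approximated with XSLT 1.0 string functions. The following is only a sketch under the same assumption as before, namely that the escaped CDATA markers sit at the very start and end of the text; the template name `strip-cdata` is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Only process the elements that carry escaped markup -->
  <xsl:template match="/">
    <xsl:apply-templates select="//Intro | //Summary"/>
  </xsl:template>

  <xsl:template match="Intro | Summary">
    <xsl:call-template name="strip-cdata">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: recursively removes leading "<![CDATA["
       and trailing "]]>" markers, then outputs the rest unescaped -->
  <xsl:template name="strip-cdata">
    <xsl:param name="text"/>
    <xsl:variable name="open" select="'&lt;![CDATA['"/>
    <xsl:variable name="close" select="']]&gt;'"/>
    <xsl:variable name="len" select="string-length($text)"/>
    <xsl:choose>
      <!-- Leading marker: drop its 9 characters and try again -->
      <xsl:when test="starts-with($text, $open)">
        <xsl:call-template name="strip-cdata">
          <xsl:with-param name="text"
              select="substring($text, string-length($open) + 1)"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Trailing marker: XSLT 1.0 has no ends-with(), so compare
           the last 3 characters by hand -->
      <xsl:when test="$len &gt;= 3 and substring($text, $len - 2) = $close">
        <xsl:call-template name="strip-cdata">
          <xsl:with-param name="text"
              select="substring($text, 1, $len - 3)"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Nothing left to strip -->
      <xsl:otherwise>
        <xsl:value-of select="$text" disable-output-escaping="yes"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The recursion handles the doubled markers in 'Summary' as well, since each call removes one marker and then re-checks the remaining text.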
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog

February 3rd, 2010, 11:01 AM
Registered User
Join Date: Feb 2010
Posts: 4
Thanks: 3
Thanked 0 Times in 0 Posts
Great, thanks for your help - I'll investigate these further.