Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old September 5th, 2007, 07:41 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Hello Michael,

I've been listening to you very carefully. I understand what you mean when you say "'©' and '&_#x000A9;' are identical as these are two different ways of writing the same copyright character".

Questions:
"What is the reason for wanting to do this - why do I need the unicode version in my output?"
Answer: Actually I am working on a pagination platform, which uses XML as input. My actual XML in fact contains Unicode values for various standard symbols, like &_#x000A9; [without underscore]. Before importing it, I add a few attributes and elements to it (using XSLT) to make things easier for composition purposes.

Once the XML is imported into this typesetting engine, all the Unicode entities get converted into their corresponding symbols.

Anyway, that's not an issue at the composition stage. But once composition is finished, we export the XML from the composition engine itself (as it may contain many textual corrections), with the symbols rather than the entities in it.
I am using XSLT to transform the exported XML back to the required format, which includes:

1. Removal of additional elements and attributes.
2. Converting symbols (such as ©) back to their Unicode values (&_#x000A9;, without underscore).

I don't have any problem with the first one, but I am stuck on the Unicode ones.

In fact I've tested replacing the literal © with another character, e.g. a capital "X", but it doesn't seem to work, which is what worries me most. It seems I'm making a mistake in defining my style sheet (I'm very tired today; thanks for your patience, as I am almost a novice in XSLT, trying to learn it on my own). Here's my style sheet:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:aid="http://ns.adobe.com/AdobeInDesign/4.0/">


 <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>

    <xsl:choose>
      <xsl:when test="contains($text, $from)">

    <xsl:variable name="before" select="substring-before($text, $from)"/>
    <xsl:variable name="after" select="substring-after($text, $from)"/>
    <xsl:variable name="prefix" select="concat($before, $to)"/>

    <xsl:value-of select="$before"/>
    <xsl:value-of select="$to"/>
        <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="$after"/>
      <xsl:with-param name="from" select="$from"/>
      <xsl:with-param name="to" select="$to"/>
    </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
 </xsl:template>


<xsl:template match="text()">
<xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="from">©</xsl:with-param>
        <xsl:with-param name="to" select="P"/>
</xsl:call-template>
</xsl:template>

<xsl:template match="*|@*">
      <xsl:choose>
      <xsl:when test="name()='XXX'">
      </xsl:when>
      <xsl:when test="name()='YYY'">
      </xsl:when>

<xsl:when test="name()='char'">
      </xsl:when>
     <xsl:otherwise>
    <xsl:copy>
        <xsl:apply-templates select="text()|*|@*"/>
        </xsl:copy>
     </xsl:otherwise>
     </xsl:choose>
</xsl:template>
</xsl:stylesheet>

Hope I've made myself clear now.

Pankaj

 
Old September 5th, 2007, 07:52 AM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Default

<xsl:with-param name="to" select="P"/>

Try

<xsl:with-param name="to" select="'P'"/>

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
 
Old September 5th, 2007, 08:05 AM
samjudson's Avatar
Friend of Wrox
 
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
Default

I am glad you agree that the two symbols are identical, but what you don't seem to realise is that replacing one with the other makes no difference in the slightest to what is output by the XML serialization after you have performed your transformation.

The way to affect the output from the XML serialization is to use the <xsl:output> declaration.
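A minimal sketch of that approach, assuming an XSLT 1.0 processor that does its own serialization and supports the us-ascii encoding (support for encodings other than UTF-8 and UTF-16 is optional). With an ASCII output encoding, characters such as © cannot be written directly in text or attribute values, so the serializer emits them as numeric character references. Whether it writes them in decimal or hexadecimal form is left to the processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- With an ASCII output encoding, the serializer has to write every
       non-ASCII character in text and attribute values as a numeric
       character reference. -->
  <xsl:output method="xml" encoding="us-ascii"/>

  <!-- Identity transform: copy the input unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

No string replacement is involved here; the choice of representation is made entirely at the serialization stage.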

As for why your example doesn't work: that's because it has bugs in it.

Code:
<xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="from" select="'©'"/>
        <xsl:with-param name="to" select="'X'"/>
</xsl:call-template>
Your first example had single quotes around the "." - which meant you were searching the string '.' for the copyright symbol, instead of searching the current context node.

The second example didn't have quotes around the "P" in the "to" parameter - so it is read as an XPath expression selecting child elements called <P>, not as the string 'P'. A text node has no children, so the replacement string would be empty.
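To make the quoting rule concrete, here is a self-contained sketch built around the replace-string template from the original post. The unused variables are dropped, and the copyright sign is written as &#xA9; so the stylesheet does not depend on its own file encoding; both are choices made for this illustration, not part of the original code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Recursive string replacement, as in the original post. -->
  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <xsl:when test="contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Copy elements and attributes unchanged. -->
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:call-template name="replace-string">
      <!-- Unquoted "." is an XPath expression: the current text node. -->
      <xsl:with-param name="text" select="."/>
      <!-- Quoted values are string literals. -->
      <xsl:with-param name="from" select="'&#xA9;'"/>
      <xsl:with-param name="to" select="'X'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The rule of thumb is that the select attribute always holds an XPath expression, so a literal string needs its own inner quotes. Putting the value in the element content instead, as in <xsl:with-param name="from">©</xsl:with-param>, avoids the quoting altogether.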

/- Sam Judson : Wrox Technical Editor -/
 
Old September 5th, 2007, 08:46 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Thanks Sam and Michael.

I will look into it tomorrow; I seem to be tired today, as I've started making blunders. I am sure there is a bug in my code.

Thanks for your patience and guidance.

Pankaj

 
Old September 6th, 2007, 06:12 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Well, it seems to me that I need to switch over to a Perl script specifically for converting symbols into Unicode entities. I've tested it with Perl and am getting perfect results.

Alas, I wanted to do it in XSLT itself, but that seems to be of no use, as © (and the other symbols) are just converted to © again.

Thanks for the help, by the way.

Pankaj

 
Old September 6th, 2007, 06:22 AM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Default

Did you try the solution proposed using <xsl:output>?

XSLT itself works on a more abstract level that isn't concerned with how characters are serialized, on the basis that it doesn't actually matter. But the serializer does give you some control, driven by xsl:output.
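If the exact hexadecimal form matters, one option not discussed in this thread is a character map. This is an XSLT 2.0 feature, so the sketch below assumes a 2.0 processor such as Saxon; it is not available in XSLT 1.0 processors. The map name "symbols" is a hypothetical name chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Tell the serializer to apply the character map named "symbols". -->
  <xsl:output method="xml" encoding="UTF-8" use-character-maps="symbols"/>

  <!-- Each mapped character is written out as the given string,
       exactly as spelled and without further escaping. -->
  <xsl:character-map name="symbols">
    <xsl:output-character character="&#xA9;" string="&amp;#x000A9;"/>
  </xsl:character-map>

  <!-- Identity transform: copy the input unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Each symbol needs its own xsl:output-character entry, so this suits a known, limited set of symbols rather than arbitrary non-ASCII text.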

Using Perl is fine, but be aware of the dangers. It isn't XML-aware, so you might find yourself translating characters in contexts where numeric character references aren't recognized or allowed (for example in comments, in CDATA sections, or in element names), and it might deliver the output file in an encoding that doesn't match the declared encoding in the XML declaration.

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
 
Old September 9th, 2007, 04:08 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Thanks Michael. I did try your <xsl:output> suggestion for the Unicode conversion, and in fact I am testing it right now. I was not able to test it properly a few days back, as I was out of the office for some urgent work. I too want to get it done through XSLT only, and would prefer not to run Perl or any other program unnecessarily to convert symbols to Unicode.

Let me test it thoroughly; I will get back to you if I see any trouble with it.

Thanks.
Pankaj

 
Old September 10th, 2007, 05:09 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Hi,

I've tried <xsl:output encoding="us-ascii"/> to translate all symbols, but I am not getting the desired output. For example, converting © gives me the value #169;. On the other hand, there are some places in the text where I am getting strange output like #169;#8233;.

Below is a snippet of my stylesheet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:aid="http://ns.adobe.com/AdobeInDesign/4.0/">



<xsl:output encoding="us-ascii"/>

 <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>

    <xsl:choose>
      <xsl:when test="contains($text, $from)">
    <xsl:variable name="before" select="substring-before($text, $from)"/>
    <xsl:variable name="after" select="substring-after($text, $from)"/>
    <xsl:variable name="prefix" select="concat($before, $to)"/>

    <xsl:value-of select="$before"/>
    <xsl:value-of select="$to"/>
     <xsl:call-template name="replace-string">
      <xsl:with-param name="text" select="$after"/>
      <xsl:with-param name="from" select="$from"/>
      <xsl:with-param name="to" select="$to"/>

    </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
</xsl:template>



<xsl:template match="text()|*|@*">
  <xsl:choose>
      <xsl:when test="name()='aid:pstyle'">
      </xsl:when>
      <xsl:when test="name()='aid:cstyle'">
      </xsl:when>

    <xsl:when test="name()='char'">
      </xsl:when>
     <xsl:otherwise>
    <xsl:copy>
        <xsl:apply-templates select="text()|*|@*"/>
    </xsl:copy>
     </xsl:otherwise>
 </xsl:choose>
</xsl:template>


<xsl:template match="text()">
    <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="'©'"/>
        <xsl:with-param name="from">©</xsl:with-param>
        <xsl:with-param name="to" select="'#x000A9;'"/>
    </xsl:call-template>
    <xsl:copy>
        <xsl:apply-templates select="text()|*|@*"/>
    </xsl:copy>

</xsl:template>

</xsl:stylesheet>

Any suggestions as to where I am possibly going wrong?

I am working in TestXSLT using the "Sablotron" XSLT processor. I have also tried LibXSLT, but to no avail. By the way, what's the major difference between "Sablotron" and LibXSLT? Which one is better?
Sorry for asking so much.


Pankaj




 
Old September 10th, 2007, 05:13 AM
Authorized User
 
Join Date: Jul 2007
Posts: 55
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Please note I am using &_#x000A9 without the underscore, and the same is the case with the resulting output, i.e. &_#169.

 
Old September 10th, 2007, 05:54 AM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Default

hex A9 is the same as decimal 169, so this output is correct.

If there's a character 8233 in the output then that's because this character is present in the input. 8233 is hex 2029, which is the Unicode paragraph separator.
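The equivalence is plain base conversion: hex A9 is 10 × 16 + 9 = 169. As a small check, assuming any XSLT 1.0 processor, the stylesheet below compares the two spellings. The XML parser resolves both references to the same character before XPath ever sees them, so the comparison yields true.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Both literals are the one-character string U+00A9,
       so this writes "true" for any input document. -->
  <xsl:template match="/">
    <xsl:value-of select="'&#x000A9;' = '&#169;'"/>
  </xsl:template>

</xsl:stylesheet>
```

This is why no XSLT string replacement can turn one form into the other: inside the transformation there is only the character itself.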

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference





Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.