Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
May 12th, 2006, 12:38 AM
Authorized User
 
Join Date: Dec 2005
Posts: 71
Thanks: 10
Thanked 0 Times in 0 Posts
Unicode translation using [xsl:output-character]

Hi All,

I have a problem converting XML to XML with entities.

Following is my XML.

<para>This is MAYA xml α δ <iemph>α δ</iemph>.</para>

Following is my expected XML.

<para>This is MAYA xml &agr; &dgr; &alpha; &delta;.</para>

The exact problem is this: I would like to convert Unicode characters that are NOT inside <iemph> to &agr; &dgr;, and Unicode characters that are inside <iemph> to &alpha; &delta;. I would like to do this using "xsl:output-character".

Any help would be appreciated.
Thanks,
ROCXY


__________________
Thanks,
Rocxy.
 
May 12th, 2006, 03:35 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

You can't achieve this with XSLT 2.0 character maps, as they are not sensitive to context. You could do it with disable-output-escaping provided your XSLT processor supports it, or you could write your own serialization post-processing code.

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
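
The "serialization post-processing" route mentioned above could look roughly like the following Python. This is a minimal sketch and not Michael's code: the function name and the two lookup tables are hypothetical, only the four mappings from ROCXY's example are included, and it assumes <iemph> elements are never nested and that the <iemph> tags themselves are dropped, as in the expected output. A regular expression stands in for a real XML parser to keep the sketch short.

```python
import re

# Hypothetical entity tables; only these four mappings appear in the thread.
OUTSIDE = {"\u03b1": "&agr;", "\u03b4": "&dgr;"}      # alpha, delta outside <iemph>
INSIDE = {"\u03b1": "&alpha;", "\u03b4": "&delta;"}   # alpha, delta inside <iemph>


def _apply(text, table):
    """Replace each character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in text)


def serialize_with_context(xml_text):
    """Map characters to entities depending on whether they sit in <iemph>.

    Assumes non-nested <iemph> elements; the tags are removed from the output.
    """
    # The capturing group keeps the <iemph>...</iemph> runs in the split result.
    parts = re.split(r"(<iemph>.*?</iemph>)", xml_text, flags=re.DOTALL)
    out = []
    for part in parts:
        m = re.fullmatch(r"<iemph>(.*?)</iemph>", part, flags=re.DOTALL)
        if m:
            out.append(_apply(m.group(1), INSIDE))
        else:
            out.append(_apply(part, OUTSIDE))
    return "".join(out)
```

The context test (inside or outside <iemph>) is exactly what a character map cannot express, which is why the mapping has to happen in code that still sees the element structure.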
 
May 15th, 2006, 11:13 AM
Registered User
 
Join Date: May 2006
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts

I'd like to piggyback on this question with a variation of the same question. Like ROCXY, I'm converting XML to XML and having special character issues. I have the following original XML:

<title>François Hollande : "La droite a pris l'Etat en otage"</title>

When converted, though, I get this:

<title><![CDATA[Fran?ois Hollande : "La droite a pris l'Etat en otage"]]></title>

Within the conversion, I'm specifying cdata elements and encoding in the xsl:output tag via the cdata-section-elements and encoding attributes, respectively. I'm specifying UTF-8 encoding.

I should note that the source XML specifies an encoding value of "iso-8859-1". Since the original XML loads and displays fine, I'm not sure what significance, if any, this may have, but it seems worth mentioning.

Surely there's a way to do this...right? What am I missing?

Any guidance would be /greatly/ appreciated.

Thanks.

Rob Wilkerson
 
May 15th, 2006, 11:43 AM
mhkay

What XSLT processor are you using?

The serializer should never output numeric character references (like & #65533; [I think ampersands are getting lost in this forum]) within a CDATA section, because XML doesn't recognize them there. This looks like a bug in your processor.

Secondly, Unicode 65533 (U+FFFD) is a substitute character for use when a character is found that can't be output in the selected encoding. If the selected encoding is UTF-8, I can't see any reason why it would be used.

The first thing to check is that your input XML is correctly encoded. What is the actual encoding of the c-with-cedilla (use a hex editor to find out), and what is the encoding specified in the XML declaration of the input file? Do they match?

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
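
The "do they match?" check can also be done in code rather than by eye. The following is a rough Python sketch under stated assumptions: both function names are hypothetical, the declaration is expected at the very start of the file with no byte order mark, and the default of UTF-8 applies when no encoding is declared.

```python
import re


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii") if m else default


def matches_declaration(data: bytes) -> bool:
    """True if the bytes decode without error under the declared encoding."""
    try:
        data.decode(declared_encoding(data))
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an encoding name Python does not know.
        return False
```

One limitation worth knowing: every byte sequence is decodable as ISO-8859-1, so this test can only catch a mismatch when the declared encoding is a stricter one such as UTF-8. For a file declared as iso-8859-1, looking at the actual bytes in a hex editor is still the reliable check.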
 
May 15th, 2006, 11:58 AM

Hey Michael -

I noticed the same thing when I posted and updated my original post. What it actually does is replace the Unicode character in the original XML with the replacement character. The forums may have made that substitution.

My CDATA block actually contains the "unfound replacement character" (for want of a better term). When rendered, either in a database or a web page, the character is translated as the question mark or as the square character.

I also updated the original post with the original XML encoding. It's iso-8859-1.

I'll check the hex encoding now...

Thanks for the quick response.

Rob Wilkerson
 
May 15th, 2006, 12:19 PM

Well, the original XML I was using is gone, but the updated file (it's an RSS feed) contains another entry that includes the "c-with-cedilla" character so I tested that in the hex editor.

Original XML:
<description>Le juge a été longuement entendu, lundi, par sa hiérarchie concernant ses liens avec Jean-Louis Gergorin, soupçonné d&#38;#39;être le &#38;#34;corbeau&#38;#34;. </description>

The c-with-cedilla appears to be rendering in the hex editor as E7. The accented e at the end of the same word is encoded as E9 and also won't render properly.

Thanks again for your help.

Rob Wilkerson
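
For what it's worth, single bytes E7 and E9 are what ISO-8859-1 uses for c-with-cedilla and e-acute, so those hex values are consistent with the declared encoding of the source. The Python sketch below (variable names are illustrative only) shows what happens if the same bytes are instead read as UTF-8, which is one plausible way, though not confirmed in this thread, for the replacement character to end up in the output.

```python
# "soupçonné" as it would be stored in an ISO-8859-1 file: E7 and E9 are single bytes.
latin1_bytes = b"soup\xe7onn\xe9"

# Read with the declared encoding, the text comes back intact.
as_latin1 = latin1_bytes.decode("iso-8859-1")

# Read as UTF-8, E7 and E9 are invalid on their own, so each becomes U+FFFD.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# In a genuinely UTF-8 file the same letters take two bytes each (C3 A7, C3 A9).
utf8_bytes = as_latin1.encode("utf-8")
```

If the hex editor had shown C3 A7 instead of E7, the file would actually be UTF-8 despite its declaration; here the bytes and the declaration agree, which points away from the input file and toward how the bytes are being read or passed to the processor.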










Copyright (c) 2020 John Wiley & Sons, Inc.