strange xsl encoding problem

teoxp · February 9th, 2005, 04:54 AM

I am facing a strange encoding problem with an XSL transformation.
In short, I execute a SQL statement to fill a recordset that contains some fields with Greek characters stored as UTF-16 (SQL Server 2000, nvarchar columns), which I then save to the Response object of an ASP page as XML (adPersistXML). I then use an XSL file to transform the XML to HTML. The problem is that when I view the page in Internet Explorer, instead of the Greek characters I get something like this: "Ãâ¢ÃâºÃâºÃâÃÂÃâ¢ÃÅ¡ÃÅ¸ÃÂ£ ÃÂ¤Ãâ¢ÃÂ¤ÃâºÃÅ¸ÃÂ£". Moreover, when I do a "View Source" and look at the HTML source in Notepad, the text is displayed correctly in Greek.

Currently I am using the following ASP code to do the transformation:

Code:

' rs is an ADO recordset filled with data from a SQL SELECT statement

' Load the stylesheet into a free-threaded DOM document
styleFile = Server.MapPath(xslfile)
Set stylexml = Server.CreateObject("MSXML2.FreeThreadedDOMDocument")
stylexml.async = False
stylexml.load styleFile

' Persist the recordset into a second DOM document as XML
Set sourcexml = Server.CreateObject("MSXML2.FreeThreadedDOMDocument")
sourcexml.async = False
rs.Save sourcexml, 1 ' 1 = adPersistXML
Set rs = Nothing

strPath = BuildPath(id, 18)

Dim xslt, xslProc

' Compile the stylesheet and create a processor for this request
Set xslt = Server.CreateObject("MSXML2.XSLTemplate")
Set xslt.stylesheet = stylexml
Set xslProc = xslt.createProcessor()
xslProc.input = sourcexml
xslProc.addParameter "Path", escape(strPath)

' Declare the charset, then write the transform result straight to the response
Response.Charset = "UTF-16"
xslProc.output = Response
xslProc.transform

The XSL file looks like this:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
                xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
                xmlns:rs="urn:schemas-microsoft-com:rowset"
                xmlns:z="#RowsetSchema"
                version="1.0">

  <xsl:param name="Path" />
  <xsl:output encoding="utf-16" method="html" version="4.0"/>
.....

<h2 style="color:white;font-family: 'Verdana', Arial, Helvetica, Tahoma, sans-serif;" align="left"><xsl:value-of disable-output-escaping="yes" select="rs:data/z:row/@Description_GR" /></h2>

......

I've posted the same question on the MSDN newsgroups with the title "Strange xsl encoding problem" and I had a really useful answer from Mike Sharp.

He told me that the problem has to do with a big-endian/little-endian switch. His answer was the following:

Quote:

This is proving difficult to nail down (no surprise to you, I'm sure), but I *think* I see what's happening...
It looks like it's an "endian" switch. That is, the ADO recordset is being saved as UTF-16 big endian (UTF-16BE). Rather, the data is. "Endianness" only matters when serializing UTF-16. When it's in memory, it doesn't matter. When there is no byte order mark, it's supposed to be big endian. When I play around with those byte codes I can get a similar result. I'm not sure how this will appear in your newsreader, but as an example, the character codes 03 B5 03 BB 03 B1 in UTF-16BE are displayed as åëá, but if I interpret those same character codes as UTF-16M, they show up as µ » ±, which is similar to your result.

This article by Mark Davis is extremely helpful in this area:
http://www-106.ibm.com/developerwork...rms/index.html
Since the entire file isn't saved as big endian, only the characters, I'm thinking that perhaps the original data is being stored that way.

I think you'll find Mark Davis's UTF-converter very useful for diagnosing problems like this. I use it all the time. The page is at:
http://www.macchiato.com/unicode/convert.html
He's got a lot of interesting stuff on his site, in fact.
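
The byte example in the quoted answer can be checked directly. The following is a minimal sketch in Python (chosen only because it makes byte-level decoding easy to show; it is not part of the ASP page) that takes the six bytes 03 B5 03 BB 03 B1 from the quote and decodes them three ways. The variable names are illustrative.

```python
# The six bytes quoted above: 03 B5 03 BB 03 B1
raw = bytes.fromhex("03B503BB03B1")

# Read as UTF-16 big endian: three Greek letters (U+03B5, U+03BB, U+03B1).
as_be = raw.decode("utf-16-be")

# Read as UTF-16 little endian: each byte pair is swapped, which gives
# unrelated code points (U+B503, U+BB03, U+B103).
as_le = raw.decode("utf-16-le")

# Read one byte at a time as Latin-1: every 03 byte is a control
# character, and B5, BB, B1 are the symbols shown in the quote.
as_latin1 = raw.decode("latin-1")
visible = "".join(ch for ch in as_latin1 if ch.isprintable())

print(as_be)
print([f"U+{ord(ch):04X}" for ch in as_le])
print(visible)
```

Note that the symbols in the quote come from the byte-at-a-time reading, not from a true little-endian decode, so a result like that suggests a single-byte charset is being applied somewhere rather than a simple byte swap.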

Anyway, I still can't figure out how to correct the problem. Those links were really useful, but I haven't managed to fix it.
Is there any way to fix it?

Thanks in advance
Teo

sonicDace · February 9th, 2005, 11:28 AM

Teo,

I had the same, or maybe a similar, problem: the HTML is OK but is somehow displayed incorrectly by the browser.

Basically, in my case, I found that on different computers some would display the HTML correctly and some incorrectly. After some investigation, I noticed that the MS DOM parser inserted a default character encoding header into the rendered HTML after the transformation, and the different computers were interpreting it in different ways because of their different language settings (mine was Spanish, others were English). This was causing my &#xA0; to be displayed as "?".

This is more of a workaround, but what I did was hardcode the same character encoding tag into my template with the ISO 8859-1 character set (if my memory serves me correctly, you'll have to find a way to insert it before the tag inserted by the MS DOM parser, but if that doesn't work, try inserting it after).

I'm sorry I don't have code samples... this was too long ago :-P

Hope this helps

teoxp · February 9th, 2005, 08:03 PM

Hi
Thanks for the reply

Quote:

I noticed that MS DOM parsers inserted a default character encoding header into the rendered HTML after the transformation

When you say "a default character encoding header into the rendered HTML", do you mean an HTML <meta charset=".."> directive or something else?

I ask because my output HTML seems to have the correct encoding directive:

Code:

<html xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema">
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
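
The META tag only states what the page claims to be; it does not show which bytes were actually sent. One hedged way to check is to save the raw response and look for a byte order mark at the start. The sketch below is in Python purely for illustration; `sniff_bom` and the two sample bodies are hypothetical names made up here, and it only looks for UTF-8 and UTF-16 marks (UTF-32 is ignored).

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, or None."""
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None


# Hypothetical response bodies, built here only for illustration.
utf16_body = codecs.BOM_UTF16_LE + "<html>".encode("utf-16-le")
utf8_body = "<html>".encode("utf-8")

print(sniff_bom(utf16_body))
print(sniff_bom(utf8_body))
```

If the saved response has no UTF-16 mark and its first bytes are plain ASCII such as `<html`, the body is probably not UTF-16 regardless of what the META tag says.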

pgtips · February 10th, 2005, 06:04 AM

I think Dace is saying he solved a similar problem by swapping encodings, but iso-8859-1 won't work for you because it doesn't contain Greek chars.

Is there any reason you must use UTF-16? Have you tried UTF-8 instead?

teoxp · February 10th, 2005, 06:29 AM

Yes, I've tried UTF-8 with no success.

I have also tried several ways to perform the transformation, including transformNode and transformNodeToObject, but the result was always the same.

marianna.krini · May 30th, 2007, 09:13 AM

Try to declare the output encoding attribute in your xsl file like this:

<xsl:output method="html" encoding="WINDOWS-1253" indent="no"></xsl:output>

mhkay · May 30th, 2007, 10:05 AM

You've asked for UTF-16 output but what you've shown looks more like a rendition of UTF-8 by something that doesn't understand that it's looking at UTF-8.

Is there a META element in the generated HTML that gives the charset? If so, what does it say?

And what's in the HTTP headers?

Have you tried different browsers?

In Firefox, what does View/Character Encoding say?

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
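
The pattern described in the last reply (UTF-8 bytes rendered by something that does not know they are UTF-8) is easy to reproduce. This is a minimal sketch in Python, assuming the viewer falls back to Windows-1252; the sample word is illustrative only and is not the text from the original page.

```python
# Illustrative sample only; not the text from the original page.
original = "Ελλάδα"

# Serialize as UTF-8: each Greek letter becomes two bytes.
utf8_bytes = original.encode("utf-8")

# Misread those bytes as Windows-1252: each letter turns into two
# Latin characters, the first of which is Î for this sample.
garbled = utf8_bytes.decode("cp1252")

# Reversing the misreading recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)
print(repaired == original)
```

If reversing the garbled text from a real page this way yields readable Greek, that supports the UTF-8 diagnosis. Some UTF-8 bytes have no Windows-1252 mapping, so the round trip can fail on other input.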