Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
March 18th, 2011, 01:03 AM
IronStar
Authorized User

Join Date: Feb 2005
Posts: 26
Thanks: 0
Thanked 0 Times in 0 Posts
Using translate function on Word document data

I have an XML document with nodes that contain resume text that was stored in a database as a Word document. The XML document must be converted into a CSV file, and Word's special characters must be removed (replaced with zero-length strings). I used the following XSL to convert the file (the translate calls contain the actual &#nnnn values I found in the CSV file with a hex editor, but they are shown here as their display characters):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/">

<xsl:for-each select="dataset/data/row">

<xsl:for-each select="value">

<xsl:variable name="val" select="."/>

<xsl:variable name="val1" select="translate($val, ', ', ' ')"/>
<xsl:variable name="val1" select="translate($val1, 'â–ª€“™¦Ã¯Â', '')"/>
<xsl:variable name="val1" select="translate($val1, '§¢®ï�¬ ‚·”', '')"/>
<xsl:variable name="val1" select="translate($val1, '¼ƒœ³†’�˜—‹', '')"/>
<xsl:value-of select="$val1"/>
<!-- add comma delimiter -->
<xsl:text>,</xsl:text>
</xsl:for-each>

<!-- Add CR at end of line -->
<xsl:text>&#13;&#10;</xsl:text>

</xsl:for-each>

</xsl:template>

</xsl:stylesheet>

Here's a snippet of the XML:

<row>
<value>38489</value>
<value>03/2003</value>
<value>500 Oracle Parkway Redwood Shores, CA 94065</value>
<value></value>
<value>present</value>
<value>00,000</value>
<value>Oracle Corp.</value>
<value>nnn.xxx.yyyy</value>
<value>Full Time</value>
<value>Currently Employed</value>
<value>Oracle Corp. âM-^@M-^S Senior Supply Chain Analyst E-Business Suite April 2008 âM-^@M-^S Present
âM-^@¢ Provide consulting services to English, Dutch, French, and German speaking customers using Oracle EBS Advanced Supply Chain Planning (Memory Based Planner / User Interface)
âM-^@¢ Conduct QA testing for Readiness 12.1.3 specializing in JDE/VCP integration and Manufacturing Operations Center
Oracle Corp. (formerly J.D. Edwards) âM-^@M-^S Senior Supply Chain Analyst March 2003 âM-^@M-^S Arpil 2008
âM-^@¢ Provide consul</value>
<value></value>
<value></value>
<value></value>
<value></value>

This is the resulting CSV:

38489,03/2003,500 Oracle Parkway Redwood Shores CA 94065,,present,00 000,Oracle Corp.,nnn.xxx.yyyy,Full Time,Currently Employed,Oracle Corp. âM-^@M-^S Senior Supply Chain Analyst E-Business Suite April 2008 âM-^@M-^S PresentâM-^@¢ Provide consulting services to English Dutch French and German speaking customers using Oracle EBS Advanced Supply Chain Planning (Memory Based Planner / User Interface) âM-^@¢ Conduct QA testing for Readiness 12.1.3 specializing in JDE/VCP integration and Manufacturing Operations CenterOracle Corp. (formerly J.D. Edwards) âM-^@M-^S Senior Supply Chain Analyst March 2003 âM-^@M-^S Arpil 2008âM-^@¢ Provide consul,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,Later,,,,,,,,,,,

I can convert any embedded commas and CR/LF characters to a single space, but none of the special characters are removed. Does anyone know why these characters are not being translated?

Thanks in advance for your help.
 
March 18th, 2011, 05:06 AM
samjudson
Friend of Wrox
 
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts

You can't redefine $val1 like that. In XSLT, once you've created a variable its value is fixed, so the output will be using the result of the first translate, without any of the other replacements having been done.

If you don't want to do all the translates in one call, do this instead:

Code:
<xsl:variable name="val1" select="translate($val, ', ', ' ')"/>
<xsl:variable name="val2" select="translate($val1, 'â–ª€“™¦Ã¯Â', '')"/>
<xsl:variable name="val3" select="translate($val2, '§¢®ï�¬ ‚·”', '')"/>
<xsl:variable name="val4" select="translate($val3, '¼ƒœ³†’�˜—‹', '')"/>
<xsl:value-of select="$val4"/>
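A common XSLT 1.0 alternative, sketched here as an assumption rather than a tested fix, is to list the characters you want to keep and delete everything else with a nested translate(). The inner call strips the allowed characters, leaving only the unwanted ones, and the outer call removes exactly those from the original string, so the problem characters never have to be typed into the stylesheet. The fragment is meant to replace the inner xsl:for-each of the original stylesheet. The variable name keep and its contents are illustrative, and because the list is a single-quoted XPath literal it cannot itself contain an apostrophe.

```xml
<!-- Hypothetical keep-list: any character not listed here is deleted -->
<xsl:variable name="keep"
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .-/()'"/>

<xsl:for-each select="value">
  <!-- Comma, tab, LF and CR become spaces (four characters mapped to four spaces) -->
  <xsl:variable name="val1" select="translate(., ',&#9;&#10;&#13;', '    ')"/>
  <!-- Inner translate leaves only the unwanted characters;
       the outer translate deletes them from $val1 -->
  <xsl:variable name="val2" select="translate($val1, translate($val1, $keep, ''), '')"/>
  <xsl:value-of select="$val2"/>
  <!-- add comma delimiter -->
  <xsl:text>,</xsl:text>
</xsl:for-each>
```

This also drops legitimate accented letters such as é, so widen the keep-list if the resumes contain them.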
__________________
/- Sam Judson : Wrox Technical Editor -/

Think before you post: What have you tried?
 
March 18th, 2011, 09:10 AM
IronStar
Authorized User

Join Date: Feb 2005
Posts: 26
Thanks: 0
Thanked 0 Times in 0 Posts

I made that change but the special characters are still in the CSV file.
 
March 20th, 2011, 01:33 PM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

It's possible that the code you've shown has been corrupted in the course of pasting it into this forum. But if not, it looks wrong. When you see things like "âM", alarm bells should ring: this is how non-ASCII characters encoded in UTF-8 appear when they have been wrongly interpreted as ISO 8859-1 characters.

Because this code contains a mixture of characters that appear to be correctly displayed and others that are incorrectly displayed, I suspect a history of editing or cutting-and-pasting without proper attention being paid to encoding differences.
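Decoding the bytes shown in the CSV appears to support this. "âM-^@M-^S" looks like the byte sequence E2 80 93 (â is E2 in ISO 8859-1, and M-^@ and M-^S are cat -v style notation for 80 and 93), which is the UTF-8 encoding of the single character U+2013 EN DASH. "âM-^@¢" is likewise E2 80 A2, the single character U+2022 BULLET. If that reading is right, the data holds correctly encoded dashes and bullets. The translate strings, however, list their byte-by-byte misreadings (â, €, “ and so on) as separate characters, and because translate() works one character at a time, none of them ever matches.

A minimal sketch of the inner loop that avoids the problem by writing the characters as numeric character references, so the result no longer depends on how the stylesheet was saved or pasted. The character list is an assumption, inferred by reading the original translate strings as cp1252 mojibake (for example â€“ for U+2013 and â–ª for U+25AA). Adjust it to whatever your data actually contains.

```xml
<xsl:for-each select="value">
  <!-- Comma, tab, LF and CR become spaces (four characters mapped to four spaces) -->
  <xsl:variable name="val1" select="translate(., ',&#9;&#10;&#13;', '    ')"/>
  <!-- Assumed Word "smart" characters, written as character references:
       U+2013 en dash, U+2014 em dash, U+2018/U+2019 single quotes,
       U+201C/U+201D double quotes, U+2022 bullet, U+2026 ellipsis,
       U+25AA black small square -->
  <xsl:variable name="val2"
      select="translate($val1, '&#x2013;&#x2014;&#x2018;&#x2019;&#x201C;&#x201D;&#x2022;&#x2026;&#x25AA;', '')"/>
  <xsl:value-of select="$val2"/>
  <!-- add comma delimiter -->
  <xsl:text>,</xsl:text>
</xsl:for-each>
```

Each variable gets its own name, in line with the earlier advice about not redeclaring $val1.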
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference





Copyright (c) 2020 John Wiley & Sons, Inc.