Using the translate function on Word document data
I have an XML document whose nodes contain resume text that was stored in a database as a Word document. The XML document must be converted to a CSV file, and Word's special characters must be replaced with zero-length strings. I used the following XSL to convert the file (the translate function contains the actual &#nnnn values I found in the CSV file using a hex editor, but they are converted to their display values here):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:for-each select="dataset/data/row">
<xsl:for-each select="value">
<xsl:variable name="val" select="."/>
<xsl:variable name="val1" select="translate($val, ',&#13;&#10;', '   ')"/>
<xsl:variable name="val1" select="translate($val1, 'ââªâ¬ââ¢Â¦Ã¯Ã', '')"/>
<xsl:variable name="val1" select="translate($val1, '§¢®ï�¬ â·â', '')"/>
<xsl:variable name="val1" select="translate($val1, '¼ÆÅ³â â�Ëââ¹', '')"/>
<xsl:value-of select="$val1"/>
<!-- add comma delimiter -->
<xsl:text>,</xsl:text>
</xsl:for-each>
<!-- Add CR at end of line -->
<xsl:text> </xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Here's a snippet of the XML:
<row>
<value>38489</value>
<value>03/2003</value>
<value>500 Oracle Parkway Redwood Shores, CA 94065</value>
<value></value>
<value>present</value>
<value>00,000</value>
<value>Oracle Corp.</value>
<value>nnn.xxx.yyyy</value>
<value>Full Time</value>
<value>Currently Employed</value>
<value>Oracle Corp. âM-^@M-^S Senior Supply Chain Analyst E-Business Suite April 2008 âM-^@M-^S Present
âM-^@¢ Provide consulting services to English, Dutch, French, and German speaking customers using Oracle EBS Advanced Supply Chain Planning (Memory Based Planner / User Interface)
âM-^@¢ Conduct QA testing for Readiness 12.1.3 specializing in JDE/VCP integration and Manufacturing Operations Center
Oracle Corp. (formerly J.D. Edwards) âM-^@M-^S Senior Supply Chain Analyst March 2003 âM-^@M-^S Arpil 2008
âM-^@¢ Provide consul</value>
<value></value>
<value></value>
<value></value>
<value></value>
This is the resulting CSV:
38489,03/2003,500 Oracle Parkway Redwood Shores CA 94065,,present,00 000,Oracle Corp.,nnn.xxx.yyyy,Full Time,Currently Employed,Oracle Corp. âM-^@M-^S Senior Supply Chain Analyst E-Business Suite
April 2008 âM-^@M-^S PresentâM-^@¢ Provide consulting services to English Dutch French and German speaking customers using Oracle EBS Advanced Supply Chain Planning (Memory Based Planne
r / User Interface) âM-^@¢ Conduct QA testing for Readiness 12.1.3 specializing in JDE/VCP integration and Manufacturing Operations CenterOracle Corp. (formerly J.D. Edwards) âM-^@M-^S Senior Supply
Chain Analyst March 2003 âM-^@M-^S Arpil 2008âM-^@¢ Provide consul,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,Later,,,,,,,,,,,
Embedded commas and CR/LF characters are converted to a single space as expected, but none of the special characters are removed. Does anyone know why these characters are not translated?
Thanks in advance for your help.
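A possible explanation, offered as an assumption rather than a confirmed diagnosis: `âM-^@M-^S` and `âM-^@¢` in the dump above look like the UTF-8 byte sequences E2 80 93 and E2 80 A2 printed one byte at a time. If the XML is parsed as UTF-8, each sequence reaches the stylesheet as a single character, U+2013 (en dash) and U+2022 (bullet). translate() compares whole characters, so a search string built from the individual byte values would never match. The sketch below lists the characters by code point instead. The `wordChars` and `flat` names, the contents of `wordChars`, and the `&#10;` row terminator are illustrative assumptions, not taken from the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumed list of Word punctuation to strip, written as code points:
       en dash, em dash, curly single and double quotes, bullet, ellipsis.
       Extend it with whatever code points the real data contains. -->
  <xsl:variable name="wordChars"
      select="'&#x2013;&#x2014;&#x2018;&#x2019;&#x201C;&#x201D;&#x2022;&#x2026;'"/>

  <xsl:template match="/">
    <xsl:for-each select="dataset/data/row">
      <xsl:for-each select="value">
        <!-- Step 1: comma, LF and CR each become one space -->
        <xsl:variable name="flat" select="translate(., ',&#10;&#13;', '   ')"/>
        <!-- Step 2: an empty third argument deletes every matched character -->
        <xsl:value-of select="translate($flat, $wordChars, '')"/>
        <!-- comma delimiter after every value, as in the original -->
        <xsl:text>,</xsl:text>
      </xsl:for-each>
      <!-- assumed row terminator: a line feed -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each step uses its own variable name because XSLT 1.0 treats a second `xsl:variable` with the same name inside one template as an error, even though some processors tolerate it. If the characters still survive, checking the encoding declared in the source XML against the bytes actually in the file would be the next thing to rule out.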