Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
July 7th, 2008, 11:45 AM
Authorized User
separate a string with whitespace

Hi everyone,
I have spent days on this problem. I want to split a string like "lessThan" into "less than". The string contains one or more upper-case letters; what I need to do is add a space before every upper-case letter and convert that letter to lower case. The latter part is easy. Can anybody help me? Thanks.

 
July 7th, 2008, 11:49 AM
samjudson
Friend of Wrox

In XSLT 2.0 you can use the <xsl:analyze-string> instruction. Otherwise you will likely have to write a recursive template using the substring() function.

/- Sam Judson : Wrox Technical Editor -/
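
For the XSLT 1.0 route, a recursive template along the following lines is one possibility. It is only a minimal sketch: the template name split-camel, the text parameter and the upper/lower variables are made up for illustration, the input string is hard-coded, and only the ASCII letters A-Z are treated as upper case.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper variables: ASCII alphabet only -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <!-- Runs against any source document; the sample string is an assumption -->
  <xsl:template match="/">
    <xsl:call-template name="split-camel">
      <xsl:with-param name="text" select="'lessThan'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emits the first character (with a leading space and lower-cased
       if it is upper case), then recurses on the rest of the string -->
  <xsl:template name="split-camel">
    <xsl:param name="text"/>
    <xsl:if test="$text != ''">
      <xsl:variable name="head" select="substring($text, 1, 1)"/>
      <xsl:choose>
        <xsl:when test="contains($upper, $head)">
          <xsl:value-of select="concat(' ', translate($head, $upper, $lower))"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$head"/>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:call-template name="split-camel">
        <xsl:with-param name="text" select="substring($text, 2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

For 'lessThan' this writes "less than". It consumes one character per call, so the recursion depth grows with the length of the string; that should be fine for short identifiers, but could be a problem for long input on a processor that does not optimise tail calls. A string that starts with a capital letter gets a leading space.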
 
July 7th, 2008, 11:58 AM
mhkay
Wrox Author

In XSLT 2.0

<xsl:analyze-string select="$in" regex="\p{Lu}">
  <xsl:matching-substring>
    <xsl:value-of select="concat(' ', lower-case(.))"/>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>

In XSLT 1.0 it's rather more difficult...

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
 
July 7th, 2008, 11:59 AM
mhkay
Wrox Author

Correction: it should be regex="\p{{Lu}}". The regex attribute is an attribute value template, so the curlies need to be doubled.

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
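
Putting the correction together with the earlier snippet, a complete stylesheet might look like the sketch below. The in parameter with its default value 'lessThan' and the template name main are assumptions added so that it can be run on its own; they are not part of the original posts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical stylesheet parameter; the default is the OP's sample string -->
  <xsl:param name="in" select="'lessThan'"/>

  <!-- Can be started at the named template or applied to any source document -->
  <xsl:template name="main" match="/">
    <!-- regex is an attribute value template, hence the doubled curly braces -->
    <xsl:analyze-string select="$in" regex="\p{{Lu}}">
      <xsl:matching-substring>
        <!-- Each upper-case letter becomes a space plus its lower-case form -->
        <xsl:value-of select="concat(' ', lower-case(.))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Everything else is copied unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter value the output is "less than". Because only the matched characters go through lower-case(), everything else in the string is left exactly as it was.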
 
July 7th, 2008, 01:42 PM
Authorized User

another way of doing it in XSLT 2.0 would be:
Code:
<xsl:value-of select="lower-case(replace(val,'([A-Z])',' $1'))"/>
 
July 7th, 2008, 03:50 PM
Authorized User

I tried to implement it in XSLT 1.0 using recursion.
Here is what I got:

INPUT XML:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
    <val>lessThan</val>
    <val>hellOWorld</val>
</root>
--------------------------------------------------------------------------------------------
XSL:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE stylesheet [
    <!ENTITY UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ">
    <!ENTITY LOWER "abcdefghijklmnopqrstuvwxyz">
    <!ENTITY UPPER_TO_LOWER " '&UPPER;' , '&LOWER;' ">
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/root">
        <root>
            <xsl:apply-templates select="val"/>
        </root>
    </xsl:template>
    <xsl:template match="val">
        <old_val>
            <xsl:value-of select="."/>
        </old_val>
        <new_val>
            <xsl:call-template name="split_replace">
                <xsl:with-param name="input" select="."/>
            </xsl:call-template>
        </new_val>
    </xsl:template>
    <xsl:template name="split_replace">
        <xsl:param name="input"/>
        <xsl:choose>
            <xsl:when test="string-length($input)-string-length(translate($input,'&UPPER;',''))>0 and string-length($input)>1">
                <xsl:call-template name="split_replace">
                    <xsl:with-param name="input" select="substring($input,1,floor(string-length($input) div 2))"/>
                </xsl:call-template>
                <xsl:call-template name="split_replace">
                    <xsl:with-param name="input" select="substring($input,floor(string-length($input) div 2)+1)"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="string-length($input)-string-length(translate($input,'&UPPER;',''))=1 and string-length($input)=1">
                <xsl:value-of select="concat(' ', translate($input,&UPPER_TO_LOWER;))"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$input"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
---------------------------------------------------------------------------------------
resulting XML:
Code:
<?xml version="1.0" encoding="UTF-8" ?> 
<root>
  <old_val>lessThan</old_val> 
  <new_val>less than</new_val> 
  <old_val>hellOWorld</old_val> 
  <new_val>hell o world</new_val> 
</root>
It would be nice to receive some criticism on how it can be improved.

 
July 7th, 2008, 04:10 PM
mhkay
Wrox Author

>another way of doing it in XSLT 2.0

The replace() solution works if you make the assumption that a character that doesn't match [A-Z] will be unchanged by the lower-case() function. That clearly works only for ASCII. Even if you change it to \p{Lu}, there are characters (such as title-case Fi) that aren't upper-case, but are modified by the lower-case() function.

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
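
To make the ASCII limitation concrete, here is a small sketch that runs the replace() approach twice on the same string, once with [A-Z] and once with \p{Lu}. The in parameter, its sample value and the template name main are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical sample: &#xC4; is A-umlaut, upper case but outside [A-Z] -->
  <xsl:param name="in" select="'letzte&#xC4;nderung'"/>

  <xsl:template name="main" match="/">
    <!-- ASCII-only class: the A-umlaut is not matched, so no space is
         inserted, yet lower-case() still changes it -->
    <xsl:value-of select="lower-case(replace($in, '([A-Z])', ' $1'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Unicode category Lu: single braces here, because select holds an
         XPath expression and is not an attribute value template -->
    <xsl:value-of select="lower-case(replace($in, '(\p{Lu})', ' $1'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line of output is "letzteänderung" and the second is "letzte änderung". A title-case character such as U+01C5 would still slip past \p{Lu} while being changed by lower-case(), which is the remaining gap described above.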
 
July 7th, 2008, 10:53 PM
Authorized User

>That clearly works only for ASCII.
Sure, I didn't mention it explicitly because it was quite obvious.
Judging by the example the OP provided, the English alphabet is probably enough for him.

>there are characters (such as title-case Fi) that aren't upper-case, but are modified by the lower-case() function.
You are talking about digraphs here, right?
But why shouldn't they be modified by the lower-case() function?
Of course digraphs are something special, but if one appears in upper case inside a word it should still be converted to lower case, shouldn't it?

PS
I re-read the OP's initial post,
and I think he wants the described changes made only when a capital letter is 'within' a string, not when it is the first letter of a word.
So, for example, 'LessThan a Mile' should become 'Less than a Mile'.
In that case neither of our solutions works properly.
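
One way to get that behaviour in XSLT 2.0 is to match only an upper-case letter that directly follows a lower-case letter. The sketch below assumes that reading of the requirement; the in parameter, its default value and the template name main are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter holding the string to convert -->
  <xsl:param name="in" select="'LessThan a Mile'"/>

  <xsl:template name="main" match="/">
    <!-- Group 1: a lower-case letter; group 2: the upper-case letter after it.
         Curly braces are doubled because regex is an attribute value template -->
    <xsl:analyze-string select="$in" regex="(\p{{Ll}})(\p{{Lu}})">
      <xsl:matching-substring>
        <!-- Keep the lower-case letter, add a space, lower-case the capital -->
        <xsl:value-of select="concat(regex-group(1), ' ', lower-case(regex-group(2)))"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Capitals at the start of a word are never matched, so they survive -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

For 'LessThan a Mile' this gives "Less than a Mile". Runs of capitals are not handled: 'hellOWorld' comes out as "hell oWorld", because the W follows an upper-case letter and not a lower-case one.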



 
July 8th, 2008, 01:56 AM
joefawcett
Wrox Author

A notoriously difficult problem: how do you handle the following?

TheBBCDeniedPoliticalBias

TheUNVotedForAVeto

In the general case the problem is unsolvable, because you are trying to add information, or decrease entropy...



--

Joe (Microsoft MVP - XML)
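
For what it's worth, a common heuristic copes with the two examples above by treating a run of capitals as an acronym and splitting before its last letter when a lower-case letter follows. The sketch below is an XSLT 2.0 illustration of that heuristic only, not a general solution; the in parameter, the step1 variable and the template name main are assumptions, and the capitals are deliberately left as they are.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter; the default is the first example above -->
  <xsl:param name="in" select="'TheBBCDeniedPoliticalBias'"/>

  <xsl:template name="main" match="/">
    <!-- Step 1: break between a lower-case letter and a following capital -->
    <xsl:variable name="step1"
                  select="replace($in, '(\p{Ll})(\p{Lu})', '$1 $2')"/>
    <!-- Step 2: inside a run of capitals, break before the last capital
         when it is followed by a lower-case letter (start of a new word) -->
    <xsl:value-of select="replace($step1, '(\p{Lu})(\p{Lu}\p{Ll})', '$1 $2')"/>
  </xsl:template>
</xsl:stylesheet>
```

This turns 'TheBBCDeniedPoliticalBias' into "The BBC Denied Political Bias" and 'TheUNVotedForAVeto' into "The UN Voted For A Veto". It still cannot tell a one-letter word from the first letter of an acronym, so 'ABBCReport' becomes "ABBC Report", which is exactly the missing information the post describes.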





Copyright (c) 2020 John Wiley & Sons, Inc.