Separating strings and replacing characters

sunrain · April 6th, 2008, 03:21 PM

Hello,

I have strings (such as "100mW", or "35 s") in the form of digits followed by non-digit characters. I need a way to extract only the digits and also only the non-digit characters, thus in the end displaying "100" and "mW", or "35" and "s". I am using the 'tokenize' function on my initial string and some regular expressions, but while I can extract the digits, I am having problems with the non-digit characters. I get " mW" (no. of white spaces = no. of digits) instead of "mW". What I use to get the characters is "tokenize($initial_string,'[0-9]')".

I tried getting rid of the leading white spaces, but all the functions that I used give me something like "A sequence of more than one item is not allowed as the first argument of ...". Such functions I tried are normalize-space(tokenize(...)) and translate(tokenize(...), ...).

How can I solve this?

Thank you!

Michael

joefawcett · April 7th, 2008, 01:52 AM

It seems to me that you could tokenize on a different pattern, such as \s+, so that you get two parts (the digits and the letters), or use replace() later in the process to remove the spaces. Which is better depends on the XML you are processing and what fits your current process.
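A minimal sketch of the \s+ idea, assuming an XSLT 2.0 processor and an input such as "35 s" where whitespace separates the number from the unit. It would not split "100mW", which contains no whitespace. The template name "main" and the variable $parts are illustrative only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; "main" and $parts are illustrative names. -->
  <xsl:template name="main">
    <!-- Splitting on runs of whitespace turns "35 s" into ("35", "s"). -->
    <xsl:variable name="parts" select="tokenize('35 s', '\s+')"/>
    <!-- First item: the digits. -->
    <xsl:value-of select="$parts[1]"/>
    <xsl:text>|</xsl:text>
    <!-- Second item: the unit. -->
    <xsl:value-of select="$parts[2]"/>
  </xsl:template>
</xsl:stylesheet>
```

Run with "main" as the initial template, this should output 35|s.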

--

Joe (Microsoft MVP - XML)

samjudson · April 7th, 2008, 03:02 AM

The tokenize() function returns a sequence of strings. It splits the first argument on every match of the pattern given as the second argument, so your example above gives the sequence:

'', '', '', 'mW'

I'm not 100% sure why the empty strings then show up as spaces when this is serialised.
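One likely explanation, assuming the sequence is written out with xsl:value-of in an XSLT 2.0 stylesheet: value-of joins the items of a sequence with a single space by default, so three empty strings followed by "mW" come out as three spaces and "mW". The sketch below shows the default next to separator=""; the names "main" and $tokens are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; "main" and $tokens are illustrative names. -->
  <xsl:template name="main">
    <!-- Yields the four-item sequence ('', '', '', 'mW'). -->
    <xsl:variable name="tokens" select="tokenize('100mW', '[0-9]')"/>
    <!-- Default separator is one space, so the empty items become spaces. -->
    <xsl:value-of select="$tokens"/>
    <xsl:text>|</xsl:text>
    <!-- An empty separator joins the items with nothing in between. -->
    <xsl:value-of select="$tokens" separator=""/>
  </xsl:template>
</xsl:stylesheet>
```

The part before the "|" should be three spaces followed by mW, and the part after it just mW.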

A better way might be to use <xsl:analyze-string>

Code:

<xsl:analyze-string select="$initial_string" regex="[0-9]+">
  <xsl:matching-substring>
    <!-- a run of digits, e.g. "100" -->
    <xsl:value-of select="."/>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <!-- everything else, e.g. "mW" -->
    <xsl:value-of select="."/>
  </xsl:non-matching-substring>
</xsl:analyze-string>

/- Sam Judson : Wrox Technical Editor -/

mhkay · April 7th, 2008, 03:39 AM

Firstly, you want to be processing one string at a time, typically by doing the work inside a template rule or a for-each loop. Your error message "A sequence of more than one item is not allowed as the first argument of ..." says that you are trying to process several strings at once, so there is some structural problem in your code. (But you've made the mistake of not showing your code, so we can't tell you what you're doing wrong.)
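Since the original code was not posted, the following is only a guess at the shape of the fix: a sketch, assuming XSLT 2.0, that calls normalize-space() on one item at a time inside xsl:for-each instead of passing it a whole sequence. The input strings and the template name "main" are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; the two input strings are made-up samples. -->
  <xsl:template name="main">
    <xsl:for-each select="(' mW', ' s ')">
      <!-- Inside for-each, "." is a single string, so normalize-space()
           receives exactly one item and the sequence error cannot occur. -->
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should print mW and s on separate lines.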

I think the simplest way of doing this with regular expressions is to use replace() twice: replace($in, '^([0-9]+)([^0-9]*)$', '$1') to get the digits, and replace($in, '^([0-9]+)([^0-9]*)$', '$2') to get the non-digits. (The pattern requires at least one digit because replace() raises an error if the regex can match a zero-length string.) Or, if you used xsl:analyze-string, you would be able to extract both parts using regex-group().
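The regex-group() route could be sketched as follows, assuming XSLT 2.0. The pattern is anchored and requires at least one digit; that is a choice made for this sketch so that the regex cannot match an empty string. The normalize-space() call is an extra step to drop the space in inputs such as "35 s". The names "main" and $in are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; $in stands for the string to split. -->
  <xsl:template name="main">
    <xsl:variable name="in" select="'100mW'"/>
    <xsl:analyze-string select="$in" regex="^([0-9]+)([^0-9]*)$">
      <xsl:matching-substring>
        <!-- Group 1: the leading digits. -->
        <xsl:value-of select="regex-group(1)"/>
        <xsl:text>|</xsl:text>
        <!-- Group 2: the rest, with surrounding whitespace trimmed. -->
        <xsl:value-of select="normalize-space(regex-group(2))"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

For the input "100mW" this should output 100|mW. A string that does not start with a digit produces no output, because the sketch has no xsl:non-matching-substring branch.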

It's not hard to do it using translate():

translate($in, '0123456789', '') gives you the non-digits, and

translate($in, translate($in, '0123456789', ''), '') gives you the digits.
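Putting the two translate() calls together, here is a small sketch using "35 s" as the input. The normalize-space() call is an addition for this example, since the non-digit part of "35 s" still contains the space. The names "main", $in and $non-digits are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical entry point; $in stands for the string to split. -->
  <xsl:template name="main">
    <xsl:variable name="in" select="'35 s'"/>
    <!-- Deleting every digit leaves the non-digit characters: " s". -->
    <xsl:variable name="non-digits"
        select="translate($in, '0123456789', '')"/>
    <!-- Deleting every non-digit character leaves the digits: "35". -->
    <xsl:value-of select="translate($in, $non-digits, '')"/>
    <xsl:text>|</xsl:text>
    <!-- Trim the leftover space from the unit. -->
    <xsl:value-of select="normalize-space($non-digits)"/>
  </xsl:template>
</xsl:stylesheet>
```

This should output 35|s. Both translate() and normalize-space() are also available in XPath 1.0, so the same expressions would work in an XSLT 1.0 stylesheet.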

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference

sunrain · April 7th, 2008, 09:33 AM

Thank you everybody, it solved my problem. I used the translate function as suggested.

Michael