July 14th, 2007, 07:13 PM
igraham
Grouping plain text into paragraphs

I'm trying to process plain text to turn it into XML/DITA <p> and <pre> elements. The idea is that consecutive lines of text with indents of exactly n spaces should be grouped into a <p> element, whereas lines with either fewer or more spaces before non-whitespace content should be grouped into <pre> elements.

I've come up with the following template that does the job for a specific indent, in this case 15 spaces, but I haven't figured out how to support an indent defined by my indent parameter. Basically what I want is to dynamically create my regular expression with the correct indent value inserted where I currently have the value 15 hard-coded:
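   <xsl:template name="convertFixedIndentToParagraphs">
     <xsl:param name="text"/>
     <xsl:param name="indent"/>
     <xsl:analyze-string select="$text" regex="(^ {{15}}[^ ][^\n]*\n?)+" flags="m">
       <xsl:matching-substring>
         <p><xsl:for-each select="tokenize(., '\n')">
           <xsl:value-of select="substring(., $indent + 1)"/>
         </xsl:for-each></p>
       </xsl:matching-substring>
       <xsl:non-matching-substring>
         <xsl:if test="matches(., '\S')">
           <pre><xsl:call-template name="eliminateMinimumIndent"/></pre>
         </xsl:if>
       </xsl:non-matching-substring>
     </xsl:analyze-string>
   </xsl:template>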
I really thought I had this working well enough with the hard-coded indent value, until I discovered that many of the text nodes I'm processing have slightly different standard indents, so I need to be able to use the indent parameter properly.

Is there an easy way to parameterize that value in my regex? Or am I going to have to come up with a completely different solution?
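
One direction that might work, sketched here under the assumption of an XSLT 2.0 processor: the regex attribute of xsl:analyze-string is an attribute value template (which is why the braces around 15 have to be doubled), so the pattern can be built as a string in a variable and passed in as regex="{$regex}". The template name convertIndentToParagraphs, the variable name regex, the main driver template, and the sample text are all made up for illustration, and the <pre> branch is simplified so that it does not depend on eliminateMinimumIndent.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical entry point, only here so the sketch can be run on its own. -->
  <xsl:template name="main">
    <result>
      <xsl:call-template name="convertIndentToParagraphs">
        <xsl:with-param name="text"
                        select="'   first line&#10;   second line&#10;       deeper line&#10;'"/>
        <xsl:with-param name="indent" select="3"/>
      </xsl:call-template>
    </result>
  </xsl:template>

  <xsl:template name="convertIndentToParagraphs">
    <xsl:param name="text"/>
    <xsl:param name="indent"/>

    <!-- Build the pattern as an ordinary string. Inside a select expression
         braces are not special, so single braces are used here. -->
    <xsl:variable name="regex"
                  select="concat('(^ {', $indent, '}[^ ][^\n]*\n?)+')"/>

    <!-- regex is an attribute value template, so {$regex} is evaluated at run time. -->
    <xsl:analyze-string select="$text" regex="{$regex}" flags="m">
      <xsl:matching-substring>
        <p>
          <xsl:for-each select="tokenize(., '\n')">
            <xsl:value-of select="substring(., $indent + 1)"/>
          </xsl:for-each>
        </p>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:if test="matches(., '\S')">
          <!-- Simplified: the original calls eliminateMinimumIndent here. -->
          <pre><xsl:value-of select="."/></pre>
        </xsl:if>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The same idea can be written inline as regex="(^ {{{$indent}}}[^ ][^\n]*\n?)+", where the doubled braces stay literal and the inner {$indent} is evaluated, but the separate variable is probably easier to read and debug.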