Extract 5 words before and after an element

ROCXY · June 17th, 2010, 12:31 PM

Hi All,

I have XML content that needs text content validation. I really have no idea which XSLT function to use; sorry for not including any XSL code I have tried. A small hint on which function to use would be a great help.

Input:
<p>This is a paragraph First Second Third 4 Five <cv>many words and more words</cv> six seven 8 nine ten the paragraph ends</p>

Expected Output:
<p>This is a paragraph
<td>First Second Third 4 Five</td><td>many words and more words</td>
<td>six seven 8 nine ten</td>
the paragraph ends</p>

Any help would be greatly appreciated.

Martin Honnen · June 17th, 2010, 01:04 PM

Please tell us whether you are looking for an XSLT 2.0 or 1.0 solution. With XSLT 2.0 you could process the preceding-sibling and following-sibling text nodes of the cv element, tokenize them and then wrap the tokens you want to wrap.

mhkay · June 17th, 2010, 01:10 PM

If you only want the words, then (assuming XSLT 2.0) something of the form tokenize(following-sibling::text(),'\W')[position() lt 6] or tokenize(preceding-sibling::text(),'\W')[position() gt last()-5] should do the trick. However, this loses the separators between the words. If it's necessary to retain the separators as well as the words, then a more complex solution using xsl:analyze-string is called for.
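A minimal sketch of the tokenize approach, assuming XSLT 2.0 and a p element that contains a single, properly closed cv child. The parameter name n and the template layout are illustrative assumptions, not something from the original question. The sketch splits on '\s+' rather than '\W' so that punctuation stays attached to its word, and it filters out the empty tokens that leading or trailing whitespace would otherwise produce.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Number of words to wrap on each side of cv (hypothetical parameter name). -->
  <xsl:param name="n" as="xs:integer" select="5"/>

  <!-- Assumes the text of interest sits directly inside p, next to cv. -->
  <xsl:template match="p[cv]">
    <!-- Words before cv; the [.] predicate drops zero-length tokens. -->
    <xsl:variable name="before" as="xs:string*"
        select="tokenize(string-join(cv[1]/preceding-sibling::text(), ''), '\s+')[.]"/>
    <!-- Words after cv, filtered the same way. -->
    <xsl:variable name="after" as="xs:string*"
        select="tokenize(string-join(cv[1]/following-sibling::text(), ''), '\s+')[.]"/>
    <p>
      <!-- Everything except the last n words before cv stays unwrapped. -->
      <xsl:value-of select="$before[position() le last() - $n]"/>
      <!-- The last n words before cv. -->
      <td><xsl:value-of select="$before[position() gt last() - $n]"/></td>
      <!-- The content of cv itself. -->
      <td><xsl:value-of select="cv[1]"/></td>
      <!-- The first n words after cv. -->
      <td><xsl:value-of select="$after[position() le $n]"/></td>
      <!-- The remaining words after cv stay unwrapped. -->
      <xsl:value-of select="$after[position() gt $n]"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 2.0, xsl:value-of joins a sequence of strings with a single space by default, so the original separators are normalized rather than preserved. If fewer than n words exist on either side, all of them end up inside the td. Keeping the exact separators would still need xsl:analyze-string, as noted above.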