Wrox Programmer Forums
August 16th, 2009, 10:47 PM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
Find line number of text

Hi all,

I was wondering whether it is possible to find the line number of some matching text in an XML document, and then get all the text on that line.

For example, given the XML below, how can I find the line number of the word "investment"? And once I have the line number, how can I obtain all the text on that line? Here I would need the text "investment in shares".

<GenericXMLWrapperElement>
investment in shares
asset
property
</GenericXMLWrapperElement>

Regards
 
August 17th, 2009, 03:37 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

Line numbers aren't retained in the XDM data model. They may be available in an extension (see for example saxon:line-number()) but probably not at the granularity you are after (Saxon retains line numbers only for element start tags).

If lines are significant structural units I would suggest first transforming them into elements:

Code:
<xsl:template match="GenericXMLWrapperElement">
  <xsl:for-each select="tokenize(., '\n')">
     <line><xsl:value-of select="."/></line>
  </xsl:for-each>
</xsl:template>
and then you can use
Code:
line[contains(., 'investment')]
to get the line containing this string.
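
For anyone who wants to see the two pieces together, here is a minimal sketch of a complete XSLT 2.0 stylesheet built on the suggestion above. The `search` parameter, the `n` attribute and the `found` wrapper element are names made up for illustration. The number reported is the position of the line within the element's text (the empty string before the first newline counts as line 1), not a line number in the source file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: the text to look for -->
  <xsl:param name="search" select="'investment'"/>

  <xsl:template match="GenericXMLWrapperElement">
    <!-- Turn each line of text into a <line> element, numbered by position -->
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(., '\n')">
        <line n="{position()}"><xsl:value-of select="."/></line>
      </xsl:for-each>
    </xsl:variable>

    <!-- Copy only the lines that contain the search text -->
    <found>
      <xsl:copy-of select="$lines/line[contains(., $search)]"/>
    </found>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample input, assuming there is no indentation inside the element, this should report "investment in shares" with n="2".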
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference

Last edited by mhkay; August 17th, 2009 at 03:38 AM. Reason: typo
The Following User Says Thank You to mhkay For This Useful Post:
JohnBampton (August 17th, 2009)
 
August 17th, 2009, 05:41 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

Thanks Mike. That works.

But suppose I have the following XML and I want to match on the word "trade" only. Would I use a regular expression and the matches function instead of contains? If so, what would the regex pattern be?

<GenericXMLWrapperElement>
trade shows
trade
tradereceivables
</GenericXMLWrapperElement>

Any help is greatly appreciated.

Regards
 
August 17th, 2009, 05:45 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

Are you saying you would want to match the first two and not the third, because "trade" is not a word on its own? Then yes, you could use regular expressions and matches. The exact regex to use is up to you and depends on your definition of "word", for example whether you regard a hyphen as a word separator. The XPath regex language does not include a concept of word separator because it's difficult to come up with a definition that's not biased to one language, e.g. English.
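
To make the "definition of word" point concrete, here is a rough sketch under one particular assumption: a word is a run of letters and digits, so any other character (whitespace, hyphen, punctuation) counts as a separator. That choice is only an illustration, not a recommendation; under it "trade-in" would count as containing the word "trade", which may or may not be what you want.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="GenericXMLWrapperElement">
    <lines>
      <xsl:for-each select="tokenize(., '\n')">
        <!-- Assumed word definition: "trade" must be preceded and followed
             by the start/end of the line or by a character that is neither
             a letter (\p{L}) nor a digit (\p{N}) -->
        <xsl:if test="matches(., '(^|[^\p{L}\p{N}])trade([^\p{L}\p{N}]|$)')">
          <line><xsl:value-of select="."/></line>
        </xsl:if>
      </xsl:for-each>
    </lines>
  </xsl:template>
</xsl:stylesheet>
```

With the sample above this should keep "trade shows" and "trade" and drop "tradereceivables".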
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
August 17th, 2009, 05:46 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

The code that I am actually working on is as follows:

Code:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <!-- select all the text in the file-->
        <xsl:variable name="allthetext" select="upper-case(.)"></xsl:variable>
        
        <synonyms>
            <xsl:for-each select="document('Synonyms.xml')//us-gaap_ShortTermInvestments">
                <match>
                    <xsl:attribute name="word">
                        <xsl:value-of select="node-name(.)"></xsl:value-of>
                    </xsl:attribute>
                    
                    <xsl:call-template name="countsynonyms">
                        <xsl:with-param name="allthetext" select="$allthetext"></xsl:with-param>
                        <!-- the number of synonyms-->
                        <xsl:with-param name="number" select="count(child::*)"></xsl:with-param>
                        <xsl:with-param name="index" select="number(1)"></xsl:with-param>
                        <xsl:with-param name="sum" select="number(0)"></xsl:with-param>
                    </xsl:call-template>
                </match>
            </xsl:for-each>    
        </synonyms>    
    </xsl:template>
    <xsl:template name="countsynonyms">
        <xsl:param name="allthetext"></xsl:param>
        <xsl:param name="number"></xsl:param>
        <xsl:param name="index"></xsl:param>
        <xsl:param name="sum"></xsl:param>
        <xsl:choose>
            <xsl:when test="$index &lt;= $number">
                <xsl:variable name="currentsynonym" select="upper-case(normalize-space(./synonym[$index]))"></xsl:variable>
                <xsl:variable name="currentsum" select="count(tokenize($allthetext, upper-case(normalize-space(./synonym[$index])))) "></xsl:variable>
                <xsl:variable name="revisedcurrentsum">
                    <xsl:choose>
                        <xsl:when test="$currentsum > 0">
                            <xsl:value-of select="$currentsum - 1"></xsl:value-of>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:value-of select="$currentsum"></xsl:value-of>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:variable>
                <synonym>
                    <xsl:attribute name="count">
                        <xsl:value-of select="$revisedcurrentsum"></xsl:value-of>
                    </xsl:attribute>
                    <xsl:attribute name="word">
                        <xsl:value-of select="normalize-space(./synonym[$index])"></xsl:value-of>
                    </xsl:attribute>
                    <xsl:for-each select="tokenize($allthetext, '\n')">
                        <xsl:if test="contains(.,$currentsynonym)">
                            <line><xsl:value-of select="."/></line>
                        </xsl:if>
                    </xsl:for-each>
                </synonym>
                <xsl:call-template name="countsynonyms">
                    <xsl:with-param name="allthetext" select="$allthetext"></xsl:with-param>
                    <xsl:with-param name="number" select="$number"></xsl:with-param>
                    <xsl:with-param name="index" select="$index + 1"></xsl:with-param>
                    <xsl:with-param name="sum" select="$sum + $revisedcurrentsum"></xsl:with-param>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <total>
                    <xsl:value-of select="$sum"></xsl:value-of>
                </total>    
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    
</xsl:stylesheet>
and the xml file is:

Code:
<us-gaap_StatementOfFinancialPosition>
    <us-gaap_StatementTable>
        <us-gaap_StatementLineItems>
            <us-gaap_Assets>
                <us-gaap_AssetsCurrent>
                    <us-gaap_CashCashEquivalentsAndShortTermInvestments>
                        <us-gaap_ShortTermInvestments>
                            <synonym> short term investments </synonym>
                            <synonym> investments </synonym>
                            <synonym> short term cash investments </synonym>
                            <synonym> short term investment at cost </synonym>
                            <synonym> short term loans </synonym>
                            <synonym> short term investments total </synonym>
                        </us-gaap_ShortTermInvestments>
                    </us-gaap_CashCashEquivalentsAndShortTermInvestments>
                    <us-gaap_ReceivablesNetCurrent>
                        <us-gaap_AccountsNotesAndLoansReceivableNetCurrent>
                            <us-gaap_AccountsReceivableNetCurrent>
                                <synonym> accounts receivable net of allowance for doubtful accounts </synonym>
                                <synonym> accounts receivable </synonym>
                                <synonym> accounts receivable less allowance </synonym>
                                <synonym> accounts receivable less allowance for doubtful accounts </synonym>
                                <synonym> accounts receivable less allowances </synonym>
                                <synonym> accounts receivable net </synonym>
                                <synonym> accounts receivable net allowances </synonym>
                                <synonym> accounts receivable net of allowance for doubtful accounts and sales returns </synonym>
                                <synonym> accounts receivable net of allowance for doubtful receivables </synonym>
                                <synonym> accounts receivable net of allowance for uncollectible accounts </synonym>
                                <synonym> accounts receivable trade less allowance for doubtful accounts </synonym>
                                <synonym> accounts receivable trade less allowances </synonym>
                                <synonym>accounts receivable trade net</synonym>
                                <synonym>Accounts Receivable Net Current</synonym>
                                <synonym>Accounts Receivable Net Current Total</synonym>
                                <synonym>current portion of accounts receivable net of contractual allowances</synonym>
                                <synonym>customer receivables net allowance doubtful accounts</synonym>
                                <synonym>trade</synonym>
                                <synonym>trade accounts receivable</synonym>
                                <synonym>trade accounts receivable less allowance reserves</synonym>
                                <synonym>trade accounts receivable less allowances</synonym>
                                <synonym>trade accounts receivable net</synonym>
                                <synonym>trade and installment accounts receivable net</synonym>
                                <synonym>trade and installment accounts receivable trade net</synonym>
                                <synonym>trade less allowance</synonym>
                                <synonym>trade net of allowance</synonym>
                                <synonym>trade net of allowance of approximately</synonym>
                                <synonym>trade receivables</synonym>
                                <synonym>trade receivables net</synonym>
                                <synonym>trade receivables net of allowance for doubtful accounts</synonym>
                                <synonym>trade net</synonym>
                            </us-gaap_AccountsReceivableNetCurrent>
                        </us-gaap_AccountsNotesAndLoansReceivableNetCurrent>
                    </us-gaap_ReceivablesNetCurrent>
                    <us-gaap_InventoryNet>
                        <synonym>inventories and supplies</synonym>
                        <synonym>inventories and supplies net</synonym>
                        <synonym>inventories at cost not in excess of market</synonym>
                        <synonym>inventories at lifo cost</synonym>
                        <synonym>inventories at lower of cost or market</synonym>
                        <synonym>inventories less allowance for obsolescence</synonym>
                        <synonym>inventories net</synonym>
                        <synonym>inventories</synonym>
                        <synonym>inventories and other</synonym>
                        <synonym>inventories net of advances and progress billings</synonym>
                        <synonym>inventory</synonym>
                        <synonym>Inventory Net</synonym>
                        <synonym>Inventory Net Total</synonym>
                        <synonym>merchandise inventories</synonym>
                        <synonym>merchandise inventory including lifo reserves</synonym>
                        <synonym>Total inventories</synonym>
                    </us-gaap_InventoryNet>
                </us-gaap_AssetsCurrent>
            </us-gaap_Assets>
        </us-gaap_StatementLineItems>
    </us-gaap_StatementTable>
</us-gaap_StatementOfFinancialPosition>
There is also another XML file, containing all the text, which is the one I am transforming.

Maybe this puts things in perspective.
 
August 17th, 2009, 08:08 AM
Friend of Wrox
 
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts

Quote:
Originally Posted by JohnBampton View Post

But suppose I have the following XML and I want to match on the word "trade" only. Would I use a regular expression and the matches function instead of contains? If so, what would the regex pattern be?

<GenericXMLWrapperElement>
trade shows
trade
tradereceivables
</GenericXMLWrapperElement>
I think the regular expression language used with XSLT/XPath does not know a word boundary so all you can use is (^|\s) to match the start of the string or whitespace and (\s|$) to match whitespace or the end of the string:
Code:
  <xsl:template match="GenericXMLWrapperElement">
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(., '\n')">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </xsl:variable>
    
    <xsl:for-each select="$lines/line">
      <xsl:if test="matches(., '(^|\s)trade(\s|$)')">
        <matched line="{position()}"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
that way you would get
Code:
<matched line="2"/>
<matched line="3"/>
for the sample you posted.
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog

Last edited by Martin Honnen; August 17th, 2009 at 08:13 AM. Reason: correcting problem with code sample
The Following User Says Thank You to Martin Honnen For This Useful Post:
JohnBampton (August 17th, 2009)
 
August 17th, 2009, 09:03 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

Hi,

Say I now wanted the word "trade" to come from an XSLT variable. How would I put that into the regular expression passed to the matches function?

Regards
 
August 17th, 2009, 09:09 AM
Friend of Wrox
 
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts

The regular expression is built from an XPath string so you can concat(enate) what you need e.g.
Code:
  <xsl:param name="w" select="'trade'"/>
  
  <xsl:template match="GenericXMLWrapperElement">
    <xsl:variable name="lines">
      <xsl:for-each select="tokenize(., '\n')">
        <line><xsl:value-of select="."/></line>
      </xsl:for-each>
    </xsl:variable>
    
    <xsl:for-each select="$lines/line">
      <xsl:if test="matches(., concat('(^|\s)', $w, '(\s|$)'))">
        <matched line="{position()}"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
The only problem is characters that have a special meaning in regular expression patterns; you would need to escape them before concatenating. http://www.xsltfunctions.com/xsl/fun...for-regex.html can help with that.
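
As a rough sketch of what that escaping could look like inline, the stylesheet below defines a small helper and applies it before building the pattern. The function name `my:escape-for-regex` and the namespace `urn:example:local` are placeholders invented for this example, and the `result` wrapper element is also an addition; the helper simply puts a backslash in front of each character that the XPath regex syntax treats as special.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:local"
    exclude-result-prefixes="xs my">
  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="w" select="'trade'"/>

  <!-- Hypothetical helper: prefix every regex metacharacter with a backslash -->
  <xsl:function name="my:escape-for-regex" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace($s, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')"/>
  </xsl:function>

  <xsl:template match="GenericXMLWrapperElement">
    <!-- Build the pattern once, with the search word escaped -->
    <xsl:variable name="pattern"
        select="concat('(^|\s)', my:escape-for-regex($w), '(\s|$)')"/>
    <result>
      <xsl:for-each select="tokenize(., '\n')">
        <xsl:if test="matches(., $pattern)">
          <matched line="{position()}"/>
        </xsl:if>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

With a plain word such as 'trade' the escaping changes nothing; it only matters for values like 'U.S.' or 'cost (net)', where the dot and parentheses would otherwise be read as regex syntax.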
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog
The Following User Says Thank You to Martin Honnen For This Useful Post:
JohnBampton (August 17th, 2009)
 
August 17th, 2009, 09:20 AM
Friend of Wrox
 
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts

I just used the concat function.

Thanks anyway.





Copyright (c) 2020 John Wiley & Sons, Inc.