XSLT: General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

August 16th, 2009, 10:47 PM
Friend of Wrox
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
Find line number of text
Hi all,
I was wondering whether it is possible to find the line number of some matching text in an XML document, and then get all the text on that line.
For example, given the XML below, how can I find the line number of the word "investment"? And once I have the line number, how can I obtain all the text on that line? In this case I would need the text "investment in shares".
<GenericXMLWrapperElement>
investment in shares
asset
property
</GenericXMLWrapperElement>
Regards

August 17th, 2009, 03:37 AM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Line numbers aren't retained in the XDM data model. They may be available in an extension (see for example saxon:line-number()) but probably not at the granularity you are after (Saxon retains line numbers only for element start tags).
If lines are significant structural units I would suggest first transforming them into elements:
Code:
<xsl:template match="GenericXMLWrapperElement">
  <xsl:for-each select="tokenize(., '\n')">
    <line><xsl:value-of select="."/></line>
  </xsl:for-each>
</xsl:template>
and then you can use
Code:
line[contains(., 'investment')]
to get the line containing this string.
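To also report the line number, one option is to use position() while iterating over the tokenized lines. The following is only a sketch, assuming XSLT 2.0; the search parameter and the result and line element names are made up for illustration. Note that the numbering is relative to the element's text content, so with the sample as posted (a newline right after the start tag) the text before that first newline counts as line 1 and "investment in shares" is reported as line 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: the text to look for -->
  <xsl:param name="search" select="'investment'"/>

  <xsl:template match="GenericXMLWrapperElement">
    <!-- Split the element's string value into one string per line -->
    <xsl:variable name="lines" select="tokenize(., '\n')"/>
    <result>
      <xsl:for-each select="$lines">
        <xsl:if test="contains(., $search)">
          <!-- position() is the 1-based index of this string in $lines -->
          <line number="{position()}">
            <xsl:value-of select="normalize-space(.)"/>
          </line>
        </xsl:if>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The normalize-space() call only trims the reported text; drop it if leading or trailing whitespace on the line matters.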
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
Last edited by mhkay; August 17th, 2009 at 03:38 AM..
Reason: typo

August 17th, 2009, 05:41 AM
Friend of Wrox
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
Thanks Mike. That works.
But say, for example, I have the following XML and I want to match on the word "trade" only. Would I use a regular expression and the matches function instead of contains? And if so, what would the regex pattern be?
<GenericXMLWrapperElement>
trade shows
trade
tradereceivables
</GenericXMLWrapperElement>
Any help is greatly appreciated.
Regards

August 17th, 2009, 05:45 AM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Are you saying you would want to match the first two and not the third, because "trade" is not a word on its own? Then yes, you could use regular expressions and matches. The exact regex to use is up to you and depends on your definition of "word", for example whether you regard a hyphen as a word separator. The XPath regex language does not include a concept of word separator because it's difficult to come up with a definition that's not biased to one language, e.g. English.
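To make the point about the definition of "word" concrete, here is a small sketch (assuming XSLT 2.0) that compares two possible definitions: one where only whitespace separates words, and one where any character matched by \W (punctuation, separators and control characters in the XPath regex dialect) does. The sample 'trade-in' and the element and attribute names are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical test strings; 'trade-in' is not from the original question -->
  <xsl:variable name="samples"
      select="('trade shows', 'trade', 'tradereceivables', 'trade-in')"/>

  <xsl:template match="/">
    <results>
      <xsl:for-each select="$samples">
        <!-- whitespace-delimited: a hyphen does NOT end the word -->
        <!-- non-word-delimited:   a hyphen DOES end the word -->
        <sample text="{.}"
                whitespace-delimited="{matches(., '(^|\s)trade(\s|$)')}"
                non-word-delimited="{matches(., '(^|\W)trade(\W|$)')}"/>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

Under these assumptions both patterns accept 'trade shows' and 'trade' and reject 'tradereceivables'; they differ only on 'trade-in', which the \W version accepts and the \s version rejects.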
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference

August 17th, 2009, 05:46 AM
Friend of Wrox
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
The code that I am actually working on is as follows:
Code:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<!-- select all the text in the file-->
<xsl:variable name="allthetext" select="upper-case(.)"></xsl:variable>
<synonyms>
<xsl:for-each select="document('Synonyms.xml')//us-gaap_ShortTermInvestments">
<match>
<xsl:attribute name="word">
<xsl:value-of select="node-name(.)"></xsl:value-of>
</xsl:attribute>
<xsl:call-template name="countsynonyms">
<xsl:with-param name="allthetext" select="$allthetext"></xsl:with-param>
<!-- the number of synonyms-->
<xsl:with-param name="number" select="count(child::*)"></xsl:with-param>
<xsl:with-param name="index" select="number(1)"></xsl:with-param>
<xsl:with-param name="sum" select="number(0)"></xsl:with-param>
</xsl:call-template>
</match>
</xsl:for-each>
</synonyms>
</xsl:template>
<xsl:template name="countsynonyms">
<xsl:param name="allthetext"></xsl:param>
<xsl:param name="number"></xsl:param>
<xsl:param name="index"></xsl:param>
<xsl:param name="sum"></xsl:param>
<xsl:choose>
<xsl:when test="$index &lt;= $number">
<xsl:variable name="currentsynonym" select="upper-case(normalize-space(./synonym[$index]))"></xsl:variable>
<xsl:variable name="currentsum" select="count(tokenize($allthetext, upper-case(normalize-space(./synonym[$index])))) "></xsl:variable>
<xsl:variable name="revisedcurrentsum">
<xsl:choose>
<xsl:when test="$currentsum > 0">
<xsl:value-of select="$currentsum - 1"></xsl:value-of>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$currentsum"></xsl:value-of>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<synonym>
<xsl:attribute name="count">
<xsl:value-of select="$revisedcurrentsum"></xsl:value-of>
</xsl:attribute>
<xsl:attribute name="word">
<xsl:value-of select="normalize-space(./synonym[$index])"></xsl:value-of>
</xsl:attribute>
<xsl:for-each select="tokenize($allthetext, '\n')">
<xsl:if test="contains(.,$currentsynonym)">
<line><xsl:value-of select="."/></line>
</xsl:if>
</xsl:for-each>
</synonym>
<xsl:call-template name="countsynonyms">
<xsl:with-param name="allthetext" select="$allthetext"></xsl:with-param>
<xsl:with-param name="number" select="$number"></xsl:with-param>
<xsl:with-param name="index" select="$index + 1"></xsl:with-param>
<xsl:with-param name="sum" select="$sum + $revisedcurrentsum"></xsl:with-param>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<total>
<xsl:value-of select="$sum"></xsl:value-of>
</total>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
and the xml file is:
Code:
<us-gaap_StatementOfFinancialPosition>
<us-gaap_StatementTable>
<us-gaap_StatementLineItems>
<us-gaap_Assets>
<us-gaap_AssetsCurrent>
<us-gaap_CashCashEquivalentsAndShortTermInvestments>
<us-gaap_ShortTermInvestments>
<synonym> short term investments </synonym>
<synonym> investments </synonym>
<synonym> short term cash investments </synonym>
<synonym> short term investment at cost </synonym>
<synonym> short term loans </synonym>
<synonym> short term investments total </synonym>
</us-gaap_ShortTermInvestments>
</us-gaap_CashCashEquivalentsAndShortTermInvestments>
<us-gaap_ReceivablesNetCurrent>
<us-gaap_AccountsNotesAndLoansReceivableNetCurrent>
<us-gaap_AccountsReceivableNetCurrent>
<synonym> accounts receivable net of allowance for doubtful accounts </synonym>
<synonym> accounts receivable </synonym>
<synonym> accounts receivable less allowance </synonym>
<synonym> accounts receivable less allowance for doubtful accounts </synonym>
<synonym> accounts receivable less allowances </synonym>
<synonym> accounts receivable net </synonym>
<synonym> accounts receivable net allowances </synonym>
<synonym> accounts receivable net of allowance for doubtful accounts and sales returns </synonym>
<synonym> accounts receivable net of allowance for doubtful receivables </synonym>
<synonym> accounts receivable net of allowance for uncollectible accounts </synonym>
<synonym> accounts receivable trade less allowance for doubtful accounts </synonym>
<synonym> accounts receivable trade less allowances </synonym>
<synonym>accounts receivable trade net</synonym>
<synonym>Accounts Receivable Net Current</synonym>
<synonym>Accounts Receivable Net Current Total</synonym>
<synonym>current portion of accounts receivable net of contractual allowances</synonym>
<synonym>customer receivables net allowance doubtful accounts</synonym>
<synonym>trade</synonym>
<synonym>trade accounts receivable</synonym>
<synonym>trade accounts receivable less allowance reserves</synonym>
<synonym>trade accounts receivable less allowances</synonym>
<synonym>trade accounts receivable net</synonym>
<synonym>trade and installment accounts receivable net</synonym>
<synonym>trade and installment accounts receivable trade net</synonym>
<synonym>trade less allowance</synonym>
<synonym>trade net of allowance</synonym>
<synonym>trade net of allowance of approximately</synonym>
<synonym>trade receivables</synonym>
<synonym>trade receivables net</synonym>
<synonym>trade receivables net of allowance for doubtful accounts</synonym>
<synonym>trade net</synonym>
</us-gaap_AccountsReceivableNetCurrent>
</us-gaap_AccountsNotesAndLoansReceivableNetCurrent>
</us-gaap_ReceivablesNetCurrent>
<us-gaap_InventoryNet>
<synonym>inventories and supplies</synonym>
<synonym>inventories and supplies net</synonym>
<synonym>inventories at cost not in excess of market</synonym>
<synonym>inventories at lifo cost</synonym>
<synonym>inventories at lower of cost or market</synonym>
<synonym>inventories less allowance for obsolescence</synonym>
<synonym>inventories net</synonym>
<synonym>inventories</synonym>
<synonym>inventories and other</synonym>
<synonym>inventories net of advances and progress billings</synonym>
<synonym>inventory</synonym>
<synonym>Inventory Net</synonym>
<synonym>Inventory Net Total</synonym>
<synonym>merchandise inventories</synonym>
<synonym>merchandise inventory including lifo reserves</synonym>
<synonym>Total inventories</synonym>
</us-gaap_InventoryNet>
</us-gaap_AssetsCurrent>
</us-gaap_Assets>
</us-gaap_StatementLineItems>
</us-gaap_StatementTable>
</us-gaap_StatementOfFinancialPosition>
There is also another XML file, containing all the text, which is the one I am transforming.
Maybe this puts things in perspective.

August 17th, 2009, 08:08 AM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
Quote:
Originally Posted by JohnBampton
But say, for example, I have the following XML and I want to match on the word "trade" only. Would I use a regular expression and the matches function instead of contains? And if so, what would the regex pattern be?
<GenericXMLWrapperElement>
trade shows
trade
tradereceivables
</GenericXMLWrapperElement>
I think the regular expression language used with XSLT/XPath has no word-boundary construct, so all you can use is (^|\s) to match the start of the string or whitespace and (\s|$) to match whitespace or the end of the string:
Code:
<xsl:template match="GenericXMLWrapperElement">
  <xsl:variable name="lines">
    <xsl:for-each select="tokenize(., '\n')">
      <line><xsl:value-of select="."/></line>
    </xsl:for-each>
  </xsl:variable>
  <xsl:for-each select="$lines/line">
    <xsl:if test="matches(., '(^|\s)trade(\s|$)')">
      <matched line="{position()}"/>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
that way you would get
Code:
<matched line="2"/>
<matched line="3"/>
for the sample you posted.
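The numbers come out as 2 and 3 rather than 1 and 2 because the text before the first newline (directly after the start tag) counts as line 1. If you would rather number only the non-blank lines, one possibility is to filter the tokens before iterating. This is a sketch, assuming XSLT 2.0; the results wrapper element is an addition for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="GenericXMLWrapperElement">
    <results>
      <!-- The predicate keeps only lines that contain non-whitespace text,
           so position() counts non-blank lines only -->
      <xsl:for-each select="tokenize(., '\n')[normalize-space(.)]">
        <xsl:if test="matches(., '(^|\s)trade(\s|$)')">
          <matched line="{position()}"/>
        </xsl:if>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

With the sample as posted this numbers "trade shows" as 1 and "trade" as 2. The trade-off is that the numbers no longer correspond to physical lines once the text contains blank lines.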
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog
Last edited by Martin Honnen; August 17th, 2009 at 08:13 AM..
Reason: correcting problem with code sample

August 17th, 2009, 09:03 AM
Friend of Wrox
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
Hi,
Say I now wanted the word "trade" to come from an XSLT variable. How would I put that into the regular expression passed to the matches function?
Regards

August 17th, 2009, 09:09 AM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
The regular expression is built from an XPath string so you can concat(enate) what you need e.g.
Code:
<xsl:param name="w" select="'trade'"/>
<xsl:template match="GenericXMLWrapperElement">
  <xsl:variable name="lines">
    <xsl:for-each select="tokenize(., '\n')">
      <line><xsl:value-of select="."/></line>
    </xsl:for-each>
  </xsl:variable>
  <xsl:for-each select="$lines/line">
    <xsl:if test="matches(., concat('(^|\s)', $w, '(\s|$)'))">
      <matched line="{position()}"/>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
The only problem is characters that have a special meaning in regular expression patterns; you would need to escape them. http://www.xsltfunctions.com/xsl/fun...for-regex.html can help with that.
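As a rough sketch of that escaping step (assuming XSLT 2.0), a small stylesheet function can put a backslash in front of each regex metacharacter before the word is concatenated into the pattern. The function name local:escape-for-regex and the urn:example:local namespace are made up here, and the results wrapper element is likewise an addition; the replace() call follows the same idea as the library function linked above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local">
  <xsl:output method="xml" indent="yes"/>

  <xsl:param name="w" select="'trade'"/>

  <!-- Hypothetical helper: prefixes every regex metacharacter with a backslash -->
  <xsl:function name="local:escape-for-regex" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:sequence select="replace($text,
        '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')"/>
  </xsl:function>

  <xsl:template match="GenericXMLWrapperElement">
    <!-- Build the pattern once, with the search word made regex-safe -->
    <xsl:variable name="pattern"
        select="concat('(^|\s)', local:escape-for-regex($w), '(\s|$)')"/>
    <results>
      <xsl:for-each select="tokenize(., '\n')">
        <xsl:if test="matches(., $pattern)">
          <matched line="{position()}"/>
        </xsl:if>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

For a plain word such as 'trade' the escaping changes nothing; it only matters for values like 'a.b' or 'cost (net)', where the unescaped characters would otherwise be interpreted as regex syntax.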
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog

August 17th, 2009, 09:20 AM
Friend of Wrox
Join Date: Feb 2009
Posts: 119
Thanks: 25
Thanked 3 Times in 3 Posts
I just used the concat function.
Thanks anyway.