Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
November 9th, 2013, 08:00 AM
Authorized User
 
Join Date: Aug 2013
Posts: 30
Thanks: 9
Thanked 0 Times in 0 Posts
xsl:analyze-string to work on latex file

Dear all, I am again trying to use XSLT for something I reckon it was not designed for; it works well so often that I insist on doing it...

I now have a LaTeX file with thousands of entries like this:

Code:
\index[p]{pino!1!115.3}
which need to be converted (according to their kind) into this format:

Code:
\index[p]{pino!1!01150003@115.3}
So I have used the following XSLT:

Code:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:template match="/">
        <xsl:variable name="hdt">
            <xsl:call-template name="hdt"/>
        </xsl:variable>

        <xsl:for-each select="$hdt">
            <xsl:call-template name="plut"/>
        </xsl:for-each>

    </xsl:template>
    
    
    
    <xsl:template name="hdt">
        <xsl:analyze-string select="." regex="([A-Za-z0-9]+)*(!)(\d)(!)((\d+)(\.)*(\d*)(\-*)(\d*)((\.)(\d*))*)(\|)*([A-Za-z0-9]+)*(\}})">
            <xsl:matching-substring>
                <xsl:value-of select="regex-group(1)"/>
                <xsl:value-of select="regex-group(2)"/>
                <xsl:value-of select="regex-group(3)"/>
                <xsl:value-of select="regex-group(4)"/>
                <xsl:value-of select="format-number(number(regex-group(6)), '0000')"/>
                <xsl:choose>
                    <xsl:when test="regex-group(8)">
                        <xsl:value-of select="format-number(number(regex-group(8)), '0000')"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:text>0000</xsl:text>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:text>@</xsl:text>
                <xsl:value-of select="regex-group(5)"/>
                <xsl:value-of select="regex-group(14)"/>
                <xsl:value-of select="regex-group(15)"/>
                <xsl:value-of select="regex-group(16)"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
    <xsl:template name="plut">
        <xsl:analyze-string select="." regex="([A-Za-z0-9]+)*(!)(\\*)([A-Za-z0-9]+)*(\{{)*([A-Za-z0-9]+)*(\}})*((!)((\d+)(\.)*(\d*)(\-*)(\d*)((\.)(\d*))*)(\|)*([A-Za-z0-9]+)*)*(\}})">
            <xsl:matching-substring>
                <xsl:value-of select="regex-group(1)"/>
                <xsl:value-of select="regex-group(2)"/>
                <xsl:value-of select="regex-group(3)"/>
                <xsl:value-of select="regex-group(4)"/>
                <xsl:value-of select="regex-group(5)"/>
                <xsl:value-of select="regex-group(6)"/>
                <xsl:value-of select="regex-group(7)"/>
                <xsl:value-of select="regex-group(9)"/>
                
                <xsl:value-of select="format-number(number(regex-group(10)), '0000')"/>
                <xsl:choose>
                    <xsl:when test="regex-group(12)">
                        <xsl:value-of select="format-number(number(regex-group(12)), '0000')"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:text>0000</xsl:text>
                    </xsl:otherwise></xsl:choose>
                <xsl:text>@</xsl:text>
                <xsl:value-of select="regex-group(10)"/>
                <xsl:value-of select="regex-group(19)"/>
                <xsl:value-of select="regex-group(20)"/>
                <xsl:value-of select="regex-group(21)"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
This works very well on a small test file containing only a selection of the things I want to match. It does not work when I try to run it on a larger file. Do you have any suggestions as to why this happens? Thank you very much.
Pietro
 
November 9th, 2013, 01:08 PM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

It's not so much that XSLT is unsuited to this task, as that regular expressions are unsuited to it. I've no idea if LaTeX is actually a regular language (i.e. one that is capable of being parsed using regular expressions), but even if it is, a regular expression that allows backtracking will often have a running time that increases exponentially with the size of the string to be parsed. (And I mean exponentially in the technical sense of the term.)

I can't advise you in detail without knowing the syntax of the language you are trying to analyse, but I would have thought a conventional (e.g. recursive descent) parsing approach without unlimited lookahead or backtracking would be the right approach.
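As a rough illustration of scanning without open-ended backtracking, the sketch below works in two stages: a first expression built only from literals and negated character classes isolates each \index entry, and the entry body is then split on "!" with tokenize(). It is not a full recursive descent parser. It assumes that entries contain no nested braces and that the locator is a plain number such as 115 or 115.3; the $sample parameter and the template name "main" are made up for the example, and anything that does not fit those assumptions is copied through unchanged.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output method="text"/>

    <!-- Hypothetical sample; in practice this would be the text of the LaTeX file. -->
    <xsl:param name="sample"
        select="'\index[p]{pino!1!115.3} text \index[p]{pino}'"/>

    <xsl:template match="/" name="main">
        <!-- Stage 1: find each \index[...]{...} entry. Braces are doubled because
             the regex attribute is an attribute value template.
             Group 1 = index name, group 2 = entry body (assumed free of braces). -->
        <xsl:analyze-string select="$sample"
            regex="\\index\[([^\]]*)\]\{{([^{{}}]*)\}}">
            <xsl:matching-substring>
                <!-- Stage 2: split the body on '!'; the last part is the locator. -->
                <xsl:variable name="parts" select="tokenize(regex-group(2), '!')"/>
                <xsl:variable name="locator" select="$parts[last()]"/>
                <xsl:choose>
                    <xsl:when test="count($parts) ge 3
                                    and matches($locator, '^\d+(\.\d+)?$')">
                        <xsl:variable name="nums" select="tokenize($locator, '\.')"/>
                        <xsl:value-of select="concat('\index[', regex-group(1), ']{')"/>
                        <xsl:value-of
                            select="string-join($parts[position() lt last()], '!')"/>
                        <xsl:text>!</xsl:text>
                        <!-- Zero-padded sort key: integer part, then fraction or 0000. -->
                        <xsl:value-of select="format-number(number($nums[1]), '0000')"/>
                        <xsl:value-of select="if ($nums[2])
                            then format-number(number($nums[2]), '0000')
                            else '0000'"/>
                        <xsl:text>@</xsl:text>
                        <xsl:value-of select="$locator"/>
                        <xsl:text>}</xsl:text>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- Not a shape this sketch understands: copy unchanged. -->
                        <xsl:value-of select="."/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

Run against any source document (or with "main" as the initial template), the first sample entry should come out as \index[p]{pino!1!01150003@115.3} and the second should pass through untouched. Entries with embedded commands such as \textit{...} would need the stage 1 expression, or a real parser, extended to cope with nested braces.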

Looking very quickly at your actual regex, it starts

([A-Za-z0-9]+)*

which is surely equivalent to

[A-Za-z0-9]*

Nested repetitions like this vastly increase the number of possible ways in which a regular expression can match a string, and therefore worsen the performance of the matching algorithm.
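A minimal sketch of what removing the nested repetitions might look like for the simplest kind of entry, assuming the locator is just digits with an optional fractional part (ranges, "|" suffixes and embedded commands are left out). The $sample parameter and the template name "main" are invented for the example. Note that the group numbers change once the redundant groups are gone, so the regex-group() calls have to be renumbered to match.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output method="text"/>

    <!-- Hypothetical sample input, used only to keep the sketch self-contained. -->
    <xsl:param name="sample" select="'\index[p]{pino!1!115.3}'"/>

    <xsl:template match="/" name="main">
        <!-- Groups: 1 = headword, 2 = level digit, 3 = whole locator,
             4 = integer part, 5 = '.' plus fraction, 6 = fraction digits.
             No * or + is applied to a group that already contains * or +. -->
        <xsl:analyze-string select="$sample"
            regex="([A-Za-z0-9]*)!(\d)!((\d+)(\.(\d+))?)\}}">
            <xsl:matching-substring>
                <xsl:value-of
                    select="concat(regex-group(1), '!', regex-group(2), '!')"/>
                <!-- Zero-padded sort key built from the two numeric parts. -->
                <xsl:value-of
                    select="format-number(number(regex-group(4)), '0000')"/>
                <xsl:choose>
                    <xsl:when test="regex-group(6) != ''">
                        <xsl:value-of
                            select="format-number(number(regex-group(6)), '0000')"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:text>0000</xsl:text>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:text>@</xsl:text>
                <xsl:value-of select="regex-group(3)"/>
                <xsl:text>}</xsl:text>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

For the sample string this should produce \index[p]{pino!1!01150003@115.3}. Each optional part now has exactly one way to match, so a failed attempt at a given position is abandoned after a single pass over the run of characters rather than after trying every way of dividing it up.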
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference





Copyright (c) 2020 John Wiley & Sons, Inc.