Wrox Programmer Forums

#1
May 9th, 2009, 10:15 AM
Registered User
Need faster pivoting XSLT

I have an input XML like the following:
Code:
<resultset>
    <heading>
        <column>server</column>
        <column>name</column>
        <column>val</column>
        <column>at</column>
    </heading>
    <rows>
        <row>
            <server>wmi1</server>
            <name>cpu</name>
            <val>1</val>
            <at>0000</at>
        </row>
        <row>
            <server>wmi1</server>
            <name>Current-service-threads</name>
            <val>1</val>
            <at>0000</at>
        </row>
        <row>
            <server>wmi1</server>
            <name>Current-system-threads</name>
            <val>148</val>
            <at>0000</at>
        </row>
        <row>
            <server>wmi1</server>
            <name>cpu</name>
            <val>2</val>
            <at>0005</at>
        </row>
        <row>
            <server>wmi1</server>
            <name>Current-service-threads</name>
            <val>4</val>
            <at>0005</at>
        </row>
        <row>
            <server>wmi1</server>
            <name>Current-system-threads</name>
            <val>142</val>
            <at>0005</at>
        </row>
        .....
    </rows>
</resultset>
The actual input is large: the "at" element runs from 0000 to 2355 in increments of 5, and the server element's value runs from wmi1 to wmi10. So you can imagine how big the input file is.

I need to get an output like so:
Code:
<resultset>
    <heading>
        <column>at</column>
        <column>wmi1-cpu</column>
        <column>wmi1-Current-service-threads</column>
        ...
    </heading>
    <rows>
        <row>
            <at>0000</at>
            <wmi1-cpu>1</wmi1-cpu>
            <wmi1-Current-service-threads>1</wmi1-Current-service-threads>
            <wmi1-Current-system-threads>148</wmi1-Current-system-threads>
        </row>
        <row>
            <at>0005</at>
            <wmi1-cpu>2</wmi1-cpu>
            <wmi1-Current-service-threads>4</wmi1-Current-service-threads>
            <wmi1-Current-system-threads>142</wmi1-Current-system-threads>
        </row>
        .....
    </rows>
</resultset>
I created the following XSLT program, and that works.
Code:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output method="xml" />

    <xsl:key name="key_server" match="row" use="server"/>
    <xsl:variable name="all_server">
        <xsl:copy-of select="/resultset/rows/row[count(.|key('key_server', server)[1])=1]/server"/>
    </xsl:variable>
    
    <xsl:key name="key_name" match="row" use="name"/>
    <xsl:variable name="all_name">
        <xsl:copy-of select="/resultset/rows/row[count(.|key('key_name', name)[1])=1]/name"/>
    </xsl:variable>
    
    <xsl:key name="key_at" match="row" use="at"/>
    <xsl:variable name="all_at">
        <xsl:copy-of select="/resultset/rows/row[count(.|key('key_at', at)[1])=1]/at"/>
    </xsl:variable>

    <xsl:variable name="docu"><xsl:copy-of select="/"/></xsl:variable>
    
    <xsl:template match="/">
        <resultset>
            <heading>
                <column>at</column>
                <xsl:for-each select="$all_server/*">
                    <xsl:sort select="."/>
                    <xsl:variable name="server"><xsl:value-of select="."/></xsl:variable>
                    <xsl:for-each select="$all_name/*">
                        <xsl:sort select="."/>
                        <xsl:variable name="name"><xsl:value-of select="."/></xsl:variable>
                        <column><xsl:value-of select="$server"/>-<xsl:value-of select="$name"/></column>
                    </xsl:for-each>
                </xsl:for-each>
            </heading>
            <rows>
                <xsl:for-each select="$all_at/*">
                    <xsl:sort select="." data-type="number"/>
                    <xsl:variable name="at"><xsl:value-of select="."/></xsl:variable>
                    <row>
                        <xsl:message><xsl:value-of select="$at"/></xsl:message>
                        <at><xsl:value-of select="$at"/></at>
                        <xsl:for-each select="$all_server/*">
                            <xsl:sort select="."/>
                            <xsl:variable name="server"><xsl:value-of select="."/></xsl:variable>
                            <xsl:for-each select="$all_name/*">
                                <xsl:sort select="."/>
                                <xsl:variable name="name"><xsl:value-of select="."/></xsl:variable>
                                <xsl:element name="{$server-$name}">
                                    <xsl:value-of select="$docu/resultset/rows/row[server=$server and name=$name and at=$at]/val"/>
                                </xsl:element>
                            </xsl:for-each>
                        </xsl:for-each>
                    </row>
                </xsl:for-each>
            </rows>
        </resultset>
    </xsl:template>

</xsl:stylesheet>
But the trouble is that it takes about an hour to do the job on a decent machine. I have been writing XSLT for close to a decade now,
but there are still cases where I write something inefficiently: the CPU goes sky high and the job takes over an hour! What is the key point that I am missing here?

I figure I must be doing something in XSLT in a less efficient way than what XSLT is capable of delivering.
#2
May 10th, 2009, 11:22 AM
Friend of Wrox

The stylesheet you posted does not even "compile"; the line
Code:
<xsl:element name="{$server-$name}">
gives me an error. When I correct that to
Code:
<xsl:element name="{$server}-{$name}">
it compiles, but the next line gives a runtime error:
Code:
<xsl:value-of select="$docu/resultset/rows/row[server=$server and name=$name and at=$at]/val"/>
as it tries to use an XPath expression on a result tree fragment (i.e. $docu).
So I guess instead of
Code:
    <xsl:variable name="docu"><xsl:copy-of select="/"/></xsl:variable>
you might simply want
Code:
    <xsl:variable name="docu" select="/"/>
I am astonished that you are able to run the posted stylesheet at all with an XSLT 1.0 processor, as three further variables create result tree fragments on which XPath expressions are later used. And the first syntax error would not be accepted even by an XSLT 2.0 processor.
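
For readers who need to stay on XSLT 1.0, the result tree fragment problem described above can be sidestepped in two ways. The following is an illustrative sketch, not a tested fix for the original stylesheet. It assumes a processor that supports the EXSLT exsl:node-set() extension function (Xalan, libxslt and Saxon 6.5 are among those that do), and the variable names all_server_direct and all_server_rtf are made up for the example.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl"
    version="1.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:key name="key_server" match="row" use="server"/>

    <!-- Option 1: select the source nodes directly; nothing is copied and
         the variable holds an ordinary node-set -->
    <xsl:variable name="all_server_direct"
        select="/resultset/rows/row[count(.|key('key_server', server)[1])=1]/server"/>

    <!-- Option 2: keep a copied fragment (a result tree fragment in 1.0) -->
    <xsl:variable name="all_server_rtf">
        <xsl:copy-of select="$all_server_direct"/>
    </xsl:variable>

    <xsl:template match="/">
        <servers>
            <direct>
                <xsl:for-each select="$all_server_direct">
                    <xsl:sort select="."/>
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </direct>
            <converted>
                <!-- The fragment must be converted to a node-set before a
                     path expression may be applied to it -->
                <xsl:for-each select="exsl:node-set($all_server_rtf)/server">
                    <xsl:sort select="."/>
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </converted>
        </servers>
    </xsl:template>

</xsl:stylesheet>
```

Option 1 is usually preferable because it avoids both the copy and the extension function; option 2 is only needed when the variable really has to hold constructed content.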

So which XSLT processor exactly do you use? If that is an XSLT 2.0 processor, have you considered whether the problem can be solved using XSLT 2.0 grouping perhaps?

How large is that file that takes an hour to process? Have you tried different XSLT processors?

Because of all those problems I have not looked at the transformation in detail to try to make suggestions on improving the performance. Let's first establish whether you want to use XSLT 1.0 or 2.0.
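
As an illustration of the XSLT 2.0 grouping idea mentioned above, the rows could be grouped by "at" with xsl:for-each-group, so that each input row is visited once instead of being searched for once per output cell. This is only a sketch under assumptions: it assumes each (server, name, at) combination occurs at most once, it omits the heading, and unlike the original stylesheet it emits no empty element for a combination that is missing from the input.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/resultset">
        <resultset>
            <rows>
                <!-- One group per distinct timestamp -->
                <xsl:for-each-group select="rows/row" group-by="at">
                    <!-- Sort the groups numerically by timestamp -->
                    <xsl:sort select="number(at)"/>
                    <row>
                        <at><xsl:value-of select="current-grouping-key()"/></at>
                        <!-- Every row in the group becomes one output cell -->
                        <xsl:for-each select="current-group()">
                            <xsl:sort select="server"/>
                            <xsl:sort select="name"/>
                            <xsl:element name="{server}-{name}">
                                <xsl:value-of select="val"/>
                            </xsl:element>
                        </xsl:for-each>
                    </row>
                </xsl:for-each-group>
            </rows>
        </resultset>
    </xsl:template>

</xsl:stylesheet>
```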
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
#3
May 10th, 2009, 01:36 PM
Friend of Wrox

I have now had a look at what the stylesheet does and, assuming you are using an XSLT 2.0 processor anyway, I have made some changes that might improve performance.
Instead of using keys to find distinct values, I use the distinct-values function. Instead of sorting the values stored in the global variables again and again in each xsl:for-each, I use xsl:perform-sort to sort them only once and then use the sorted sequences.
Code:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xsd"
    version="2.0">

    <xsl:output method="xml" indent="yes"/>
    
    <xsl:variable name="all_server" as="xsd:string*">
      <xsl:perform-sort select="distinct-values(/resultset/rows/row/server)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    
    <xsl:variable name="all_name" as="xsd:string*">
      <xsl:perform-sort select="distinct-values(/resultset/rows/row/name)">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    
    <xsl:variable name="all_at" as="xsd:string*">
      <xsl:perform-sort select="distinct-values(/resultset/rows/row/at)">
        <xsl:sort select="." data-type="number"/>
      </xsl:perform-sort>
    </xsl:variable>

    <xsl:variable name="docu" select="/"/>
    
    <xsl:template match="/">
        <resultset>
            <heading>
                <column>at</column>
                <xsl:for-each select="$all_server">
                    <xsl:variable name="server" select="."/>
                    <xsl:for-each select="$all_name">
                        <xsl:variable name="name" select="."/>
                        <column><xsl:value-of select="$server"/>-<xsl:value-of select="$name"/></column>
                    </xsl:for-each>
                </xsl:for-each>
            </heading>
            <rows>
                <xsl:for-each select="$all_at">
                    <xsl:variable name="at" select="."/>
                    <row>
                        <xsl:message><xsl:value-of select="$at"/></xsl:message>
                        <at><xsl:value-of select="$at"/></at>
                        <xsl:for-each select="$all_server">
                            <xsl:variable name="server" select="."/>
                            <xsl:for-each select="$all_name">
                                <xsl:variable name="name" select="."/>
                                <xsl:element name="{$server}-{$name}">
                                    <xsl:value-of select="$docu/resultset/rows/row[server=$server and name=$name and at=$at]/val"/>
                                </xsl:element>
                            </xsl:for-each>
                        </xsl:for-each>
                    </row>
                </xsl:for-each>
            </rows>
        </resultset>
    </xsl:template>

</xsl:stylesheet>
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
#4
May 10th, 2009, 02:33 PM
Wrox Author

Martin: I think a number of XSLT 1.0 processors let you get away with accessing an RTF using a path expression. For example, I think Xalan does. If you're using Xalan, then the first thing to do is to try it on something else; many XSLT processors are 10 times faster than Xalan.

The first inefficiency I can see in this code is the repeated use of

<xsl:variable name="x">
<xsl:value-of select="xxx"/>
</xsl:variable>

or

<xsl:variable name="x">
<xsl:copy-of select="xxx"/>
</xsl:variable>

when you could simply use

<xsl:variable name="x" select="xxx"/>

thus avoiding the need to copy large amounts of data.

The repeated sorting is also bad, as Martin points out: you should save the sorted result in a variable.

But I suspect the main culprit is this filter expression:

<xsl:value-of select="$docu/resultset/rows/row[server=$server and name=$name and at=$at]/val"/>


Most XSLT processors don't have a powerful enough optimizer to deal with what is effectively a multi-way join. One exception is Saxon-SA: you'll probably get a good speed-up by running this using Saxon-SA.
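
A common way to avoid depending on the optimizer for this join is to index the rows with xsl:key and look each cell up directly. The following is a minimal sketch based on the XSLT 2.0 stylesheet posted earlier in the thread, not a measured fix. The key name "cell" and the '|' separator are assumptions made for the example (the separator must not occur in the server, name, or at values), and the heading is omitted for brevity.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xsd"
    version="2.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Index every row by its (server, name, at) triple -->
    <xsl:key name="cell" match="row" use="concat(server, '|', name, '|', at)"/>

    <xsl:variable name="docu" select="/"/>

    <xsl:variable name="all_server" as="xsd:string*">
        <xsl:perform-sort select="distinct-values(/resultset/rows/row/server)">
            <xsl:sort select="."/>
        </xsl:perform-sort>
    </xsl:variable>

    <xsl:variable name="all_name" as="xsd:string*">
        <xsl:perform-sort select="distinct-values(/resultset/rows/row/name)">
            <xsl:sort select="."/>
        </xsl:perform-sort>
    </xsl:variable>

    <xsl:variable name="all_at" as="xsd:string*">
        <xsl:perform-sort select="distinct-values(/resultset/rows/row/at)">
            <xsl:sort select="number(.)"/>
        </xsl:perform-sort>
    </xsl:variable>

    <xsl:template match="/">
        <resultset>
            <rows>
                <xsl:for-each select="$all_at">
                    <xsl:variable name="at" select="."/>
                    <row>
                        <at><xsl:value-of select="$at"/></at>
                        <xsl:for-each select="$all_server">
                            <xsl:variable name="server" select="."/>
                            <xsl:for-each select="$all_name">
                                <xsl:variable name="name" select="."/>
                                <xsl:element name="{$server}-{$name}">
                                    <!-- Indexed lookup replaces the scan over all rows.
                                         The third argument names the document to search,
                                         because the context item here is a string -->
                                    <xsl:value-of select="key('cell', concat($server, '|', $name, '|', $at), $docu)/val"/>
                                </xsl:element>
                            </xsl:for-each>
                        </xsl:for-each>
                    </row>
                </xsl:for-each>
            </rows>
        </resultset>
    </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() is an XSLT 2.0 feature. In XSLT 1.0 the same idea would need the context node switched to the source document (for example with xsl:for-each select="$docu") before calling the two-argument key().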
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
#5
May 10th, 2009, 07:58 PM
Registered User
It works way better.

Wow. This is amazing, and you guys are amazing. One of the smaller input files that I tested took about 25 minutes with the original code. Now, with the modified version that Martin provided, it took only 35 seconds!

Michael and Martin, I can't thank you enough for taking the time to review the scenario and pointing out exactly where the inefficiencies were in my XSLT program. I hope others benefit from your insights too.

It surprises me that the original stylesheet was not even compiling for Martin. I could compile and run the stylesheet without problems, even with the "offending" xsl:element name value of {$server-$name} and the XPath on $docu from the original post.

The application's classpath contains the saxon9*.jar files as well as xalan.jar and xercesImpl.jar. My intent is to go 100% Saxon, since I want to use more and more XSLT 2.0 features. Because the app was originally written against xalan.jar and xercesImpl.jar, I can't get rid of the Xalan/Xerces pieces as easily as I would like: other parts of the application scream when I try. That said, I at least want to first get the XML parsing and XSLT processing handled by Saxon. I have a suspicion that Xerces may be parsing our XML files... I am not sure, but I will research that.

All this gain was possible even without using Saxon-SA. As Michael pointed out, when we switch to Saxon-SA I am sure we are going to see another sizable jump in performance.

You gurus saved the day! Thank you again.