Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
June 14th, 2011, 08:54 AM
Authorized User
 
Join Date: Apr 2008
Posts: 70
Thanks: 17
Thanked 1 Time in 1 Post
Performance Question

We have a performance problem with one of our XSLTs. For normal-sized documents it runs fine, but when input documents reach 30-50MB (yes, they really are that large), it slows down badly. The stylesheet is essentially the identity template plus a set of one-line empty templates that match particular nodes and leave them out of the result tree.
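For reference, a minimal sketch of the kind of stylesheet described here: an identity template plus one empty "removal" template. NODE is the placeholder name used in this thread; the predicate @status = 'obsolete' is a hypothetical stand-in for the real (large) predicates.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies attributes, elements, text, comments
       and processing instructions unchanged (default priority -0.5) -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: a matching element produces no output, so it and
       its whole subtree are omitted from the result tree. The predicate
       gives it default priority 0.5, so it beats the identity template.
       NODE and @status = 'obsolete' are hypothetical placeholders. -->
  <xsl:template match="NODE[@status = 'obsolete']"/>

</xsl:stylesheet>
```

The input tree itself is never modified. The empty template simply contributes nothing to the output for the nodes it matches.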

So which form of template actually runs faster? At these document sizes, even a small difference in per-node cost makes a big difference overall.

A Union of nodes
Code:
<xsl:template match="NODE[Large predicate 1] | NODE[Large predicate 2] | NODE[Large predicate 3]" />
A single path of ORs
Code:
<xsl:template match="NODE[Large predicate 1 or Large predicate 2 or Large predicate 3]"/>
Separate Templates
Code:
<xsl:template match="NODE[Large predicate 1]"/>
<xsl:template match="NODE[Large predicate 2]"/>
<xsl:template match="NODE[Large predicate 3]"/>
We're using Saxon 8.9.4 in an enterprise service bus implementation. Thank you.
 
June 14th, 2011, 09:05 AM
Friend of Wrox
 
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts

I don't know the internals of any particular XSLT processor, but I would expect your first and third options to be handled the same way by most implementations, simply because match="foo | bar" is defined as shorthand for two separate template rules, match="foo" and match="bar".
I can't say how the second option compares. Generally, the best way to answer performance questions is to set up test cases and measure.
__________________
Martin Honnen
Microsoft MVP (XML, Data Platform Development) 2005/04 - 2013/03
My blog
The Following User Says Thank You to Martin Honnen For This Useful Post:
iceandrews (June 14th, 2011)
 
June 14th, 2011, 09:24 AM
Authorized User
 
Join Date: Apr 2008
Posts: 70
Thanks: 17
Thanked 1 Time in 1 Post

We are in the process of running some in-depth profiles of various versions of the XSLT. When the input documents grow to 30MB+, the XSLT takes 2+ hours to run. With smaller inputs (<5MB), the different versions perform almost identically, so the differences only show up at large sizes: it's not scaling well. That means every profiling run needs large documents to reveal the scaling behavior, and each of those runs takes a long time. There's obviously a lot of detail I'm leaving out because I don't want to get too complicated in a forum post.

I just wanted some general performance advice on those template options so I'd have a direction for creating other versions. If I know that having separate templates instead of a union of those templates isn't going to buy me much, I won't put much effort into rewriting the XSLT that way. So your thoughts are very welcome. Thanks.
 
June 14th, 2011, 10:33 AM
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

Plotting performance against document size is a good plan: if the dependence is quadratic, this helps to narrow down the investigation.
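One rough way to turn such a plot into a number is to compare two runs. This assumes run time grows approximately as a power of document size n, that is t ≈ c·n^k:

```latex
t(n) \approx c\,n^{k}
\quad\Longrightarrow\quad
k \approx \frac{\log\left(t_2 / t_1\right)}{\log\left(n_2 / n_1\right)}
```

For example, if doubling the input size quadruples the run time, k ≈ log 4 / log 2 = 2, which points to quadratic behavior. If the time merely doubles, k ≈ 1, which is linear. Timings are noisy, so measuring several sizes gives a more reliable estimate than a single pair.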

Try running the Saxon profiling tool (-TP option). This may not be possible on the larger documents as the trace files get impossibly big, but you can still get useful information running against smaller documents.

Quadratic performance can usually be detected quite easily in the source code, and it's often easy to eliminate (replace filter expressions with calls on key(), replace hand-written grouping that uses the preceding axes with for-each-group, etc.).
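Here is a minimal XSLT 2.0 sketch of those two rewrites. It assumes a hypothetical input of customer and order elements; all element and attribute names, and the key name customer-by-id, are invented for illustration. The slow forms appear in comments next to their replacements.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <orders>
         <customer id="c1" name="..."/> ...
         <order cust="c1" region="north"/> ...
       </orders> -->

  <!-- Declares an index of customers by @id; the processor can build it
       once instead of scanning all customers on every lookup -->
  <xsl:key name="customer-by-id" match="customer" use="@id"/>

  <xsl:template match="/orders">
    <report>
      <xsl:for-each select="order">
        <!-- Slow (scans every customer for every order, so quadratic):
             //customer[@id = current()/@cust] -->
        <line customer="{key('customer-by-id', @cust)/@name}"/>
      </xsl:for-each>

      <!-- Slow grouping idiom (compares each order with all earlier ones):
             order[not(@region = preceding-sibling::order/@region)] -->
      <xsl:for-each-group select="order" group-by="@region">
        <region name="{current-grouping-key()}"
                count="{count(current-group())}"/>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

xsl:for-each-group is an XSLT 2.0 instruction, so it is available in Saxon 8.9.4. It groups the orders in one pass rather than re-examining the preceding siblings for each one.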

Your conjecture about match patterns is a wild guess and almost certainly irrelevant. This kind of guesswork isn't generally a useful diagnostic technique. You need to drill down: get data about where the time is being spent.

If performance is a concern, then I would certainly try moving to a more recent release of the software. It might not help, but it costs nothing to try.
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
June 14th, 2011, 11:01 AM
Authorized User
 
Join Date: Apr 2008
Posts: 70
Thanks: 17
Thanked 1 Time in 1 Post

Thanks for the tips. We are certainly using the -TP option and generating the profile data, and it's been really useful. With large inputs the profile grows to 100MB+, but it's still usable.

We found the timing-profile converter here (Link), which takes the output of Saxon's -TP option and produces a nicely tabulated HTML page. It's been VERY helpful in showing us which templates and instructions are taking the most time. Even when the profile output is 100MB, we just run the jobs overnight to produce our results. I've also linked one of the reports we generated so you can see some of the detail: http://dl.dropbox.com/u/11278475/summary.html

Thanks for your help, as always.
 
June 14th, 2011, 11:48 AM
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

The immediate thing I notice from the timing profile is the heavy use of match="*:local" in match patterns. That's very bad news from a performance point of view because it means Saxon will need to test all the patterns against each node, it can't rely on hashing the element name. (However, this will be a linear effect, not quadratic, so if you're getting quadratic performance then this problem may be unrelated.)
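To illustrate the difference, here is a wildcard-namespace pattern (in a comment) next to a fully qualified equivalent. The namespace URI urn:example:purchase-order, the prefix po, and the names Item and @qty are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:po="urn:example:purchase-order">

  <!-- Wildcard form: *:Item matches an Item element in any namespace or
       none, so the rule cannot be selected by element name alone:
       <xsl:template match="*:Item[@qty = 0]"/> -->

  <!-- Qualified form: the expanded name is fixed, so the processor can
       consider this rule only for po:Item elements -->
  <xsl:template match="po:Item[@qty = 0]"/>

</xsl:stylesheet>
```

If the set of possible namespaces is known in advance, you could list the qualified names explicitly (for example po:Item | Item for the purchase-order namespace and no namespace). That keeps each alternative name-specific, because a union pattern is treated as one rule per alternative. Whether this is measurably faster in a particular Saxon release is something to confirm with the profiler.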
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
The Following User Says Thank You to mhkay For This Useful Post:
iceandrews (June 14th, 2011)
 
June 14th, 2011, 12:52 PM
Authorized User
 
Join Date: Apr 2008
Posts: 70
Thanks: 17
Thanked 1 Time in 1 Post

Yeah, unfortunately that's just the way it has to be. This adapter is part of an enterprise middleware SOA architecture that consumes documents that can be in any namespace, no namespace, or mixed namespaces.

As you said, though, it only adds a linear (per-node) cost and wouldn't cause the scaling problem. We've found some nice tricks over the course of today based on the suggestions in this thread.

It's a really interesting topic, though, and one of the reasons I love XSLT. The transformation in its current form is well written from a best-practice/theoretical standpoint, and delving into all this "devilry" to improve performance has been a great learning experience.
 
June 14th, 2011, 05:05 PM
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

"This adapter is part of an enterprise middleware SOA architecture that consumes documents that can be in any namespace, no namespace or mixed namespaces."

Generally my advice for that kind of situation is to first do a pass that normalizes the namespaces, for ease of subsequent processing.
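Here is a minimal XSLT 2.0 sketch of such a two-pass approach. It assumes "normalizing" means dropping namespaces altogether. Mapping everything into one canonical namespace would work the same way, with a namespace attribute added to xsl:element.

Pass 1 copies the input into a temporary tree with every element and attribute in no namespace. Pass 2 can then use ordinary, name-specific patterns. NODE and @status are hypothetical placeholders. Note that two attributes with the same local name in different namespaces would collide in pass 1, and the later one wins.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Pass 1: build a temporary tree with all namespaces removed -->
    <xsl:variable name="normalized">
      <xsl:apply-templates select="node()" mode="strip-ns"/>
    </xsl:variable>
    <!-- Pass 2: process the children of the temporary document node
         (not the node itself, which would re-enter this template) -->
    <xsl:apply-templates select="$normalized/node()"/>
  </xsl:template>

  <!-- Pass 1: recreate each element under its local name; no default
       namespace is declared here, so it ends up in no namespace -->
  <xsl:template match="*" mode="strip-ns">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@* | node()" mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

  <!-- Pass 1: attributes lose their namespace as well -->
  <xsl:template match="@*" mode="strip-ns">
    <xsl:attribute name="{local-name()}" select="."/>
  </xsl:template>

  <!-- Pass 1: text, comments and PIs are copied unchanged -->
  <xsl:template match="text() | comment() | processing-instruction()"
                mode="strip-ns">
    <xsl:copy/>
  </xsl:template>

  <!-- Pass 2: identity copy over the normalized tree -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 2: removal rules can now use plain names instead of *:NODE -->
  <xsl:template match="NODE[@status = 'obsolete']"/>

</xsl:stylesheet>
```

The extra pass adds one linear copy of the document. In exchange, the removal templates no longer need wildcard-namespace patterns.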
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference





Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Question on scalability and performance | ThanhD | BOOK: ASP.NET 2.0 Website Programming Problem Design Solution ISBN: 978-0-7645-8464-0 | 4 | April 17th, 2007 02:13 PM
indexing performance question | BinFrog | SQL Server 2000 | 1 | February 23rd, 2005 11:47 PM
query performance question | kBusby | Oracle | 3 | February 14th, 2005 04:42 PM
performance question.. | gbianchi | Pro VB 6 | 6 | October 8th, 2003 11:21 AM
Performance Question | Kenny Alligood | VB Databases Basics | 2 | August 11th, 2003 08:54 AM





Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.