Performance Question

iceandrews · June 14th, 2011, 08:54 AM

We have a performance problem with one of our XSLTs. For normal-sized documents it runs fine, but when input documents reach 30-50MB (yes, they're that large) we run into problems. The stylesheet is basically the identity template, plus one-line templates that match node sets and remove them from the input tree.

So which template type actually runs faster? Subtle differences in performance between them make a big difference at this scale.

A Union of nodes

Code:

<xsl:template match="NODE[Large predicate 1] | NODE[Large predicate 2] | NODE[Large predicate 3]" />

A single path of ORs

Code:

<xsl:template match="NODE[Large predicate 1 or Large predicate 2 or Large predicate 3]"/>

Separate Templates

Code:

<xsl:template match="NODE[Large predicate 1]"/>
<xsl:template match="NODE[Large predicate 2]"/>
<xsl:template match="NODE[Large predicate 3]"/>

We're using Saxon 8.9.4 in an enterprise service bus implementation. Thank you

Martin Honnen · June 14th, 2011, 09:05 AM

I don't know the details of any XSLT processor implementation, but I would expect your first and third options to result in the same implementation path in most implementations, simply because match="foo | bar" is defined as a shortcut for match="foo" and match="bar".
I don't know about the second option. Generally, the best way to answer performance questions is to set up test cases and measure performance.

iceandrews · June 14th, 2011, 09:24 AM

We are in the process of running some in-depth profiles of various flavors of the XSLT. When the input documents bloat to 30MB+, the XSLT takes 2+ hours to run. When the input documents are smaller (<5MB), the XSLT has nearly the same run time. So it's not scaling well, and profiling each flavor with large documents to look at the scaling just takes lots of time. There's obviously a lot of detail I'm leaving out because I don't want to get too complicated in a forum post.

I just wanted to get some general performance advice on those template ideas so I had a direction for creating other flavors. If I know that having separate templates vs. the union of those templates isn't going to buy me much, I won't put much effort into rewriting the XSLT to get there. So your thoughts are very welcome. Thanks.

mhkay · June 14th, 2011, 10:33 AM

Plotting performance against document size is a good plan: if the dependence is quadratic, this helps to narrow down the investigation.

Try running the Saxon profiling tool (-TP option). This may not be possible on the larger documents as the trace files get impossibly big, but you can still get useful information running against smaller documents.

Quadratic performance can usually be detected quite easily in the source code, and it's often easy to eliminate (replace filter expressions with calls on key(), replace hand-written grouping that uses the preceding axes with for-each-group, etc).
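As a minimal sketch of the first of those fixes (replacing a filter expression with key()), assume a hypothetical input in which `order` elements carry an `id` attribute and `cancel` elements carry a `ref` attribute naming an order to drop. These element names, attribute names, and the key name are illustrative only and do not come from the stylesheet discussed in this thread. Whether a particular Saxon release already optimizes the filter form is not something this sketch assumes.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical index: every cancel element, looked up by its ref attribute -->
  <xsl:key name="cancel-by-ref" match="cancel" use="@ref"/>

  <!-- Identity template: copy everything that no other rule removes -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Filter form to avoid: each order may rescan every cancel element,
       so the cost can grow with (orders x cancels):
       <xsl:template match="order[@id = //cancel/@ref]"/>              -->

  <!-- Keyed form: the index is built once per document and each order
       does a single lookup; a matching order is removed from the output -->
  <xsl:template match="order[key('cancel-by-ref', @id)]"/>

</xsl:stylesheet>
```

Both forms remove the same `order` elements; the keyed form only changes how the match is found.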

Your conjecture about match patterns is a wild guess and almost certainly irrelevant. This kind of guesswork isn't generally a useful diagnostic technique. You need to drill down: get data about where the time is being spent.

If performance is a concern, then I would certainly try moving to a more recent release of the software. It might not help, but it costs nothing to try.

iceandrews · June 14th, 2011, 11:01 AM

Thanks for the tips. We are certainly using the -TP option and generating the profile data set. It's been really useful. When we have large inputs, the profile gets to be 100MB+ but actually is still useful.

We found the timing-profile converter here (Link) that takes the results of Saxon's -TP and produces a nice tabular HTML page. It's been VERY helpful in telling us which templates and instructions are taking the most time. Even if the profile output is 100MB, we just run jobs overnight to produce our results. I've also linked one of the reports we generated so you can see a bit of detail. http://dl.dropbox.com/u/11278475/summary.html

Thanks for your help as always.

mhkay · June 14th, 2011, 11:48 AM

The immediate thing I notice from the timing profile is the heavy use of match="*:local" in match patterns. That's very bad news from a performance point of view because it means Saxon will need to test all the patterns against each node, it can't rely on hashing the element name. (However, this will be a linear effect, not quadratic, so if you're getting quadratic performance then this problem may be unrelated.)

iceandrews · June 14th, 2011, 12:52 PM

Yeah, unfortunately that's just the way it has to be. This adapter is part of an enterprise middleware SOA architecture that consumes documents that can be in any namespace, no namespace or mixed namespaces.

As you've said, it just adds a flat cost and wouldn't cause an increased scaling issue. We've found some nice tricks over the course of today based on this thread's suggestions.

It's a really interesting topic though, and one of the reasons I really love XSLT. The transformation in its current form is well written from a best-practice/theoretical standpoint. Delving into all this "devilry" to improve performance has been a great learning experience.

mhkay · June 14th, 2011, 05:05 PM

"This adapter is part of an enterprise middleware SOA architecture that consumes documents that can be in any namespace, no namespace or mixed namespaces."

Generally my advice for that kind of situation is to first do a pass that normalizes the namespaces, for ease of subsequent processing.
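One way such a normalizing pass might look, as a rough sketch only: a first pass in a separate mode rebuilds every element and attribute in no namespace, and the existing identity-plus-removal rules then run over the result and can match plain names instead of `*:local`. The mode name `strip-ns`, the variable name, and the `local[@obsolete = 'true']` rule are hypothetical. The sketch also assumes that the output does not need the original namespaces and that no element has two attributes with the same local name in different namespaces.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Pass 1: rebuild each element in no namespace, keeping its local name -->
  <xsl:template match="*" mode="strip-ns">
    <xsl:element name="{local-name()}" namespace="">
      <xsl:apply-templates select="@* | node()" mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

  <!-- Pass 1: rebuild each attribute in no namespace -->
  <xsl:template match="@*" mode="strip-ns">
    <xsl:attribute name="{local-name()}" namespace="" select="."/>
  </xsl:template>

  <!-- Pass 1: text, comments and processing instructions are copied as they are -->
  <xsl:template match="text() | comment() | processing-instruction()" mode="strip-ns">
    <xsl:copy/>
  </xsl:template>

  <!-- Driver: normalize into a temporary tree, then run pass 2 over it -->
  <xsl:template match="/">
    <xsl:variable name="normalized">
      <xsl:apply-templates mode="strip-ns"/>
    </xsl:variable>
    <xsl:apply-templates select="$normalized/node()"/>
  </xsl:template>

  <!-- Pass 2: identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 2: hypothetical removal rule, now matched by plain element name -->
  <xsl:template match="local[@obsolete = 'true']"/>

</xsl:stylesheet>
```

Holding the normalized copy in a variable means a second tree in memory, which matters at 30-50MB; running the first pass as its own transformation ahead of the main stylesheet is an alternative with the same effect on the match patterns.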