Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old September 26th, 2006, 03:06 AM
Authorized User
 
Join Date: Sep 2006
Posts: 92
Thanks: 0
Thanked 0 Times in 0 Posts
Strategies for large XML files

Hi Everyone,

As my users find their way around the XML/XSL reports that we have built, they are asking for more and more detailed data. This is great but it means that our XML files are getting bigger and bigger and the access time for retrieving data is getting longer and longer.

I am currently investigating switching the data sources (i.e. splitting up the data into various XML files that would, for example, hold distinct time periods) and hope that this will speed things up.

But I also wanted to ask if anyone out there has any ideas. I suppose that, at some point, I will be forced to resort to using a database back-end (with indexing), but I am curious to know where the practical limits of accessing large XML files lie.

Any tips and any anecdotes about your experience would be very interesting for me.

Regards and thanks,
Alan Searle.

 
Old September 26th, 2006, 03:35 AM
joefawcett's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts

Large XML files are difficult unless you are prepared to invest in XML-specific hardware, a solution that one company I worked with tried out. Most of our data is held in a relational database, and the XML emerges after the filters are applied; it is then transformed. We do store such things as invoices as XML, since that helps recreate them for audits etc., but they are not huge.

You can also use SAX or .NET's XmlReader for linear processing. This is often a good start for breaking down large documents that contain repetitive data, for example multiple invoices. You can then process each section using XSLT.
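A minimal sketch of the linear-processing idea, written in Python rather than .NET. It uses the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for SAX/XmlReader. The `<invoices>`/`<invoice>`/`<total>` layout and the function names are assumptions made up for illustration, and the sketch assumes the repeated elements are direct children of the root.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: many repeated <invoice> records under one root.
SAMPLE = b"""<invoices>
  <invoice id="1"><total>10.50</total></invoice>
  <invoice id="2"><total>4.25</total></invoice>
  <invoice id="3"><total>7.00</total></invoice>
</invoices>"""


def iter_records(source, tag):
    """Yield each <tag> element once it has been completely parsed."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Detach records already handled so the tree does not keep growing.
            root.clear()


def invoice_totals(source):
    """Process one invoice at a time instead of loading the whole document."""
    return {
        inv.get("id"): float(inv.findtext("total"))
        for inv in iter_records(source, "invoice")
    }


totals = invoice_totals(io.BytesIO(SAMPLE))
```

Each yielded element is a small, self-contained fragment, so it could be handed to an XSLT processor on its own instead of being summed as it is here.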


--

Joe (Microsoft MVP - XML)
 
Old September 26th, 2006, 09:58 AM
Authorized User
 
Join Date: Sep 2006
Posts: 92
Thanks: 0
Thanked 0 Times in 0 Posts

Thanks very much for this tip: I googled the key words and there seem to be some good sources I can use.

Cheers,
Alan.

 
Old September 26th, 2006, 12:37 PM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

I've come across users who asked about processing "large" files and then discovered they meant 1Mb. The first thing is to provide some numbers.

A lot depends on the access pattern. If you're loading the document into memory in order to get one piece of information out of it, then the parsing time is the dominant factor; or rather, the relationship of the parsing time to your required response time. Otherwise it may be memory that's the limiting factor. Or it might be that you're doing complex joins and the queries are showing O(n^2) performance, in which case you can probably solve the problem using keys.
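To illustrate why keys help with joins, here is a rough analogy in Python; it is not XSLT. In a stylesheet the index would be declared with `xsl:key` and queried with `key()`; processors typically build a lookup table for it once. The record layout and the function names below are invented for the sketch.

```python
from collections import defaultdict


def join_nested(orders, customers):
    """Naive join: rescans every customer for every order (n * m comparisons)."""
    out = []
    for order in orders:
        for cust in customers:
            if cust["id"] == order["customer"]:
                out.append((order["id"], cust["name"]))
    return out


def join_indexed(orders, customers):
    """Keyed join: build the index once, then do one dictionary lookup per order."""
    by_id = defaultdict(list)
    for cust in customers:
        by_id[cust["id"]].append(cust)
    return [
        (order["id"], cust["name"])
        for order in orders
        for cust in by_id.get(order["customer"], [])
    ]


# Hypothetical data standing in for two sets of XML elements.
customers = [{"id": "c1", "name": "Acme"}, {"id": "c2", "name": "Globex"}]
orders = [
    {"id": "o1", "customer": "c2"},
    {"id": "o2", "customer": "c1"},
    {"id": "o3", "customer": "c2"},
]
```

Both functions return the same pairs; only the amount of work differs, and the gap widens as the inputs grow.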

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
 
Old September 27th, 2006, 02:46 AM
Authorized User
 
Join Date: Sep 2006
Posts: 92
Thanks: 0
Thanked 0 Times in 0 Posts

Hi Michael,

...

[quote]Originally posted by mhkay
 I've come across users who asked about processing "large" files and then discovered they meant 1Mb. The first thing is to provide some numbers.[/quote]

I seem to start having problems over 5Mb, but this is because I 'graze' the XML file to get the contents of the picklists that I display in the header.

My plan is to generate picklists separately and access them without reading the XML file.

I also want to split my data files (XMLs) and then connect them 'on-demand'. This would mean that I could probably reduce the size of each file to about 2Mb.

[quote]Originally posted by mhkay
 A lot depends on the access pattern. If you're loading the document into memory in order to get one piece of information out of it, then the parsing time is the dominant factor; or rather, the relationship of the parsing time to your required response time. Otherwise it may be memory that's the limiting factor. Or it might be that you're doing complex joins and the queries are showing O(n^2) performance, in which case you can probably solve the problem using keys.[/quote]

I do all the joining in an Oracle DB before I export to XML.

It's interesting that keys can help speed: I currently use them for generating my picklists and for grouping. I will investigate how I can use them more.

Many thanks for your tips.

Cheers,
Alan.


 
Old September 27th, 2006, 03:31 AM
joefawcett's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts

I've never had problems on a server with files smaller than 100Mb. In my experience a DOM takes about 3 to 4 times the file size in memory, so a 100Mb file needs roughly 350Mb of RAM. For creating picklists I'd use XmlReader/SAX rather than complex XSLT. If you are using .NET and can show an example of the file structure, I'll try to come up with an example if you need it.
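As a rough illustration of building a picklist with SAX instead of XSLT, here is a small Python sketch using the standard `xml.sax` module. The real file structure was not posted, so the `<report>`/`<row>`/`<region>` layout and the names `PicklistHandler` and `picklist` are assumptions.

```python
import io
import xml.sax


class PicklistHandler(xml.sax.ContentHandler):
    """Collect the distinct text values of one element name in a single pass."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.values = set()
        self._buf = None  # a list while inside the target element, else None

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buf = []

    def characters(self, content):
        # Text can arrive in several chunks, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self.tag and self._buf is not None:
            self.values.add("".join(self._buf).strip())
            self._buf = None


def picklist(source, tag):
    """Return the sorted distinct values of <tag> without building a DOM."""
    handler = PicklistHandler(tag)
    xml.sax.parse(source, handler)
    return sorted(handler.values)


# Hypothetical sample data.
SAMPLE = (
    b"<report>"
    b"<row><region>North</region></row>"
    b"<row><region>South</region></row>"
    b"<row><region>North</region></row>"
    b"</report>"
)

regions = picklist(io.BytesIO(SAMPLE), "region")
```

The handler only ever holds the set of distinct values, so memory use depends on the size of the picklist and not on the size of the file.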

--

Joe (Microsoft MVP - XML)
 
Old September 27th, 2006, 09:33 AM
mhkay's Avatar
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

If the size is 5Mb, that should be quite manageable. What kind of performance are you seeing, and what performance do you require?

There will be two aspects to the cost: XML parsing time and transformation time. The parsing time will be fixed, and there's no way of getting this down other than reducing the file size. The transformation time depends on your code, and it might be possible to get it down considerably. Try to measure the two components separately so you can see where the costs are being incurred.
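One way to measure the two components separately is sketched below in Python. The standard library has no XSLT processor, so `stand_in_transform` is only a placeholder for the real transformation step; it, `timed`, `measure` and the sample data are hypothetical names and values.

```python
import io
import time
import xml.etree.ElementTree as ET


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def stand_in_transform(root):
    """Placeholder for the XSLT step: here it just counts <row> elements."""
    return sum(1 for _ in root.iter("row"))


def measure(xml_bytes):
    """Time parsing and processing as two separate figures."""
    tree, parse_s = timed(ET.parse, io.BytesIO(xml_bytes))
    result, transform_s = timed(stand_in_transform, tree.getroot())
    return {"parse_s": parse_s, "transform_s": transform_s, "result": result}


# Hypothetical input: 1000 empty rows.
SAMPLE = b"<report>" + b"<row/>" * 1000 + b"</report>"

timings = measure(SAMPLE)
```

If the parse figure dominates, only a smaller input will help; if the processing figure dominates, the stylesheet itself is the place to look.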

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
 
Old September 28th, 2006, 02:38 AM
Authorized User
 
Join Date: Sep 2006
Posts: 92
Thanks: 0
Thanked 0 Times in 0 Posts

Hi Michael, hi Joe,

I am in the middle of a redesign phase at the moment and plan to remove all 'picklist' generation to an external file which will mean that my code doesn't have to 'graze' the source XML file any more.

I will see what speed this brings and will then come back to you.

It is very encouraging that you say that larger files should be no problem. They will also be accessed over a network so I will see how that performs.

I'll give more feedback as soon as I have implemented my changes.

Many thanks,
Alan.


Copyright (c) 2020 John Wiley & Sons, Inc.