p2p.wrox.com Forums

xslt thread: Transforming large XML-Files


Message #1 by "Tobias Blickle" <T.Blickle@i...> on Fri, 07 Jun 2002 07:58:41 +0200
Yes, these are all good suggestions.

Another suggestion is to write a SAX filter that runs before the
transformation. The SAX filter can split the document up into lots of
small documents and transform each one separately; if necessary you can
have another SAX filter after the transformation that puts the results
back together again.
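
A minimal sketch of the splitting idea, assuming the records to split on are
the <reportitem> elements from the original question and that each one can be
transformed in isolation. The class name ChunkedTransform and its run() helper
are hypothetical; the sketch uses only the standard JAXP and SAX interfaces,
forwards only element and text events, and ignores namespace prefix mappings
declared outside a record.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: transforms each record element as its own small document. */
public class ChunkedTransform extends DefaultHandler {
    private final SAXTransformerFactory factory;
    private final Templates templates;   // compiled stylesheet, reused for every chunk
    private final String recordName;     // local name of the element to split on
    private final Writer out;            // all chunk results are appended here
    private TransformerHandler current;  // non-null only while inside a record
    private int depth;                   // element nesting depth inside the record

    ChunkedTransform(SAXTransformerFactory factory, Templates templates,
                     String recordName, Writer out) {
        this.factory = factory;
        this.templates = templates;
        this.recordName = recordName;
        this.out = out;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (current == null && local.equals(recordName)) {
            try {
                // A fresh handler per record, so only one record is held in memory.
                current = factory.newTransformerHandler(templates);
            } catch (TransformerConfigurationException e) {
                throw new SAXException(e);
            }
            current.setResult(new StreamResult(out)); // set before startDocument()
            current.startDocument();
            depth = 0;
        }
        if (current != null) {
            depth++;
            current.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (current != null) {
            current.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (current == null) {
            return; // outside any record: nothing to forward
        }
        current.endElement(uri, local, qName);
        if (--depth == 0) {
            current.endDocument(); // ends the small document for this record
            current = null;
        }
    }

    /** Parses xml with SAX and transforms every recordName element separately. */
    static String run(String xml, String xslt, String recordName) throws Exception {
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates templates = tf.newTemplates(new StreamSource(new StringReader(xslt)));
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // needed so local names are reported
        StringWriter out = new StringWriter();
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new ChunkedTransform(tf, templates, recordName, out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xslt = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='reportitem'><xsl:text>X</xsl:text></xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<report><reportitem>a</reportitem><reportitem>b</reportitem></report>";
        System.out.println(run(xml, xslt, "reportitem"));
    }
}
```

For a real 200 MB input you would pass a file-based InputSource and a
buffered file Writer instead of strings. Note that each record becomes the
root of its own document, so templates that look at ancestors or siblings of
the record will not see them; and with XML output you would probably want
omit-xml-declaration="yes" plus a wrapper element written around the chunks.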

Saxon's default tree implementation probably takes less memory than JDOM
and is certainly faster.

200 MB is feasible for XSLT, but you should have at least 2 GB of physical
memory, and make sure enough of it is actually allocated to the Java VM.
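
To check how much heap the Java VM has really been given, something like the
following can be run with the same options as the transformation. This is
only a sketch: the class name HeapCheck is made up, and the -Xmx value in the
usage line is an illustrative number, not a recommendation.

```java
/** Hypothetical helper: reports the maximum heap this JVM may use. */
public class HeapCheck {
    /** Maximum heap size in megabytes, as reported by the running JVM. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: "
                + maxHeapMegabytes() + " MB");
    }
}
```

Usage: java -Xmx1500m HeapCheck
If the reported figure is far below what you passed with -Xmx, the option is
probably not reaching the VM that runs the transformation.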

Michael Kay
Software AG
home: Michael.H.Kay@n...
work: Michael.Kay@s... 

> -----Original Message-----
> From: Guijt, Bart (fin) [mailto:Bart.Guijt@o...] 
> Sent: 07 June 2002 08:34
> To: P2P_XSLT
> Subject: [xslt] RE: Transforming large XML-Files
> 
> 
> I see!
> 
> Here are some options:
> 
> 1. Use the incremental feature in Xalan: call
> setAttribute( "http://xml.apache.org/xalan/features/incremental",
> Boolean.TRUE ) on your TransformerFactory instance. This _should_
> tell Xalan to process the document while it is being read, instead
> of afterwards. I believe this feature is not finished yet (as of
> 2.3.1), but I'm not sure.
> 
> 2. Saxon has a preview mode which enables you to attach a
> handler to certain elements. I don't know the details, but this
> is not a generic way of 'stream' processing your XML: your Java
> code needs to know some details about the transformation.
> 
> 3. Find an XML parser which uses as little memory as possible.
> JDOM is perhaps the best candidate here, and it is supported by Saxon.
> 
> 4. Use JVM parameters like '-Xmx256m'
> 
> 
> - Bart Guijt
> 
> Ordina Finance Utopics
> 
> E: bart.guijt@o...
> W: http://www.utopics.nl/
> 
> A: Paalbergweg 46
>    1105 BW Amsterdam
> P: Postbus 94690
>    1090 GR Amsterdam
> 
> T: +xx xx xxx xxxx
> F: +xx xx xxx xxxx
> M: +xx x xxxx xxxx
> 
> 
> 
> > ----------
> > From: 	Tobias Blickle[SMTP:T.Blickle@i...]
> > Reply To: 	P2P_XSLT
> > Sent: 	Friday, June 07, 2002 8:52 AM
> > To: 	P2P_XSLT
> > Subject: 	[xslt] Antw: RE: Transforming large XML-Files
> > 
> > Dear Bart,
> > 
> > that's exactly what I have already done. But then I have to
> > implement the desired transformation in Java - and need a new
> > class for each new transformation.
> > 
> > Of course I don't want to output "X" - but I ended up with this
> > simple stylesheet that still can't be transformed without an
> > "out of memory" error.
> > 
> > Regards
> > Tobias
> > 
> > >>> Bart.Guijt@o... 07.06.2002 08:30:11 >>>
> > Perhaps the best you could try is to write your own SAX 
> > ContentHandler, which takes your data from a SAXParser and performs 
> > simple actions on that.
> > If you don't need much context information *and* performance is an
> > issue, SAX is the way to go IMHO.
> > 
> > What exactly is your transformation - just outputting the 'X'?
> > 
> > Ciao,
> > 
> > 
> > - Bart Guijt
> > > ----------
> > > From: 	Tobias Blickle[SMTP:T.Blickle@i...] 
> > > Hi,
> > > 
> > > I am trying to transform a large XML file (about 200 MB) into
> > > another file using XSLT. I use quite simple transforms, but the
> > > XSLT processors (Xalan and Saxon) run out of memory.
> > > 
> > > I'm aware that in the general case the XML input file must be
> > > stored as a DOM tree in memory; however, I hoped there might be
> > > some XSLT statements and some XSLT implementations that do not
> > > store the whole document in memory in every case.
> > > 
> > > Does anybody have a solution to this?
> > > 
> > > Regards,
> > > Tobias
> > > 
> > > PS: The XML source document contains about 10,000 <reportitem>
> > > elements, and all I try to do is write an "X" to the output for
> > > each element in the document using:
> > > <xsl:template match="reportitem"><xsl:text>X</xsl:text></xsl:template>
> > > 
> > > 
> > 
> > 
> > 
> 
> 

