Wrox Programmer Forums
XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
April 8th, 2012, 07:57 PM
Registered User
 
Join Date: Apr 2012
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts
How to split +2Gb XML into 500Mb chunks?

I have several XML files larger than 2 GB (.osm exports from openstreetmap.org) which I want to 'shred' into SQL Server. For that, 2 GB minus 1 byte is the maximum size I can handle.

The structure is like this:
Code:
<?xml version='1.0' encoding='UTF-8'?>
<osm version="" generator="">
  <node id="" lat="" lon="" version="" changeset="" user="" uid="" timestamp=""/>
  <node id="" lat="" lon="" version="" changeset="" user="" uid="" timestamp="">
      <tag k="" v="" />
      <tag k="" v="" />
  </node>
  <way id="" version="" changeset="" user="" uid="" timestamp="">
      <nd ref=""/>
      <nd ref=""/>
      <tag k="" v="" />
      <tag k="" v="" />
  </way>
  <relation id="" version="" changeset="" user="" uid="" timestamp="">
      <member type="" ref="" role=""/>
      <member type="" ref="" role=""/>
      <tag k="" v="" />
      <tag k="" v="" />
  </relation>
</osm>
In short: millions of nodes, followed by millions of ways, followed by millions of relations, so splitting into one file per node/way/relation is not really an option.

I don't know that much about XML, and I don't even know whether current XSLT processors can handle files larger than 2 GB.

I managed to read the file line by line into a table and use a cursor in T-SQL to combine the lines into well-formed XML chunks for further processing. It does the trick, but takes 'forever' (several hours) even for far smaller files (60 MB). So I was wondering whether an XSLT 2.0 transform could speed up the chunking.

Cheers
 
April 9th, 2012, 03:38 AM
mhkay
Wrox Author
 
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts

Most XSLT processors can't handle anything approaching this size. You'll need a processor that handles streaming, such as Saxon-EE. However, this is still bleeding edge, so I'm a bit reluctant to recommend it to a newbie: there's certainly a learning curve. You say you want the data in 500 MB chunks, but I wonder whether that's really your requirement. The question is: what exactly will SQL Server accept that ends up creating a useful database?
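
If the chunking does not have to happen in XSLT, a streaming parser in a general-purpose language is one possible way to produce well-formed chunks. The following is only a sketch in Python, not something taken from this thread: the function name split_osm, the chunk_NNNN.osm file names and the max_bytes parameter are made up for illustration. It assumes the flat structure shown in the first post (node, way and relation elements directly under osm); any other top-level elements are not copied.

```python
import os
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Assumption: these are the only top-level record types, as in the posted structure.
RECORD_TAGS = ("node", "way", "relation")


def split_osm(source, out_dir, max_bytes=500 * 1024 * 1024):
    """Copy top-level records into numbered chunk files of roughly max_bytes each.

    source is a file name or a binary file object. Every chunk repeats the
    root start tag and end tag, so each one is well-formed on its own.
    Returns the list of chunk paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element

    # Build the wrapper now, because root.clear() below also wipes the attributes.
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in root.attrib.items())
    header = f"<?xml version='1.0' encoding='UTF-8'?>\n<{root.tag}{attrs}>\n".encode("utf-8")
    footer = f"</{root.tag}>\n".encode("utf-8")

    paths, out, size = [], None, 0
    for event, elem in context:
        if event != "end" or elem.tag not in RECORD_TAGS:
            continue
        record = ET.tostring(elem, encoding="unicode").strip().encode("utf-8") + b"\n"
        # Close the current chunk if this record would push it over the limit.
        if out is not None and size + len(record) + len(footer) > max_bytes:
            out.write(footer)
            out.close()
            out = None
        if out is None:
            path = os.path.join(out_dir, f"chunk_{len(paths) + 1:04d}.osm")
            paths.append(path)
            out = open(path, "wb")
            out.write(header)
            size = len(header)
        out.write(record)
        size += len(record)
        root.clear()  # discard finished records so memory use stays flat
    if out is not None:
        out.write(footer)
        out.close()
    return paths
```

A call such as split_osm("uk.osm", "chunks") would write chunks/chunk_0001.osm and so on (both names are placeholders). A single record larger than max_bytes still ends up in a chunk of its own, so the limit is approximate. Note also that a way or relation may land in a different chunk from the nodes it references.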

Rather than splitting the data, another approach might be to write a program that reads the XML in a streaming mode, and issues SQL insert instructions for each "record" that it encounters.
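
A minimal sketch of that idea in Python, under some loud assumptions: sqlite3 stands in for SQL Server here (a real load would go through a SQL Server driver instead), the node and node_tag tables and the load_nodes function are hypothetical, and only node records are loaded. Ways and relations would follow the same pattern.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_nodes(source, conn, batch_size=10_000):
    """Stream <node> records from an .osm file into SQL, one batch at a time.

    source is a file name or a binary file object; conn is a sqlite3
    connection (a stand-in for the real target database).
    Returns the number of nodes read.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS node (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS node_tag (node_id INTEGER, k TEXT, v TEXT)")
    nodes, tags, total = [], [], 0

    def flush():
        # Send the pending rows as parameterised inserts, then start a new batch.
        conn.executemany("INSERT INTO node VALUES (?, ?, ?)", nodes)
        conn.executemany("INSERT INTO node_tag VALUES (?, ?, ?)", tags)
        conn.commit()
        nodes.clear()
        tags.clear()

    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of <osm>
    for event, elem in context:
        # Only act once a complete top-level record has been parsed.
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            node_id = int(elem.get("id"))
            nodes.append((node_id, float(elem.get("lat")), float(elem.get("lon"))))
            tags.extend((node_id, t.get("k"), t.get("v")) for t in elem.findall("tag"))
            total += 1
            if len(nodes) >= batch_size:
                flush()
        root.clear()  # drop the finished record so memory use stays flat
    flush()
    return total
```

Batching the inserts rather than issuing one statement per record is a design choice made for this sketch; the right batch size depends on the target database.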

Another approach might be to try and persuade OSM to generate multiple smaller chunks of XML by refining your search criteria.
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
April 10th, 2012, 02:29 PM
Registered User
 
Join Date: Apr 2012
Posts: 2
Thanks: 0
Thanked 0 Times in 0 Posts

Thanks for your thoughts.

Yes, there is SSIS, which is probably capable of loading this straight into tables.

And no, I cannot be absolutely sure that the smaller subsets (counties) of my area of interest (UK) actually cover the whole UK until I've tested this (there are a few counties missing, but they could be incorporated into other ones).

As for requirements, this is just trying to replicate openstreetmap.org db functionality in SQL Server.

I don't mind a bit of a learning curve, though I have to admit that XML is not on my plate regularly, so I usually end up re-inventing even my own wheels on the rare occasions when I need to do something with XML.

Cheers

Copyright (c) 2020 John Wiley & Sons, Inc.