Wrox Programmer Forums
Go Back   Wrox Programmer Forums > XML > XML
XML General XML discussions.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the XML section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
Old October 21st, 2013, 04:43 PM
Registered User
Join Date: Jun 2012
Posts: 6
Thanks: 2
Thanked 0 Times in 0 Posts
Default Importing ~1Gb XML into MSAccess

Dear all,
I've tried different approaches for importing a ~1Gb XML file into MSAccess unsuccessfully.

The approaches I've taken are:
1. Using the default ImportXml method. Problem: Since the XML file has multiple levels of nesting, data are imported into different tables without a common ID, so I can't relate the records between tables.
2. Importing first into Excel. Problem: Never finish loading.
3. Applying a transformation using the MSXML library. Problem: After 15 hours I stopped the process. Never finished loading.
4. Applying the same transformation using Saxon-B. Crashes while creating the tree.
5. Using SAX inside MSAccess. RAM consumption very low (~38Mb). Problem: The database bloated very fast up to 1.9 Gb in ~10min. After compact and repair the size returned to 24Mb. 5% of the records effectively imported up to that point.

It seems that no matter what approach I take, I'm going to have to split the xml file into several pieces and process it in chunks.

My next step will be using the reader and the writer of the MSXML library for splitting the file.

However, I'm kind of exhausted and would prefer not to think too much anymore for doing this.

Can somebody point me to a free tool, method, or language that can handle the partition effectively with the simplicity of couple clicks or couple lines of code?

Thanks in advance.

Old October 22nd, 2013, 10:55 AM
samjudson's Avatar
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts

You seem to have tried everything I can think of in the 'simple' category of solutions.

Have you considered trying to parse it into an intermediate format - from your hierarchical XML format to a more flat XML format that one of the other tools you've tried might not struggle with so much? XSLT would probably be good for that I'd suggest, or SAX perhaps.

The MSXML library is pretty dated now, so I wouldn't expect it to handle such a big file very well I'm afraid. Saxon-B is also very dated I believe. The latest 'free' version is Saxon-HE. Saxon-EE (enterprise edition) supports file streaming, which would be good for such as large file as this.
/- Sam Judson : Wrox Technical Editor -/

Think before you post: What have you tried?
Old October 22nd, 2013, 12:05 PM
Registered User
Join Date: Jun 2012
Posts: 6
Thanks: 2
Thanked 0 Times in 0 Posts

Thank you Sam.
It seems that the problem was the database itself.
I imported all my objects into a new database, fix my code a little bit and now it seems to be working properly. No bloating noted so far, but I will be able to confirm that just when the process finishes.
I forgot to mention explicitly that I had tried XSLT also, but since the xml is ~1Gb and I have 4Gb of RAM that approach didn't work.
Thank you very much.

Similar Threads
Thread Thread Starter Forum Replies Last Post
importing XML file in to xslt ashyabhi_hp XSLT 2 September 22nd, 2007 11:15 PM
Importing XML into database siachen00 Other Programming Languages 0 April 21st, 2006 09:35 AM
Importing XML into database siachen00 XML 0 April 21st, 2006 09:30 AM
Importing Data to MsAccess ranakdinesh ADO.NET 2 February 24th, 2005 10:03 AM
Does this book deal with Importing XML? zardos BOOK: Access 2003 VBA Programmer's Reference 0 September 22nd, 2004 06:00 AM

Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.