Wrox Programmer Forums

#1 September 18th, 2006, 09:35 AM
Registered User, Hyderabad, AP, India
Parse/Load/Search an XML file of nearly 1 GB

Hi all,
I have been given a problem set in which I need to develop a .NET application that can parse, load, and search an XML document of about 1 GB, and I am told not to use a database. Please help me solve this problem. How can I achieve this?

#2 September 18th, 2006, 01:00 PM
mhkay (Wrox Author), Reading, Berks, United Kingdom

It depends very much on the nature of the "search". You either need to allocate a fairly large amount of memory, or you need to search using a low-level streaming technology such as SAX, StAX, or STX. There are some XSLT and XQuery products that can handle a limited range of searches using serial processing: for example, in XSLT, Saxon-SA has a serial processing mode for a very restricted class of XPath expressions. Some products, such as DataDirect XQuery, have an option to do "document projection", in which the parts of the document that aren't accessed by the query aren't loaded into memory.
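A minimal sketch of the StAX (pull-parsing) approach, in Java via the standard `javax.xml.stream` API; in .NET the analogous forward-only reader is `System.Xml.XmlReader`. The parser hands over one event at a time, so memory use stays flat however large the file is, and the search stops as soon as it finds a match. The element and attribute names (`etext`, `title`, `rdf:ID`) are assumptions about the catalog layout, `StreamingSearch` and `findTitle` are hypothetical names, and the embedded sample document is made up.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingSearch {
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Reads forward through the document and returns the trimmed <title> text of the
    // first <etext> whose rdf:ID equals id, or null if no element matches.
    // The element names are assumptions about the input layout.
    public static String findTitle(XMLStreamReader r, String id) throws XMLStreamException {
        boolean inMatch = false;   // inside the <etext> we are looking for
        boolean inTitle = false;   // inside that element's <title>
        StringBuilder title = new StringBuilder();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();          // name without its prefix
                if (name.equals("etext")) {
                    inMatch = id.equals(r.getAttributeValue(RDF_NS, "ID"));
                } else if (inMatch && name.equals("title")) {
                    inTitle = true;
                }
            } else if (inTitle && (ev == XMLStreamConstants.CHARACTERS
                    || ev == XMLStreamConstants.CDATA)) {
                title.append(r.getText());               // text may arrive in several chunks
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (inTitle && name.equals("title")) {
                    return title.toString().trim();      // stop early; the rest is never read
                }
                if (name.equals("etext")) {
                    inMatch = false;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up two-book sample; the real catalog layout may differ.
        String xml = "<rdf:RDF xmlns:rdf='" + RDF_NS + "'"
                + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
                + " xmlns:pgterms='http://www.gutenberg.org/rdfterms/'>"
                + "<pgterms:etext rdf:ID='etext1'><dc:title>First</dc:title></pgterms:etext>"
                + "<pgterms:etext rdf:ID='etext2'><dc:title>Second</dc:title></pgterms:etext>"
                + "</rdf:RDF>";
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        System.out.println(findTitle(r, "etext2"));    // prints Second
    }
}
```

For the real file you would pass a `FileInputStream` to `createXMLStreamReader` instead of a `StringReader`. Each call rescans the file from the start, so this suits occasional queries; when many queries follow, building an index in a single pass is the usual alternative.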

When I see constraints like "I should not use a database", my question is always "Why?". What are the real requirements that make a database an unacceptable solution?

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
#3 September 19th, 2006, 12:21 AM
Registered User, Hyderabad, AP, India

Thank you for your reply, Michael. This is the problem set I was given to solve. They want it done without a database; maybe they are thinking, "Why store it in a database again when you already have it in XML?" :)

#4 September 19th, 2006, 02:43 AM
mhkay (Wrox Author), Reading, Berks, United Kingdom

I think it's always a good idea to question requirements. If "they" don't want a database, there could be any number of reasons: cost of purchase, cost of administration, performance of database loading. If you discover the real reasons, you may find that they also rule out some non-database solutions, and you may find that they don't rule out some solutions that do use a database. Users, managers, and customers have a right to define the requirements, but they don't have a right to make design decisions; that's the job of the engineer.

Michael Kay
#5 September 19th, 2006, 04:47 AM
Registered User, Hyderabad, AP, India

Hi Michael,
Yes, that's true. I took part in a tech fest and this problem set is from that tech fest, so I can't ask them about the requirements.
The whole problem is as follows:
Objective
    Define a development approach to Parse/Load/Search an XML document of size ~1GB
Description
    Project Gutenberg (www.gutenberg.org) maintains a list of books in RDF format. An offline version of the same is available at \\ht-dynapps\gutenberg
    You need to provide the following APIs that will allow you to use the contents:
    // Given a start and an end index, incrementally returns books from the list (Google-style paging)
    public List<Book> GutenbergBookManager.getBooks(int start, int end)
    // Given the ID of a book, searches the document and returns the book details
    public Book GutenbergBookManager.getBook(String id)
    // Given a search phrase, returns the list of books with a matching subject (the word occurring anywhere in the subject line)
    public List<Book> GutenbergBookManager.searchBook(String subject)
* Assume that you do not have the luxury of dumping the data into a relational database.
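One way to meet all three API requirements without a database is to stream the RDF once at startup, keep only the fields the API returns, and build small in-memory indexes: a list in document order for paging, a hash map for ID lookup, and an inverted index from subject words to books. The Java sketch below rests on several assumptions not in the problem statement: each book is an `etext` element carrying an `rdf:ID`, with `title` and `subject` children (a subject may wrap its text in nested elements, which the code tolerates); the `Book` fields are hypothetical; `end` in `getBooks` is exclusive; and a multi-word phrase matches books whose subjects contain every word.

```java
import java.io.InputStream;
import java.util.*;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class GutenbergBookManager {
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Minimal book record; these fields are an assumption, not part of the problem statement.
    public static class Book {
        public final String id;
        public String title = "";
        public final List<String> subjects = new ArrayList<>();
        Book(String id) { this.id = id; }
    }

    private final List<Book> books = new ArrayList<>();          // document order, for paging
    private final Map<String, Book> byId = new HashMap<>();      // for getBook
    private final Map<String, Set<Integer>> bySubjectWord = new HashMap<>(); // word -> positions

    // One streaming pass over the catalog; only the extracted fields stay in memory.
    public GutenbergBookManager(InputStream in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Book current = null;
        String field = null;                     // "title" or "subject" while inside one
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("etext")) {
                    current = new Book(r.getAttributeValue(RDF_NS, "ID"));
                } else if (current != null && field == null
                        && (name.equals("title") || name.equals("subject"))) {
                    field = name;
                    text.setLength(0);
                }
            } else if (field != null && (ev == XMLStreamConstants.CHARACTERS
                    || ev == XMLStreamConstants.CDATA)) {
                text.append(r.getText());        // also collects text of nested children
            } else if (ev == XMLStreamConstants.END_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals(field)) {
                    String value = text.toString().trim();
                    if (field.equals("title")) current.title = value;
                    else if (!value.isEmpty()) current.subjects.add(value);
                    field = null;
                } else if (name.equals("etext") && current != null) {
                    index(current);
                    current = null;
                }
            }
        }
        r.close();                               // does not close the underlying stream
    }

    private void index(Book b) {
        int pos = books.size();
        books.add(b);
        byId.put(b.id, b);
        for (String s : b.subjects)
            for (String w : words(s))
                bySubjectWord.computeIfAbsent(w, k -> new TreeSet<>()).add(pos);
    }

    // Lower-cased runs of letters/digits; this tokenization rule is an assumption.
    private static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        for (String w : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            if (!w.isEmpty()) out.add(w);
        return out;
    }

    // end is treated as exclusive (an assumption); out-of-range bounds are clamped.
    public List<Book> getBooks(int start, int end) {
        int from = Math.max(0, start), to = Math.min(books.size(), end);
        if (from >= to) return new ArrayList<>();
        return new ArrayList<>(books.subList(from, to));
    }

    public Book getBook(String id) { return byId.get(id); }

    // Books whose subjects contain every word of the phrase (AND semantics is an assumption).
    public List<Book> searchBook(String subject) {
        Set<Integer> hits = null;
        for (String w : words(subject)) {
            Set<Integer> s = bySubjectWord.getOrDefault(w, Collections.emptySet());
            if (hits == null) hits = new TreeSet<>(s);
            else hits.retainAll(s);
        }
        List<Book> out = new ArrayList<>();
        if (hits != null)                        // null means the phrase had no words
            for (int pos : hits) out.add(books.get(pos));
        return out;
    }
}
```

For the real file, open a `BufferedInputStream` over a `FileInputStream` in a try-with-resources block and pass it to the constructor. The 1 GB source is read only once; afterwards `getBook` is a hash lookup, `getBooks` copies a sublist, and `searchBook` intersects per-word position sets. Whether the extracted fields fit comfortably in the heap is itself an assumption worth checking against the actual catalog.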


Similar Threads
- how to parse the XML file and DTD through Xerces (tufailfifa, C++ Programming, 0 replies, last post June 25th, 2007 07:03 AM)
- how to parse the XML file and DTD (tufailfifa, XML, 0 replies, last post June 25th, 2007 07:02 AM)
- parse error xml load document asp.net (academics2006, ASP.NET 1.0 and 1.1 Basics, 0 replies, last post March 13th, 2006 02:21 PM)
- parse xml file with Xerces-C_2_5_0 ,DOM (taianmhzy, XML, 0 replies, last post May 27th, 2004 04:14 AM)
- max size for xml file (dg1234, XML, 1 reply, last post October 22nd, 2003 03:14 AM)



All times are GMT -4.


© 2013 John Wiley & Sons, Inc.