Wrox Programmer Forums

Need to download code?

View our list of code downloads.

Go Back   Wrox Programmer Forums > C# and C > C# 1.0 > C#
Password Reminder
Register
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read
C# Programming questions specific to the Microsoft C# language. See also the forum Beginning Visual C# to discuss that specific Wrox book and code.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the C# section of the Wrox Programmer to Programmer discussions. This is a community of tens of thousands of software programmers and website developers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining today you can post your own programming questions, respond to other developers’ questions, and eliminate the ads that are displayed to guests. Registration is fast, simple and absolutely free .
DRM-free e-books 300x50
Reply
 
Thread Tools Search this Thread Display Modes
#1
July 14th, 2008, 05:47 AM
Registered User
 
Join Date: Jul 2008
Location: Thiruvananthapuram, Kerala, India.
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
Search a binary file of a million records

My company is about to start a new project in which the back end is supposed to be a file system instead of a database server, and this file is going to contain millions of records.

Since a flat file has to be searched sequentially, it might take a lot of time to retrieve a specific record. Some of my colleagues told me that using indexing would make the search much faster, but as of now I don't have much of an idea about how to implement indexing.

I would like to have some of your suggestions and solutions regarding this matter. Any kind of help would be much appreciated.

Thank you. :)

#2
July 14th, 2008, 05:54 AM
samjudson
Friend of Wrox
 
Join Date: Aug 2007
Location: Newcastle, United Kingdom.
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts

http://en.wikipedia.org/wiki/Index_(database)

/- Sam Judson : Wrox Technical Editor -/
#3
July 14th, 2008, 10:23 AM
planoie
Friend of Wrox
 
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,407
Thanks: 0
Thanked 16 Times in 16 Posts

What is the justification for using a flat file for something so large?
Why write your own data indexing when you could use a real database engine that already has all of that functionality?

-Peter
compiledthoughts.com
#4
July 15th, 2008, 12:43 AM
Registered User
 
Join Date: Jul 2008
Location: Thiruvananthapuram, Kerala, India.
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts

Quote: Originally posted by planoie
What is the justification for using a flat file for something so large?
Why write your own data indexing when you could use a real database engine that already has all of that functionality?

We are not planning to use a database server for this project. We are planning to do something like Peachtree, which is the #1 accounting software in the US. It uses the file system as its back end.

Ours is also a web-based application and needs a lot of security. I would like to have some ideas regarding storing data in files, using an index for that data, and how to make the search faster in such a scenario.

So please give suggestions regarding this matter; they would be highly appreciated.

Thanks. :-)

#5
July 15th, 2008, 03:14 AM
Friend of Wrox
 
Join Date: Mar 2007
Location: Hampshire, United Kingdom.
Posts: 432
Thanks: 0
Thanked 1 Time in 1 Post

Wow, that's crazy.

Just a couple of points to reiterate what the other guys have said here:

- Just because something is "#1" doesn't mean it is built "right". MS Office is one of the most widely used office packages out there. Does that mean it's put together right (or as well as it could be, based on modern software practices)? Probably not. Peachtree is really old; just because it has a following doesn't mean you should duplicate its architecture.

- Based on the above, we know that a database could outperform this by far, so why settle for second best?

- Also, there will be major development time needed to "fill the void" that not using a DB will incur.

- As for security, if you are using a file-based system, the only security you really have is control over the ACL. If using a DB, you have an additional layer of security (both domain access and database access).

- There are also the points of disaster recovery, transactional processing, and performance, which will be "worse off" with a file system.

I am pretty sure I don't speak alone here, but I think you're mad. :D

Rob
http://cantgrokwontgrok.blogspot.com
#6
July 15th, 2008, 03:46 AM
samjudson
Friend of Wrox
 
Join Date: Aug 2007
Location: Newcastle, United Kingdom.
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts

And just because PeachTree stores its data in a 'file' doesn't mean that file isn't actually a database. It could be an Access Database, a SQL Server Compact edition file or one of many other database file formats.
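For instance (just an untested sketch; the table name, columns, and .sdf file here are invented for illustration), a SQL Server Compact file sitting on disk is still queried with ordinary ADO.NET code:

Code:

// Hypothetical example: one .sdf file on disk, but queried like a real database.
// Assumes a reference to System.Data.SqlServerCe (SQL Server Compact 3.5);
// the "Ledger" table and its columns are made up.
using System;
using System.Data.SqlServerCe;

class Demo
{
    static void Main()
    {
        using (var conn = new SqlCeConnection("Data Source=accounts.sdf"))
        {
            conn.Open();
            using (var cmd = new SqlCeCommand(
                "SELECT Amount FROM Ledger WHERE AccountId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", 42);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine(reader.GetDecimal(0));
                }
            }
        }
    }
}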

As for the basics of 'indexing', the link I provided above should have taken you to Wikipedia, where there are whole articles on the topic, both at a conceptual level and in terms of what it usually means in a database. Indexing is a huge topic of much academic debate, and if you don't even know what an index is, then I can guarantee you that writing your own is completely the wrong thing to be doing.

And thirdly (if you really needed any more arguments) the locking and contention issues you would have trying to run a web site off a single flat file would be horrendous.

/- Sam Judson : Wrox Technical Editor -/
#7
July 15th, 2008, 10:55 AM
Authorized User
 
Join Date: Nov 2006
Location: Valparaiso, IN, USA.
Posts: 93
Thanks: 0
Thanked 1 Time in 1 Post

I'd just like to point out that there are times when a standard relational database such as MS SQL Server may not be the best solution. I manage a proprietary database system that stores process data, and it's not unusual to find one of these systems with years of data online. The problem with a relational database is that there is a limit to the size of a table beyond which it becomes very cumbersome to retrieve data.

The type of system he is looking for would be such a system: basically, you would want to keep a running general ledger for years. The system I manage stores its data in multiple archive files, where each file is the same size and no two files store overlapping time ranges, so each archive has a start and end date. When retrieving data, you ask for a specific point over a specific date range, and the database engine knows which files cover which date ranges. So it doesn't matter whether you are asking for data from yesterday or ten years ago, and it doesn't matter whether you have a year's worth of data or 20 years' worth; it takes the same time to find it.
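In code, the catalog of archive files boils down to something like this (an untested sketch; the ArchiveInfo type, field names, and paths are invented for illustration):

Code:

// Hypothetical sketch of the "one archive file per date range" layout described above.
using System;
using System.Collections.Generic;
using System.Linq;

class ArchiveInfo
{
    public DateTime Start;   // first timestamp stored in this archive file
    public DateTime End;     // last timestamp stored in this archive file
    public string Path;      // location of the archive file on disk
}

class ArchiveCatalog
{
    private readonly List<ArchiveInfo> archives = new List<ArchiveInfo>();

    public void Register(DateTime start, DateTime end, string path)
    {
        archives.Add(new ArchiveInfo { Start = start, End = end, Path = path });
    }

    // Only the archives whose range overlaps the query are opened, so a query
    // for yesterday's data never touches ten-year-old files.
    public IEnumerable<ArchiveInfo> FilesFor(DateTime from, DateTime to)
    {
        return archives.Where(a => a.Start <= to && a.End >= from);
    }
}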

I googled "Transactional Database" and "Financial Database" (you can try some different combinations), but I didn't see any generic data storage solutions that looked like they would do what this process database I manage does.

I have to agree with the others that what you are talking about here is a huge undertaking. You will probably want to use a commercially available RDBMS for much of your data, but your core transactional store you will probably have to develop yourself. The only others I could find were part of financial software packages, which is what you want to write, so ...

As to your actual question: I would look at the System.IO namespace, particularly at the BinaryReader and BinaryWriter classes. BinaryReader has a Read(Byte[], Int32, Int32) method that reads a block of bytes into a buffer, and BinaryWriter has a Write(Byte[], Int32, Int32) method that writes a block of bytes; combined with seeking the underlying stream to a computed offset, these let you read or overwrite a record at a specific location in the file.
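With fixed-length records, the offset arithmetic and the seek look roughly like this (an untested sketch; the 128-byte record size and file name are assumptions for illustration):

Code:

// Hypothetical sketch: fixed-length records in a flat binary file.
using System;
using System.IO;

class FixedRecordFile
{
    const int RecordSize = 128;            // every record occupies exactly 128 bytes
    const string FileName = "records.dat"; // made-up data file name

    // Reads record number n (0-based) by seeking to n * RecordSize.
    static byte[] ReadRecord(int n)
    {
        using (var stream = new FileStream(FileName, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(stream))
        {
            stream.Seek((long)n * RecordSize, SeekOrigin.Begin);
            return reader.ReadBytes(RecordSize);
        }
    }

    // Overwrites record number n in place.
    static void WriteRecord(int n, byte[] record)
    {
        if (record.Length != RecordSize)
            throw new ArgumentException("Record must be exactly RecordSize bytes.");

        using (var stream = new FileStream(FileName, FileMode.Open, FileAccess.Write))
        using (var writer = new BinaryWriter(stream))
        {
            stream.Seek((long)n * RecordSize, SeekOrigin.Begin);
            writer.Write(record, 0, record.Length);
        }
    }
}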

Structuring the file will be up to you.



What you don't know can hurt you!
#8
July 15th, 2008, 01:20 PM
Friend of Wrox
 
Join Date: Jun 2008
Location: Snohomish, WA, USA
Posts: 1,649
Thanks: 3
Thanked 141 Times in 140 Posts

And there is always the possibility of using an object-oriented DBMS ("OODBMS") such as Objectivity/DB. I know it is capable of *adding* terabytes of data per day, not to mention accessing many, many terabytes of data. And there are other OODBMS products out there.

If it's only capacity that is the concern, OODBMSs were designed for much higher capacity than most RDBMS products.
#9
July 15th, 2008, 01:41 PM
Registered User
 
Join Date: Jul 2008
Location: Ann Arbor, MI, USA.
Posts: 1
Thanks: 1
Thanked 0 Times in 0 Posts

Hi DineshGirij008. A few suggestions:

1. If it's a flat file, it's easy to compute the offset in the file where record N starts: (N - 1) * recordLength, assuming the first record is #1.

2. If you need to be able to handle many millions of records in seconds, see www.patternscope.com.

It's a data-mining tool, but it can handle huge amounts of data very quickly, since it processes the patterns that make up the data, rather than the raw data itself.

It can do queries as well as find patterns in your data.

#10
July 15th, 2008, 02:09 PM
Friend of Wrox
 
Join Date: Jun 2008
Location: Snohomish, WA, USA
Posts: 1,649
Thanks: 3
Thanked 141 Times in 140 Posts

Alan8 wrote:
"1. If it's a flat file, it's easy to compute the offset in the file where record N starts: (N - 1) * recordLength, assuming the first record is #1."

Ummm...and what if the "records" are not all the same length????

You're making a huge assumption there.

If they aren't the same length, it's still not impossible; you just have to first make a pre-scan through the file, finding the start position of each record and creating an index. And, of course, if you are doing that anyway, then you could create an index of key values. Or multiple indexes of multiple keys. And what, pray tell, have you now done? AHA! You've created the beginnings of a relational database engine. You're maybe 10% to 20% of the way there, aren't you?
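In C# terms, that pre-scan index is just one pass over the file building a key-to-offset map, roughly like this (an untested sketch; it assumes ASCII text records delimited by newlines with the first comma-separated field as the key, all of which is invented for illustration):

Code:

// Hypothetical sketch of the pre-scan index: map each record's key to the byte
// offset where the record starts, then seek straight to it on later lookups.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class OffsetIndex
{
    static Dictionary<string, long> BuildIndex(string path)
    {
        var index = new Dictionary<string, long>();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long recordStart = 0;
            var current = new List<byte>();
            int b;
            while ((b = stream.ReadByte()) != -1)
            {
                if (b == '\n')
                {
                    string line = Encoding.ASCII.GetString(current.ToArray()).TrimEnd('\r');
                    string key = line.Split(',')[0];   // first field is the record key
                    index[key] = recordStart;          // byte offset where this record begins
                    current.Clear();
                    recordStart = stream.Position;
                }
                else
                {
                    current.Add((byte)b);
                }
            }
        }
        return index;
    }
}

A lookup then becomes a Seek to index[key] followed by reading a single record, instead of a sequential scan of the whole file.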