Wrox Programmer Forums
Visual Studio 2005: for discussing Visual Studio 2005. Please post code questions about a specific language (C#, VB, ASP.NET, etc.) in the correct language forum instead.
#1 | August 10th, 2007, 07:11 AM
FileFound, Authorized User (Join Date: Jun 2007, Posts: 22)
Moving 200,000 records effectively

I face this situation regularly, and I need expert guidance to solve it.

Typically, my middle tier (implemented through .NET Remoting) fetches around 200,000 records. The middle tier must transfer these to my VB.NET front end in batches of 20,000 records (10 iterations).

Once all 200,000 records are fetched, I bind them to a grid.


The reverse also needs to be solved: sending 200,000 records from the front end back to the remote object, that is, the middle tier.

Please help me out.




#2 | August 10th, 2007, 09:12 AM
planoie, Friend of Wrox (Join Date: Aug 2003, Location: Clifton Park, New York, USA, Posts: 5,407)

Seeing as you haven't really told us how you are transferring the records, apart from the fact that you use remoting, it is hard to suggest improvements.

What types of objects are you transferring through the remote calls?

Can you compress the data on either end to minimize the amount being sent through the remote calls?

I also find it amazing that you have users who actually could be working with 200,000 records at a time such that you need to transfer them all to your application. What are they doing with all these records that they need all of them at once? Can you elaborate on what your application is doing? Perhaps we can suggest some changes that will eliminate the need to push that much data around.

-Peter
#3 | August 12th, 2007, 11:46 PM
FileFound, Authorized User (Join Date: Jun 2007, Posts: 22)

Enlightening questions, Peter!

Quote:
What types of objects are you transferring through the remote calls?

Only byte arrays, nothing else. I actually pass and return DataSets after compressing them into byte arrays. In the remoting client, I compress a DataSet into a byte array (using DeflateStream) and pass it as a parameter. When I return data from the remote object, it is again a DataSet that I compress into a byte array (using DeflateStream) before returning it. To convert a byte array back into the DataSet, I decompress it using DeflateStream.
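
Roughly, the round trip looks like this (a simplified sketch; the module and method names are just for illustration, not my actual code):

Code:
Imports System.Data
Imports System.IO
Imports System.IO.Compression

Public Module DataSetCompression

    ' Serialize the DataSet to XML (schema included) and compress it with DeflateStream.
    Public Function CompressDataSet(ByVal ds As DataSet) As Byte()
        Using buffer As New MemoryStream()
            ' Leave the MemoryStream open so it can be read after the DeflateStream is closed.
            Using deflate As New DeflateStream(buffer, CompressionMode.Compress, True)
                ds.WriteXml(deflate, XmlWriteMode.WriteSchema)
            End Using
            Return buffer.ToArray()
        End Using
    End Function

    ' Decompress the byte array and rebuild the DataSet from the XML.
    Public Function DecompressDataSet(ByVal data As Byte()) As DataSet
        Dim ds As New DataSet()
        Using buffer As New MemoryStream(data)
            Using deflate As New DeflateStream(buffer, CompressionMode.Decompress)
                ds.ReadXml(deflate, XmlReadMode.ReadSchema)
            End Using
        End Using
        Return ds
    End Function

End Module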

Quote:
Can you compress the data on either end to minimize the data being sent through the remote calls?

I do compress the data, as I said above. I compress the DataSet into a byte array whenever I need to send it across the network, and wherever I receive the byte array I decompress it to get the DataSet back.

Quote:
I also find it amazing that you have users who actually could be working with 200,000 records at a time such that you need to transfer them all to your application. What are they doing with all these records that they need all of them at once? Can you elaborate on what your application is doing? Perhaps we can suggest some changes that will eliminate the need to push that much data around.

The exact scenario is confidential. What my users actually do is view all 200,000 records in a grid (this is mandatory; my users MUST see all the records, and I cannot handle it in the back end). They then select a few of these records ('few' meaning around 10,000 to 20,000). Only the records selected this way are eliminated, and the remaining ones are moved to a different table.

That describes the problem correctly. I would be happy to provide further information if required.
#4 | August 13th, 2007, 05:52 AM
planoie, Friend of Wrox (Join Date: Aug 2003, Location: Clifton Park, New York, USA, Posts: 5,407)

If your users must view all 200,000 records and you are already compressing the data, you are left with two options as I see it:

1) Reduce/optimize the data that has to be compressed. I see a few options for that:

    A) Remove data they don't need to see. This is probably not doable; I imagine you have already taken out as much as you can.

    B) A serialized DataSet is pretty verbose (XML). You could create your own serialization of the data to remove the redundancy (presumably the compression is already doing a lot of this for you, however). Depending on the format of the data, could you convert it into a more terse format like CSV? You control both ends, so the self-describing capability of XML isn't necessary. As long as you can safely re-format the values (which you can do with some custom character escaping), you can greatly reduce the character stream going into the compression, and hopefully see a proportional decrease in the resulting compressed byte array. (See the sketch after this list.)

    C) Normalize the data: does the DataSet you are returning contain denormalized data? Is it a single table with denormalized column values? If so, consider returning a more faithful/optimized representation of the source data using multiple DataTables and establishing the data relationships, so that you can do the denormalized data selection back at the client end of the app. This could greatly reduce the amount of data in the DataSet.

2) Get better hardware. You are already compressing the data, so if option 1 doesn't help further, the only way I see to fix the problem is to get a bigger pipe.
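
For option B, something along these lines is what I mean, purely as a sketch (the table layout and the pipe delimiter are just for illustration; real data would need a more robust escaping scheme):

Code:
Imports System.Data
Imports System.Text

Public Module TerseFormat

    ' Flatten one DataTable into a pipe-delimited string. The delimiter inside
    ' values is replaced with an escape token that contains no delimiter, so the
    ' receiving side can split on "|" safely.
    Public Function ToDelimitedString(ByVal table As DataTable) As String
        Dim sb As New StringBuilder()
        For Each row As DataRow In table.Rows
            Dim fields(table.Columns.Count - 1) As String
            For i As Integer = 0 To table.Columns.Count - 1
                fields(i) = row(i).ToString().Replace("|", "&#124;")
            Next
            sb.AppendLine(String.Join("|", fields))
        Next
        Return sb.ToString()
    End Function

End Module

The column order and types have to be agreed on by both ends, which is exactly the trade-off of dropping the self-describing XML.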


Can I also just say: Holy inefficient users, Batman! You have users who will actually look at 200,000 records??? I can't imagine that. Isn't there some other way for them to work? Isn't that what search algorithms are for? Could you create a search system so they can programmatically find the 10,000-20,000 records they are looking for? What is the likelihood that a user would fail to find some of the thousands of records they are looking for out of the 200,000? I imagine it would be pretty high. Now you have me really curious as to what your users are doing.

-Peter
#5 | August 22nd, 2007, 01:29 AM
FileFound, Authorized User (Join Date: Jun 2007, Posts: 22)



Quote:
A) Remove data they don't need to see. This is probably not doable; I imagine you have already taken out as much as you can.
This is the current situation. There is no way to reduce the amount of data.

Quote:
B) A serialized DataSet is pretty verbose (XML). You could create your own serialization of the data to remove the redundancy... You control both ends, so the self-describing capability of XML isn't necessary.
A self-describing format is necessary, because my remote object retrieves the values from my DataSet and passes them to the back end as parameters. So I am out of luck here as well.

Quote:
C) Normalize the data: does the DataSet you are returning contain denormalized data?... This could greatly reduce the amount of data in the DataSet.
The DataSet is actually already normalized, so no issues there.

Quote:
2) Get better hardware. You are already compressing the data, so if option 1 doesn't help further, the only way I see to fix the problem is to get a bigger pipe.
That is something to think about.

I have read that a DataAdapter can be used to transfer records in batches. Some people tell me that transferring records in batches is possible using the Command object. Others tell me that transferring in batches requires the remote object to be a Singleton (currently my remote object operates in SingleCall mode).

I need clear direction on how exactly (if possible, in code) to handle transferring records in batches.

I understand your question: you still cannot see how a user could handle around 200,000 records. Let me elaborate. My users do not view all the records (that would be practically impossible). I currently filter the data in the front end itself, not in the back end. Maybe you have a point there; I may have to filter the data in the back end instead.

Even if the number of records were smaller (say 500), how would I transfer it in batches (say 50 records each, in 10 iterations)? Would I be able to use a DataAdapter? I could have a background process fetching records (50 per iteration) and another process working on the records fetched so far. I need speed in my application.
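
From what I have read, the DataAdapter's Fill overload that takes a start record and a maximum record count is what people seem to mean; something like this is what I imagine (I have not tried it, and the connection string, query and table name here are only placeholders):

Code:
Imports System.Data
Imports System.Data.SqlClient

Public Module BatchFetchExample

    Public Sub FetchInBatches()
        Dim connectionString As String = "..." ' placeholder
        Dim query As String = "SELECT * FROM SomeTable" ' placeholder

        Using connection As New SqlConnection(connectionString)
            Dim adapter As New SqlDataAdapter(query, connection)
            Dim results As New DataSet()

            ' 500 records in 10 batches of 50.
            For batch As Integer = 0 To 9
                ' Note: this overload re-runs the query and skips the leading rows
                ' on the client side, so it saves memory rather than database work.
                adapter.Fill(results, batch * 50, 50, "SomeTable")
                ' The rows fetched so far could be handed to another process here.
            Next
        End Using
    End Sub

End Module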

Kindly help me.



#6 | August 22nd, 2007, 06:26 PM
planoie, Friend of Wrox (Join Date: Aug 2003, Location: Clifton Park, New York, USA, Posts: 5,407)

There is a significant difference between needing to *work* with 200,000 records versus working with a subset of them. If you can create filters in the back end then you could significantly decrease the amount of data you need to transfer over the wire.

I haven't worked with batching techniques myself so I won't be much help there. I would agree that one would need to use the singleton pattern for the remote class. I assume the technique would be something like this:

1. Call remote object to start query to database, remote class stores full dataset locally
2. Call remote object to get a small set of records, remote class returns the small set from the full set
3. Use the small set in your app
4. Repeat steps 2-3 until all records are received.

I'm no remoting or data batch expert but that's how I would approach it.
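
In rough VB.NET terms, the server side of that might look something like this (just a sketch; the class name, method names and the data-access call are made up for illustration, and the remote object would be registered in Singleton mode):

Code:
Imports System.Data

' Hypothetical remote class, registered as a Singleton so the cached
' DataSet survives between calls from the client.
Public Class RecordBatchServer
    Inherits MarshalByRefObject

    Private _fullData As DataSet

    ' Step 1: run the query once and keep the full result on the server.
    Public Sub BeginFetch()
        _fullData = LoadAllRecordsFromDatabase() ' placeholder for the real query
    End Sub

    Public ReadOnly Property TotalRows() As Integer
        Get
            Return _fullData.Tables(0).Rows.Count
        End Get
    End Property

    ' Step 2: hand back one small batch at a time.
    Public Function GetBatch(ByVal startIndex As Integer, ByVal batchSize As Integer) As DataSet
        Dim batch As DataSet = _fullData.Clone() ' schema only, no rows
        Dim source As DataTable = _fullData.Tables(0)
        Dim lastIndex As Integer = Math.Min(startIndex + batchSize, source.Rows.Count) - 1
        For i As Integer = startIndex To lastIndex
            batch.Tables(0).ImportRow(source.Rows(i))
        Next
        Return batch
    End Function

    Private Function LoadAllRecordsFromDatabase() As DataSet
        Return New DataSet() ' placeholder
    End Function
End Class

The client would call BeginFetch once, then loop on GetBatch(offset, 20000) until offset reaches TotalRows (steps 3 and 4).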

-Peter
#7 | August 22nd, 2007, 07:55 PM
planoie, Friend of Wrox (Join Date: Aug 2003, Location: Clifton Park, New York, USA, Posts: 5,407)

Quote:
Originally posted by FileFound:
A self-describing format is necessary, because my remote object retrieves the values from my DataSet and passes them to the back end as parameters. So I am out of luck here as well.

I'm not sure I follow. If you control both sides of the communication (i.e. the client and the remote class), you can use a non-self-describing data format; you just need to reconstruct it into a useful state on the other end of the wire. The point is: if you control both ends, you can get rid of the superfluous information in the transmission and streamline it by sending only the actual data. I would imagine you could reduce the size of the data significantly by doing this; however, as I mentioned before, it is likely that the stream compression is already doing much of this for you.
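
As a sketch of the receiving end for the delimited format I mentioned in my earlier post (again, the names are illustrative and the escaping is deliberately simplistic), assuming both sides already agree on the column layout:

Code:
Imports System.Data

Public Module TerseFormatParsing

    ' Rebuild a DataTable from the pipe-delimited text, using a schema-only
    ' DataTable that both ends already agree on.
    Public Function FromDelimitedString(ByVal text As String, ByVal schema As DataTable) As DataTable
        Dim table As DataTable = schema.Clone() ' agreed-upon columns, no rows
        Dim textLines As String() = text.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
        For Each textLine As String In textLines
            Dim values As String() = textLine.Split("|"c)
            Dim row As DataRow = table.NewRow()
            For i As Integer = 0 To table.Columns.Count - 1
                ' Undo the escape token and convert back to the column's type.
                row(i) = Convert.ChangeType(values(i).Replace("&#124;", "|"), table.Columns(i).DataType)
            Next
            table.Rows.Add(row)
        Next
        Return table
    End Function

End Module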

-Peter
#8 | September 12th, 2007, 03:19 AM
FileFound, Authorized User (Join Date: Jun 2007, Posts: 22)

Quote:
1. Call remote object to start query to database, remote class stores full dataset locally
2. Call remote object to get a small set of records, remote class returns the small set from the full set
3. Use the small set in your app
4. Repeat steps 2-3 until all records are received.
Thanks. That was something I had in mind; I needed additional details to help me turn this logic into good code (VB.NET and Remoting-specific information, how to implement the batching, and code optimizations).

Quote:
I'm not sure I follow. If you control both sides of the communication (i.e. the client and the remote class), you can use a non-self-describing data format... it is likely that the stream compression is already doing much of this for you.
I agree. I will need to include the structure of my data in my remote object so that I can reconstruct the object on the remote server. I will think about it. Thanks.



#9 | September 18th, 2007, 10:39 PM
FileFound, Authorized User (Join Date: Jun 2007, Posts: 22)

Dear Peter,

Your first observation was correct: moving the 200,000 records back and forth through all the tiers was a mistake, and an unnecessarily complicated approach. I am now doing all the operations in my stored procedure itself, and the total number of records that actually moves (and needs to move) to the front end is only in the thousands. Users see only a small percentage of this data.

Thanks a lot!

#10 | September 21st, 2007, 08:20 AM
planoie, Friend of Wrox (Join Date: Aug 2003, Location: Clifton Park, New York, USA, Posts: 5,407)

Glad to hear you got this worked out.

-Peter