|
|
 |
| Visual Studio 2005 For discussing Visual Studio 2005. Please post code questions about a specific language (C#, VB, ASP.NET, etc) in the correct language forum instead. |
Welcome to the p2p.wrox.com Forums.
You are currently viewing the Visual Studio 2005 section of the Wrox p2p Programmer to Programmer discussion community. This is a community of more than 40,000 computer programmers including Wrox book authors and readers. As a guest, you can read any forum posting. By joining our free Wrox p2p community you can post your own programming questions and respond to other programmers’ questions. Registered users also don't have to see the ads that are displayed to guests. Registration is fast, simple and absolutely free so please, join today!
Join today and post to win prizes! Post more to increase your chances of being Wrox’s top poster of the month.
|
 |

August 10th, 2007, 08:11 AM
|
|
Authorized User
|
|
Join Date: Jun 2007
Location: , , .
Posts: 17
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Moving 2,00,000 records effectively
I face this situation consistently and I need expert guidance to solve it.
Typically, My middle tier (implemented through Remoting) fetches around 2,00,000 records. Middle tier must transfer this in batches of 20,000 records (in 10 iterations) to my front end ( vb.net).
Once all the 2,00,000 records are fetched I would bind it to a grid.
The vice versa: namely sending 2,00,000 records from front end to the remote object, that is, middle tier also needs to be solved.
Please help me out.
|

August 10th, 2007, 10:12 AM
|
 |
Friend of Wrox
Points: 16,368, Level: 55 |
|
|
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,394
Thanks: 0
Thanked 2 Times in 2 Posts
|
|
Seeing as you haven't really told us how you are transferring the records apart from remoting, it will be hard to provide improvements.
What types of objects are you transferring thru the remote calls?
Can you compress the data on either end to minimize the data being sent thru the remote calls?
I also find it amazing that you have users who actually could be working with 200,000 records at a time such that you need to transfer them all to your application. What are they doing with all these records that they need all of them at once? Can you elaborate on what your application is doing? Perhaps we can suggest some changes that will eliminate the need to push that much data around.
-Peter
|

August 13th, 2007, 12:46 AM
|
|
Authorized User
|
|
Join Date: Jun 2007
Location: , , .
Posts: 17
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Enlightening questions, Peter!
Quote:
|
quote:What types of objects are you transferring thru the remote calls?
|
Only bytearray and nothing else. Actually I pass and return datasets after compressing them into bytearrays. In the remoting client, i would compress a dataset into a byte array (using DeflateStream) and pass it as a parameter. When I return data from the remote object, the data is once again a dataset that I compress into a byte array (using DeflateStream) and returned. Of course, to convert a bytearray into the dataset, I decompress it using DeflateStream.
Quote:
|
quote:Can you compress the data on either end to minimize the data being sent thru the remote calls?
|
I do compress the data as I said previously. I compress the dataset into byte array whenever I need to send it across the network. Where I receive the bytearray I decompress it to get back the dataset.
Quote:
|
quote:I also find it amazing that you have users who actually could be working with 200,000 records at a time such that you need to transfer them all to your application. What are they doing with all these records that they need all of them at once? Can you elaborate on what your application is doing? Perhaps we can suggest some changes that will eliminate the need to push that much data around.
|
The exact scenario is actually confidential. What my users would actually do is to view all the 200,000 records in a grid (it is mandatory, My users MUST see all the records, I cannot handle it in backend). They would then select few of these records ('few' means around 10,000 to 20,000 records). The records thus selected alone would be eliminated and the remaining would be moved to a different table.
That describes the problem correctly. I would be happy to provide further information if required.
|

August 13th, 2007, 06:52 AM
|
 |
Friend of Wrox
Points: 16,368, Level: 55 |
|
|
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,394
Thanks: 0
Thanked 2 Times in 2 Posts
|
|
If your users must view all 200,000 records and you are already compressing the data you are left with 2 options as I see it:
1) Reduce/optimize the data that has to be compressed. I see 2 options for that:
A) Remove data they don't need to see. This is probably not doable. I imagine you have already taken out as much as you can to reduce it already.
B) A serialized dataset is pretty verbose (XML). You could create your own serialization of the data to remove the redundancy (presumably the compression is already doing a lot of this for you however). Depending on the format of the data could you convert it into a more terse format like CSV? You are controlling both ends so the self describing capability of XML isn't necessary. As long as you can safely re-format (which you can do with some custom character escaping) you can probably greatly reduce the compression's inbound character stream. Hopefully you could see a proportional decrease in the resulting compressed byte array.
C) Normalize the data: Does the dataset you are returning have denormalized data? Is it a single table with denormalized column values? If so, what about returning a more faithful/optimized representation of the source data using multiple DataTables and establishing the data relationships such that you can do the denormalized data selection back at the client end of the app. This could greatly reduce the amount of data in the dataset.
2) Get better hardware. You are already compressing the data. If option 1 doesn't help further, the only way I see to fix the problem is to get a bigger pipe.
Can I also just say: Holy inefficient users Batman! You have users that actually will look at 200,000 records??? I can't imagine that. Isn't there some other way for them to work? Isn't that what search algorithms are for? Could you create a search system so they can programmatically find the 10-20k records that they are looking for? What is the likely hood that a user would fail to find some of the 1000s of records they are looking for? Out of the 200,000 I imagine it would be pretty high. Now you have me really curious as to what your users are doing.
-Peter
|

August 22nd, 2007, 02:29 AM
|
|
Authorized User
|
|
Join Date: Jun 2007
Location: , , .
Posts: 17
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Quote:
|
quote:A) Remove data they don't need to see. This is probably not doable. I imagine you have already taken out as much as you can to reduce it already.
|
This is the current situation. There is no way to reduce the amount of data.
Quote:
|
quote:B) A serialized dataset is pretty verbose (XML). You could create your own serialization of the data to remove the redundancy (presumably the compression is already doing a lot of this for you however). Depending on the format of the data could you convert it into a more terse format like CSV? You are controlling both ends so the self describing capability of XML isn't necessary. As long as you can safely re-format (which you can do with some custom character escaping) you can probably greatly reduce the compression's inbound character stream. Hopefully you could see a proportional decrease in the resulting compressed byte array.
|
Self describing is necessary. Because my remote object retrieves the values from my dataset and passes them to backend as parameters. So I am out of luck here also.
Quote:
quote:
C) Normalize the data: Does the dataset you are returning have denormalized data? Is it a single table with denormalized column values? If so, what about returning a more faithful/optimized representation of the source data using multiple DataTables and establishing the data relationships such that you can do the denormalized data selection back at the client end of the app. This could greatly reduce the amount of data in the dataset.
|
The dataset is actually normalized. No issues there.
Quote:
|
quote:Get better hardware. You are already compressing the data. If option 1 doesn't help further, the only way I see to fix the problem is to get a bigger pipe.
|
That is something to think about.
I have read Data Adapter could be used to transfer records in batches. Some people tell me, transfering records in batches would be possible using the Command object. There are those who tell me, transfering in batches requires the remote object to be Singleton (Currently my remote object operates in Single call mode)
I am in need of clear direction telling me how exactly (if possible in code) to handle transferring records in batches.
I understand your question. You are still unable to understand how can a user handle around 200,000 records. Let me elaborate. My users are not viewing all the records (that would be practically impossible). I have options to filter the data in the front end itself, not in the backend. May be you have a point there. I may have to filter the data in the back end itself.
Even if the number of records is less (say 500), how would I transfer it in batches (say 50 records each in 10 iterations)? Would I be able to use a DataAdapter? I can have a background process to fetch records (50 records in each iteration) and another process that would be processing the records so far fetched. I require speed in my application.
Kindly help me.
|

August 22nd, 2007, 07:26 PM
|
 |
Friend of Wrox
Points: 16,368, Level: 55 |
|
|
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,394
Thanks: 0
Thanked 2 Times in 2 Posts
|
|
There is a significant difference between needing to *work* with 200,000 records versus working with a subset of them. If you can create filters in the back end then you could significantly decrease the amount of data you need to transfer over the wire.
I haven't worked with batching techniques myself so I won't be much help there. I would agree that one would need to use the singleton pattern for the remote class. I assume the technique would be something like this:
1. Call remote object to start query to database, remote class stores full dataset locally
2. Call remote object to get a small set of records, remote class returns the small set from the full set
3. Use the small set in your app
4. Repeat steps 2-3 until all records are received.
I'm no remoting or data batch expert but that's how I would approach it.
-Peter
|

August 22nd, 2007, 08:55 PM
|
 |
Friend of Wrox
Points: 16,368, Level: 55 |
|
|
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,394
Thanks: 0
Thanked 2 Times in 2 Posts
|
|
Quote:
quote:Originally posted by FileFound
Self describing is necessary. Because my remote object retrieves the values from my dataset and passes them to backend as parameters. So I am out of luck here also.
|
I'm not sure I follow. If you are controlling both sides of the communications (i.e. the client and remote class) you can use a non self describing data format. You just need to reconstruct it back into a useful state on the other end of the wire. The point is: if you control both ends then you can get rid of the superfluous information in the transmission and streamline it by sending only the actual data. I would imagine you could reduce the size of the data significantly by doing this, however, as I mentioned before, it is likely that the stream compression is doing much of this already.
-Peter
|

September 12th, 2007, 04:19 AM
|
|
Authorized User
|
|
Join Date: Jun 2007
Location: , , .
Posts: 17
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Quote:
quote:
1. Call remote object to start query to database, remote class stores full dataset locally
2. Call remote object to get a small set of records, remote class returns the small set from the full set
3. Use the small set in your app
4. Repeat steps 2-3 until all records are received.
|
Thanks. That was something I had in mind, and I needed additional details that would help me turn this logic into good code (like VB.NET, Remoting specific information, how to implement batching and code optimizations)
Quote:
quote:
I'm not sure I follow. If you are controlling both sides of the communications (i.e. the client and remote class) you can use a non self describing data format. You just need to reconstruct it back into a useful state on the other end of the wire. The point is: if you control both ends then you can get rid of the superfluous information in the transmission and streamline it by sending only the actual data. I would imagine you could reduce the size of the data significantly by doing this, however, as I mentioned before, it is likely that the stream compression is doing much of this already.
|
I agree. I need to include the structure of my data, in my remote object, so that I can reconstruct the object back in the remote server. I will think about it. Thanks.
|

September 18th, 2007, 11:39 PM
|
|
Authorized User
|
|
Join Date: Jun 2007
Location: , , .
Posts: 17
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Dear Peter,
Your first observation was correct. Moving the 200,000 records to and fro through all the tiers was a mistake. It was an unnecessarily complicated approach. I am now doing all the operations in my Stored Procedure itself, and the total number of records that actually move (and need to move) to the frontend is in thousands only. Users would see only a small percentage of this data.
Thanks a lot!
|

September 21st, 2007, 09:20 AM
|
 |
Friend of Wrox
Points: 16,368, Level: 55 |
|
|
Join Date: Aug 2003
Location: Clifton Park, New York, USA.
Posts: 5,394
Thanks: 0
Thanked 2 Times in 2 Posts
|
|
Glad you hear you got this worked out.
-Peter
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
 |