Performance comparisons

Jim Dean · July 7th, 2006, 08:48 AM

I need a bit of advice that *might* be fairly simple to answer. It is specific to some of the apps I need to write.

Much of what I do involves the analysis of stock price/volume data. This data is not very complex - a fairly static top level ID table with records containing symbol, name, exchange, etc, and a large number (>6000) of individual tables with sequential chronological lists of the data for each symbol ... records in these tables contain date, open, high, low, close, volume.

That's about it for the core database. The analysis I do involves many loops thru the price/volume data files ... calculating technical indicator values, testing systems theory, and the like. There is no need for open-ended ad-hoc SQL queries ... it's more of a "batch run" environment. The calculations are extensive ... runs with optimization or complex logic can take hours to complete in some cases. For the most part, the data does not change ... the files grow larger each day, as each new bar of price/volume info is added. Occasionally some symbols are added or deleted. That's about it.

So, here's my question: from a PERFORMANCE standpoint (ie speed of execution), is it wiser for me to create and process and maintain this simple file architecture directly from VB, or will it be more efficient for me to utilize Access or SQL as the storage/recall engines, or is the Access/SQL overhead and processing so well optimized that I would be unlikely to see any difference?

I realize that the *best* way to determine this would be to create A/B tests ... but to do so definitively, I'd need to know all the tricks/nuances of each approach to assure that the results were not being skewed by my own lack of skill at the time.

So, could you please comment on which course, in your opinion, would be the wisest to pursue for these purposes?

Thanks very much!

luishrd · July 7th, 2006, 09:55 PM

Hi Jim,

I suggest you do your best to master programming at the Database level, mainly Store Procedures and User Defined Functions. Dump the Data into SQL Server and do all the heavy lifting there using T-SQL programming. That way you pass the DB Procedures/Functions the parameters it needs to run and pass back to the UI only the results.

Look into it.

Good luck.

__________________________
I'm just a noob

Thearon · July 8th, 2006, 05:46 AM

Luishrd is correct. Use the database for what it is designed for...do not duplicate the data into some type of file architecture. Can you do the anaylsis of data in stored procedures? Have you looked into Analysis Server? This is a separate product that ships with SQL Server 2000 and SQL Server 2005 with a separate install. You can find information about Analysis Server in the SQL Server Books Online.

If nothing else, retrieve the data as needed from the database and perform your analysis in your VB program.

Thearon

Jim Dean · July 8th, 2006, 08:58 AM

I'm trying to avoid the necessity of reading extra books and fishing in the dark doing experiments. Maybe that is not possible, and I will have to do it myself. Tanstaafl!

Maybe I can restate my question more simply.

My data consists simply of a large number of *unrelated* flat files, simple, fixed record structures and approx 9,000 records in each (30 years of price/volume data).

My processing involves working forward historically through the records in each file, by itself (sometimes starting in the middle), calculating moving averages and other indicators, and using those plus additional complex logic to determine entry and exit points for trading simulation, and/or plotting the info on a chart.

The calculations are 99+% of the code - accessing my data is pretty simple. If I were doing this in Fortran, which is what I cut my teeth on in the late 1960's and used extensively for a couple of decades after that, then I would set all this up as a bunch of direct-access files. The only "complexity" is if I want to start at a particular date within the file and move forward ... either a simple key file with a pointer array, or a quick binary search, would find me the starting point.

OK ... that's the kind of thing that I already know how to do ... can be done in VB in a variety of ways.

My question is ... if I port this limited data I/O scheme over to an Access or SQL engine and just do the same things with it (no ad hoc stuff required), will the Access or SQL interface to the simple tables operate FASTER or SLOWER than my "old fashioned" method? And, will the overhead in RAM be a lot greater?

I would be interested in the answer to those questions, separately for two different scenarios:

a) In the event that the processing of the data is very extensive, possibly I would read in just one record at a time from the file, work on it, and then proceed to the next record. This is the kind of thing you had to do in the "old days", when big mainframes had only 64k of RAM to work within.

b) Alternatively (more likely), I would be loading the full set of historical data for a given symbol into a working array before looping through it and doing the calc's.

I realize this question may be somewhat out of scope for this book ... but it does sort of fit into a chapter one-ish area that could be entitled: "Wnen and Why Do I Need to Use a Database Engine?"

If anyone has actually done comparisons like this before, with simple I/O and simplistic DB architectures such as my own ... could you please let me know which tree I should be barking up, so to speak :~)

Thanks!

Thearon · July 8th, 2006, 01:22 PM

Jim,

As I understand your first post, the data does change but not often. If you have no control over how the data is updated, then I would leave it as flat files. If you do have control over how the data is updated then I would port the data into a SQL Server database and not Access. There may be a lot of processing that could be better performed in the database than in code and Access certainly couldn't handle that type of work.

Next, if you are going to be reading the data from flat files, use OleDb and load the data into DataSets. Then you can filter out the data you do not need (e.g. apply a filter based on date range). That way you only end up with the data that you do need.

You are going to have to dig into the online help documentation for further direction on loading data from flat files using OleDb and using DataSets and filtering.

I hope this helps.

Thearon