BACKGROUND:
Since I spent about 1 1/2 weeks total finding information about using
text files as data tables and then getting the whole scenario working
properly, and since several people in this forum and the Wrox forum
responded to me with help and suggestions, I felt a responsiblity to
post a very thorough description of the problem and the solution.
The overall project was to take a flat, fixed width data file that gets
produced monthly by our IBM mainframe billing system and convert that
into an XML file for importing into the PeopleSoft General Ledger. I work
for the Department of Information Technology (DOIT) for the State of CT.
My agency bills all the other agencies for technology resource usage each
month. The XML conversion was not an issue, I finished that a few months
ago - and then they changed the spec - no problem I fixed that up in a few hours.
Then my boss asked me about Audit Control reports. Audit control reports??
Nobody had ever given me a spec for those and I didn't even know they existed!
Well, they did and when I saw them it was apparent that each report could be
generated extremely easily with one SQL statement. The problem was, I had a text
file. Additional problem, the text file didn't have some fields I needed, but it
had two nice foreign keys if only I could do a join.
You're asking, "Why didn't you put the damned records into a database and save
yourself a lot of time, you stupid git?!" The short answer is "politics." This
is a tiny program. It runs in less than one second. We use it once each month.
If I had introduced a database into the mix then I would be forced by agency
policy to implement a 3-tier architecture. That is serious overkill for my very
important but teensy little program! I ONLY needed the database for the
reporting so I could use SQL to do a join and use the SUM function with a couple
of GROUP BYs. It made no sense to enter the gates of political hell just to
generate three pages worth of audit control reports!
PROBLEM:
Do a join on two tables, tables are text files in the same directory,
there is a correct schema.ini file describing the text files, one file is
fixedlength and the other is csvdelimited. I have no control over the
fixedlength file, it comes from our IBM mainframe. (I don't want databases
involved at all, not even Access with linked text-tables. That will be a last,
last, LAST resort.)
RESEARCH:
I knew there were Microsoft drivers that allowed you to use text files as data
tables. I'd never done it. I tried searching MS site but didn't come up with any
hits. I tried googling the same search term. I then tried many other search
terms and *finally* found a few references at various sites including MS.
100% of those references, taken by themselves, sucked. Not a single one of them
gave a complete example, many of them leaving out quite crucial information,
like the schema.ini file, for instance. So I spent some time gathering the
little pieces of the picture into a workable whole. My first experiment, a
SELECT statement that counted the records, was successful! Next I counted a
subset of records with a WHERE clause and again the answer was correct. Hoorah
for me! Right?
Naturally, no, not right :( because the next thing I tried was the needed join
and that failed miserably with an error message that was quite general in nature
and therefore wasn't helpful. I did many more experiments and then decided it
was time to seek help in two forums I was familiar with, one on the MS site and
one at Wrox.
FINAL RESULT / SOLUTION:
Several days, many curses, several forum posts, and more experiments than I can
count later and here is the cause of all my problems: You CANNOT do a join
between a CSVformat file and a Fixedlength file. The drivers do not support it,
period. If you need to do joins you will have to have all of your text files be
of the same type, CSV or Fixed.
TUTORIAL:
Here is what I intend to be a thorough tutorial on using the text drivers in
order to be able to run SQL SELECT statements on text files. I believe that
other kinds of SQL statements like INSERT and UPDATE and DELETE will not work
although I have not tried them. But I think I read that the drivers provide
âread-onlyâ access to the text files.
In the world of databases, a DATABASE holds TABLES and TABLES hold FIELDS.
Since text files are not structured entities, theyâre just rows of plain text
with some defined order to them, the text drivers fudge the concept of a
database a little. They consider the directory to be the database and every
file in them to be potential tables.
SCHEMA.INI
In order for the text drivers to understand what is in the text files you need
to create a plain-text file called SCHEMA.INI in the *same directory* as the
text files you want to use as tables. Each text file you want to use as a table
will be described in the schema.ini file. The text driver will read this file
to allow it to understand where the FIELDS are and what data type they are.
This is the SCHEMA.INI file that I used:
Code:
[AgencyMapping.txt]
Format=FixedLength
ColNameHeader=False
MaxScanRows=0
Col1=agency Text width 2
col2=f1 text width 1
Col3=coreagency text width 6
col4=f2 text width 1
Col5=agencyname text width 32
[GLCodeToChartfield2Mapping.txt]
Format=FixedLength
ColNameHeader=False
MaxScanRows=0
Col1=glcode Text width 4
col2=f1 text width 1
Col3=chartfield2 text width 7
col4=f2 text width 1
Col5=cfield2descr text width 30
[currpacedata.txt]
Format=FixedLength
ColNameHeader=False
MaxScanRows=0
Col1=agency Text Width 2
Col2=series Text Width 1
Col3=f1 Text width 1
col4=invnum text width 7
col5=f2 text width 2
col6=invamt text width 9
col7=amtcredit text width 1
col8=f3 text width 2
col9=invqty text width 14
col10=qtycredit text width 1
col11=f4 text width 1
col12=glcode text width 4
col13=f5 text width 1
col14=xferdescr text width 34
[AgencyMapping.csv]
Format=CSVDelimited
ColNameHeader=False
MaxScanRows=0
Col1=agency Text
Col2=coreagency, text
Col3=agencyname, text
[GLCodeToChartfield2Mapping.csv]
Format=CSVDelimited
ColNameHeader=False
MaxScanRows=0
Col1=glcode, Text
Col2=chartfield2, text
Col3=cfield2descr, text
You can see that Iâve defined three TABLES, AgencyMapping.txt,
GLCodeToChartfield2Mapping.txt, and currpacedate.txt. I also included the
definitions for the delimited versions of AgencyMapping.csv and
GLCodeToChartfield2Mapping.csv. Youâll note that for my uses I set all the
field types to TEXT, but that is not the only choice. See
(
http://msdn.microsoft.com/library/de...a_ini_file.asp)
for notes on schema.ini and data types.
THE CONNECTION STRING:
I donât like using bound data controls. Thatâs just a preference from my VB3 and
VB5 days when very often the bound controls were less efficient than coding
with data objects directly. Feel free to adapt this any way you like; there must
be 10 ways to open a table or run a query using the various database objects.
This one is adapted from one of the examples I found.
These two connection strings work equally well. The first specifies the TEXT
driver and the second specifies the JET engine.
Code:
Dim sqlstring As String
Dim connectionstring As String
Dim StrTxtDbPath As String
Dim money As String
Dim ReportString As String
StrTxtDbPath = frmKomandToCore.WorkingDirectory & "\"
connectionstring = "Driver={Microsoft Text Driver (*.txt; *.csv)};
DBQ=" & StrTxtDbPath & ";Extended Properties=""text;HDR=NO;FMT=FixedLength"""
connectionstring = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & StrTxtDbPath &
";Extended Properties=""text;HDR=NO"""
Please note that the path you supply for the location of the text files *must*
end with the â\â character. The rest of the example uses the Jet engine only
because I got the code working with that driver and was too lazy to switch back,
especially for no good reason. If you decide to use the text driver your data
objects will be from the system.data.odbc class instead of the system.data.oledb
class.
Some comments about the SQL JOIN statement are in order because thereâs some
unusual syntax that can appear. Itâs no mystery that when youâre doing a join,
that if there are fields with the same name in both tables you need to pre-pend
those fieldsâ names in the SQL statement with the table name:
Code:
Select tbl1.fld1, tbl1.fld2, fld3
from tbl1 inner join tbl2
on tbl1.fld1 = tbl2.fld1 where â¦..
Or
Select tbl1.fld1, tbl1.fld2, fld3
from tbl1, tbl2 where tbl1.fld1 = tbl2.fld1 and â¦..
Now the general syntax for a SELECT statement with a text-file table is this:
Code:
Select fld1, fld2, ⦠fldN from table1.txt where â¦
So if you extend that to a JOIN operation on two text files you get:
Code:
Select table1.txt.fld1, table1.txt.fld2, fld3
from table1.txt
inner join table2.txt
on table1.txt.fld1 = table2.txt.fld1 where â¦
And as you might guess that causes some problems. The SQL parser doesnât
understand how to deal with âtable1.txt.fld1.â
There are two ways around this problem, a special syntax and table aliasing. So
these two SQL statements are equivalent when doing joins on text-file tables.
(The word âasâ when aliasing a table is [wrong/optional] and may be non-standard
SQL; itâs new to me.)
Code:
Select [table1#txt].fld1, [table1#txt].fld2, fld3
from table1.txt inner join table2.txt
on [table1#txt].fld1 = [table2#txt].fld1 where â¦
Or
Select t1.fld1, t1.fld2, fld3 from table1.txt (as) t1
inner join table2.txt (as) t2 on t1.fld1 = t2.fld1 where â¦
The guts of the routine that produces one of the audit control reports is below.
You should be able to use the information in this post to use delimited or fixedlength
text files as data tables with no trouble.
You may safely ignore the quirks of my data and concentrate on
whatâs being done and how itâs being accomplished.
Code:
Using connection As New OleDbConnection(connectionstring)
connection.Open()
'= = = = = = = = = = = = = = = = = = = = = = = = =
'Generate Billing By Client report
'= = = = = = = = = = = = = = = = = = = = = = = = =
'sqlstring = "SELECT [Currpacedata#txt].agency, [agencymapping#txt].agencyname,
[Currpacedata#txt].invnum, Sum(CDbl([amtcredit] & [invamt])) AS charge
FROM Currpacedata#txt INNER JOIN AgencyMapping#txt ON [Currpacedata#txt].agency = [agencymapping#txt].agency
GROUP BY [Currpacedata#txt].agency, [agencymapping#txt].agencyname, [Currpacedata#txt].invnum;"
sqlstring = "SELECT cpd.agency, agencyname, invnum, Sum(CDbl([amtcredit] & [invamt])) AS charge
FROM Currpacedata.txt cpd INNER JOIN AgencyMapping.txt ag
ON cpd.agency = ag.agency
GROUP BY cpd.agency, agencyname, invnum;"
Dim command As New OleDbCommand(sqlstring, connection)
Dim reader As OleDbDataReader = command.ExecuteReader()
If reader.HasRows Then
Do While reader.Read()
money = Format(reader("charge") / 100, "###,###,##0.00")
ReportString &= reader("agency").ToString & " " & reader("agencyname").ToString &
utils.StringOf(" ", 40).Substring(1, 40 - reader("agencyname").ToString.Length) &
reader("invnum").ToString & " " & "\$ ".Substring(1, 15 - money.Length) &
money & vbCrLf
Loop
End If
reader.Close()
'spit out ReportString to the console and/or a text file and youâre done
Thanks for all the help to:
BrianWren of the Wrox forum (
http://p2p.wrox.com/topic.asp?TOPIC_ID=58350)
and
Cybertaz69 and Clint from (
http://forums.microsoft.com/MSDN/Sho...44779&SiteID=1)