Wrox Programmer Forums
Go Back   Wrox Programmer Forums > Microsoft Office > Access and Access VBA > Access VBA
|
Access VBA Discuss using VBA for Access programming.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the Access VBA section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old July 20th, 2007, 12:53 PM
Authorized User
 
Join Date: Jul 2007
Posts: 32
Thanks: 0
Thanked 0 Times in 0 Posts
Default Eliminating duplicate records programmatically

Hi, folks!

     I'm pursuing this here, as the direction of the solution has changed, although the problem hasn't.

The problem:
              I have a table with several columns, none of which can be defined as "primary key" (reasons available on request). What I'd like to do is eliminate all but 1 copy of any record that is the same as any other record in 3 of these fields (Date, Time, and MachineName). Sort of like a "sort unique" on those 3 columns, if that makes any sense. This seems like it shouldn't be too tough, but being a novice in VBA (or really ANY VB dialect), I'm not sure how to go about it... If only this were on a Mainframe running VM, it'd be about 5 lines of Pipe, but... :(

Anyway, if any of you folks could point me in the right direction on this, I'd be very grateful....

PS - I've already got part of the process running in a VBScript, so if this could be done in the same VBScript, that'd be a BIG plus!

        Thanks loads for any help you folks can offer,

    Joe (frustrated Mainframe guy) Parker

JP
__________________
JP
 
Old July 23rd, 2007, 07:03 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

What is the code that is running?

I would do a dupes query first to see how big the problem is. Then use the dpes query to start the code. For example, if there are 60000 records, but only a few hundred are dupes, then it would be a waste to search the whole table. Start with the query that already found the dupes and only search for those records in your table.

Next issue: How do you choose which of the duped records to delete? Or do you care?

Next issue: Add a PK field to the table now.



mmcdonal
 
Old July 23rd, 2007, 09:29 AM
Authorized User
 
Join Date: Jul 2007
Posts: 32
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Thanks for the reply!

 Maybe it's best I lay the whole mess out here....

     There's a process which takes .csv files and stuffs them in the table programmatically. I'm practically promised that there WILL be entries in theses files that will be duplicates of existing ones... Problem is, that if I use a Primary key in the table, any record in the .csv file that would duplicate an existing record in the table will cause the whole append operation to fail. That's why I need to allow the dups in the first place....
   However, I DON'T want them to be in the table when I'm done, so I'm trying to eliminate the dups from the table after the "append" process completes.

  As far as the "which to keep" consideration is concerned, it doesn't matter, as any records that are dups in the fields I'm trying to "select unique" on will be truly dups, and can be safely pitched in the bit-bucket.

FWIW, here's the code that stuff the data in the table:

Dim path, fso, cnv, prj, numf, accApp
path = "c:\inetpub\ftproot"
prj = "c:\SmartSys1.converterx"
numf = 0
Set accApp = CreateObject("Access.Application")
Set cnv = CreateObject( "ConverterX.ConverterX.1" ) '<----- Create TextConverter object
Set fso = CreateObject("Scripting.FileSystemObject") '<----- Create File System object
cnv.OpenProject( prj ) '<----- Load the TextConverter Project
cnv.append = false '<----- Set the Append property to false
numf = numf + ScanFolders(fso.GetFolder( path )) '<----- Main Scanning procedure call
'MsgBox "Files converted: " & numf

'--------------------------
Function ScanFolders( folder ) '<----- Recursive folder scanning procedure
Dim sfs, sf, fls, nf, src
Set sfs = folder.SubFolders
For Each sf in sfs: nf = nf + ScanFolders( sf ): Next
Set fls = folder.Files
For Each fl in fls
    fn = fl.name
    If LCase(fso.GetExtensionName(fn)) = "csv" Then
        src = folder.Path & "\" & fn
        'MsgBox src
        cnv.LoadTextFile src, false '<----- Load Input Text File
        cnv.append = true '<----- Set the Append property to true for the following

conversions
        cnv.Convert()
     End If
Next
ScanFolders = nf
End Function

Thanks again for your attention/assistance!

      Joe

JP
 
Old July 23rd, 2007, 09:45 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

As far as the PK, you need to have one in there anyway, but not really used to weed dupes out as they are coming in.

What you could do is bring the data into an intermediate table first, with the same structure as the final table.

Then create a Find Unmatched Query using the wizard, and compare the necessary fields in the second table (results) with the first.

Then you will get a query named something like: "Table2 Without Matching Table1"

Then create an Append query based on the Unmatched query, and append to Table1.

So your code would run and put all the data into Table2 (in my case) then you would do this:

DoCmd.SetWarnings False
DoCmd.OpenQuery "Table2 Without Matching Table1"
DoCmd.setWarnings True

Not much code there, but the table and query structure is important. If you post the table structure for your final table, I can create the SQL code for the intermediate queries if you need that.

HTH

mmcdonal
 
Old July 23rd, 2007, 09:47 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

Sorry, the code would look like this:

DoCmd.SetWarnings False
DoCmd.OpenQuery "qryAPPENDToTable1" ' based on the unmatched query
DoCmd.setWarnings True


DO NOT DO THIS (sorry for the confusion on my part):
DoCmd.SetWarnings False
DoCmd.OpenQuery "Table2 Without Matching Table1"
DoCmd.setWarnings True




mmcdonal
 
Old July 23rd, 2007, 10:30 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

Actually, the find dupes only allows one field parameter. Let me work on this a few more minutes for a solution based on 3 fields.

mmcdonal
 
Old July 23rd, 2007, 10:37 AM
Authorized User
 
Join Date: Jul 2007
Posts: 32
Thanks: 0
Thanked 0 Times in 0 Posts
Default

Thanks again for the help!

   So would you suggest mooshing the three fields into one, for the purpose of making a PK field in the secondary table? That'd be easy enough to do, and might make the "intermediate query" simpler....

Joe



JP
 
Old July 23rd, 2007, 10:54 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

Okay, if you really want some code, here it is. This works fine for me, but this may take a few seconds if you have tons of records. You can add a counter in there to count the number of records added, and then display the number of records added in a msgbox:

Dim rs1 As ADODB.Recordset
Dim rs2 As ADODB.Recordset
Dim rs3 As ADODB.Recordset
Dim sSQL1 As String
Dim sSQL2 As String
Dim sSQL3 As String
Dim sT1 As String
Dim sT2 As String
Dim sT3 As String
Dim i As Integer

i = 0

sSQL3 = "SELECT * FROM Table1"
Set rs3 = New ADODB.Recordset
rs3.Open sSQL3, CurrentProject.Connection, adOpenDynamic, adLockOptimistic

sSQL2 = "SELECT * FROM Table2"
Set rs2 = New ADODB.Recordset
rs2.Open sSQL2, CurrentProject.Connection, adOpenDynamic, adLockOptimistic

rs2.MoveFirst
    Do Until rs2.EOF
    sT1 = rs2("Text01")
    sT2 = rs2("Text02")
    sT3 = rs2("Text03")
    sSQL1 = "SELECT Count(*) As UnMatched FROM Table1 WHERE [Text01] = '" & sT1 & "' AND [Text02] = '" & sT2 & _
            "' AND [Text03] = '" & sT3 & "'"
            Set rs1 = New ADODB.Recordset
            rs1.Open sSQL1, CurrentProject.Connection, adOpenDynamic, adLockOptimistic
            If IsNull(rs1("UnMatched")) Or rs1("UnMatched") = 0 Then

                rs3.AddNew
                rs3("Text01") = sT1
                rs3("Text02") = sT2
                rs3("Text03") = sT3
                rs3.Update
                i = i + 1
            End If
            rs1.Close

    rs2.MoveNext
    Loop
rs2.Close
rs3.Close

MsgBox i & " records added."

Then remember to create a Delete query and run it to remove everything from Table2 (copy of Table1) with DoCmd.SetWarnings before and after.

Did that help?

mmcdonal
 
Old July 23rd, 2007, 10:58 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

You could do the mooshing solution. You would need to create that and populate it in both tables. I would use an autonumber for the PK for each table though.



mmcdonal
 
Old July 23rd, 2007, 11:03 AM
Friend of Wrox
 
Join Date: Mar 2004
Posts: 3,069
Thanks: 0
Thanked 10 Times in 10 Posts
Default

Also, in your implementation, instead of using String variables as I did, you would need

dtDate As Date
dtTime As Date
sMachName As String

And then pass them like this:

 sSQL1 = "SELECT Count(*) As UnMatched FROM Table1 WHERE [DateField] = #" & dtDate & "# AND [TimeField] = #" & dtTime & _
            "# AND [MachineName] = '" & sMachName & "'"

Date and Time are reserved names and not good for field names, if that is the case here.

mmcdonal





Similar Threads
Thread Thread Starter Forum Replies Last Post
duplicate records vanitha SQL Server 2000 4 June 2nd, 2007 04:35 PM
duplicate records vanitha Reporting Services 2 May 31st, 2007 01:54 AM
Duplicate Records mrookey Dreamweaver (all versions) 1 April 15th, 2005 11:23 AM
Eliminating Redundant Records spraveens MySQL 1 May 12th, 2004 04:21 AM
eliminating duplicate data erin SQL Language 2 April 22nd, 2004 11:49 AM





Powered by vBulletin®
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.
Copyright (c) 2020 John Wiley & Sons, Inc.