 |
| SQL Language SQL Language discussions not specific to a particular RDBMS program or vendor. |
Welcome to the p2p.wrox.com Forums.
You are currently viewing the SQL Language section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
|
|
|
|

March 29th, 2004, 12:06 PM
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Join Query Distinct and Top 1
I have a table (excel_text) with huge number of records (15MM), running on SQL Server 2000 (sp3a). The key
fields are:
WordTextAsString
PathNameAsString
FileNameXLSasString
All are nvarchar(255)
There are many repetitions in the first field and to clean them out, I would like to write a query that could be put in a view that would give me all the distinct incidences of WordTextAsString(there are 200k of these), and for each one of those, provide ONE matching PathNameAsString, and its corresponding FileNameAsString
I think I need to use a derived table to separate the âdistinctâ instruction from the âtopâ instruction, but I cannot get it to work. After trying many renditions so I thought I would appeal to your expertise.
I am thinking itâs something along these lines, but being pretty inexperienced in SQL, I realize this may be a bunch of goobledegook
select distinct excel_text.WordTextAsString, PathNameasString as cdata
from excel_text
inner join
(select top 1 excel_text. PathNameasString, excel_text.WordTextAsString as pdata
from excel_text where pdata.WordTextAsString = cdata.WordTextAsString)
on cdata.WordTextAsString = pdata.WordTextAsString
Thanks so much in advance for your help.
Alex Hanson
|
|

March 29th, 2004, 05:16 PM
|
|
Friend of Wrox
|
|
Join Date: Jun 2003
Posts: 839
Thanks: 0
Thanked 1 Time in 1 Post
|
|
It looks like you are close. I think you want the alias names on the tables and not the individual columns, especially the derived table you are joining to. Derived tables require an alias:
Code:
SELECT DISTINCT cdata.WordTextAsString, pdata.PathNameasString,
pdata.FileNameXLSAsString
FROM excel_text AS cdata
INNER JOIN
(SELECT TOP 1 WordTextAsString, PathNameasString, FileNameXLSAsString
FROM excel_text
WHERE cdata.WordTextAsString = excel_text.WordTextAsString) AS pdata
ON cdata.WordTextAsString = pdata.WordTextAsString
Jeff Mason
Custom Apps, Inc.
www.custom-apps.com
|
|

March 29th, 2004, 07:22 PM
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Jeff Thanks very much for your help. It didn't quite work out. There is something I am still doing wrong. Your code(which I agree looks like the right way to do it):
SELECT DISTINCT cdata.WordTextAsString, pdata.PathNameasString,
pdata.FileNameAsString
FROM excel_text AS cdata
INNER JOIN
(SELECT TOP 1 WordTextAsString, PathNameasString, FileNameAsString
FROM excel_text
WHERE cdata.WordTextAsString = excel_text.WordTextAsString) AS pdata
ON cdata.WordTextAsString = pdata.WordTextAsString
produced the error:
Server: Msg 107, Level 16, State 2, Line 1
The column prefix 'cdata' does not match with a table name or alias name used in the query.
so I tried a variant:
SELECT DISTINCT cdata.WordTextAsString, pdata.PathNameasString,
pdata.FileNameAsString
FROM (SELECT DISTINCT WordTextAsString, PathNameasString,
FileNameAsString from excel_text) AS cdata
INNER JOIN
(SELECT TOP 1 WordTextAsString, PathNameasString, FileNameAsString
FROM excel_text
WHERE cdata.WordTextAsString = excel_text.WordTextAsString) AS pdata
ON cdata.WordTextAsString = pdata.WordTextAsString
But this produced the Same Error. Do you have any further suggestions? BTW, I have watched your postings a long time and your contributions to the world understanding SQL are remarkable.
Thanks, Alex
|
|

April 5th, 2004, 09:43 AM
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
As an postscript to my last post, the above query could never have worked. Thanks to Jeff Mason for pointing out that top cannot be used in this way. Instead what I had to do was create a new table with the distinct values from the old, then query the old for the top value for the items matching the first using a cursor dataset from the new. This process ate server time, but eventually worked fine. Here is the code. Alex
USE metadata_excel_files
GO
set nocount on
-- Declare the variables to store the values returned by FETCH.
DECLARE @tname varchar(255)
DECLARE @tindex varchar(255)
DECLARE @pathname varchar(255)
--declare @total_exec_text nvarchar(1000)
--declare @total_exec_text2 nvarchar(1000)
DECLARE table_cursor CURSOR FOR
SELECT WordTextAsString, IndexID, pathnameasstring
FROM text_result_parsed_again
OPEN table_cursor
-- Perform the first fetch and store the values in variables.
-- Note: The variables are in the same order as the columns
-- in the SELECT statement.
FETCH NEXT FROM table_cursor
INTO @tname, @tindex, @pathname
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
-- Find Primary Key Colunm and insert it into local variable @pk_name
if @pathname is null
begin
update text_result_parsed_again
Set PathNameAsString =
(select top 1
text_result_parsed.PathNameasString
from text_result_parsed
where text_result_parsed.WordTextAsString = @tname )
where text_result_parsed_again.IndexId = @tindex
end
-- This is executed as long as the previous fetch succeeds.
FETCH NEXT FROM table_cursor
INTO @tname, @tindex, @pathname
END
CLOSE table_cursor
DEALLOCATE table_cursor
GO
set nocount off
|
|

November 30th, 2004, 01:29 PM
|
|
Registered User
|
|
Join Date: Nov 2004
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Quote:
quote:Originally posted by ahanson
Jeff Thanks very much for your help. It didn't quite work out.
|
I know this thread is stale. But I was just searching the web for a solution to a similar problem, found this thread in the process, and eventually found a fairly nice way to solve my problem, so I thought I'd contribute the result.
I think the following would have worked:
SELECT DISTINCT cdata.WordTextAsString, pdata.PathName, cdata.FileNameXLSAsString
FROM excel_text AS cdata
INNER JOIN
(SELECT min(PathNameasString) PathName, WordTextAsString
FROM excel_text
GROUP BY WordTextAsString
) AS pdata
ON cdata.WordTextAsString = pdata.WordTextAsString
and cdata.PathNameAsString=pdata.PathName
It joins the excel_data table to a non-correlated GROUP BY query which contains a single PathName for each distinct WordText.
- Kevin
|
|
 |