由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - How to find all duplicate record in SQL?
相关主题
请教一下高手,包子答谢!问一个小问题!
list duplicators in one table急问:导入XML文件到MySQL数据库
求教高手:odbc更新数据出错Load .cvs to database
SQL question(在线等,谢谢!!)请教关于exists在sql server中的用法
SQL的一个愚蠢小问题db2 default password
急问:怎么去掉MS SQL数据库中的重复记录?一个SQL问题
怎么去除duplicates请教一下这个report的query应该怎么样写?
A question about recursive query我经常问的几道SQL SERVER DBA的面试题,图省事不问编程
相关话题的讨论汇总
话题: duplicate话题: sql话题: record话题: rowid话题: hash
进入Database版参与讨论
1 (共1页)
c**********e
发帖数: 2007
1
如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
一次就行了。
g***l
发帖数: 18555
2
这个问题我面试的时候问过无数次了
DISTINCT去除DUPLICATE RECORD
GROUP BY 去除DUPLICATE KEY
你要这个都不会的话,基本的SQL不懂
y****9
发帖数: 144
3
I guess you imply that duplicate means every columns in two rows are same. I
have a need to delete duplicate rows before, where duplicate means only
unique key columns are same. (see: http://oracle-study-notes.blogspot.com/2008/06/resolve-issue-of-duplicated-rows-in.html )
Here is a good post for Oracle, but should be applicable for SQL Server too
http://viralpatel.net/blogs/2010/06/deleting-duplicate-rows-in-

【在 c**********e 的大作中提到】
: 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
: 一次就行了。

p*********t
发帖数: 2690
4
lz的问题是怎么找到,不是怎么去掉.

【在 g***l 的大作中提到】
: 这个问题我面试的时候问过无数次了
: DISTINCT去除DUPLICATE RECORD
: GROUP BY 去除DUPLICATE KEY
: 你要这个都不会的话,基本的SQL不懂

c**********e
发帖数: 2007
5
My input data set:
AAA 4 5
AAA 4 5
AAA 4 4
AAA 5 5
BBB 4 5
Need to output:
AAA 4 5
This is the only duplicate record!

【在 g***l 的大作中提到】
: 这个问题我面试的时候问过无数次了
: DISTINCT去除DUPLICATE RECORD
: GROUP BY 去除DUPLICATE KEY
: 你要这个都不会的话,基本的SQL不懂

g***l
发帖数: 18555
6
"重复记录只要一次就行了",哪里说找到了?

【在 p*********t 的大作中提到】
: lz的问题是怎么找到,不是怎么去掉.
p*********t
发帖数: 2690
7
"How to find all duplicate record in SQL?"

【在 g***l 的大作中提到】
: "重复记录只要一次就行了",哪里说找到了?
l******8
发帖数: 9475
8
....
have xxxx > 1
w**********n
发帖数: 3624
9
group by having count(sth)>1
m*********y
发帖数: 389
10
Let's say you have first name, last name, and address in one table.
Here duplicate means same firstname&lastname&address combination, for
instance:
John|Doe|123 Main st|...|...
John|Doe|123 Main st|...|...
Mike|Smith|567 Springfield ave|...|...
to find duplicate, do this:
select * from dbo.person where firstname+lastname+address
in (select firstname+lastname+address from dbo.person
group by firstname+lastname+address having count(*)>1)
Does this make sense?
相关主题
急问:怎么去掉MS SQL数据库中的重复记录?问一个小问题!
怎么去除duplicates急问:导入XML文件到MySQL数据库
A question about recursive queryLoad .cvs to database
进入Database版参与讨论
r*****l
发帖数: 2859
11
What is the meaning or "+" in "firstname+lastname+address". I am not a SQL
expert so forgive my simple question :)

【在 m*********y 的大作中提到】
: Let's say you have first name, last name, and address in one table.
: Here duplicate means same firstname&lastname&address combination, for
: instance:
: John|Doe|123 Main st|...|...
: John|Doe|123 Main st|...|...
: Mike|Smith|567 Springfield ave|...|...
: to find duplicate, do this:
: select * from dbo.person where firstname+lastname+address
: in (select firstname+lastname+address from dbo.person
: group by firstname+lastname+address having count(*)>1)

m*********y
发帖数: 389
12
+ is concatenation ; we use that a lot to find out and furthuer more, to
remove duplicate record. But, first thing first, how do you define your
duplicate is up to you.
you can use the Plus sign to concatenate string together...
Usually a table contains many, if not thousands of same first name, so, only
the same firstname&lastname&address combination together truly counts as a
duplicate.. it depends on how you define *duplicate* in your own table..
a9
发帖数: 21638
13
这个执行起来估计很慢。
记得前阵子有人发了个hash的函数还是啥来着,有谁还记得?

【在 m*********y 的大作中提到】
: Let's say you have first name, last name, and address in one table.
: Here duplicate means same firstname&lastname&address combination, for
: instance:
: John|Doe|123 Main st|...|...
: John|Doe|123 Main st|...|...
: Mike|Smith|567 Springfield ave|...|...
: to find duplicate, do this:
: select * from dbo.person where firstname+lastname+address
: in (select firstname+lastname+address from dbo.person
: group by firstname+lastname+address having count(*)>1)

r*****l
发帖数: 2859
14
What if there are two records:
John|Doe|123 Main st|...|...
John|Doe1|23 Main st|...|...
I believe Oracle has something like "group by col1, col2, etc". Why don't
you use that instead of concat?

only
a

【在 m*********y 的大作中提到】
: + is concatenation ; we use that a lot to find out and furthuer more, to
: remove duplicate record. But, first thing first, how do you define your
: duplicate is up to you.
: you can use the Plus sign to concatenate string together...
: Usually a table contains many, if not thousands of same first name, so, only
: the same firstname&lastname&address combination together truly counts as a
: duplicate.. it depends on how you define *duplicate* in your own table..

r*****l
发帖数: 2859
15
En. If finding duplicate is a business need, then the application side can calculate the hash.
Otherwise, I bet Oracle has some hash functions.

【在 a9 的大作中提到】
: 这个执行起来估计很慢。
: 记得前阵子有人发了个hash的函数还是啥来着,有谁还记得?

m*********y
发帖数: 389
16
How to add a hash? I used the Checksum function, but I heard that HashType
is more powerful than Checksum. Anyway, let's say you want hash on the the
combination of first name, last name and address, you can do this:
Alter table person
Add hash as checksum(firstname,lastname,address)
After you created hash, searching for duplicate record is much easier, you
just need to run below query:
select hash,count(*) from person group by hash having count(*)>1
B*****g
发帖数: 34098
17
比如你有100cloumn,而且有些column size很大。如果你每一个新的record都和旧的
record比较所有的cloumn,速度会很慢,而且一般没有有效的index可用。
我认为hash的意思是,每一个record生成一个hash value存在一个column(with index
),然后每一新的record都先比较一下hash,这个应该很快。hash不一样的record肯定
不一样。hash一样的在比较一下每个column看一下是否真的一样(这部分的很小了)
具体问题请问zenny,不过她可不便宜。

【在 m*********y 的大作中提到】
: How to add a hash? I used the Checksum function, but I heard that HashType
: is more powerful than Checksum. Anyway, let's say you want hash on the the
: combination of first name, last name and address, you can do this:
: Alter table person
: Add hash as checksum(firstname,lastname,address)
: After you created hash, searching for duplicate record is much easier, you
: just need to run below query:
: select hash,count(*) from person group by hash having count(*)>1

s*****o
发帖数: 303
18
记得以前在stackoverflow上答过,在sql servers 上用 row number 和 partiton by
i*****9
发帖数: 293
19
modify the sql to suit your need.
delete from $table_name where rowid in
(
select "rowid" from
(select "rowid", rank_n from
(select rank() over (partition by $primary_key order by rowid) rank
_n, rowid as "rowid"
from $table_name
where $primary_key in
(select $primary_key from $table_name
group by $all_columns
having count(*) > 1
)
)
)
where rank_n > 1
)

【在 c**********e 的大作中提到】
: 如果不重复的记录就不要了。只要出现过两次或者两次以上的记录。每个重复记录只要
: 一次就行了。

L******4
发帖数: 1778
20
这个可以
但是会影响效率吧
如果是程式调用
是不是该尽量避免这样的语句呢

【在 m*********y 的大作中提到】
: Let's say you have first name, last name, and address in one table.
: Here duplicate means same firstname&lastname&address combination, for
: instance:
: John|Doe|123 Main st|...|...
: John|Doe|123 Main st|...|...
: Mike|Smith|567 Springfield ave|...|...
: to find duplicate, do this:
: select * from dbo.person where firstname+lastname+address
: in (select firstname+lastname+address from dbo.person
: group by firstname+lastname+address having count(*)>1)

1 (共1页)
进入Database版参与讨论
相关主题
我经常问的几道SQL SERVER DBA的面试题,图省事不问编程SQL的一个愚蠢小问题
这个 Oracle SQL 语句该这么写啊?急问:怎么去掉MS SQL数据库中的重复记录?
求问 Oracle Materialized View怎么去除duplicates
请教比较两个table,找出相同和不同的recordsA question about recursive query
请教一下高手,包子答谢!问一个小问题!
list duplicators in one table急问:导入XML文件到MySQL数据库
求教高手:odbc更新数据出错Load .cvs to database
SQL question(在线等,谢谢!!)请教关于exists在sql server中的用法
相关话题的讨论汇总
话题: duplicate话题: sql话题: record话题: rowid话题: hash