由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Statistics版 - sas question ( you bao zi )
相关主题
How to Macro it in SAS?请问这个程序的OUTPUT为啥跟我想的不一样。。。。
问个问题sas proc sort[合集] 请问一个sas数据集过滤的问题
如何找出没有duplicate的数字[done]也贡献一个电面经验
请教这种freq 该用什么code算(sas)?Thanks!请帮忙看3道SAS题。
sas问题问个比较具体的算法问题
用 sas 分组问题问个sas问题(包子)
请教一个proc transpose的问题SAS code 求助
想用first累计character的变量,可以吗?ADV 63题中的45题
相关话题的讨论汇总
话题: acct话题: column话题: proc话题: sql话题: sorted
进入Statistics版参与讨论
1 (共1页)
a******n
发帖数: 11246
1
suppose i have a table with one column called acct_id.
this column is sorted already.
how do I find if there's any duplicate value or not efficiently?
will give double yellow baozi to the most efficient method. thanks
c*********r
发帖数: 1802
2
试试
proc freq
by acct_id;
x***x
发帖数: 3401
3
proc sort data=a nodupkey;
by acct_id;
This will eliminate any duplicated value of acct_id.
a******n
发帖数: 11246
4
this will give all acct_id freq ba?
this table has millions of acct_id.
i just want to find if by any change one acct_id has more than 1 record...

【在 c*********r 的大作中提到】
: 试试
: proc freq
: by acct_id;

a******n
发帖数: 11246
5
then compare the row numbers?
but if the row numbers are not the same, i still don't know which acct_id
has more then one record ya...

【在 x***x 的大作中提到】
: proc sort data=a nodupkey;
: by acct_id;
: This will eliminate any duplicated value of acct_id.

x***x
发帖数: 3401
6
so you want to print out the duplicated records?
then try
proc sql;
select acct_id from a group by acct_id having count(*)>1
;
quit;

acct_id

【在 a******n 的大作中提到】
: then compare the row numbers?
: but if the row numbers are not the same, i still don't know which acct_id
: has more then one record ya...

a******n
发帖数: 11246
7
thanks. this is what i am using now.
still waiting the result to come out :(
it seems that this method does not take the advantage
that the column is already sorted.

【在 x***x 的大作中提到】
: so you want to print out the duplicated records?
: then try
: proc sql;
: select acct_id from a group by acct_id having count(*)>1
: ;
: quit;
:
: acct_id

x***x
发帖数: 3401
8
if you want to take advantage that the column is sorted, you may use
'RETAIN' statement in the data step. But i dont think it is any more
efficient.

【在 a******n 的大作中提到】
: thanks. this is what i am using now.
: still waiting the result to come out :(
: it seems that this method does not take the advantage
: that the column is already sorted.

L****n
发帖数: 3545
9
you can try compress=YES option,
and, sorted column doesn't speed up the process, but index does.

【在 a******n 的大作中提到】
: thanks. this is what i am using now.
: still waiting the result to come out :(
: it seems that this method does not take the advantage
: that the column is already sorted.

a******n
发帖数: 11246
10
i see...thanks.
will baozi you soon.

【在 x***x 的大作中提到】
: if you want to take advantage that the column is sorted, you may use
: 'RETAIN' statement in the data step. But i dont think it is any more
: efficient.

相关主题
用 sas 分组问题请问这个程序的OUTPUT为啥跟我想的不一样。。。。
请教一个proc transpose的问题[合集] 请问一个sas数据集过滤的问题
想用first累计character的变量,可以吗?也贡献一个电面经验
进入Statistics版参与讨论
a******n
发帖数: 11246
11
good. thanks.
baozi soon.

【在 L****n 的大作中提到】
: you can try compress=YES option,
: and, sorted column doesn't speed up the process, but index does.

x***x
发帖数: 3401
12
another way of speeding up is to create a temporary data set that only has
ACCT_ID before go into the sql step. This will help killing some overhead in
the sql step.
data b;
set a(keep=acct_id);
run;
proc sql;
....
L****n
发帖数: 3545
13
no, I want it now.
Or you consider me as your nurse so we can do business together ?
lol

【在 a******n 的大作中提到】
: good. thanks.
: baozi soon.

a******n
发帖数: 11246
14
我自己都不干了。上班太忙了。

【在 L****n 的大作中提到】
: no, I want it now.
: Or you consider me as your nurse so we can do business together ?
: lol

L****n
发帖数: 3545
15
already working le?
zan !!!!!!!!!!
xiao shen yi, tai niu le !

【在 a******n 的大作中提到】
: 我自己都不干了。上班太忙了。
a*****3
发帖数: 601
16
既然已经排过续, 就用
if Fist.ID <> last.ID
怎么样啊 , 看得过去的话,赏个两个铜板吧

【在 a******n 的大作中提到】
: 我自己都不干了。上班太忙了。
a******n
发帖数: 11246
17
hao. baozi sent.
i will compare speed of this method and the proc sql one :D

【在 a*****3 的大作中提到】
: 既然已经排过续, 就用
: if Fist.ID <> last.ID
: 怎么样啊 , 看得过去的话,赏个两个铜板吧

a*****3
发帖数: 601
18
那不用说,肯定是我的跑得飞快啦, 多谢天使包子

【在 a******n 的大作中提到】
: hao. baozi sent.
: i will compare speed of this method and the proc sql one :D

x**********0
发帖数: 2260
19
赞。想到这个,忘记code了。呵呵

【在 x***x 的大作中提到】
: proc sort data=a nodupkey;
: by acct_id;
: This will eliminate any duplicated value of acct_id.

g****8
发帖数: 2828
20
you don't need to compare yourself.
The log file will tell you how many dup there.

【在 a******n 的大作中提到】
: then compare the row numbers?
: but if the row numbers are not the same, i still don't know which acct_id
: has more then one record ya...

相关主题
请帮忙看3道SAS题。SAS code 求助
问个比较具体的算法问题ADV 63题中的45题
问个sas问题(包子)SAS ADV 63题中第45题
进入Statistics版参与讨论
g****8
发帖数: 2828
21
create a table for that sql, it will be faster than just use SELECT. SAS
needs some time to print out large dataset.

【在 a******n 的大作中提到】
: thanks. this is what i am using now.
: still waiting the result to come out :(
: it seems that this method does not take the advantage
: that the column is already sorted.

a******n
发帖数: 11246
22
有道理!。

id

【在 g****8 的大作中提到】
: you don't need to compare yourself.
: The log file will tell you how many dup there.

a******n
发帖数: 11246
23
thanks...学习了~

【在 g****8 的大作中提到】
: create a table for that sql, it will be faster than just use SELECT. SAS
: needs some time to print out large dataset.

g****8
发帖数: 2828
24
哈哈,谢谢双黄包呀。

【在 a******n 的大作中提到】
: thanks...学习了~
p********r
发帖数: 1465
25
proc sort... nodupkey out= dupout=;
by accountid;
run;
out= 输出的是不重复的,dupout输出的是所有重复过的。
用compress,之前的童鞋说过了。
million级别的数据也不用太久,很快能出结果。
m*****y
发帖数: 229
26
don't know whether this work in PROC SQL: count (distinct acct_id )
1 (共1页)
进入Statistics版参与讨论
相关主题
ADV 63题中的45题sas问题
SAS ADV 63题中第45题用 sas 分组问题
which route in SAS is faster?请教一个proc transpose的问题
请教 2 道SAS Adv 真题想用first累计character的变量,可以吗?
How to Macro it in SAS?请问这个程序的OUTPUT为啥跟我想的不一样。。。。
问个问题sas proc sort[合集] 请问一个sas数据集过滤的问题
如何找出没有duplicate的数字[done]也贡献一个电面经验
请教这种freq 该用什么code算(sas)?Thanks!请帮忙看3道SAS题。
相关话题的讨论汇总
话题: acct话题: column话题: proc话题: sql话题: sorted