s****y 发帖数: 76 | 1 Hi friends,
I have a question on how to eliminate duplicate rows only when there is a
null value for duplicate ids:
id score
1 55
1 .
2 .
3 74
3 80
4 67
5 40
How do I get rid of second row only (id=1 and score=.)?
Thanks much!!! |
t*****w 发帖数: 254 | 2 data a1;
input id score;
cards;
1 55
1 .
2 .
3 74
3 80
4 67
5 40
;
proc sql;
select *
from a1
except
select *
from a1
group by id
having count(id) gt 1 and score is missing;
run;
【在 s****y 的大作中提到】 : Hi friends, : I have a question on how to eliminate duplicate rows only when there is a : null value for duplicate ids: : id score : 1 55 : 1 . : 2 . : 3 74 : 3 80 : 4 67
|
s****y 发帖数: 76 | 3 Thanks much looks great. |
s*****d 发帖数: 267 | 4 能不用Proc SQL 的,就不要使用Proc SQL. 再说一边,所有的问题都可以使用Data
Step解决
以下方法绝对比使用Proc SQL 速度快,而且你的数据越多,速度上的差异越快。你如
果有50万条数据,Proc SQL 得解上一天才能给你结果。使用Data Step可能只要10几
分钟。不信你试试
请给包子
data result (drop=exist);
set youridlist;
/* define a hash for quick lookup */
if _n_=1 then do;
declare hash myhash;
myhash.definekey('id');
myhash.definedata('exist');
myhash.definedone();
end;
if myhash.find() ne 0 then do;
/* this is the first time that we see an id */
/* if it does not have missing score we add it to hash */
if score ne . then do;
exist=1;
myhash.add();
output;
end;
end;
else do;
/* we have already have this id there, then only */
/* output records if it does not have missing score */
if score ne . then output;
end;
run;
【在 s****y 的大作中提到】 : Hi friends, : I have a question on how to eliminate duplicate rows only when there is a : null value for duplicate ids: : id score : 1 55 : 1 . : 2 . : 3 74 : 3 80 : 4 67
|
t*****w 发帖数: 254 | 5 It is very good suggestion. Is its speed really 10-100 times faster than
proc sql or not?
by the way, I ran your code, but it is not working for me. could you debug
by yourself first?
【在 s*****d 的大作中提到】 : 能不用Proc SQL 的,就不要使用Proc SQL. 再说一边,所有的问题都可以使用Data : Step解决 : 以下方法绝对比使用Proc SQL 速度快,而且你的数据越多,速度上的差异越快。你如 : 果有50万条数据,Proc SQL 得解上一天才能给你结果。使用Data Step可能只要10几 : 分钟。不信你试试 : 请给包子 : data result (drop=exist); : set youridlist; : /* define a hash for quick lookup */ : if _n_=1 then do;
|
s*****d 发帖数: 267 | 6 I just try my best to make SAS user realize the drawbacks of Proc SQL, as
well as the real power of SAS datasteps. I am not a SAS programer at all.
Datastep is much much more efficient than Proc SQL. If you are facing big
data issue, SAS has datastep2 to use, or HP (high performance) data step.
The basic idea is shown in my comments. I just write the code using whatever
editor MITBBS is providing. If you can post the error message you've seen
or explain why you say "it is not working", I can think thru it.
Thanks
【在 t*****w 的大作中提到】 : It is very good suggestion. Is its speed really 10-100 times faster than : proc sql or not? : by the way, I ran your code, but it is not working for me. could you debug : by yourself first?
|