m**********u 发帖数: 2 | 1 I am using SAS to deal with a huge data file with over 10 millions of
observations. “personid” is a variable. The structure of “personid” is
like, for example, xxxx2, xxxx2, xxxx2, xxxx3, xxx3, xxxx5, xxxx5, xxxx5,
xxxx6, xxxx7……., for a unique “personid”, there are several observations
as shown in the example above. I am trying to get subset that the frequency
of the unique “personid” has certain frequency, say, frequency=3, in the
case of example above, that means I want to obtain a subset: | g*******y 发帖数: 380 | 2 proc sql;
CREATE TABLE a AS
SELECT *, count(*) as count_id from b
group by personid
where calculated count_id=3;
QUIT;
Not sure it works, but just represent a idea. | o****o 发帖数: 8077 | 3 might be this way:
proc sql;
create table new as
select a.*
from yourdata as a
left join (select personid, count(*) as count
from yourdata(keep=personid)
group by personid
) as b
on a.personid=b.personid
where b.count>=3
;
quit;
or a SAS way
proc freq data=yourdata noprint;
table personid/out=_freq_(where=(count>=3)
keep=personid count
【在 g*******y 的大作中提到】 : proc sql; : CREATE TABLE a AS : SELECT *, count(*) as count_id from b : group by personid : where calculated count_id=3; : QUIT; : Not sure it works, but just represent a idea.
| s*******2 发帖数: 791 | 4 I use first. and last. in data step. It works well.
I will try sql later.
proc sort data=raw;
by personid;
run;
data subset (drop=count);
set raw;
by personid;
if first.personid then count=1;
else count+1;
if count<=3 then output;
proc print data=subset;
title 'Subset size is 3';
run; | s*******2 发帖数: 791 | 5 oloolo可不可以再看看,SQL的结果还是原来的数据集合
count 是计算每个group里的number of non-missing values,我想是不能用count限制
新的数据集合里每个group里只有3个相同的value。
正在学习SQL中,也不是很明白。请指点.... 谢谢
SAS step我是完全看不懂(从declear statement开始)爆汗
【在 o****o 的大作中提到】 : might be this way: : proc sql; : create table new as : select a.* : from yourdata as a : left join (select personid, count(*) as count : from yourdata(keep=personid) : group by personid : ) as b : on a.personid=b.personid
| s*r 发帖数: 2757 | 6 use 'having'
【在 g*******y 的大作中提到】 : proc sql; : CREATE TABLE a AS : SELECT *, count(*) as count_id from b : group by personid : where calculated count_id=3; : QUIT; : Not sure it works, but just represent a idea.
| o****o 发帖数: 8077 | 7 sorry , use join, not left join
"sir" is right, you should use having statement in your code in place of 'where'
proc sql;
create table new as
select a.*
from yourdata as a
join (select personid, count(*) as count
from yourdata(keep=personid)
group by personid
) as b
on a.personid=b.personid
where b.count>=3
;
quit; | m**********u 发帖数: 2 | 8 Thanks a lot for all your help! |
|