Statistics board - [SAS] Efficient way for subsetting data?
A*******s
Posts: 3942
1
I wrote a tree macro that needs to split a dataset many times based on
different conditions, but it runs quite slowly. For a 700-row dataset and
500 conditions, it takes 3~5 minutes to complete a loop... Is there any
general way to improve the efficiency? I can only come up with two ways:
1. Create an index on the variables used in the IF conditions?
2. Multithreading/parallel programming? I just read oloolo's blog about this
part, and wish I could figure out how to do that.
Any other ideas?
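
One caveat on option 1: SAS can use an index for WHERE processing, but not for a subsetting IF statement, so the condition would have to be written as WHERE. A minimal sketch, assuming a hypothetical parent dataset work.parent with a numeric predictor x (these names are illustrative, not from the macro in question). With only 700 rows the gain is likely to be small.

```sas
/* Hypothetical parent node: predictor x, binary target y (0/1) */
data work.parent;
    input x y;
    datalines;
1 0
2 0
3 1
4 1
5 1
;
run;

/* Build a simple index on the splitting variable */
proc datasets library=work nolist;
    modify parent;
    index create x;
quit;

/* Ask SAS to report in the log whether an index was used */
options msglevel=i;

/* WHERE is eligible for index use; a subsetting IF is not */
data work.left;
    set work.parent(where=(x < 2.5));
run;
```

SAS decides per query whether the index is worth using, so the log note is the way to confirm it.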
D******n
Posts: 2836
2
Don't know how you dissect your dataset, so I don't know how to improve it on
that side. What is a tree macro? BTW the kid is cute...
d*******o
Posts: 493
3
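A 700-row dataset is tiny, so why does it run that slowly? Are you looping too many times?
If all the input and output datasets together are under 1 GB, you can put the library you use
into memory, which saves more than half the time.
libname mylib "c:\temp" memlib;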
A*******s
Posts: 3942
4
thanks! Like father like son... HAHAHA!!!
Tree is about CART decision tree. It splits the parent node into two child
nodes, based on the information gain.
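
A minimal sketch of the information-gain calculation for one candidate split, assuming a binary (0/1) target and entropy as the impurity measure. The counts and variable names are hypothetical and are not taken from the macro discussed here.

```sas
/* Hypothetical counts for one candidate split of a binary target */
data work.gain;
    n_left  = 2;  events_left  = 0;   /* rows / target=1 rows in left child  */
    n_right = 3;  events_right = 3;   /* rows / target=1 rows in right child */

    n        = n_left + n_right;
    p_parent = (events_left + events_right) / n;
    p_left   = events_left  / n_left;
    p_right  = events_right / n_right;

    /* Binary entropy in bits; a pure node (p = 0 or 1) has entropy 0 */
    array p{3} p_parent p_left p_right;
    array h{3} h_parent h_left h_right;
    do i = 1 to 3;
        if p{i} <= 0 or p{i} >= 1 then h{i} = 0;
        else h{i} = -p{i}*log2(p{i}) - (1 - p{i})*log2(1 - p{i});
    end;

    /* Gain = parent entropy minus size-weighted child entropies */
    gain = h_parent - (n_left/n)*h_left - (n_right/n)*h_right;
    drop i;
run;
```

The split with the largest gain would be chosen for the node. Only counts are needed for this, not the child datasets themselves.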

[Quoting D******n's post]
: Don't know how you dissect your dataset, so I don't know how to improve it on
: that side. What is a tree macro? BTW the kid is cute...

A*******s
Posts: 3942
5
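That's a good idea; I'll try it when I get back. Thanks!
I think it may be an I/O problem. 500 loops means splitting the parent node 500 times and
creating 1000 child nodes. I'm thinking I could add a flag instead of doing a physical split.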
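
The flag idea could look roughly like this: a sketch under the assumption that each condition is a cutoff on one numeric predictor. The datasets work.parent and work.cutoffs and the variables x, y, and cut are hypothetical. All candidate splits are evaluated in one PROC SQL step and only the per-split counts are written, not the child datasets.

```sas
/* Hypothetical parent node: predictor x, binary target y (0/1) */
data work.parent;
    input x y;
    datalines;
1 0
2 0
3 1
4 1
5 1
;
run;

/* Hypothetical candidate cutoffs, one row per split condition */
data work.cutoffs;
    input cut;
    datalines;
1.5
2.5
3.5
;
run;

/* A comparison evaluates to 1 or 0 in SAS, so summing it counts rows.
   Each parent row is flagged left (x < cut) or right (x >= cut) for
   every cutoff. Note that a missing x compares less than any number,
   so it would be counted on the left side. */
proc sql;
    create table work.split_stats as
    select c.cut,
           sum(p.x <  c.cut)          as n_left,
           sum((p.x <  c.cut) * p.y)  as events_left,
           sum(p.x >= c.cut)          as n_right,
           sum((p.x >= c.cut) * p.y)  as events_right
    from work.cutoffs as c, work.parent as p
    group by c.cut;
quit;
```

The join is deliberately a Cartesian product (700 rows by 500 conditions is only 350,000 pairs), and SAS writes a note about that to the log. The physical split then only needs to happen once, for the winning condition.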

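[Quoting d*******o's post]
: A 700-row dataset is tiny, so why does it run that slowly? Are you looping too many times?
: If all the input and output datasets together are under 1 GB, you can put the library you use
: into memory, which saves more than half the time.
: libname mylib "c:\temp" memlib;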

s*r
Posts: 2757
6
Check the logical dependence among the conditions.
A*******s
Posts: 3942
7
Any reference? Many thanks...

[Quoting s*r's post]
: Check the logical dependence among the conditions.