Page 4 - Roundup of discussions about dataset - 话题女王

E**********e
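Posts: 1736

From thread: DataSciences board - Confused: when using cross validation to evaluate performance, do I still need to split the original dataset into train and test?

Cross validation is mainly meant for cases where the sample size is relatively small, say under
50,000 rows. First split the data into two sets, roughly 80/20. Then the 80% portion goes into CV, at roughly 70/30.
The final model's PERFORMANCE is measured on the 20% portion. Remember: the 20% portion must never be
"touched" during MODEL development. Not touching it means not studying the response variable or the modeling
variables in it. It can also be used to compare against earlier models and see how the new model performs.
Another important point: do not use the 80% portion to pre-select modeling variables. Doing so introduces bias,
because you have peeked at the variables. Variable pre-selection has to happen before CV, or be independent of the modeling.
The above sums up a bit more than a year of experience. Still, I cannot fully agree with not pre-selecting variables. In practice
you often have to pick the final 20 or so out of several thousand variables. Not looking at how the variables relate to the outcome beforehand is unrealistic. But
in the end you can evaluate with that 20% portion.
I am using the OP's thread to throw this out and see whether any experts want to discuss it. The above applies to data with size under 10,000;
above 10,000 you can use 30...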
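The split described in this post can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `split_holdout_then_cv`, the number of repeats, and the seed are made up here, and the CV step is read as repeated random 70/30 splits of the 80% development portion. The 20% holdout indices are set aside first and never enter any CV split.

```python
import random

def split_holdout_then_cv(n_rows, holdout_frac=0.2, val_frac=0.3, n_repeats=5, seed=0):
    """Return (holdout, splits); splits is a list of (train, validation) index lists."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)

    # Set the final test portion aside first; it is not used again below.
    n_holdout = round(n_rows * holdout_frac)
    holdout, dev = idx[:n_holdout], idx[n_holdout:]

    # Repeated random train/validation splits inside the development portion.
    n_val = round(len(dev) * val_frac)
    splits = []
    for _ in range(n_repeats):
        shuffled = dev[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[n_val:], shuffled[:n_val]))
    return holdout, splits

holdout, splits = split_holdout_then_cv(100)
```

Any variable selection would then have to run inside each training part of `splits` (or be done independently of the modeling data) for the holdout estimate to stay unbiased.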

a**t
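Posts: 57

From thread: Statistics board - Asking SAS experts for help with a test

A newbie finally got a PHONE INTERVIEW, and afterwards was given a test to complete and send back. Could the experts help with
answers? Starting from question 9 onward is enough. Maybe others can use it too.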
The test below is designed to gauge your level of SAS experience. Feel free
to reference any sources of help (SAS documentation, etc.) you would
normally use in the course of developing code. We are looking for your
general knowledge of concepts, so please DO NOT spend too much time trying
to perfect minor details of syntax. We understand that writing code without
the chance to run it to discover errors can be difficult.
The ...

o****o
Posts: 8077

From thread: Statistics board - SAS Technical Interview Questions

Reposted from:
http://www.globalstatements.com/sas/jobs/technicalinterview.htm
*****************************************
SAS Technical Interview Questions
You can go into a SAS interview with more confidence if you know that you
are prepared to respond to the kind of technical questions that an
interviewer might ask you. I do not provide the specific answers here, both
because these questions can be asked in a variety of ways and because it is
not my objective to help those who have little actual int...

x*******5
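Posts: 1335

From thread: Statistics board - A question about data bias

A statistics question.
I have 2 datasets (collected on different machines), and the results obtained from each differ.
The issue is that
in theory these 2 datasets should produce the same result, so the discrepancy is presumably caused by dataset bias.
How should I overcome the dataset bias?
What I can think of so far is to merge the 2 datasets and use generalized linear regression
to remove the dataset bias (e.g. index=1 for dataset 1, =2 for dataset 2).
One problem is that dataset 1 has 200 samples and dataset 2 has 450 samples.
Will running the regression this way let dataset 2's bias dominate the final result because its sample is larger?
Is there any way to correct for this?
Thanks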

m*****n
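Posts: 2152

From thread: JobHunting board - How to solve this Google interview question

The algorithm is a bit crude, but it works.
import copy
import random

def test(data):
    # Pair each distinct element with the number of times it occurs.
    dataset = [[n, data.count(n)] for n in set(data)]
    for i in range(2):
        copydataset = copy.deepcopy(dataset)
        print(list(generator(copydataset)))
        print('\n')

def generator(dataset):
    size = len(dataset)
    while dataset:
        # Pick a random remaining element and use up one of its occurrences.
        index = int(random.random() * size)
        dataset[index][1] -= 1
        char = dataset[index][0]
        if not dataset[index][1]:
            dataset.pop(index)
            size = size - 1
        # ... (the rest of the post is truncated in the source)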

h*e
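Posts: 10233

From thread: Statistics board - A question about proc merge

I have two datasets. The main dataset has 4 columns A, B, C, D, and there is a reference dataset with
4 columns B, C, D, E.
I need to merge the two datasets on B, C, D and add the E value to the main dataset. The problem is that
some B, C, D, E values in the reference dataset are blank. The rule is: if a value is missing, compare
on the other values. For example,
if the main dataset has B=1, C=2, D=3 and the reference dataset has B=1, C=null, D=3, E=5, then compare on B and D
directly and add E to the main dataset. Does proc merge support this? Or do I need another approach? Thanks.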
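The matching rule in this question (a missing key in the reference row matches anything) can be sketched in Python. This is an illustration of the rule only, not a SAS answer; the function name `merge_with_wildcards`, the value of column A, and the choice to take the first matching reference row are assumptions.

```python
def merge_with_wildcards(main_rows, ref_rows, keys=("B", "C", "D"), value="E"):
    """Attach `value` from the first reference row whose non-missing keys all match."""
    merged = []
    for row in main_rows:
        out = dict(row)
        out[value] = None  # stays None when no reference row matches
        for ref in ref_rows:
            # A missing (None) key in the reference row is treated as a wildcard.
            if all(ref[k] is None or ref[k] == row[k] for k in keys):
                out[value] = ref[value]
                break
        merged.append(out)
    return merged

# Example from the post; the value of A is made up for illustration.
main_rows = [{"A": 0, "B": 1, "C": 2, "D": 3}]
ref_rows = [{"B": 1, "C": None, "D": 3, "E": 5}]
result = merge_with_wildcards(main_rows, ref_rows)
```

Note that a reference row with every key missing would match all main rows under this rule, so the order of `ref_rows` matters.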

i*********7
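Posts: 348

From thread: JobHunting board - Discussing a DFS solution to LeetCode WordLadder I

Most people probably know this problem is solved with BFS.
I just wanted to try a DFS solution myself.
For a DFS solution to avoid TLE, the key is pruning and updating the answer after pruning.
That is why I created my own class and a corresponding HashMap to record state for pruning.
My idea is this: when I hit a node word that has appeared before, I first check whether a solution
was found when traversing down from that node. If not, there are only two cases: 1. there is no
solution below this node (without changing back); 2. it changed back. In both cases I treat it as an invalid visit and return to the level above.
If there was a solution, I compare whether the recursion depth at which the node previously had a solution is deeper than the depth of the current repeated
visit. If not, nothing is updated; if so, the result is corrected by the depth difference. This amounts to treating the previously
traversed result as sitting under the current level.
OK, here is the problem. This solution only passes 80% of the LeetCode cases. On one case with a very large dictionary it returns
1 more than the expected answer. Can anyone tell me where my code or logic goes wrong? =.=
class DataSet{
    int level, res;
    DataSet(){
        level = 0;
        ...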

I*******g
Posts: 7600

From thread: JobHunting board - Asking for pointers on a Google question

using dynamic programming (cache):
public class Operations
{
static class OperationFormula
{
public int value;
public String formula;
public OperationFormula(int x, String y){
value = x;
formula = y;
}
}
static Map, HashSet> map = new
HashMap,HashSet>();

public static void main( String[] args )
{
int target = 48;
Integer[] item ...

s******y
Posts: 352

From thread: Statistics board - SAS urgent help needed, baozi as thanks!

1. If the change was made on a permanent dataset, the dataset was not on the
server, and there is no source code that creates the dataset, then the answer will be NO as
far as I know.
2. If the dataset was on the server, call the IT department to restore it. The server
should be archived overnight.
3. If the dataset is a temp dataset, rerun the code.
Don't worry, it should not be your fault. Your manager should set READ-only permission on that dataset.

dataset?

e*****r
Posts: 621

From thread: Statistics board - A question about doing a DATA MERGE in SAS

I want to merge DATASET 1 and DATASET 2 into DATASET 3 as follows
DATASET 1 -
id var_1
1 5
1 10
2 15
DATASET 2 -
id var_2
1 3
1 6
1 8
2 7
DATASET 3 -
id var_1 var_2
1 5 3
1 5 6
1 5 8
1 10 3
1 10 6
1 10 8
2 15 7
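In effect this is a many-to-many merge of the two datasets rather than a one-to-many merge. Unfortunately I am fairly weak at this and have only done one-to-many merges so far. What is the simplest way to do it? Many thanks!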
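What the post asks for is a join that pairs every row of DATASET 1 with every row of DATASET 2 sharing the same id. A small Python sketch of that logic, using the data from the post (the function name `many_to_many_join` is made up here); in SAS itself a PROC SQL join on id is the usual route to this kind of result.

```python
from collections import defaultdict

def many_to_many_join(left, right, key="id"):
    """Pair every left row with every right row that has the same key value."""
    by_key = defaultdict(list)
    for r in right:
        by_key[r[key]].append(r)
    joined = []
    for l in left:
        for r in by_key.get(l[key], []):
            joined.append({**l, **r})
    return joined

dataset1 = [{"id": 1, "var_1": 5}, {"id": 1, "var_1": 10}, {"id": 2, "var_1": 15}]
dataset2 = [{"id": 1, "var_2": 3}, {"id": 1, "var_2": 6},
            {"id": 1, "var_2": 8}, {"id": 2, "var_2": 7}]
dataset3 = many_to_many_join(dataset1, dataset2)
```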

T*******I
Posts: 5138

From thread: Statistics board - Logistic regression: a validation question

According to your first statement, you have built a logistic model with a
model-building dataset. And then, you mentioned you may have a validation
dataset. What you try to know is that if the model is good or not when it is
validated by the validation dataset. So, let me ask you several questions:
(1) What is the model-building dataset?
(2) What is the validation dataset?
(3) What is the relationship between the two datasets?
(4) Do the two datasets come from a same population?
(5) Is the v

T*******I
Posts: 5138

From thread: Statistics board - Logistic regression: a validation question

One more opinion,
The so-called validation just means two situations:
1) the validation dataset comes from a same population as the model-building
dataset does if the model is good enough for the validation dataset thus
for the same population from which both the model-building dataset and the
validation dataset come;
2) or not, but this situation does not mean that the model is not good for
the population from which the model-building dataset comes. It may tell us
that the validation dataset ma

r*****y
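Posts: 199

From thread: Statistics board - A SAS SQL question

Today I ran into a very simple case that a data step can solve, but it feels too cumbersome. I think SQL
should handle it easily, but I am not familiar with SQL, so I am asking the kind people on this board.
The problem is simple: I have two identically structured datasets, pre and post, and I need to compute post - pre
for 20+ variables. With a data step, the variables in each dataset have to be renamed with a pre or post
suffix, then merged into one big dataset, and then the diffs computed, which feels very complicated.
What I want is to subtract from the whole post dataset the corresponding value of each variable in the
pre dataset. How do I do that? Let me make up a dataset! Suppose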
pre
id measure1 measure2 measure3
1 20 30 40
2 50 60 70
post
id measure1 measure2 measure3
1 15 ...
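The post - pre calculation asked about here can be sketched in Python without renaming any columns. The pre rows are taken from the post; the post rows are truncated in the source, so the values below are purely hypothetical, as is the function name `diff_by_id`.

```python
def diff_by_id(pre, post, id_col="id"):
    """Return post - pre for every non-id column, matched on id_col."""
    pre_by_id = {row[id_col]: row for row in pre}
    diffs = []
    for row in post:
        base = pre_by_id.get(row[id_col])
        if base is None:
            continue  # no matching pre row, so nothing to subtract
        diffs.append({c: (row[c] if c == id_col else row[c] - base[c]) for c in row})
    return diffs

pre = [{"id": 1, "measure1": 20, "measure2": 30, "measure3": 40},
       {"id": 2, "measure1": 50, "measure2": 60, "measure3": 70}]
# Hypothetical post values; the real ones are cut off in the post.
post = [{"id": 1, "measure1": 15, "measure2": 25, "measure3": 45},
        {"id": 2, "measure1": 55, "measure2": 60, "measure3": 65}]
diffs = diff_by_id(pre, post)
```

The same idea carries over to SQL: join the two tables on id and subtract column by column, so no suffix renaming is needed.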

c**********2
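Posts: 62

From thread: Statistics board - Discussing 3 SAS ADV questions

There are many versions of the answers to the 63 SAS ADV questions online, and they all seem to contain some errors. Below are some
questions I am still unsure about. Discussion and corrections are welcome. If I have time after the exam, I will post my compiled answers to the 63 questions to build up some karma.
Item 1 of 63 Mark item for review
When attempting to minimize memory usage, the most efficient way to do group
processing when using the MEANS procedure is to use:
A. the BY statement.
B. GROUPBY with the NOTSORTED specification.
C. the CLASS statement.
D. multiple WHERE statements.
The answer key says C, and I also think C is right. Someone online mentioned sorting first and then running MEANS, which makes A faster. But the question
asks about memory, not time, so I still think it should be C.
Item 11 of 63 Mark item for review
The following SAS code is subm...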

s******s
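Posts: 2837

From thread: Statistics board - Two SAS ADV questions

Here is how I understand these two questions.
Question 1: In D, after the sum in the second dataset the column name becomes sum(Cost), so after outer union corr
there are three columns: id, cost, sum(Cost). In A the case of cost does differ, but SAS treats
variable names as case insensitive, so at least that is not what the question is testing. Don't worry about those details...
Question 2: With multiple SET statements, the obs of each dataset are read in order, and reading stops once the dataset with the fewest obs
is exhausted. So with just set one; set two, it will not continue after reading the first obs, because the second
dataset has only one obs.
With if _n_=1, the obs from the second dataset is read and its values are retained. The PDV holds SumY, which
stays 36. When _n_=2 the second dataset is no longer involved, and only obs from the first dataset are read.
So the answer is D.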

B******y
Posts: 9065

From thread: Statistics board - CDISC-related questions

ADaM
1. Which variable metadata field is NOT a part of a Version 2.1 ADaM-
compliant analysis dataset?
A. Dataset Name
B. Variable Label
C. Source/Derivation
D. Display Format
E. Codelist/Controlled Term
F. Variable Type
G. Informat
2. According to ADaM, the structure for most datasets should be vertical (i.
e., tall and thin).
A. TRUE
B. FALSE
3. In ADaM as in the SDTM, there is no day 0 for relative day variables
whose name ends in DY.
A. TRUE
B. FALSE
4. In ADaM-compliant datasets the display...

a*****a
Posts: 1385

From thread: EB23 board - USCIS Makes Additional Data on Employment-Based Visa Programs Available in Support of ‘Hire American

USCIS Makes Additional Data on Employment-Based Visa Programs Available in
Support of ‘Hire American’ Executive Order | USCIS
U.S. Citizenship and Immigration Services (USCIS) has posted additional data
about the agency’s employment-based visa programs on its website. This new
information reflects USCIS’ commitment to transparency in carrying out
President Trump’s Buy American and Hire American Executive Order.
Datasets now available on the webpage include:
L-1 Datasets: The L-1 program (L-1A an...

w***g
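Posts: 5958

From thread: Programming board - How do I do recursive variable-name substitution in bash?

I have three datasets, each with its own variable defined
A_PARAMS=......
B_PARAMS=......
C_PARAMS=......
The user enters the dataset name on the command line, and it is stored in the variable DATASET
DATASET=$1
How do I then get the parameters for that dataset, i.e. achieve the following
PARAMS=$($DATASET)_PARAMS
Obviously bash does not support the syntax above, but I wonder whether there is a reasonably easy way to get this behavior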

A*******s
Posts: 3942

From thread: Statistics board - [SAS] data set options (obs=) in output tables

suppose we use a procedure to get a table from output mode, for example:
output out=dataset ...;
or use ODS output table like:
ods output ods_table_name=dataset;
However, in neither case can I use the obs= dataset option to restrict the number of observations in the output dataset. The following code doesn't work:
ods output ods_table_name=dataset(obs=1);
But other dataset options seem OK, e.g. where=, keep=, drop=, ...
I can't understand why.

y****t
Posts: 446

From thread: Statistics board - A question about subsetting data

I have two datasets:
first dataset:
Name Response
A *
A *
B *
B *
B *
C *
C *
second dataset:
Name Response
A *
A *
A *
B *
C *
C *
C *
D *
D *
E *
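I want to subset the second dataset so that it keeps only the rows whose name appears in the first dataset, i.e. drop the
observations for D and E from the second dataset.
Any advice would be appreciated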
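The filtering described here is a semi-join: keep a row of the second dataset only if its Name occurs in the first. A minimal Python sketch with the names from the post (the function name `keep_known_names` is made up, and `*` stands in for the Response values just as in the post):

```python
def keep_known_names(first, second, key="Name"):
    """Keep only the rows of `second` whose key value also occurs in `first`."""
    known = {row[key] for row in first}
    return [row for row in second if row[key] in known]

first = [{"Name": n, "Response": "*"} for n in "AABBBCC"]
second = [{"Name": n, "Response": "*"} for n in "AAABCCCDDE"]
subset = keep_known_names(first, second)
```

In SQL terms this corresponds to a WHERE Name IN (SELECT Name FROM first) filter on the second table.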

s*********r
Posts: 909

From thread: Statistics board - Urgent: ENTRY LEVEL SAS PROGRAMMER on-site tomorrow, what should I watch out for?

Posted by: papertigra (长工胖头猪), board: Statistics
Title: CRO SAS Interview questions
Site: BBS 未名空间站 (Fri Feb 26 21:12:00 2010, US Eastern)
http://www.sas9.blogspot.com/
SAS Programmer Position
1. What kind of AE tables are there?
2. What difference between proc means and freq?
3. What does run statement mean?
4. What is ITT? What assessment in ITT definition is?
5. Which procedure can produce standard deviation of a variable?
6. What do put and input functions do?
7. How to validate your program?
8. How to identify...

t******k
Posts: 5617

From thread: Statistics board - Error when building a CART tree in R

Build a CART tree to predict X using all variables in dataset except X, A, and B:
tree=rpart(X~. -A -B, data=dataset)
Error:
Error in rpart(X ~ . - A - B, data = dataset) :
NAs are not allowed in subscripted assignments
But I found no NAs in variables A and B.
If I first run
dataset$A=NULL
dataset$B=NULL
and then run
tree=rpart(X~. , data=dataset)
there is no error.
What is going on here?

W***n
Posts: 11530

From thread: Military board - IBM is teaching AI to behave more like the human brain

IBM is teaching AI to behave more like the human brain
Engadget Andrew Tarantola,Engadget Fri, Sep 1 11:00 AM PDT
Since the days of Da Vinci's "Ornithopter", mankind's greatest minds have
sought inspiration from the natural world for their technological creations.
It's no different in the modern world, where bleeding-edge advancements in
machine learning and artificial intelligence have begun taking their design
cues from the most advanced computational organ in the natural world: the
human brain...

r******n
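Posts: 2730

From thread: Indiana board - A beginner-level SAS question, thanks (repost)

[The following text is reposted from the Statistics board]
Posted by: rainlion (rainlion), board: Statistics
Title: A beginner-level SAS question, thanks
Site: BBS 未名空间站 (Mon Jun 21 21:48:49 2010, US Eastern)
I want to create a dataset that contains the means of all the variables in another dataset
proc means mean data=child2 ;
class teacherID;
var gender age ppvtr ppvts ppvtgsv toppas wojor wojos wojow letnn
letnp;
output out=child3 ;
run;
But why does the child3 dataset I get contain not only mean but also min, max, n, and std?
How do I create a dataset that has only teacher ID and mean?
Many thanks... I am a newbie who has just started playing around with SAS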

z****e
Posts: 54598

From thread: Programming board - Flink Sparks Next Wave of Distributed Data Processing

February 22, 2015 Nicole Hemsoth
If you haven’t heard of Flink until now, get ready for the deluge. As one
of a stream of Apache incubator-to-top-level projects turned commercial
effort, the data processing engine’s promise is to deliver near-real time
handling of data analytics in a much faster, more condensed, and memory-
aware way than Hadoop or its in-memory predecessor, Spark, could do.
What really captured our attention, however, was the claim by Data Artisans,
the company behind Flin...

w***g
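Posts: 5958

From thread: Programming board - Can machine learning discover relationships like topological invariants or symmetry groups?

99% is a normal level.
For machine learning results, anyone who claims 100% is unprofessional and can be shown the door.
There are many >99% results now, and I don't really trust them.
MNIST has been around for years, and people say they use cross-validation,
but in fact the hyperparameters are all tuned against the validation results.
Many deep learning frameworks use MNIST as a toy example.
I just ran it with lasagne and reached 99% within two minutes.
There should not be much over-fitting here, because the CNN architecture is just an arbitrary simple one,
with no sign of being specifically optimized for MNIST.
A few more epochs would probably push it higher. But as guvest said,
1 vs 7 and 0 vs 9 are hard to get entirely right. Even if the validation accuracy goes up,
I still don't really trust it. Just as an instrument has a rated precision when we measure something,
a dataset used for evaluation also has a precision. I think the precision of the MNIST dataset
should be below 99%. Measuring >99% accuracy with this dataset
is meaningless.
For those who want to look at the dataset, I have already exported it:
http://www.aaalgo.com/picpac/datasets/nm...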

d******c
Posts: 2407

From thread: Programming board - pandas author: Apache Arrow and the "10 Things I Hate About pandas"

pandas rule of thumb: have 5 to 10 times as much RAM as the size of your
dataset
There are additional, hidden memory killers in the project, like the way
that we use Python objects (like strings) for many internal details, so it's
not unusual to see a dataset that is 5GB on disk take up 20GB or more in
memory. It's an overall bad situation for large datasets.
The 10 (really 11) things are (paraphrasing my own words):
Internals too far from "the metal"
No support for memory-mapped datasets
Poor p...

A*********u
Posts: 8976

From thread: Statistics board - A question on the IN= option

dataset A
a
1
2
3
dataset B
a
4
5
6
code:
data AB;
set A(in=in1) B(in=in2);
run;
you will get:
dataset AB
a in1 in2
1 1 0
2 1 0
3 1 0
4 0 1
5 0 1
6 0 1
so there is no "in1 and in2".
BTW in1 and in2 are internal variables stored in PDV,
but won't be written to dataset AB.
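To see why no row can have both flags set, here is a small Python mimic of what the concatenating SET statement does in this example. It is an illustration only: the function name `concatenate_with_flags` is made up, and unlike SAS it keeps the flags in the output so they can be inspected.

```python
def concatenate_with_flags(a_rows, b_rows):
    """Stack a_rows then b_rows, recording which input each row came from."""
    combined = []
    for row in a_rows:
        combined.append({**row, "in1": 1, "in2": 0})
    for row in b_rows:
        combined.append({**row, "in1": 0, "in2": 1})
    return combined

A = [{"a": 1}, {"a": 2}, {"a": 3}]
B = [{"a": 4}, {"a": 5}, {"a": 6}]
AB = concatenate_with_flags(A, B)

# Each output row comes from exactly one input, so this filter keeps nothing.
both = [row for row in AB if row["in1"] and row["in2"]]
```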

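Even if people and money are exactly identical datasets, running this statement still yields 0 obs. It may be because of
set, but I don't fully understand why. Could some expert explain?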

A*****a
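Posts: 1091

From thread: Statistics board - A question about SAS programming

Say I have several datasets
named dataset1, dataset2, ... dataset10, etc. That is, the names share the same prefix and differ only
in a numeric suffix. The data structure inside the datasets is similar too, except that one variable has a similar name
across these datasets, also with a numeric suffix that matches the dataset's number.
I now need to perform the same operation on that variable in each of these datasets. Can this be done with a loop?
Thanks!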

s***r
Posts: 1121

From thread: Statistics board - proc sql - SAS question, 10 baozi offered

How can I merge 3 datasets using PROC SQL?
Dataset 1:
Plant date1 Variable1
Unique YYYYMMDD
ID
001 20060914 .....
001 20080801 .....
001 20080822
001 20100101
002 20011119
002 20020101
002 20030808
003 20091212
005 20000816
005 20001225
005 20010205
005 20030203
005 20030501 ....
...
...
....
Dataset 2:
Plant date2 Variable2
Unique YYYYMMDD
ID
001 20050314 ...

e*******e
Posts: 75

From thread: Statistics board - SAS question (urgent help needed, waiting online)

Hi,
I have a dataset as follows:
patient_ID drug drug_start_date drug_end_date
1 A 1/1/2011 1/6/2011
1 A 2/1/2011 2/4/2011
1 B 1/1/2010 1/2/2010
1 B 5/3/2010 5/6/2010
2 C 1/2/2011 1/5/2011
2 C 3/3/2011 3/4/2011
2 A 3/4/2010 3/5/3010
3 A 2/1/2011 2/1/2011
3 A 1/15/2010 1/17/2010
4 A 3/2/...

s*****9
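Posts: 285

From thread: Statistics board - DC Entry-level SAS PROGRAMMER PHARM TRAINER

Recently there is also a Chinese guy near DC who has a PROJECT with the FDA; you would be an employee of his company working at the FDA. He says the
H1B SPONSOR cost is split half and half. I am having coffee with him in a few days to find out the details. You still need to know CDISC. JOB DESCRIPTION attached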
Demonstrated experience in SAS programming in health care environment, preferably in the creation of clinical trial analysis datasets
Excellent communication and interpersonal skills
Hands-on programming with SAS v9 or above, S-Plus and R
Experience and Knowledge in a medical research environment, meta-analysis, and publication in biostatistical and biomedical journals
With hands-on exp...

h***x
Posts: 586

From thread: Statistics board - Lots of jobs (sas programmer/biostatistician) posted

CALIFORNIA:
0000087272
SAS Programmer (12m)
Bachelor's or Masters in Computer Science or other relevant (Engineering)
degrees with 5+ years of pharmaceutical experience preferred- The work
experience should include at least two years of technical leadership in a
statistical programming environment in a pharmaceutical or biotechnology
environment including the analysis and reporting of clinical trial data-
Knowledge and application of p-values, confidence intervals, linear
regression analysis, ad...

d********1
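Posts: 188

From thread: Statistics board - SAS question: comparing variables, baozi as thanks

I have two datasets and want to see exactly which variables are in dataset 1 but not in dataset 2,
and likewise which variables are in dataset 2 but not in dataset 1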
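Once the variable names of each dataset are available as lists, the comparison is two set differences. A minimal Python sketch; the function name `compare_variables` and the example variable names are made up, and names are upper-cased on the assumption that, as in SAS, variable names are case-insensitive.

```python
def compare_variables(vars1, vars2):
    """Return (names only in vars1, names only in vars2), ignoring case."""
    set1 = {v.upper() for v in vars1}
    set2 = {v.upper() for v in vars2}
    return sorted(set1 - set2), sorted(set2 - set1)

# Hypothetical variable lists for illustration.
only_in_1, only_in_2 = compare_variables(["id", "Age", "weight"],
                                         ["ID", "age", "height"])
```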

j*****7
Posts: 4348

From thread: Statistics board - How to batch-convert many SAS files to Stata files?

libname test "X:\XXX\XXX\XXX";
proc sql noprint;
create table ddfdata as
select memname as dataset label='Data Set Name',
name as variable label='Variable Name',
label label='Variable Label',
type label='Variable Type',
count (distinct memname) into :totmem
from sashelp.vcolumn where libname='TEST'
order by dataset,name;
quit;
%macro trans;
data _null_;
set ddfdata;
by dataset notsorted;
retain b 1;
if first.dataset then do;
call symput(compress(trim(left('member'||t...

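Posts: 1

From thread: Military board - Bringing pangolins into the picture these past couple of days may strengthen the supporting evidence for the conspiracy-theory hypothesis

The 99% figure came from South China Agricultural University,
and has no connection to this article, which was published before the Wuhan pneumonia outbreak.
Only when people analyzed this article retrospectively did they make a new finding: 97% amino acid
sequence identity with the novel coronavirus receptor-binding protein. That caught my attention, and following the trail I found this article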
http://virological.org/t/ncov-2019-spike-protein-receptor-binding-domain-shares-high-amino-acid-identity-with-a-coronavirus-recovered-from-a-pangolin-viral-metagenomic-dataset/362
nCoV-2019 Spike Protein Receptor Binding Domain Shares High Amino Acid
Identity With a Coronavirus Recovered from a Pangolin Viral Metagenomic
Dataset
Novel 2019 coronavirus
nCoV-2019 Evolutionary History
torptube...

s****n
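Posts: 1237

From thread: Classified board - [JOBS] Job opening at our company (Data mining in San Diego)

Our group is expanding and needs to hire 5 people this year. The work is mainly machine learning and data mining; see below
for the specific requirements. Because several very strong candidates have come in recently, it now looks like a PhD is basically required, along with
solid modeling evidence you can show, and local candidates (San Diego) are preferred. If you are interested,
send me a message.
==============================Job Description==================================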
The chosen candidate will be part of a talented team designing, developing,
and deploying state-of-the-art, data-driven predictive models to solve
business problems using the latest technologies in neural networks, machine
learning, statistical mod...

p******r
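Posts: 2999

From thread: FleaMarket board - 100 baozi for help downloading two datasets

They can be found here. I submitted an application, but it has not been approved yet. I need them urgently. Can anyone download them? Thanks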
http://www.fuqua.duke.edu/centers/ccrm/datasets/download.html
Churn Response Modeling Tournament, 2003
http://www.fuqua.duke.edu/centers/ccrm/datasets/churn/
Telecom Dataset
http://www.fuqua.duke.edu/centers/ccrm/datasets/telecom/index.h

L****n
Posts: 3545

From thread: JobHunting board - Senior Data Scientist in NC (repost)

[The following text is reposted from the DataSciences board]
Posted by: LVHuan (.......), board: DataSciences
Title: Senior Data Scientist in NC
Site: BBS 未名空间站 (Fri Nov 14 16:55:32 2014, US Eastern)
PM me if interested - Post for a friend (I'm not the HM, sorry, but he can
get you interview directly if qualified).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Data Scientist job in Morrisville, NC
The ideal candidate is an unabashed data geek. You enjoy searching the
Internet for datasets that you can explore and mashup to tell interesting...

p******r
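Posts: 2999

From thread: Carolinas board - 35 baozi for help downloading two datasets (repost)

[The following text is reposted from the FleaMarket board]
Posted by: papabear (PAPA), board: FleaMarket
Title: 35 baozi for help downloading two datasets
Site: BBS 未名空间站 (Sun Apr 10 16:15:58 2011, US Eastern)
They can be found here. I submitted an application, but it has not been approved yet. I need them urgently. Can anyone download them? Thanks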
http://www.fuqua.duke.edu/centers/ccrm/datasets/download.html
Churn Response Modeling Tournament, 2003
http://www.fuqua.duke.edu/centers/ccrm/datasets/churn/
Telecom Dataset
http://www.fuqua.duke.edu/centers/ccrm/datasets/telecom/index.h

c*****s
Posts: 180

From thread: WaterWorld board - PURE WATER DO NOT NETER PLEASE

Renaming Variables
You have seen how to use PROC DATASETS to rename an indexed data set.
Similarly, you might want to rename one or more variables within an indexed
data set. In order to preserve any indexes that are associated with the data
set, you can use the RENAME statement in the DATASETS procedure to rename
variables.
General form, PROC DATASETS with the RENAME statement:
PROC DATASETS LIBRARY=libref ;
MODIFY SAS-data-set-name;
RENAME old-var-name-1 = new

j****m
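Posts: 8

From thread: WaterWorld board - Found a scholar in China with 10+ papers at the same conference, some of them problematic. How do I report this?

I saw someone point out that a single author published more than 10 papers at the ICMLC 2011 conference. Knowing well how hard
academic work is, I figured that publishing 10+ papers at one conference in a short time means either fabrication and plagiarism or
shoddy work. So I picked one at random and searched. Sure enough, a yield this high really is too good to be true.
I ask the relevant professionals to judge whether this is plagiarism and whether there are other issues not yet found.
The plagiarizing paper (hereafter the USOM paper)
Le Li, Xiaohang Zhang, Zhiwen Yu, Zijian Feng, Ruiping Wei, USOM: Mining
and Visualizing Uncertain Data Based on Self-Organizing Maps, Proceedings
of the 2011 International Conference on Machine Learning and Cybernetics,
804-809
Author affiliation:
School of Computer Science and Engineering, South China University of
Technology, Guangzhou, China
...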

b*e
Posts: 3845

From thread: Database board - A question about Crystal Reports

What you need to do is return the sp into a dataset, prepare the dataset
to the format you want (in this case you need to define and populate a
dataset by yourself), then push the dataset into the crystal report.

b******g
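Posts: 81

From thread: DotNet board - Converting XML to an Excel file

The template file is a file under the ASP.NET project. Where the article mentions the template file, it pastes the
contents of the template file.
The export-to-EXCEL process goes roughly like this:
protected void btn_export_Click(object sender, EventArgs e)
{
    Response.ContentType = "application/vnd.ms-excel";
    Response.Charset = "";
    // First create a DataSet object in ASP.NET,
    DataSet ds_payments = (DataSet)Session["myPayments"];
    // DataSetName corresponds to the one in the xslt file
    ds_payments.DataSetName = "RawPayments";
    // Convert the DataSet object to XML following the template's format
    Xm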

t****t
Posts: 6806

From thread: Programming board - How do I do recursive variable-name substitution in bash?

A quick look at the bash manual gives:
DATASET=$1
DATASET=${DATASET}_PARAMS
PARAMS=${!DATASET}
