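a*****i Posts: 1045 | 1 Is anyone here working on statistical process control?
My master's thesis is on this topic, and I need datasets in this area to compare several methods.
I need a high-dimensional dataset with multiple dependent variables and multiple
independent variables that can be analyzed with partial least squares and PCA.
Does anyone know where to find data like this? Baozi offered as thanks. |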
|
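g*********3 Posts: 177 | 3 Experts, does anyone have experience with this?
In a real project, the positive/negative data points in the database are extremely unbalanced. For example,
a crime database has 1 million individuals and only 100 crimes (positive data points); all the
rest are negative data points.
I need to use this data to build a machine learning model that classifies whether people will commit crimes in the future.
How should I design the training dataset? Are there any good statistical or ML methods?
Thanks. |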
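One common starting point for data this skewed is to keep every positive example and randomly downsample the negatives when building the training set, while leaving the evaluation set at its natural ratio. A minimal sketch in Python, assuming records are (features, label) pairs; the function name `downsample_negatives` and the 10:1 ratio are illustrative assumptions, not something from the thread:

```python
import random


def downsample_negatives(records, neg_per_pos=10, seed=0):
    """Keep all positives and a random sample of the negatives.

    records: list of (features, label) pairs, label 1 = positive, 0 = negative.
    neg_per_pos: hypothetical number of negatives kept per positive.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(negatives), neg_per_pos * len(positives))
    training = positives + rng.sample(negatives, n_keep)
    rng.shuffle(training)
    return training


# Toy data with a similar flavour of imbalance: 5 positives in 10,000 rows.
records = [((i,), 1 if i < 5 else 0) for i in range(10_000)]
train = downsample_negatives(records, neg_per_pos=10)
n_pos = sum(label for _, label in train)
n_neg = len(train) - n_pos
```

Downsampling changes the class prior, so probabilities from a model trained this way are inflated and need recalibration before being read as real-world rates. Weighting the classes in the loss function is an alternative that keeps every row.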
|
s******g Posts: 193 | 4 Say a dataset named test has four variables a b c d. I want to drop the two
variables c and d, then delete all rows where a=1. What would the statements be? Thanks! |
|
s******g Posts: 193 | 5 Say there is a dataset named test containing four variables a b c d. I want to drop
the two variables c and d from test and delete all records where a=1. How can I do this?
Thanks! |
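In SAS this is usually written as a single data step, for example `data test; set test(drop=c d); if a = 1 then delete; run;`. The same two operations are sketched below in plain Python, assuming the dataset is held as a list of dicts; the sample rows are made up for illustration:

```python
# Hypothetical rows standing in for the dataset "test".
test = [
    {"a": 1, "b": 10, "c": 100, "d": 1000},
    {"a": 2, "b": 20, "c": 200, "d": 2000},
    {"a": 1, "b": 30, "c": 300, "d": 3000},
    {"a": 3, "b": 40, "c": 400, "d": 4000},
]

drop_vars = {"c", "d"}

# Keep only rows where a != 1, and within each kept row keep only the
# variables that are not being dropped.
test = [
    {name: value for name, value in row.items() if name not in drop_vars}
    for row in test
    if row["a"] != 1
]

print(test)
```

Doing both steps in one pass mirrors the single data step: each row is read once, filtered, and written without the dropped variables.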
|
i******7 Posts: 1 | 6 Can anyone recommend a few large public datasets of directed graphs (having more than
10000 nodes)? Randomly generated or real graphs are both fine. Thanks in advance! |
|
j**********s Posts: 132 | 8 If a bipartite graph is acceptable, you might try the Netflix Prize dataset
http://www.netflixprize.com
If your program can beat Netflix's benchmark by 10%, you can win a 1M USD
prize. The current record is 8.46% better than the benchmark. |
|
g***l Posts: 18555 | 9 Don't use the RESULT DATASET; put it into a TEMP TABLE instead. |
|
u*********e Posts: 9616 | 10 Thanks for the suggestion. I'm not a fan of cursors myself either. I will see
what I can do to improve the sp.
The question I have is: what could cause the report not to return all
the data even though the sp itself returned the data as expected? I thought
SSRS just treated every returned dataset as strings. |
|
u*********e Posts: 9616 | 11 gejkl,
Thank you very much for helping answer my question. I was researching
the issue myself, and you are right. I rewrote the
query to get rid of the cursor and use FOR XML PATH instead.
One interesting thing I observed is that after I updated my sp and ran the
report, it gave an error message:
"An error occurred during local report processing.Exception has been thrown
by the target of an invocation. String was not recognized as a valid
DateTime.Couldn't store <> in Stmt_Date_Cre... |
|
L*******r Posts: 1011 | 15
So the point here is to use the "cache" in the dataset to do the computing, am I right? |
|
m*****1 Posts: 8 | 16 I have a stored procedure that returns several records. The returned records
will be displayed in textboxes on a form. Some
of the fields in those records may be modified in the related textboxes.
Should I use a dataset or a class to store the records from the stored
procedure before they are displayed on the form? What are the pros and cons
of each? Thanks a lot. |
|
A**n Posts: 1703 | 17 I'd say datasets for small projects and classes for large projects
involving a team and ongoing maintenance. |
|
x**n Posts: 461 | 18 dataset: a collection of data that knows how to persist itself; see
Active Record.
Entity data model: model the domain.
LINQ: query the data.
Entity Framework: an OR/M library with extra features. |
|
w*s Posts: 7227 | 19 Experts, when should one use a dataset, and when something else?
What is the current trend? |
|
s***o Posts: 2191 | 20 DataSet is a component of ADO.NET. You can think of it as a mini in-memory
database (that maps to part of your back-end database). It is very
"heavyweight", so you'd better consider other approaches first. It's still
useful in some situations, for example, when you do "bulk" operations.
LINQ is a very important language feature that you will use every day. But if
you mean "LINQ to SQL", then ignore it. You have Entity Framework now.
For "entity data model", I assume you mean EDM in entity framewor... |
|
w********r Posts: 4193 | 21 Anonymity and the Netflix Dataset
Last year, Netflix published 10 million movie rankings by 500,000
customers, as part of a challenge for people to come up with better
recommendation systems than the one the company was using. The data was
anonymized by removing personal details and replacing names with random
numbers, to protect the privacy of the recommenders.
Arvind Narayanan and Vitaly Shmatikov, researchers at the University of
Texas at Austin, de-anonymized some of the Netflix data b |
|
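m*******n Posts: 154 | 22 Load the schema stored in the XML file into your program's memory and treat the
schema as a standard DataSet object. DataRow has no public constructor, so pass the
column values (or a row created with Tables[0].NewRow()) to Rows.Add, e.g.
m_xmldataset.Tables[0].Rows.Add(...) |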
|
c***y Posts: 615 | 23 I have around 200 DNA sequences, picked at random. Is there an easy
way (such as an online server) to create a non-redundant dataset? Thank you very
much. |
|
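v*******g Posts: 334 | 24 Dataset with more than a million records.
It's genetic data. How do people handle large data, and with what software?
Do I need SQL? How do I connect to an external database?
Or with R, how do I manage and process a large dataset?
Or where can I find an introduction to this?
Thanks |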
|
c*********t Posts: 340 | 25 I used SAS for about 1 year and R for about 1 year. When I was using SAS I
was just dealing with small datasets. For the past year I've been working
with high-throughput microarray data, and R is extremely good at handling
data matrices. You can find a whole bunch of R tutorials online :-) |
|
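I******i Posts: 203 | 26 I've recently wanted to learn NGS data analysis. Could the experts on this board point me to where I can download some NGS sample
datasets, for example RNA-seq or WGS? Reading only theory without hands-on practice is too hard.
Thanks |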
|
Posts: 1 | 30 On your computer, in a web browser:
1. Go to https://www.gtexportal.org/home/datasets
2. You will be asked to log in, so log in with your Google account.
3. Choose any small file to download (such as
"GTEx_Analysis_v7_Annotations_SubjectPhenotypesDD.xlsx"); this triggers the
authentication process.
4. Open the developer console and run
"gapi.auth2.getAuthInstance().currentUser.get().getAuthResponse().id_token"
5. Copy this token.
On your Linux command line:
6. Run the following commands to obtain ... |
|
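u****h Posts: 2193 | 31 I'm about to start learning R, but the books I found on the R Project website don't come with datasets or examples to
download. Any recommendations? I think it would be convenient to learn by following the examples in a book.
Thanks! |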
|
c**********e Posts: 2007 | 32 In PROC GLM, the statement model y= x1 x2/ss3; outputs the
SS3 table. But how do I save the SS3 table as a SAS dataset? Thanks. |
|
w****a Posts: 155 | 33 Job requirements often mention "experience of working with large datasets". What skills does this
generally include, and how can one learn them? |
|
c*******o Posts: 8869 | 34 Execute the following:
data a;
set a;
format _all_; /* no format name given, so the formats are removed from every variable */
run;
Then double-click the dataset. |
|
s********l Posts: 245 | 35 Thanks very much! That dataset opens now! |
|
l*****k Posts: 587 | 36 You can read a SAS dataset directly into R; however, you need to have SAS
installed on the same machine. I did this before but have forgotten how to do it
now. |
|
l*g Posts: 46 | 37 Thank you, ls! The problem is that I cannot get the models with the known
coefficients...
I need to see if the original known models can predict my dataset well. How
can I put those models into Stata? |
|
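h*****y Posts: 367 | 38 I have a project where I need to find my own data. Could anyone send me the datasets from your classes?
HLM, GEE, MManova, RMANOVA are all fine.
Thanks in advance, big baozi offered. |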
|
z**********i Posts: 12276 | 39 There are several classic longitudinal datasets. |
|
z**********i Posts: 12276 | 40 Let me think about it some more, because I didn't look carefully at the breast cancer dataset. Perhaps I should remove
its duplicates before the merge. |
|
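z**********i Posts: 12276 | 41 An update on this problem: it was actually a one-to-many merge, not many-to-many. Later I cut the
variables in the large dataset down to the few I needed, and it finished in about ten minutes. It originally had over a thousand variables. |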
|
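P****D Posts: 11146 | 42 How many column names don't match between these two datasets? I think these can only be handled manually, one by one
...
Brute force is no good. Love peace, oppose violence. Amen! |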
|
o******6 Posts: 538 | 43 ☆─────────────────────────────────────☆
largetail (largetail) wrote on (Mon Mar 16 22:57:50 2009):
Is anyone familiar with hash tables or with handling large datasets (hundreds of
millions of obs) in SAS (sort, merge, and subset)? Any references and suggestions
are greatly appreciated.
☆─────────────────────────────────────☆
qqzj (小车车) wrote on (Mon Mar 16 23:46:54 2009):
A hash table is used when you have a huge database and a small code table.
You can transform the smaller one into a hash table. it is very effi |
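The idea described here, loading the small code table into a hash table and streaming the large table past it, can be sketched in Python with a dict playing the role of the hash object. The table contents and the function name `hash_lookup_merge` are invented for illustration, and this is not SAS syntax:

```python
def hash_lookup_merge(big_rows, code_table, key):
    """Left-join big_rows to code_table on `key` without sorting either side."""
    # Build the hash table once from the small table: key value -> row.
    lookup = {row[key]: row for row in code_table}
    for row in big_rows:
        match = lookup.get(row[key])
        if match is None:
            # No matching code: pass the row through unchanged.
            yield dict(row)
        else:
            # Matching code found: add the code table's columns.
            yield {**row, **match}


codes = [
    {"state": "TX", "region": "South"},
    {"state": "NY", "region": "Northeast"},
]
big = [
    {"id": 1, "state": "NY"},
    {"id": 2, "state": "TX"},
    {"id": 3, "state": "ZZ"},
]
merged = list(hash_lookup_merge(big, codes, "state"))
```

Only the small table has to fit in memory, and the large table is read once in whatever order it is stored, which is why no sort is needed before the merge.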
|
s*******2 Posts: 791 | 44 I have the following dataset Test
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1 B 1
B 2 B 2 B 2
;
How can I get the data below (outcome in the order 0, 1, 2)? Thanks
Obs input outcome
1 A 0
2 A 1
3 A 2
4 A 0
5 |
|
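s*******2 Posts: 791 | 45 Thank you. I ran your code and the output is what I wanted. But there is one problem. The
Test I gave happens to have 18 observations, so after proc sort removes the duplicate rows, what is left is A 0
A 1 A 2 B 0 B 1 B 2, and stacking that dataset three times gives the result I want. But what if
the number of observations is not a multiple of 3?
For example, 16 observations: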
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1
B 2 B 2 B 2
;
run;
The result should be
Obs input outcome
1 A 0
|
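One way to handle groups of unequal size is to number each repeated (input, outcome) pair by its occurrence and sort on that number, which is roughly what a `first.`/`retain` counter does in a SAS data step. The Python sketch below assumes the desired order is: within each input, cycle through the outcomes 0, 1, 2 for as long as copies remain. That reading of the truncated expected output is an assumption, and the function name `round_robin_order` is made up.

```python
from collections import defaultdict


def round_robin_order(rows):
    """Order (input, outcome) rows so that outcomes cycle within each input."""
    seen = defaultdict(int)  # occurrences so far of each (input, outcome) pair
    keyed = []
    for inp, outcome in rows:
        k = seen[(inp, outcome)]
        seen[(inp, outcome)] += 1
        # Sort key: input first, then occurrence number, then outcome.
        keyed.append((inp, k, outcome))
    keyed.sort()
    return [(inp, outcome) for inp, _, outcome in keyed]


# The 16-observation example from the post.
rows = (
    [("A", 0)] * 2 + [("A", 1)] * 3 + [("A", 2)] * 3
    + [("B", 0)] * 3 + [("B", 1)] * 2 + [("B", 2)] * 3
)
ordered = round_robin_order(rows)
```

Because the counter is computed per pair rather than by stacking a deduplicated table a fixed number of times, the approach does not depend on the observation count being a multiple of 3.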
|
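s*******2 Posts: 791 | 46 Thanks gosummerod and sherryyyf.
It seems I still haven't mastered first., last., and retain well enough.
I originally wanted to write the code below (incomplete), but it has to run the data step (sequent test_sorted)
3 times, append the results together, and then sort by input counter. Now it looks like that won't give the result
I expected. First, the counter values are not 0-15 but 0-6; second, if I run it >=3 times, the rows with id=5,8,16
cannot be created in my test_New in the end.
Although everyone above has already solved this problem for me, I'm still hung up on my own code. Can anyone help me
see where it went wrong and fix it? Thanks.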
proc datasets library=work;
delete sequent test_sorted test_New;
run;
data Test;
input input $ outcome @@;
datalines;
A 0 A 0
B 0 B 0 B 0
A 1 A 1 A 1
A 2 A |
|
p*****0 Posts: 3104 | 47 Make 3 small datasets,
then vertically combine them,
then sort by input.
What do you think? |
|
s*******2 Posts: 791 | 48 Thanks. But if this is a dataset with many observations, this method isn't practical. |
|
d*******o Posts: 493 | 49 Add one line: proc datasets kill;run; |
|
k*******r Posts: 16963 | 50 I'm job hunting.
Can anyone tell me what "Large dataset analysis experience" refers to? |
|