All topics - Topic: dataset
a*****i
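Posts: 1045
1
Is anyone here working on statistical process control?
It is the topic of my master's thesis, and I need datasets in this area to compare several methods.
I need a high-dimensional dataset with multiple dependent variables and multiple
independent variables that can be analyzed with partial least squares (PLS) and PCA.
Does anyone know where I can find this kind of data? I'll send baozi as thanks.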
g*********3
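Posts: 177
3
From thread: DataSciences board - Designing a training dataset from an unbalanced dataset
Experts, does anyone have experience with this?
In a real project, the positive/negative data points in the database are extremely unbalanced. For example,
a crime database has 1 million individuals, of whom 100 committed a crime (positive data points); all the
rest are negative data points.
I need to use this data to build a machine learning model that classifies whether people will commit a crime in the future.
How should I design the training dataset? Are there good statistical or ML methods for this?
Thanks.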
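One common starting point for the question above is to keep every positive and downsample the negatives to a fixed ratio. The sketch below is only an illustration under stated assumptions: the `label` field, the 10:1 ratio, and the toy `people` list are all hypothetical and not from the post.

```python
import random


def build_training_set(records, label_key="label", neg_per_pos=10, seed=0):
    """Keep every positive record and a random sample of the negatives."""
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    training = positives + rng.sample(negatives, n_neg)
    rng.shuffle(training)
    return training


# Toy stand-in for the crime database (hypothetical): 100 positives among 100,000 people.
people = [{"id": i, "label": 1 if i < 100 else 0} for i in range(100_000)]
train = build_training_set(people, neg_per_pos=10)
n_pos = sum(r["label"] for r in train)
print(len(train), n_pos)
```

Downsampling changes the positive rate the model sees, so predicted probabilities would need recalibration, and evaluation should use a held-out sample that keeps the original ratio.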
s******g
Posts: 193
4
Say a dataset named test has four variables, a b c d. I want to drop the two variables c and d
and then delete all rows where a=1. What statements would do this? Thanks!
s******g
Posts: 193
5
Say there is a dataset named test containing four variables, a b c d. I want to drop the two
variables c and d from test and delete all records where a=1. How can I do this?
Thanks!
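The post does not name the software being used, so the following is only a language-neutral sketch of the requested operation, written in Python with the dataset modeled as a list of dicts. The sample values are made up for illustration.

```python
# Hypothetical contents of the dataset "test" with variables a, b, c, d.
test = [
    {"a": 1, "b": 10, "c": 100, "d": 1000},
    {"a": 2, "b": 20, "c": 200, "d": 2000},
    {"a": 1, "b": 30, "c": 300, "d": 3000},
    {"a": 3, "b": 40, "c": 400, "d": 4000},
]

drop_vars = {"c", "d"}

# Keep only rows where a != 1, and within each kept row drop variables c and d.
result = [
    {name: value for name, value in row.items() if name not in drop_vars}
    for row in test
    if row["a"] != 1
]
print(result)
```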
i******7
Posts: 1
6
Could anyone recommend a few large, public datasets of directed graphs (having more than
10000 nodes)? Randomly generated or real graphs are both fine. Thanks in advance!
j**********s
Posts: 132
7
The Enron Email dataset
http://www.cs.cmu.edu/~enron/
j**********s
Posts: 132
8
If a bipartite graph also works, you could try the Netflix prize dataset
http://www.netflixprize.com
If your program beats Netflix's benchmark by 10%, you can win a prize of 1M USD.
The current record is 8.46% better than the benchmark.
g***l
Posts: 18555
9
Don't use a RESULT DATASET; put it in a TEMP TABLE instead.
u*********e
Posts: 9616
10
Thanks for the suggestion. I'm not a fan of cursors myself either. I will see
what I can do to improve the sp.
My question is: what could cause the report not to return all
the data even though the sp itself returned the data as expected? I thought
SSRS just treated every returned dataset as strings.
u*********e
Posts: 9616
11
gejkl,
Thank you very much for helping answer my question. I was researching
the issue myself, and you are right. I rewrote the
query to get rid of the cursor and use FOR XML PATH instead.
One interesting thing I observed is that after I updated my sp and ran the
report, it gave an error message:
"An error occurred during local report processing.Exception has been thrown
by the target of an invocation. String was not recognized as a valid
DateTime.Couldn't store <> in Stmt_Date_Cre...
k****i
Posts: 1072
12
From thread: DotNet board - DataReader vs. DataSet
Personally I like DataSet more than DataReader.
As you said, a DataReader holds the connection exclusively. Although connection
pooling is handled by the framework in most cases, I still prefer to share
the connection in my program where possible, and it's clearer.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd
y****t
Posts: 10233
13
From thread: DotNet board - DataReader vs. DataSet
I think the key is that when you need "real time" data, you have to use
a DataReader.
And it should always be compared to a DataAdapter rather than a DataSet.
BTW, is the post really "your words"? :)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd
k****i
Posts: 1072
14
From thread: DotNet board - DataReader vs. DataSet
I think he meant that using the in-memory DataSet saves a huge amount of
server-side operations at the back end.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd
L*******r
Posts: 1011
15
From thread: DotNet board - DataReader vs. DataSet

So the point here is using the "cache" in dataset to do computing, am I right?
m*****1
Posts: 8
16
From thread: Programming board - C#: DataSet vs. Class
I have a stored procedure returning several records. Those records
will be displayed in the textboxes on a form. Some
of the fields in those records may be modified in the related textboxes.
Should I use a dataset or a class to store the records from the stored
procedure before they are displayed on the form? What are the pros and cons
of each? Thanks a lot.
A**n
Posts: 1703
17
From thread: Programming board - C#: DataSet vs. Class
I'll say datasets for small projects and classes for large projects
involving a team and ongoing maintenance.
x**n
Posts: 461
18
DataSet: a collection of data that knows how to persist itself; compare the
active record pattern.
Entity data model: model the domain.
LINQ: query the data.
Entity Framework: an OR/M library with extra features.
w*s
Posts: 7227
19
Experts, when should one use DataSet, and when something else?
What is the trend now?
s***o
Posts: 2191
20
DataSet is a component of ADO.NET. You can think of it as a mini in-memory
database (one that maps to part of your back-end database). It is very
"heavyweight", so you'd better consider other approaches first. It's still
useful in some situations, for example when you do "bulk" operations.
LINQ is a very important language feature that you will use every day. But if
you mean "LINQ to SQL", then ignore it. You have Entity Framework now.
For "entity data model", I assume you mean EDM in entity framewor...
w********r
Posts: 4193
21
Anonymity and the Netflix Dataset
Last year, Netflix published 10 million movie rankings by 500,000
customers, as part of a challenge for people to come up with better
recommendation systems than the one the company was using. The data was
anonymized by removing personal details and replacing names with random
numbers, to protect the privacy of the recommenders.
Arvind Narayanan and Vitaly Shmatikov, researchers at the University of
Texas at Austin, de-anonymized some of the Netflix data b
m*******n
Posts: 154
22
Load the schema stored in the XML file into your program's memory and treat the
schema as a standard DataSet object, e.g.
m_xmldataset.Tables[0].Rows.Add(new datarow(...))
c***y
Posts: 615
23
I have around 200 randomly picked DNA sequences. Is there an easy
way (such as an online server) to create a non-redundant dataset? Thank you very
much.
v*******g
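Posts: 334
24
The dataset has more than a million records.
It is genetic data. How do people handle large data, and with what software?
Should I use SQL? How do I connect to an external database?
Or, with R, how do I manage and process a large dataset?
Or where can I find an introduction to this?
Thanks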
c*********t
Posts: 340
25
I used SAS for about 1 year and R for about 1 year. When I was using SAS I
was just dealing with small datasets. For the past year I've been working
with high-throughput microarray data, and R is extremely good at handling
data matrices. You can find a whole bunch of R tutorials online :-)
I******i
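Posts: 203
26
From thread: Biology board - Where can I download NGS sample datasets?
I've recently wanted to learn NGS data analysis. Could the experts on this board point me to where I can download
some NGS sample datasets, for example RNA-seq or WGS? Learning only the theory without hands-on practice feels too hard.
Thanks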

Posts: 1
27
From thread: Biology board - GTex portal dataset download
They do provide a REST API, such as
https://gtexportal.org/rest/v1/dataset/sample
Some of them need authentication though, including file downloads. What
specific files are you trying to download?

Posts: 1
28
From thread: Biology board - GTex portal dataset download
On your computer with web browser
1. go to https://www.gtexportal.org/home/datasets
2. You will be asked to login, so login with your google account
3. open developer console, run
"gapi.auth2.getAuthInstance().currentUser.get().getAuthResponse().id_token"
4. Copy this token
On your Linux command line
5. run the following command to obtain the URL for each of the file, replace
XXX with the token
curl -X GET https://gtexportal.org/rest/v1/admin/file_download?objectPath=
gtex_analysis_pilot_v3/rna_...

Posts: 1
30
From thread: Biology board - GTex portal dataset download
On your computer with web browser
1. go to https://www.gtexportal.org/home/datasets
2. You will be asked to login, so login with your google account
3. Randomly choose a small file to download (such as "GTEx_Analysis_v7_
Annotations_SubjectPhenotypesDD.xlsx"), this is to trigger the
authentication process
4. open developer console, run
"gapi.auth2.getAuthInstance().currentUser.get().getAuthResponse().id_token"
5. Copy this token
On your Linux command line
6. run the following commands to obtain ...
u****h
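Posts: 2193
31
I'm about to start learning R, but none of the books I found on the R project website come with datasets or examples
to download. Any recommendations? I just think it would be convenient to learn by following a book's worked examples.
Thanks!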
c**********e
Posts: 2007
32
From thread: Statistics board - How to make the SS3 a dataset?
In PROC GLM procedure, model y= x1 x2/ss3; will output the
SS3 table. But how to make the SS3 a SAS dataset? Thanks.
w****a
Posts: 155
33
Job requirements often mention "experience of working with large datasets". Which skills
does this usually cover, and how can I learn them?
c*******o
Posts: 8869
34
From thread: Statistics board - How to open SAS dataset
Execute the following:
data a;
  set a;
  format _all_; /* remove the formats attached to all variables */
run;
Then double-click the dataset.
s********l
Posts: 245
35
From thread: Statistics board - How to open SAS dataset
Thanks very much! The dataset opens now!
l*****k
Posts: 587
36
From thread: Statistics board - How do I transfer a SAS dataset into R?
You can read a SAS dataset directly into R; however, you need to have SAS
installed on the same machine. I did this before but have forgotten how to do it
now.
l*g
Posts: 46
37
Thank you, ls! The problem is that I cannot get the models with the known
coefficients...
I need to see if the original known models can predict my dataset well. How
can I put those models into Stata?
h*****y
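Posts: 367
38
I have a project for which I need to find my own data. Could you send me datasets from your classes?
HLM, GEE, MManova, RMANOVA would all work
Thanks in advance, a big baozi for you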
z**********i
Posts: 12276
39
There are several classic longitudinal datasets.
z**********i
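Posts: 12276
40
From thread: Statistics board - A question about dataset merge
Let me think about it some more. I hadn't looked carefully at the breast cancer dataset; perhaps I should have
removed its duplicates before the merge.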
z**********i
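Posts: 12276
41
From thread: Statistics board - A question about dataset merge
An update on this question: it was actually a one-to-many merge, not many-to-many. Later I reduced
the variables in the large dataset to the handful I needed, and it finished in ten-odd minutes. Originally there were over a thousand variables.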
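The approach in this update, trimming the wide dataset to the needed variables before a one-to-many merge, can be sketched roughly as follows. This is a Python illustration rather than the poster's own code; the `patients`/`visits` tables, the `id` key, and the column names are hypothetical.

```python
def keep_columns(rows, columns):
    """Return copies of the rows that contain only the listed columns."""
    return [{c: r[c] for c in columns} for r in rows]


def merge_one_to_many(one_side, many_side, key):
    """Inner join where `one_side` must have at most one row per key."""
    lookup = {}
    for r in one_side:
        if r[key] in lookup:
            # Duplicates here would silently turn the merge into many-to-many.
            raise ValueError(f"duplicate key on the 'one' side: {r[key]!r}")
        lookup[r[key]] = r
    return [{**lookup[r[key]], **r} for r in many_side if r[key] in lookup]


# Hypothetical data: one row per patient, several wide rows per patient in visits.
patients = [{"id": 1, "sex": "F"}, {"id": 2, "sex": "M"}]
visits = [
    {"id": 1, "visit": 1, "stage": "I", "unused_1": 0, "unused_2": 0},
    {"id": 1, "visit": 2, "stage": "II", "unused_1": 0, "unused_2": 0},
    {"id": 2, "visit": 1, "stage": "I", "unused_1": 0, "unused_2": 0},
]

slim_visits = keep_columns(visits, ["id", "visit", "stage"])  # trim before merging
merged = merge_one_to_many(patients, slim_visits, key="id")
print(merged)
```

The duplicate check on the "one" side reflects the concern raised earlier in the thread about removing duplicates before merging.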
P****D
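Posts: 11146
42
From thread: Statistics board - How do I force-merge two datasets?
How many column names fail to match between the two datasets? I suspect these can only be handled by hand, one by one
...
Forcing is not good. Love peace, oppose violence. Amen!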
o******6
Posts: 538
43
From thread: Statistics board - [Collection] hash table and large dataset
☆─────────────────────────────────────☆
largetail (largetail) wrote on (Mon Mar 16 22:57:50 2009):
Is anyone familiar with hash tables or with handling large datasets (hundreds of
millions of obs) in SAS (sort, merge, and subset)? Any references or suggestions
are greatly appreciated.
☆─────────────────────────────────────☆
qqzj (小车车) wrote on (Mon Mar 16 23:46:54 2009):
A hash table is used when you have a huge database and a small code table.
You can transform the smaller one into a hash table. it is very effi
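The idea in this reply, loading the small code table into a hash table and probing it while reading the large table once, can be sketched in Python (not SAS) as below. The table contents and the `code`/`region` field names are hypothetical.

```python
# Small code table held entirely in memory as a hash table (a Python dict).
code_table = {"A01": "Northeast", "B02": "Midwest", "C03": "West"}


def lookup_join(big_rows, codes, key="code", new_field="region"):
    """Stream the big table once, adding the looked-up value to each row."""
    for row in big_rows:
        # dict.get returns None for codes that are missing from the code table.
        yield {**row, new_field: codes.get(row[key])}


# Stand-in for a very large table; a generator, so it is never fully in memory.
big_table = ({"obs": i, "code": ("A01", "B02", "Z99")[i % 3]} for i in range(6))
joined = list(lookup_join(big_table, code_table))
print(joined[:3])
```

Because each lookup is a constant-time probe, the large table needs neither sorting nor a second pass, which is the usual argument for this design over a sort-merge.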
s*******2
Posts: 791
44
From thread: Statistics board - [Question] How do I sort this dataset?
I have the following dataset Test
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1 B 1
B 2 B 2 B 2
;
How can I get the data below (outcome in the order 0,1,2)? Thanks
Obs input outcome
1 A 0
2 A 1
3 A 2
4 A 0
5
s*******2
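Posts: 791
45
From thread: Statistics board - [Question] How do I sort this dataset?
Thank you. I ran your code and the output is exactly what I wanted. But there is one problem. The Test
I gave happens to have 18 observations, so proc sort removes the duplicate rows, leaving A 0
A 1 A 2 B 0 B 1 B 2, and stacking that dataset three times then gives the result I want. But what if
I give a number of observations that is not a multiple of 3?
For example, 16 observations: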
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1
B 2 B 2 B 2
;
run;
The result should be
Obs input outcome
1 A 0
s*******2
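Posts: 791
46
From thread: Statistics board - [Question] How do I sort this dataset?
Thanks, gosummerod and sherryyyf.
It seems I still haven't mastered first., last., and retain well enough.
I originally wanted to write the code below (incomplete), running the data step (sequent test_sorted)
3 times, appending the results, and then sorting by input counter. But now it looks like that won't give the result
I expected. First, the counter values run from 0-6 rather than 0-15; second, if I run it >=3 times, the rows with id=5,8,16
cannot be created in my test_New.
Although the people above have already solved this problem for me, I'm still hung up on my own code. Could someone
help me see where it went wrong and fix it? Thanks.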
proc datasets library=work;
delete sequent test_sorted test_New;
run;
data Test;
input input $ outcome @@;
datalines;
A 0 A 0
B 0 B 0 B 0
A 1 A 1 A 1
A 2 A
p*****0
Posts: 3104
47
From thread: Statistics board - [Question] How do I sort this dataset?
Make 3 small datasets,
then combine them vertically,
then sort by input.
What do you think?
s*******2
Posts: 791
48
From thread: Statistics board - [Question] How do I sort this dataset?
Thanks. But if this is a dataset with many observations, this method is not practical.
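For a dataset too large to split by hand, one alternative is to number each repeated (input, outcome) pair by its occurrence and sort on that number. The sketch below is in Python rather than SAS, and it assumes the intended ordering is round-robin within each input, since the expected output in the thread is truncated. The 16-observation sample mirrors the one posted earlier, with outcome treated as numeric.

```python
from collections import defaultdict


def interleave_sort(rows):
    """Order rows by input, then by occurrence number of each pair, then by outcome."""
    seen = defaultdict(int)
    keyed = []
    for inp, outcome in rows:
        k = seen[(inp, outcome)]  # 0 for the first copy, 1 for the second, ...
        seen[(inp, outcome)] += 1
        keyed.append((inp, k, outcome))
    keyed.sort()
    return [(inp, outcome) for inp, _, outcome in keyed]


# The 16-observation example: A 0 and B 1 appear twice, everything else three times.
test = (
    [("A", 0)] * 2 + [("A", 1)] * 3 + [("A", 2)] * 3
    + [("B", 0)] * 3 + [("B", 1)] * 2 + [("B", 2)] * 3
)
result = interleave_sort(test)
for obs, (inp, outcome) in enumerate(result, start=1):
    print(obs, inp, outcome)
```

This works for any number of copies per pair, so the observation count does not need to be a multiple of 3.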
d*******o
Posts: 493
49
Add one line: proc datasets kill;run;
k*******r
Posts: 16963
50
From thread: Statistics board - Large dataset analysis experience
I'm looking for a job.
Can anyone tell me what "Large dataset analysis experience" refers to?