Page 2 - Roundup of discussions about dataset - 话题女王

a*****i
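Posts: 1045

From thread: Statistics board - Looking for a dataset on statistical process control, will thank with baozi.

Is anyone here working on statistical process control?
It is the topic of my master's thesis, and I need datasets in this area to compare several methods.
I need a high-dimensional dataset with multiple dependent variables and multiple
independent variables that can be analyzed with partial least squares and PCA.
Does anyone know where to find data like this? Will thank with baozi.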

g*********3
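Posts: 177

From thread: DataSciences board - Designing a training dataset with an unbalanced dataset

Experts, does anyone have experience with this:
In a real project, the positive/negative data points in the database are extremely unbalanced. For example,
a crime database has 1 million individuals, 100 of them with a crime (positive data points), and
all the rest are negative data points.
I need to use this data to build a machine learning model that classifies whether people will commit a crime in the future.
How should I design the training dataset? Are there any good statistical or ML methods?
Thanks.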
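
One common starting point for a case like this is to keep every positive and randomly undersample the negatives for training, while evaluating on data that keeps the original class ratio. The Python sketch below only illustrates that idea: the function name `undersample`, the `ratio` and `seed` parameters, and the toy rows are assumptions, not something from the thread.

```python
import random

def undersample(rows, ratio=10, seed=0):
    """Keep all positives and at most `ratio` negatives per positive.

    `rows` is a list of (features, label) pairs with label 1 = positive
    and 0 = negative. `ratio` and `seed` are hypothetical knobs for this sketch.
    """
    positives = [r for r in rows if r[1] == 1]
    negatives = [r for r in rows if r[1] == 0]
    rng = random.Random(seed)
    k = min(len(negatives), ratio * len(positives))
    sample = positives + rng.sample(negatives, k)
    rng.shuffle(sample)  # avoid all positives sitting at the front
    return sample

# Toy data: 5 positives among 10,000 rows.
rows = [((i,), 1 if i < 5 else 0) for i in range(10_000)]
train = undersample(rows, ratio=10)
```

Class weights or oversampling are alternatives. Whichever is used, a model trained on resampled data sees a different base rate than the real population, so its predicted probabilities may need recalibration.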

s******g
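Posts: 193

From thread: JobHunting board - [SAS programming] How to delete a column or some rows from a dataset?

For example, a dataset named test has four variables a b c d. I want to delete the two
variables c d and then delete all rows where a=1. What are the statements for this? Thanks!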

s******g
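Posts: 193

From thread: JobHunting board - How to delete a column or some rows from a dataset in SAS?

Say there is a dataset named test containing four variables a b c d. I want to delete
the two variables c d from test and delete all entries where a=1. How can I do this?
Thanks!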

i******7
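Posts: 1

From thread: CS board - Please recommend a few large graph datasets

Can anyone recommend a few large public datasets of directed graphs (having more than
10000 nodes)? Randomly generated or real graphs are both fine. Thanks in advance!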

j**********s
Posts: 132

From thread: CS board - Please recommend a few large graph datasets

The Enron Email dataset
http://www.cs.cmu.edu/~enron/

j**********s
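Posts: 132

From thread: CS board - Please recommend a few large graph datasets

If bipartite graphs are acceptable too, you could try the Netflix prize dataset
http://www.netflixprize.com
If your program can beat Netflix's benchmark by 10%, you can win a prize of
1M USD. The current record is 8.46% better than the benchmark.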

g***l
Posts: 18555

From thread: Database board - SSRS report failing to display dataset string

Don't use a result dataset; put it into a temp table instead.

u*********e
Posts: 9616

From thread: Database board - SSRS report failing to display dataset string

thanks for the suggestion. Not a fan of cursor myself either. I will see
what I can do to improve the sp.
the question I have is that, what could cause the report not returning all
the data even though the sp itself returned the data as expected? I thought
SSRS just treated all returned dataset as strings.

u*********e
Posts: 9616

From thread: Database board - SSRS report failing to display dataset string

gejkl,
Thank you very much for helping me answer my question. I was doing
research myself and figuring out the issue. You are right. I rewrote the
query to get rid of the cursor and use FOR XML PATH instead.
One interesting thing I observed is that after I updated my sp and ran the
report, it gave an error message:
"An error occurred during local report processing.Exception has been thrown
by the target of an invocation. String was not recognized as a valid
DateTime.Couldn't store <> in Stmt_Date_Cre...

k****i
Posts: 1072

From thread: DotNet board - DataReader vs. DataSet

Personally I like DataSet more than DataReader.
As you said, a DataReader holds the connection exclusively. Although connection
pooling is done by the framework in most cases, I still like to share
the connection in my program if possible, and it's clearer.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd

y****t
Posts: 10233

From thread: DotNet board - DataReader vs. DataSet

I think the key is that when you need sort of "real time" data, you have to use
DataReader.
And it is always compared to DataAdapter rather than DataSet.
BTW, is the post really "your words" :)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd

k****i
Posts: 1072

From thread: DotNet board - DataReader vs. DataSet

I think he was talking about how using the in-memory DataSet saves a huge amount of
server-side operations at the back end.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bd

L*******r
Posts: 1011

From thread: DotNet board - DataReader vs. DataSet

So the point here is using the "cache" in dataset to do computing, am I right?

m*****1
Posts: 8

From thread: Programming board - C#: DataSet vs. Class

I have a stored procedure returning several records. Those returned records
from the stored procedure will be displayed in the textboxes on a form. Some
of fields in those records may be modified in the related textboxes.
Should I use a dataset or a class to store those records from the stored
procedure before they are displayed on the form ? What are the cons and pros
using each one? Thanks a lot.

A**n
Posts: 1703

From thread: Programming board - C#: DataSet vs. Class

I'll say datasets for small projects and classes for large projects
involving a team and ongoing maintenance.

x**n
Posts: 461

From thread: Programming board - dot net Q: dataset, entity data model, LINQ, entity framework

dataset: a collection of data that know how to persist themselves, refer to
active record.
Entity data model: model the domain.
LINQ: query the data.
Entity Framework: an OR/M library with extra features.

w*s
Posts: 7227

From thread: Programming board - dot net Q: dataset, entity data model, LINQ, entity framework

Expert, when should one use dataset, and when something else?
What is the current trend?

s***o
Posts: 2191

From thread: Programming board - dot net Q: dataset, entity data model, LINQ, entity framework

Dataset is a component from ADO.NET. You can think of it as a mini in-memory
database (that maps to part of your back end database). It is very
"heavyweight" so you'd better consider other approaches first. It's still
useful in some situations, for example, when you do "bulk" operations.
Linq is a very important language feature that you will use every day. But if
you mean "Linq to Sql", then ignore it. You have Entity Framework now.
For "entity data model", I assume you mean EDM in entity framewor...

w********r
Posts: 4193

From thread: Security board - CRYPTO-GRAM(Jan 15)--Anonymity and the Netflix Dataset

Anonymity and the Netflix Dataset
Last year, Netflix published 10 million movie rankings by 500,000
customers, as part of a challenge for people to come up with better
recommendation systems than the one the company was using. The data was
anonymized by removing personal details and replacing names with random
numbers, to protect the privacy of the recommenders.
Arvind Narayanan and Vitaly Shmatikov, researchers at the University of
Texas at Austin, de-anonymized some of the Netflix data b

m*******n
Posts: 154

From thread: XML board - help: a question about c# xml schema dataset

Load the schema stored in the XML file into your program's memory and treat the
schema as a standard DataSet object, e.g.
m_xmldataset.Tables[0].Rows.Add(new datarow(...))

c***y
Posts: 615

From thread: Biology board - how to create non-redundant DNA sequence dataset

Around 200 DNA sequences, randomly picked up. Wondering if there is an easy
way (such as online server) to create a non-redundant dataset.Thank you very
much

v*******g
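Posts: 334

From thread: Biology board - How to manage large datasets with SAS/R: storage, reading, efficient data processing?

The dataset has more than a million records.
It is genetic data. How do you all process large data, and with what software?
Do I need SQL? How do I connect to an external database?
Or, with R, how do I manage and process a large dataset?
Or where can I find an introduction to this?
Thanks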

c*********t
Posts: 340

From thread: Biology board - How to manage large datasets with SAS/R: storage, reading, efficient data processing?

I used SAS for about 1 year and R for about 1 year. When I was using SAS I
was just dealing with small datasets. For the past year I've been working
with high-throughput microarray data and R is extremely good at handling
data matrices. You can find a whole bunch of R tutorials online:-)

I******i
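Posts: 203

From thread: Biology board - Where can I download NGS sample datasets

I have recently wanted to learn NGS data analysis. Could the experts on this board point me to where I can download some NGS sample
datasets, for example RNA-seq or WGS? Reading theory alone without hands-on practice feels too hard.
Thanks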

Posts: 1

From thread: Biology board - GTex portal dataset download

They do provide a REST API, such as
https://gtexportal.org/rest/v1/dataset/sample
Some of them need authentication though, including file downloads. What
specific files are you trying to download?

Posts: 1

From thread: Biology board - GTex portal dataset download

On your computer with web browser
1. go to https://www.gtexportal.org/home/datasets
2. You will be asked to login, so login with your google account
3. open developer console, run
"gapi.auth2.getAuthInstance().currentUser.get().getAuthResponse().id_token"
4. Copy this token
On your Linux command line
5. run the following command to obtain the URL for each of the file, replace
XXX with the token
curl -X GET https://gtexportal.org/rest/v1/admin/file_download?objectPath=
gtex_analysis_pilot_v3/rna_...

Posts: 1

From thread: Biology board - GTex portal dataset download

On your computer with web browser
1. go to https://www.gtexportal.org/home/datasets
2. You will be asked to login, so login with your google account
3. Randomly choose a small file to download (such as "GTEx_Analysis_v7_
Annotations_SubjectPhenotypesDD.xlsx"), this is to trigger the
authentication process
4. open developer console, run
"gapi.auth2.getAuthInstance().currentUser.get().getAuthResponse().id_token"
5. Copy this token
On your Linux command line
6. run the following commands to obtain ...

u****h
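Posts: 2193

From thread: Quant board - Which R books come with downloadable datasets?

I'm about to start learning R, but none of the books I found on the R project website have datasets or examples
available to download. What do you recommend? I just think it would be convenient to learn by following the examples in a book.
Thanks!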

c**********e
Posts: 2007

From thread: Statistics board - How to make the SS3 a dataset?

In PROC GLM procedure, model y= x1 x2/ss3; will output the
SS3 table. But how to make the SS3 a SAS dataset? Thanks.

w****a
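Posts: 155

From thread: Statistics board - Job requirements often mention experience of working with large datasets; what skills does this usually include, and how can I learn them?

Job requirements often mention experience of working with large datasets. What skills does this
usually include, and how can I learn them?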

c*******o
Posts: 8869

From thread: Statistics board - How to open SAS dataset

execute the following:
data a;
set a;
format _all_;
run;
then double click the dataset.

s********l
Posts: 245

From thread: Statistics board - How to open SAS dataset

Thanks very much! That dataset opens fine now!

l*****k
Posts: 587

From thread: Statistics board - How do I transfer a SAS dataset into R?

You can read sas dataset directly to R, however you need to have SAS
installed on the same machine. I did this before but forgot how to do it
now

l*g
Posts: 46

From thread: Statistics board - How can I verify whether known logistic regression models predict my own dataset well

Thank you, ls! The problem is that I cannot get the models with the known
coefficients...
I need to see if the original known models can predict my dataset well. How
can I put those models into Stata?

h*****y
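Posts: 367

From thread: Statistics board - Help: does anyone have a usable longitudinal dataset?

I have a project where I have to find my own data. Could you send me the datasets from your classes?
HLM, GEE, MManova, RMANOVA are all fine.
Thanks in advance, big baozi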

z**********i
Posts: 12276

From thread: Statistics board - Help: does anyone have a usable longitudinal dataset?

There are several classic longitudinal datasets.

z**********i
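Posts: 12276

From thread: Statistics board - A question about dataset merge

Let me think about it some more. I didn't look carefully at the breast cancer dataset; perhaps I should
remove its duplicates before the merge.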

z**********i
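Posts: 12276

From thread: Statistics board - A question about dataset merge

An update on this question: it was actually a one-to-many merge, not many-to-many. Later I reduced the
large dataset to the few variables I needed, and it finished in ten-odd minutes. It originally had over a thousand variables.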
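
The fix described here, trimming the wide dataset down to the needed variables before a one-to-many merge, looks roughly like this in Python. This is a sketch only: `project`, `merge_one_to_many`, and the sample tables are hypothetical names, and the thread does not say which side of the merge was the wide one.

```python
def project(rows, keep):
    """Keep only the columns named in `keep` (rows are dicts)."""
    return [{col: row[col] for col in keep} for row in rows]

def merge_one_to_many(one_rows, many_rows, key):
    """Attach the single matching `one_rows` record to each `many_rows` record."""
    index = {row[key]: row for row in one_rows}  # key is unique on the "one" side
    return [{**index[row[key]], **row} for row in many_rows if row[key] in index]

patients = [{"id": 1, "age": 60, "unused_1": 0, "unused_2": 0},
            {"id": 2, "age": 45, "unused_1": 0, "unused_2": 0}]
visits = [{"id": 1, "visit": "a"}, {"id": 1, "visit": "b"}, {"id": 2, "visit": "c"}]

slim = project(patients, keep=["id", "age"])  # drop unneeded variables first
merged = merge_one_to_many(slim, visits, key="id")
```

Projecting first means every merged row carries only the columns that are actually needed, which is where the time saving in a wide table would come from.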


P****D
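Posts: 11146

From thread: Statistics board - How to force-merge two datasets?

How many column names fail to match between these two datasets? I have a feeling this can only be handled by hand, one at a time
...
Forcing it is no good. Love peace, oppose violence. Amen!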

o******6
Posts: 538

From thread: Statistics board - [Collection] hash table and large dataset

☆─────────────────────────────────────☆
largetail (largetail) wrote on (Mon Mar 16 22:57:50 2009):
Is anyone familiar with hash tables or with handling large datasets (hundreds of
millions of obs) in SAS (sort, merge, and subset)? Any references and suggestions
are greatly appreciated.
☆─────────────────────────────────────☆
qqzj (小车车) wrote on (Mon Mar 16 23:46:54 2009):
A hash table is used when you have a huge database and a small code table.
You can transform the smaller one into a hash table. It is very effi
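
The idea in this reply, loading the small code table into a hash table and streaming the large table past it, can be sketched outside SAS as well. The Python below is an illustration only; `hash_lookup_join` and the sample tables are made-up names, not code from the thread.

```python
def hash_lookup_join(big_rows, code_table):
    """Inner-join a large stream of (key, value) rows against a small code table."""
    lookup = dict(code_table)      # small table held in memory as a hash table
    for key, value in big_rows:    # large table is read once, with no sort needed
        if key in lookup:          # average constant-time probe per row
            yield key, value, lookup[key]

codes = [("NY", "New York"), ("CA", "California")]
big = [("CA", 1), ("TX", 2), ("NY", 3), ("CA", 4)]
joined = list(hash_lookup_join(big, codes))
```

Because only the small table has to fit in memory, the large table can be processed in its original order without a sort-merge step.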

s*******2
Posts: 791

From thread: Statistics board - [Question] How to sort this dataset?

I have the following dataset Test
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1 B 1
B 2 B 2 B 2
;
How can I get the data below (outcome in the order 0, 1, 2)? Thanks
Obs input outcome
1 A 0
2 A 1
3 A 2
4 A 0
5

s*******2
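Posts: 791

From thread: Statistics board - [Question] How to sort this dataset?

Thank you. I ran your code and the output is what I wanted. But there is one problem. The Test I gave
happens to have exactly 18 observations, so proc sort removes the duplicate rows, leaving A 0
A 1 A 2 B 0 B 1 B 2, and stacking that dataset three times gives the result I want. But what if
I supply a number of observations that is not a multiple of 3?
For example, 16 observations: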
data Test;
input input $ outcome $ @@;
datalines;
A 0 A 0
A 1 A 1 A 1
A 2 A 2 A 2
B 0 B 0 B 0
B 1 B 1
B 2 B 2 B 2
;
run;
The result should be
Obs input outcome
1 A 0

s*******2
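Posts: 791

From thread: Statistics board - [Question] How to sort this dataset?

Thanks gosummerod and sherryyyf
It seems I still haven't mastered first., last., and retain well enough.
I originally wanted to write the code below (incomplete), running the data step (sequent test_sorted)
3 times, appending the results together, and then sorting by input counter. But it now looks like this cannot produce the result
I expected. First, the counter values run from 0-6 rather than 0-15; second, if I run it >=3 times, the rows with id=5,8,16
can never be created in my test_New.
Although everyone above has already solved the problem for me, I am still hung up on my own code. Can anyone help me see
where it goes wrong? Please help me fix it. Thanks.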
proc datasets library=work;
delete sequent test_sorted test_New;
run;
data Test;
input input $ outcome @@;
datalines;
A 0 A 0
B 0 B 0 B 0
A 1 A 1 A 1
A 2 A

p*****0
Posts: 3104

From thread: Statistics board - [Question] How to sort this dataset?

Make 3 small datasets,
then combine them vertically,
then sort by input.
What do you think?

s*******2
Posts: 791

From thread: Statistics board - [Question] How to sort this dataset?

Thanks. But if this is a dataset with many observations, this method is not practical.
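
For many observations, one approach that avoids splitting the data by hand is to number each row by how many times its (input, outcome) pair has already appeared, then sort by input, that counter, and outcome. The Python sketch below only illustrates the idea; `interleave` is a made-up name, and how groups with unequal counts should be ordered is an assumption, since the expected output in the thread is cut off.

```python
from collections import defaultdict

def interleave(rows):
    """Order rows so outcomes cycle 0, 1, 2, 0, 1, 2, ... within each input."""
    seen = defaultdict(int)  # occurrences so far of each (input, outcome) pair
    keyed = []
    for inp, outcome in rows:
        keyed.append((inp, seen[(inp, outcome)], outcome))
        seen[(inp, outcome)] += 1
    keyed.sort()  # sort by input, then occurrence counter, then outcome
    return [(inp, outcome) for inp, _, outcome in keyed]

# The 18-observation example: each (input, outcome) pair appears three times.
rows = [(g, o) for g in "AB" for o in (0, 1, 2) for _ in range(3)]
ordered = interleave(rows)
```

The counter plays the same role as a retained counter reset with first. processing in a SAS data step, so a single pass plus one sort is enough regardless of the number of observations.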

d*******o
Posts: 493

From thread: Statistics board - [SAS] How to quickly delete macro-created temporary datasets and macro variab

Add one line: proc datasets kill;run;

k*******r
Posts: 16963

From thread: Statistics board - Large dataset analysis experience

I'm job hunting.
Can anyone tell me what "Large dataset analysis experience" refers to?
