Urgent: two conceptual questions about resampling - Statistics board - 未名存档

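r******y (posts: 39)	1 I'm a complete beginner in statistics and would be very grateful for help with the two questions below. Question 1: I have a small population of size N = 150. I draw a sample of size n = 5 from it by sampling without replacement, and I repeat this 1000 times, ending up with 1000 small samples of size 5 each. Does this resampling method have a standard statistical name? Does it count as simulation? Question 2: The population consists of 150 people, and the variable I care about is height. The purpose of drawing 1000 resamples is to find out: when the sample size is n = 5, how likely is the mean height of any randomly drawn sample (n = 5) to be similar to or different from the mean height of the population (N = 150)? My question is: I chose 1000 repetitions subjectively because it felt like about enough. Is there a formula that would let me show that 1000 repetitions is sufficient? Many thanks!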
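The procedure described in the question can be read as repeated simple random sampling without replacement, used as a Monte Carlo approximation to the sampling distribution of the sample mean. A minimal sketch follows, assuming made-up height data (the thread gives no actual values); the function name `resample_means` and the numbers 170 and 8 are hypothetical.

```python
import random
import statistics

def resample_means(population, n=5, runs=1000, seed=0):
    """Draw `runs` simple random samples of size n without replacement
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(runs)]

# Hypothetical heights in cm for N = 150 people (made-up data, not from the thread).
_rng = random.Random(42)
population = [_rng.gauss(170, 8) for _ in range(150)]

means = resample_means(population, n=5, runs=1000)
pop_mean = statistics.fmean(population)

print(f"population mean:                  {pop_mean:.2f}")
print(f"average of the 1000 sample means: {statistics.fmean(means):.2f}")
print(f"SD of the 1000 sample means:      {statistics.stdev(means):.2f}")
```

The list `means` is the empirical stand-in for the distribution of the mean of a size-5 sample; any question of the form "how often is the sample mean within some distance of the population mean" can be answered by counting entries in it.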
w*******9 (posts: 1433)	2 It looks like you want the distribution of the mean of a size-5 random sample. Since such simple resampling is quick to run, why not use as large a number as possible? (You can keep the sqrt(size) rule in mind to gauge the accuracy of your approximation; or you can try different runs and assess the variation between runs to determine whether 1000 is enough; since the total number of distinct combinations is C(150, 5), any number beyond that is not helpful.)
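One plausible reading of the "sqrt(size) rule" is that the Monte Carlo standard error of a simulated average shrinks in proportion to 1/sqrt(number of runs), so quadrupling the runs roughly halves the error. The sketch below illustrates that reading on made-up height data; `mc_standard_error` is a hypothetical helper, and the interpretation itself is an assumption, since the post does not spell the rule out.

```python
import math
import random
import statistics

def mc_standard_error(population, n, runs, seed):
    """Monte Carlo standard error of the average of `runs` simulated sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(runs)]
    return statistics.stdev(means) / math.sqrt(runs)

# Hypothetical heights in cm for N = 150 people (made-up data).
_rng = random.Random(42)
population = [_rng.gauss(170, 8) for _ in range(150)]

# Each fourfold increase in runs should roughly halve the standard error.
se_by_runs = {runs: mc_standard_error(population, 5, runs, seed=runs)
              for runs in (1000, 4000, 16000)}
for runs, se in se_by_runs.items():
    print(f"runs = {runs:6d}   Monte Carlo SE = {se:.4f}")

# Number of distinct size-5 subsets of a population of 150.
print("C(150, 5) =", math.comb(150, 5))
```

Comparing the printed standard errors against a tolerance chosen in advance gives a more explicit criterion than eyeballing the difference between runs.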
r******y (posts: 39)	3 Thank you very much for the reply!

"Since it is fairly quick to do such simple resampling why not use a number as big as possible?" => A very good question, and it hits exactly the point that gives me the biggest headache. 1. The experiment was done in the past. I now have to explain why I chose 1000 as the number of draws, and I cannot go back and change it. 2. If I redid the experiment, I would probably choose a larger number of draws, say 5000 or 10000. Even so, I would still need to explain why 5000 or 10000 is an appropriate number.

"since the total different number of combinations is C(150, 5), any number beyond that is not helpful" => Good insight, thanks for the hint!

"or you can try different runs and assess the variations between runs to determine if 1000 is enough" => I have already done this: I compared 1000, 2000 and 5000 draws, and the results changed very little. But this way of checking feels rather crude. A more clear-cut verification method would be better.

"You can keep in mind the sqrt(size) rule to gauge the accuracy of your approximation" => I am not familiar with the sqrt(size) rule you mention. Could you give some explanation or a reference? Thanks.

Also, I came across a paper: "Determining the number of simulation runs: Treating simulations as theories by not sampling their behavior". Link: http://acs.ist.psu.edu/papers/ritterSQKip.pdf The paper makes several points: 1) the more runs, the better; 2) with many runs the distribution of the sample mean looks normal, and with few runs it does not; 3) the required number of runs can be calculated from the mean effect size and a predefined power. I cannot go back and draw more samples, but I wonder whether I could borrow points 2 and 3 of the paper to solve my current problem. For example, I could look at the distribution of the sample means from my 1000 draws. If that distribution looks normal, can I claim that the number of draws is sufficient? I am also considering calculating the required number of draws, but I did not understand how the paper's mean effect size is computed. I hope you will keep replying, thanks!
w*******9 (posts: 1433)	4 I was too quick to say that any number beyond C(150, 5) is not helpful; that was wrong. Method 3) in the paper you refer to is the most rigorous approach: it calculates the minimum sample size needed to achieve a certain confidence interval length or, equivalently, a certain power in a hypothesis test. See for example http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/BS704_Power4.html If I understand your question correctly, you are trying to estimate the probability (say p) that the mean of the sub-sample differs from the whole-sample (population) mean by more than a given threshold. Say you obtained an estimate p_hat from your previous simulation of 1000 runs. We can obtain an asymptotic 95% confidence interval for p as p_hat +- 1.96 * sqrt(p_hat * (1 - p_hat) / 1000). See whether you can argue that the width of this interval is shorter than a pre-specified length that serves your study purpose.
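The interval suggested above is the standard normal-approximation (Wald) interval for a proportion, and it can be inverted to give a number of runs for a target half-width. A minimal sketch, assuming a hypothetical estimate p_hat = 0.3; the helper names `proportion_ci` and `runs_needed` are illustrative, not taken from the thread or the cited paper.

```python
import math

def proportion_ci(p_hat, runs, z=1.96):
    """Asymptotic (Wald) confidence interval for a proportion
    estimated from `runs` independent simulation runs."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / runs)
    return p_hat - half_width, p_hat + half_width

def runs_needed(half_width, p_guess=0.5, z=1.96):
    """Smallest number of runs whose Wald interval has at most the given
    half-width; p_guess = 0.5 is the worst case (widest interval)."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / half_width**2)

# Hypothetical estimate from 1000 runs (p_hat = 0.3 is made up for illustration).
low, high = proportion_ci(0.3, 1000)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")

# Runs required so that p is known to within +-0.03 in the worst case.
print("runs needed for half-width 0.03:", runs_needed(0.03))
```

If the half-width obtained from 1000 runs is already smaller than the precision the study needs, that is a direct argument that 1000 runs were enough.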
r******y (posts: 39)	5 Thank you, '那山那水', for your time and reply. Impressive! Yes, your understanding of my research question is correct :) Although my research question compares the sample mean with the population mean, the problem I face right now is explaining why I chose to repeat the sampling 1000 times in my experiment. So what I need is not a sample size calculation but the number of simulation runs. I also think the third method in that paper is the most accurate, but its formula contains a variable called mean effect size, and nowhere does the paper explain how this mean effect size is calculated. It looks like Cohen's d, but that does not quite make sense: intuitively, the larger Cohen's d is, the more simulation runs should be needed, yet the paper concludes the opposite (see p. 28, Table 5). So I keep wondering what this mean effect size actually represents. I have one more question for you: "I was too quick to say any number beyond c(150, 5) is not helpful -- it's wrong." => Why is it wrong?
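The thread does not answer this last question. One possible explanation is that random draws repeat some combinations and miss others, so drawing C(N, n) random samples is not the same as enumerating every combination once, and extra runs keep reducing the Monte Carlo error. The sketch below shows this on a deliberately tiny, made-up population (N = 8, n = 3) where full enumeration is feasible.

```python
import itertools
import math
import random
import statistics

# Tiny hypothetical population so that every combination can be listed.
population = [150, 158, 163, 167, 170, 174, 181, 190]
n = 3
total = math.comb(len(population), n)  # number of distinct size-3 subsets

# Exact sampling distribution: the mean of every possible subset.
exact_means = [statistics.fmean(c) for c in itertools.combinations(population, n)]

# Random draws: as many draws as there are combinations,
# recording which index subsets actually show up.
rng = random.Random(0)
seen = set()
for _ in range(total):
    seen.add(tuple(sorted(rng.sample(range(len(population)), n))))

print(f"distinct combinations hit: {len(seen)} of {total}")
print(f"mean of exact distribution: {statistics.fmean(exact_means):.3f}")
print(f"population mean:            {statistics.fmean(population):.3f}")
```

For N = 150 and n = 5 full enumeration is impractical, which is why a random subset of runs is used in the first place; the exact distribution is only shown here to make the contrast visible.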
b*****s (posts: 11267)	6 "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of." Ronald Fisher
r******y (posts: 39)	7 :( haha... Very direct, I like it. Thanks for the reply; I happily accept this comment :(
