Newbie question about CNV callers - Biology board - MITBBS archive

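k********g (posts: 56) #1: I have just started working on CNVs. I know how to use CNVnator, but it does not seem very sensitive. As for mrFAST + mrCaNaVaR, the actual algorithm behind mrCaNaVaR has never been clearly written up. Which CNV callers for NGS data are commonly used these days? Many thanks.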
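u*********1 (posts: 2518) #2: [In reply to k********g]
Read-depth: CNVnator
Read-pair: BreakDancer
Split-read: Pindel
These are the three kinds of signal generally used to find CNVs in NGS data, and they are also what the 1000 Genomes Project used.
CNVnator (read-depth) will gradually be phased out, because read depth is not a very reliable signal to begin with. Unless you have a very obvious large deletion, read alignment itself fluctuates a lot, which easily produces many false positives. In short, CNVnator is fairly unreliable, although it is probably still the best of the read-depth tools.
Split-read is the most accurate and is the method for the future. If you ask about the real future trend, though, it should be assembly, but that puts heavy demands on the sequencing data: very high coverage and long reads.
mrFAST and its relatives belong to a different school (Eichler lab). Their core idea is multiple alignment, with the goal of taking care of segmental duplications and improving calling specificity/sensitivity in complex regions. The computational cost goes up a lot, so for now it remains a niche tool. If you are not particularly interested in repeats, do not bother with it.
What I do at the moment is combine these methods: if a very obvious event, say a large deletion, is supported by at least two of the signals, I trust it. That way I can at least find some very obvious events, deletions at least, with high confidence.
Overall, the biggest limitation for SV/CNV calling is that reads are still too short.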
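[Editor's sketch] The point in post #2 about read-depth fluctuation can be illustrated with a toy example. This is not CNVnator's algorithm; it is a minimal sketch that assumes a diploid genome, fixed-size windows, and made-up per-window depths. The function names, the 1.5 threshold, and the minimum run length are all hypothetical choices for illustration.

```python
from statistics import median

def copy_number_per_window(depths, ploidy=2):
    """Estimate copy number per window as ploidy * depth / median depth."""
    baseline = median(depths)
    return [ploidy * d / baseline for d in depths]

def deletion_runs(copy_numbers, threshold=1.5, min_windows=3):
    """Return (start, end) window ranges, end exclusive, where the estimate
    stays below `threshold` for at least `min_windows` consecutive windows."""
    runs, start = [], None
    # A sentinel value above the threshold closes a run that reaches the end.
    for i, cn in enumerate(list(copy_numbers) + [float("inf")]):
        if cn < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_windows:
                runs.append((start, i))
            start = None
    return runs

# Made-up depths: one noisy dip (window 3) and one sustained drop (windows 6-9).
depths = [30, 28, 33, 19, 31, 29, 15, 14, 16, 15, 30, 32]
cn = copy_number_per_window(depths)

print(deletion_runs(cn))                 # only the sustained drop survives
print(deletion_runs(cn, min_windows=1))  # the single noisy window is also called
```

With a minimum run length of one window, the isolated dip at window 3 is reported alongside the real multi-window drop, which is the false-positive behavior described above; requiring a longer run keeps only the large, obvious event.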
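y***k (posts: 40) #3: In my view mrFAST and the like are essentially still read-depth methods; they just improve how reads with multiple alignments are counted. One more question: how exactly do you "combine" the methods?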
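u*********1 (posts: 2518) #4: [In reply to y***k]
mrFAST/mrsFAST are alignment tools, the counterparts of BWA/Bowtie. On top of the alignment files produced by mrFAST, the Eichler group developed a set of tools based on the various signals. The read-depth one you mention is mrCaNaVaR, which corresponds to CNVnator in the BWA world.
As for combining, what I do is the dumbest possible approach: call with each tool separately, then use bedtools to find the overlaps. That is about as much as I can do right now. Some people go on to do local assembly on top of this.
Of course there are also tools that make calls from two or three signals at once, such as Genome STRiP and DELLY, but my impression is that the results are about the same. As long as read length does not increase, no matter how clever the programs get, this field will not make substantial progress.
My principle is that I only need to find rare SVs, not to find all SVs optimally. For example, if a disease is caused by an obvious, rare 10 kb deletion, I am confident that combining the signals above will find it.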
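[Editor's sketch] The "call separately, then find overlaps, trust what at least two signals support" approach from posts #2 and #4 could look roughly like this in plain Python. The poster used bedtools, and `bedtools intersect` has `-f` and `-r` options for a reciprocal-overlap requirement; the 50% reciprocal threshold, the function names, and the example calls below are assumptions made for illustration, not the poster's actual settings or data.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if (chrom, start, end) intervals a and b share at least
    `min_frac` of the length of each interval."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return (shared >= min_frac * (a[2] - a[1])
            and shared >= min_frac * (b[2] - b[1]))

def supported_calls(calls_by_caller, min_support=2, min_frac=0.5):
    """Return (call, supporting_callers) for every call that is backed by
    at least `min_support` callers, counting the caller that made it."""
    kept = []
    for caller, calls in calls_by_caller.items():
        for call in calls:
            support = {caller}
            for other, other_calls in calls_by_caller.items():
                if other != caller and any(
                    reciprocal_overlap(call, o, min_frac) for o in other_calls
                ):
                    support.add(other)
            if len(support) >= min_support:
                kept.append((call, sorted(support)))
    return kept

# Hypothetical deletion calls from three signal types.
calls = {
    "read_depth": [("chr1", 10000, 20500), ("chr2", 5000, 6000)],
    "read_pair": [("chr1", 10200, 20100)],
    "split_read": [("chr1", 10150, 20150), ("chr3", 700, 900)],
}

kept = supported_calls(calls)
for call, support in kept:
    print(call, support)
```

The same event is reported once per caller here, each time with its list of supporters; merging those into a single record (for example, taking the split-read coordinates as the breakpoints) is left out of the sketch. The chr2 and chr3 calls are dropped because only one signal supports each.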
k********g (posts: 56) #5: [In reply to u*********1]
Thank you very much. I cannot type Chinese on the desktop in my office, so I apologize for the inconvenience.
I am actually interested in repeats, which is why I looked into mrFAST + mrCaNaVaR. But I cannot find the algorithm behind mrCaNaVaR, even though the algorithm of mrFAST is well documented. CNVnator, on the other hand, is not sensitive to duplications in my experience.
Regarding split-read, this is the first time I have heard that SR methods are the most accurate. The read length of my data is 101 bp; do you think that is too short for split-read methods?
I will also check out the Genome STRiP and DELLY tools you mentioned. Thank you very much!
u*********1 (posts: 2518) #6: [In reply to k********g]
SR methods are definitely the most accurate because they give you the exact breakpoint. But we are not lucky enough to have reads spanning the breakpoints all the time, even for SVs in unique regions, not to mention complex structural variants involving repeats or duplications.
So up to now I would say the SV field, and even indel calling, is still quite messy, with lots of false positives, and the whole field lags behind SNP calling.
If you are interested in repeats, please first define "repeats" here. Do you mean short tandem repeats (microsatellites)? For di-, tri-, and tetranucleotide repeats, if the copy number is not that large, i.e. tandem repeat polymorphism with, say, around 10 copies, GATK/samtools can call them just like SNPs. If you use split-read based SV programs like Pindel, I think they will also be called. But also look at the link below:
http://erlichlab.wi.mit.edu/lobSTR/
I have not tried it myself, but I think lobSTR should achieve better performance.
Again, that is for polymorphism. If you are looking for repeat expansions, say a trinucleotide expanded to 1000 copies, I do not think any program available right now will give a good answer with 101 bp reads.
MrFast+ ...
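[Editor's sketch] The read-length argument in post #6 comes down to simple arithmetic: a single read can only report a repeat's copy number directly if it covers the whole repeat plus some unique sequence on both sides. A minimal sketch follows, assuming 10 bp of anchoring flank on each side; that figure and the function names are hypothetical and not taken from lobSTR or any other tool.

```python
def max_spannable_copies(read_len, unit_len, flank=10):
    """Largest number of repeat units one read can fully span while keeping
    `flank` bases of unique sequence on each side."""
    usable = read_len - 2 * flank
    return max(usable // unit_len, 0)

def read_can_span(read_len, unit_len, copies, flank=10):
    """True if the whole repeat tract plus both flanks fits in one read."""
    return unit_len * copies + 2 * flank <= read_len

# 101 bp reads and a trinucleotide repeat, as in the thread.
print(max_spannable_copies(101, 3))   # upper bound on directly countable copies
print(read_can_span(101, 3, 10))      # polymorphism of around 10 copies
print(read_can_span(101, 3, 1000))    # 1000-copy expansion, a 3000 bp tract
```

Under these assumptions a 101 bp read can span at most 27 trinucleotide copies, so a tract of about 10 copies fits comfortably while a 1000-copy expansion is far beyond what any single read can cover.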
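b****r (posts: 17995) #7: This thread is worth bookmarking. Could the experts here give a forecast: at the current stage, which is better at calling CNVs, cCGH or Illumina NGS, and which has more potential?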
k********g (posts: 56) #8: [In reply to u*********1]
Many thanks, I have learned a lot. My background is in statistics, and at this stage I am indeed more interested in relatively long indels, because from our point of view they are simpler to model. I will study the papers you mentioned carefully. Thanks again.
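o***a (posts: 28) #9: My impression is that array CGH can detect large SVs but cannot pin down the breakpoints precisely.
As for split-read methods, detecting deletions is no problem, at any length. Insertions can only be detected if they are shorter than the read length, and the duplications these methods find are limited to tandem duplications.
DELLY is a relatively new tool that combines the split-read and read-pair approaches. It is also fairly easy to use.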
