Biology board - Beginner question: CNV callers
k********g
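Posts: 56
1
I have just started working on CNVs. I know how to use CNVnator, but it does
not seem very sensitive. As for mrFAST + mrCaNaVaR, I have never found a
clear description of the actual algorithm behind mrCaNaVaR.
Which CNV callers are commonly used for NGS data these days? Many thanks.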
u*********1
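Posts: 2518
2
Read-depth: CNVnator
Read-pair: BreakDancer
Split-read: Pindel
These are the three signals generally used to find CNVs from NGS data, and
they are also what the 1000 Genomes Project used.
CNVnator (read depth) will gradually be phased out, because read depth is
inherently not a very reliable signal. Unless you have a very obvious large
deletion, read alignment itself fluctuates a lot and easily produces many
false positives. In short, CNVnator is fairly unreliable, although it is
still the best of the read-depth tools.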
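To make the read-depth point concrete, here is a minimal sketch of the general idea: bin the depth, compare each bin to the genome-wide median, and report runs of consecutive outlier bins. This is not CNVnator's actual algorithm; the function name, the 0.5/1.5 ratio cutoffs, and the `min_bins` filter are all assumptions chosen for illustration.

```python
from statistics import median

def call_depth_outliers(bin_depths, low=0.5, high=1.5, min_bins=3):
    """Return (first_bin, last_bin, kind) for runs of consecutive bins whose
    depth ratio to the median is below `low` ("del") or above `high` ("dup").
    Cutoffs and min_bins are illustrative assumptions, not tuned values."""
    ref = median(bin_depths)
    if ref <= 0:
        raise ValueError("median depth must be positive")
    calls, run_start, run_kind = [], None, None
    # A sentinel bin at normal depth closes any run that reaches the end.
    for i, depth in enumerate(list(bin_depths) + [ref]):
        ratio = depth / ref
        kind = "del" if ratio < low else "dup" if ratio > high else None
        if kind != run_kind:
            # Keep the finished run only if it is long enough to trust.
            if run_kind is not None and i - run_start >= min_bins:
                calls.append((run_start, i - 1, run_kind))
            run_start, run_kind = i, kind
    return calls

# Toy per-bin depths: a 4-bin drop, a single noisy bin, and a 3-bin gain.
depths = [30, 31, 29, 14, 13, 12, 14, 30, 12, 30, 29, 62, 60, 61, 30]
print(call_depth_outliers(depths))
```

In the toy data, the single low bin at index 8 is the kind of alignment fluctuation the post warns about. Requiring several consecutive bins suppresses it, at the cost of missing short events.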
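Split-read is the most accurate and is the method of the future. Of course,
if you ask about the real long-term trend, it should be assembly, but that
places high demands on the sequencing data: very high coverage and long
reads.
mrFAST and the like belong to a different school (the Eichler lab). The core
idea is multiple alignment, that is, keeping reads that map to more than one
location. The goal is to take care of segmental duplications and improve
calling specificity/sensitivity in complex regions. The computational cost
goes up a lot, though, so for now it remains a niche tool. If you are not
particularly interested in repeats, don't bother with it.
What I do now is combine these methods: if something obvious, such as a
large deletion, is supported by at least two signals, I believe it. That way
I can at least find some obvious events, deletions at least, with high
confidence.
In short, the biggest limitation for SV/CNV calling is that read length is
still too short.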

【In reply to k********g's post】

y***k
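Posts: 40
3
I think mrFAST and the like are still essentially read-depth methods; they
just improve how reads with multiple alignments are counted.
One more question: how exactly do you "combine" them?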
u*********1
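Posts: 2518
4
mrFAST/mrsFAST are alignment tools; their counterparts are BWA/Bowtie.
On top of the alignment files produced by mrFAST, the Eichler group
developed a set of programs based on the various signals. The read-depth one
you mention is mrCaNaVaR, which is the counterpart of CNVnator in the BWA
world.
As for combining, I do it in the most naive way: call with each tool
separately, then use bedtools to find the overlaps.
That is about all I can do for now; some people go further and run local
assembly on top of that.
Of course, there are also programs that call from two or three signals at
once, such as Genome STRiP and DELLY, but my impression is that the results
are all about the same. As long as read length does not increase, no matter
what tricks you play with the programs, this field will not make substantial
progress.
My principle is that I only need to find rare SVs, not to find all SVs
optimally. For example, if a disease is caused by an obvious, rare 10 kb
deletion, I am confident that combining the signals above will find it.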
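The "call separately, then intersect" step can be sketched in plain Python as below. The poster used bedtools; this is only an illustration of the same idea. The 50% reciprocal-overlap cutoff, the function names, and the example coordinates are assumptions, not anything stated in the thread.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b, each (chrom, start, end), share at least
    min_frac of the length of BOTH intervals."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return (shared > 0
            and shared >= min_frac * (a[2] - a[1])
            and shared >= min_frac * (b[2] - b[1]))

def supported_calls(callsets, min_support=2, min_frac=0.5):
    """callsets maps a method name to a list of (chrom, start, end) calls.
    Returns (call, supporting_methods) for every call that is backed by at
    least min_support methods, counting the method that made the call."""
    kept = []
    for method, calls in callsets.items():
        for call in calls:
            support = {method}
            for other, other_calls in callsets.items():
                if other != method and any(
                    reciprocal_overlap(call, o, min_frac) for o in other_calls
                ):
                    support.add(other)
            if len(support) >= min_support:
                kept.append((call, sorted(support)))
    return kept

# Hypothetical deletion calls from three callers (coordinates are made up).
callsets = {
    "read_depth": [("chr1", 10000, 20000), ("chr2", 500, 900)],
    "read_pair": [("chr1", 10200, 19800)],
    "split_read": [("chr1", 10150, 19900), ("chr3", 100, 400)],
}
for call, methods in supported_calls(callsets):
    print(call, methods)
```

Note that the same event is reported once per caller that found it, each with slightly different coordinates. A real pipeline would merge these into one record, typically preferring the split-read breakpoints.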

【In reply to y***k's post】

k********g
Posts: 56
5
Thank you very much. I cannot type Chinese on the desktop in my office; I
apologize for the inconvenience.
I am actually interested in repeats, which is why I looked into mrFAST +
mrCaNaVaR. But I cannot find the algorithm behind mrCaNaVaR, although the
algorithm of mrFAST is well documented. CNVnator, on the other hand, is not
sensitive to duplications in my experience.
Regarding split-read, this is the first time I have heard that SR methods
are the most accurate. The read length of my data is 101 bp; do you think
that is too short for split-read methods?
I will also check out Genome STRiP and DELLY, which you mentioned. Thank you
very much!

【In reply to u*********1's post】

u*********1
Posts: 2518
6
SR methods are definitely the most accurate because they provide the exact
breakpoint. But we are not lucky enough to have reads spanning the
breakpoint all the time, even for SVs in unique regions, not to mention
complex structural variants involving repeats/duplications.
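A small sketch of why a split (soft-clipped) read pins a breakpoint to a single base: the aligned part of the read ends exactly where the reference stops matching. This is not how Pindel works internally; it only reads the breakpoint off a SAM-style POS and CIGAR. The function name and the 20-base `min_clip` cutoff are assumptions, and hard clips next to soft clips are ignored for brevity.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def clip_breakpoints(pos, cigar, min_clip=20):
    """Given a 1-based SAM POS and a CIGAR string, return candidate
    breakpoints as (reference_position, side) for soft clips of at least
    min_clip bases. The position is the first aligned reference base for a
    left clip and the last aligned reference base for a right clip."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    found = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        found.append((pos, "left"))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        found.append((pos + ref_len - 1, "right"))
    return found

# A 101 bp read whose last 41 bases do not match the reference at this locus.
print(clip_breakpoints(1000, "60M41S"))
# A read whose first 41 bases are clipped.
print(clip_breakpoints(5000, "41S60M"))
```

Several reads clipped at the same reference position are what give split-read callers their base-pair resolution. A fully aligned read (`101M`) contributes nothing here, which is the "not lucky enough" case described above.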
So far, I would say SV calling, and even indel calling, is still quite messy
with lots of false positives, and the whole field lags behind SNP calling.
If you are interested in repeats, please first define "repeats" here. Do you
mean short tandem repeats (microsatellites)? For di-, tri-, and
tetranucleotide repeats, if the copy number is not that large, i.e. a tandem
repeat polymorphism of around 10 copies, GATK/samtools can call them just
like SNPs; I think split-read based SV programs such as Pindel will call
them too. But also look at the link below:
http://erlichlab.wi.mit.edu/lobSTR/
Though I haven't tried it, I think lobSTR should achieve better performance.
Again, that is for polymorphisms. If you are looking for repeat expansions,
say a trinucleotide repeat expanded to 1000 copies, I don't think any
current program will give a good answer with 101 bp reads.
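The arithmetic behind the read-length limit can be written out as a back-of-the-envelope sketch. It assumes a read must contain the whole repeat plus some unique flanking sequence on each side to genotype it; the function name and the 10-base flank are assumptions for illustration.

```python
def max_spanned_copies(read_len, unit_len, min_flank=10):
    """Largest whole number of repeat units a single read can fully span
    while keeping min_flank bases of unique sequence on each side."""
    usable = read_len - 2 * min_flank
    return max(usable // unit_len, 0)

for unit_len, name in [(2, "dinucleotide"), (3, "trinucleotide"), (4, "tetranucleotide")]:
    print(name, max_spanned_copies(101, unit_len))
```

With 101 bp reads and 10 bp flanks, a read can span at most 27 trinucleotide copies (81 usable bases). A polymorphism of around 10 copies fits comfortably, while a 1000-copy expansion covers 3000 bp and cannot be spanned by any single read of this length.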


【In reply to k********g's post】

b****r
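Posts: 17995
7
This thread is worth bookmarking.
Could the experts here weigh in: at this stage, which is better at calling
CNVs, array CGH or Illumina NGS, and which has more potential?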
k********g
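Posts: 56
8
Thanks, I have learned a lot. I come from a statistics background, and at
this stage I really am more interested in longer indels, because from our
point of view they are simpler to model. I will study the papers you
mentioned carefully. Thanks.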


【In reply to u*********1's post】

o***a
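Posts: 28
9
My impression is that array CGH can detect large SVs but cannot locate the
breakpoints precisely.
As for split-read methods, detecting deletions is no problem at any length,
but insertions can only be detected when they are shorter than the read
length, and the only duplications they find are tandem duplications.
DELLY is a relatively new program that integrates the split-read and
read-pair methods. It is also fairly easy to use.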