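This page is an excerpt and archive of the corresponding post on MITBBS (未名空间). Posts less than a week old show at most 50 characters; older posts show up to 500.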
Biology board - The Real Cost of Sequencing
t*d
Posts: 1290
1
http://www.genomeweb.com//node/980559?hq_e=el&hq_m=1103092&hq_l
----------------------------------------
A recent study by scientists at Yale University suggests that the actual
cost of sequencing may be much higher than some current estimates indicate
since those figures may not factor in the analysis costs that are necessary
for a successful sequencing project.
In the paper, published in Genome Biology last month, Yale's Mark Gerstein
and colleagues consider costs that weren’t taken into account in a survey
conducted by the National Human Genome Research Institute that pegged the
cost per genome as of March 2011 to be a little over $10,000.
Gerstein and colleagues note that the NHGRI survey, which analyzed data from
the Large-Scale Genome Sequencing Program, omitted so-called "non-
production activities," such as costs for the development of computational
tools to improve sequencing pipelines or downstream analysis; quality
assessment and quality control; technology development to improve sequencing
pipelines; management of individual sequencing projects; informatics
equipment; and downstream analyses such as sequence assembly, sequence
alignment, identifying variants, and the interpretation of results.
They estimate that the cost of downstream analysis for a whole-genome
sequencing project could add as much as $100,000 to the overall costs.
BioInform spoke with Gerstein earlier this month. What follows is an edited
version of the conversation.
Why did you conduct this analysis?
A few months ago, the [National Center for Biotechnology Information]
announced that it was potentially closing [the Short Read Archive]. That was
big news in bioinformatics because the SRA is the resting place for a lot
of sequence [data]. That precipitated a workshop that the NIH organized
afterwards on the costs associated with storing and managing data and
thinking about it in different communities such as DNA sequencing, RNA
sequencing, metagenomics, and so forth.
A Genome Biology representative was at the workshop and they asked me if I
wanted to write an opinion piece addressing these issues. That was the
genesis of this particular piece.
In your paper, you include some graphs that show that sequencing has long
since outpaced Moore's law and storage seems to be coming along nicely but
then analysis is lagging behind. Why is that the case?
I think the thing about analysis that makes it much more problematic is that
it's not a single thing that’s easily measured. There are certain analyses
that are to some degree straightforward and that have certain scaling
properties with more sequences and then there are other things that are much
less well defined. Usually the things that are fairly undefined, or not as
precisely defined, tend to scale much worse. Mapping the reads to the genome
would be a type of analysis that is fairly well defined and has very defined
scaling properties relative to the number of reads.
On the other hand, here's an example of something that wouldn’t scale very
well as you sequence more and more genomes: you need to interpret them and
you might want to interpret the variants in light of annotation or to
integrate variation with annotation. That is a reasonable thing to do but it’s
just much less well defined what it means. Potentially, it could involve
things that could take a lot of time and the amount of time could scale in
a very nonlinear way relative to having two genomes, five genomes, a hundred
genomes, and so forth.
Isn’t the analysis problem made worse by insufficient funds?
Historically, analysis has always been underfunded relative to data
production. I think genomics and biological science in general have
historically always emphasized data production and with good reason. Now
people are coming to the realization that the data is almost free. You can
produce a gargantuan amount of data for almost nothing and it's really
changing people’s view because previously they always saw the data as the
valuable thing and the analysis was an afterthought and easy to do. Now the
whole equation changes around. It’s easy to get the data but suddenly now
there is this whole new thing that hadn’t really been thought of before,
the analysis which is taking up this bigger place in people’s thinking
about things.
Is the message that data analysis is a necessary component of the research
process really getting out into the funding agencies?
That’s a hard question to answer. I would say that [the National Institutes
of Health] and [the National Science Foundation] certainly support
computational biology and they realize that next-generation sequencing is
putting a premium on their offices and they are certainly issuing
increasingly more [requests for applications] and programs that are pointed
more at the development of analysis tools or workflows. That said, ... it’s
not trivial being funded and ... it’s probably still considerably harder
to garner funding for bioinformatics than for clinical medicine.
How well are researchers budgeting for analysis in their grant proposals?
I think increasingly when funds are allocated in budgets for projects that
generate these datasets, part of that is for someone to do some sort of
analysis. In these things, the analysis tends to be somewhat underfunded
relative to the data production in the sense that usually, you are seeing
the person who is doing the analysis not being able to keep up with things
and I think that’s partially because the budget was written years ago and
suddenly you can generate much more data for a given dollar. Scaling isn’t
taken into account. I think also there is a historic de-emphasis on analysis
relative to data production.
In the paper, you mention that the costs for experimental setup and design
have increased. Why is that the case?
There are two aspects of [cost] going up. There is going up in the relative
sense and in an absolute sense. Clearly, as the cost of NGS goes to
essentially zero, almost by definition, the other components to doing an
experiment have to increase in relative contribution. For example, if an
experiment once cost $1 to collect samples, $1 to do the sequencing and $1
for the analysis, and the sequencing cost dropped to zero, the [relative]
cost of the other things goes up even if the absolute cost goes down.
Another aspect is because the cost of sequencing is dropping to zero and
sequencing is becoming much easier, people are now tackling much harder-to-
procure samples. Now, if you look at [an] experiment, most of it is
procuring the specimen and very little is the actual sequencing.
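The $1/$1/$1 example in the answer above can be worked through as a small calculation. This is only an illustrative sketch: the component names and the `relative_shares` helper are hypothetical, and the dollar figures are the toy numbers from the interview, not real budgets.

```python
from fractions import Fraction


def relative_shares(costs):
    """Return each component's share of the total cost as an exact fraction."""
    total = sum(costs.values())
    return {name: Fraction(cost, total) for name, cost in costs.items()}


# Toy budgets in whole dollars, following the interview's example.
before = {"samples": 1, "sequencing": 1, "analysis": 1}
after = {"samples": 1, "sequencing": 0, "analysis": 1}  # sequencing drops to zero

shares_before = relative_shares(before)
shares_after = relative_shares(after)

for name in before:
    print(f"{name}: {shares_before[name]} -> {shares_after[name]}")
```

In this toy case the total falls from $3 to $2, yet the share taken by sample collection and by analysis each rises from one third to one half, which is the relative increase Gerstein describes.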
Now that sequencing is moving into clinics, will analysis become even more
expensive?
I think that the data reduction end of things can get commoditized and I can
easily imagine in a clinic that a lot of standard analysis would be
automatically run and I suspect that the sequencing companies would like to
incorporate that analysis into their products. Thus, the machines would not
only sequence the genome but they would automatically map [reads] against
the reference and automatically call variants. I don’t think the
interpretation and the downstream stuff would be that quickly commoditized.
Those things will remain quite expensive.
What’s the way forward? How can data analysis catch up?
I don’t know if it’s a question of catching up. I think it’s just that
the world has changed and it’s just become much, much easier to procure
sequencing data and that the cost structure of a lot of things is going to
fundamentally change.
b*******7
Posts: 29
2
Gerstein is just one big hype artist

[Quoting t*d's post:]
: http://www.genomeweb.com//node/980559?hq_e=el&hq_m=1103092&hq_l
: ----------------------------------------
: A recent study by scientists at Yale University suggests that the actual
: cost of sequencing may be much higher than some current estimates indicate
: since those figures may not factor in the analysis costs that are necessary
: for a successful sequencing project.
: In the paper, published in Genome Biology last month, Yale's Mark Gerstein
: and colleagues consider costs that weren’t taken into account in a survey
: conducted by the National Human Genome Research Institute that pegged the
: cost per genome as of March 2011 to be a little over $10,000.
