Recommended: an article on methods and workflow for eukaryotic genome annotation


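Sequencing an unknown genome (de novo sequencing) involves three stages: sequencing, assembly, and annotation. Much has already been written about sequencers and assembly software, but articles on genome annotation are rare. A recent article in Nature Reviews Genetics, "A beginner's guide to eukaryotic genome annotation", explains in detail how to annotate a genome and is a very good introduction.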

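Once the genome is assembled, the usual first step is to detect and annotate repetitive sequences and then mask them. Next comes the prediction of protein-coding genes (sometimes non-coding RNAs are predicted as well), and the final step is integration. Because the predictions come from different methods and different sources of evidence, they yield different results. Integration weighs possible prediction errors and alternative splicing to arrive at a reliable annotation, and this step requires checking the gene models manually, one by one.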
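The masking step can be illustrated with a minimal Python sketch. It assumes the repeat intervals have already been found by some repeat-detection tool and are given as 0-based, half-open (start, end) pairs; the function name `mask_repeats` and this input format are hypothetical and are not taken from the article. Soft masking lowercases the repeat bases, while hard masking replaces them with N.

```python
def mask_repeats(sequence, repeats, hard=False):
    """Return a copy of `sequence` with the repeat intervals masked.

    sequence: DNA string, assumed to be unmasked uppercase.
    repeats:  iterable of (start, end) pairs, 0-based and half-open
              (hypothetical format, chosen for this illustration).
    hard:     if True, replace repeat bases with 'N' (hard mask);
              otherwise lowercase them (soft mask).
    """
    chars = list(sequence)
    for start, end in repeats:
        # Clip each interval to the bounds of the sequence.
        start = max(start, 0)
        end = min(end, len(chars))
        for i in range(start, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    print(mask_repeats(seq, [(2, 5)]))             # soft-masked copy
    print(mask_repeats(seq, [(2, 5)], hard=True))  # hard-masked copy
```

Soft masking keeps the original bases available to downstream tools, whereas hard masking discards them, so which one to use depends on what the gene predictor expects.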

Many programs can do annotation (see the lists in the article). They fall into two main prediction approaches: ab initio and evidence-driven.

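RNA-seq is now a mature technology, and it is generally performed alongside genome sequencing. The RNA-seq data can be used both to analyze gene expression and as an excellent source of evidence for gene annotation.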
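One simple way to picture RNA-seq as annotation evidence is to ask how much of a predicted exon is covered by aligned reads. The sketch below is an illustration only, not the method of any particular annotation tool. It assumes the exon and the read alignments are on the same sequence and are given as 0-based, half-open (start, end) intervals; the name `covered_fraction` is hypothetical.

```python
def covered_fraction(exon, alignments):
    """Fraction of the exon's bases covered by at least one alignment.

    exon:       (start, end), 0-based and half-open.
    alignments: iterable of (start, end) intervals on the same sequence
                (hypothetical simplified stand-in for RNA-seq alignments).
    """
    start, end = exon
    covered = [False] * max(end - start, 0)
    for a_start, a_end in alignments:
        # Keep only the part of the alignment that overlaps the exon.
        lo = max(a_start, start)
        hi = min(a_end, end)
        for pos in range(lo, hi):
            covered[pos - start] = True
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


if __name__ == "__main__":
    exon = (100, 200)
    reads = [(90, 150), (140, 160)]
    print(covered_fraction(exon, reads))
```

A predicted exon with little or no coverage is not necessarily wrong, since the gene may simply not be expressed in the sampled tissue, so a score like this is better read as supporting evidence than as a filter.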

A beginner's guide to eukaryotic genome annotation

Mark Yandell & Daniel Ence

Nature Reviews Genetics 13, 329-342 (May 2012) | doi:10.1038/nrg3174

The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches.

Full text: http://www.nature.com/nrg/journal/v13/n5/full/nrg3174.html
