Genome assembly with ABySS

1. Installing ABySS

Install google-sparsehash first, then download and build ABySS:

$ sudo rpm -ivh http://sparsehash.googlecode.com/files/sparsehash-2.0.2-1.noarch.rpm
$ wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.7/abyss-1.3.7.tar.gz
$ tar zxf abyss-1.3.7.tar.gz
$ cd abyss-1.3.7/
$ ./configure --prefix=/opt/biosoft/abyss-1.3.7/

By default, ABySS supports k-mers up to 64. Add --enable-maxk=96 to the configure command above to raise the largest k-mer length the build supports.

$ make
$ make install
$ cd ..
$ rm -rf abyss-1.3.7/
$ echo 'PATH=$PATH:/opt/biosoft/abyss-1.3.7/bin/' >> ~/.bashrc
$ source ~/.bashrc
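
Because the largest usable k is fixed when ABySS is compiled, it can save a failed run to compare the chosen k against that limit before starting an assembly. The following is a minimal sketch; check_k is a hypothetical helper (not part of ABySS), and the limit you pass must be the value you gave to --enable-maxk, or 64 if you did not give one.

```shell
# check_k K [MAXK]: hypothetical helper, not an ABySS command.
# MAXK is assumed to be the --enable-maxk value used at build time
# (64 when the option was not given).
check_k() {
    local k=$1 maxk=${2:-64}
    if [ "$k" -gt "$maxk" ]; then
        echo "k=$k exceeds the build limit maxk=$maxk; rebuild with a larger --enable-maxk" >&2
        return 1
    fi
    echo "k=$k is within maxk=$maxk"
}
```

For example, check_k 31 succeeds against the default limit, while check_k 80 returns a non-zero status and prints a hint on stderr.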

2. Using ABySS

The ABYSS command assembles short reads into contigs. To assemble them into scaffolds, use the abyss-pe command instead.

2.1 Assembling short reads into contigs with ABYSS

To read the documentation for the ABYSS command, run:

$ ABYSS --help

or

$ less /opt/biosoft/abyss-1.3.7/share/man/man1/ABYSS.1

Basic usage:

$ ABYSS -k 31 -o 31_contigs.fa reads1.fastq reads2.fastq

Notes: the -k and -o options are required; more than one input file may be given.

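Main options:

--chastity
 Discard reads that failed the chastity filter; this is the default.
--no-chastity
 Keep reads that failed the chastity filter.
--trim-masked
 Trim masked bases from the ends of reads; this is the default.
--no-trim-masked
 Do not trim masked bases from the ends of reads.
-q | --trim-quality=N
 Trim bases with quality below N from the ends of reads.
--standard-quality
 Quality scores use the phred33 encoding; this is the default.
--illumina-quality
 Quality scores use the phred64 encoding.
-o | --out=FILE
 Name of the output contigs file.
-k | --kmer=N
 k-mer length.
-t | --trim-length=N
 Maximum length of dangling edges to trim.
-c | --coverage=FLOAT
 Remove contigs whose mean k-mer coverage is below this value.
-b | --bubbles=N
 Pop bubbles shorter than N bp; the default is 3*k.
-b0 | --no-bubbles
 Do not pop bubbles.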
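
Choosing between --standard-quality and --illumina-quality requires knowing how the FASTQ quality scores are encoded. The sketch below is one rough way to guess; guess_phred is a hypothetical helper (not an ABySS tool). It assumes plain 4-line FASTQ records, looks only at the first 1000 reads, and relies on the heuristic that characters below ';' (ASCII 59) appear only in phred33 data. A phred33 file that contains only high-quality bases would be misreported as 64, so treat the result as a hint.

```shell
# guess_phred FILE: print 33 or 64 as a best guess of the quality offset.
# Hypothetical helper; assumes uncompressed FASTQ with 4 lines per record.
guess_phred() {
    head -n 4000 "$1" | awk '
        BEGIN {
            # Build a character -> ASCII code table for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every 4th line is a quality string; track its lowest character.
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END { if (min < 59) print 33; else print 64 }
    '
}
```

If it prints 64, pass --illumina-quality to ABYSS; if it prints 33, the default --standard-quality applies.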

2.2 Assembling short reads into scaffolds with abyss-pe

To read the documentation for the abyss-pe command, run:

$ abyss-pe --help

or

$ less /opt/biosoft/abyss-1.3.7/share/man/man1/abyss-pe.1

Usage examples:

One paired-end library:

$ abyss-pe k=64 name=ecoli in='reads1.fa reads2.fa'

Multiple paired-end libraries:

$ abyss-pe k=64 name=ecoli lib='lib1 lib2' lib1='lib1_1.fa lib1_2.fa' lib2='lib2_1.fa lib2_2.fa' se='se1.fa se2.fa'
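
With many libraries, the lib= list and the per-library file lists become tedious to type by hand. A small sketch of generating them is shown here; lib_args is a hypothetical helper, and it assumes every library's files follow the NAME_1.fa / NAME_2.fa naming used in the example above.

```shell
# lib_args NAME...: print abyss-pe parameters for the given libraries.
# Hypothetical helper; assumes files are named NAME_1.fa and NAME_2.fa.
lib_args() {
    local name
    # "$*" joins all library names with spaces for the lib= list.
    printf "lib='%s'" "$*"
    for name in "$@"; do
        printf " %s='%s_1.fa %s_2.fa'" "$name" "$name" "$name"
    done
    printf '\n'
}
```

Running lib_args lib1 lib2 prints the lib, lib1 and lib2 parameters in the same form as the command above, so the output can be pasted after abyss-pe k=64 name=ecoli.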

Paired-end and mate-pair libraries:

$ abyss-pe k=64 name=ecoli lib='pe1 pe2' mp='mp1 mp2' pe1='pe1_1.fa pe1_2.fa' pe2='pe2_1.fa pe2_2.fa' mp1='mp1_1.fa mp1_2.fa' mp2='mp2_1.fa mp2_2.fa' se='se1.fa se2.fa'

Rescaffolding with an RNA-Seq assembly:

$ abyss-pe k=64 name=ecoli lib=pe1 mp=mp1 long=long1 pe1='pe1_1.fa pe1_2.fa' mp1='mp1_1.fa mp1_2.fa' long1=long1.fa

Using MPI:

$ abyss-pe np=8 k=64 name=ecoli in='reads1.fa reads2.fa'

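Using a cluster (SGE). Here abyss-pe takes name from the job name (-N), k from the array task ID (-t) and np from the number of slots (-pe):

$ qsub -N ecoli -t 64 -pe openmpi 8 abyss-pe n=10 in='reads1.fa reads2.fa'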

Assemble with several values of k, then look for the best one:

$ export k
$ for k in {20..40}; do
>     mkdir k$k
>     abyss-pe -C k$k name=ecoli in=../reads.fa
> done
$ abyss-fac k*/ecoli-contigs.fa
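
One common way to compare the assemblies is by contig N50, which is among the statistics abyss-fac reports. To make clear what that number means, here is a minimal sketch that computes it directly from a FASTA file; n50 is a hypothetical helper, not an ABySS command. abyss-fac applies its own minimum-length cutoff, so its values may differ unless the optional MINLEN argument is set to match.

```shell
# n50 FILE [MINLEN]: print the N50 of the sequences in a FASTA file,
# ignoring sequences shorter than MINLEN (default 0). Hypothetical helper.
n50() {
    awk -v minlen="${2:-0}" '
        # A header line closes the previous record: emit its length.
        /^>/ { if (len > 0 && len >= minlen) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0 && len >= minlen) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            # Walk from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                sum += lens[i]
                if (sum * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

For example, n50 k31/ecoli-contigs.fa 500 prints the N50 of the contigs of at least 500 bp from the k=31 run; calling it in a loop over k*/ecoli-contigs.fa gives one value per k.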

Original source: http://www.chenlianfu.com/?p=2109