Getting a Subsequence from a Sequence with BioJava


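Given a sequence, you may care only about the first 10 bases, or you may want a particular region of it. You may also want to print a subsequence to an output stream such as STDOUT. How do you do this?
BioJava identifies bases using a biological coordinate system: the first base has index 1, and the last base has an index equal to the sequence length. Note that this differs from string indexing in Java, which starts at 0. If you try to access a position outside the range 1 to the sequence length, an exception is thrown.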
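To make the difference between the two coordinate systems concrete, here is a minimal sketch that uses only plain Java strings. The class BioCoords and its methods are hypothetical helpers written for this illustration, not part of BioJava; they map 1-based, end-inclusive coordinates onto String.substring, which is 0-based and end-exclusive. Rejecting a start greater than end is an assumption of this sketch.

```java
// Hypothetical helper (not a BioJava class): 1-based, inclusive coordinates
// layered on top of Java's 0-based, end-exclusive String methods.
final class BioCoords {

    // Returns the symbols from position start to position end, both inclusive.
    static String subStr(String seq, int start, int end) {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException(
                "range " + start + ".." + end + " is outside 1.." + seq.length());
        }
        // Shift only the start: substring's end index is already exclusive.
        return seq.substring(start - 1, end);
    }

    // Returns the single symbol at a 1-based position.
    static char symbolAt(String seq, int index) {
        if (index < 1 || index > seq.length()) {
            throw new IndexOutOfBoundsException(
                "index " + index + " is outside 1.." + seq.length());
        }
        return seq.charAt(index - 1);
    }

    public static void main(String[] args) {
        String rna = "auggcaccguccagauu";
        System.out.println(symbolAt(rna, 1));   // first symbol
        System.out.println(subStr(rna, 1, 3));  // first three symbols
    }
}
```

In this sketch, position 0 is rejected just like a position past the end of the sequence, which mirrors the rule that valid positions run from 1 to the sequence length.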

Getting a Subsequence

SymbolList symL = null;

//code here to generate a SymbolList

//get the first Symbol
Symbol sym = symL.symbolAt(1);

//get the first three bases
SymbolList symL2 = symL.subList(1,3);

//get the last three bases (both ends are inclusive, so start at length - 2)
SymbolList symL3 = symL.subList(symL.length() - 2, symL.length());
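Because both the start and end positions are inclusive, it is easy to be off by one when counting back from the end of a sequence. The following sketch shows the arithmetic in plain Java. The class InclusiveRange and its methods are hypothetical names used only for this illustration, not part of BioJava.

```java
// Hypothetical helper (not a BioJava class): arithmetic for 1-based,
// inclusive ranges such as those passed to subList or subStr.
final class InclusiveRange {

    // Number of positions covered by start..end when both ends are inclusive.
    static int size(int start, int end) {
        return end - start + 1;
    }

    // 1-based start position of the last n symbols of a sequence.
    static int lastNStart(int sequenceLength, int n) {
        if (n < 1 || n > sequenceLength) {
            throw new IllegalArgumentException(
                "n must be between 1 and " + sequenceLength);
        }
        return sequenceLength - n + 1;
    }

    public static void main(String[] args) {
        int length = 17; // length of "auggcaccguccagauu"
        int start = lastNStart(length, 3);
        System.out.println(start + ".." + length
            + " covers " + size(start, length) + " symbols");
    }
}
```

For a sequence of length 17, the last three symbols occupy positions 15 to 17, whereas a range starting at length - 3 (position 14) would cover four symbols.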

Printing a Subsequence

//print the last three bases of a SymbolList or Sequence
String s = symL.subStr(symL.length() - 2, symL.length());
System.out.println(s);

Complete Listing

import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;

public class SubSequencing {
  public static void main(String[] args) {
    SymbolList symL = null;

    //generate an RNA SymbolList
    try {
      symL = RNATools.createRNA("auggcaccguccagauu");
    }
    catch (IllegalSymbolException ex) {
      ex.printStackTrace();
      //without a SymbolList there is nothing to subsequence
      return;
    }

    //get the first Symbol
    Symbol sym = symL.symbolAt(1);

    //get the first three bases
    SymbolList symL2 = symL.subList(1,3);

    //get the last three bases (both ends are inclusive, so start at length - 2)
    SymbolList symL3 = symL.subList(symL.length() - 2, symL.length());

    //print the last three bases
    String s = symL.subStr(symL.length() - 2, symL.length());
    System.out.println(s);
  }
}
