用awk和sed快速将fasta格式的序列改成一行显示

Some time when you want to change the fasta seq into one line, just as following. Then it will help you to do other process.
I know that perl and other script will do that, however, I would introduce two simple and fast way do achieve that with awk and sed.


> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
paxpaxpax
pax

> sq1 foofoofoobarfoofoofoo
> sq2 quxquxquxbarquxquxquxbarquxx
> sq3 paxpaxpaxpax

For awk:

awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 }' YourFile

For sed:

sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile

Today, I want to extract contig which is more 500bp from my aseembly result, So I do that as following:

sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' |awk '{if (length($5)>500 ) print ">contig-"FNR"\n"$5}'

  • 本文由 整理发表
  • 网站部分文章源自互联网,若未正确标注来源,请联系管理员更新。文章转载,请务必保留本文链接

您必须才能发表评论!

评论:1   其中:访客  0   博主  0   引用   1