fastq2fasta by sed

21,901

I've been having fun with sed this morning. I'm testing using Illumina data as input to Newbler, just to see how it works. As the data is paired-end, it requires some pre-processing following the guidelines for Sanger reads in the manual for gsAssembler (A.K.A newbler). Yesterday I did this using the method proposed in the manual with a final cleanup using sed so the procedure was fastq-->fasta-->perl script to make .acc file --> use fnafile to produce fasta file --> final tidy using sed. That was all messy and horribly slow. So today I decided to do all of those steps using sed.

Here's the end result:

1) Convert fastq2fasta using sed

sed '/^@/!d;s//>/;N' asp5_leaf_read1.fq > asp5_leaf_read1.fna
sed '/^@/!d;s//>/;N' asp5_leaf_read2.fq > asp5_leaf_read2.fna

2) Format fasta header so newbler can match the pairs up

sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=1/dir=f/' asp5_leaf_read1.fna > asp5_leaf_read1_newbler.fna
sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=2/dir=r/' asp5_leaf_read2.fna > asp5_leaf_read1_newbler.fna

3) Produce qual files with read names matching the fasta files

sed '/^+/!d;s//>/;N' asp5_leaf_read1.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read1_newbler.qual
sed '/^+/!d;s//>/;N' asp5_leaf_read2.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read2_newbler.qual

Those qual values will then need rescaling to phred33 .... still to do. Steps 1 and 2 can also be piped if you don't want to keep the unmodified .fna files (but I did - for testing Inchworm). I also need to check whether to reverse compliment the reads for either paired-end or mate-pair data.

Those steps took about 15 mins to complete. The attempt yesterday took over six hours. Thank you sed!

  • 本文由
  • 转载请务必保留本文链接:https://www.plob.org/article/54.html

发表评论

匿名网友 填写信息

评论:2   其中:访客  2   博主  0
    • Dan B 0

      I was hoping for a Unix command, so I would avoid having to go through a perl script; I suppose I could do a ‘tr’.
      What about the first two questions?

        • ybzhao

          @ Dan B hi, ‘asp5′ and ‘leaf’ is just a filename without special means.

          As for the first question, I did not understand it well. I am sorry.