fastq2fasta by sed

  • A+
所属分类:Script

I've been having fun with sed this morning. I'm testing using Illumina data as input to Newbler, just to see how it works. As the data is paired-end, it requires some pre-processing following the guidelines for Sanger reads in the manual for gsAssembler (A.K.A newbler). Yesterday I did this using the method proposed in the manual with a final cleanup using sed so the procedure was fastq-->fasta-->perl script to make .acc file --> use fnafile to produce fasta file --> final tidy using sed. That was all messy and horribly slow. So today I decided to do all of those steps using sed.

Here's the end result:

1) Convert fastq2fasta using sed

sed '/^@/!d;s//>/;N' asp5_leaf_read1.fq > asp5_leaf_read1.fna
sed '/^@/!d;s//>/;N' asp5_leaf_read2.fq > asp5_leaf_read2.fna

2) Format fasta header so newbler can match the pairs up

sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=1/dir=f/' asp5_leaf_read1.fna > asp5_leaf_read1_newbler.fna
sed -e 's/>\(.*\)#0\/\(.*\)/>\1 template=\1 dir=\2 library=asp5/' -e 's/dir=2/dir=r/' asp5_leaf_read2.fna > asp5_leaf_read1_newbler.fna

3) Produce qual files with read names matching the fasta files

sed '/^+/!d;s//>/;N' asp5_leaf_read1.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read1_newbler.qual
sed '/^+/!d;s//>/;N' asp5_leaf_read2.fq | sed 's/>\(.*\)#0\/\(.*\)/>\1/' > asp5_leaf_read2_newbler.qual

Those qual values will then need rescaling to phred33 .... still to do. Steps 1 and 2 can also be piped if you don't want to keep the unmodified .fna files (but I did - for testing Inchworm). I also need to check whether to reverse compliment the reads for either paired-end or mate-pair data.

Those steps took about 15 mins to complete. The attempt yesterday took over six hours. Thank you sed!

ybzhao

发表评论

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

目前评论:2   其中:访客  2   博主  0

    • Dan B Dan B 0

      I was hoping for a Unix command, so I would avoid having to go through a perl script; I suppose I could do a ‘tr’.
      What about the first two questions?

        • ybzhao ybzhao 1

          @Dan B hi, ‘asp5′ and ‘leaf’ is just a filename without special means.

          As for the first question, I did not understand it well. I am sorry.