Our earlier commentary on SOAPdenovo2 provided a quick overview of an assembly. In the next few commentaries, we will cover the various components of the assembly step by step, and in the final commentary we will present a few comparative results to estimate the quality of the assembly.
Today, we will look into the first step of the assembly – ‘preparing pregraph’. SOAPdenovo2 gives two options for completing this task. You can either use the ‘pregraph’ tool in SOAPdenovo2 by running the command ‘./SOAPdenovo-63mer-v2.04.3 pregraph [options]’, or you can use the separate program ‘pregraph_sparse_63mer.5.1’ that comes with the package. We do not know much about how the programs are written, but based on the name we suspect that ‘pregraph sparse’ attempts to save memory by using a sparse matrix to store the graph. In the same vein, ABySS uses Google sparsehash to reduce memory use compared to Velvet. That way, one can run assemblies with longer k-mer sizes.
The command for running the pregraph program is shown above. All parameters except ‘-p 32’ are essential. We also skipped a few parameters and let the pregraph program use its default values. The format of the config file is given in our previous commentary on SOAPdenovo2.
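As a rough illustration of how such a command line is put together, here is a minimal Python sketch. It is not necessarily the exact invocation used for this run. The flags ‘-s’ (config file), ‘-K’ (k-mer size) and ‘-o’ (output prefix) are taken from the SOAPdenovo2 usage text as we recall it, so please check them against the help output of your own binary. The config file name ‘soap.config’ and the k-mer size 63 are assumptions made for the example; the prefix ‘outPG’ matches the output file names listed further down.

```python
import shlex


def build_pregraph_command(binary, config, prefix, kmer, threads=None):
    """Return the argument list for a SOAPdenovo2 'pregraph' run.

    Flag names are assumed from the SOAPdenovo2 usage text; verify them
    against the help output of the binary you actually use.
    """
    cmd = [binary, "pregraph", "-s", config, "-K", str(kmer), "-o", prefix]
    if threads is not None:
        # The thread count is the one optional parameter mentioned in the text.
        cmd += ["-p", str(threads)]
    return cmd


cmd = build_pregraph_command(
    binary="./SOAPdenovo-63mer-v2.04.3",
    config="soap.config",  # hypothetical config file name
    prefix="outPG",        # matches the output file names in this post
    kmer=63,               # assumed; the k-mer size is not stated in this post
    threads=32,
)

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(cmd))
```

Building the command as a list and printing it keeps the example safe to run on a machine that does not have SOAPdenovo2 installed; the same list can be passed to subprocess.run once the flags have been verified.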
Input library: As we mentioned, the input libraries had a total of 600M reads, each 99 nt long.
Time to run: The pregraph program took three hours to run on a server with 32 64-bit AMD processors. The server was also busy with a large task from a different user, so the pregraph program may take less time for you.
RAM: The pregraph program used 46G of RAM.
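To put these numbers in perspective, the short calculation below estimates how much sequence and how many k-mer occurrences the pregraph step has to process. The read count and read length come from the text; the k-mer size of 63 is an assumption, since this post does not state the value used. Note that the result counts k-mer occurrences, not distinct k-mers, so it is an upper bound on what ends up in the graph.

```python
def kmers_per_read(read_len, k):
    """Number of overlapping k-mers in a read of the given length."""
    return max(read_len - k + 1, 0)


n_reads = 600_000_000  # 600M reads, from the text
read_len = 99          # read length in nt, from the text
k = 63                 # assumed k-mer size; not stated in this post

total_bases = n_reads * read_len
total_kmers = n_reads * kmers_per_read(read_len, k)

print(f"{total_bases / 1e9:.1f} Gbp of input sequence")        # 59.4 Gbp
print(f"{kmers_per_read(read_len, k)} k-mers per read")         # 37
print(f"{total_kmers / 1e9:.1f} billion k-mer occurrences")     # 22.2 billion
```

A smaller k gives more k-mers per read (for example 69 per read at k = 31), so the workload depends noticeably on the k-mer size chosen.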
Output: The program produced 11 output files. We are going through them one by one to describe what they contain [This post will be updated].
outPG.ht_content (30G), outPG.ht_idx (152M), outPG.edge.gz (332M) – these three are binary files and the main output files generated by pregraph program.
outPG.sparse.edge (FASTA, 1.3G), outPG.preArc (FASTA, 215M), outPG.vertex (78M) – these three are text files with additional information on the edges and nodes.
outPG.kmerFreq – small text file with statistical information on kmer frequency.
outPG.preGraphBasic – tiny text file with number of edges, vertices, etc.
edge_num_stat.txt, edge_len_stat.txt, edge_cvg_stat.txt – three more text files with various statistics.
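Of the files above, outPG.kmerFreq is the easiest one to inspect by hand. The sketch below shows one way to read it and locate the main coverage peak. It assumes that the file holds one integer per line, where line i is the number of distinct k-mers seen i times; we have not confirmed this layout against the SOAPdenovo2 source, so treat it as an assumption. The toy numbers in the example are made up for illustration and are not from our run.

```python
def read_kmer_freq(lines):
    """Parse k-mer frequency counts; result[0] corresponds to frequency 1.

    Assumes one integer per line (line i = number of k-mers seen i times).
    """
    return [int(line.split()[0]) for line in lines if line.strip()]


def main_peak(counts):
    """Return the frequency (1-based) of the highest bin after the first valley."""
    if not counts:
        raise ValueError("empty k-mer frequency table")
    # Walk down the initial slope, which is usually dominated by
    # low-frequency k-mers from sequencing errors, to the first local minimum.
    valley = 0
    while valley + 1 < len(counts) and counts[valley + 1] < counts[valley]:
        valley += 1
    # The main peak is the highest bin from the valley onward.
    peak = max(range(valley, len(counts)), key=lambda i: counts[i])
    return peak + 1


# Made-up toy histogram, NOT data from the run described in this post.
toy_lines = ["900", "300", "50", "20", "60", "150", "220", "180", "90", "30"]
toy_counts = read_kmer_freq(toy_lines)
print(main_peak(toy_counts))  # 7

# With a real file (assuming the layout described above):
# with open("outPG.kmerFreq") as fh:
#     print(main_peak(read_kmer_freq(fh)))
```

Skipping the initial slope before looking for the maximum matters because the first bin is often the largest one in the whole table, and it says little about the actual coverage of the genome.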