# Color-space sequencing data

@SRR2967009.1 100_1000_1168_F3

T10011023211201220121202030102221012302121010131001

2@@@@>@?@@@@<@@//;@@/@9?@8@=@@@6;6@66;<@6@67?2?;/@

@SRR2967009.2 100_1000_1211_F3

T20132312201120021312220200023110220113100012321011

@@@@@@@@@<@@@@@@@@@@@@@@@@@@@@@@?@@@@/?@@@@@@@@

for ((i=7009; i<7014; i++)); do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP066/SRP066824/SRR296$i/SRR296$i.sra; done

ls *sra | while read id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/abi-dump "$id"; done

### SOLiD native (CSFASTA/QUAL)

SOLiD data in SRA can be output as color-space data. The utility ‘abi-dump’ writes CSFASTA and QUAL data files; with appropriate options, fastq-dump can write “CSFASTQ” format instead.

SHRiMP, sequel, and BFAST can all align color-space data in FASTQ format, or start directly from the pair of .csfasta and .qual files. Bowtie can handle this as well.

https://wikis.utexas.edu/display/bioiteam/BFAST

Running FastQC directly on the csfastq data gives the following result:

Sequencer reads have a chance of read error (e.g. spot misidentification), combined with a chance of sequence error (e.g. polymerase misread in the PCR step).
For sequencers that output in base space, both these errors have a similar effect on the base-space mapping.

For sequencers that output in color-space, the read errors result in a somewhat unexpected base-space translation even if the underlying sequence has a perfect match to the reference.
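A small sketch of why this happens, assuming the standard SOLiD dimer encoding described further down and reads that contain only the colors 0 to 3 (no `.` calls). The read strings and the function name `decode_colorspace` are made up for illustration. Each decoded base depends on the previous one, so a single miscalled color changes every base after it:

```python
BASES = "ACGT"  # index order chosen so that color == index(a) XOR index(b)


def decode_colorspace(read: str) -> str:
    """Decode a read such as 'A31031' (leading base + colors) into bases."""
    current = BASES.index(read[0])
    out = [read[0]]
    for color in read[1:]:
        # Each color says how to get from the previous base to the next one.
        current ^= int(color)
        out.append(BASES[current])
    return "".join(out)


clean = "A31031"
noisy = "A32031"  # hypothetical read error: second color miscalled (1 -> 2)

print(decode_colorspace(clean))  # ATGGCA
print(decode_colorspace(noisy))  # ATCCGT
```

The two decoded sequences agree only up to the miscalled color; everything downstream is shifted to different bases, even though only one color differs.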

The issues relating to color-space to base-space translation were discussed in the thread you linked to, but here’s my take on it (dumped from an email I recently sent to someone else):

A color-space sequence is an encoding of adjacent dimers such that unchanging bases are encoded with ‘0’, complementary changes are encoded with ‘3’, the color ‘1’ is used for a non-complementary base change on the same side of the alphabet (AC, CA, GT, or TG), and the color ‘2’ is used for a non-complementary base change on a different side of the alphabet (AG, GA, CT, or TC). A table of these changes can be found here:

http://www.ploscompbiol.org/article/…i.1000386.g002
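The encoding described above can be written out as a lookup table. This is only an illustrative sketch: the names `COLOR_GROUPS` and `encode_colorspace` are invented here, the input is assumed to be a non-empty string of A/C/G/T, and the first base of the sequence is used as the leading reference base.

```python
# Dimer-to-color table, built from the four groups described in the text.
COLOR_GROUPS = {
    "0": ["AA", "CC", "GG", "TT"],  # unchanging bases
    "1": ["AC", "CA", "GT", "TG"],  # non-complementary, same side of the alphabet
    "2": ["AG", "GA", "CT", "TC"],  # non-complementary, different side
    "3": ["AT", "TA", "CG", "GC"],  # complementary change
}
DIMER_TO_COLOR = {
    dimer: color for color, dimers in COLOR_GROUPS.items() for dimer in dimers
}


def encode_colorspace(bases: str) -> str:
    """Return the first base followed by one color per adjacent dimer."""
    colors = "".join(
        DIMER_TO_COLOR[bases[i:i + 2]] for i in range(len(bases) - 1)
    )
    return bases[0] + colors


print(encode_colorspace("ATGGCA"))  # A31031
```

A sequence of n bases yields n - 1 colors, which is why the leading base has to be carried along with them.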

This has a few nice properties (e.g. the reverse-complement of a color-space sequence is the same as the reverse of the color-space sequence, and an isolated SNP changes two adjacent colors), but many annoying and nasty properties.
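Both of the nice properties can be checked on a toy sequence. The sketch below uses an equivalent formulation of the encoding, which is an assumption of this example rather than something stated in the quoted text: with A=0, C=1, G=2, T=3, the color of a dimer is the XOR of the two base codes. The sequences and helper names are made up.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def colors(bases: str) -> list:
    """Colors of all adjacent dimers (no leading reference base)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(bases, bases[1:])]


def revcomp(bases: str) -> str:
    """Reverse-complement of a base-space sequence."""
    return bases.translate(COMPLEMENT)[::-1]


seq = "ATGGCA"
snp = "ATGTCA"  # same sequence with a single substitution (G -> T) at index 3

# Property 1: colors of the reverse-complement are the reversed colors.
print(colors(seq), colors(revcomp(seq)))  # [3, 1, 0, 3, 1] [1, 3, 0, 1, 3]

# Property 2: one substituted base changes exactly two adjacent colors.
changed = [i for i, (x, y) in enumerate(zip(colors(seq), colors(snp))) if x != y]
print(changed)  # [2, 3]
```

This is what lets color-space aligners tell a real SNP (two adjacent color changes) from a read error (a single color change).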

The first is that a color-space sequence in itself is meaningless without a base reference (usually the starting base).
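To illustrate, the same string of colors decodes to four different base sequences depending on which starting base is assumed. This is a hedged sketch: `decode_with_start` is a hypothetical helper, the colors are an invented example, and the starting base is included in the output for simplicity (in real SOLiD reads the leading letter is generally the last primer base and is dropped after decoding).

```python
BASES = "ACGT"  # index order chosen so that color == index(a) XOR index(b)


def decode_with_start(start: str, colors: str) -> str:
    """Decode a color string into bases, given an assumed starting base."""
    idx = BASES.index(start)
    decoded = start
    for c in colors:
        idx ^= int(c)  # move from the previous base to the next one
        decoded += BASES[idx]
    return decoded


colors = "31031"
for start in BASES:
    print(start, decode_with_start(start, colors))
# A ATGGCA
# C CGTTAC
# G GCAATG
# T TACCGT
```

Without the starting base, there is no way to choose between these four candidates from the colors alone.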

