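# Introduction to the FASTQ Format

The FASTQ data format

### 1. Sequence identifier

Every FASTQ record has a sequence identifier that uniquely identifies it, for example: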

@HWUSI-EAS100R:6:73:941:1973#0/1


| Field | Meaning |
| --- | --- |
| `HWUSI-EAS100R` | the unique instrument name |
| `6` | flowcell lane |
| `73` | tile number within the flowcell lane |
| `941` | 'x'-coordinate of the cluster within the tile |
| `1973` | 'y'-coordinate of the cluster within the tile |
| `#0` | index number for a multiplexed sample (0 for no indexing) |
| `/1` | the member of a pair, /1 or /2 (paired-end or mate-pair reads only) |
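
The field layout above can be pulled apart with a regular expression. The following is a minimal sketch in Python, not an official parser: the names `OLD_HEADER` and `parse_old_header` are made up for illustration, and it assumes the `#index` and `/pair` suffixes are optional and that the coordinates are non-negative integers.

```python
import re

# Pattern for the older Illumina identifier layout:
# @instrument:lane:tile:x:y#index/pair
# Assumption: the "#index" and "/pair" suffixes may be absent.
OLD_HEADER = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/]+))?(?:/(?P<pair>[12]))?$"
)


def parse_old_header(line):
    """Return a dict of identifier fields, or None if the line does not match."""
    match = OLD_HEADER.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields to integers.
    for key in ("lane", "tile", "x", "y"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_old_header("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```

The index is kept as a string so that the same function also accepts a multiplex tag sequence in place of `0`.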

Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.
With Casava 1.8 the format of the '@' line has changed:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG


| Field | Meaning |
| --- | --- |
| `EAS139` | the unique instrument name |
| `136` | the run id |
| `FC706VJ` | the flowcell id |
| `2` | flowcell lane |
| `2104` | tile number within the flowcell lane |
| `15343` | 'x'-coordinate of the cluster within the tile |
| `197393` | 'y'-coordinate of the cluster within the tile |
| `1` | the member of a pair, 1 or 2 (paired-end or mate-pair reads only) |
| `Y` | Y if the read is filtered (did not pass), N otherwise |
| `18` | 0 when none of the control bits are on, otherwise it is an even number |
| `ATCACG` | index sequence |
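
Because the Casava 1.8 line is two colon-separated groups joined by a single space, it can be split without a regular expression. The sketch below is illustrative only: the function name `parse_casava18_header` and the dictionary keys are hypothetical, and it assumes the line has exactly the eleven fields listed above.

```python
def parse_casava18_header(line):
    """Split a Casava 1.8 style '@' line into named fields.

    Assumes the layout
    @instrument:run:flowcell:lane:tile:x:y pair:filtered:control:index
    and raises ValueError if the line does not fit it.
    """
    line = line.strip()
    if not line.startswith("@"):
        raise ValueError("identifier line must start with '@'")
    # The two groups are separated by a single space.
    left, right = line[1:].split(" ", 1)
    instrument, run_id, flowcell, lane, tile, x, y = left.split(":")
    pair, filtered, control_bits, index = right.split(":")
    return {
        "instrument": instrument,
        "run_id": int(run_id),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "pair": int(pair),
        "filtered": filtered == "Y",  # True when the flag is 'Y'
        "control_bits": int(control_bits),
        "index": index,
    }


if __name__ == "__main__":
    header = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
    print(parse_casava18_header(header))
```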

### 2. Quality scores

For each sequence, every base has a corresponding sequencing quality score:

Phred quality scores Q are defined as a property which is logarithmically related to the base-calling error probabilities P.

$$Q = -10 \log_{10} P$$

Phred quality scores are logarithmically linked to error probabilities:

| Phred Quality Score | Probability of incorrect base call | Base call accuracy |
| --- | --- | --- |
| 10 | 1 in 10 | 90 % |
| 20 | 1 in 100 | 99 % |
| 30 | 1 in 1000 | 99.9 % |
| 40 | 1 in 10000 | 99.99 % |
| 50 | 1 in 100000 | 99.999 % |
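
The rows of this table follow directly from the Phred definition. As a small sketch (the function names `phred_to_error_prob` and `error_prob_to_phred` are chosen here for illustration), the conversion in both directions can be written as:

```python
import math


def phred_to_error_prob(q):
    """Error probability P for a Phred score Q, i.e. P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score Q for an error probability P, i.e. Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability and base call accuracy for each table row.
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_prob(q)
        print(f"Q={q}: P={p:g}, accuracy={100 * (1 - p):.3f} %")
```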

The Solexa pipeline (i.e., the software delivered with the Illumina Genome Analyzer) earlier used a different mapping, encoding the odds p/(1-p) instead of the probability p:

$$Q_{\text{solexa}} = -10 \log_{10} \frac{p}{1-p}$$

Although both mappings are asymptotically identical at higher quality values, they differ at lower quality levels (i.e., approximately p > 0.05, or equivalently, Q < 13).
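
To see where the two mappings diverge, they can be evaluated side by side for a few error probabilities. This is a minimal sketch: the function names are hypothetical, and the Solexa score is computed from the odds p/(1-p) exactly as described above.

```python
import math


def phred_quality(p):
    """Phred mapping: encodes the error probability p."""
    return -10 * math.log10(p)


def solexa_quality(p):
    """Older Solexa mapping: encodes the odds p / (1 - p)."""
    return -10 * math.log10(p / (1 - p))


if __name__ == "__main__":
    # The gap between the two scores shrinks as p gets smaller.
    for p in (0.5, 0.2, 0.05, 0.01, 0.001):
        q_phred = phred_quality(p)
        q_solexa = solexa_quality(p)
        print(f"p={p}: Phred={q_phred:.2f}, Solexa={q_solexa:.2f}, "
              f"difference={q_phred - q_solexa:.3f}")
```

Note that at p = 0.5 the odds equal 1, so the Solexa score is zero, and for p above 0.5 it becomes negative, whereas the Phred score stays positive for any p below 1.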

