The formats for GenBank accession numbers are:
| Type | Format |
|---|---|
| Nucleotide: | 1 letter + 5 numerals OR 2 letters + 6 numerals |
| Protein: | 3 letters + 5 numerals |
| WGS: | 4 letters + 2 numerals for WGS assembly version + 6-8 numerals |
| MGA: | 5 letters + 7 numerals |
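
The format rules above can be sketched as a small classifier. This is only an illustration: the function name `classify_accession` is hypothetical, and accepting an optional `.N` version suffix (as in `U12345.1`) is an assumption not stated in the rules.

```python
import re

# One pattern per format rule listed above. The letter counts differ for each
# category, so at most one pattern can match a given accession.
_FORMATS = [
    ("Nucleotide", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
    ("Protein", re.compile(r"^[A-Z]{3}\d{5}$")),
    # 4 letters + 2 digits (assembly version) + 6-8 digits
    ("WGS", re.compile(r"^[A-Z]{4}\d{2}\d{6,8}$")),
    ("MGA", re.compile(r"^[A-Z]{5}\d{7}$")),
]


def classify_accession(accession):
    """Return the format category of an accession, or None if no rule matches."""
    # Assumption: a trailing ".N" version suffix is ignored.
    core = accession.strip().upper().split(".", 1)[0]
    for name, pattern in _FORMATS:
        if pattern.match(core):
            return name
    return None
```

Identifiers that follow other schemes, such as RefSeq accessions containing an underscore, match none of these patterns and return `None`.
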
Distribution of accession prefixes across the databases:
Nucleotide Accession Prefixes
| Prefix | Database | Type |
|---|---|---|
| BA, DF, DG | DDBJ | CON division |
| AN | EMBL | CON division |
| CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL | NCBI | CON division |
| C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS | DDBJ | EST |
| F | EMBL | EST |
| H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO | GenBank | EST |
| D, AB | DDBJ | Direct submissions |
| V, X, Y, Z, AJ, AM, FM | EMBL | Direct submissions |
| U, AF, AY, DQ, EF, EU, FJ, GQ | GenBank | Direct submissions |
| AP | DDBJ | Genome project data |
| BS | DDBJ | Chimpanzee genome data |
| AL, BX, CR, CT, CU | EMBL | Genome project data |
| AE, CP, CY | GenBank | Genome project data |
| AG, DE, DH, FT | DDBJ | GSS |
| B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI | GenBank | GSS |
| AK | DDBJ | cDNA projects |
| AC, DP | GenBank | HTGS |
| E, BD, DD, DI, DJ, DL, DM, FU | DDBJ | Patents |
| A, AX, CQ, CS, FB, GM, GN | EMBL | Patents (nucleotide only) |
| I, AR, DZ, EA, GC, GP | GenBank | Patents (nucleotide) |
| G, BV, GF | GenBank | STS |
| BR | DDBJ | TPA |
| BN | EMBL | TPA |
| EZ | GenBank | TSA |
| S | GenBank | From journal scanning |
| AD | GenBank | From GSDB |
| AH | GenBank | Segmented set header |
| AS | GenBank | Other - not currently being used |
| BC | GenBank | MGC project |
| BK | GenBank | TPA |
| BL, GJ, GK | GenBank | TPA CON division |
| BT | GenBank | FLI-cDNA projects |
| J, K, L, M | GenBank | From GSDB direct submissions |
| N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs |
| AAAA-AZZZ | GenBank | WGS |
| BAAA-BZZZ | DDBJ | WGS |
| CAAA-CZZZ | EMBL | WGS |
| DAAA-DZZZ | GenBank | WGS TPA |
| AAAAA-AZZZZ | DDBJ | MGA |
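
As a rough illustration of how the nucleotide table might be used, the sketch below maps the leading letters of an accession to a database and type. The names `PREFIX_TABLE`, `WGS_BY_FIRST_LETTER`, and `lookup_nucleotide_prefix` are hypothetical, and the dictionary holds only a small subset of the rows above; a real lookup would need the full table.

```python
import re

# Illustrative subset of the nucleotide prefix table above (not exhaustive).
PREFIX_TABLE = {
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "AP": ("DDBJ", "Genome project data"),
    "AL": ("EMBL", "Genome project data"),
    "BC": ("GenBank", "MGC project"),
    "EZ": ("GenBank", "TSA"),
}

# Four-letter WGS prefixes are assigned by range, so the first letter decides.
WGS_BY_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),
    "B": ("DDBJ", "WGS"),
    "C": ("EMBL", "WGS"),
    "D": ("GenBank", "WGS TPA"),
}


def lookup_nucleotide_prefix(accession):
    """Return (database, type) for a nucleotide accession, or None if unknown."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)
    if len(prefix) == 4:
        return WGS_BY_FIRST_LETTER.get(prefix[0])
    if len(prefix) == 5 and prefix[0] == "A":
        return ("DDBJ", "MGA")  # AAAAA-AZZZZ range
    return PREFIX_TABLE.get(prefix)
```

Handling the WGS and MGA ranges by prefix length and first letter avoids enumerating every four- or five-letter combination.
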
Protein Accession Prefixes
| Prefix | Database | Type |
|---|---|---|
| BAA-BZZ | DDBJ | Protein ID |
| CAA-CZZ | EMBL | Protein ID |
| AAA-AZZ | GenBank | Protein ID |
| AAE | GenBank | Protein ID for patents (note that some patent proteins also use the AAA and AAC prefixes) |
| FAA-FZZ | DDBJ | TPA Protein ID |
| DAA-DZZ | GenBank | TPA Protein ID |
| GAA-GZZ | DDBJ | WGS Protein ID |
| EAA-EZZ | GenBank | WGS Protein ID |
| HAA-HZZ | GenBank | TPA WGS Protein ID |
| O | Swiss-Prot | Protein |
| P | Swiss-Prot (Geneva) | Protein |
| Q | Swiss-Prot (Hinxton) | Protein |
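
The protein prefixes are mostly assigned by range, so a lookup can key on the first letter of a three-letter prefix. The sketch below is one possible reading of the table above; the names `PROTEIN_BY_FIRST_LETTER`, `SWISS_PROT`, and `lookup_protein_prefix` are hypothetical, and treating every `AAE` accession as a patent protein is a simplification.

```python
import re

# First letter of a three-letter prefix, per the ranges in the table above.
PROTEIN_BY_FIRST_LETTER = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    "C": ("EMBL", "Protein ID"),
    "D": ("GenBank", "TPA Protein ID"),
    "E": ("GenBank", "WGS Protein ID"),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "H": ("GenBank", "TPA WGS Protein ID"),
}

# Single-letter prefixes listed for Swiss-Prot.
SWISS_PROT = {
    "O": "Swiss-Prot",
    "P": "Swiss-Prot (Geneva)",
    "Q": "Swiss-Prot (Hinxton)",
}


def lookup_protein_prefix(accession):
    """Return (database, type) for a protein accession, or None if unknown."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)
    if prefix == "AAE":
        # Special case inside the AAA-AZZ range.
        return ("GenBank", "Protein ID for Patents")
    if len(prefix) == 3:
        return PROTEIN_BY_FIRST_LETTER.get(prefix[0])
    if prefix in SWISS_PROT:
        return (SWISS_PROT[prefix], "Protein")
    return None
```

Because the table notes that some patent proteins also carry `AAA` and `AAC` prefixes, the prefix alone cannot reliably separate patent entries from ordinary GenBank protein IDs.
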
For a detailed description of the NCBI RefSeq naming format, see: https://www.plob.org/2012/02/24/3711.html

