Naming Rules for DDBJ/EMBL/GenBank Accession Numbers

The formats of GenBank accession numbers are:

Nucleotide: 1 letter + 5 numerals, or 2 letters + 6 numerals
Protein: 3 letters + 5 numerals
WGS: 4 letters + 2 numerals (the WGS assembly version) + 6-8 numerals
MGA: 5 letters + 7 numerals
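
The four format rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `classify_accession` and the `PATTERNS` list are hypothetical, and the patterns cover only the formats listed in this article. It assumes the accession is given without a version suffix.

```python
import re

# Regular expressions transcribed from the four format rules above.
# Hypothetical names; only the formats listed in this article are covered.
PATTERNS = [
    ("Nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("Protein", re.compile(r"[A-Z]{3}\d{5}")),
    # 4 letters + 2 digits (assembly version) + 6-8 digits
    ("WGS", re.compile(r"[A-Z]{4}\d{2}\d{6,8}")),
    ("MGA", re.compile(r"[A-Z]{5}\d{7}")),
]


def classify_accession(accession):
    """Return the format name matching the accession, or None if none match."""
    text = accession.strip().upper()
    for kind, pattern in PATTERNS:
        # fullmatch requires the whole string to fit the pattern
        if pattern.fullmatch(text):
            return kind
    return None


if __name__ == "__main__":
    for example in ["U12345", "AF123456", "AAA12345", "AAAA01000001", "AAAAA1234567"]:
        print(example, classify_accession(example))
```

Because the letter counts differ between the four formats, no accession can match more than one pattern, so the order of the list does not matter.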

Distribution of accession prefixes across the databases:

Nucleotide Accession Prefixes

Prefix | Database | Type
BA, DF, DG | DDBJ | CON division
AN | EMBL | CON division
CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL | NCBI | CON division
C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS | DDBJ | EST
F | EMBL | EST
H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO | GenBank | EST
D, AB | DDBJ | Direct submissions
V, X, Y, Z, AJ, AM, FM | EMBL | Direct submissions
U, AF, AY, DQ, EF, EU, FJ, GQ | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL, BX, CR, CT, CU | EMBL | Genome project data
AE, CP, CY | GenBank | Genome project data
AG, DE, DH, FT | DDBJ | GSS
B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI | GenBank | GSS
AK | DDBJ | cDNA projects
AC, DP | GenBank | HTGS
E, BD, DD, DI, DJ, DL, DM, FU | DDBJ | Patents
A, AX, CQ, CS, FB, GM, GN | EMBL | Patents (nucleotide only)
I, AR, DZ, EA, GC, GP | GenBank | Patents (nucleotide)
G, BV, GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
EZ | GenBank | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other (not currently in use)
BC | GenBank | MGC project
BK | GenBank | TPA
BL, GJ, GK | GenBank | TPA CON division
BT | GenBank | FLI-cDNA projects
J, K, L, M | GenBank | From GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
AAAAA-AZZZZ | DDBJ | MGA
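
A prefix table like this lends itself to a dictionary lookup. The sketch below is illustrative only: `PREFIX_TABLE` holds a small excerpt of the rows above rather than the full list, and the names `PREFIX_TABLE`, `WGS_BY_FIRST_LETTER` and `lookup_nucleotide_prefix` are hypothetical. Four-letter (WGS) and five-letter (MGA) prefixes are resolved by their first letter, following the ranges in the last five rows.

```python
import re

# Excerpt of the nucleotide prefix table; extend with the remaining rows as needed.
PREFIX_TABLE = {
    "D": ("DDBJ", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "U": ("GenBank", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AC": ("GenBank", "HTGS"),
    "AL": ("EMBL", "Genome project data"),
    "BC": ("GenBank", "MGC project"),
    "EZ": ("GenBank", "TSA"),
}

# Four-letter WGS prefixes are assigned in ranges keyed by the first letter.
WGS_BY_FIRST_LETTER = {
    "A": ("GenBank", "WGS"),      # AAAA-AZZZ
    "B": ("DDBJ", "WGS"),         # BAAA-BZZZ
    "C": ("EMBL", "WGS"),         # CAAA-CZZZ
    "D": ("GenBank", "WGS TPA"),  # DAAA-DZZZ
}


def lookup_nucleotide_prefix(accession):
    """Return (database, type) for the accession's letter prefix, or None."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)
    if len(prefix) == 4:
        return WGS_BY_FIRST_LETTER.get(prefix[0])
    if len(prefix) == 5 and prefix[0] == "A":
        return ("DDBJ", "MGA")    # AAAAA-AZZZZ
    return PREFIX_TABLE.get(prefix)


if __name__ == "__main__":
    for example in ["AB123456", "U12345", "BAAA01000001", "AAAAA1234567"]:
        print(example, lookup_nucleotide_prefix(example))
```

Prefixes that are missing from the excerpt return None, so a None result here does not mean the accession is invalid.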

Protein Accession Prefixes

Prefix | Database | Type
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for patents (note that there are also some patent proteins with AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ | GenBank | WGS Protein ID
HAA-HZZ | GenBank | TPA WGS Protein ID
O | Swiss-Prot | Protein
P | Swiss-Prot (Geneva) | Protein
Q | Swiss-Prot (Hinxton) | Protein
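
Since the three-letter protein prefixes are assigned in ranges by first letter, a lookup only needs that letter plus the AAE special case. This is a rough sketch under that reading of the table; the names `PROTEIN_BY_FIRST_LETTER`, `SWISS_PROT` and `lookup_protein_prefix` are hypothetical, and patent proteins filed under AAA or AAC are reported as ordinary GenBank protein IDs because the prefix alone cannot distinguish them.

```python
import re

# Three-letter protein prefixes, keyed by first letter (ranges from the table above).
PROTEIN_BY_FIRST_LETTER = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    "C": ("EMBL", "Protein ID"),
    "D": ("GenBank", "TPA Protein ID"),
    "E": ("GenBank", "WGS Protein ID"),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "H": ("GenBank", "TPA WGS Protein ID"),
}

# Single-letter prefixes listed for Swiss-Prot.
SWISS_PROT = {
    "O": "Swiss-Prot",
    "P": "Swiss-Prot (Geneva)",
    "Q": "Swiss-Prot (Hinxton)",
}


def lookup_protein_prefix(accession):
    """Return (database, type) for a protein accession prefix, or None."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)
    if len(prefix) == 1:
        database = SWISS_PROT.get(prefix)
        return (database, "Protein") if database else None
    if len(prefix) != 3:
        return None
    if prefix == "AAE":
        # Special case inside the AAA-AZZ range
        return ("GenBank", "Protein ID for patents")
    return PROTEIN_BY_FIRST_LETTER.get(prefix[0])


if __name__ == "__main__":
    for example in ["CAA12345", "AAE12345", "P12345"]:
        print(example, lookup_protein_prefix(example))
```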

For a detailed description of the NCBI RefSeq naming format, see: https://www.plob.org/2012/02/24/3711.html
