


         UniProtKB/Swiss-Prot protein knowledgebase release 2020_06 statistics





1.  INTRODUCTION



Release 2020_06 of UniProtKB/Swiss-Prot (02-Dec-20) contains 563972 sequence entries,

comprising 203185243 amino acids abstracted from 275921 references.



430 sequences have been added since release 2020_05, the sequence data of

38 existing entries has been updated and the annotations of

460431 entries have been revised.



Number of fragments: 9235

Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 40403





Protein existence (PE):           entries     %



1: Evidence at protein level       105663   18.7%

2: Evidence at transcript level     56279   10.0%

3: Inferred from homology          386875   68.6%

4: Predicted                        13311    2.4%

5: Uncertain                         1844    0.3%
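
The percentage column can be recomputed from the entry counts as a consistency check. The following is a minimal Python sketch and not part of the release itself; the names `pe_counts` and `pe_percentages` are illustrative, and rounding to one decimal place is an assumption based on how the table is printed.

```python
# Entry counts copied from the protein existence (PE) table above.
pe_counts = {
    "1: Evidence at protein level": 105663,
    "2: Evidence at transcript level": 56279,
    "3: Inferred from homology": 386875,
    "4: Predicted": 13311,
    "5: Uncertain": 1844,
}


def pe_percentages(counts):
    """Return each category's share of all entries, in percent (1 decimal)."""
    n = sum(counts.values())
    return {name: round(100 * c / n, 1) for name, c in counts.items()}


total = sum(pe_counts.values())  # expected to match the release's entry count
percentages = pe_percentages(pe_counts)

for name, pct in percentages.items():
    print(f"{name:<33}{pe_counts[name]:>8d}{pct:>7.1f}%")
```

The five categories add up to the 563972 entries stated in the introduction, so every entry carries exactly one PE level in this table.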



The growth of the database is summarized below.



   





2.  TAXONOMIC ORIGIN



   Total number of species represented in this release of UniProtKB/Swiss-Prot: 13984



   The twenty most represented species account for 121563 sequences, or 21.6% of

   the total number of entries.
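
This figure can be reproduced from the first twenty rows of the table of the most represented species (section 2.2). The sketch below is an illustrative Python check, not part of the release; the variable names are hypothetical.

```python
# Frequencies of the twenty most represented species, copied from table 2.2
# (Homo sapiens down to Mycobacterium tuberculosis strain CDC 1551 / Oshkosh).
top20 = [
    20394, 17056, 16031, 8114, 6721, 6013, 5140, 4518, 4191, 4169,
    4150, 4101, 3620, 3459, 3194, 2296, 2238, 2218, 2042, 1898,
]

total_entries = 563972  # entries in release 2020_06, from the introduction

top20_total = sum(top20)
top20_share = round(100 * top20_total / total_entries, 1)

print(f"{top20_total} sequences, {top20_share}% of all entries")
```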





   2.1 Table of the frequency of occurrence of species



        Species represented 1x: 5724

                            2x: 2019

                            3x: 1088

                            4x:  715

                            5x:  514

                            6x:  418

                            7x:  318

                            8x:  256

                            9x:  231

                           10x:  143

                       11- 20x:  809

                       21- 50x:  471

                       51-100x:  223

                         >100x: 1055
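
The frequency classes partition the species, so their counts should add up to the total number of species given at the start of this section. A small Python sketch of that check follows; the dictionary keys are shorthand labels chosen for this example only.

```python
# Number of species per frequency class, copied from table 2.1.
species_by_frequency = {
    "1x": 5724, "2x": 2019, "3x": 1088, "4x": 715, "5x": 514,
    "6x": 418, "7x": 318, "8x": 256, "9x": 231, "10x": 143,
    "11-20x": 809, "21-50x": 471, "51-100x": 223, ">100x": 1055,
}

# All classes together should cover every species in the release.
total_species = sum(species_by_frequency.values())

# Species represented by at most ten entries (classes 1x through 10x).
low_classes = [f"{n}x" for n in range(1, 11)]
species_up_to_10 = sum(species_by_frequency[k] for k in low_classes)

print(total_species, species_up_to_10)
```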





   2.2  Table of the most represented species



  ------  ---------  --------------------------------------------

  Number  Frequency  Species

  ------  ---------  --------------------------------------------

       1      20394  Homo sapiens (Human)

       2      17056  Mus musculus (Mouse)

       3      16031  Arabidopsis thaliana (Mouse-ear cress)

       4       8114  Rattus norvegicus (Rat)

       5       6721  Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)

       6       6013  Bos taurus (Bovine)

       7       5140  Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)

       8       4518  Escherichia coli (strain K12)

       9       4191  Bacillus subtilis (strain 168)

      10       4169  Caenorhabditis elegans

      11       4150  Dictyostelium discoideum (Slime mold)

      12       4101  Oryza sativa subsp. japonica (Rice)

      13       3620  Drosophila melanogaster (Fruit fly)

      14       3459  Xenopus laevis (African clawed frog)

      15       3194  Danio rerio (Zebrafish) (Brachydanio rerio)

      16       2296  Gallus gallus (Chicken)

      17       2238  Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)

      18       2218  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)

      19       2042  Escherichia coli O157:H7

      20       1898  Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh)

      21       1802  Salmonella typhimurium (strain LT2 / SGSC1412 / ATCC 700720)

      22       1787  Methanocaldococcus jannaschii  

      23       1707  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)

      24       1705  Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)

      25       1697  Escherichia coli O6:H1 (strain CFT073 / ATCC 700928 / UPEC)

      26       1685  Shigella flexneri

      27       1438  Sus scrofa (Pig)

      28       1394  Pseudomonas aeruginosa 

      29       1347  Salmonella typhi

      30       1244  Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97)

      31       1174  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)

      32       1079  Synechocystis sp. (strain PCC 6803 / Kazusa)

      33       1035  Archaeoglobus fulgidus 

      34       1026  Yersinia pestis

      35       1014  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)

      36        986  Vibrio cholerae serotype O1 (strain ATCC 39315 / El Tor Inaba N16961)

      37        972  Emericella nidulans  

      38        941  Staphylococcus aureus (strain Mu50 / ATCC 700699)

      39        930  Salmonella paratyphi A (strain ATCC 9150 / SARB42)

      40        929  Staphylococcus aureus (strain N315)

      41        928  Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)  

      42        919  Kluyveromyces lactis   

      43        909  Acanthamoeba polyphaga mimivirus (APMV)

      44        903  Staphylococcus aureus (strain COL)

      45        896  Staphylococcus aureus (strain MW2)

      46        894  Oryctolagus cuniculus (Rabbit)

      47        894  Escherichia coli O6:K15:H31 (strain 536 / UPEC)

      48        890  Staphylococcus aureus (strain MSSA476)

      49        888  Staphylococcus aureus (strain MRSA252)

      50        883  Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)

      51        882  Salmonella choleraesuis (strain SC-B67)

      52        879  Shigella sonnei (strain Ss046)

      53        878  Candida glabrata   

      54        875  Neurospora crassa 

      55        863  Yersinia pseudotuberculosis serotype I (strain IP32953)

      56        848  Oryza sativa subsp. indica (Rice)

      57        841  Escherichia coli O9:H4 (strain HS)

      58        834  Escherichia coli O139:H28 (strain E24377A / ETEC)

      59        833  Zea mays (Maize)

      60        833  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 

      61        831  Canis lupus familiaris (Dog) (Canis familiaris)

      62        829  Shigella boydii serotype 4 (strain Sb227)

      63        825  Escherichia coli (strain UTI89 / UPEC)

      64        822  Shigella dysenteriae serotype 1 (strain Sd197)

      65        819  Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)

      66        804  Streptomyces coelicolor (strain ATCC BAA-471 / A3(2) / M145)

      67        803  Pectobacterium atrosepticum (strain SCRI 1043 / ATCC BAA-672) 

      68        800  Staphylococcus aureus (strain NCTC 8325 / PS 47)

      69        793  Vibrio parahaemolyticus serotype O3:K6 (strain RIMD 2210633)

      70        791  Escherichia coli (strain SMS-3-5 / SECEC)

      71        787  Aquifex aeolicus (strain VF5)

      72        772  Pasteurella multocida (strain Pm70)

      73        771  Escherichia coli (strain K12 / DH10B)

      74        771  Escherichia coli O127:H6 (strain E2348/69 / EPEC)

      75        765  Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)

      76        765  Escherichia coli (strain K12 / MC4100 / BW2952)

      77        762  Escherichia coli (strain 55989 / EAEC)

      78        761  Escherichia coli O8 (strain IAI1)

      79        760  Shigella flexneri serotype 5b (strain 8401)

      80        760  Staphylococcus epidermidis (strain ATCC 12228 / FDA PCI 1200)

      81        760  Staphylococcus epidermidis (strain ATCC 35984 / RP62A)

      82        758  Escherichia coli O45:K1 (strain S88 / ExPEC)

      83        756  Bacillus anthracis

      84        756  Escherichia coli (strain SE11)

      85        753  Escherichia coli O7:K1 (strain IAI39 / ExPEC)

      86        748  Escherichia coli O157:H7 (strain EC4115 / EHEC)

      87        748  Photorhabdus laumondii subsp. laumondii (strain DSM 15139 / CIP 105565 / TT01)

      88        742  Bacillus halodurans 

      89        739  Yersinia enterocolitica serotype O:8 / biotype 1B (strain NCTC 13174 / 8081)

      90        733  Vibrio vulnificus (strain CMCP6)

      91        731  Escherichia coli O81 (strain ED1a)

      92        725  Pseudomonas putida (strain ATCC 47054 / DSM 6125 / NCIMB 11950 / KT2440)

      93        722  Salmonella enteritidis PT4 (strain P125109)

      94        718  Vibrio vulnificus (strain YJ016)

      95        716  Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)

      96        715  Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)

      97        715  Enterobacter sp. (strain 638)

      98        715  Yersinia pestis bv. Antiqua (strain Nepal516)

      99        714  Escherichia coli O1:K1 / APEC

     100        714  Salmonella paratyphi A (strain AKU_12601)

     101        713  Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)

     102        713  Salmonella newport (strain SL254)

     103        713  Salmonella agona (strain SL483)

     104        712  Salmonella schwarzengrund (strain CVM19633)

     105        711  Yersinia pestis bv. Antiqua (strain Antiqua)

     106        710  Salmonella heidelberg (strain SL476)

     107        702  Salmonella dublin (strain CT_02021853)

     108        698  Klebsiella pneumoniae (strain 342)

     109        698  Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)

     110        697  Nostoc sp. (strain PCC 7120 / SAG 25.82 / UTEX 2576)

     111        695  Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)

     112        692  Pan troglodytes (Chimpanzee)

     113        686  Mycoplasma pneumoniae (strain ATCC 29342 / M129)

     114        684  Salmonella gallinarum (strain 287/91 / NCTC 13346)

     115        682  Escherichia coli

     116        680  Pseudomonas syringae pv. tomato (strain ATCC BAA-871 / DC3000)

     117        678  Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)

     118        677  Staphylococcus aureus (strain USA300)

     119        672  Serratia proteamaculans (strain 568)

     120        669  Mycobacterium leprae (strain TN)

     121        668  Bacillus cereus 

     122        667  Yersinia pestis (strain Pestoides F)

     123        664  Bradyrhizobium diazoefficiens 

     124        664  Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)

     125        657  Sinorhizobium fredii (strain NBRC 101917 / NGR234)

     126        653  Debaryomyces hansenii   

     127        650  Agrobacterium fabrum (strain C58 / ATCC 33970) (Agrobacterium tumefaciens 

     128        650  Shewanella oneidensis (strain MR-1)

     129        643  Staphylococcus aureus (strain bovine RF122 / ET3-1)

     130        642  Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)

     131        641  Yersinia pseudotuberculosis serotype O:3 (strain YPIII)

     132        634  Yersinia pseudotuberculosis serotype IB (strain PB1/+)

     133        622  Methanothermobacter thermautotrophicus  

     134        622  Treponema pallidum (strain Nichols)

     135        622  Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)

     136        618  Listeria monocytogenes serovar 1/2a (strain ATCC BAA-679 / EGD-e)

     137        615  Xanthomonas campestris pv. campestris 

     138        614  Staphylococcus haemolyticus (strain JCSC1435)

     139        613  Mesorhizobium japonicum  (Mesorhizobium loti 

     140        608  Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)

     141        603  Pseudomonas aeruginosa (strain UCBPP-PA14)

     142        603  Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)

     143        603  Listeria innocua serovar 6a (strain ATCC BAA-680 / CLIP 11262)

     144        602  Photobacterium profundum (strain SS9)

     145        602  Staphylococcus saprophyticus subsp. saprophyticus 

     146        601  Salmonella paratyphi C (strain RKS4594)

     147        600  Yersinia pestis bv. Antiqua (strain Angola)

     148        595  Bacillus cereus (strain ATCC 10987 / NRS 248)

     149        591  Pectobacterium carotovorum subsp. carotovorum (strain PC1)

     150        584  Rickettsia prowazekii (strain Madrid E)

     151        583  Neisseria meningitidis serogroup B (strain MC58)

     152        579  Caenorhabditis briggsae

     153        579  Brucella suis biovar 1 (strain 1330)

     154        574  Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)

     155        573  Caulobacter vibrioides (strain ATCC 19089 / CB15) (Caulobacter crescentus)

     156        573  Aliivibrio fischeri (strain ATCC 700601 / ES114) (Vibrio fischeri)

     157        572  Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS) 

     158        569  Bacillus thuringiensis subsp. konkukian (strain 97-27)

     159        568  Helicobacter pylori (strain J99 / ATCC 700824) (Campylobacter pylori J99)

     160        567  Pseudomonas syringae pv. syringae (strain B728a)

     161        566  Mycolicibacterium smegmatis (strain ATCC 700084 / mc(2)155) 

     162        564  Bacillus licheniformis 

     163        562  Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)

     164        562  Bacillus cereus (strain ZK / E33L)

     165        559  Thermotoga maritima (strain ATCC 43589 / MSB8 / DSM 3109 / JCM 10099)

     166        559  Clostridium acetobutylicum 

     167        557  Xanthomonas axonopodis pv. citri (strain 306)

     168        556  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)

     169        555  Pseudomonas fluorescens (strain Pf0-1)

     170        554  Neisseria meningitidis serogroup A / serotype 4A (strain DSM 15465 / Z2491)

     171        553  Oceanobacillus iheyensis 

     172        553  Pseudomonas fluorescens (strain ATCC BAA-477 / NRRL B-23932 / Pf-5)

     173        547  Pseudomonas savastanoi pv. phaseolicola  (Pseudomonas syringae pv. phaseolicola 

     174        540  Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)

     175        533  Corynebacterium glutamicum 

     176        531  Erwinia tasmaniensis (strain DSM 17950 / CIP 109463 / Et1/99)

     177        529  Listeria monocytogenes serotype 4b (strain F2365)

     178        529  Sodalis glossinidius (strain morsitans)

     179        528  Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50) 

     180        524  Staphylococcus aureus (strain Newman)

     181        522  Xylella fastidiosa (strain 9a5c)

     182        521  Vibrio cholerae serotype O1 (strain ATCC 39541 / Classical Ogawa 395 / O395)

     183        519  Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)

     184        517  Chromobacterium violaceum 

     185        516  Bordetella pertussis (strain Tohama I / ATCC BAA-589 / NCTC 13251)

     186        516  Deinococcus radiodurans 

     187        515  Xylella fastidiosa (strain Temecula1 / ATCC 700964)

     188        512  Pseudomonas aeruginosa (strain PA7)

     189        512  Streptococcus pneumoniae serotype 4 (strain ATCC BAA-334 / TIGR4)

     190        511  Streptomyces avermitilis 

     191        510  Haemophilus ducreyi (strain 35000HP / ATCC 700724)

     192        510  Geobacillus kaustophilus (strain HTA426)

     193        508  Bordetella parapertussis (strain 12822 / ATCC BAA-587 / NCTC 13253)

     194        507  Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)

     195        502  Pseudomonas entomophila (strain L48)

     196        501  Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)

     197        499  Brucella abortus biovar 1 (strain 9-941)

     198        499  Haemophilus influenzae (strain 86-028NP)

     199        498  Acinetobacter baylyi (strain ATCC 33305 / BD413 / ADP1)

     200        496  Burkholderia pseudomallei (strain K96243)

     201        496  Rickettsia conorii (strain ATCC VR-613 / Malish 7)

     202        496  Bacillus clausii (strain KSM-K16)

     203        494  Proteus mirabilis (strain HI4320)

     204        494  Xanthomonas campestris pv. campestris (strain 8004)

     205        493  Pyrococcus horikoshii 

     206        492  Thermosynechococcus elongatus (strain BP-1)

     207        492  Bacillus velezensis (strain DSM 23117 / BGSC 10A6 / FZB42) 

     208        491  Halobacterium salinarum (strain ATCC 700922 / JCM 11081 / NRC-1) 

     209        491  Vibrio campbellii (strain ATCC BAA-1116 / BB120)

     210        489  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)

     211        488  Methanosarcina mazei  

     212        487  Synechococcus elongatus (strain PCC 7942 / FACHB-805) (Anacystis nidulans R2)

     213        487  Shewanella sp. (strain MR-7)

     214        486  Mannheimia succiniciproducens (strain MBEL55E)

     215        486  Brucella abortus (strain 2308)

     216        484  Staphylococcus aureus (strain Mu3 / ATCC 700698)

     217        484  Pseudomonas aeruginosa (strain LESB58)

     218        484  Shewanella sp. (strain MR-4)

     219        483  Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)

     220        482  Saccharolobus solfataricus (strain ATCC 35092 / DSM 1617 / JCM 11322 / P2) 

     221        482  Streptococcus pneumoniae (strain ATCC BAA-255 / R6)

     222        480  Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)

     223        478  Pseudomonas putida (strain ATCC 700007 / DSM 6899 / BCRC 17059 / F1)

     224        478  Nicotiana tabacum (Common tobacco)

     225        477  Pyrococcus abyssi (strain GE5 / Orsay)

     226        475  Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) 

     227        474  Burkholderia lata 

     228        472  Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)

     229        469  Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)

     230        468  Clostridium perfringens (strain 13 / Type A)

     231        468  Pseudomonas putida (strain GB-1)

     232        467  Aeromonas hydrophila subsp. hydrophila 

     233        467  Campylobacter jejuni subsp. jejuni serotype O:2 

     234        467  Enterococcus faecalis (strain ATCC 700802 / V583)

     235        467  Shewanella frigidimarina (strain NCIMB 400)

     236        466  Xanthomonas campestris pv. vesicatoria (strain 85-10)

     237        466  Shewanella sp. (strain ANA-3)

     238        465  Trichormus variabilis (strain ATCC 29413 / PCC 7937) (Anabaena variabilis)

     239        463  Burkholderia mallei (strain ATCC 23344)

     240        459  Cupriavidus pinatubonensis (strain JMP 134 / LMG 1197) (Cupriavidus necator 

     241        459  Ovis aries (Sheep)

     242        458  Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)

     243        457  Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)

     244        455  Staphylococcus aureus (strain JH1)

     245        455  Xanthomonas oryzae pv. oryzae (strain MAFF 311018)

     246        455  Shewanella baltica (strain OS185)

     247        453  Streptococcus mutans serotype c (strain ATCC 700610 / UA159)

     248        453  Pseudomonas putida (strain W619)

     249        452  Aeromonas salmonicida (strain A449)

     250        450  Caldanaerobacter subterraneus subsp. tengcongensis  





   

   2.3  Taxonomic distribution of the sequences



   



   Kingdom        sequences (% of the database)

    Archaea           19637 (  3%)

    Bacteria         334772 ( 59%)

    Eukaryota        192555 ( 34%)

    Viruses           17008 (  3%)





   Within Eukaryota:



   



    Category            sequences (% of Eukaryota) (% of the complete database)

     Human                  20395 ( 11%)           (  4%)

     Other Mammalia         46941 ( 24%)           (  8%)

     Other Vertebrata       18614 ( 10%)           (  3%)

     Viridiplantae          40654 ( 21%)           (  7%)

     Fungi                  35073 ( 18%)           (  6%)

     Insecta                 9395 (  5%)           (  2%)

     Nematoda                5078 (  3%)           (  1%)

     Other                  16405 (  9%)           (  3%)







3.  SEQUENCE SIZE



   Distribution of the sequences by size (excluding fragments)



               From   To  Number             From   To   Number

                  1-  50    9753             1001-1100     4034

                 51- 100   42993             1101-1200     2835

                101- 150   59429             1201-1300     2161

                151- 200   59191             1301-1400     2036

                201- 250   58055             1401-1500     1643

                251- 300   51875             1501-1600      809

                301- 350   52275             1601-1700      623

                351- 400   45317             1701-1800      572

                401- 450   37301             1801-1900      490

                451- 500   30136             1901-2000      385

                501- 550   21891             2001-2100      258

                551- 600   15542             2101-2200      359

                601- 650   12951             2201-2300      331

                651- 700    9293             2301-2400      225

                701- 750    7721             2401-2500      177

                751- 800    5595             >2500         1368

                801- 850    4836

                851- 900    5239

                901- 950    4072

                951-1000    2966
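
The table uses bins of 50 residues up to 1000, bins of 100 residues from 1001 to 2500, and a single open bin above that. As a hedged illustration of this binning scheme (the function name `size_bin` is hypothetical and this is not the code used to produce the table), a sequence length can be mapped to its bin as follows:

```python
def size_bin(length: int) -> str:
    """Return the label of the size class a sequence length falls into.

    Bins are 50 residues wide up to 1000, 100 residues wide from
    1001 to 2500, and everything longer goes into ">2500".
    """
    if length < 1:
        raise ValueError("sequence length must be a positive integer")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    # Lower bound of the bin; bins start at 1, 51, 101, ... or 1001, 1101, ...
    low = ((length - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


for n in (2, 360, 1000, 1001, 35213):
    print(n, size_bin(n))
```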



   





   The average sequence length in UniProtKB/Swiss-Prot is 360 amino acids.
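
The average follows from the totals given in the introduction: the number of amino acids divided by the number of entries. A minimal Python sketch, assuming the published value is rounded to the nearest integer:

```python
total_amino_acids = 203185243  # from the introduction
total_entries = 563972         # from the introduction

average_length = total_amino_acids / total_entries
rounded_average = round(average_length)

print(f"{average_length:.2f} -> {rounded_average} amino acids")
```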



   The shortest sequence is   GWA_SEPOF (P83570):     2 amino acids.

   The longest sequence is  TITIN_MOUSE (A2ASS6): 35213 amino acids.





4.  JOURNAL CITATIONS



   Note: the following citation statistics reflect the number of distinct

         journal citations.



   Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2901





   4.1 Table of the frequency of journal citations



        Journals cited 1x:  933

                       2x:  396

                       3x:  182

                       4x:  134

                       5x:  119

                       6x:   87

                       7x:   73

                       8x:   64

                       9x:   40

                      10x:   32

                  11- 20x:  237

                  21- 50x:  239

                  51-100x:  125

                    >100x:  240





   4.2  List of the most cited journals in UniProtKB/Swiss-Prot



   Nb    Citations   Journal name

   --    ---------   -------------------------------------------------------------

    1        25534   Journal of Biological Chemistry

    2        11898   Proceedings of the National Academy of Sciences of the U.S.A.

    3         6837   Journal of Bacteriology

    4         5772   Biochemical and Biophysical Research Communications

    5         5521   Biochemistry

    6         5106   Nucleic Acids Research

    7         4917   FEBS Letters

    8         4843   Gene

    9         4764   The EMBO Journal

   10         4730   Nature

   11         4429   Molecular and Cellular Biology

   12         4377   Journal of Molecular Biology

   13         3791   Biochimica et Biophysica Acta

   14         3662   Cell

   15         3440   European Journal of Biochemistry

   16         3366   Journal of Virology

   17         3128   Science

   18         2968   Biochemical Journal

   19         2686   Plant Physiology

   20         2659   Molecular Microbiology

   21         2537   Genomics

   22         2282   The American Journal of Human Genetics

   23         2249   Journal of Cell Biology

   24         2192   PLoS ONE

   25         2089   The Plant Cell

   26         1942   The Plant Journal

   27         1890   Plant Molecular Biology

   28         1890   Human Molecular Genetics

   29         1873   Genes and Development

   30         1830   Virology

   31         1796   Nature Genetics

   32         1722   Molecular Biology of the Cell

   33         1707   Development

   34         1618   Molecular Cell

   35         1590   Human Mutation

   36         1547   Journal of Immunology

   37         1546   Oncogene

   38         1404   Molecular and General Genetics

   39         1403   Structure

   40         1372   Journal of Biochemistry

   41         1344   Genetics

   42         1303   Journal of Cell Science

   43         1195   Blood

   44         1176   Infection and Immunity

   45         1153   Journal of General Virology

   46         1108   Microbiology

   47         1090   Current Biology

   48         1084   Archives of Biochemistry and Biophysics

   49         1082   Developmental Biology

   50          951   Journal of Neuroscience

   51          946   Applied and Environmental Microbiology

   52          929   Acta Crystallographica, Section D

   53          900   Cancer Research

   54          858   FEMS Microbiology Letters

   55          841   Yeast

   56          816   Toxicon

   57          803   Protein Science

   58          779   Neuron

   59          779   Journal of Clinical Investigation

   60          756   PLoS Genetics

   61          728   Plant and Cell Physiology

   62          725   American Journal of Physiology

   63          711   Human Genetics

   64          696   The Journal of Experimental Medicine

   65          658   Nature Communications

   66          655   Mechanisms of Development

   67          652   Proteins

   68          647   Journal of Medical Genetics

   69          646   Nature Structural Biology

   70          601   Nature Cell Biology

   71          582   Nature Structural and Molecular Biology

   72          579   Current Genetics

   73          570   Bioscience, Biotechnology, and Biochemistry

   74          569   The FEBS Journal

   75          564   Scientific Reports

   76          551   Journal of Neurochemistry

   77          543   Developmental Cell

   78          542   Molecular Endocrinology

   79          533   The Journal of Clinical Endocrinology and Metabolism

   80          514   Endocrinology

   81          506   Antimicrobial Agents and Chemotherapy

   82          488   Mammalian Genome

   83          470   Experimental Cell Research

   84          466   Molecular and Biochemical Parasitology

   85          456   PLoS Pathogens

   86          442   Eukaryotic Cell

   87          441   Peptides

   88          437   Planta

   89          432   RNA

   90          432   Journal of the American Chemical Society

   91          430   Immunogenetics

   92          418   Journal of Experimental Botany

   93          406   Journal of Molecular Evolution

   94          401   Molecular Biology and Evolution

   95          397   Molecular Pharmacology

   96          390   DNA and Cell Biology

   97          388   The FASEB Journal

   98          388   EMBO Reports

   99          385   American Journal of Medical Genetics. Part A

  100          382   Acta Crystallographica, Section F

  101          376   Molecular Plant-Microbe Interactions

  102          376   Journal of Investigative Dermatology

  103          376   DNA Sequence

  104          375   Neurology

  105          371   European Journal of Human Genetics

  106          364   Immunity

  107          358   Comparative Biochemistry and Physiology

  108          355   Biology of Reproduction

  109          344   Biochimie

  110          340   Brain Research. Molecular Brain Research

  111          336   Genes to Cells

  112          335   Virus Research

  113          327   Clinical Genetics

  114          323   The New England Journal of Medicine

  115          320   Developmental Dynamics

  116          315   Journal of Lipid Research

  117          305   Annals of Neurology

  118          301   Genome Research

  119          299   Biological Chemistry Hoppe-Seyler

  120          297   BMC Genomics

  121          293   Nature Immunology

  122          293   European Journal of Immunology

  123          285   Applied Microbiology and Biotechnology

  124          284   Investigative Ophthalmology and Visual Science

  125          282   Cytogenetics and Cell Genetics

  126          281   Cell Reports

  127          279   Journal of Medicinal Chemistry

  128          273   Journal of General Microbiology

  129          272   PLoS Biology

  130          269   Journal of Human Genetics

  131          262   Glycobiology

  132          252   Archives of Microbiology

  133          244   Traffic

  134          244   Molecular Immunology

  135          239   Journal of Cellular Biochemistry

  136          238   Molecular Genetics and Metabolism

  137          232   DNA Research

  138          227   Phytochemistry

  139          226   Cell Cycle

  140          226   Protein Expression and Purification

  141          224   Diabetes

  142          223   

  143          222   Nature Medicine

  144          218   Archives of Virology

  145          218   Circulation Research

  146          218   Hoppe-Seyler's Zeitschrift fur Physiologische Chemie

  147          216   Fungal Genetics and Biology

  148          209   Nature Chemical Biology

  149          209   Molecular and Cellular Endocrinology

  150          201   Chemistry and Biology





5.  STATISTICS FOR SOME LINE TYPES



The following tables summarize, for selected UniProtKB/Swiss-Prot line types,

the total number of lines, the number of entries with at least one such line,

and the average number of such lines per entry.



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----



References (RL)                      1252072                 2.22                                         

   Journal                           1075705     461724      1.91       1                                 

   Submitted to EMBL/GenBank/DDBJ     165231     149581      0.29       2                                 

   Submitted to other databases         7562       6965      0.01       3                                 

   Book citation                        1848       1825     <0.01       4                                 

   Plant Gene Register                   612        599     <0.01       5                                 

   Unpublished observations              459        455     <0.01       6                                 

   Thesis                                437        434     <0.01       7                                 

   Patent                                212        206     <0.01       8                                 

   Worm Breeder's Gazette                  6          6     <0.01       9                                 
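
The "Average per entry" column appears to be the total number of lines divided by the total number of entries in the release (563972), not by the number of entries carrying that line type. The Python sketch below reproduces the column under that reading; the function name is hypothetical, and displaying values under 0.01 as "<0.01" is an assumption inferred from the table.

```python
TOTAL_ENTRIES = 563972  # entries in release 2020_06

# Total number of lines per reference subtype, copied from the table above.
reference_lines = {
    "Journal": 1075705,
    "Submitted to EMBL/GenBank/DDBJ": 165231,
    "Submitted to other databases": 7562,
    "Book citation": 1848,
    "Plant Gene Register": 612,
    "Unpublished observations": 459,
    "Thesis": 437,
    "Patent": 212,
    "Worm Breeder's Gazette": 6,
}


def average_per_entry(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Format the mean number of lines per entry as shown in the table."""
    avg = total_lines / total_entries
    # Assumption: averages below 0.01 are printed as "<0.01".
    return "<0.01" if avg < 0.01 else f"{avg:.2f}"


all_references = sum(reference_lines.values())
print(f"{'References (RL)':<34}{all_references:>8d}{average_per_entry(all_references):>10}")
for subtype, count in reference_lines.items():
    print(f"   {subtype:<31}{count:>8d}{average_per_entry(count):>10}")
```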



Total number of distinct authors cited in UniProtKB/Swiss-Prot: 429873



                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Comments (CC)                        2621600                 4.65                                         

   ACTIVITY REGULATION                 16029      15985      0.03      17                                 

   ALLERGEN                              919        919     <0.01      26                                 

   ALTERNATIVE PRODUCTS                25486      25486      0.05      13                                 

   BIOPHYSICOCHEMICAL PROPERTIES        9380       9372      0.02      20                                 

   BIOTECHNOLOGY                        1369       1346     <0.01      24                                 

   CATALYTIC ACTIVITY                 309661     243716      0.55       4                                 

   CAUTION                             13499      13221      0.02      18                                 

   COFACTOR                           127835     116182      0.23       7                                 

   DEVELOPMENTAL STAGE                 12916      12892      0.02      19                                 

   DISEASE                              7486       5032      0.01      21                                 

   DISRUPTION PHENOTYPE                16716      16708      0.03      16                                 

   DOMAIN                              52340      44752      0.09       9                                 

   FUNCTION                           474788     452879      0.84       2                                 

   INDUCTION                           22652      22605      0.04      15                                 

   INTERACTION                         23030      23030      0.04      14                                 

   MASS SPECTROMETRY                    7049       5419      0.01      23                                 

   MISCELLANEOUS                       44155      38794      0.08      12                                 

   PATHWAY                            141029     127647      0.25       6                                 

   PHARMACEUTICAL                        156        148     <0.01      29                                 

   POLYMORPHISM                         1296       1241     <0.01      25                                 

   PTM                                 59658      43357      0.11       8                                 

   RNA EDITING                           628        628     <0.01      28                                 

   SEQUENCE CAUTION                    44590      44519      0.08      11                                 

   SIMILARITY                         512639     508436      0.91       1                                 

   SUBCELLULAR LOCATION               353955     346115      0.63       3                                 

   SUBUNIT                            286419     282475      0.51       5                                 

   TISSUE SPECIFICITY                  47973      47818      0.09      10                                 

   TOXIC DOSE                            779        635     <0.01      27                                 

   WEB RESOURCE                         7168       5991      0.01      22                                 



Total number of comment topics: 29
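
The "Average per entry" column appears to be the total number of lines of a given type divided by the total number of entries in the release (563972 for release 2020_06, as stated in the introduction), rounded to two decimals, with values that round to zero shown as "<0.01". The Python sketch below reproduces a few cells under that assumption. The function name is illustrative and is not part of any UniProt tool.

```python
# Total number of entries in release 2020_06 (taken from the introduction).
TOTAL_ENTRIES = 563972

def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format an 'Average per entry' cell, assuming total / entries, 2 decimals."""
    text = f"{total_lines / total_entries:.2f}"
    # Assumption: the tables print "<0.01" whenever the rounded value is 0.00.
    return "<0.01" if text == "0.00" else text

print(average_per_entry(2621600))  # Comments (CC) section total
print(average_per_entry(474788))   # FUNCTION
print(average_per_entry(919))      # ALLERGEN
```

The same calculation applies to the Features (FT) and Cross-references (DR) tables, since all three use the release-wide entry count as the denominator rather than the "Number of entries" column.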





                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank

------------------------------------  -------- ---------  ---------  ----

Features (FT)                        4672113                 8.28                                         

   ACT_SITE                           167217     101305      0.30      10                                 

   BINDING                            415853     110438      0.74       2                                 

   CA_BIND                              4213       1748      0.01      36                                 

   CARBOHYD                           119513      30639      0.21      15                                 

   CHAIN                              572202     556666      1.01       1                                 

   COILED                              22161      15336      0.04      27                                 

   COMPBIAS                            59229      31869      0.11      21                                 

   CONFLICT                           137066      47851      0.24      13                                 

   CROSSLNK                            24088       8679      0.04      26                                 

   DISULFID                           129095      34529      0.23      14                                 

   DNA_BIND                            11963      10701      0.02      33                                 

   DOMAIN                             206270     127088      0.37       9                                 

   HELIX                              284077      25649      0.50       6                                 

   INIT_MET                            17415      17367      0.03      28                                 

   INTRAMEM                             2854       1308      0.01      37                                 

   LIPID                               13398       8635      0.02      30                                 

   METAL                              405245      97801      0.72       3                                 

   MOD_RES                            255843      72851      0.45       7                                 

   MOTIF                               44954      29384      0.08      23                                 

   MUTAGEN                             79475      17062      0.14      18                                 

   NON_CONS                             2514        818     <0.01      38                                 

   NON_STD                               358        283     <0.01      39                                 

   NON_TER                             12567       9638      0.02      31                                 

   NP_BIND                            160248      86613      0.28      11                                 

   PEPTIDE                             11992       8234      0.02      32                                 

   PROPEP                              14666      12512      0.03      29                                 

   REGION                             207707      96704      0.37       8                                 

   REPEAT                             106965      14914      0.19      16                                 

   SIGNAL                              43008      43007      0.08      24                                 

   SITE                                60808      32841      0.11      20                                 

   STRAND                             293820      24193      0.52       5                                 

   TOPO_DOM                           145195      29543      0.26      12                                 

   TRANSIT                              9257       9141      0.02      34                                 

   TRANSMEM                           375583      78579      0.67       4                                 

   TURN                                68593      20893      0.12      19                                 

   UNSURE                               5577        841      0.01      35                                 

   VAR_SEQ                             52388      22254      0.09      22                                 

   VARIANT                             98566      17191      0.17      17                                 

   ZN_FING                             30170      12894      0.05      25                                 



Total number of feature keys: 39
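
The "Rank" column in these tables follows the "Total number" column in descending order. As a small check, the Python sketch below sorts a handful of rows copied from the feature table by total and compares the result with the published ranks. Only a subset of rows is used, so the comparison is on relative order, not on the absolute rank values.

```python
# (feature key, total number, published rank), copied from the table above.
rows = [
    ("ACT_SITE", 167217, 10),
    ("BINDING", 415853, 2),
    ("CHAIN", 572202, 1),
    ("HELIX", 284077, 6),
    ("METAL", 405245, 3),
    ("STRAND", 293820, 5),
    ("TRANSMEM", 375583, 4),
]

# Order the keys once by descending total and once by ascending published rank.
by_total = [key for key, total, rank in sorted(rows, key=lambda r: r[1], reverse=True)]
by_rank = [key for key, total, rank in sorted(rows, key=lambda r: r[2])]

print(by_total == by_rank)
```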







                                      Total    Number of  Average

   Line type / subtype                number   entries    per entry  Rank      Category

------------------------------------  -------- ---------  ---------  ----      -------------------------------------------

Cross-references (DR)               18177305                32.23                                                           

   ABCD                                 2716       2716     <0.01     118      Protocols and materials databases            

   Allergome                            2007       1294     <0.01     125      Protein family/group databases               

   Antibodypedia                       32090      31979      0.06      56      Protocols and materials databases            

   ArachnoServer                        1163       1154     <0.01     135      Organism-specific databases                  

   Araport                             16051      15955      0.03      85      Organism-specific databases                  

   Bgee                                57241      57239      0.10      42      Gene expression databases                    

   BindingDB                            5388       5388      0.01     103      Chemistry databases                          

   BioCyc                             202282     198235      0.36      24      Enzyme and pathway databases                 

   BioGRID                             57664      55885      0.10      41      Protein-protein interaction databases        

   BioGRID-ORCS                        38921      38427      0.07      54      Miscellaneous databases                      

   BioMuta                             20318      20301      0.04      71      Polymorphism and mutation databases          

   BMRB                                 6894       6894      0.01      99      3D structure databases                       

   BRENDA                              13008      12225      0.02      87      Enzyme and pathway databases                 

   CarbonylDB                           1157       1157     <0.01     137      PTM databases                                

   CAZy                                 9545       8603      0.02      92      Protein family/group databases               

   CCDS                                48667      34288      0.09      48      Sequence databases                           

   CDD                                186612     169433      0.33      26      Family and domain databases                  

   CGD                                  2001       1984     <0.01     126      Organism-specific databases                  

   ChEMBL                               7824       7660      0.01      96      Chemistry databases                          

   ChiTaRS                             29642      29605      0.05      60      Miscellaneous databases                      

   CLAE                                  356        353     <0.01     153      Protein family/group databases               

   CollecTF                              135        135     <0.01     161      Gene expression databases                    

   ComplexPortal                       10641       5917      0.02      90      Protein-protein interaction databases        

   COMPLUYEAST-2DPAGE                     97         97     <0.01     163      2D gel databases                             

   ConoServer                            959        872     <0.01     139      Organism-specific databases                  

   CORUM                                5806       5806      0.01     102      Protein-protein interaction databases        

   CPTAC                                2525       1632     <0.01     121      Proteomic databases                          

   CPTC                                  213        213     <0.01     159      Protocols and materials databases            

   CTD                                 75451      74534      0.13      39      Organism-specific databases                  

   DEPOD                                 254        254     <0.01     158      PTM databases                                

   dictyBase                            4215       4101      0.01     112      Organism-specific databases                  

   DIP                                 17449      17410      0.03      80      Protein-protein interaction databases        

   DisGeNET                            17035      16810      0.03      81      Organism-specific databases                  

   DisProt                              1425       1413     <0.01     131      Family and domain databases                  

   DMDM                                16195      16193      0.03      84      Polymorphism and mutation databases          

   DNASU                               19067      19000      0.03      74      Protocols and materials databases            

   DOSAC-COBS-2DPAGE                     145        145     <0.01     160      2D gel databases                             

   DrugBank                            29146       4690      0.05      62      Chemistry databases                          

   DrugCentral                          2532       2532     <0.01     120      Chemistry databases                          

   EchoBASE                             4158       4158      0.01     113      Organism-specific databases                  

   eggNOG                             336688     330836      0.60      15      Phylogenomic databases                       

   ELM                                  1811       1811     <0.01     127      Protein-protein interaction databases        

   EMBL                               990097     551754      1.76       3      Sequence databases                           

   Ensembl                             98779      51726      0.18      35      Genome annotation databases                  

   EnsemblBacteria                    356446     337192      0.63      14      Genome annotation databases                  

   EnsemblFungi                        30018      28427      0.05      59      Genome annotation databases                  

   EnsemblMetazoa                      17998      10431      0.03      78      Genome annotation databases                  

   EnsemblPlants                       30184      21423      0.05      57      Genome annotation databases                  

   EnsemblProtists                      5038       4859      0.01     105      Genome annotation databases                  

   EPD                                 21161      21161      0.04      67      Proteomic databases                          

   ESTHER                               2583       2582     <0.01     119      Protein family/group databases               

   euHCVdb                                55         44     <0.01     165      Organism-specific databases                  

   EuPathDB                            39247      39052      0.07      52      Organism-specific databases                  

   EvolutionaryTrace                   16660      16660      0.03      83      Miscellaneous databases                      

   ExpressionAtlas                     48335      48335      0.09      49      Gene expression databases                    

   FlyBase                              4918       4793      0.01     106      Organism-specific databases                  

   Gene3D                             415097     322224      0.74      12      Family and domain databases                  

   GeneCards                           20357      20192      0.04      68      Organism-specific databases                  

   GeneDB                                607        551     <0.01     146      Genome annotation databases                  

   GeneID                             312440     296982      0.55      18      Genome annotation databases                  

   GeneReviews                          1479       1475     <0.01     128      Organism-specific databases                  

   GeneTree                            59872      59833      0.11      40      Phylogenomic databases                       

   Genevisible                         55249      55249      0.10      45      Gene expression databases                    

   GeneWiki                            10350      10267      0.02      91      Miscellaneous databases                      

   GenomeRNAi                          22171      22170      0.04      65      Miscellaneous databases                      

   GlyConnect                           2320       2178     <0.01     122      PTM databases                                

   GlyGen                              11178      11178      0.02      89      PTM databases                                

   GO                                3074740     539430      5.45       1      Ontologies                                   

   Gramene                             30184      21423      0.05      58      Genome annotation databases                  

   GuidetoPHARMACOLOGY                  2020       2020     <0.01     124      Chemistry databases                          

   HAMAP                              330392     327472      0.59      16      Family and domain databases                  

   HGNC                                20336      20203      0.04      69      Organism-specific databases                  

   HOGENOM                            423879     423879      0.75      11      Phylogenomic databases                       

   HPA                                 18983      18847      0.03      75      Organism-specific databases                  

   IDEAL                                 985        985     <0.01     138      Family and domain databases                  

   IMGT_GENE-DB                          267        267     <0.01     157      Protein family/group databases               

   InParanoid                         140237     140237      0.25      27      Phylogenomic databases                       

   IntAct                              55653      55653      0.10      44      Protein-protein interaction databases        

   InterPro                          2320835     545076      4.12       2      Family and domain databases                  

   iPTMnet                             52677      52677      0.09      46      PTM databases                                

   jPOST                               26394      26394      0.05      63      Proteomic databases                          

   KEGG                               505520     476693      0.90       7      Genome annotation databases                  

   LegioList                             765        763     <0.01     142      Organism-specific databases                  

   Leproma                               672        669     <0.01     143      Organism-specific databases                  

   MaizeGDB                              520        516     <0.01     148      Organism-specific databases                  

   MalaCards                            4823       4819      0.01     107      Organism-specific databases                  

   MassIVE                             17470      17470      0.03      79      Proteomic databases                          

   MaxQB                               29607      29607      0.05      61      Proteomic databases                          

   MEROPS                              11512      11510      0.02      88      Protein family/group databases               

   MetOSite                             3106       3106      0.01     116      PTM databases                                

   MGI                                 16967      16927      0.03      82      Organism-specific databases                  

   MIM                                 21779      15370      0.04      66      Organism-specific databases                  

   MINT                                22796      22796      0.04      64      Protein-protein interaction databases        

   MoonDB                                348        348     <0.01     154      Protein family/group databases               

   MoonProt                              281        281     <0.01     156      Protein family/group databases               

   neXtProt                            20322      20321      0.04      70      Organism-specific databases                  

   NIAGADS                                68         68     <0.01     164      Organism-specific databases                  

   OGP                                   373        373     <0.01     152      2D gel databases                             

   OMA                                414318     414318      0.73      13      Phylogenomic databases                       

   OpenTargets                         18396      18243      0.03      76      Organism-specific databases                  

   Orphanet                             7737       4118      0.01      97      Organism-specific databases                  

   OrthoDB                            245665     245665      0.44      21      Phylogenomic databases                       

   PANTHER                            287078     274254      0.51      20      Family and domain databases                  

   PathwayCommons                      19493      19493      0.03      73      Enzyme and pathway databases                 

   PATRIC                              92433      92433      0.16      38      Genome annotation databases                  

   PaxDb                              125521     125521      0.22      32      Proteomic databases                          

   PCDDB                                 129        129     <0.01     162      3D structure databases                       

   PDB                                204603      29738      0.36      22      3D structure databases                       

   PDBsum                             204603      29738      0.36      23      3D structure databases                       

   PeptideAtlas                        33410      33410      0.06      55      Proteomic databases                          

   PeroxiBase                            783        761     <0.01     141      Protein family/group databases               

   Pfam                               784868     523437      1.39       4      Family and domain databases                  

   PharmGKB                            18314      18295      0.03      77      Organism-specific databases                  

   Pharos                              20098      20098      0.04      72      Miscellaneous databases                      

   PHI-base                             1456       1209     <0.01     129      Miscellaneous databases                      

   PhosphoSitePlus                     39078      39078      0.07      53      PTM databases                                

   PhylomeDB                           96992      96992      0.17      36      Phylogenomic databases                       

   PIR                                124423     114155      0.22      33      Sequence databases                           

   PIRSF                              107756     106661      0.19      34      Family and domain databases                  

   PlantReactome                        1163        715     <0.01     136      Enzyme and pathway databases                 

   PomBase                              5132       5128      0.01     104      Organism-specific databases                  

   PRIDE                              135651     135651      0.24      29      Proteomic databases                          

   PRINTS                             131122     116243      0.23      30      Family and domain databases                  

   PRO                                 96888      96887      0.17      37      Miscellaneous databases                      

   ProMEX                                467        467     <0.01     150      Proteomic databases                          

   PROSITE                            480676     305552      0.85       9      Family and domain databases                  

   Proteomes                          502310     467213      0.89       8      Miscellaneous databases                      

   ProteomicsDB                        56866      35739      0.10      43      Proteomic databases                          

   PseudoCAP                            1401       1392     <0.01     132      Organism-specific databases                  

   Reactome                           130027      36735      0.23      31      Enzyme and pathway databases                 

   REBASE                                622        384     <0.01     144      Protein family/group databases               

   RefSeq                             614664     469623      1.09       5      Sequence databases                           

   REPRODUCTION-2DPAGE                  1259       1038     <0.01     133      2D gel databases                             

   RGD                                  8040       8037      0.01      95      Organism-specific databases                  

   RNAct                               43016      43016      0.08      51      Miscellaneous databases                      

   SABIO-RK                             4579       4579      0.01     109      Enzyme and pathway databases                 

   SASBDB                                413        413     <0.01     151      3D structure databases                       

   SFLD                                 8203       6096      0.01      94      Family and domain databases                  

   SGD                                  6740       6735      0.01     100      Organism-specific databases                  

   SignaLink                            3105       3105      0.01     117      Enzyme and pathway databases                 

   SIGNOR                               4805       4805      0.01     108      Enzyme and pathway databases                 

   SMART                              194012     142993      0.34      25      Family and domain databases                  

   SMR                                452898     452898      0.80      10      3D structure databases                       

   STRING                             329481     329481      0.58      17      Protein-protein interaction databases        

   SUPFAM                             514302     388882      0.91       6      Family and domain databases                  

   SWISS-2DPAGE                         1177       1177     <0.01     134      2D gel databases                             

   SwissLipids                          1445       1361     <0.01     130      Chemistry databases                          

   SwissPalm                            8632       8632      0.02      93      PTM databases                                

   TAIR                                14828      14772      0.03      86      Organism-specific databases                  

   TCDB                                 7677       7619      0.01      98      Protein family/group databases               

   TIGRFAMs                           292893     272852      0.52      19      Family and domain databases                  

   TopDownProteomics                    3237       2960      0.01     114      Proteomic databases                          

   TreeFam                             45736      45729      0.08      50      Phylogenomic databases                       

   TubercuList                          2257       2221     <0.01     123      Organism-specific databases                  

   UCD-2DPAGE                            496        496     <0.01     149      2D gel databases                             

   UCSC                                50298      45904      0.09      47      Genome annotation databases                  

   UniCarbKB                             584        584     <0.01     147      PTM databases                                

   UniLectin                             282        282     <0.01     155      Protein family/group databases               

   UniPathway                         138141     124941      0.24      28      Enzyme and pathway databases                 

   VectorBase                            618        531     <0.01     145      Genome annotation databases                  

   VGNC                                 4328       4315      0.01     111      Organism-specific databases                  

   WBParaSite                              5          5     <0.01     166      Genome annotation databases                  

   World-2DPAGE                          932        921     <0.01     140      2D gel databases                             

   WormBase                             6402       4770      0.01     101      Organism-specific databases                  

   Xenbase                              4530       4529      0.01     110      Organism-specific databases                  

   ZFIN                                 3167       3162      0.01     115      Organism-specific databases                  



Total number of cross-referenced databases: 166
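
For readers who want to process these tables programmatically, the Python sketch below parses individual rows with a regular expression and sums the total number of cross-references per category for a few sample rows. The pattern is an assumption based on the layout shown here (indented name, two integer columns, an average that may read "<0.01", a rank, and an optional category); it is not an official format definition.

```python
import re
from collections import Counter

# Names and categories may contain spaces, so both are matched lazily
# around the numeric columns.
ROW = re.compile(
    r"^\s+(?P<name>\S.*?)\s+(?P<total>\d+)\s+(?P<entries>\d+)"
    r"\s+(?P<average><?\d+\.\d+)\s+(?P<rank>\d+)"
    r"(?:\s+(?P<category>\S.*?))?\s*$"
)

def parse_row(line):
    """Return a dict for an indented subtype row, or None for any other line."""
    match = ROW.match(line)
    if match is None:
        return None
    row = match.groupdict()
    for field in ("total", "entries", "rank"):
        row[field] = int(row[field])
    return row

# Sample lines copied from the cross-reference table above.
sample = [
    "Cross-references (DR)               18177305                32.23",
    "   IntAct                              55653      55653      0.10      44      Protein-protein interaction databases        ",
    "   PDB                                204603      29738      0.36      22      3D structure databases                       ",
    "   PDBsum                             204603      29738      0.36      23      3D structure databases                       ",
    "   SMR                                452898     452898      0.80      10      3D structure databases                       ",
    "   STRING                             329481     329481      0.58      17      Protein-protein interaction databases        ",
]

# Sum the "Total number" column per category, skipping non-row lines.
totals = Counter()
for line in sample:
    row = parse_row(line)
    if row is not None and row["category"]:
        totals[row["category"]] += row["total"]

for category, total in totals.most_common():
    print(category, total)
```

The section header line is skipped because it is not indented and lacks the "Number of entries" column. Rows from the comment and feature tables also parse, with the category left as None.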



6.  AMINO ACID COMPOSITION



   6.1  Composition in percent for the complete database



   Ala (A) 8.25   Gln (Q) 3.93   Leu (L) 9.65   Ser (S) 6.63

   Arg (R) 5.53   Glu (E) 6.72   Lys (K) 5.80   Thr (T) 5.35

   Asn (N) 4.06   Gly (G) 7.07   Met (M) 2.41   Trp (W) 1.10

   Asp (D) 5.46   His (H) 2.27   Phe (F) 3.86   Tyr (Y) 2.92

   Cys (C) 1.38   Ile (I) 5.91   Pro (P) 4.73   Val (V) 6.86



   Asx (B) 0.000  Glx (Z) 0.000  Xaa (X) 0.00



   



   Legend: gray = aliphatic, red = acidic, green = small hydroxy,

           blue = basic, black = aromatic, white = amide, yellow = sulfur
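
A composition table of this kind can be computed by pooling residue counts over all sequences and dividing by the total number of residues. The Python sketch below shows that calculation on two made-up toy sequences. Whether the published figures are pooled in exactly this way is an assumption, and the function name is hypothetical.

```python
from collections import Counter

def composition_percent(sequences):
    """Percentage of each one-letter residue code, pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}

# Toy input for illustration only; these are not real database entries.
toy_sequences = ["MKVLA", "ALLG"]
comp = composition_percent(toy_sequences)

for aa in sorted(comp, key=comp.get, reverse=True):
    print(aa, round(comp[aa], 2))
```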





   6.2  Classification of the amino acids by their frequency



   Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,

   Phe, Tyr, Met, His, Cys, Trp
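
The ordering in 6.2 follows directly from the percentages in 6.1. The short Python sketch below sorts the twenty standard amino acids by their published composition and reproduces the list above; the values are copied from table 6.1.

```python
# Composition in percent, copied from table 6.1.
composition = {
    "Ala": 8.25, "Gln": 3.93, "Leu": 9.65, "Ser": 6.63,
    "Arg": 5.53, "Glu": 6.72, "Lys": 5.80, "Thr": 5.35,
    "Asn": 4.06, "Gly": 7.07, "Met": 2.41, "Trp": 1.10,
    "Asp": 5.46, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.38, "Ile": 5.91, "Pro": 4.73, "Val": 6.86,
}

# Sort residue names from most to least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```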





7.  MISCELLANEOUS STATISTICS



4464 entries are encoded on a mitochondrion, and 3942 are encoded on a plasmid.



12189 entries are encoded on a plastid, of which 21 are encoded on apicoplasts,

11624 on chloroplasts, 51 on organellar chromatophores, 145 on cyanelles,

149 on non-photosynthetic plastids and 199 on unspecified types of plastid.
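
As a quick consistency check, the plastid subtype counts can be summed and compared with the stated plastid total. The Python sketch below does this with the figures quoted above.

```python
# Figures copied from the plastid breakdown above.
plastid_total = 12189
plastid_breakdown = {
    "apicoplast": 21,
    "chloroplast": 11624,
    "organellar chromatophore": 51,
    "cyanelle": 145,
    "non-photosynthetic plastid": 149,
    "unspecified": 199,
}

# The subtypes should account for every plastid-encoded entry.
accounted = sum(plastid_breakdown.values())
print(accounted, accounted == plastid_total)
```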



Number of entries with at least one sequence correction: 80072