News

UniProt release 2020_03

Published June 17, 2020

Headline

Mitochondrial call for help

Life is a continuous chain of issues that have to be addressed for survival, and our cells know all about that. Hypoxia, amino acid deprivation, glucose deprivation, viral infection and endoplasmic reticulum stress are but a few examples. That's why eukaryotic cells have developed an elaborate signaling pathway, called the integrated stress response (ISR), which is activated in the cytosol in response to a range of physiological changes and pathological conditions. All stress stimuli that activate the ISR converge on the phosphorylation of the alpha subunit of eukaryotic translation initiation factor 2 (EIF2S1, also known as eIF2-alpha). Depending on the stress stimulus, this reaction is catalyzed by one of four kinases: GCN2/EIF2AK4, PERK/EIF2AK3, PKR/EIF2AK2 or HRI/EIF2AK1. EIF2S1 phosphorylation attenuates 5' cap-dependent protein synthesis, while promoting the translation of selected mRNAs that harbor a short upstream open reading frame in their 5'-untranslated region, including those encoding the transcription factors ATF4, ATF5 and DDIT3 (also known as CHOP). The ISR is primarily a survival program, but exposure to severe stress can also lead to apoptosis.

Mitochondrial stress also strongly induces the expression of ATF4 and DDIT3, and hence triggers the ISR, but the pathway signaling mitochondrial stress to the cytosol remained elusive until the publication of two articles in March of this year. DDIT3 induction requires at least two mitochondrial proteins, OMA1 and DELE1, as well as the cytosolic kinase HRI. OMA1 is a metalloprotease located in the mitochondrial inner membrane. It is activated by mitochondrial dysfunction, possibly via membrane depolarization. OMA1 cleaves DELE1 in the intermembrane space, and the N-terminally truncated DELE1 fragment enters the cytosol, where it interacts with and activates HRI, thereby triggering the stress response. The consequences of ISR activation during mitochondrial dysfunction are not yet fully understood, and DELE1 may not be the only mitochondrial ISR activator, but one missing link between mitochondrial stress and the ISR has been clearly established.

As of this release, the OMA1 protease and DELE1 have been updated and are available in UniProtKB/Swiss-Prot. To annotate DELE1, we took advantage of the new format announced last December, which allows us to describe features specific to a particular gene product, such as an alternative splicing isoform or a peptide resulting from proteolytic cleavage. EIF2AK1 activation is a function restricted to the DELE1 cleavage product, and this is clearly reported in the 'FUNCTION' subsection dedicated to the DELE1 short form.

UniProtKB news

Cross-references to IDEAL

Cross-references have been added to the IDEAL database, a database of intrinsically disordered proteins.

IDEAL is available at http://idp1.force.cs.is.nagoya-u.ac.jp/IDEAL/.

The format of the explicit links is:

Resource abbreviation: IDEAL
Resource identifier: IDEAL identifier

Example: O15162

Text format

Example: O15162

DR   IDEAL; IID00006; -.
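As a rough sketch of how a DR line like the one above can be read, the following Python function splits a flat-file cross-reference into database name, identifier and optional fields. It assumes the simple `DR   DB; ID; field; ... .` layout shown in these examples, maps a lone dash to `None`, and ignores rarer variants (for example isoform suffixes); the names `CrossRef` and `parse_dr_line` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str
    identifier: str
    extra: List[Optional[str]]  # optional fields; "-" becomes None


def parse_dr_line(line: str) -> CrossRef:
    """Parse a simple UniProtKB flat-file DR line (hypothetical helper)."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].strip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields = [field.strip() for field in body.split(";")]
    database, identifier, *rest = fields
    extra = [None if field == "-" else field for field in rest]
    return CrossRef(database, identifier, extra)


print(parse_dr_line("DR   IDEAL; IID00006; -."))
# CrossRef(database='IDEAL', identifier='IID00006', extra=[None])
```

The same helper returns the free-text summary as its first optional field for lines that carry one, such as the BioGRID-ORCS, ABCD and Antibodypedia cross-references described in this release.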

XML format

Example: O15162

<dbReference type="IDEAL" id="IID00006"/>

RDF format

Example: O15162

uniprot:O15162
  rdfs:seeAlso <http://identifiers.org/ideal/IID00006> .
<http://identifiers.org/ideal/IID00006>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/IDEAL> .

Cross-references to BioGRID-ORCS

Cross-references have been added to the BioGRID-ORCS database, a database of CRISPR phenotype screens.

BioGRID-ORCS is available at https://orcs.thebiogrid.org.

The format of the explicit links is:

Resource abbreviation: BioGRID-ORCS
Resource identifier: BioGRID-ORCS identifier
Optional information 1: Number of hits

Example: Q96A29

Text format

Example: Q96A29

DR   BioGRID-ORCS; 55343; 19 Hits in 787 CRISPR Screens.

XML format

Example: Q96A29

<dbReference type="BioGRID-ORCS" id="55343">
  <property type="hits" value="19 Hits in 787 CRISPR Screens"/>
</dbReference>
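The BioGRID-ORCS summary is a free-text value. A minimal Python sketch that extracts the two counts could look like the following. It assumes the wording `N Hits in M CRISPR Screens` seen in this example (singular forms are tolerated as a guess) and an un-namespaced snippet like the one above; complete UniProt XML files typically declare a default namespace, which would then have to be added to the element paths. The name `orcs_hits` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

HITS_RE = re.compile(r"^(\d+) Hits? in (\d+) CRISPR Screens?$")


def orcs_hits(xml_text: str) -> Optional[Tuple[str, int, int]]:
    """Return (BioGRID-ORCS id, hits, screens) from a dbReference element."""
    ref = ET.fromstring(xml_text)
    if ref.get("type") != "BioGRID-ORCS":
        return None
    for prop in ref.findall("property"):
        if prop.get("type") == "hits":
            match = HITS_RE.match(prop.get("value", ""))
            if match:
                return ref.get("id"), int(match.group(1)), int(match.group(2))
    return None


snippet = """<dbReference type="BioGRID-ORCS" id="55343">
  <property type="hits" value="19 Hits in 787 CRISPR Screens"/>
</dbReference>"""
print(orcs_hits(snippet))  # ('55343', 19, 787)
```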

RDF format

Example: Q96A29

uniprot:Q96A29
  rdfs:seeAlso <http://purl.uniprot.org/biogrid-orcs/55343> .
<http://purl.uniprot.org/biogrid-orcs/55343>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/BioGRID-ORCS> ;
  rdfs:comment "19 Hits in 787 CRISPR Screens" .

Change to the cross-references to ABCD

We have introduced an additional field in the cross-references to the ABCD (AntiBodies Chemically Defined) database, a manually curated repository of sequenced antibodies. This allows us to specify the number of sequenced antibodies available for a given protein in UniProtKB.

Text format

Example: O75084

Former format:

DR   ABCD; O75084; -.

New format:

DR   ABCD; O75084; 10 sequenced antibodies.

XML format

Example: O75084

Former format:

<dbReference type="ABCD" id="O75084"/>

New format:

<dbReference type="ABCD" id="O75084">
  <property type="antibodies" value="10 sequenced antibodies"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
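Because elements in the former format have no child `property`, code that reads the new antibody count should treat it as optional. A small Python sketch, assuming un-namespaced snippets like those above and the wording `N sequenced antibodies`; `abcd_antibody_count` is a hypothetical helper name:

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

COUNT_RE = re.compile(r"^(\d+) sequenced antibod(?:y|ies)$")


def abcd_antibody_count(xml_text: str) -> Optional[int]:
    """Return the number of sequenced antibodies, or None if not given."""
    ref = ET.fromstring(xml_text)
    prop = ref.find("property[@type='antibodies']")
    if prop is None:
        return None  # former format: no optional field
    match = COUNT_RE.match(prop.get("value", ""))
    return int(match.group(1)) if match else None


former = '<dbReference type="ABCD" id="O75084"/>'
new = """<dbReference type="ABCD" id="O75084">
  <property type="antibodies" value="10 sequenced antibodies"/>
</dbReference>"""
print(abcd_antibody_count(former), abcd_antibody_count(new))  # None 10
```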

RDF format

Example: O75084

Former format:

uniprot:O75084
  rdfs:seeAlso <http://purl.uniprot.org/abcd/O75084> .
<http://purl.uniprot.org/abcd/O75084>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ABCD> .

New format:

uniprot:O75084
  rdfs:seeAlso <http://purl.uniprot.org/abcd/O75084> .
<http://purl.uniprot.org/abcd/O75084>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ABCD> ;
  rdfs:comment "10 sequenced antibodies" .

Change of the cross-references to MycoCLAP

The MycoCLAP resource has changed its name to CLAE and we have updated our cross-references to reflect this name change.

Cross-references to BioGRID

The BioGrid database was renamed BioGRID. We changed the database name in the relevant cross-references (DR lines in the flat file) accordingly.

Example:

DR   BioGRID; 198188; 12.

Cross-references to Unimod in the ptmlist.txt document file

The ptmlist.txt document, which is available by FTP and on the website, describes the post-translational modifications (PTMs) annotated in the UniProt knowledgebase. With this release, we have added optional cross-references from ptmlist.txt to Unimod, an open access, community-curated database of protein modifications for mass spectrometry applications. Unimod provides molecular-level details of PTMs (both natural and non-natural), including molecular formula, target residues, monoisotopic and average mass shifts, and literature references.

Example:

ID   (3R)-3-hydroxyasparagine
AC   PTM-0369
FT   MOD_RES
..
KW   Hydroxylation.
..
DR   Unimod; 35.

This new mapping to Unimod will facilitate the integration of data on PTMs identified by mass spectrometry-based proteomics.

We have currently mapped 236 of the most common PTMs in UniProtKB to Unimod and will continue to add new cross-references to Unimod in forthcoming releases. This mapping of PTMs to Unimod is part of our ongoing work on the standardization of knowledge of PTMs in UniProtKB by providing cross-references to a resource which is widely used in proteomics bioinformatics. We welcome your feedback on these current and future developments.
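To pick up the new mapping programmatically, one could scan ptmlist.txt for `DR   Unimod;` lines and associate them with the accession on the entry's `AC` line. The Python sketch below assumes that entries are terminated by `//` lines, as in other UniProt flat files, and uses a shortened hypothetical record built from the example above; `unimod_mapping` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional


def unimod_mapping(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map ptmlist.txt accessions (AC) to Unimod identifiers (DR   Unimod)."""
    mapping: Dict[str, List[str]] = {}
    accession: Optional[str] = None
    unimod_ids: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("AC   "):
            accession = line[5:].strip()
        elif line.startswith("DR   Unimod;"):
            # "DR   Unimod; 35." -> "35"
            unimod_ids.append(line[len("DR   Unimod;"):].strip().rstrip("."))
        elif line.startswith("//"):
            # End of entry: keep only entries that have a Unimod mapping.
            if accession and unimod_ids:
                mapping[accession] = unimod_ids
            accession, unimod_ids = None, []
    return mapping


record = """ID   (3R)-3-hydroxyasparagine
AC   PTM-0369
FT   MOD_RES
KW   Hydroxylation.
DR   Unimod; 35.
//
""".splitlines()
print(unimod_mapping(record))  # {'PTM-0369': ['35']}
```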

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Lipidation' ('LIPID' in the flat file):

  • 3'-prenyl-2',N2-cyclotryptophan

UniProt release 2020_02

Published April 22, 2020

Headline

Genome integrity maintenance by HMCES

Apurinic or apyrimidinic sites, also known as abasic or AP sites, are one of the most common DNA lesions. They occur at a frequency of about 15,000 per day in human cells. In double-stranded DNA, the majority of AP sites are removed by base excision repair. After removal of the lesion, the undamaged strand is used as a template for repair synthesis. AP sites also form in single-stranded DNA (ssDNA), but until recently there was no known mechanism involved in their repair in this context. A major breakthrough in the field was reported last year in Cell.

Mohni et al. were interested in HMCES, whose full name is 'stem cell-specific 5-hydroxymethylcytosine-binding protein'. It was originally thought to be a regulator of 5-hydroxymethylcytosine. However, it had also been identified in the replisome, a large protein machine that carries out DNA replication. HMCES is conserved in almost all organisms, even in those that do not use methylcytosine for epigenetic control. Taken together, these observations suggested that HMCES could have another crucial function, possibly in replication. Surprisingly, HMCES knockout did not affect DNA replication or cell division, but instead increased cell sensitivity to several DNA-damaging agents. Knockout cells accumulated DNA damage and exhibited increased genetic instability. Different DNA-damaging agents were tested, and the only kind of lesion they all induced was the formation of AP sites.

HMCES appears to initiate a replication-coupled repair mechanism for abasic sites in ssDNA. In eukaryotic cells, HMCES interacts with proliferating cell nuclear antigen (PCNA), an essential replication factor, and travels with replication forks. When it encounters an AP site in ssDNA, it covalently crosslinks to it, generating a DNA-protein intermediate. Crystallographic studies have identified this crosslink as a stable thiazolidine DNA-protein linkage formed between the N-terminal cysteine and the aldehyde form of the AP deoxyribose. The crosslink is so stable that its resolution requires HMCES degradation by the proteasome. This sequence of events may appear counterintuitive: it is almost as if HMCES takes a bad situation and makes it worse. However, the crosslink effectively shields the lesion from endonucleases and from error-prone translesion synthesis (TLS) polymerases, such as REV1 and REV3L, and thus prevents the mutations they might cause. The DNA repair mechanism acting downstream of HMCES is not known.

As of this release, human HMCES and YedK, its Escherichia coli homolog, have been updated and are available in UniProtKB/Swiss-Prot. The exact structure of the chemical crosslink was submitted to ChEBI, where more details are provided.

UniProtKB news

Change of annotation topic 'Interaction'

The annotation topic 'Interaction' provides information about binary protein-protein interactions. This data is curated in the IntAct database and a quality-filtered subset is imported into UniProtKB at each release.

In the context of improving the functional annotation of different gene products in UniProtKB/Swiss-Prot, we have started to import more detailed data from IntAct. Our previous representation of a binary protein-protein interaction provided details only for the protein that was described in another entry. This left ambiguity in UniProtKB/Swiss-Prot entries that describe more than one protein (isoforms and/or products of proteolytic cleavage). To address this, we now identify both interacting proteins by unique UniProtKB identifiers.

This change affects the three main UniProtKB distribution formats (text, XML, RDF). The details are described for each format in a separate section below. The following placeholders are used in the format descriptions:

  • <Interactant> represents a UniProtKB protein.
    • <Accession> is a UniProtKB accession number.
    • <IsoId> is a UniProtKB isoform ID.
    • <ProductId> is a UniProtKB product ID.
    • <Gene> is either the gene name, ordered locus name or ORF name of the gene that encodes the UniProtKB protein (see Gene names).
  • <Experiments> is the number of experiments in IntAct that support an interaction.
  • <IntActId> is an IntAct protein ID.

Note: The format descriptions make use of POSIX ERE syntax.

Text format

Previous format:

CC   -!- INTERACTION:
CC       <Interactant>( \(xeno\))?; NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       <Interactant>( \(xeno\))?; NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       ...

The <Interactant> was described in the following way:

Self|(<Accession>|<IsoId>):(<Gene>|-)

Where Self represents a self-interaction and a dash is shown for proteins with an undefined <Gene>. xeno is an optional flag that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.

New format:

CC   -!- INTERACTION:
CC       <Interactant>; <Interactant>;( Xeno;)? NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       <Interactant>; <Interactant>;( Xeno;)? NbExp=<Experiments>; IntAct=<IntActId>, <IntActId>;
CC       ...

Where

  • the first <Interactant> is represented by:
    (<Accession>|<IsoId>|<ProductId>)
    
  • the second <Interactant> is represented by:
    (<Accession>|<IsoId>|<ProductId> [<Accession>])(: <Gene>)?
    

Example: P11309

Binary interactions with different isoforms that are described in P11309.

Previous format:

CC   -!- INTERACTION:
CC       Q9BZS1-1:FOXP3; NbExp=3; IntAct=EBI-1018629, EBI-9695448;
CC       Q9UNQ0:ABCG2; NbExp=5; IntAct=EBI-1018633, EBI-1569435;

New format:

CC   -!- INTERACTION:
CC       P11309-1; Q9BZS1-1: FOXP3; NbExp=3; IntAct=EBI-1018629, EBI-9695448;
CC       P11309-2; Q9UNQ0: ABCG2; NbExp=5; IntAct=EBI-1018633, EBI-1569435;

Example: P27958 and Q9NPY3

Binary interaction with a product of proteolytic cleavage. Interactions involving products of proteolytic cleavage were previously not imported from IntAct, therefore only the new data/format is shown.

New data and format of P27958:

CC   -!- INTERACTION:
CC       PRO_0000037566; Q9NPY3: CD93; Xeno; NbExp=2; IntAct=EBI-6377335, EBI-1755002;

New data and format of Q9NPY3:

CC   -!- INTERACTION:
CC       Q9NPY3; PRO_0000037566 [P27958]; Xeno; NbExp=2; IntAct=EBI-1755002, EBI-6377335;
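The new text layout is regular enough to be captured with a single regular expression. The following Python sketch is a simplified reading of the POSIX ERE description above, not an official grammar: it assumes single spaces as in the examples and gene names without semicolons. `INTERACTION_RE` and `parse_interaction_line` are hypothetical names.

```python
import re
from typing import Optional

# One new-style INTERACTION line (simplified; see assumptions above).
INTERACTION_RE = re.compile(
    r"^CC\s+"
    r"(?P<first>[^;\s]+); "
    r"(?P<second>[^;:\s\[]+)(?: \[(?P<parent>[^\]]+)\])?(?:: (?P<gene>[^;]+))?;"
    r"(?P<xeno> Xeno;)? "
    r"NbExp=(?P<experiments>\d+); "
    r"IntAct=(?P<intact_first>[^,]+), (?P<intact_second>[^;]+);$"
)


def parse_interaction_line(line: str) -> Optional[dict]:
    match = INTERACTION_RE.match(line)
    if match is None:
        return None  # e.g. the "CC   -!- INTERACTION:" header line
    fields = match.groupdict()
    fields["xeno"] = fields["xeno"] is not None
    fields["experiments"] = int(fields["experiments"])
    return fields


line = ("CC       Q9NPY3; PRO_0000037566 [P27958]; Xeno; NbExp=2; "
        "IntAct=EBI-1755002, EBI-6377335;")
print(parse_interaction_line(line)["parent"])  # P27958
```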

XML format

The UniProtKB XSD represents a binary interaction with:

  • two interactant elements of interactantType
  • a boolean organismsDiffer element that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.
  • an experiments element that gives the number of experiments in IntAct that support an interaction.

The interactantType uses an interactantGroup to represent a sequence of:

  • an id element
  • an optional label element

We have added an optional dbReference element to the interactantGroup to allow us to represent the UniProtKB <Accession> for a <ProductId>:

<xs:group name="interactantGroup">
  <xs:sequence>
    <xs:element name="id" type="xs:string"/>
    <xs:element name="label" type="xs:string" minOccurs="0"/>
    <xs:element name="dbReference" type="dbReferenceType" minOccurs="0"/>
  </xs:sequence>
</xs:group>

Previous format:

<comment type="interaction">
  <interactant intactId="<IntActId>"/>
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId></id>
    <label><Gene></label>
  </interactant>
  <organismsDiffer>true|false</organismsDiffer>
  <experiments><Experiments></experiments>
</comment>

New format:

<comment type="interaction">
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId>|<ProductId></id>
  </interactant>
  <interactant intactId="<IntActId>">
    <id><Accession>|<IsoId>|<ProductId></id>
    <label><Gene></label>
    <!-- If <id> is a <ProductId>: -->
    <dbReference type="UniProtKB" id="<Accession>"/>
  </interactant>
  <organismsDiffer>true|false</organismsDiffer>
  <experiments><Experiments></experiments>
</comment>

Example: P11309

Binary interactions with different isoforms that are described in P11309.

Previous format:

<comment type="interaction">
  <interactant intactId="EBI-1018629"/>
  <interactant intactId="EBI-9695448">
    <id>Q9BZS1-1</id>
    <label>FOXP3</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>3</experiments>
</comment>
<comment type="interaction">
  <interactant intactId="EBI-1018633"/>
  <interactant intactId="EBI-1569435">
    <id>Q9UNQ0</id>
    <label>ABCG2</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>5</experiments>
</comment>

New format:

<comment type="interaction">
  <interactant intactId="EBI-1018629">
    <id>P11309-1</id>
  </interactant>
  <interactant intactId="EBI-9695448">
    <id>Q9BZS1-1</id>
    <label>FOXP3</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>3</experiments>
</comment>
<comment type="interaction">
  <interactant intactId="EBI-1018633">
    <id>P11309-2</id>
  </interactant>
  <interactant intactId="EBI-1569435">
    <id>Q9UNQ0</id>
    <label>ABCG2</label>
  </interactant>
  <organismsDiffer>false</organismsDiffer>
  <experiments>5</experiments>
</comment>

Example: P27958 and Q9NPY3

Binary interaction with a product of proteolytic cleavage. Interactions involving products of proteolytic cleavage had previously not been imported from IntAct, therefore only the new data/format is shown.

New data and format of P27958:

<comment type="interaction">
  <interactant intactId="EBI-6377335">
    <id>PRO_0000037566</id>
  </interactant>
  <interactant intactId="EBI-1755002">
    <id>Q9NPY3</id>
    <label>CD93</label>
  </interactant>
  <organismsDiffer>true</organismsDiffer>
  <experiments>2</experiments>
</comment>

New data and format of Q9NPY3:

<comment type="interaction">
  <interactant intactId="EBI-1755002">
    <id>Q9NPY3</id>
  </interactant>
  <interactant intactId="EBI-6377335">
    <id>PRO_0000037566</id>
    <dbReference type="UniProtKB" id="P27958"/>
  </interactant>
  <organismsDiffer>true</organismsDiffer>
  <experiments>2</experiments>
</comment>
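A minimal Python sketch for reading the new XML layout, assuming un-namespaced fragments like those above (full UniProt XML files typically declare a default namespace that would need to be added to the paths). It reports the UniProtKB accession only when an interactant carries the optional `dbReference`, that is, when its `id` is a product ID. `parse_interaction` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_interaction(xml_text: str) -> dict:
    """Summarize a new-style <comment type="interaction"> element."""
    comment = ET.fromstring(xml_text)
    interactants = []
    for node in comment.findall("interactant"):
        ref = node.find("dbReference[@type='UniProtKB']")
        interactants.append({
            "intact_id": node.get("intactId"),
            "id": node.findtext("id"),
            "label": node.findtext("label"),
            # Only present when <id> is a ProductId (PRO_...).
            "accession": ref.get("id") if ref is not None else None,
        })
    return {
        "interactants": interactants,
        "organisms_differ": comment.findtext("organismsDiffer") == "true",
        "experiments": int(comment.findtext("experiments", "0")),
    }


q9npy3_xml = """<comment type="interaction">
  <interactant intactId="EBI-1755002">
    <id>Q9NPY3</id>
  </interactant>
  <interactant intactId="EBI-6377335">
    <id>PRO_0000037566</id>
    <dbReference type="UniProtKB" id="P27958"/>
  </interactant>
  <organismsDiffer>true</organismsDiffer>
  <experiments>2</experiments>
</comment>"""
print(parse_interaction(q9npy3_xml)["interactants"][1]["accession"])  # P27958
```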

RDF format

The UniProt RDF schema ontology represents a binary interaction with an interaction property whose rdfs:range is the Interaction class. This class is the domain of the following properties that describe the interaction:

  • xeno is a boolean that indicates that the interacting proteins are derived from different species. This may be due to the experimental set-up or may reflect a pathogen-host interaction.
  • experiments gives the number of experiments in IntAct that support an interaction.

A Participant is identified by its unique IntAct identifier. It also refers to the corresponding UniProtKB protein which is represented as described in the news article about the functional annotation of different gene products in UniProtKB/Swiss-Prot. An optional rdfs:label property may provide the gene name, ordered locus name or ORF name of the gene that encodes the UniProtKB protein.

The RDF schema ontology required no changes to represent the more detailed data that we now import from IntAct. Due to the symmetry of binary interactions, the UniProt SPARQL server already provided access to the full details about both interacting proteins. We have however taken this opportunity to normalize the URI of a binary interaction so that the two UniProtKB entries that describe the interacting proteins refer to the interaction with the same URI:

Previous format:

<<Accession>#interaction-<IntActId>-<IntActId>> .

New format:

<http://purl.uniprot.org/intact/<IntActId>-<IntActId>> .

Example: P11309 and Q8N9N5

Previous format:

P11309:

<P11309#interaction-696621-744695>

Q8N9N5:

<Q8N9N5#interaction-744695-696621>

New format:

P11309 and Q8N9N5:

<http://purl.uniprot.org/intact/EBI-696621-EBI-744695>
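The announcement does not spell out how the order of the two IntAct identifiers in the normalized URI is fixed. Purely as an illustration of why a symmetric key gives both entries the same URI, the Python sketch below sorts the identifiers as strings, which happens to reproduce the example URI; whether UniProt uses this exact rule is an assumption, and `interaction_uri` is a hypothetical name.

```python
BASE = "http://purl.uniprot.org/intact/"


def interaction_uri(intact_id_a: str, intact_id_b: str) -> str:
    """Hypothetical symmetric URI builder: sorting the IDs makes the result
    independent of which entry the interaction is read from."""
    first, second = sorted((intact_id_a, intact_id_b))
    return f"{BASE}{first}-{second}"


# P11309 lists EBI-696621 first; Q8N9N5 lists EBI-744695 first.
assert interaction_uri("EBI-696621", "EBI-744695") == interaction_uri("EBI-744695", "EBI-696621")
print(interaction_uri("EBI-744695", "EBI-696621"))
# http://purl.uniprot.org/intact/EBI-696621-EBI-744695
```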

Cross-references to Antibodypedia

Cross-references have been added to Antibodypedia, a portal providing access to publicly available research antibodies towards human protein targets from many different providers.

Antibodypedia is available at https://www.antibodypedia.com/.

The format of the explicit links is:

Resource abbreviation: Antibodypedia
Resource identifier: Antibodypedia identifier
Optional information 1: Number of antibodies

Example: P04626

Text format

Example: P04626

DR   Antibodypedia; 740; 5394 antibodies.

XML format

Example: P04626

<dbReference type="Antibodypedia" id="740">
  <property type="antibodies" value="5394 antibodies"/>
</dbReference>

RDF format

Example: P04626

uniprot:P04626
  rdfs:seeAlso <http://purl.uniprot.org/antibodypedia/740> .

<http://purl.uniprot.org/antibodypedia/740>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Antibodypedia> ;
  rdfs:comment "5394 antibodies" .

Cross-references to MetOSite

Cross-references have been added to MetOSite, a database of methionine sulfoxide sites. Each collected site has been classified according to the effect of its sulfoxidation on the biological properties of the modified protein. Thus, MetOSite documents cases where the sulfoxidation of methionine leads to gain or loss of activity, increased or decreased protein-protein interaction susceptibility, and to changes in protein stability or in subcellular location.

MetOSite is available at https://metosite.uma.es/.

The format of the explicit links is:

Resource abbreviation: MetOSite
Resource identifier: UniProtKB accession number

Example: P10987

Text format

Example: P10987

DR   MetOSite; P10987; -.

XML format

Example: P10987

<dbReference type="MetOSite" id="P10987"/>

RDF format

Example: P10987

uniprot:P10987
  rdfs:seeAlso <http://purl.uniprot.org/metosite/P10987> .
<http://purl.uniprot.org/metosite/P10987>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MetOSite> .

Cross-references to PHI-base

Cross-references have been added to PHI-base, a database providing expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions.

PHI-base is available at http://www.phi-base.org/.

The format of the explicit links is:

Resource abbreviation: PHI-base
Resource identifier: PHI-base identifier

Example: Q00310

Text format

Example: Q00310

DR   PHI-base; PHI:104; -.

XML format

Example: Q00310

<dbReference type="PHI-base" id="PHI:104"/>

RDF format

Example: Q00310

uniprot:Q00310
  rdfs:seeAlso <http://purl.uniprot.org/phi-base/PHI:104> .
<http://purl.uniprot.org/phi-base/PHI:104>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/PHI-base> .

Change to the cross-references to Human Protein Atlas (HPA)

We have changed the way we present the Human Protein Atlas database cross-references. Links between UniProtKB entries and HPA used to be established by HPA antibody identifier, but are now based on Ensembl Gene identifiers.

We have also introduced an additional field in these cross-references to indicate the level of RNA tissue specificity. The RNA specificity category is based on mRNA expression levels in the analyzed samples. The categories include: 'Tissue enriched', 'Group enriched', 'Tissue enhanced', 'Low tissue specificity' and 'Not detected'. For more details on these categories, see the Classification of transcriptomics data by Human Protein Atlas.

Text format

Example: Q9NSG2

Previous format:

DR   HPA; HPA023778; -.
DR   HPA; HPA024451; -.

New format:

DR   HPA; ENSG00000000460; Tissue enhanced (lymphoid).

XML format

Example: Q9NSG2

Previous format:

<dbReference type="HPA" id="HPA023778"/>
<dbReference type="HPA" id="HPA024451"/>

New format:

<dbReference type="HPA" id="ENSG00000000460">
  <property type="expression patterns" value="Tissue enhanced (lymphoid)"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
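Code that consumed the old antibody-based identifiers will now see Ensembl Gene identifiers plus an RNA specificity value. A hedged Python sketch of one way to interpret both forms: it assumes the value reads `Category (tissue, ...)` as in this example, and the comma separator for several tissues is a guess not confirmed above. `HpaXref` and `parse_hpa` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

SPECIFICITY_RE = re.compile(r"^(?P<category>[^()]+?)(?: \((?P<tissues>[^)]*)\))?$")


class HpaXref(NamedTuple):
    ensembl_gene: str
    category: Optional[str]
    tissues: List[str]


def parse_hpa(identifier: str, value: Optional[str]) -> Optional[HpaXref]:
    """Interpret an HPA cross-reference; returns None for the former
    antibody-based identifiers (e.g. HPA023778)."""
    if not identifier.startswith("ENSG"):
        return None
    if not value:
        return HpaXref(identifier, None, [])
    match = SPECIFICITY_RE.match(value.strip())
    if match is None:
        return HpaXref(identifier, value, [])
    tissues = match.group("tissues")
    return HpaXref(
        identifier,
        match.group("category"),
        [t.strip() for t in tissues.split(",")] if tissues else [],
    )


print(parse_hpa("ENSG00000000460", "Tissue enhanced (lymphoid)"))
```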

RDF format

Example: Q9NSG2

Previous format:

uniprot:Q9NSG2
  rdfs:seeAlso <http://purl.uniprot.org/hpa/HPA023778> ,
               <http://purl.uniprot.org/hpa/HPA024451> .
<http://purl.uniprot.org/hpa/HPA023778>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> .
<http://purl.uniprot.org/hpa/HPA024451>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> .

New format:

uniprot:Q9NSG2
  rdfs:seeAlso <http://www.proteinatlas.org/ENSG00000000460> .
<http://www.proteinatlas.org/ENSG00000000460>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/HPA> ;
  rdfs:comment "Tissue enhanced (lymphoid)" .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • Thiazolidine linkage to a ring-opened DNA abasic site
  • Deoxyhypusine

RDF news

Change of URIs for the Human Protein Atlas (HPA) database

For historic reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these by the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the Human Protein Atlas database have therefore been updated from:

http://purl.uniprot.org/hpa/<ID>

to:

http://www.proteinatlas.org/<ID>

If required for backward compatibility, you can use the following SPARQL Update query to add the old URIs:

PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/HPA> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/HPA> .
   BIND(iri(concat('http://purl.uniprot.org/hpa/', substr(str(?new),29))) AS ?old)
}

The dereferencing of existing http://purl.uniprot.org/hpa/<ID> URIs will be maintained.
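The number 29 in the query's BIND expression is the 1-based position of the first character after `http://www.proteinatlas.org/`, which is 28 characters long. The Python sketch below performs the same string rewrite outside SPARQL, for example to patch URIs stored locally; `old_hpa_uri` is a hypothetical name.

```python
OLD_PREFIX = "http://purl.uniprot.org/hpa/"
NEW_PREFIX = "http://www.proteinatlas.org/"

# SPARQL substr() is 1-based, so substr(str(?new), 29) starts right after
# the 28-character NEW_PREFIX.
assert len(NEW_PREFIX) == 28


def old_hpa_uri(new_uri: str) -> str:
    """Rebuild the legacy purl.uniprot.org URI from a proteinatlas.org URI."""
    if not new_uri.startswith(NEW_PREFIX):
        raise ValueError(f"not an HPA URI: {new_uri}")
    return OLD_PREFIX + new_uri[len(NEW_PREFIX):]


print(old_hpa_uri("http://www.proteinatlas.org/ENSG00000000460"))
# http://purl.uniprot.org/hpa/ENSG00000000460
```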

Standardized MD5 checksums in UniProt RDF

The UniProt databases UniProtKB, UniRef and UniParc have historically provided a CRC-64 checksum for amino acid sequences. In the UniParc RDF representation we had already introduced an MD5 checksum; we have now replaced it with a SPARQL 1.1 compliant MD5 representation (lowercase string) and use this across all databases. This makes it possible to use the MD5 function defined in SPARQL 1.1 to check that the sequence string is not corrupted, without the lowercase (LCASE) function and the cast to string that were formerly needed:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?computedMD5 ((?uniprotMD5 = ?computedMD5) AS ?md5SumsMatch)
WHERE
{
  ?protein a up:Protein ;
    up:sequence ?sequence .
  ?sequence rdf:value ?value ;
    up:md5Checksum ?uniprotMD5 .
  BIND(MD5(?value) AS ?computedMD5)
}
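The same consistency check can be reproduced outside the SPARQL endpoint. A minimal Python sketch, assuming the checksum is computed over the sequence string exactly as published in `rdf:value` (ASCII letters, no whitespace); the helper names are hypothetical:

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Lowercase hexadecimal MD5 of a sequence string, the same form that
    SPARQL 1.1 MD5() returns."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def checksum_matches(sequence: str, stored_md5: str) -> bool:
    # A checksum stored in an older uppercase form would need .lower() here.
    return sequence_md5(sequence) == stored_md5


print(sequence_md5(""))  # d41d8cd98f00b204e9800998ecf8427e
```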

UniProt release 2020_01

Published February 26, 2020

Headline

Coronavirus SARS-CoV-2 in UniProtKB

At the end of 2019, a novel coronavirus (nCoV) of animal origin started infecting humans, initiating a severe outbreak in China. nCoV infection can result in severe and even fatal respiratory diseases, such as acute respiratory distress syndrome. The virus is highly contagious and is transmitted via airborne droplets and contact. On January 30th, the WHO declared the 2019-nCoV outbreak a public health emergency of international concern. On February 11th, the WHO named the disease caused by the virus COVID-19, and the International Committee on Taxonomy of Viruses (ICTV) named the virus itself Severe Acute Respiratory Syndrome-related coronavirus 2, or SARS-CoV-2.

SARS-CoV-2 belongs to the large family of Coronaviridae, genus Betacoronavirus. This genus comprises mainly vertebrate respiratory viruses, including HCoV-OC43, which is responsible for 10% of common colds, and SARS, which caused an epidemic in 2003, resulting in over 8,000 infected individuals in 26 countries. The novel coronavirus genome has been sequenced. Its close similarity to SARS suggests it has emerged from the same reservoir, namely bats.

With genomes of about 30 kb, coronaviruses have the largest RNA genomes known to date. The genome encodes a polyprotein 1a that can be elongated by ribosomal frameshifting to produce polyprotein 1ab. The short and elongated polyproteins contain 11 and 15 chains, respectively, and are dedicated to viral RNA transcription and replication, while controlling the host antiviral defense. One strategy used by the virus to escape host innate immunity is to induce the formation of specialized intracellular compartments derived from the endoplasmic reticulum, called endoplasmic spherules, which protect viral dsRNA replication intermediates. Later on, subgenomic mRNAs are translated to produce the virion structural proteins and yet another set of immune modulatory factors. Virions are assembled at the ER-Golgi intermediate compartment (ERGIC) and exported out of the cell. The freshly exported virion is not yet infectious. Its surface is covered by spikes, giving the impression of a crown (corona in Latin, hence its name), but the spike proteins have to be cleaved in order to become functional and to confer infectivity on the virion. These activating proteolytic cleavages occur in the extracellular space.

It is at the level of the spike proteins that SARS-CoV-2 diverges from SARS, differing in both amino acid sequence and glycosylation. The SARS-CoV-2 spike protein cleavage site comprises several arginines, making it an excellent substrate for many host proteases. This feature is predicted to enhance virus tropism and virulence. SARS-CoV-2 interacts with the same host receptor as SARS, ACE2, which presumably explains why both viruses infect the lungs, as well as the small intestine and kidney. The functions of several other SARS-CoV-2 proteins are still unclear and need further investigation. Among them is the SARS-CoV-2 NS8 protein, which shares sequence similarity with some bat-hosted coronavirus NS8 proteins, but is entirely different from SARS NS8a or NS8b. Thus, in spite of many similarities to SARS and other coronaviruses, SARS-CoV-2 displays unique molecular features that lead to unpredictable behavior during infection.

SARS-CoV-2 protein sequences from the current public health emergency have been annotated in UniProtKB and made available as a pre-release dataset on the UniProt FTP site. These entries will be available in the usual file formats as part of release 2020_02.

UniProt release news

Change of release cycle

Starting with release 2020_01 of February 26th, UniProt releases are published every 8 weeks. Release 2020_02 is scheduled for April 22nd, 2020.
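An 8-week cycle is 56 days, so nominal release dates can be projected with simple date arithmetic. The Python sketch below ignores holidays or other schedule changes, so actual dates may differ; `projected_releases` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List

CYCLE = timedelta(weeks=8)  # 56 days


def projected_releases(first: date, count: int) -> List[date]:
    """Nominal release dates on a fixed 8-week cycle (actual dates may differ)."""
    return [first + i * CYCLE for i in range(count)]


for release_date in projected_releases(date(2020, 2, 26), 3):
    print(release_date.isoformat())
# 2020-02-26
# 2020-04-22
# 2020-06-17
```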

See also: How frequently is UniProt released? What is the synchronization delay with other databases?

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted disease:

  • Popov-Chang syndrome

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • N6-lactoyllysine

Modified term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • 6-(S-cysteinyl)-8alpha-(pros-histidyl)-FAD (Cys-His) -> 6-(S-cysteinyl)-8alpha-(pros-histidyl)-FAD (His-Cys)

UniProt release 2019_11

Published December 18, 2019

Headline

Thicker than water

We know about blood types and their incompatibility; transfusing someone who is O- with AB+ blood can be lethal. The ABO alleles on chromosome 9 determine our blood type. The A and B antigens are a set of red blood cell surface carbohydrates ending in α-1,3-linked N-acetylgalactosamine and α-1,3-linked galactose, respectively, while type O blood has neither of these cell surface sugars. Sequence variations in the ABO gene determine whether the encoded protein has α-1,3-N-acetylgalactosaminyltransferase activity and makes type A blood, or α-1,3-galactosyltransferase activity and makes type B blood. When both alleles are present, we make type AB blood. Deletion of a single G nucleotide in the ABO gene leads to a truncated, inactive product and type O blood, which carries the unmodified H antigen.

To improve the usability of blood, people have tried for years to find a way to enzymatically convert A or B blood to type O; it seems an obvious way to increase the supply of universal donor blood (which would still require Rhesus matching). While such enzymes have been found, they are not yet ideal, as they either work only at high concentrations or have very specific buffer requirements that blood does not meet.

By screening human fecal metagenomic libraries, Rahfeld et al. have isolated a pair of enzymes from the obligate gut anaerobe Flavonifractor plautii that efficiently convert the A antigen to the H antigen (type O). The first enzyme (A type blood N-acetyl-alpha-D-galactosamine deacetylase, ADAC) deacetylates all A antigen subtypes tested (and there are many), while the second enzyme (A type blood alpha-D-galactosamine galactosaminidase, AGAL) removes the remaining galactosamine moiety. The reaction works on red blood cells and in blood, not only in a buffer system, and at low enzyme concentrations, so it shows promise for use in blood production. Further testing is underway, and we still need a way to remove the B antigen, but this could well increase the flexibility of our blood supply. It still won't solve the world shortage of blood; only more donors can do that...

As of this release, ADAC and AGAL have been annotated and are available in UniProtKB/Swiss-Prot.

UniProtKB news

Change of FT and CC sections in UniProtKB text format

We have changed the format of the FT and CC sections of the UniProtKB text files. The changes to the FT section likely affect all parsers, and software will have to be adapted accordingly. The changes to the CC section are smaller, but may also require code adaptations depending on the CC annotation types that you parse.

The motivation for this change is described in the section "Functional annotation of different gene products in UniProtKB/Swiss-Prot" below, where you can also find the technical details and examples under the heading Text format.

Change of line length in UniProtKB text format

Historically, the lines of the UniProtKB text format have been wrapped at 75 characters for technical reasons (terminal screen size and data processing capabilities). When these technical restrictions vanished, we introduced exceptions for data like URLs, protein names and cross-references, where line wrapping does not improve readability. These lines can be up to 255 characters long, but most lines are still wrapped at 75 characters for readability. As part of the FT section format change for the functional annotation of different gene products in UniProtKB/Swiss-Prot (described below), we have now increased the maximum length of wrapped lines to 80 characters.
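
A minimal sketch of how a validator might apply these limits. The two thresholds come from the text above; the function name and the split into warnings and errors are assumptions for illustration:

```python
# Hypothetical line-length check for UniProtKB text files, assuming the rules
# described above: wrapped lines are at most 80 characters, and only unwrapped
# exceptions (URLs, protein names, cross-references) may be longer, up to 255.
WRAP_LIMIT = 80
HARD_LIMIT = 255


def check_line_lengths(lines):
    """Return (line_number, length, severity) for every over-long line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))
        if length > HARD_LIMIT:
            problems.append((number, length, "error"))
        elif length > WRAP_LIMIT:
            # May be a legitimate unwrapped exception, so only warn.
            problems.append((number, length, "warning"))
    return problems
```

Passing an open file handle works as well as a list, since the function only iterates over lines. A warning is not necessarily a fault, because unwrapped exceptions are allowed.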

Functional annotation of different gene products in UniProtKB/Swiss-Prot

To reduce database redundancy, the UniProtKB/Swiss-Prot policy is to describe, whenever possible, all protein products encoded by one gene in a given species in a single entry. This includes isoforms generated by alternative promoter usage, alternative splicing, alternative initiation and ribosomal frameshifting. We assign a name and a unique identifier to each isoform and choose one of them as the canonical sequence that is shown in the UniProtKB text and XML formats (the RDF format shows all sequences). Until this release, all positional annotations in an entry referred to this canonical sequence. Some gene products are precursors that are processed by proteolytic cleavage to generate the biologically active product(s). These products are described by their location on the sequence, a name and a unique identifier.

When isoforms, or products of proteolytic cleavage, are known to differ in their function or other characteristics, we generally describe this in the text of the respective annotations. To make this information also accessible to software applications, we adapted the UniProtKB text format to describe the product to which an annotation applies in a computer-processable way. The schemas of the XML and RDF format already supported this and required no changes. The following sections describe the changes for the text format and how the data is represented in the XML and RDF format.

Text format

Isoforms are described in ALTERNATIVE PRODUCTS annotations in the CC section. The products of proteolytic cleavage are described in PEPTIDE and CHAIN annotations in the FT section. All three annotation types provide a name (<ProductName>) and a unique ID (<ProductId>) for the product that they describe:

  • ALTERNATIVE PRODUCTS annotations show the name of an isoform in the Name field and its ID in the IsoId field.
    CC   -!- ALTERNATIVE PRODUCTS:
    ...
    CC       Name=<ProductName>;
    CC         IsoId=<ProductId>; Sequence=Displayed;
    
  • PEPTIDE and CHAIN annotations showed the name of a proteolytic cleavage product in the <Description> field and its ID in the FTId field in the previous text format:
    FT   CHAIN       <B>    <E>       <ProductName>.
    FT                                /FTId=<ProductId>.
    
    In the new text format, which is described in more detail in the FT section below, they are shown in the /note= and /id= qualifiers, respectively:
    FT   CHAIN           <B>..<E>
    FT                   /note="<ProductName>"
    FT                   /id="<ProductId>"
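
CC annotations identify a product by name, while the product IDs sit in ALTERNATIVE PRODUCTS and in CHAIN/PEPTIDE features, so a parser may want a name-to-ID lookup. A minimal Python sketch, assuming the new text format and /note= and /id= values that fit on a single line (the function and patterns are hypothetical, not part of any UniProt tool):

```python
import re

# Illustrative patterns; they assume the new text format and single-line
# /note= and /id= values (wrapped values are silently skipped).
NAME_RE = re.compile(r"^CC\s+Name=([^;]+);")
ISOID_RE = re.compile(r"^CC\s+IsoId=([^;,]+)")
FEATURE_RE = re.compile(r"^FT   (\S+)\s+\S")
QUALIFIER_RE = re.compile(r'^FT\s+/(note|id)="([^"]*)"$')


def product_ids(lines):
    """Map product names to product IDs (isoforms and cleavage products)."""
    products = {}
    in_alternative_products = False
    isoform_name = None
    feature_type = note = None
    for line in lines:
        if line.startswith("CC   -!- "):
            # A new CC annotation starts; track whether it lists isoforms.
            in_alternative_products = line.startswith("CC   -!- ALTERNATIVE PRODUCTS:")
        elif in_alternative_products and (m := NAME_RE.match(line)):
            isoform_name = m.group(1)
        elif in_alternative_products and isoform_name and (m := ISOID_RE.match(line)):
            # The first IsoId is the product ID; CC annotations refer to the
            # isoform by its name prefixed with "Isoform".
            products[f"Isoform {isoform_name}"] = m.group(1)
            isoform_name = None
        elif (m := FEATURE_RE.match(line)):
            feature_type, note = m.group(1), None
        elif feature_type in ("CHAIN", "PEPTIDE") and (m := QUALIFIER_RE.match(line)):
            key, value = m.groups()
            if key == "note":
                note = value
            elif note is not None:
                products[note] = value
    return products
```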
    

Example: O60443

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=3;
CC       Name=1; Synonyms=Long;
CC         IsoId=O60443-1; Sequence=Displayed;
CC       Name=2; Synonyms=Short;
CC         IsoId=O60443-2; Sequence=VSP_004190;
CC         Note=No experimental confirmation available.;
CC       Name=3;
CC         IsoId=O60443-3; Sequence=VSP_044276;
...
FT   CHAIN         1    496       Gasdermin-E.
FT                                /FTId=PRO_0000148178.
FT   CHAIN         1    270       Gasdermin-E, N-terminal.
FT                                {ECO:0000269|PubMed:27281216,
FT                                ECO:0000305|PubMed:28459430}.
FT                                /FTId=PRO_0000442786.
FT   CHAIN       271    496       Gasdermin-E, C-terminal.
FT                                {ECO:0000305|PubMed:28459430}.
FT                                /FTId=PRO_0000442787.

CC section

The annotation types in the CC section describe a product by its name (isoform names are prefixed with the term "Isoform"). In the format descriptions below this name is represented by <ProductName>. Different products are described in separate annotations (see FUNCTION and BIOPHYSICOCHEMICAL PROPERTIES examples).

All annotation types of the CC section start with:

CC   -!- <TYPE>:

Where <TYPE> is a value from the controlled vocabulary of annotation types.

In some annotation types the content of the annotation used to directly follow the <TYPE>, and lines were wrapped at 75 chars:

CC   -!- <TYPE>: <Content>

In the new format, a <ProductName> may be added between the <TYPE> and the <Content>, and lines are wrapped at 80 chars (see Change of line length in UniProtKB text format):

CC   -!- <TYPE>: [<ProductName>]: <Content>

The <ProductName> is surrounded by square brackets and separated from the <Content> by a colon, so that it can be parsed with a Perl-compatible regular expression like this one:

^CC   -!- ([^:]+):(?: \[(.+?)\]:)? (.+)

Where $1=<TYPE>, $2=<ProductName>, $3=<Content>.
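
Python's re module supports the non-capturing group (?:...) and the lazy quantifier +? used in the pattern above, so the pattern can be used unchanged. A minimal sketch (the function name is hypothetical):

```python
import re

# The pattern shown above, used as-is with Python's re module.
CC_FIRST_LINE_RE = re.compile(r"^CC   -!- ([^:]+):(?: \[(.+?)\]:)? (.+)")


def parse_cc_first_line(line):
    """Return (type, product_name, content); product_name is None if absent."""
    m = CC_FIRST_LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not the first line of a CC annotation: {line!r}")
    return m.groups()


print(parse_cc_first_line("CC   -!- SUBCELLULAR LOCATION: [Isoform 3]: Secreted."))
```

This pattern targets the single-line form only. Applied to a header such as "CC   -!- COFACTOR: [Serine protease NS3]:", it still matches, but reports no product and returns "[Serine protease NS3]:" as the content, so annotations whose content starts on a new line need a separate check.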

In annotation types where the content is structured as a list of different fields that are formatted according to custom rules for better readability, the annotation content starts on a new line:

CC   -!- <TYPE>:
CC       <Content>

In the new format a <ProductName> may be added after the <TYPE> and this line is not wrapped (i.e. it may in rare cases exceed 80 chars).

CC   -!- <TYPE>: [<ProductName>]:
CC       <Content>

The format of the <Content> remains unchanged.
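
For the structured form, the header line can be matched on its own and the indented content lines collected until the next annotation starts. A minimal Python sketch for the new format (the function name is hypothetical; annotations whose content follows on the same line are deliberately ignored):

```python
import re

# First line of a structured annotation whose content starts on a new line:
# "CC   -!- <TYPE>:" or, in the new format, "CC   -!- <TYPE>: [<ProductName>]:".
CC_HEADER_RE = re.compile(r"^CC   -!- ([^:]+):(?: \[(.+)\]:)?$")


def parse_structured_cc(lines):
    """Group structured CC annotations into (type, product, content_lines).

    Content lines keep their indentation relative to column 10, so nested
    fields (e.g. an Evidence= continuation) stay recognizable.
    """
    blocks = []
    current = None
    for line in lines:
        if line.startswith("CC   -!- "):
            m = CC_HEADER_RE.match(line)
            # Annotations whose content follows on the same line are skipped.
            current = (m.group(1), m.group(2), []) if m else None
            if current is not None:
                blocks.append(current)
        elif current is not None and line.startswith("CC       "):
            current[2].append(line[9:])
        else:
            current = None
    return blocks
```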

A <ProductName> cannot be added to ALTERNATIVE PRODUCTS and INTERACTION annotations. The INTERACTION format will be adapted in a different way to describe binary interactions that involve isoforms and/or products of proteolytic cleavage (see Change of annotation topic 'Interaction').

Please note that the previous text format of SUBCELLULAR LOCATION, COFACTOR and MASS SPECTROMETRY annotations already made it possible to specify a product name/ID, but we have adapted it to be consistent with all other annotation types.

Representative examples for different annotation types are shown here:

FUNCTION

Example: Q96F85

CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing; Named isoforms=2;
CC       Name=1; Synonyms=CRIP1a;
CC         IsoId=Q96F85-1; Sequence=Displayed;
CC       Name=2; Synonyms=CRIP1b;
CC         IsoId=Q96F85-2; Sequence=VSP_035598;

Previous format:

CC   -!- FUNCTION: Isoform 1 suppresses cannabinoid receptor CNR1-mediated
CC       tonic inhibition of voltage-gated calcium channels. Isoform 2 does
CC       not have this effect. {ECO:0000269|PubMed:17895407}.

New format:

CC   -!- FUNCTION: [Isoform 1]: Suppresses cannabinoid receptor CNR1-mediated
CC       tonic inhibition of voltage-gated calcium channels.
CC       {ECO:0000269|PubMed:17895407}.
CC   -!- FUNCTION: [Isoform 2]: Does not suppress cannabinoid receptor CNR1-
CC       mediated tonic inhibition of voltage-gated calcium channels.
CC       {ECO:0000269|PubMed:17895407}.

DISEASE

Example: P35555

FT   CHAIN      2732   2871       Asprosin. {ECO:0000305|PubMed:27087445,
FT                                ECO:0000305|PubMed:9817919}.
FT                                /FTId=PRO_0000436882.

Previous format:

CC   -!- DISEASE: Marfan lipodystrophy syndrome (MFLS) [MIM:616914]: A
CC       syndrome characterized by congenital ...
CC       Note=The disease is caused by mutations affecting the gene
CC       represented in this entry. Asprosin: Mutations specifically affect
CC       Asprosin, a hormone peptide present at the C-terminus of
CC       Fibrillin-1 chain, which is cleaved from Fibrillin-1 following
CC       secretion (PubMed:27087445). {ECO:0000269|PubMed:27087445}.

New format:

CC   -!- DISEASE: [Asprosin]: Marfan lipodystrophy syndrome (MFLS) [MIM:616914]:
CC       A syndrome characterized by congenital ...
CC       Note=The disease is caused by mutations affecting the gene represented
CC       in this entry. {ECO:0000269|PubMed:27087445}.

SUBCELLULAR LOCATION

Please note that the previous text format of SUBCELLULAR LOCATION annotations already made it possible to describe a product by its name in the optional first field. To be consistent with all other annotation types, we have added square brackets around the product name.

Example: Q13421

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3; Synonyms=SMRP;
CC         IsoId=Q13421-2; Sequence=VSP_021059, VSP_021060;
...
FT   CHAIN        37    286       Megakaryocyte-potentiating factor.
FT                                /FTId=PRO_0000253560.

Previous format:

CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor, GPI-anchor.
CC       Golgi apparatus.
CC   -!- SUBCELLULAR LOCATION: Megakaryocyte-potentiating factor: Secreted.
CC   -!- SUBCELLULAR LOCATION: Isoform 3: Secreted.

New format:

CC   -!- SUBCELLULAR LOCATION: Cell membrane; Lipid-anchor, GPI-anchor. Golgi
CC       apparatus.
CC   -!- SUBCELLULAR LOCATION: [Megakaryocyte-potentiating factor]: Secreted.
CC   -!- SUBCELLULAR LOCATION: [Isoform 3]: Secreted.

MASS SPECTROMETRY

Please note that the previous text format of MASS SPECTROMETRY annotations already made it possible to describe a product (by its sequence range and an optional isoform ID) in the Range field. To be consistent with all other annotation types, we have replaced the Range field with a <ProductName> field.

Example: P09493

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3; Synonyms=Fibroblast, TM3;
CC         IsoId=P09493-3; Sequence=VSP_006577, VSP_006579;

Previous format:

CC   -!- MASS SPECTROMETRY: Mass=32875.93; Method=MALDI; Range=1-284
CC       (P09493-3); Evidence={ECO:0000269|PubMed:11840567};

New format:

CC   -!- MASS SPECTROMETRY: [Isoform 3]: Mass=32875.93; Method=MALDI;
CC       Evidence={ECO:0000269|PubMed:11840567};

RNA EDITING

Example: Q9P225

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=3;
CC         IsoId=Q9P225-3; Sequence=VSP_031913, VSP_031914, VSP_031915;

Previous format:

CC   -!- RNA EDITING: Modified_positions=Not_applicable; Note=Exon 13
CC       included in isoform 3 is extensively edited in brain.
CC       {ECO:0000269|PubMed:20835228};

New format:

CC   -!- RNA EDITING: [Isoform 3]: Modified_positions=Not_applicable; Note=Exon
CC       13 is extensively edited in brain. {ECO:0000269|PubMed:20835228};

WEB RESOURCE

Example: P50570

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=P50570-1; Sequence=Displayed;

Previous format:

CC   -!- WEB RESOURCE: Name=The UMD-DNM2-isoform 1 mutations database;
CC       URL="http://www.umd.be/DNM2/";

New format:

CC   -!- WEB RESOURCE: [Isoform 1]: Name=The UMD-DNM2-isoform 1 mutations
CC       database;
CC       URL="http://www.umd.be/DNM2/";

CATALYTIC ACTIVITY

Example: Q2YHF0

FT   CHAIN      1475   2092       Serine protease NS3.
FT                                {ECO:0000250|UniProtKB:P29990}.
FT                                /FTId=PRO_0000268140.
...
FT   CHAIN      2488   3387       RNA-directed RNA polymerase NS5.
FT                                {ECO:0000250|UniProtKB:P29990}.
FT                                /FTId=PRO_0000268144.

Previous format:

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds in which
CC         each of the Xaa can be either Arg or Lys and Yaa can be either
CC         Ser or Ala.; EC=3.4.21.91;
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=a ribonucleoside 5'-triphosphate + RNA(n) = diphosphate +
CC         RNA(n+1); Xref=Rhea:RHEA:21248, Rhea:RHEA-COMP:11128, Rhea:RHEA-
CC         COMP:11129, ChEBI:CHEBI:33019, ChEBI:CHEBI:61557,
CC         ChEBI:CHEBI:83400; EC=2.7.7.48; Evidence={ECO:0000255|PROSITE-
CC         ProRule:PRU00539};

New format:

CC   -!- CATALYTIC ACTIVITY: [Serine protease NS3]:
CC       Reaction=Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds in which each of
CC         the Xaa can be either Arg or Lys and Yaa can be either Ser or Ala.;
CC         EC=3.4.21.91;
CC   -!- CATALYTIC ACTIVITY: [RNA-directed RNA polymerase NS5]:
CC       Reaction=a ribonucleoside 5'-triphosphate + RNA(n) = diphosphate +
CC         RNA(n+1); Xref=Rhea:RHEA:21248, Rhea:RHEA-COMP:11128, Rhea:RHEA-
CC         COMP:11129, ChEBI:CHEBI:33019, ChEBI:CHEBI:61557, ChEBI:CHEBI:83400;
CC         EC=2.7.7.48; Evidence={ECO:0000255|PROSITE-ProRule:PRU00539};

COFACTOR

Please note that the previous text format of COFACTOR annotations already made it possible to describe a product by its name in the optional first field. To be consistent with all other annotation types, we have added square brackets around the product name.

Example: P26662

FT   CHAIN      1027   1657       Serine protease NS3. {ECO:0000255}.
FT                                /FTId=PRO_0000037644.
FT   CHAIN      1658   1711       Non-structural protein 4A. {ECO:0000255}.
FT                                /FTId=PRO_0000037645.

Previous format:

CC   -!- COFACTOR: Serine protease NS3:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
CC         Evidence={ECO:0000269|PubMed:9060645};
CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};
CC   -!- COFACTOR: Non-structural protein 5A:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};
CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain.
CC       {ECO:0000250};

New format:

CC   -!- COFACTOR: [Serine protease NS3]:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105;
CC         Evidence={ECO:0000269|PubMed:9060645};
CC       Note=Binds 1 zinc ion. {ECO:0000269|PubMed:9060645};
CC   -!- COFACTOR: [Non-structural protein 5A]:
CC       Name=Zn(2+); Xref=ChEBI:CHEBI:29105; Evidence={ECO:0000250};
CC       Note=Binds 1 zinc ion in the NS5A N-terminal domain. {ECO:0000250};

BIOPHYSICOCHEMICAL PROPERTIES

Example: Q9ULC5

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1; Synonyms=ACSL5b, ACSL5-fl;
CC         IsoId=Q9ULC5-1; Sequence=Displayed;
...
CC       Name=3; Synonyms=ACSL5delta20;
CC         IsoId=Q9ULC5-4; Sequence=VSP_038233;

Previous format:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC       Kinetic parameters:
CC         KM=0.11 uM for palmitic acid (isoform 1 at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.38 uM for palmitic acid (isoform 1 at pH 9.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.04 uM for palmitic acid (isoform 3 at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.15 uM for palmitic acid (isoform 3 at pH 8.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 9.5 (isoform 1), 7.5-8.5 (isoform 3).
CC         {ECO:0000269|PubMed:17681178};

New format:

CC   -!- BIOPHYSICOCHEMICAL PROPERTIES: [Isoform 1]:
CC       Kinetic parameters:
CC         KM=0.11 uM for palmitic acid (at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.38 uM for palmitic acid (at pH 9.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 9.5. {ECO:0000269|PubMed:17681178};
CC   -!- BIOPHYSICOCHEMICAL PROPERTIES: [Isoform 3]:
CC       Kinetic parameters:
CC         KM=0.04 uM for palmitic acid (at pH 7.5)
CC         {ECO:0000269|PubMed:17681178};
CC         KM=0.15 uM for palmitic acid (at pH 8.5)
CC         {ECO:0000269|PubMed:17681178};
CC       pH dependence:
CC         Optimum pH is 7.5-8.5. {ECO:0000269|PubMed:17681178};

SEQUENCE CAUTION

Example: Q9NQS3

Previous format:

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=Q9NQS3-1; Sequence=Displayed;
...
CC       Name=3;
CC         IsoId=Q9NQS3-3; Sequence=VSP_046893, VSP_046894;
CC         Note=Ref.2 (BAC11404) sequence differs from that shown due to
CC         erroneous termination (Truncated C-terminus). {ECO:0000305};
...
CC   -!- SEQUENCE CAUTION:
CC       Sequence=AAH17572.1; Type=Erroneous initiation; Note=Truncated N-terminus.; Evidence={ECO:0000305};

New format:

CC   -!- ALTERNATIVE PRODUCTS:
...
CC       Name=1;
CC         IsoId=Q9NQS3-1; Sequence=Displayed;
...
CC       Name=3;
CC         IsoId=Q9NQS3-3; Sequence=VSP_046893, VSP_046894;
...
CC   -!- SEQUENCE CAUTION: [Isoform 1]:
CC       Sequence=AAH17572.1; Type=Erroneous initiation; Note=Truncated N-terminus.; Evidence={ECO:0000305};
CC   -!- SEQUENCE CAUTION: [Isoform 3]:
CC       Sequence=BAC11404.1; Type=Erroneous termination; Note=Truncated C-terminus.; Evidence={ECO:0000305};

FT section

Note: The format descriptions make use of POSIX ERE syntax.

All positional annotations in the FT section previously referred to the canonical sequence that is shown in the UniProtKB entry. This was the text format of these annotation types:

FT   <TYPE>      <B>    <E>       (<Description>.)?( {<Evidences>}.)?
(FT                                /FTId=<Id>.)?

Where

  • <TYPE> is a value from the controlled vocabulary of positional annotation types.
  • <B> and <E> are amino acid positions on the canonical sequence. For most annotation types, they are the begin and end position of a sequence range, but they have other semantics for some types (e.g. CROSSLNK and DISULFID).
  • <Description> may provide information in addition to that conveyed by the <TYPE> and the location <B> and <E>. This field is mandatory for some annotation types and optional for others.
  • <Evidences> are optional and added between curly braces.
  • <Id> is a unique annotation identifier that is mandatory for some annotation types, including CHAIN and PEPTIDE where it corresponds to the <ProductId>.

We have modified this format in order to describe amino acid positions on isoform sequences. The new format is inspired by the INSDC feature table format to enable code reuse:

FT   <TYPE>          <Location>
(FT                   /<Qualifier>(="<Value>")?)*

Where

  • <TYPE> is a value from the controlled vocabulary of positional annotation types.
  • <Location> is a sequence location on the canonical or an isoform sequence. For now, we will use only a subset of the INSDC Location types: a <Location> must be either a single <Position> or a range of two <Position>s, optionally preceded by an isoform ID. The < and > symbols may be used with begin and end positions to indicate that the begin or end point lies beyond the specified amino acid position. Please note that we have had to extend the INSDC Location format with the ? symbol to represent all existing UniProtKB locations. This symbol may precede a <Position> to indicate that the exact position is uncertain, or it may replace the <Position> when the position is unknown.
    (<IsoformId>:)?((<|\?)?<Position>|\?)(\.\.((>|\?)?<Position>|\?))?
    
  • /<Qualifier> may provide information in addition to that conveyed by the <TYPE> and <Location>. While we will follow the format of the INSDC Qualifiers, we will introduce our own <Qualifier> types where necessary. For this format change, we will represent the existing data with 3 qualifiers:
    • /note= will show the content of the previous <Description> field.
    • /evidence= will show the content of the previous <Evidences> field.
    • /id= will show the content of the previous /FTId= field.

In a future format change, we may introduce more <Location> and <Qualifier> types to structure the description of positional annotations further.

Lines are wrapped at 80 chars (see section Change of line length in UniProtKB text format above).
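
The <Location> grammar above can be turned into a Python regular expression by escaping the ? and . symbols that are meant literally. A minimal sketch (the function name is hypothetical; isoform IDs are accepted as any text without a colon, an assumption looser than the real isoform ID format):

```python
import re

# Python rendering of the <Location> grammar; "?" and "." are escaped where
# they are literal symbols rather than regex operators.
LOCATION_RE = re.compile(
    r"^(?:(?P<isoform>[^:]+):)?"
    r"(?P<begin>[<?]?\d+|\?)"
    r"(?:\.\.(?P<end>[>?]?\d+|\?))?$"
)


def parse_location(location):
    """Split a <Location> string into (isoform_id, begin, end).

    isoform_id is None for the canonical sequence and end is None for a
    single position. Positions stay strings so that <, > and ? are kept.
    """
    m = LOCATION_RE.match(location)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    return m.group("isoform"), m.group("begin"), m.group("end")


print(parse_location("P12821-3:32"))
```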

Example: P84077

This example illustrates the format change with a selection of representative positional annotation types that refer to the canonical sequence.

Previous format:

FT   INIT_MET      1      1       Removed. {ECO:0000244|PubMed:19413330,
FT                                ECO:0000244|PubMed:22223895,
FT                                ECO:0000269|PubMed:25255805,
FT                                ECO:0000269|PubMed:25807930}.
FT   CHAIN         2    181       ADP-ribosylation factor 1.
FT                                /FTId=PRO_0000207378.
...
FT   NP_BIND     126    129       GTP. {ECO:0000244|PDB:1HUR,
FT                                ECO:0000244|PDB:1RE0,
FT                                ECO:0000244|PDB:1U81,
FT                                ECO:0000244|PDB:3O47, ECO:0000305}.
...
FT   VARIANT      35     35       Y -> H (in PVNH8; decreased interaction
FT                                with GGA3; dbSNP:rs879036238).
FT                                {ECO:0000269|PubMed:28868155}.
FT                                /FTId=VAR_081272.
...
FT   HELIX         6      9       {ECO:0000244|PDB:1HUR}.

New format:

FT   INIT_MET        1
FT                   /note="Removed"
FT                   /evidence="ECO:0000244|PubMed:19413330,
FT                   ECO:0000244|PubMed:22223895, ECO:0000269|PubMed:25255805,
FT                   ECO:0000269|PubMed:25807930"
FT   CHAIN           2..181
FT                   /note="ADP-ribosylation factor 1"
FT                   /id="PRO_0000207378"
...
FT   NP_BIND         126..129
FT                   /note="GTP"
FT                   /evidence="ECO:0000244|PDB:1HUR, ECO:0000244|PDB:1RE0,
FT                   ECO:0000244|PDB:1U81, ECO:0000244|PDB:3O47, ECO:0000305"
...
FT   VARIANT         35
FT                   /note="Y -> H (in PVNH8; decreased interaction with GGA3;
FT                   dbSNP:rs879036238)"
FT                   /evidence="ECO:0000269|PubMed:28868155"
FT                   /id="VAR_081272"
...
FT   HELIX           6..9
FT                   /evidence="ECO:0000244|PDB:1HUR"

Example: P0C551

This example illustrates the use of the < and ? symbols in UniProtKB locations.

Previous format:

FT   SIGNAL       <1      ?       {ECO:0000250}.
FT   PROPEP        ?     17       {ECO:0000250}.
FT                                /FTId=PRO_0000293097.
FT   CHAIN        18    142       Acidic phospholipase A2 KBf-grIB.
FT                                /FTId=PRO_0000293098.

New format:

FT   SIGNAL          <1..?
FT                   /evidence="ECO:0000250"
FT   PROPEP          ?..17
FT                   /evidence="ECO:0000250"
FT                   /id="PRO_0000293097"
FT   CHAIN           18..142
FT                   /note="Acidic phospholipase A2 KBf-grIB"
FT                   /id="PRO_0000293098"
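
As the P84077 and P0C551 examples show, converting the first line of a previous-format feature mostly means merging the <B> and <E> columns into one <Location>. A minimal Python sketch; the rule that only identical numeric positions collapse into a single position is an assumption inferred from these examples, and the helper names are hypothetical:

```python
def convert_location(begin, end):
    """Turn the previous <B> and <E> columns into a new-format <Location>."""
    # Assumption: only identical numeric positions collapse (INIT_MET 1 1 -> 1).
    if begin == end and begin.isdigit():
        return begin
    return f"{begin}..{end}"


def convert_feature_line(old_line):
    """Rewrite the first line of a previous-format feature (location only)."""
    _, feature_type, begin, end = old_line.split()[:4]
    # In the new layout the location starts in column 22.
    return f"FT   {feature_type:<16}{convert_location(begin, end)}"


print(convert_feature_line("FT   PROPEP        ?     17       {ECO:0000250}."))
```

The description, evidences and /FTId= value of each feature would still have to be moved into /note=, /evidence= and /id= qualifiers; this sketch does not cover that step.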

Example: P12821

This example illustrates how positional annotations for isoforms are represented.

Previous format:

CC   -!- ALTERNATIVE PRODUCTS:
CC       ...
CC       Name=Testis-specific; Synonyms=ACE-T;
CC         IsoId=P12821-3, P22966-1;
CC         Sequence=VSP_035120, VSP_035121;
CC         Note=Variant in position: 32:S->P (in dbSNP:rs4317). Variant in
CC         position: 49:S->G (in dbSNP:rs4318).;
...
FT   VARIANT     154    154       A -> T (in dbSNP:rs13306087).
FT                                /FTId=VAR_029139.

New format:

CC   -!- ALTERNATIVE PRODUCTS:
CC       ...
CC       Name=Testis-specific; Synonyms=ACE-T;
CC         IsoId=P12821-3, P22966-1;
CC         Sequence=VSP_035120, VSP_035121;
...
FT   VARIANT         154
FT                   /note="A -> T (in dbSNP:rs13306087)"
FT                   /id="VAR_029139"
...
FT   VARIANT         P12821-3:32
FT                   /note="S -> P (in dbSNP:rs4317)"
FT                   /id="VAR_x"
FT   VARIANT         P12821-3:49
FT                   /note="S -> G (in dbSNP:rs4318)"
FT                   /id="VAR_y"

XML format

The UniProtKB XSD already made it possible to describe the product to which an annotation applies, so it required no changes.

Isoforms are described in "alternative products" annotations. The products of proteolytic cleavage are described in "peptide" and "chain" annotations. All three annotation types provide a name (<ProductName>) and/or a unique ID (<ProductId>) for the product that they describe:

  • "alternative products" annotations describe each isoform by an isoform element of isoformType. The isoformType describes the product IDs and names with sequences of id and name elements (where the first element in each sequence is the main product ID/name).
    <comment type="alternative products">
      ...
      <isoform>
        <id><ProductId></id>
        <id><OldProductId></id>
        <name><ProductName></name>
        <name><AlternativeProductName></name>
        ...
      </isoform>
      ...
    </comment>
    
  • "peptide" and "chain" annotations show the name and ID of a proteolytic cleavage product in the description and id attributes of the featureType.
    <feature type="chain" description="<ProductName>" id="<ProductId>">
    ...
    </feature>
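
A minimal Python sketch that applies the "first element is the main one" rule with the standard library's ElementTree. It ignores XML namespaces; real UniProtKB XML files declare a default namespace, so element names would need to be qualified (for example {http://uniprot.org/uniprot}isoform). The function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def isoform_main_ids(root):
    """Map the main name of each isoform to its main ID.

    The first <name> and the first <id> of an <isoform> are the main ones.
    """
    mapping = {}
    for isoform in root.iter("isoform"):
        mapping[isoform.findtext("name")] = isoform.findtext("id")
    return mapping


entry = ET.fromstring("""
<comment type="alternative products">
  <isoform>
    <id>P12821-3</id>
    <id>P22966-1</id>
    <name>Testis-specific</name>
    <name>ACE-T</name>
  </isoform>
</comment>
""".strip())
print(isoform_main_ids(entry))
```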
    
commentType

The commentType has two ways to indicate that the annotation applies to a specific product:

  • An optional molecule element of moleculeType makes it possible to describe a product by its name and/or unique ID. It is currently only used for "subcellular location" and "cofactor" annotations (see examples below). In the future it may be used for all annotations that are represented by commentType.
  • An optional sequence of location elements of locationType makes it possible to describe the sequence coordinates of an annotation. The locationType has an optional sequence attribute that is only set (to an isoform ID) when the coordinates are not for the canonical sequence. Sequence coordinates may currently be given for "rna editing", "sequence caution" and "mass spectrometry" annotations. In the future, sequence caution and mass spectrometry annotations will no longer describe sequence coordinates.

subcellular location

Example: Q13421

<comment type="alternative products">
  ...
  <isoform>
    <id>Q13421-2</id>
    <name>3</name>
    <name>SMRP</name>
    <sequence type="described" ref="VSP_021059 VSP_021060"/>
    ...
  </isoform>
  ...
</comment>
...
<feature type="chain" description="Megakaryocyte-potentiating factor"
                      id="PRO_0000253560">
  <location>
    <begin position="37"/>
    <end position="286"/>
  </location>
</feature>
<comment type="subcellular location">
  <subcellularLocation>
    <location>Cell membrane</location>
    <topology>Lipid-anchor</topology>
    <topology>GPI-anchor</topology>
  </subcellularLocation>
  <subcellularLocation>
    <location>Golgi apparatus</location>
  </subcellularLocation>
</comment>
<comment type="subcellular location">
  <molecule>Megakaryocyte-potentiating factor</molecule>
  <subcellularLocation>
    <location>Secreted</location>
  </subcellularLocation>
</comment>
<comment type="subcellular location">
  <molecule>Isoform 3</molecule>
  <subcellularLocation>
    <location>Secreted</location>
  </subcellularLocation>
</comment>
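
To collect subcellular locations per product, a reader can group "subcellular location" comments by their optional molecule element and treat a missing one as not restricted to a specific product. A minimal ElementTree sketch with a hypothetical function name; it assumes un-namespaced element names as in the snippet above, whereas real files would need namespace-qualified names:

```python
import xml.etree.ElementTree as ET


def locations_by_product(root):
    """Map product names to subcellular locations; None means no <molecule>."""
    result = {}
    for comment in root.iter("comment"):
        if comment.get("type") != "subcellular location":
            continue
        product = comment.findtext("molecule")  # None when the element is absent
        for location in comment.iter("location"):
            result.setdefault(product, []).append(location.text)
    return result
```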

cofactor

Example: P26662

<feature type="chain" description="Serine protease NS3"
                      id="PRO_0000037644" evidence="4">
  <location>
    <begin position="1027"/>
    <end position="1657"/>
  </location>
</feature>
<feature type="chain" description="Non-structural protein 4A"
                      id="PRO_0000037645" evidence="4">
  <location>
    <begin position="1658"/>
    <end position="1711"/>
  </location>
</feature>
<comment type="cofactor">
  <molecule>Serine protease NS3</molecule>
  <cofactor evidence="14">
    <name>Zn(2+)</name>
    <dbReference type="ChEBI" id="CHEBI:29105"/>
  </cofactor>
  <text evidence="14">Binds 1 zinc ion.</text>
</comment>
<comment type="cofactor">
  <molecule>Non-structural protein 5A</molecule>
  <cofactor evidence="3">
    <name>Zn(2+)</name>
    <dbReference type="ChEBI" id="CHEBI:29105"/>
  </cofactor>
  <text evidence="3">Binds 1 zinc ion in the NS5A N-terminal domain.</text>
</comment>

featureType

The featureType has a mandatory location element of locationType to describe the sequence coordinates of an annotation.

Example: P84077

<feature type="initiator methionine" description="Removed" evidence="6 7 21 22">
  <location>
    <position position="1"/>
  </location>
</feature>
<feature type="chain" description="ADP-ribosylation factor 1" id="PRO_0000207378">
  <location>
    <begin position="2"/>
    <end position="181"/>
  </location>
</feature>
...
<feature type="nucleotide phosphate-binding region" description="GTP" evidence="1 2 3 4 25">
  <location>
    <begin position="126"/>
    <end position="129"/>
  </location>
</feature>
...
<feature type="sequence variant" description="In PVNH8; decreased interaction with GGA3; dbSNP:rs879036238." id="VAR_081272" evidence="23">
  <original>Y</original>
  <variation>H</variation>
  <location>
    <position position="35"/>
  </location>
</feature>
...
<feature type="helix" evidence="1">
  <location>
    <begin position="6"/>
    <end position="9"/>
  </location>
</feature>

The locationType has an optional sequence attribute that is only set (to an isoform ID) when the coordinates are not for the canonical sequence.

Example: P12821

Previous representation:

<comment type="alternative products">
  ...
  <isoform>
    <id>P12821-3</id>
    <id>P22966-1</id>
    <name>Testis-specific</name>
    <name>ACE-T</name>
    <sequence type="described" ref="VSP_035120 VSP_035121"/>
    <text>Variant in position: 32:S->P (in dbSNP:rs4317). Variant in position: 49:S->G (in dbSNP:rs4318).</text>
  </isoform>
  ...
</comment>
...
<feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
  <original>A</original>
  <variation>T</variation>
  <location>
    <position position="154"/>
  </location>
</feature>

New representation:

<comment type="alternative products">
  ...
  <isoform>
    <id>P12821-3</id>
    <id>P22966-1</id>
    <name>Testis-specific</name>
    <name>ACE-T</name>
    <sequence type="described" ref="VSP_035120 VSP_035121"/>
  </isoform>
  ...
</comment>
...
<feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
  <original>A</original>
  <variation>T</variation>
  <location>
    <position position="154"/>
  </location>
</feature>
...
<feature type="sequence variant" description="In dbSNP:rs4317." id="VAR_x">
  <original>S</original>
  <variation>P</variation>
  <location sequence="P12821-3">
    <position position="32"/>
  </location>
</feature>
<feature type="sequence variant" description="In dbSNP:rs4318." id="VAR_y">
  <original>S</original>
  <variation>G</variation>
  <location sequence="P12821-3">
    <position position="49"/>
  </location>
</feature>
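The practical consequence for XML consumers is that a `<location>` with no `sequence` attribute refers to the canonical sequence, while one carrying `sequence="P12821-3"` refers to that isoform. The following Python sketch groups sequence variants by the sequence their coordinates apply to. It is a minimal illustration, not UniProt tooling. The function name `variants_by_sequence` and the `"canonical"` label are hypothetical. The fragment is trimmed from the P12821 example above and, unlike real UniProtKB XML files, declares no default namespace.

```python
import xml.etree.ElementTree as ET

# Trimmed from the P12821 example; real UniProtKB XML declares a default
# namespace, which this sketch deliberately ignores.
FRAGMENT = """
<entry>
  <feature type="sequence variant" description="In dbSNP:rs13306087." id="VAR_029139">
    <original>A</original>
    <variation>T</variation>
    <location><position position="154"/></location>
  </feature>
  <feature type="sequence variant" description="In dbSNP:rs4317." id="VAR_x">
    <original>S</original>
    <variation>P</variation>
    <location sequence="P12821-3"><position position="32"/></location>
  </feature>
</entry>
"""

def variants_by_sequence(xml_text, canonical="canonical"):
    """Group sequence variants by the sequence their coordinates refer to.

    A <location> without a 'sequence' attribute refers to the canonical
    sequence, so such variants are filed under the `canonical` label.
    """
    root = ET.fromstring(xml_text.strip())
    grouped = {}
    for feature in root.iter("feature"):
        if feature.get("type") != "sequence variant":
            continue
        location = feature.find("location")
        target = location.get("sequence", canonical)
        position = int(location.find("position").get("position"))
        change = (feature.findtext("original"), position, feature.findtext("variation"))
        grouped.setdefault(target, []).append((feature.get("id"), change))
    return grouped
```

For example, `variants_by_sequence(FRAGMENT)["P12821-3"]` yields the isoform-specific variant at position 32, and VAR_029139 is filed under the canonical sequence.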

RDF format

The UniProt RDF schema ontology already made it possible to describe the product to which an annotation applies, so no changes were required for this purpose.

The RDF format has a single hierarchy of Annotation classes with various intermediary classes. The subclass Sequence_Annotation groups all classes that refer to a location on a protein sequence. This location is represented with FALDO and always indicates the FALDO reference sequence for the location (the RDF format makes no special case for a canonical sequence). Annotations that do not refer to a specific location on a protein sequence, but that apply to a given product, describe the sequence of this product with a sequence property. The object of this property may be a Sequence or a Chain_Annotation / Peptide_Annotation that describes a sequence that is the product of proteolytic processing.

Please note that the change of mass spectrometry annotations required an adaptation of the hierarchy of Annotation classes: The Mass_Spectrometry_Annotation class is no longer an rdfs:subClassOf of the Sequence_Annotation class, but a direct rdfs:subClassOf of the Annotation class.

Example: Q13421

@prefix up: <http://purl.uniprot.org/core/> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
@prefix isoform: <http://purl.uniprot.org/isoforms/> .
@prefix annotation: <http://purl.uniprot.org/annotation/> .
@prefix faldo: <http://biohackathon.org/resource/faldo#> .

uniprot:Q13421
  up:annotation
    annotation:PRO_0000253560 ,
    <Q13421#SIPADAC7D651EFC09CC> ,
    <Q13421#SIP307BEB951103B073> ,
    <Q13421#SIPB6746E472B99B031> ,
    ...
  up:sequence
    isoform:Q13421-1 ,
    isoform:Q13421-3 ,
    isoform:Q13421-2 ,
    isoform:Q13421-4 .

annotation:PRO_0000253560
  rdf:type up:Chain_Annotation ;
  rdfs:comment "Megakaryocyte-potentiating factor" ;
  up:range range:22853569102360878tt37tt286 .
range:22853569102360878tt37tt286
  rdf:type faldo:Region ;
  faldo:begin position:22853569102360878tt37 ;
  faldo:end position:22853569102360878tt286 .
position:22853569102360878tt37
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 37 ;
  faldo:reference isoform:Q13421-1 .
position:22853569102360878tt286
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 286 ;
  faldo:reference isoform:Q13421-1 .

<Q13421#SIPADAC7D651EFC09CC>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIP04927440DF8EB941> ,
               <Q13421#SIP727DF431EB6C89EC> .

<Q13421#SIP307BEB951103B073>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIPD59D33F5047A94FD> ;
  up:sequence annotation:PRO_0000253560 .

<Q13421#SIPB6746E472B99B031>
  rdf:type up:Subcellular_Location_Annotation ;
  up:locatedIn <Q13421#SIPD59D33F5047A94FD> ;
  up:sequence isoform:Q13421-2 .

isoform:Q13421-1
  rdf:type up:Simple_Sequence ;
  up:modified "2006-10-17"^^xsd:date ;
  up:version 2 ;
  up:precursor true ;
  up:mass 68986 ;
  up:crc64Checksum "FA17E3609B6CC9CA"^^xsd:token ;
  up:name "1" ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .
isoform:Q13421-3
  rdf:type up:Modified_Sequence ;
  up:name "2" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021059 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .
isoform:Q13421-2
  rdf:type up:Modified_Sequence ;
  up:name "3" , "SMRP" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021059 , annotation:VSP_021060 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LRAPLPC" .
isoform:Q13421-4
  rdf:type up:Modified_Sequence ;
  up:name "4" ;
  up:basedOn isoform:Q13421-1 ;
  up:modification annotation:VSP_021058 , annotation:VSP_021059 ;
  rdf:value "MALPTARPLLGSCGTPALGSLLFLLFSL ... LLASTLA" .
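In practice these triples would be loaded with an RDF library and queried with SPARQL. To keep the example self-contained, the Python sketch below instead hand-transcribes a few triples from the Q13421 example as (subject, predicate, object) tuples in prefixed form. It then reports which product each subcellular location annotation applies to. The names `product_of` and `PROCESSED_PRODUCT_TYPES` are hypothetical. The sketch also rests on one assumption: an annotation with no `up:sequence` property is treated as applying to the entry as a whole.

```python
# A few triples from the Q13421 example, transcribed by hand in prefixed form.
TRIPLES = [
    ("annotation:PRO_0000253560", "rdf:type", "up:Chain_Annotation"),
    ("<Q13421#SIPADAC7D651EFC09CC>", "rdf:type", "up:Subcellular_Location_Annotation"),
    ("<Q13421#SIP307BEB951103B073>", "rdf:type", "up:Subcellular_Location_Annotation"),
    ("<Q13421#SIP307BEB951103B073>", "up:sequence", "annotation:PRO_0000253560"),
    ("<Q13421#SIPB6746E472B99B031>", "rdf:type", "up:Subcellular_Location_Annotation"),
    ("<Q13421#SIPB6746E472B99B031>", "up:sequence", "isoform:Q13421-2"),
    ("isoform:Q13421-2", "rdf:type", "up:Modified_Sequence"),
]

# Objects of up:sequence that describe products of proteolytic processing.
PROCESSED_PRODUCT_TYPES = {"up:Chain_Annotation", "up:Peptide_Annotation"}

def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def product_of(triples, annotation):
    """Classify the product an annotation applies to.

    Assumption: no up:sequence means the annotation applies to the entry as a
    whole; if several up:sequence values exist, only the first is inspected.
    """
    targets = objects(triples, annotation, "up:sequence")
    if not targets:
        return ("entry", None)
    target = targets[0]
    if PROCESSED_PRODUCT_TYPES & set(objects(triples, target, "rdf:type")):
        return ("processed product", target)
    return ("sequence", target)
```

With these triples, `<Q13421#SIP307BEB951103B073>` resolves to the chain PRO_0000253560, while `<Q13421#SIPB6746E472B99B031>` resolves to isoform Q13421-2.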

Cross-references to RNAct

Cross-references have been added to RNAct, a database of protein-RNA interaction predictions for model organisms with supporting experimental data.

RNAct is available at https://rnact.crg.eu.

The format of the explicit links is:

Resource abbreviation RNAct
Resource identifier UniProtKB accession number
Optional information 1 Molecule type

Example: Q9Y2I1

Show all entries having a cross-reference to RNAct.

Text format

Example: Q9Y2I1

DR   RNAct; Q9Y2I1; protein.
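DR lines like the one above share a simple layout: "DR", three spaces, then semicolon-separated fields ending in a period, with "-" marking an empty optional field. The following Python sketch splits such a line. It is an illustration only. The function name `parse_dr_line` is hypothetical. The sketch does not handle the isoform-specific variant of the format described in release 2014_03, nor optional fields that themselves contain semicolons.

```python
def parse_dr_line(line):
    """Split a flat-file DR line into (database, identifier, optional fields).

    Assumes 'DR' + three spaces, '; '-separated fields and a final period;
    a '-' optional field is returned as None.
    """
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    database, identifier, *extra = [f.strip() for f in body.split(";")]
    extra = [None if f == "-" else f for f in extra]
    return database, identifier, extra
```

For example, `parse_dr_line("DR   RNAct; Q9Y2I1; protein.")` returns `("RNAct", "Q9Y2I1", ["protein"])`.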

XML format

Example: Q9Y2I1

<dbReference type="RNAct" id="Q9Y2I1">
  <property type="molecule type" value="protein"/>
</dbReference>

RDF format

Example: Q9Y2I1

uniprot:Q9Y2I1
  rdfs:seeAlso <http://purl.uniprot.org/rnact/Q9Y2I1> .
<http://purl.uniprot.org/rnact/Q9Y2I1>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/RNAct> ;
  rdfs:comment "protein" .

Change of the cross-references to Pharos

We have introduced an additional field in the cross-references to the Pharos database to indicate the development status of a target. Targets are categorized into four development/druggability levels (TDLs), ranging from Tclin for approved drugs with known mechanisms of action, to Tdark for targets about which virtually nothing is known.

Text format

Example: P33151

DR   Pharos; P33151; Tbio.

XML format

Example: P33151

<dbReference type="Pharos" id="P33151">
  <property type="development level" value="Tbio"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
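The code change typically amounts to reading the new `property` child and tolerating records that lack it. The following Python sketch shows one way to do this with the standard library. It is an assumption-laden illustration: the function name `pharos_level` is hypothetical, and the input is a namespace-free `<dbReference>` fragment like the examples in this section.

```python
import xml.etree.ElementTree as ET

def pharos_level(db_reference_xml):
    """Return the Pharos development level (Tclin, Tchem, Tbio or Tdark),
    or None when the cross-reference carries no 'development level' property,
    as in records written before this change."""
    ref = ET.fromstring(db_reference_xml.strip())
    if ref.get("type") != "Pharos":
        raise ValueError("not a Pharos cross-reference")
    for prop in ref.findall("property"):
        if prop.get("type") == "development level":
            return prop.get("value")
    return None
```

Returning `None` rather than raising lets the same code read both the old and the new representation.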

RDF format

Example: P33151

uniprot:P33151
  rdfs:seeAlso <http://purl.uniprot.org/pharos/P33151> .
<http://purl.uniprot.org/pharos/P33151>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Pharos> ;
  rdfs:comment "Tbio" .

Removal of the cross-references to PMAP-CutDB

Cross-references to PMAP-CutDB have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Cross-link' ('CROSSLNK' in the flat file):

  • 6-(S-cysteinyl)-8alpha-(pros-histidyl)-FAD (Cys-His)

Changes to keywords

Deleted keyword:

  • Complete proteome

Proteomes changes

The UniProt Proteomes portal offers protein sequence sets obtained from the translation of sequenced genomes. Published genomes from NCBI Genome used to be brought into UniProt if they satisfied the following criteria:

  • The genome is annotated and a set of coding sequences is available.
  • The number of predicted coding sequences falls within a statistically significant range of published proteomes from neighbouring species.

We have changed these criteria so as to publish all proteomes that can be derived from NCBI genomes that are not considered low-quality assemblies. To determine which proteomes to bring into UniProtKB, we now use a subset of the RefSeq reasons for excluding a genome assembly, and we give the reason(s) why a proteome is excluded from UniProtKB. We also provide two metrics to help users assess the quality of a proteome:

  • A score obtained with the BUSCO software.
  • A score, "Complete Proteome Detector (CPD)", based on the number of coding sequences expected from neighbouring species.

The "Complete proteome" keyword was removed from all UniProtKB entries. Individual proteomes can be retrieved from the UniProt website by their unique proteome identifier, e.g. UP000005640.

UniProt release 2019_10

Published November 13, 2019

Headline

A scorpion venom toxin may help unravel the mystery of chronic pain

The old saying goes 'an ounce of prevention is worth a pound of cure', and indeed, our body has developed various strategies to alert us to potential dangers. One contributor to this strategy is TRPA1, also called the 'wasabi receptor'. TRPA1, a member of the transient receptor potential (TRP) family, is a plasma membrane cation channel expressed by primary afferent sensory neurons. It is activated by chemically reactive electrophiles present in a range of environmental irritants and endogenous inflammatory agents. Cigarette smoke, for example, is rich in reactive electrophiles that can trigger TRPA1 in the cells that line the airways, inducing coughing and sustained airway inflammation. Some plants, such as mustard, wasabi or onions, have evolved compounds that activate TRPA1, possibly to ward off animals that might otherwise eat them. In this context, TRPA1 activation is responsible for the sinus-jolting sting of wasabi and the flood of tears associated with chopping onions.

Plants are not the only organisms that produce TRPA1-activating compounds. Black rock scorpions do too, as reported in a recent publication by Lin King et al. This comes as a surprise. Most animal toxins identified so far target voltage-gated ion channels, and the few known to act on TRP channels all activate the capsaicin receptor, TRPV1. The newly discovered black rock scorpion toxin has been called Wasabi receptor toxin, or WaTx. In its mature form, it is a 19-amino-acid peptide with the amazing ability to penetrate cells by passive diffusion. This property is not unique to WaTx: other proteins, such as HIV Tat or Drosophila penetratin, also share it, but WaTx has no sequence similarity to them.

Once in the cell, WaTx binds TRPA1 at the same site as plant and environmental irritants, but the similarity ends there. Reactive electrophiles covalently bind TRPA1 and produce a large increase in the probability of channel opening, characterized by brief transitions between open and closed states. This results in the influx of sodium and calcium ions. The influx of Ca(2+), in turn, causes the exocytosis of dense-core vesicles and the release of calcitonin-gene-related peptide (CGRP) and substance P, and ultimately induces neurogenic inflammation. By contrast, the non-covalent binding of WaTx to TRPA1 stabilizes the open state of the channel and prolongs open time. Consequently, WaTx induces neuronal depolarization and subsequent hypersensitivity, which is characteristic of chronic pain. In addition, it decreases the relative Ca(2+) permeability of the channel, so the resulting Ca(2+) influx is not sufficient to trigger CGRP release and does not cause any inflammation. These observations show a striking convergent evolution between plants and animals in terms of binding site, resulting, however, in a very different modulation of cation channel activity and a distinct outcome in terms of inflammation.

TRPA1 is expressed in virtually every animal, from worms to humans, but WaTx only activates mammalian orthologs. Why is this so? It is difficult to say. Black rock scorpions feed on insects like cockroaches and beetles, as well as other small invertebrates such as millipedes, centipedes, spiders and, rarely, earthworms, but never on mammals. WaTx may therefore have a deterrent role aimed specifically at mammalian predators.

One thing is certain: with WaTx, scorpions provide us with a powerful tool to study the central neural pathways contributing to chronic pain and to investigate the link between chronic pain and inflammation. TRPA1 is emerging as a potential target for new classes of non-opioid analgesics to treat chronic pain.

As of this release, WaTx has been annotated and is painlessly available in UniProtKB/Swiss-Prot.

UniProtKB news

Removal of the cross-references to EcoGene

Cross-references to EcoGene have been removed.

Change of the cross-references to DisProt

Cross-references to DisProt may now be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Example: Q9NQC3

Changes to the controlled vocabulary of human diseases

New diseases:

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2019_09

Published October 16, 2019

Headline

Biological weapons in the struggle for life

The apoplast, the extracellular space between plant cells, is a battleground between plants and attacking microorganisms. The battle often starts with the secretion of carbohydrate-degrading enzymes by the invader and a counter-attack by the plant with dedicated enzyme inhibitors. This happens for instance when Phytophthora sojae invades soybean. During very early infection stages (20 minutes to 2 hours), the oomycete expresses and secretes high levels of a xyloglucanase enzyme, called XEG1. XEG1 degrades the plant cell wall polymers and not only permits pathogen invasion, but also provides it with nutrients. XEG1 expression declines rapidly from 3 hours onwards.

Like other plants, soybean monitors the apoplastic environment through pattern recognition receptors and recognizes XEG1 as a PAMP (pathogen-associated molecular pattern). This recognition triggers defense responses, which include the secretion of GIP1, a xyloglucanase inhibitor. Efficient XEG1 inhibition by GIP1 increases soybean resistance towards P. sojae. This could be the end of the story, but P. sojae has another trick up its sleeve.

When Ma et al. studied GIP1 binding to XEG1 paralogs, they retrieved only one protein, called XLP1, in addition to XEG1. Like XEG1, XLP1 is targeted to the apoplast. It also shows a similar expression time course to XEG1, i.e. high levels during very early infection stages, followed by a rapid decline, suggesting a role in virulence. Unlike XEG1, however, XLP1 does not show any xyloglucanase activity. It is 52 residues shorter and is missing one of the residues critical for xyloglucanase activity. XLP1 contributes to P. sojae infectivity, but only in the presence of active XEG1. Thus XLP1 acts as a decoy to disrupt plant defenses. It interacts with GIP1 with a five-fold higher affinity than that of XEG1 and hence neutralizes the GIP1 inhibitor. In this setting, XEG1 can pursue plant cell wall digestion without any hindrance.

The 3 belligerent proteins have been annotated in UniProtKB/Swiss-Prot and are publicly available.

UniProtKB news

Change of annotation topic 'Sequence caution'

The annotation topic Sequence caution reports differences between the protein sequence shown in a UniProtKB entry and other available protein sequences derived from the same gene. It indicates the likely cause of the differences. When that cause was a frameshift or an erroneous termination, the amino acid sequence position(s) of these errors used to be listed when possible. Since it is now easy to align two protein sequences for comparison, we no longer curate error positions and have removed the field where this information was stored.

Text format

We removed the optional Positions field.

Example: P14332

Previous format:

CC   -!- SEQUENCE CAUTION: Sequence=CAA34633.1; Type=Frameshift; Positions=226, 249; Evidence={ECO:0000305};

New format:

CC   -!- SEQUENCE CAUTION: Sequence=CAA34633.1; Type=Frameshift; Evidence={ECO:0000305};
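A parser written against the old layout keeps working if it simply ignores the `Positions` field. The following Python sketch accepts both layouts shown above. The function name is hypothetical. The sketch assumes the whole comment fits on a single CC line, whereas real entries may wrap it over several lines.

```python
def parse_sequence_caution(line):
    """Parse a single-line SEQUENCE CAUTION comment into a dict of fields.

    Works for both the previous format (with Positions=...) and the new one;
    the Positions field, no longer curated, is discarded if present.
    """
    prefix = "CC   -!- SEQUENCE CAUTION:"
    if not line.startswith(prefix):
        raise ValueError("not a SEQUENCE CAUTION line: %r" % line)
    fields = {}
    for part in line[len(prefix):].split(";"):
        part = part.strip()
        if not part:
            continue  # trailing ';' leaves an empty chunk
        key, _, value = part.partition("=")
        fields[key] = value
    fields.pop("Positions", None)
    return fields
```

Both example lines from this section parse to `{"Sequence": "CAA34633.1", "Type": "Frameshift", "Evidence": "{ECO:0000305}"}`.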

XML format

This change did not affect the UniProtKB XSD.

Example: P14332

Previous format:

<comment type="sequence caution" evidence="3">
  <conflict type="frameshift">
    <sequence resource="EMBL-CDS" id="CAA34633" version="1"/>
  </conflict>
  <location>
    <position position="226"/>
  </location>
  <location>
    <position position="249"/>
  </location>
</comment>

New format:

<comment type="sequence caution" evidence="3">
  <conflict type="frameshift">
    <sequence resource="EMBL-CDS" id="CAA34633" version="1"/>
  </conflict>
</comment>

RDF format

This change required an adaptation of the hierarchy of Annotation classes in the UniProt RDF schema ontology: The Sequence_Caution_Annotation class is no longer an rdfs:subClassOf of the Sequence_Annotation class, but a direct rdfs:subClassOf of the Annotation class.

Example: P14332

Previous format:

uniprot:P14332
  up:annotation <P14332#SIP7159608509D280BB> .

<P14332#SIPBBAF3CC29FCD3715>
  rdf:type up:Frameshift_Annotation ;
  up:conflictingSequence <P14332#SIPD8B2EDFEB46FA203> ;
  up:range range:22572098403906094tt226tt226 ,
           range:22572098403906094tt249tt249 .
range:22572098403906094tt226tt226
  rdf:type faldo:Region ;
  faldo:begin position:22572098403906094tt226 ;
  faldo:end position:22572098403906094tt226 .
position:22572098403906094tt226
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 226 ;
  faldo:reference isoform:P14332-1 .
range:22572098403906094tt249tt249
  rdf:type faldo:Region ;
  faldo:begin position:22572098403906094tt249 ;
  faldo:end position:22572098403906094tt249 .
position:22572098403906094tt249
  rdf:type faldo:Position , faldo:ExactPosition ;
  faldo:position 249 ;
  faldo:reference isoform:P14332-1 .

New format:

uniprot:P14332
  up:annotation <P14332#SIP7159608509D280BB> .

<P14332#SIPBBAF3CC29FCD3715>
  rdf:type up:Frameshift_Annotation ;
  up:conflictingSequence <P14332#SIPD8B2EDFEB46FA203> .

Cross-references to PlantReactome

Cross-references have been added to PlantReactome, a curated resource of core pathways and reactions in plant biology.

PlantReactome is available at https://plantreactome.gramene.org.

The format of the explicit links is:

Resource abbreviation PlantReactome
Resource identifier Resource identifier
Optional information Pathway name

Example: P0C128

Show all entries having a cross-reference to PlantReactome.

Text format

Example: P0C128

DR   PlantReactome; R-OSA-5608118; Auxin signalling.

XML format

Example: P0C128

<dbReference type="PlantReactome" id="R-OSA-5608118">
  <property type="pathway name" value="Auxin signalling"/>
</dbReference>

RDF format

Example: P0C128

uniprot:P0C128
  rdfs:seeAlso <http://purl.uniprot.org/plantreactome/R-OSA-5608118> .
<http://purl.uniprot.org/plantreactome/R-OSA-5608118>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/PlantReactome> ;
  rdfs:comment "Auxin signalling" .

Change of the cross-references to Reactome

Cross-references to Reactome may now be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Example: P00167

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Mental retardation, X-linked 17

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Cross-link' ('CROSSLNK' in the flat file):

  • N5-[4-(S-L-cysteinyl)-5-methyl-1H-imidazol-2-yl]-L-ornithine (Arg-Cys) (interchain with C-...)
  • N5-[4-(S-L-cysteinyl)-5-methyl-1H-imidazol-2-yl]-L-ornithine (Cys-Arg) (interchain with R-...)

New term for the feature key 'Glycosylation' ('CARBOHYD' in the flat file):

  • N-linked (Glc) (glycation) arginine

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • S-(2-succinyl)cysteine
  • N6-carbamoyllysine
  • S-(2,3-dicarboxypropyl)cysteine
  • S-cGMP-cysteine

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2019_08

Published September 18, 2019

Headline

Magnetic personalities

Magnetotactic bacteria sense and align to the Earth's magnetic field; in the presence of oxygen, they swim north in the Northern Hemisphere and south in the Southern Hemisphere. This amazing ability to sense the Earth's magnetic field is provided by small organelles, called magnetosomes, formed by iron nanocrystals of either magnetite (Fe3O4) or greigite (Fe3S4) surrounded by a phospholipid bilayer. Generally, magnetosomes form chains that align along the long axis of the cell using a dedicated actin-like cytoskeletal structure.

Image source: Frank Mickoleit, CC BY-SA 3.0

Magnetosome formation is a complex process, which includes invagination of the cell inner membrane to form vesicles, iron ion uptake, crystal biomineralization and magnetosome chain assembly. It involves a large number of proteins, encoded by genes clustered in a magnetosome island of approximately 100 kb. Among all proteins involved in magnetosome formation, one of the most important is MamB, a probable iron transporter with a role in both vesicle formation and biomineralization. MamB is stabilized by heterodimerization with MamM. Studies in the genetically tractable Magnetospirillum magneticum (strain AMB-1) and Magnetospirillum gryphiswaldense (strain MSR) have pinpointed the function of many more proteins. For instance, MamA forms a scaffold to which other proteins attach on the organelle's exterior. MamI aids in magnetite nucleation, while MamH is another probable iron transporter. MamN may control the pH of the magnetosome lumen. Four redox-active multi-heme proteins (MamP, MamT, MamX and MamE) are probably involved in correct iron oxidation; MamE is also a protease necessary for magnetosome protein maturation. Some proteins positively regulate crystal size (including MamC, MamD, MamG and MamF), while others negatively regulate it (Mms36 and Mms48). Finally, MamK is an actin-like protein involved in organelle positioning, along with MamJ.

The interest in magnetosomes goes far beyond the understanding of these fascinating bacteria. Magnetosomes may be instrumental in improving magnetic nanoparticle biotechnologies. Purified bacterial magnetosomes are magnetic nanoparticles with exceptionally well-defined characteristics, owing to the precise control exerted during all stages of biogenesis. They also have several unprecedented properties, such as high crystallinity, strong magnetization, and a uniform distribution of shape and size that cannot be replicated by synthesis using abiotic processes. In the biomedical field, promising results suggest that magnetosomes could be used in medical imaging, targeted drug delivery and tumor hyperthermia. In the context of wastewater treatment, it has been shown that heavy metal ions can be adsorbed onto magnetosome-producing microorganisms and then removed by magnetic separation. In addition, magnetosomes may help us learn more about the origin of life and the evolution of membrane-bound eukaryotic organelles.

As of this release nearly 70 magnetosome proteins have been annotated and can be retrieved using the term magnetosome.

UniProtKB news

Cross-references to DrugCentral

Cross-references have been added to DrugCentral, an online drug information resource providing information on active ingredients, chemical entities, pharmaceutical products, drug modes of action, indications and pharmacologic action.

DrugCentral is available at http://drugcentral.org.

The format of the explicit links is:

Resource abbreviation DrugCentral
Resource identifier UniProtKB accession number

Example: P35372

Show all entries having a cross-reference to DrugCentral.

Text format

Example: P35372

DR   DrugCentral; P35372; -.

XML format

Example: P35372

<dbReference type="DrugCentral" id="P35372"/>

RDF format

Example: P35372

uniprot:P35372
  rdfs:seeAlso <http://purl.uniprot.org/drugcentral/P35372> .
<http://purl.uniprot.org/drugcentral/P35372>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/DrugCentral> .

Cross-references to Pharos

Cross-references have been added to Pharos, a user interface to the knowledge-base for the Druggable Genome (DG), whose goal is to illuminate the uncharacterized and/or poorly annotated portion of the DG, focusing on three of the most commonly drug-targeted protein families: G-protein-coupled receptors (GPCRs), ion channels (ICs) and kinases.

Pharos is available at https://pharos.nih.gov.

The format of the explicit links is:

Resource abbreviation Pharos
Resource identifier UniProtKB accession number

Example: Q7Z3E2

Show all entries having a cross-reference to Pharos.

Text format

Example: Q7Z3E2

DR   Pharos; Q7Z3E2; -.

XML format

Example: Q7Z3E2

<dbReference type="Pharos" id="Q7Z3E2"/>

RDF format

Example: Q7Z3E2

uniprot:Q7Z3E2
  rdfs:seeAlso <http://purl.uniprot.org/pharos/Q7Z3E2> .
<http://purl.uniprot.org/pharos/Q7Z3E2>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Pharos> .

Cross-references to MassIVE

Cross-references have been added to MassIVE, a community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data and to provide a reusable, community-scale aggregation of peptide and protein observations.

MassIVE is available at https://massive.ucsd.edu/.

The format of the explicit links is:

Resource abbreviation MassIVE
Resource identifier UniProtKB accession number

Example: Q8IY92

Show all entries having a cross-reference to MassIVE.

Text format

Example: Q8IY92

DR   MassIVE; Q8IY92; -.

XML format

Example: Q8IY92

<dbReference type="MassIVE" id="Q8IY92"/>

RDF format

Example: Q8IY92

uniprot:Q8IY92
  rdfs:seeAlso <http://purl.uniprot.org/massive/Q8IY92> .
<http://purl.uniprot.org/massive/Q8IY92>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MassIVE> .

Changes to the controlled vocabulary of human diseases

New diseases:

Changes in subcellular location controlled vocabulary

New subcellular locations:

Modified subcellular location:

UniRef news

Change of UniRef clustering method from CD-HIT to MMseqs2

We have switched the clustering program for UniRef90 and UniRef50 from CD-HIT to MMseqs2 (Steinegger M. and Soeding J., Nat. Commun. 9 (2018)).

The clustering algorithm remains "Greedy Incremental Clustering" with the same parameters (thanks to the MMseqs2 authors for making this available). UniRef100 was not affected.
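The idea behind greedy incremental clustering can be illustrated in a few lines. Sequences are processed from longest to shortest. Each one joins the first existing cluster whose representative it matches at or above the identity threshold; otherwise, it becomes the representative of a new cluster. The Python sketch below is a toy version only. The function names are hypothetical, and the `identity` function is a deliberate simplification (ungapped position matching) standing in for the alignment-based scores that CD-HIT and MMseqs2 compute. Additional parameters of the real UniRef procedure, such as coverage constraints, are omitted.

```python
def identity(a, b):
    """Toy identity: fraction of matching positions over the shorter sequence,
    compared without alignment (real tools align the sequences)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter

def greedy_incremental_clustering(sequences, threshold):
    """Cluster {name: sequence}; returns {representative: [member names]}.

    Longest sequences are processed first; each sequence joins the first
    representative it matches at >= threshold, otherwise it founds a cluster.
    """
    order = sorted(sequences, key=lambda name: len(sequences[name]), reverse=True)
    clusters = {}
    for name in order:
        seq = sequences[name]
        for rep in clusters:
            if identity(sequences[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

Because the longest sequence in each cluster is seen first, it naturally becomes the cluster representative.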

UniProt XML news

Removal of whitespace characters in the XML amino acid sequence representations

For historical reasons, the <sequence> elements of the UniProtKB, UniParc and UniRef XML representations formatted the amino acid sequence with spaces and newlines. These whitespace characters had to be removed before parsing with native XML tools. To avoid this complication, we have removed all whitespace characters from the <sequence> elements, so that they contain only IUPAC amino acid codes.
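Code that may still encounter older files can strip whitespace defensively. On files produced after this change, the operation does nothing. The following Python sketch is illustrative only, and the function name `sequence_text` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def sequence_text(sequence_element):
    """Return the residues of a <sequence> element with all whitespace removed.

    Older files may contain spaces and newlines inside <sequence>; for files
    that already contain only residue codes this returns the text unchanged.
    """
    return re.sub(r"\s+", "", sequence_element.text or "")
```

For example, `sequence_text(ET.fromstring("<sequence>MKT AYI\n AKQ</sequence>"))` returns `"MKTAYIAKQ"`.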

UniProt release 2019_07

Published July 31, 2019

Headline

The enemy of my enemy is my friend

When prey encounters predator, escape is often the best form of defense. Unfortunately, this is not an option for plants, which have instead evolved a range of chemical defenses to deter herbivores, such as insects and other arthropods. These chemical defenses include compounds that are toxic to herbivores (direct defense) as well as compounds that attract herbivore predators (indirect defense).

Indirect defense has been extensively studied in maize. Upon foliar damage by lepidopteran larvae, maize releases a complex volatile terpenoid mixture, which attracts parasitic wasps, like Cotesia marginiventris or Cotesia sesamiae. These wasps deposit eggs in the lepidopteran larvae, which leads to an understandable loss of appetite by the lepidopterans, and eventually their death, when the wasp finally emerges from its host. These volatile terpenoids can also 'prime' neighboring plants, causing them to increase the transcription of defense-related genes and respond faster and more vigorously to subsequent herbivore attacks.

The volatile terpenoid mixture produced by maize under lepidopteran attack includes one sesquiterpene of special interest, (E)-beta-caryophyllene (https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:10357). This same sesquiterpene is also produced by roots upon damage by root-feeding pests, such as the western corn rootworm (Diabrotica virgifera virgifera). Here it attracts entomopathogenic nematodes, which also live parasitically inside the infected insect larvae. (E)-beta-caryophyllene is produced by the (E)-beta-caryophyllene synthase encoded by TPS23, which catalyzes the cyclization of farnesyl diphosphate to (E)-beta-caryophyllene. As expected for a gene involved in plant defense, TPS23 gene expression is tightly regulated. Herbivore-induced leaf damage increases expression in leaves, but not in roots, while attack by root-feeding pests increases expression in roots, but not in shoots. TPS23 transcript levels correlate with the production of (E)-beta-caryophyllene, and maize lines with high expression are more resistant to herbivores than those with low expression.

(E)-beta-caryophyllene is just one of a host of chemical deterrents that plants deploy against herbivores and other pests: the UniProt plant annotation program aims to capture knowledge of the relevant biochemical pathways using Rhea and ChEBI, as well as many other aspects of plant biology too. Find out more about maize TPS23 in the latest updated version of UniProtKB/Swiss-Prot.

UniProtKB news

Cross-references to ChEBI in the ptmlist.txt document file

The ptmlist.txt document, which is available by FTP and on the website, describes post-translational modifications (PTMs) annotated in the UniProt knowledgebase. This release sees the addition of optional cross-references from ptmlist.txt to ChEBI (Chemical Entities of Biological Interest), a freely available dictionary of molecular entities focused on small chemical compounds and derivatives, including modified residues.

Example:

ID   (3R)-3-hydroxyarginine
AC   PTM-0476
FT   MOD_RES
..
KW   Hydroxylation.
DR   ChEBI; CHEBI:78294.
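A local PTM-to-ChEBI lookup can be built from ptmlist.txt by pairing each entry's AC line with its optional "DR   ChEBI" line. The Python sketch below does this. The function name is hypothetical. The sketch assumes that entries end with a "//" line, as in other UniProt flat files, and that each entry has at most one ChEBI cross-reference. Entries without one are simply left out.

```python
def ptm_chebi_map(ptmlist_text):
    """Map PTM accessions (AC lines) to ChEBI IDs ('DR   ChEBI;' lines).

    Assumes '//'-terminated entries with at most one ChEBI reference each.
    """
    mapping = {}
    accession = None
    for line in ptmlist_text.splitlines():
        if line.startswith("AC   "):
            accession = line[5:].strip()
        elif line.startswith("DR   ChEBI;") and accession:
            chebi_id = line[len("DR   ChEBI;"):].strip().rstrip(".")
            mapping[accession] = chebi_id
        elif line.startswith("//"):
            accession = None  # end of entry
    return mapping
```

Applied to the (3R)-3-hydroxyarginine entry above, it yields `{"PTM-0476": "CHEBI:78294"}`.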

This new mapping to ChEBI will facilitate the integration of data on PTMs with knowledge of enzymatic reactions described in UniProt using the Rhea knowledgebase of biochemical reactions (itself built on ChEBI). The following query allows users to find enzymes in UniProtKB that are capable of creating the modified (3R)-3-hydroxyarginine residue (PTM-0476, CHEBI:78294):

annotation:(type:"catalytic activity" chebi:"(3R)-3-hydroxy-L-arginine residue [78294]")

We have currently mapped over 120 of the most common PTMs in UniProtKB to ChEBI and will continue to add new cross-references to ChEBI in forthcoming releases. This mapping of PTMs to ChEBI is part of our ongoing work on the standardization of knowledge of small molecule chemistry in UniProtKB that now covers enzyme cofactors and reactions as well as PTMs, and that will eventually extend to all small molecule protein interactions. We welcome your feedback on these current and future developments.

Retirement of UniProt decoy databases

Based on usage statistics, we decided to retire the UniProt decoy databases from our FTP site. If you wish to generate decoy databases from UniProt FASTA databases, you can use this software.

Please contact us if you have questions about this change.
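As one illustration of what such a tool does, a common decoy strategy is to reverse every protein sequence and tag its header. The Python sketch below applies this strategy to FASTA text. It is not the software referred to above. The `DECOY_` prefix and the function name are hypothetical choices, and other decoy strategies (for example, shuffling) exist.

```python
def reverse_decoys(fasta_text, prefix="DECOY_"):
    """Return a decoy FASTA in which each sequence is reversed and each
    header is tagged with `prefix`; multi-line sequences are joined."""
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    out = []
    for name, seq in records:
        out.append(">" + prefix + name)
        out.append(seq[::-1])  # reversed sequence keeps composition and length
    return "\n".join(out) + ("\n" if out else "")
```

Reversal preserves each sequence's length and amino acid composition, which is why it is a popular choice for target-decoy searches.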

Cross-references to NIAGADS

Cross-references have been added to the NIAGADS Alzheimer's GenomicsDB. NIAGADS is a searchable annotation resource that provides access to publicly available NIAGADS summary statistics datasets for Alzheimer's Disease (AD) and related neuropathologies.

NIAGADS is available at https://www.niagads.org/genomics/.

The format of the explicit links is:

Resource abbreviation NIAGADS
Resource identifier Resource identifier

Example: E9PDY4

Show all entries having a cross-reference to NIAGADS.

Text format

Example: E9PDY4

DR   NIAGADS; ENSG00000203710; -.

XML format

Example: E9PDY4

<dbReference type="NIAGADS" id="ENSG00000203710"/>

RDF format

Example: E9PDY4

uniprot:E9PDY4
  rdfs:seeAlso <http://purl.uniprot.org/niagads/ENSG00000203710> .
<http://purl.uniprot.org/niagads/ENSG00000203710>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/NIAGADS> .

Cross-references to CPTAC

Cross-references have been added to the CPTAC Assay Portal. CPTAC serves as a centralized public repository of "fit-for-purpose" multiplexed quantitative mass spectrometry-based proteomic targeted assays.

CPTAC is available at https://assays.cancer.gov/.

The format of the explicit links is:

Resource abbreviation CPTAC
Resource identifier Resource identifier

Example: P04083

Show all entries having a cross-reference to CPTAC.

Text format

Example: P04083

DR   CPTAC; CPTAC-311; -.

XML format

Example: P04083

<dbReference type="CPTAC" id="CPTAC-311"/>
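In the XML distribution, the same cross-reference is a `dbReference` element of the entry. The following Python sketch uses the standard-library ElementTree. It assumes the UniProt XML namespace `http://uniprot.org/uniprot` and expects an `<entry>` element passed on its own. In full files, that element sits under a `<uniprot>` root. The sample entry is trimmed for illustration, and `xref_ids` is an illustrative name.

```python
import xml.etree.ElementTree as ET

UNIPROT_NS = "{http://uniprot.org/uniprot}"

# A trimmed, illustrative entry; real entries carry many more elements.
SAMPLE_ENTRY = """<entry xmlns="http://uniprot.org/uniprot">
  <accession>P04083</accession>
  <dbReference type="CPTAC" id="CPTAC-311"/>
</entry>"""


def xref_ids(entry_xml: str, db_type: str) -> list:
    """Return the ids of the entry-level dbReference elements of one type."""
    entry = ET.fromstring(entry_xml)
    # findall() with a plain tag only inspects direct children of <entry>,
    # so dbReferences nested in citations or evidence are not picked up.
    return [ref.get("id")
            for ref in entry.findall(UNIPROT_NS + "dbReference")
            if ref.get("type") == db_type]


print(xref_ids(SAMPLE_ENTRY, "CPTAC"))  # ['CPTAC-311']
```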

RDF format

Example: P04083

uniprot:P04083
  rdfs:seeAlso <http://purl.uniprot.org/cptac/CPTAC-311> .
<http://purl.uniprot.org/cptac/CPTAC-311>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CPTAC> .

Removal of the cross-references to H-InvDB

Cross-references to H-InvDB have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt release 2019_06

Published July 3, 2019

Headline

The three-peptide itch

Just like pain, itch is a signal telling our brain that something is wrong or potentially dangerous. Unlike pain, where our reflex is to withdraw from danger, itch leads to scratching, in order to remove the irritating, potentially toxic agent, be it an insect or a chemical. Although itching in response to environmental cues is crucial for survival, chronic itch can be debilitating and severely impair the well-being of affected persons. Itch is known to be relayed from the skin, via dorsal root ganglion neurons, to second-order neurons in the spinal cord that project to the brain. Several Mas-related G-protein coupled receptors (MRGPRs) have been identified as primary targets of itch signals, and many of them, including MRGPRX1 (also called MrgprC11 in rodents) and Mrgpra3, are activated by the anti-malarial drug chloroquine.

However, the molecular and neural mechanisms of itch are not well understood, which is why new tools to identify itch receptors and develop new drugs are extremely valuable. In this context, conotoxins have proven to be a gold mine. Conotoxins are produced by cone snails as part of an envenomation strategy used for feeding and defense. They are short peptides (usually 10 to 30 amino acid residues), typically with one or more disulfide bonds. Many of them modulate the activity of ion channels and receptors with very high affinity and specificity. Because they can discriminate between closely related receptor subtypes, they could meet specific therapeutic needs with a reduced likelihood of side effects due to off-target drug effects.

In mice, injection of Conus textile venom, but not that of C. geographus, induced a scratching reflex, which was accompanied by the activation of 89% of sensory neurons that were also sensitive to chloroquine. Two peptides, CNF-Tx1 and CNF-Tx2, were isolated from the C. textile venom gland and tested in vitro for their activity on MRGPRX1. In parallel, the same activity was measured for two additional peptides, CNF-Sr1 and CNF-Sr2, previously identified in C. spurius, and for one, CNF-Vc1, from C. victoriae. Three of these peptides were able to activate MRGPRX1. CNF-Tx2 activated the human, but not the mouse ortholog. CNF-Sr1 activated only the mouse, but not the human ortholog. CNF-Vc1 activated both. CNF-Tx2 and CNF-Vc1 were then tested in a humanized transgenic mouse line, which lacks the entire endogenous MRGPR cluster and expresses human MRGPRX1 in primary sensory neurons. In this setting, both peptides elicited a scratching reflex, further confirming that CNF-Tx2 and CNF-Vc1 act via the itch receptor MRGPRX1. Compared with the well-established MRGPRX1 agonist chloroquine, CNF-Tx2 and CNF-Vc1 were 600 times and 200 times more potent, respectively.

The entries for these five conopeptides have been updated in UniProtKB/Swiss-Prot and are publicly available.

UniProt website news

We have changed our BLAST default dataset from UniProtKB to "Reference proteomes + Swiss-Prot". You can still select UniProtKB or other options under "Target database" in the BLAST submission form.

UniProtKB news

Cross-references to ABCD

Cross-references have been added to the ABCD (AntiBodies Chemically Defined) Database, a manually curated repository of sequenced antibodies.

ABCD is available at https://web.expasy.org/abcd.

The format of the explicit links is:

Resource abbreviation ABCD
Resource identifier UniProtKB accession number

Example: P07766

Show all entries having a cross-reference to ABCD.

Text format

Example: P07766

DR   ABCD; P07766; -.

XML format

Example: P07766

<dbReference type="ABCD" id="P07766"/>

RDF format

Example: P07766

uniprot:P07766
  rdfs:seeAlso <http://purl.uniprot.org/abcd/P07766> .
<http://purl.uniprot.org/abcd/P07766>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ABCD> .
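The RDF examples in this release follow a regular pattern. The following Python sketch renders the Turtle shown above. The pattern it uses is inferred only from the NIAGADS, CPTAC, ABCD and jPOST examples, and other resources, such as Orphanet, may use their own URIs. `xref_turtle` is a hypothetical helper. The `uniprot:`, `rdfs:`, `rdf:` and `up:` prefixes are assumed to be declared elsewhere.

```python
def xref_turtle(accession: str, database: str, identifier: str) -> str:
    """Render a cross-reference in the Turtle layout used in these examples."""
    # Assumed pattern: lower-cased abbreviation as the purl.uniprot.org path segment
    resource = f"<http://purl.uniprot.org/{database.lower()}/{identifier}>"
    return (
        f"uniprot:{accession}\n"
        f"  rdfs:seeAlso {resource} .\n"
        f"{resource}\n"
        "  rdf:type up:Resource ;\n"
        f"  up:database <http://purl.uniprot.org/database/{database}> .\n"
    )


print(xref_turtle("P07766", "ABCD", "P07766"), end="")
```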

Removal of the cross-references to ProDom

Cross-references to ProDom have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Cardiomyopathy, familial hypertrophic 19
  • Myopathy, centronuclear, 3
  • Parkes Weber syndrome

Changes in subcellular location controlled vocabulary

New subcellular locations:

Changes to keywords

New keyword:

UniProt release 2019_05

Published June 5, 2019

Headline

Love's Labour (nearly) Lost

Actin is a globular multi-functional protein that forms microfilaments. It is probably one of the most abundant proteins in our cells, as in almost all eukaryotic cells. Actin plays crucial roles in many processes essential to life, such as cell migration and division, and muscle contraction. Actin undergoes several post-translational modifications thought to control its cellular functions. Among them is histidine methylation, a rare modification in vertebrates that affects only a few proteins, but which was reported to occur in actin over half a century ago. Two recent publications revealed the identity of the enzyme catalyzing this methylation, namely a SET domain-containing protein called SETD3. This is not only the first actin methylase, but also the first histidine methylase to be identified in vertebrates.

SETD3 had previously been reported to be a methylase, but a histone methyltransferase acting essentially on lysine residues. This observation was consistent with the well-established role of SET domain-containing proteins in histone methylation on lysine residues. Consistent, yes, but erroneous! SETD3 actually methylates only actin and only at histidine-73 (His-73). Structural studies showed that the catalytic pocket fits the actin peptide so perfectly, with an extensive network of interactions, that divergent sequences may be accommodated only inefficiently. From a functional point of view, His-73 methylation modestly accelerates the assembly of actin filaments and somewhat reduces the nucleotide-exchange rate on actin monomers. SETD3 knockout mice are viable and overall healthy, in spite of several moderate phenotypes, including some skeletal muscle myopathy, an abnormal electrocardiogram and mildly decreased lean mass. So what? One might conclude that SETD3-catalyzed histidine methylation of actin is not so important after all, were it not for the observation that litters of homozygous knockout females are significantly smaller than litters from wild-type or heterozygous females. This is due to incomplete delivery, with fetuses remaining in utero. The mutant females have uterine contraction problems that are not improved by oxytocin administration. In vitro, SETD3-depleted human myometrial cells also show impaired contractions in response to oxytocin and endothelin-1/EDN1, supporting a role for His-73 methylation in uterine smooth muscle cell contraction during parturition.

The UniProtKB/Swiss-Prot SETD3 entries have been updated and are publicly available as of this release.

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Brain small vessel disease with or without ocular anomalies
  • Macrocephaly, macrosomia, facial dysmorphism syndrome

UniProt release 2019_04

Published May 8, 2019

Headline

A pox on your messenger

Millions of years of coexistence between viruses and their hosts have imposed a selection pressure on hosts to develop the most efficient defense systems possible, while viruses have evolved increasingly complex strategies to escape them. New findings from James B. Eaglesham and co-workers describe how viruses block second messenger signaling to evade innate immunity.

The innate immune system is the first line of host defense. It relies on pattern recognition receptors (PRRs) to detect pathogen-specific molecules. The enzyme cyclic GMP-AMP synthase (cGAS) is a PRR that recognizes cytosolic double-stranded DNA, a marker for viral infection, and catalyzes the formation of 2',3'-cyclic GMP-AMP (2',3'-cGAMP). cGAMP in turn activates STING (also known as TMEM173), triggering type I interferon and NF-kappa-B responses that lead to a potent anti-viral state. Poxins, a family of viral proteins, cleave cytosolic 2',3'-cGAMP into linear Gp[2'-5']Ap[3'], thereby preventing the activation of STING and the resulting induction of the host immune response. Not surprisingly, poxins are required for efficient viral replication in vivo.

Poxins are found in poxviruses, a large family of DNA viruses infecting mammalian cells, and baculoviruses, which target insect cells. The active site is conserved between poxviruses, including vaccinia virus poxin, and baculoviruses, an amazing observation considering the evolutionary distance between these viral families. Even more surprising is the presence of an enzymatically active poxin homolog in the silk moth Bombyx mori and other moth and butterfly genomes! The current hypothesis for the spread of poxins between insects and insect and mammalian viruses is that poxviruses and baculoviruses can share overlapping host tropisms and readily acquire genes through homologous recombination.

UniProtKB/Swiss-Prot poxin entries have been updated and should not escape your attention.

UniProtKB news

Removal of the cross-references to HOVERGEN

Cross-references to HOVERGEN have been removed.

Removal of the cross-references to ProteinModelPortal

Cross-references to ProteinModelPortal have been removed.

In an attempt to improve access to modelling data, we now add a link to the SWISS-MODEL-Workspace for all entries that are not already cross-referenced to the SWISS-MODEL Repository (SMR). This new link will allow users to start a new homology modelling project.

Removal of the cross-references to UniGene

Cross-references to UniGene have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to keywords

New keywords:

Deleted keywords:

  • Immunoglobulin C region
  • Immunoglobulin V region

UniProt release 2019_03

Published April 10, 2019

Headline

A drug arsenal from lupins

Digging into traditional medicines to find new drugs is a proven and fruitful strategy. Think of forskolin, a very effective activator of adenylate cyclase, used daily in numerous laboratories. This agent is produced by the Ayurvedic herb Plectranthus barbatus, which was traditionally recommended, among other uses, to treat cardiovascular disorders. Or the anticancer drug paclitaxel (taxol), isolated from Taxus brevifolia, the Pacific yew. Or artemisinin, the most efficient treatment against malaria, which derives from Artemisia annua, also called sweet wormwood, a herb employed in Chinese traditional medicine. Plant metabolites and direct derivatives thereof constitute more than a third of currently approved pharmaceuticals. Lupin seeds also belong to the traditional pharmacopoeia on all continents where lupin has been cultivated. The great Persian physician Avicenna recommended lupin seed flour, mixed with fenugreek and zedoary, to treat diabetes, as he noticed that this mixture considerably decreased sugar excretion in patients. A thousand years after his observation, the lupin protein mediating this effect, gamma-conglutin, has been identified.

In the lupin seed, most conglutins are storage proteins, which are hydrolyzed during germination and nourish the early stages of seedling growth. By contrast, gamma-conglutin is resistant to proteolysis. Its physiological role in the seed is therefore puzzling, but we have a little more insight into its effect on mammalian cells and organisms. Magni et al. reported that hyperglycemic rats experienced a substantial normalization of blood glucose levels after oral administration of white lupin (Lupinus albus) gamma-conglutin. The decrease in blood sugar level was comparable to that obtained with metformin, a well-established medication for the treatment of type 2 diabetes. This observation was later confirmed and extended to small groups of human volunteers.

After ingestion, gamma-conglutin is not digested in the gastrointestinal tract and the intact protein may be translocated across the intestinal barrier through transcytosis. Once in the blood, it may act at several levels. It seems to bind insulin and may potentiate its activity. When myocytes are incubated with gamma-conglutin, they activate signaling pathways similar to those of insulin, including the activation of the insulin receptor substrate 1 (IRS1), AKT1, and EIF4EBP1/PHAS1. Gamma-conglutin peptides produced in vitro can also inhibit dipeptidyl peptidase-4 (DPP4), an enzyme which degrades incretins, a group of metabolic hormones that stimulate a decrease in blood glucose levels. Gamma-conglutin also enhances the cell surface expression of glucose transporters, including SLC2A4 (GLUT4), and inhibits gluconeogenesis in hepatocytes.

More investigations are needed before gamma-conglutin becomes a drug for type 2 diabetes, but in view of the dynamics of the diabetes epidemic, it seems that nature may be giving us a hand in new drug development.

It's time now for you to enjoy a lupin bean snack and consult our newly annotated UniProtKB/Swiss-Prot lupin gamma-conglutin entries.

UniProt website news

Search for small molecules via InChIKey

We have recently enhanced enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions that uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants, their chemical structures, and chemical transformations. We also use ChEBI to annotate enzyme cofactors in UniProtKB.

You can now search UniProtKB for small molecule reaction participants and cofactors using the InChIKey, a standard hashed representation of the IUPAC International Chemical Identifier (InChI) that provides a unique and compact representation of chemical structure data. The UniProt website supports flexible chemical structure searches with the complete InChIKey, as well as with the connectivity and stereochemistry layers, or the connectivity layer alone. You can search our "Catalytic activity" or "Cofactor" annotations, or both combined, by using the new "Small molecule" advanced search field:

(Screenshots of the new "Small molecule" advanced search field.)

This new InChIKey-based search will help unlock the power of chemical structure data in UniProtKB, particularly when combined with our existing search tools and options for biological data. It complements the chemical ontology search, which allows users to search UniProtKB for chemical classes of biological interest like lipids, amino acids, sugars and specializations thereof, using identifiers from the ChEBI ontology of small molecules.
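A standard InChIKey has three hyphen-separated blocks. The first is 14 letters encoding connectivity. The second is 10 letters: 8 covering stereochemistry and isotopes, then a standard flag and a version letter. The last is a single protonation letter. The Python sketch below derives the three forms the search accepts. It assumes that the website's "connectivity and stereochemistry layers" correspond to the first two blocks. `inchikey_search_keys` is a hypothetical helper. The example key is that of water.

```python
import re

# Standard InChIKey: 14-letter connectivity block, 10-letter block
# (8 stereo/isotope letters + standard flag S/N + version letter), 1 protonation letter.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8}[SN][A-Z])-([A-Z])")


def inchikey_search_keys(key: str) -> dict:
    """Return the three progressively looser forms of an InChIKey."""
    match = INCHIKEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    connectivity, second_block, _protonation = match.groups()
    return {
        "full": key,
        "connectivity_and_stereo": f"{connectivity}-{second_block}",
        "connectivity": connectivity,
    }


# Water
print(inchikey_search_keys("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```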

See How can I search UniProt for chemical or reaction data?

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • (Z)-2,3-didehydroaspartate

UniProt release 2019_02

Published February 13, 2019

Headline

Let's twist again with Myo1D

At first glance, we look bilaterally symmetrical. Our left side appears pretty much the mirror image of the right one. For our internal organs, it's a completely different story. For instance, our heart is on the left of our body, while the liver lies to the right. Macroscopic left-right patterning is only one aspect of an organism's asymmetry. In fact, all known life forms show asymmetric properties in chemical structures, as well as in macroscopic anatomy, development and behavior. However, not much is known about the nature of the link between molecular-level and macroscopic asymmetry.

Studies in Drosophila led to the discovery of a crucial role for an unconventional myosin, called Myo31DF or Myo1D, in left-right asymmetry. Myo1D inactivation in the fly can reverse the handedness of the gut and testes. In a recent publication, Lebreton et al. have extended these observations, showing that ectopic expression of Myo1D in 'naive' tissues, i.e. tissues devoid of left-right asymmetry, such as epidermis and trachea, was sufficient to drive laterality. In the larval epidermis, Myo1D expression induced dextral twisting of the whole larval body, which could rotate up to 180°, resulting in abnormal crawling behavior. In the trachea, pronounced right-handed twisting was observed, with a spiraling ribbon shape making multiple turns, instead of the smooth and linear conformation of the wild-type tissue. This asymmetry was also seen at the cellular level. In control conditions, epidermal cells were perpendicular to the anterior-posterior axis. In contrast, cells ectopically expressing Myo1D showed elongation and a clear shift in membrane orientation toward one side. Myo1D functions as an actin-based motor protein with ATPase activity, and this activity was required for the establishment of left-right asymmetry. In vitro, Myo1D caused actin filaments to move in anticlockwise circles, suggesting that the multiscale property of Myo1D emerges from its molecular interaction with F-actin.

Does this conclusion apply to vertebrates? The answer is not straightforward. Experiments in some vertebrates point to a role for MYO1D in left-right patterning. In Xenopus, MYO1D morpholino knockdown affected organ placement in over 50% of the morphant tadpoles. In zebrafish, MYO1D plays a role in the formation of Kupffer's vesicle, an organ that functions as a left-right organizer during embryogenesis. However, in rat, MYO1D knockout did not lead to visceral situs inversus and caused no obvious motor defects, indicating that, at least in certain mammals, MYO1D is not involved in left-right body asymmetry.

As of this release, UniProtKB/Swiss-Prot MYO1D entries have been updated and are publicly available.

UniProtKB news

Removal of the cross-references to CleanEx

Cross-references to CleanEx have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • ADP-ribosyldiphthamide

RDF news

Change of URIs for Orphanet

For historic reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these by the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the Orphanet database have therefore been updated from:

http://purl.uniprot.org/orphanet/<ID>
to:
http://www.orpha.net/ORDO/Orphanet_<ID>
If required for backward compatibility, you can use the following query to add the old URIs:
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/Orphanet> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/Orphanet> .
   # Keep only the ID after 'Orphanet_' instead of relying on a fixed character offset
   BIND(IRI(CONCAT('http://purl.uniprot.org/orphanet/', STRAFTER(STR(?new), 'Orphanet_'))) AS ?old)
}

The dereferencing of existing http://purl.uniprot.org/orphanet/<ID> URIs will be maintained.
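The same mapping can also be done outside a triple store. The following Python sketch uses plain prefix replacement, with the two URI patterns taken from above. The function names are illustrative.

```python
OLD_PREFIX = "http://purl.uniprot.org/orphanet/"
NEW_PREFIX = "http://www.orpha.net/ORDO/Orphanet_"


def orphanet_new_uri(old_uri: str) -> str:
    """Map an old UniProt-minted Orphanet URI to the ORDO URI."""
    if not old_uri.startswith(OLD_PREFIX):
        raise ValueError(f"unexpected URI: {old_uri!r}")
    return NEW_PREFIX + old_uri[len(OLD_PREFIX):]


def orphanet_old_uri(new_uri: str) -> str:
    """Map an ORDO Orphanet URI back to the old UniProt-minted form."""
    if not new_uri.startswith(NEW_PREFIX):
        raise ValueError(f"unexpected URI: {new_uri!r}")
    # NEW_PREFIX is 35 characters long, so with 1-based SPARQL SUBSTR
    # the ID would start at position 36; slicing by length avoids that arithmetic.
    return OLD_PREFIX + new_uri[len(NEW_PREFIX):]


print(orphanet_new_uri("http://purl.uniprot.org/orphanet/558"))
```

Slicing at the prefix length means the Orphanet ID is never cut at a hard-coded character offset.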

UniProt release 2019_01

Published January 16, 2019

Headline

Engaging and disengaging: CRISPR rings

CRISPR-Cas systems mediate an RNA-guided adaptive immune response that bacteria and archaea use to defend against invasive genetic elements of bacterial (plasmid) or viral origin. Pieces of foreign DNA incorporated into CRISPR arrays provide a "memory" of having encountered the invader. These arrays are transcribed and processed, and the resulting CRISPR RNA (crRNA) is used by the interference complex to recognize the invader if it is re-encountered. Once recognized, foreign nucleic acids are quickly degraded, providing immunity. There are different types of CRISPR-Cas systems, mainly characterized by the presence or absence of certain Cas proteins. For example, the Cas3, Cas9, and Cas10 proteins are hallmarks of CRISPR-Cas types I, II and III, respectively. The best known system is the type II Cas9-encoding system, which has been co-opted by scientists for genome editing. The most intriguing one is the type III system, which has additional, novel control mechanisms not found in the other systems.

The type III interference complex is composed of crRNA, Cas10 and proteins Csm2, Csm3, Csm4 and Csm5. Once the target RNA has bound to the Csm interference complex, it is cleaved by the complex, which acts as a sequence-specific endoribonuclease (RNase). There is an additional component to this system: Csm6. Under basal conditions, Csm6 is an inactive RNase and is not part of the Csm complex. However, its presence is required for full CRISPR-Cas immunity, where it non-specifically degrades invader-derived RNA transcripts. How then is Csm6 RNase activity turned on and, once activated, how is it turned off, considering that uncontrolled RNase activity could be detrimental to the cell? The answers to these questions have been revealed in recent publications. Homodimeric Csm6 is activated by cyclic oligoadenylates (cOA), ring-shaped second messengers synthesized by the C-terminal GGDEF (also called Palm) domain of Cas10. Binding of cOA to the Csm6 dimer interface pocket formed by its CARF (CRISPR-associated Rossmann fold) domains allosterically regulates its RNase activity. The type of cyclic oligoadenylate produced is species-specific. Streptococcus thermophilus and Enterococcus italicus make cyclic hexaadenylate (cA6), while Csm6 of Thermus thermophilus is stimulated by cyclic tetraadenylate (cA4), suggesting that Cas10 in this organism synthesizes cA4. As the target RNA associated with the CRISPR complex is degraded, the cOA synthase activity of Cas10 shuts off, halting second messenger synthesis. Additionally, two proteins with ring-specific nuclease activity able to degrade cOA have recently been isolated from Saccharolobus solfataricus (formerly called Sulfolobus solfataricus). These proteins would turn down Csm6 activity and prevent uncontrolled degradation of cellular RNA.

As of this release, several Cas10 proteins and the ring nucleases of S. solfataricus have been annotated and can be retrieved.

UniProtKB news

Cross-references to jPOST

Cross-references have been added to jPOST, a proteomics database containing re-analysis results with unified criteria for raw data from several ProteomeXchange (PX) repositories.

jPOST is available at https://globe.jpostdb.org/.

The format of the explicit links is:

Resource abbreviation jPOST
Resource identifier UniProtKB accession number

Example: Q8IY92

Show all entries having a cross-reference to jPOST.

Text format

Example: Q8IY92

DR   jPOST; Q8IY92; -.

XML format

Example: Q8IY92

<dbReference type="jPOST" id="Q8IY92"/>

RDF format

Example: Q8IY92

uniprot:Q8IY92
  rdfs:seeAlso <http://purl.uniprot.org/jpost/Q8IY92> .
<http://purl.uniprot.org/jpost/Q8IY92>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/jPOST> .
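For ABCD and jPOST, the resource identifier is the UniProtKB accession number itself. The following Python sketch checks accession format using the regular expression given in the UniProt accession number documentation. It is worth re-checking this pattern against the current help page. `is_uniprot_accession` is an illustrative name.

```python
import re

# Accession format as documented by UniProt (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def is_uniprot_accession(value: str) -> bool:
    """True if `value` as a whole looks like a UniProtKB accession number."""
    return ACCESSION_RE.fullmatch(value) is not None


for candidate in ["Q8IY92", "P07766", "A0A0S3QTD0", "CPTAC-311"]:
    print(candidate, is_uniprot_accession(candidate))
```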

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Limb-girdle muscular dystrophy 1A
  • Limb-girdle muscular dystrophy 1B
  • Limb-girdle muscular dystrophy 1C
  • Limb-girdle muscular dystrophy 2R

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • ADP-ribosylglycine
  • ADP-ribosyltyrosine

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2018_11

Published December 5, 2018

Headline

Enhanced enzyme annotation in UniProtKB using Rhea

This release marks a major advance in the way UniProt describes enzyme function, with the introduction of Rhea as a vocabulary to annotate and represent enzyme-catalysed reactions in UniProtKB.

Rhea is a comprehensive expert-curated knowledgebase of biochemical reactions that uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants, their chemical structures, and chemical transformations. Rhea provides stable unique identifiers for reactions and standard computationally tractable descriptors for chemical transformations.

The enhanced enzyme annotations created using Rhea will form the basis of new search and identifier mapping services in UniProtKB that combine knowledge of small molecules and proteins. They will help UniProt users to more easily integrate and analyse metabolomic data, annotate metabolic networks and models, or mine reaction data to study enzyme evolution and predict new pathways for drug production or bioremediation.

Recent publications provide additional information on Rhea reactions and examples of services that integrate Rhea with biological knowledge from UniProtKB; we hope these will inspire you to dig deeper into the wealth of enzyme data in UniProtKB.

For further technical details about this change see below.

UniProtKB news

Standardization of 'Catalytic activity' annotations

A 'Catalytic activity' annotation describes a catalytic activity of an enzyme, i.e. a chemical reaction that the enzyme catalyzes. Up to now, UniProt has followed the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) for the description of enzymatic activities, except for reactions that are described in the scientific literature but are not (yet) covered by the NC-IUBMB. The focus of the NC-IUBMB is the nomenclature and classification of enzymes by the reactions they catalyze. For this purpose, the NC-IUBMB typically describes an exemplary reaction for each class of enzymes, with the understanding that individual members of the class may use alternative reactants. The NC-IUBMB also uses its own names for the reactants. To allow UniProt to curate reactions at the level of specific enzymes instead of enzyme classes, and to use standardized names for reactants, we now use chemical reaction descriptions from the Rhea database whenever possible. Rhea uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants that are small molecules, as well as the reactive groups of large molecules (such as amino acid residues within proteins). These large-molecule participants are identified by RHEA-COMP identifiers. For catalytic activities that can only be described in the form of free text, we continue to follow the NC-IUBMB descriptions. We have also started to curate the physiological direction of a reaction, i.e. the direction of the net flow of reactants in vivo, where evidence for it is available.
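Rhea describes each reaction as a quartet: the undirected reaction, plus left-to-right, right-to-left and bidirectional variants. The following Python sketch derives a directional ID from the undirected one. It assumes that the four variants carry consecutive identifiers. This assumption is consistent with the RHEA:16845/16846/16847 triple in the A0A0S3QTD0 example later in this section, but looking the IDs up in Rhea is safer in practice. `rhea_id_for_direction` is a hypothetical helper.

```python
# Assumed Rhea convention: the four reactions of a quartet have consecutive IDs.
DIRECTION_OFFSET = {
    "undirected": 0,
    "left-to-right": 1,
    "right-to-left": 2,
    "bidirectional": 3,
}


def rhea_id_for_direction(master_id: str, direction: str) -> str:
    """Derive a directional Rhea ID from the undirected (master) reaction ID."""
    prefix = "RHEA:"
    if not master_id.startswith(prefix):
        raise ValueError(f"not a Rhea ID: {master_id!r}")
    number = int(master_id[len(prefix):])
    return f"{prefix}{number + DIRECTION_OFFSET[direction]}"


print(rhea_id_for_direction("RHEA:16845", "left-to-right"))
```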

Reflecting the NC-IUBMB's focus on nomenclature, cross-references to Enzyme Commission (EC) numbers have historically been added to the Protein names subsection of UniProtKB entries. To link the EC numbers to the reactions on which they are based, we now also add them to 'Catalytic activity' annotations.

'Catalytic activity' annotations are found in UniProtKB entries, as well as in UniRule and SAAS annotation rules.

Below is a description of how this change affects the different file formats in which UniProt entries are distributed.

Text format

Note: Regex symbols indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+).

Reaction description from Rhea:

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=<RheaText>; Xref=<RheaXref>(, <ReactantXref>)+;
CC        ( EC=<EcNumber>;)?( Evidence={<Evidences>};)?
(CC       PhysiologicalDirection=left-to-right; Xref=<RheaXref>; Evidence={<Evidences>};)?
(CC       PhysiologicalDirection=right-to-left; Xref=<RheaXref>; Evidence={<Evidences>};)?

Where:

  • <RheaText>: Textual representation of an undirected Rhea reaction.
  • <RheaXref>: Cross-reference to a Rhea reaction (Rhea:RHEA:n).
  • <ReactantXref>: Cross-reference to a reactant from ChEBI (ChEBI:CHEBI:n) or Rhea (Rhea:RHEA-COMP:n).
  • <EcNumber>: EC number of the corresponding enzyme class, when available.
  • <Evidences>: List of evidences, when available.

Example: O36015

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: S-adenosyl-L-methionine +
CC       cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine +
CC       2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA.
CC       {ECO:0000255|HAMAP-Rule:MF_03162}.

New format (based on Rhea):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=cytidine(32)/guanosine(34) in tRNA + 2 S-adenosyl-L-
CC         methionine = 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in
CC         tRNA + 2 H(+) + 2 S-adenosyl-L-homocysteine;
CC         Xref=Rhea:RHEA:42396, Rhea:RHEA-COMP:10246, Rhea:RHEA-
CC         COMP:10247, ChEBI:CHEBI:15378, ChEBI:CHEBI:57856,
CC         ChEBI:CHEBI:59789, ChEBI:CHEBI:74269, ChEBI:CHEBI:74445,
CC         ChEBI:CHEBI:74495, ChEBI:CHEBI:82748; EC=2.1.1.205;
CC         Evidence={ECO:0000255|HAMAP-Rule:MF_03162};

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: Acetyl-CoA + H(2)O + oxaloacetate = citrate +
CC       CoA. {ECO:0000269|PubMed:29420286}.

New format (based on Rhea):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+);
CC         Xref=Rhea:RHEA:16845, ChEBI:CHEBI:15377, ChEBI:CHEBI:15378,
CC         ChEBI:CHEBI:16452, ChEBI:CHEBI:16947, ChEBI:CHEBI:57287,
CC         ChEBI:CHEBI:57288; EC=2.3.3.16;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=left-to-right; Xref=Rhea:RHEA:16846;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=right-to-left; Xref=Rhea:RHEA:16847;
CC         Evidence={ECO:0000269|PubMed:29420286};
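Because continuation lines are wrapped, a reader has to unwrap a CATALYTIC ACTIVITY block before splitting it into fields. The following is a minimal Python sketch. It assumes that a line ending in '-' was broken inside a word, as with 'S-adenosyl-L-' and 'methionine' in the O36015 example. It also assumes that free-text reaction descriptions contain no semicolons. The same code reads the NC-IUBMB form, which simply lacks the Xref and PhysiologicalDirection fields. The function names are illustrative.

```python
SAMPLE = """\
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+);
CC         Xref=Rhea:RHEA:16845, ChEBI:CHEBI:15377, ChEBI:CHEBI:15378,
CC         ChEBI:CHEBI:16452, ChEBI:CHEBI:16947, ChEBI:CHEBI:57287,
CC         ChEBI:CHEBI:57288; EC=2.3.3.16;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=left-to-right; Xref=Rhea:RHEA:16846;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=right-to-left; Xref=Rhea:RHEA:16847;
CC         Evidence={ECO:0000269|PubMed:29420286};
"""


def unwrap_cc_block(block: str) -> str:
    """Join the CC lines of one CATALYTIC ACTIVITY comment into a single string."""
    header = "-!- CATALYTIC ACTIVITY:"
    text = ""
    for raw in block.strip().splitlines():
        line = raw[2:].strip()  # drop the 'CC' line code and padding
        if line.startswith(header):
            line = line[len(header):].strip()
        if not line:
            continue
        # Assumption: a line ending in '-' was wrapped inside a word, so no space is added
        text += line if (not text or text.endswith("-")) else " " + line
    return text


def parse_catalytic_activity(block: str) -> dict:
    """Split a comment into reaction, xrefs, EC, evidence and physiological directions."""
    result = {"reaction": None, "xrefs": [], "ec": None, "evidence": None,
              "physiological": []}
    target = result  # Xref/Evidence attach to the reaction or to the current direction
    for item in unwrap_cc_block(block).split(";"):
        key, _, value = item.strip().partition("=")
        if key == "Reaction":
            result["reaction"] = value
        elif key == "PhysiologicalDirection":
            target = {"direction": value, "xrefs": [], "evidence": None}
            result["physiological"].append(target)
        elif key == "Xref":
            target["xrefs"] = [x.strip() for x in value.split(",")]
        elif key == "EC":
            result["ec"] = value
        elif key == "Evidence":
            target["evidence"] = value.strip("{}")
    return result


if __name__ == "__main__":
    parsed = parse_catalytic_activity(SAMPLE)
    print(parsed["reaction"], parsed["ec"])
    for direction in parsed["physiological"]:
        print(direction["direction"], direction["xrefs"])
```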

Reaction description from NC-IUBMB:

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=<IUBMBText>; EC=<EcNumber>;( Evidence={<Evidences>};)?

Where:

  • <IUBMBText>: An NC-IUBMB reaction description.
  • <EcNumber>: EC number of the corresponding enzyme class.
  • <Evidences>: List of evidences, when available.

Example: P17050

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: Cleavage of non-reducing alpha-(1->3)-N-
CC       acetylgalactosamine residues from human blood group A and AB mucin
CC       glycoproteins, Forssman hapten and blood group A lacto series
CC       glycolipids. {ECO:0000269|PubMed:19683538}.

New format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=Cleavage of non-reducing alpha-(1->3)-N-
CC         acetylgalactosamine residues from human blood group A and AB
CC         mucin glycoproteins, Forssman hapten and blood group A lacto
CC         series glycolipids.; EC=3.2.1.49;
CC         Evidence={ECO:0000269|PubMed:19683538};

XML format

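We have extended the UniProt XSD with new elements and types, namely the 'catalytic activity' sequence in commentType and the new reactionType and physiologicalReactionType definitions, as shown below: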

<xs:complexType name="commentType">
        ...
        <xs:sequence>
            <xs:element name="molecule" type="moleculeType" minOccurs="0"/>
            <xs:choice minOccurs="0">
                ...
                <xs:sequence>
                    <xs:annotation>
                        <xs:documentation>Used in 'catalytic activity' annotations.</xs:documentation>
                    </xs:annotation>
                    <xs:element name="reaction" type="reactionType"/>
                    <xs:element name="physiologicalReaction" type="physiologicalReactionType" minOccurs="0" maxOccurs="2"/>
                </xs:sequence>
                ...
            </xs:choice>
            ...
        </xs:sequence>
        ...
    </xs:complexType>
    ...
    <xs:complexType name="reactionType">
        <xs:annotation>
            <xs:documentation>Describes a chemical reaction.</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="text" type="xs:string"/>
            <xs:element name="dbReference" type="dbReferenceType" minOccurs="1" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="evidence" type="intListType" use="optional"/>
    </xs:complexType>

    <xs:complexType name="physiologicalReactionType">
        <xs:annotation>
            <xs:documentation>Describes a physiological reaction.</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="dbReference" type="dbReferenceType"/>
        </xs:sequence>
        <xs:attribute name="direction" use="required">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="left-to-right"/>
                    <xs:enumeration value="right-to-left"/>
                </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="evidence" type="intListType" use="optional"/>
    </xs:complexType>

Reaction description from Rhea:

Example: O36015

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="1">S-adenosyl-L-methionine + cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine + 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA.</text>
</comment>

New format (based on Rhea):

<comment type="catalytic activity">
  <reaction evidence="1">
    <text>cytidine(32)/guanosine(34) in tRNA + 2 S-adenosyl-L-methionine = 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA + 2 H(+) + 2 S-adenosyl-L-homocysteine</text>
    <dbReference type="Rhea" id="RHEA:42396"/>
    <dbReference type="Rhea" id="RHEA-COMP:10246"/>
    <dbReference type="Rhea" id="RHEA-COMP:10247"/>
    <dbReference type="ChEBI" id="CHEBI:15378"/>
    <dbReference type="ChEBI" id="CHEBI:57856"/>
    <dbReference type="ChEBI" id="CHEBI:59789"/>
    <dbReference type="ChEBI" id="CHEBI:74269"/>
    <dbReference type="ChEBI" id="CHEBI:74445"/>
    <dbReference type="ChEBI" id="CHEBI:74495"/>
    <dbReference type="ChEBI" id="CHEBI:82748"/>
    <dbReference type="EC" id="2.1.1.205"/>
  </reaction>
</comment>

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="2">Acetyl-CoA + H(2)O + oxaloacetate = citrate + CoA.</text>
</comment>

New format (based on Rhea):

<comment type="catalytic activity">
  <reaction evidence="2">
    <text>acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+)</text>
    <dbReference type="Rhea" id="RHEA:16845"/>
    <dbReference type="ChEBI" id="CHEBI:15377"/>
    <dbReference type="ChEBI" id="CHEBI:15378"/>
    <dbReference type="ChEBI" id="CHEBI:16452"/>
    <dbReference type="ChEBI" id="CHEBI:16947"/>
    <dbReference type="ChEBI" id="CHEBI:57287"/>
    <dbReference type="ChEBI" id="CHEBI:57288"/>
    <dbReference type="EC" id="2.3.3.16"/>
  </reaction>
  <physiologicalReaction direction="left-to-right" evidence="2">
    <dbReference type="Rhea" id="RHEA:16846"/>
  </physiologicalReaction>
  <physiologicalReaction direction="right-to-left" evidence="2">
    <dbReference type="Rhea" id="RHEA:16847"/>
  </physiologicalReaction>
</comment>
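To show how the new elements can be consumed, here is a Python sketch using the standard library's xml.etree.ElementTree on an abbreviated copy of the A0A0S3QTD0 example. It is a sketch under stated assumptions, not a reference reader: the function names are hypothetical, and namespace prefixes are stripped because full UniProt XML files declare a default namespace while the snippet above does not.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the A0A0S3QTD0 example above (most ChEBI references dropped).
EXAMPLE = """\
<comment type="catalytic activity">
  <reaction evidence="2">
    <text>acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+)</text>
    <dbReference type="Rhea" id="RHEA:16845"/>
    <dbReference type="ChEBI" id="CHEBI:15377"/>
    <dbReference type="EC" id="2.3.3.16"/>
  </reaction>
  <physiologicalReaction direction="left-to-right" evidence="2">
    <dbReference type="Rhea" id="RHEA:16846"/>
  </physiologicalReaction>
  <physiologicalReaction direction="right-to-left" evidence="2">
    <dbReference type="Rhea" id="RHEA:16847"/>
  </physiologicalReaction>
</comment>"""


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def read_catalytic_activity(comment: ET.Element) -> dict:
    result = {"text": None, "rhea": None, "ec": [], "physiological": {}}
    for child in comment:
        if local(child.tag) == "reaction":
            for el in child:
                if local(el.tag) == "text":
                    result["text"] = el.text
                elif local(el.tag) == "dbReference":
                    db, acc = el.get("type"), el.get("id")
                    # Keep the reaction id; skip RHEA-COMP:* ids that share type="Rhea".
                    if db == "Rhea" and acc.startswith("RHEA:"):
                        result["rhea"] = acc
                    elif db == "EC":
                        result["ec"].append(acc)
        elif local(child.tag) == "physiologicalReaction":
            for el in child:
                if local(el.tag) == "dbReference":
                    result["physiological"][child.get("direction")] = el.get("id")
    return result


info = read_catalytic_activity(ET.fromstring(EXAMPLE))
print(info["rhea"], info["physiological"])
```

Keying the physiological reactions by their direction attribute mirrors the schema, which allows at most two physiologicalReaction elements, one per direction.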

Reaction description from NC-IUBMB:

Example: P17050

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="6">Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids.</text>
</comment>

New format (based on NC-IUBMB):

<comment type="catalytic activity">
  <reaction evidence="6">
    <text>Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids.</text>
    <dbReference type="EC" id="3.2.1.49"/>
  </reaction>
</comment>

RDF format

Note: Evidence-related statements are omitted since their format does not change. In the previous format, evidence was attributed via reification of the rdfs:comment statement. In the new format, the up:catalyticActivity and up:catalyzedPhysiologicalReaction statements are reified.

Reaction description from Rhea:

Example: O36015

Previous format (based on NC-IUBMB):

uniprot:O36015
  up:annotation <O36015#SIP5A4ED6FF66BBF481> .

<O36015#SIP5A4ED6FF66BBF481>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "S-adenosyl-L-methionine + cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine + 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA." .

New format (based on Rhea):

uniprot:O36015
  up:annotation <O36015#SIP962CEE3C69B2533E> .

<O36015#SIP962CEE3C69B2533E>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <O36015#SIP6D2D3E976AAD17F0> .

<O36015#SIP6D2D3E976AAD17F0>
  rdf:type up:Catalytic_Activity ;
  up:catalyzedReaction <http://rdf.rhea-db.org/42396> ;
  up:enzymeClass enzyme:2.1.1.205 .

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

uniprot:A0A0S3QTD0
  up:annotation <A0A0S3QTD0#SIPF04A1EC4C8EBCB08> .

<A0A0S3QTD0#SIPF04A1EC4C8EBCB08>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "Acetyl-CoA + H(2)O + oxaloacetate = citrate + CoA." .

New format (based on Rhea):

uniprot:A0A0S3QTD0
  up:annotation <A0A0S3QTD0#SIP8171B3125ADE4E9D> .

<A0A0S3QTD0#SIP8171B3125ADE4E9D>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <A0A0S3QTD0#SIP1A91565011EC50F6> ;
  up:catalyzedPhysiologicalReaction <http://rdf.rhea-db.org/16846> ,
                                    <http://rdf.rhea-db.org/16847> .

<A0A0S3QTD0#SIP1A91565011EC50F6>
  rdf:type up:Catalytic_Activity ;
  up:catalyzedReaction <http://rdf.rhea-db.org/16845> ;
  up:enzymeClass enzyme:2.3.3.16 .
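To illustrate how a consumer follows the new two-level structure, the Python sketch below stores the A0A0S3QTD0 triples as plain tuples and walks from the protein to its annotation, then to the up:Catalytic_Activity node. The in-memory list and the helper names are hypothetical stand-ins; a real application would more likely load the data into an RDF library or query a SPARQL endpoint. Evidence reification is ignored here.

```python
# Triples transcribed from the A0A0S3QTD0 example above; prefixed names and
# IRIs are kept as plain strings rather than resolved.
ANNOTATION = "<A0A0S3QTD0#SIP8171B3125ADE4E9D>"
ACTIVITY = "<A0A0S3QTD0#SIP1A91565011EC50F6>"
TRIPLES = [
    ("uniprot:A0A0S3QTD0", "up:annotation", ANNOTATION),
    (ANNOTATION, "rdf:type", "up:Catalytic_Activity_Annotation"),
    (ANNOTATION, "up:catalyticActivity", ACTIVITY),
    (ANNOTATION, "up:catalyzedPhysiologicalReaction", "<http://rdf.rhea-db.org/16846>"),
    (ANNOTATION, "up:catalyzedPhysiologicalReaction", "<http://rdf.rhea-db.org/16847>"),
    (ACTIVITY, "rdf:type", "up:Catalytic_Activity"),
    (ACTIVITY, "up:catalyzedReaction", "<http://rdf.rhea-db.org/16845>"),
    (ACTIVITY, "up:enzymeClass", "enzyme:2.3.3.16"),
]


def objects(subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def reactions_for(protein):
    out = {"catalyzed": [], "physiological": [], "ec": []}
    for ann in objects(protein, "up:annotation"):
        if "up:Catalytic_Activity_Annotation" not in objects(ann, "rdf:type"):
            continue
        # The reaction and EC class now sit on a separate up:Catalytic_Activity node.
        for act in objects(ann, "up:catalyticActivity"):
            out["catalyzed"] += objects(act, "up:catalyzedReaction")
            out["ec"] += objects(act, "up:enzymeClass")
        # Physiological directions are attached to the annotation itself.
        out["physiological"] += objects(ann, "up:catalyzedPhysiologicalReaction")
    return out


print(reactions_for("uniprot:A0A0S3QTD0"))
```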

Reaction description from NC-IUBMB:

Example: P17050

Previous format (based on NC-IUBMB):

uniprot:P17050
  up:annotation <P17050#SIP0FD272930B1683DE> .

<P17050#SIP0FD272930B1683DE>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids." .
  

New format (based on NC-IUBMB):

uniprot:P17050
  up:annotation <P17050#SIP0FD272930B1683DE> .

<P17050#SIP0FD272930B1683DE>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <P17050#SIP0FD272930B1683DF> .

<P17050#SIP0FD272930B1683DF>
  rdf:type up:Catalytic_Activity ;
  skos:closeMatch enzyme:3.2.1.49#SIP0FD272930B1683DG ;
  up:enzymeClass enzyme:3.2.1.49 .

We have changed the RDF representation of ENZYME records so that UniProt 'Catalytic activity' annotations can refer to individual enzymatic activities. The range of the up:activity predicate has been changed from a plain string to the class up:Catalytic_Activity.

Example: 1.11.1.21

Previous format:

enzyme:1.11.1.21
  rdf:type up:Enzyme ;
  skos:prefLabel "Catalase peroxidase" ;
  up:activity "Donor + H(2)O(2) = oxidized donor + 2 H(2)O." ;
  up:activity "2 H(2)O(2) = O(2) + 2 H(2)O." ;
  ...

New format:

enzyme:1.11.1.21
  rdf:type up:Enzyme ;
  skos:prefLabel "Catalase peroxidase" ;
  up:activity <1.11.1.21#SIP017EC216DF0EDC2A> ;
  up:activity <1.11.1.21#SIP018ED427AB1BAS3X> ;
  ...

<1.11.1.21#SIP017EC216DF0EDC2A>
  rdf:type up:Catalytic_Activity ;
  rdfs:label "Donor + H(2)O(2) = oxidized donor + 2 H(2)O." .

<1.11.1.21#SIP018ED427AB1BAS3X>
  rdf:type up:Catalytic_Activity ;
  rdfs:label "2 H(2)O(2) = O(2) + 2 H(2)O." .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Deafness, autosomal recessive, 105

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • Murein peptidoglycan amidated serine

Changes in subcellular location controlled vocabulary

New subcellular location:

UniProt release 2018_10

Published November 7, 2018

Headline

You're not coming in!

Sexual reproduction is a great process to diversify the genetic pool and to accelerate evolution. However, it is subject to tight constraints. First, sperm must meet egg: an unfertilized egg cannot develop. In addition, exactly one sperm cell must fertilize each egg, since polyspermy is not viable. And to ensure the survival of distinct species, the process has to be strictly species-specific. This requirement is particularly challenging in organisms in which fertilization occurs externally, as is the case for fish.

Looking for factors required for fertilization in vertebrates, Herberg et al. identified a small protein highly expressed in zebrafish (Danio rerio) oocytes. They called the protein Bouncer. Bouncer is located at the cell surface where it is attached to the membrane through a glycosylphosphatidylinositol (GPI) anchor, following cleavage of the C-terminal propeptide.

Bouncer function was investigated in knockout zebrafish. At first glance, the mutant animals did not show any overt phenotype. They were produced at the expected Mendelian rates and developed normally. When fertility was tested, there was no difference between knockout and wild-type males, but knockout females were almost completely sterile. Delivery of sperm into Bouncer-deficient eggs by intracytoplasmic sperm injection restored embryonic development, suggesting that Bouncer was involved in sperm entry during fertilization. Bouncer was indeed shown to promote sperm-egg binding. Could Bouncer play a role in species recognition during fertilization? To test this hypothesis, Bouncer-knockout zebrafish eggs expressing the Bouncer ortholog of medaka fish (Oryzias latipes) were generated. Medaka sperm cannot normally fertilize zebrafish eggs. The two species diverged some 200 million years ago, much earlier than humans and mice did, and their Bouncer proteins share only 40% sequence identity. Amazingly, the transgenic knockout eggs could be fertilized by medaka sperm, but not by zebrafish sperm. Fertility rates of individual transgenic females expressing medaka Bouncer correlated with the expression levels of medaka Bouncer mRNA in their eggs. In conclusion, the small, 80-amino-acid-long Bouncer protein plays a crucial role in species-specific fertilization. The rescue was not complete, however: the fertility rate was low, suggesting that other factors likely contribute to species-specific sperm-egg interaction.

Bouncer homologs exist in other vertebrate species. Its closest relative in mammals is the SPACA4 gene. Bouncer/SPACA4 germline-restricted expression was confirmed in all vertebrates tested. However, Bouncer ovary-specific expression was observed only in externally fertilizing animals, such as fish or amphibians; surprisingly, internally fertilizing vertebrates, such as reptiles and mammals, show testis-specific expression. The reason for this difference is not clear and the function of mammalian SPACA4 is not yet known.

As of this release, zebrafish and medaka Bouncer proteins have been annotated and integrated into UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to keywords

New keyword:

UniProt release 2018_09

Published October 10, 2018

Headline

Tubulin code: a long sought-after player identified

In eukaryotes, the cytoskeleton helps cells maintain their shape and internal organization, and provides mechanical support that enables cells to carry out essential functions, like division and movement. It is made of filamentous proteins, microtubules being the largest type of cytoskeletal filament. Microtubules are dynamically assembled from alpha-tubulin and beta-tubulin heterodimers, creating specific structures adapted to the cell's needs, structures that can be as different from each other as a cilium is from a mitotic spindle. How is this variety achieved using the same highly conserved building blocks? Part of the answer lies in the so-called 'tubulin code', which involves not only the differential expression of alpha- and beta-tubulin genes (tubulin isotypes), but also a plethora of post-translational modifications (PTMs). Tubulins have a globular core and a more variable C-terminal tail that is exposed at the microtubule surface, where many PTMs occur. One of the first PTMs to be reported, back in the 1970s, was reversible C-terminal detyrosination, which occurs on most alpha-, but not beta-, tubulins. The enzyme catalyzing the addition of tyrosine, tubulin-tyrosine ligase or TTL, was identified not long after, but the carboxypeptidase responsible for tubulin detyrosination remained elusive until recently.

Aillaud et al. tackled the problem by developing an irreversible inhibitor of tubulin carboxypeptidase activity, followed by mass spectrometry analysis of the inhibitor targets. Nieuwenhuis et al. performed gene-trap mutagenesis in a haploid human cell line to screen for regulators of tubulin detyrosination. Both groups identified vasohibin-1 (VASH1) and vasohibin-2 (VASH2) as the major alpha-tubulin-specific carboxypeptidases. Vasohibins had previously been predicted to have a protease fold, but their enzymatic activity had not been investigated. In fact, both enzymes show low carboxypeptidase activity when assayed on their own. Full activity requires the formation of a complex with another protein, called small vasohibin-binding protein, or SVBP. This may explain why previous attempts to identify tubulin carboxypeptidase had failed. SVBP-VASH complexes act preferentially on polymerized tubulins. When microtubules disassemble, TTL adds back a tyrosine residue at the C-terminus, closing the tubulin detyrosination/tyrosination cycle.

The physiological importance of detyrosination remains to be investigated. SVBP or vasohibin knockdown in mouse hippocampal neurons results in delayed axonal differentiation. In embryos, it affects neuronal migration during cortical development. However, mice lacking VASH1 or VASH2 do not exhibit a dramatic phenotype. It should also be noted that vasohibin depletion in cells did not completely abolish detyrosination activity, suggesting the existence of yet another enzyme.

The VASH1 and VASH2 protein entries have been updated and are now available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Alport syndrome, with macrothrombocytopenia
  • Bannayan-Riley-Ruvalcaba syndrome
  • Cowden syndrome 2
  • Cowden syndrome 3
  • Epstein syndrome
  • Fechtner syndrome
  • Macrothrombocytopenia and progressive sensorineural deafness
  • Sebastian syndrome

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • (2S)-4-hydroxyleucine
  • (3S)-3-hydroxylysine
  • (4S)-4,5-dihydroxyleucine
  • 2-hydroxyproline
  • 3',4',5'-trihydroxyphenylalanine
  • 4-hydroxylysine

Modified term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • Hydroxylated arginine -> Hydroxyarginine

UniProt website news

Deprecation of legacy REST URLs /batch and /mapping - please replace them with /uploadlists

Programmatic access to our "Retrieve/IDmapping" service should use the URL path /uploadlists, as shown in the code examples on the corresponding service help pages, ID mapping and Batch retrieval.

If you have existing code for batch retrieval, you also need to specify that you are mapping to and from UniProtKB, i.e.

'from' => 'ACC+ID',
'to' => 'ACC',

(See the Perl code example in Batch retrieval.)

The obsolete URL paths /batch and /mapping have been deprecated and are no longer supported as of release 2018_09.
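As an illustration of the parameters above, the following Python sketch builds (but does not send) a POST request to /uploadlists using only the standard library. The full URL, the 'format' and 'query' parameter names and the 'tab' value are assumptions modeled on the help-page examples of the time, not taken from this notice, and the function name build_batch_request is hypothetical. The sketch reflects the interface as described in this 2018 notice and may not match the service UniProt offers today.

```python
import urllib.parse
import urllib.request

# Assumed full URL for the /uploadlists path described in this notice.
UPLOADLISTS_URL = "https://www.uniprot.org/uploadlists/"


def build_batch_request(accessions):
    """Hypothetical helper: build (but do not send) a batch-retrieval POST request."""
    params = {
        "from": "ACC+ID",  # map from UniProtKB accessions/IDs ...
        "to": "ACC",       # ... to UniProtKB accessions
        "format": "tab",   # assumption: tab-separated output
        "query": " ".join(accessions),
    }
    # urlencode escapes "+" as %2B and turns spaces into "+".
    data = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a body makes urllib issue a POST when the request is opened.
    return urllib.request.Request(UPLOADLISTS_URL, data=data)


req = build_batch_request(["P02730", "P10361"])
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Sending it would be a matter of passing req to urllib.request.urlopen; separating construction from sending keeps the parameter handling easy to inspect.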

UniProt release 2018_08

Published September 12, 2018

Headline

Human brain development: slow and steady wins the race

As mammals, we share most of our physiological processes with other animals and these similarities allow the wide use of model organisms for medical research purposes. Yet there is something special about us (abstract thought, creativity, art, culture), something linked to our big brains. The increase in size and complexity of our cerebral cortex happened recently on the evolutionary time scale, i.e. after the Homo lineage split from that of other related primates. In efforts to understand this distinctive feature, several new human-specific genes involved in corticogenesis have been identified. They arose through segmental duplications, but their functional impact on brain development remains mysterious. Among these genes, however, are 3 nearly identical NOTCH2 paralogs, called NOTCH2NLA, B and C, for which functional clues have recently been obtained. The evolutionary history of NOTCH2NL genes is peculiar. A partial duplication of NOTCH2 occurred prior to the last common ancestor of human, chimpanzee, and gorilla (some 14 million years ago), creating a truncated inactive copy called NOTCH2NL (standing for Notch homolog 2 N-terminal-like). In the hominin lineage, some 3 to 4 million years ago, this NOTCH2 doppelgänger was repaired by gene conversion and duplicated, creating 3 new human-specific active genes, NOTCH2NLA, B and C. This timeframe corresponds to the early stages of the expansion of the human neocortex.

NOTCH2NLA, B and C are expressed in radial glia neural stem cells during cortical development. These cells undergo multiple cycles of regenerative, mostly asymmetric, cell divisions, leading to the generation of diverse types of neurons while maintaining a pool of progenitors. NOTCH2NL gene expression activates the NOTCH signaling pathway, down-regulates neuronal differentiation genes, and delays the differentiation of neuronal progenitors, increasing their number, all of which ultimately results in an increase in neurons. In this context, slow development produces a huge benefit.

The chromosome 1q21.1 region hosting NOTCH2NL genes is associated with chromosome 1q21.1 deletion/duplication syndromes, in which duplications are associated with macrocephaly and autism, and deletions with microcephaly and schizophrenia. Eleven patients were analyzed: those with microcephaly had NOTCH2NLA and/or NOTCH2NLB deletions, while the macrocephaly cases were consistent with NOTCH2NLA and/or NOTCH2NLB duplications. If confirmed, these results would support a crucial role for NOTCH2NL genes in human neocortex development. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, at the cost of increased susceptibility to recurrent neurodevelopmental disorders.

Using our big brains, we have annotated all 3 NOTCH2NL gene products in UniProtKB/Swiss-Prot and they are publicly available as of this release.

UniProtKB news

Change of the annotation topic 'Enzyme regulation' to 'Activity regulation'

In UniProtKB entries, the topic 'Enzyme regulation' was used to display information about factors that regulate the activity of enzymes, but also of transporters and microbial transcription factors. To clarify the situation, we have renamed this topic to 'Activity regulation'.

Text format

Example: P02730

Previous format:

CC   -!- ENZYME REGULATION: Phenyl isothiocyanate inhibits anion transport
CC       in vitro.

New format:

CC   -!- ACTIVITY REGULATION: Phenyl isothiocyanate inhibits anion transport
CC       in vitro.
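Code that looks for comment topics by name may need to accept both headers. The Python sketch below shows one way to do this for flat files. The regular expression, the compatibility map and the function name are hypothetical illustrations, not part of any UniProt tool.

```python
import re

# Topic header of a flat-file comment block, e.g. "CC   -!- ACTIVITY REGULATION: ..."
TOPIC_RE = re.compile(r"^CC   -!- ([^:]+):")

# Hypothetical compatibility map for code that predates the rename.
RENAMED_TOPICS = {"ENZYME REGULATION": "ACTIVITY REGULATION"}


def comment_topics(lines):
    """Return the comment topics found in flat-file lines, using current names."""
    topics = []
    for line in lines:
        match = TOPIC_RE.match(line)
        if match:
            topic = match.group(1)
            topics.append(RENAMED_TOPICS.get(topic, topic))
    return topics


old = ["CC   -!- ENZYME REGULATION: Phenyl isothiocyanate inhibits anion transport",
       "CC       in vitro."]
new = ["CC   -!- ACTIVITY REGULATION: Phenyl isothiocyanate inhibits anion transport",
       "CC       in vitro."]
print(comment_topics(old), comment_topics(new))
```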

XML format

Example: P02730

Previous format:

<comment type="enzyme regulation">
  <text>Phenyl isothiocyanate inhibits anion transport in vitro.</text>
</comment>

New format:

<comment type="activity regulation">
  <text>Phenyl isothiocyanate inhibits anion transport in vitro.</text>
</comment>

RDF format

Example: P02730

Previous format:

uniprot:P02730
  up:annotation <P02730#SIPC58AB4FDB0DD7DCA> .

<P02730#SIPC58AB4FDB0DD7DCA>
  rdf:type up:Enzyme_Regulation_Annotation ;
  rdfs:comment "Phenyl isothiocyanate inhibits anion transport in vitro." .

New format:

uniprot:P02730
  up:annotation <P02730#SIPC58AB4FDB0DD7DCA> .

<P02730#SIPC58AB4FDB0DD7DCA>
  rdf:type up:Activity_Regulation_Annotation ;
  rdfs:comment "Phenyl isothiocyanate inhibits anion transport in vitro." .

Change to the cross-references to Bgee

We have introduced an additional field in the cross-references to the Bgee database to indicate the expression pattern of the gene.

Text format

Example: P10361

DR   Bgee; ENSRNOG00000010756; Expressed in 10 organ(s), highest expression level in spleen.

XML format

Example: P10361

<dbReference type="Bgee" id="ENSRNOG00000010756">
  <property type="expression patterns" value="Expressed in 10 organ(s), highest expression level in spleen"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.
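For readers updating XML code, the Python sketch below reads the new property with the standard library's xml.etree.ElementTree. The function name is hypothetical. It assumes that full UniProt XML files use the default namespace http://uniprot.org/uniprot, which must then be passed in because the snippet above has none.

```python
import xml.etree.ElementTree as ET

# The Bgee cross-reference from the P10361 example above.
EXAMPLE = """\
<dbReference type="Bgee" id="ENSRNOG00000010756">
  <property type="expression patterns" value="Expressed in 10 organ(s), highest expression level in spleen"/>
</dbReference>"""


def bgee_expression(db_ref: ET.Element, ns: str = ""):
    """Return (gene id, expression-pattern text), with None if the text is absent.

    Pass ns="{http://uniprot.org/uniprot}" when reading namespaced UniProt XML.
    """
    pattern = None
    for prop in db_ref.findall(f"{ns}property"):
        if prop.get("type") == "expression patterns":
            pattern = prop.get("value")
    return db_ref.get("id"), pattern


print(bgee_expression(ET.fromstring(EXAMPLE)))
```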

RDF format

Example: P10361

uniprot:P10361
  rdfs:seeAlso <http://purl.uniprot.org/bgee/ENSRNOG00000010756> .
<http://purl.uniprot.org/bgee/ENSRNOG00000010756>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Bgee> ;
  rdfs:comment "Expressed in 10 organ(s), highest expression level in spleen" .

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases:

  • Ehlers-Danlos syndrome 7B

UniProt website news

New advanced search interface

We have revamped the advanced search interface to make it easier for you to browse the different search fields and options within the dropdown menus. Most importantly, there is now a search box right at the top when you open the blue dropdown menu that allows you to type a concept name (e.g. "structure") and receive some autocompleted suggestions from which you can then select the most suitable one.

Automatic gene-centric isoform mapping for eukaryotic reference proteome entries

Some proteomes have been (manually and algorithmically) selected as reference proteomes. They cover well-studied model organisms and other organisms of interest for biomedical research and phylogeny. In this context, we provide data sets for reference proteomes where only one form of a protein, usually the best annotated version in UniProtKB, is present. The relationships identified when generating these data sets are now also used when displaying individual entries on the UniProt website:

A single gene can code for multiple proteins through biological events such as alternative splicing, alternative initiation and alternative promoter usage. While the UniProtKB/Swiss-Prot expert curation process includes the identification and review of different forms of a protein and their description in a single UniProtKB/Swiss-Prot entry, its focus is the functional annotation of proteins. For this reason, not all potential isoforms of a protein that are available in UniProtKB/TrEMBL can be reviewed and merged into a single entry. This results in a larger number of UniProtKB entries than genes for many of the eukaryotic reference proteomes. In order to identify potential isoforms that have not (yet) been reviewed by a biocurator, we have established an automatic gene-centric mapping between entries from eukaryotic reference proteomes that are likely to belong to the same gene. This mapping is based on gene identifiers from Ensembl, EnsemblGenomes and model organism databases and, in cases where none of these are available, on gene names assigned by the original sequencing projects.

Example: Q15286

UniProt release 2018_07

Published July 18, 2018

Headline

Ubiquitin ligation: new insight into mechanistic diversity

Protein ubiquitination is a reversible post-translational modification that is crucial for many physiological processes, from cell survival and differentiation to innate and adaptive immunity. It can affect protein functions at many levels, marking proteins for degradation, as well as regulating their cellular location, activity and interactions. Most frequently, ubiquitin is linked to the amine group of a lysine side chain via an isopeptide bond, but a growing number of non-canonical linkages have been reported in recent years, involving the N-terminal amine group, the thiol groups of cysteine side chains, and also serine and threonine hydroxyl groups.

A cascade of enzymatic reactions catalyzes the process of protein ubiquitination. The first step consists of ATP-dependent ubiquitin activation by E1 enzymes. Activated ubiquitin is transferred onto E2-conjugating enzymes, producing a covalently linked intermediate (E2-Ub). The transfer of ubiquitin onto the target protein is mediated by E3 protein ligases, which ensure the specificity of the reaction. The whole process grows in complexity with each step. The human genome is thought to encode only 2 E1 enzymes, some 40 E2s and over 600 E3 ligases. E3 ligases can be grouped into 3 classes based on their domain structure and mode of action. E3s of the 'really interesting new gene' (RING) family recruit E2-Ub via their RING domain and then mediate direct transfer of ubiquitin to substrates. By contrast, HECT E3 ligases undergo a catalytic cysteine-dependent transthiolation reaction with E2-Ub, forming a covalent E3-Ub intermediate. Finally, RING-between-RING (RBR) E3 ligases have a canonical RING domain linked to an ancillary domain. This ancillary domain contains a catalytic cysteine that enables a hybrid RING-HECT mechanism.

In order to identify new E3 enzymes of the HECT or RBR classes, Pao et al. established an activity-based assay in which a biotinylated probe exhibiting the properties of a HECT/RBR substrate acts as a 'suicide' substrate and covalently traps target E3s. The assay worked as expected, identifying most known HECT/RBR E3s, but much to their surprise, the authors also isolated 33 RING E3s that lacked HECT or RBR ancillary domains. One of these, MYCBP2, an E3 ligase involved in axon guidance and synapse formation in the developing nervous system, was found to ubiquitinate serines and threonines, but not lysines, with a strong preference for threonine. Its enzymatic mechanism was also found to be novel: MYCBP2 relays ubiquitin to the target threonine via thioester intermediates involving 2 essential cysteines, a mechanism termed 'RING-Cys-relay' (RCR).

Although non-canonical ubiquitination had already been observed, this is the first report identifying an enzyme that catalyzes this reaction, and it comes with the discovery of a novel E3 mechanism. The annotation of the MYCBP2 entries has been updated with this new knowledge and is publicly available as of this release.

UniProtKB news

Cross-references to UniLectin

Cross-references have been added to the UniLectin database, a database of carbohydrate-binding proteins.

UniLectin is available at https://unilectin.eu.

The format of the explicit links is:

Resource abbreviation: UniLectin
Resource identifier: UniProtKB accession number

Example: P84801

Text format

Example: P84801

DR   UniLectin; P84801; -.

XML format

Example: P84801

<dbReference type="UniLectin" id="P84801"/>

RDF format

Example: P84801

uniprot:P84801
  rdfs:seeAlso <http://purl.uniprot.org/unilectin/P84801> .
<http://purl.uniprot.org/unilectin/P84801>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/UniLectin> .

Changes to the controlled vocabulary of human diseases

New diseases:

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt news

Change of UniProt license

We have changed the license that applies to all copyrightable parts of our databases from the Creative Commons Attribution-NoDerivs (CC BY-ND 3.0) License to the Creative Commons Attribution (CC BY 4.0) License. This change will make it easier for others to reuse UniProt data in their own works. The updated license information is available on the UniProt website and FTP site. As with the previous license, users must give appropriate credit for use of UniProt data. The change in license means that our users can remix, transform, and build upon UniProt for any purpose, including commercially, without seeking permission from us. However, when doing so, users must provide a link to the license and indicate if changes were made.

UniProt release 2018_06

Published June 20, 2018

Headline

Neuronal express mRNA delivery service

In mammals, the activity-regulated cytoskeleton-associated protein ARC is a key regulator of synaptic plasticity, being involved in many aspects of synapse formation, maturation, and plasticity, as well as in learning and memory. ARC expression is known to be induced by synaptic activity and its mRNA accumulates at sites of local synaptic activity where it is locally translated.

ARC originates from the Ty3/Gypsy retrotransposon family and has retained some retroviral features. Its protein architecture is remarkably similar to that of the capsid domain of the human immunodeficiency virus (HIV) GAG protein. GAG proteins are essential for viral infection. They can self-assemble to form capsids and encapsulate genomic RNA via direct sequence-specific interactions. At first glance, these properties do not seem crucial for eukaryotic proteins, but two recent studies have unraveled a quite unexpected means of neuronal communication that is reminiscent of viral infection.

The intriguing observation was that ARC protein and mRNA are not only present at synapses, but also enriched in extracellular vesicles (EVs) released by neurons. These EVs are endocytosed by target cells, where ARC mRNA is postsynaptically translated, as has been described both at the Drosophila neuromuscular junction (NMJ), between motor neurons and muscles, and in rat hippocampal neurons. How is this achieved? Presynaptic ARC proteins bind the 3'-UTR of ARC mRNA, oligomerize and form capsid-like structures, in which the mRNA is packaged. These eukaryotic 'capsids' are then released by neurons in EVs and they mediate ARC mRNA transfer into postsynaptic target cells. In flies, ARC knockdown in motor neurons results in a decrease in ARC mRNA and protein in muscles, and leads to impaired expansion of the NMJ, synaptic bouton maturation, and activity-dependent synaptic bouton formation. This phenotype is not rescued by the expression of an ARC construct in muscle alone, nor if the neuronal ARC mRNA construct is missing its 3'-UTR. Overall, these data suggest that it is not just the presence of ARC in presynaptic terminals, but the actual transfer to the postsynaptic region that is required for ARC function.

This exciting piece of information has been transferred to UniProtKB/Swiss-Prot rat and Drosophila ARC entries by means of the classical pathway of expert curation and, as of this release, the updated records are publicly available.

International protein nomenclature guidelines

The European Bioinformatics Institute (EMBL-EBI), the National Center for Biotechnology Information (NCBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) have worked together to produce a shared set of protein naming guidelines. These guidelines are intended for use by anyone who wants to name a protein and aim to promote consistent nomenclature, which is indispensable for communication, literature searching and data retrieval. They replace the previous UniProt protein naming guidelines and are available on the UniProt website as part of this release.

UniProtKB news

Cross-references to ComplexPortal

Cross-references have been added to ComplexPortal, a manually curated resource of macromolecular complexes.

ComplexPortal is available at https://www.ebi.ac.uk/complexportal/.

The format of the explicit links is:

Resource abbreviation: ComplexPortal
Resource identifier: ComplexPortal identifier
Optional information 1: Complex name

Example: Q8IY92

Show all entries having a cross-reference to ComplexPortal.

Cross-references to ComplexPortal may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: Q8IY92

DR   ComplexPortal; CPX-484; SLX4-TERF2 complex.

XML format

Example: Q8IY92

<dbReference type="ComplexPortal" id="CPX-484">
  <property type="entry name" value="SLX4-TERF2 complex"/>
</dbReference>

RDF format

Example: Q8IY92

uniprot:Q8IY92
  rdfs:seeAlso <http://purl.uniprot.org/complexportal/CPX-484> .
<http://purl.uniprot.org/complexportal/CPX-484>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ComplexPortal> ;
  rdfs:comment "SLX4-TERF2 complex" .

Cross-references to ProteomicsDB

Cross-references have been added to ProteomicsDB, a human proteome resource.

ProteomicsDB is available at https://www.proteomicsdb.org/.

The format of the explicit links is:

Resource abbreviation: ProteomicsDB
Resource identifier: ProteomicsDB identifier

Example: P41182

Show all entries having a cross-reference to ProteomicsDB.

Cross-references to ProteomicsDB may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: P41182

DR   ProteomicsDB; 55413; -.
DR   ProteomicsDB; 55414; -. [P41182-2]
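The text-format examples follow a simple layout: the `DR` line code, semicolon-separated fields, a closing period and, for isoform-specific links, the isoform identifier in square brackets. As a minimal sketch, assuming exactly that layout (three spaces after `DR`, `"; "` as the field separator), a Python helper could split such lines as follows. `DRLine` and `parse_dr` are hypothetical names, this is not UniProt's own parser, and fields that themselves contain `"; "` would need extra handling.

```python
import re
from typing import List, NamedTuple, Optional


class DRLine(NamedTuple):
    """One cross-reference parsed from a UniProtKB flat-file DR line."""
    database: str           # resource abbreviation, e.g. "ProteomicsDB"
    identifier: str         # resource identifier, e.g. "55414"
    extra: List[str]        # optional information fields ("-" means none)
    isoform: Optional[str]  # isoform ID for isoform-specific links, else None


# "DR", three spaces, the fields, a final period, then an optional
# " [isoform-id]" suffix. The lazy body ends at the period that is followed
# only by the optional isoform suffix and the end of the line.
DR_PATTERN = re.compile(r"^DR   (?P<body>.*?)\.(?: \[(?P<iso>[^\]]+)\])?$")


def parse_dr(line: str) -> DRLine:
    match = DR_PATTERN.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a DR line: {line!r}")
    fields = match.group("body").split("; ")
    if len(fields) < 2:
        raise ValueError(f"expected at least database and identifier: {line!r}")
    return DRLine(fields[0], fields[1], fields[2:], match.group("iso"))


for text in (
    "DR   ProteomicsDB; 55413; -.",
    "DR   ProteomicsDB; 55414; -. [P41182-2]",
    "DR   ComplexPortal; CPX-484; SLX4-TERF2 complex.",
):
    print(parse_dr(text))
```

The same helper applies unchanged to the other cross-references described in these notes, since they share the `DR` line layout.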

XML format

Example: P41182

<dbReference type="ProteomicsDB" id="55413"/>
<dbReference type="ProteomicsDB" id="55414">
   <molecule id="P41182-2"/>
</dbReference>
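In the XML representation, each cross-reference is a `dbReference` element with optional `property` children (as for ComplexPortal) and an optional `molecule` child for isoform-specific links. The sketch below reads such fragments with Python's standard `xml.etree.ElementTree`. It assumes the fragments are wrapped in a dummy root element and carry no XML namespace; complete UniProtKB XML files declare a default namespace, which would then have to be included in the tag names. `read_db_references` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Fragments copied from the examples, wrapped in a dummy <entry> root so that
# they form one well-formed document. No namespace is declared here.
FRAGMENT = """<entry>
<dbReference type="ProteomicsDB" id="55413"/>
<dbReference type="ProteomicsDB" id="55414">
   <molecule id="P41182-2"/>
</dbReference>
<dbReference type="ComplexPortal" id="CPX-484">
  <property type="entry name" value="SLX4-TERF2 complex"/>
</dbReference>
</entry>"""


def read_db_references(xml_text: str) -> List[Tuple[str, str, Optional[str], Dict[str, str]]]:
    """Return (database, identifier, isoform or None, properties) per dbReference."""
    root = ET.fromstring(xml_text)
    refs = []
    for ref in root.iter("dbReference"):
        molecule = ref.find("molecule")
        isoform = molecule.get("id") if molecule is not None else None
        # Each <property> contributes one type -> value pair.
        props = {p.get("type"): p.get("value") for p in ref.findall("property")}
        refs.append((ref.get("type"), ref.get("id"), isoform, props))
    return refs


for ref in read_db_references(FRAGMENT):
    print(ref)
```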

RDF format

Example: P41182

uniprot:P41182
  rdfs:seeAlso <http://purl.uniprot.org/proteomicsdb/55413> ;
  rdfs:seeAlso <http://purl.uniprot.org/proteomicsdb/55414> .
<http://purl.uniprot.org/proteomicsdb/55413> rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ProteomicsDB> .
<http://purl.uniprot.org/proteomicsdb/55414> rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ProteomicsDB> ;
  rdfs:seeAlso isoform:P41182-2 .

Cross-references to MoonDB

Cross-references have been added to MoonDB, a database of extreme multifunctional and moonlighting proteins.

MoonDB is available at http://moondb.hb.univ-amu.fr.

The format of the explicit links is:

Resource abbreviation: MoonDB
Resource identifier: UniProtKB accession number
Optional information 1: Entry type ("Curated" or "Predicted")

Example: Q13492

Show all entries having a cross-reference to MoonDB.

Text format

Example: Q13492

DR   MoonDB; Q13492; Curated.

XML format

Example: Q13492

<dbReference type="MoonDB" id="Q13492">
  <property type="type" value="Curated"/>
</dbReference>

RDF format

Example: Q13492

uniprot:Q13492
  rdfs:seeAlso <http://purl.uniprot.org/moondb/Q13492> .
<http://purl.uniprot.org/moondb/Q13492>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MoonDB> ;
  rdfs:comment "Curated" .

Changes to the controlled vocabulary of human diseases

New diseases:

UniProt website news

To improve security and privacy, we have moved our web pages and services from HTTP to HTTPS.

The HTTP protocol does not provide encryption: anyone who can see web traffic between a client (e.g. a web browser) and a server can intercept potentially sensitive information and/or inject malware into users' browsers or operating systems. HTTPS solves this problem by encrypting web traffic between a client and a server in both directions, so that observers cannot intercept or tamper with the client's requests or the server's responses. It also provides authentication, ensuring that the client is communicating with the intended server given by the hostname, and not some impostor.

Timeline

We supported separate HTTP and HTTPS services until release 2018_06 (June 20, 2018). Since that date, HTTP traffic has been automatically redirected to HTTPS. We intend to maintain these redirects indefinitely, but it is to your advantage to update your applications to use HTTPS as soon as possible, both for performance and security reasons.

Interactive users

If you access our pages only through a web browser (such as Chrome, Firefox, Safari, Internet Explorer or Opera), the only change after the switchover date is that a green lock icon should appear inside the URL box of your browser, and the web addresses of the pages you visit will start with https://. We recommend that you update your bookmarks and links accordingly.

Programmatic users

Applications that access web servers using http:// URLs instead of https:// URLs may fail after a switch to HTTPS for the following reasons:

  • Your programming environment's HTTP facility does not automatically follow redirects from HTTP to HTTPS. Some libraries follow such redirects; others do not (e.g. Java's URLConnection).
  • Your application uses HTTP requests other than GET and HEAD. These requests (especially POST and PUT) will fail with HTTP 403 Forbidden after the switchover date.
  • Your application accesses our servers through a proxy. Check with your proxy vendor about HTTPS support and how to add or update certificates.
  • Your programming environment does not support HTTPS.

After the switchover date, our servers:

  • respond with a server-side redirect (HTTP 301 Moved Permanently) to the corresponding HTTPS URL for HTTP GET and HEAD requests
  • respond with HTTP 403 Forbidden and an error message to all HTTP requests other than GET and HEAD (especially HTTP POST).

URLs that start with http://purl.uniprot.org/, which are used as URIs in the UniProt RDF distribution and SPARQL service, are redirected to the corresponding HTTPS web page when used in a web context.

UniProt release 2018_05

Published May 23, 2018

Headline

Selenium vs. Sulfur: and the winner is...

Selenium is a chemical element that, in trace amounts, is essential for cellular function in many, though not all, organisms from all kingdoms of life. Proteins incorporate selenium as selenocysteine (Sec), in which selenium replaces the sulfur of cysteine, when a UGA stop codon is "recoded" by a Sec-tRNA and a selenocysteine insertion sequence (SECIS) within the target mRNA. Sec is indispensable for mammalian life, and deficiency in Sec-tRNA is embryonic-lethal (shortly after implantation) in mice, yet Sec incorporation is complex, inefficient and energetically costly. Why then does Mother Nature continue to produce selenoproteins in spite of these drawbacks?

Recent work from Ingold et al. suggests that one reason may be the ability of selenium to protect cells from a specific form of oxidative stress leading to cell death. The authors focused on the phospholipid hydroperoxide glutathione peroxidase GPX4, an essential selenoprotein and the only one whose knockout phenotype mimics that of Sec-tRNA gene disruption. GPX4 catalyzes the reduction of toxic lipid hydroperoxides formed when ferrous iron is imported into cells in the presence of reactive oxygen species produced during aerobic metabolism. If left unchecked, lipid peroxides can spontaneously propagate, directly damaging membranes or generating other toxic products, leading to a specific form of cell death called ferroptosis. Mice in which the active-site selenocysteine of GPX4 (Sec-73) is replaced by cysteine (GPX4-Cys) develop normally, but experience fatal seizures 2-3 weeks after birth. This phenotype is due to the lack of parvalbumin-positive GABAergic interneurons, which are important regulators of cortical network excitability. Hence the presence of Sec is essential for specific developmental events, such as the maturation of a specific class of neurons. In adult mice, conditional expression of the GPX4-Cys mutant did not cause any overt phenotype.

Cys substitution greatly reduces GPX4 activity, although it does not abolish it. In the presence of increasing levels of H2O2, GPX4-Cys readily undergoes irreversible oxidation, and cells expressing GPX4-Cys become exquisitely sensitive to peroxide-induced ferroptosis. In conclusion, the critical advantage of selenolate- versus thiolate-based catalysis may lie in its resistance to overoxidation when cells increase their metabolic rates and mitochondrial H2O2 production.

Selenium was discovered in 1817, almost exactly 200 years ago, and it is quite exciting to celebrate this anniversary with a new discovery about its role in higher organisms. As of this release, the updated GPX4 entries are publicly available.

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases

  • Epidermolysis bullosa dystrophica, Hallopeau-Siemens type
  • Epidermolysis bullosa dystrophica, Pasini type

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • Pyruvic acid (Tyr)

UniRef news

GO annotation of UniRef90 and UniRef50 clusters (now including single-member clusters)

In release 2017_05, we announced the addition of Gene Ontology (GO) annotations to UniRef90 and UniRef50 clusters. In this first approach, GO terms were assigned only to clusters with at least 2 members, and a GO term was added to a cluster when it was found in all UniProtKB members, or when it was a common ancestor of at least one GO term of each member.

As of this release, 2018_05, we are also adding GO annotations to UniRef90 and UniRef50 singleton clusters, i.e. clusters that have only one member. These clusters inherit the GO terms of their single member.
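The assignment rule can be restated compactly: a term qualifies when, for every member, it is either one of that member's terms or an ancestor of one of them, which amounts to intersecting the members' ancestor closures. The Python sketch below illustrates this under stated assumptions. The ontology is a toy is_a map with made-up term names rather than real GO identifiers, and the function keeps every qualifying term, including very general ancestors; whether the production pipeline prunes such redundant terms is not stated here. Singleton clusters simply inherit their member's terms.

```python
from typing import Dict, Iterable, List, Set

# Toy is_a hierarchy (child -> parents). Term names are made up, not GO IDs.
TOY_PARENTS: Dict[str, Set[str]] = {
    "protein_kinase": {"kinase"},
    "lipid_kinase": {"kinase"},
    "kinase": {"catalytic"},
    "dna_binding": {"binding"},
}


def closure(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus all of their ancestors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def cluster_go_terms(members: List[Set[str]], parents: Dict[str, Set[str]]) -> Set[str]:
    """Cluster-level terms following the rule described above (toy version)."""
    if not members:
        return set()
    if len(members) == 1:
        # Singleton clusters inherit the terms of their single member as-is.
        return set(members[0])
    # A term qualifies if every member has it or one of its descendants.
    return set.intersection(*(closure(m, parents) for m in members))


print(cluster_go_terms([{"protein_kinase", "dna_binding"}, {"lipid_kinase"}], TOY_PARENTS))
```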

UniProt release 2018_04

Published April 25, 2018

Headline

The Matrix (enzymes) Reloaded

Collagen is the major protein that stitches together animal tissues, and it is the most abundant protein in mammals, making up 25-35% of our total body protein. It comprises three individual protein chains, which coil together to form tropocollagen molecules, which in turn assemble into microfibrils. Collagen is extremely stable and extremely ancient: collagen fragments have been sequenced from dinosaurs up to 80 million years old, such as Brachylophosaurus canadensis and Tyrannosaurus rex, and collagen is found in all extant metazoans. The breakdown of collagen is essential to permit tissue growth, and all animals have the ability to metabolize collagen in a very controlled way by cleaving it at a single site. Infectious bacteria, such as gas gangrene-causing Clostridium perfringens and Hathewaya histolytica, on the other hand, digest collagen indiscriminately, using collagenases with both endopeptidase and tripeptidylcarboxypeptidase activities. This rampant activity causes massive tissue disruption, favoring bacterial colonization and virulence, and is obviously severely problematic in a clinical setting.

Despite their different approaches to collagen degradation (cautious versus gung-ho), mammalian and clostridial collagenases have similar enzymatic mechanisms, and many inhibitors work on both types of collagenases, making them unsuitable for antibacterial therapy. Recent work by Schönauer et al. has identified promising new molecules that inhibit bacterial but not mammalian collagenases, pointing to a possible way to block bacterial collagenase action, for example in a wound setting. Because they do not attack the bacteria directly, these inhibitors should provide novel ways to treat some of the damage inflicted by these bacteria while exerting little selective pressure, thereby minimizing the potential for resistance. While these inhibitors are undoubtedly very useful, there are also many applications in which potentially undesirable bacterial collagenase activities are actively exploited. The H. histolytica collagenases (ColG and ColH) are used to isolate pancreatic islet cells for transplantation, to remove retained placenta in cattle and horses, to debride wounds, ulcers and severe burns (SANTYL Ointment, Smith and Nephew, Inc.), and to treat human diseases caused by abnormal accumulation of collagen plaques, such as Dupuytren's disease and Peyronie's disease (Xiaflex, Endo Pharmaceuticals, Inc.). Dupuytren's disease is an abnormal deposition of collagen in the hand that causes permanent contraction. In Peyronie's disease, collagen forms fibrous plaques in the penis, restricting erection. Collagenase injection relieves this accumulation, leading to an increased quality of life. When attached to other proteins, the collagen-binding domain of collagenases promotes their retention at injection sites for as long as 10 days. Although this is far from the only example of a repurposed enzyme (think of Botox, another clostridial protein), it is fascinating how a protein class that can be so dangerous to life can, when harnessed, be so very helpful.

As of this release, 3 clostridial collagenases have been expertly updated in UniProtKB/Swiss-Prot.

Cross-references to GlyConnect

Cross-references have been added to the GlyConnect database and protein glycosylation platform.

GlyConnect is available at https://glyconnect.expasy.org.

The format of the explicit links is:

Resource abbreviation: GlyConnect
Resource identifier: GlyConnect identifier

Example: P00742

Show all entries having a cross-reference to GlyConnect.

Cross-references to GlyConnect may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: P00742

DR   GlyConnect; 102; -.

XML format

Example: P00742

<dbReference type="GlyConnect" id="102"/>

RDF format

Example: P00742

uniprot:P00742
  rdfs:seeAlso <http://purl.uniprot.org/glyconnect/102> .
<http://purl.uniprot.org/glyconnect/102>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/GlyConnect> .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt release 2018_03

Published March 28, 2018

Headline

Ama-(not a)-toxin: a cap on death

Amanita and Galerina mushrooms are responsible for a large number of food poisoning cases and deaths across the world. Like other poisonous mushrooms, Amanita and Galerina species produce a cocktail of toxic peptides, but the major lethal components are amatoxins. The typical symptoms of amatoxin poisoning are gastrointestinal distress beginning 6 to 12 hours after ingestion, a remission phase lasting 12 to 24 hours, and progressive loss of liver function culminating in death 3 to 5 days later. One of the few effective treatments is liver transplantation.

Amatoxins are bicyclic octapeptides that act by binding non-competitively to RNA polymerase II and greatly slowing transcriptional elongation. Most mycotoxic cyclic peptides are synthesized by nonribosomal peptide synthetases. This is not the case for amatoxins (and related compounds), which are encoded by the genome and synthesized by ribosomes. The amatoxin genes encode 35-amino-acid propeptides that are processed by a dual macrocyclase-peptidase called POPB. They belong to an extended family called MSDIN (after the 5 N-terminal amino acids of the propeptide), a family that also includes phallotoxins, such as phalloidin and phallacidin. Although structurally related to amatoxins, phallotoxins are bicyclic heptapeptides and have a different mode of action: they stabilize F-actin. Luckily, phallotoxins are poorly absorbed through the gut and therefore make only a small contribution to toxicity after mushroom ingestion.

While the amatoxins are undoubtedly extremely dangerous, some MSDIN cyclopeptides may actually be beneficial. One example is the antamanide peptide of Amanita phalloides (the 'death cap' mushroom), which can act as a competitive antagonist and a natural antidote to the lethal toxins if administered before, or simultaneously with, the poisons. In addition, antamanide may also protect cells from death by targeting cyclophilin D and inhibiting the mitochondrial permeability transition pore, a central effector of cell death induction. Another mushroom, A. exitialis, produces a closely related cyclic nonapeptide, called amanexitide, that has been suggested to have a similar antidote activity. Unfortunately, the concentration of such natural antidotes tends to be much lower than that of the toxins they protect against, meaning that consumers of these deadly mushrooms don't feel the benefit, and we strongly recommend that readers refrain from eating them.

Toxic MSDIN family members, as well as a number of natural antidotes, have been identified in several Amanita species, including A. bisporigera, A. phalloides, A. exitialis, A. fuligineoides, A. fuliginea, A. ocreata, A. pallidorosea and A. rimosa as well as in Galerina marginata. Expert curated entries describing their biology can be found in UniProtKB/Swiss-Prot, publicly available as of this release.

Cross-references to VGNC (Vertebrate Gene Nomenclature Database)

Cross-references have been added to the VGNC Vertebrate Gene Nomenclature Database.

VGNC is available at https://vertebrate.genenames.org/.

The format of the explicit links is:

Resource abbreviation: VGNC
Resource identifier: VGNC identifier
Optional information 1: Gene designation

Example: P11613

Show all entries having a cross-reference to VGNC.

Text format

Example: P11613

DR   VGNC; VGNC:37509; ACKR3.

XML format

Example: P11613

<dbReference type="VGNC" id="VGNC:37509">
  <property type="gene designation" value="ACKR3"/>
</dbReference>

RDF format

Example: P11613

uniprot:P11613
  rdfs:seeAlso <http://purl.uniprot.org/vgnc/37509> .
<http://purl.uniprot.org/vgnc/37509>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/VGNC> ;
  rdfs:comment "ACKR3" .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Deleted diseases

  • Mental retardation, autosomal dominant 8
  • Mental retardation, X-linked, syndromic, Borck type

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Cross-link' ('CROSSLNK' in the flat file):

  • Cyclopeptide (Cys-Pro)
  • Cyclopeptide (Gly-Pro)
  • Cyclopeptide (His-Pro)
  • Cyclopeptide (Leu-Pro)
  • Cyclopeptide (Met-Pro)
  • Cyclopeptide (Phe-Pro)
  • Cyclopeptide (Ser-Pro)
  • Cyclopeptide (Trp-Pro)
  • Cyclopeptide (Tyr-Pro)
  • Cyclopeptide (Val-Pro)

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2018_02

Published February 28, 2018

Headline

Escaping friendly fire

During the first hours of an infection, our safety relies almost entirely on the innate immune system, and predominantly on neutrophils. The encounter between neutrophils and invading microbes leads to neutrophil activation and to the engulfment of pathogens into intracellular phagosomes, where exposure to high concentrations of reactive oxygen species (ROS) and antimicrobial peptides eventually kills them. Neutrophils defend us not only in life but also in death, when they release chromatin and granule proteins that together form extracellular fibers, called 'neutrophil extracellular traps' or NETs, which catch microorganisms and prevent their spread. NETs are covered with antimicrobial compounds, such as cathelicidin peptides, as well as histones, which can also effectively neutralize intruders. This process is so efficient that extracellular DNases able to degrade NETs serve as virulence factors in several pathogenic bacteria, such as group A Streptococcus.

NETs are a double-edged sword and have to be regulated very tightly. Indeed, free extracellular DNA is a potent trigger of autoimmune responses, such as that encountered in systemic lupus erythematosus (SLE), which is characterized by circulating anti-DNA antibodies. NETs can also initiate vascular occlusion in a fibrin-independent manner. In other words, NETs are not innocuous in the medium to long term, and the host has to get rid of them quickly. Timely removal of NET chromatin by the DNases DNASE1 and DNASE1L3 has been shown to play a crucial role in the prevention of autoimmunity. However, until recently it was not known which mechanism was involved in NET clearance under inflammatory conditions. This issue was addressed by Jimenez-Alcazar and colleagues, who created knockout mice lacking both DNASE1 and DNASE1L3. Mutant animals were treated with granulocyte colony-stimulating factor (G-CSF) to induce chronic neutrophilia, a condition mimicking acute inflammation. While wild-type mice showed no sign of distress, all double knockout animals exhibited features of infection-induced thrombotic microangiopathies (TMAs) and died within 6 days. This phenotype could be reversed by the reintroduction of DNASE1 or DNASE1L3, but not by an anti-thrombotic treatment, further supporting the idea that NETs can clog vessels by themselves. TMAs are a well-known complication encountered by patients suffering from systemic bacterial infections. Analysis of lungs from patients with acute respiratory distress syndrome and/or sepsis revealed numerous NET-derived clots in their blood vessels. It is still too early to propose DNase treatment for TMA patients, but these findings at least open new therapeutic perspectives.

As of this release, murine DNASE1 and DNASE1L3 and their orthologs in other mammalian species have been updated and are now publicly available.

UniProtKB news

UniProtKB FASTA headers: Addition of NCBI taxonomy identifier

In order to avoid ambiguities and simplify parsing, we have added the NCBI taxonomy identifier to UniProtKB FASTA headers.

Previous format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

New format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

Where:

  • db is 'sp' for UniProtKB/Swiss-Prot and 'tr' for UniProtKB/TrEMBL.
  • UniqueIdentifier is the primary accession number of the UniProtKB entry.
  • EntryName is the entry name of the UniProtKB entry.
  • ProteinName is the recommended name of the UniProtKB entry as annotated in the RecName field. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used. In case of multiple SubNames, the first one is used. The 'precursor' attribute is excluded; 'Fragment' is included with the name if applicable.
  • OrganismName is the scientific name of the organism of the UniProtKB entry.
  • OrganismIdentifier is the unique identifier of the source organism, assigned by the NCBI.
  • GeneName is the first gene name of the UniProtKB entry. If there is no gene name, OrderedLocusName or ORFname, the GN field is not listed.
  • ProteinExistence is the numerical value describing the evidence for the existence of the protein.
  • SequenceVersion is the version number of the sequence.

Examples:

>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana OX=115339 GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_CUPNH Acetoin catabolism protein X OS=Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) OX=381666 GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus OX=10090 PE=1 SV=1

>tr|Q3SA23|Q3SA23_9HIV1 Protein Nef (Fragment) OS=Human immunodeficiency virus 1 OX=11676 GN=nef PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN cDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens OX=9606 PE=2 SV=1

The same modification has been applied to the FASTA headers of alternative isoforms in UniProtKB/Swiss-Prot, where the new format is:

>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName OX=OrganismIdentifier[ GN=GeneName]

Example:

>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis OX=9541 GN=YWHAB
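Because every field after the protein name is introduced by a tag (OS=, OX=, GN=, PE=, SV=), headers in the new format can be split with a regular expression anchored on those tags. A minimal sketch, assuming that the tags never occur inside the protein or organism names and that gene names contain no spaces; GN, PE and SV are treated as optional so that isoform headers are also accepted. `parse_header` is a hypothetical helper, not part of any UniProt library.

```python
import re
from typing import Dict, Optional

# Tagged fields in the order given by the format description; GN, PE and SV
# are optional so that isoform headers (which end after GN) also match.
HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+) "
    r"(?P<protein_name>.+?) OS=(?P<organism>.+?) OX=(?P<taxon_id>\d+)"
    r"(?: GN=(?P<gene>\S+))?(?: PE=(?P<pe>\d))?(?: SV=(?P<sv>\d+))?$"
)


def parse_header(line: str) -> Dict[str, Optional[str]]:
    """Split a UniProtKB FASTA header into its named fields (None if absent)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    return match.groupdict()


fields = parse_header(
    ">sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, "
    "E-K alpha chain OS=Mus musculus OX=10090 PE=1 SV=1"
)
print(fields["accession"], fields["taxon_id"], fields["gene"])  # P04224 10090 None
```

With the OX field in place, grouping sequences by organism no longer requires matching on the free-text OS value.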

Cross-references to CarbonylDB

Cross-references have been added to the CarbonylDB database, a resource of protein carbonylation sites.

CarbonylDB is available at http://digbio.missouri.edu/CarbonylDB/.

The format of the explicit links is:

Resource abbreviation: CarbonylDB
Resource identifier: UniProtKB accession number

Example: P02768

Show all entries having a cross-reference to CarbonylDB.

Text format

Example: P02768

DR   CarbonylDB; P02768; -.

XML format

Example: P02768

<dbReference type="CarbonylDB" id="P02768"/>

RDF format

Example: P02768

uniprot:P02768
  rdfs:seeAlso <http://purl.uniprot.org/carbonyldb/P02768> .
<http://purl.uniprot.org/carbonyldb/P02768>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CarbonylDB> .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • O-AMP-serine

RDF news

Change of URIs for OrthoDB

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these by the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the OrthoDB database have therefore been updated from:

http://purl.uniprot.org/orthodb/<ID>

to:

http://purl.orthodb.org/odbgroup/<ID>

If required for backward compatibility, you can use the following query to add the old URIs:

PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/OrthoDB> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/OrthoDB> .
   # Keep only the OrthoDB group ID, i.e. the part after the new URI prefix.
   BIND(IRI(CONCAT('http://purl.uniprot.org/orthodb/', STRAFTER(STR(?new), 'http://purl.orthodb.org/odbgroup/'))) AS ?old)
}

The dereferencing of existing http://purl.uniprot.org/orthodb/<ID> URIs will be maintained.
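When cross-references are handled outside a triple store, for example in a script post-processing downloaded data, the same backward-compatible mapping is a simple prefix swap. A minimal Python sketch; the function name and the example group identifier are hypothetical. Slicing by the length of the new prefix, rather than by a hard-coded character offset, keeps the identifier intact: the prefix is 33 characters long, so the identifier starts at the 34th character.

```python
NEW_PREFIX = "http://purl.orthodb.org/odbgroup/"
OLD_PREFIX = "http://purl.uniprot.org/orthodb/"


def old_orthodb_uri(new_uri: str) -> str:
    """Map an OrthoDB group URI back to the former UniProt-generated URI."""
    if not new_uri.startswith(NEW_PREFIX):
        raise ValueError(f"not an OrthoDB group URI: {new_uri!r}")
    # Everything after the prefix is the group identifier.
    return OLD_PREFIX + new_uri[len(NEW_PREFIX):]


print(old_orthodb_uri(NEW_PREFIX + "EXAMPLE_ID"))  # hypothetical identifier
```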

UniProt release 2018_01

Published January 31, 2018

Zika virus: from petty crime to banditry

A Zika virus (ZIKV) outbreak in Brazil in 2015 drew the world's attention to this microbe (see UniProt headline). The situation was so severe that in February 2016 it was declared to be a 'public health emergency of international concern' by the World Health Organization (WHO), indicating that it constituted a public health risk to other States through the international spread of the disease and potentially required a coordinated international response.

ZIKV has been known for over 70 years, since its first isolation from a febrile rhesus macaque in the Zika forest of Uganda. The clinical symptoms caused by ZIKV infection in humans were mild at that time, consisting of a self-limiting flu-like febrile illness that resolved within days and occurred in an estimated 20% of infected individuals. The picture during the recent epidemic, however, was dramatically different. ZIKV infection was associated with severe symptoms, including multi-organ failure. The most alarming feature was its ability to cause microcephaly, congenital malformations, and fetal demise when pregnant women were infected.

When did the metamorphosis from an almost innocuous agent to a congenital pathogen with global impact occur? In the decades following its discovery, sporadic human ZIKV infections were reported in a few countries in Africa, and then the virus started spreading, first to Southeast Asia, then to Micronesia in 2007, to French Polynesia in 2013-2014, and soon after to South and Central America. The neurovirulence of 'ancestral' (African/Southeast Asian) and 'contemporary' (Polynesian/South American) strains was compared by injecting the viruses intracerebrally into neonatal mice. All 3 contemporary strains led to 100% mortality, with typical neurological manifestations. By contrast, the 'ancestral' strain killed less than 17% of the animals. Moreover, in a mouse embryonic microcephaly model, infection with a 'contemporary' ZIKV strain resulted in brains exhibiting a substantial degree of microcephaly, whereas the 'ancestral' strain caused less severe symptoms. Both viruses targeted neural progenitor cells, but the 'contemporary' strain showed significantly enhanced replication in the brain compared with the 'ancestral' one. Obviously something had changed between the 'ancestral' ZIKV and its 'contemporary' version, something that boosted ZIKV neurovirulence, but what?

This question was addressed by Yuan et al. Sequence alignments between 'ancestral' (INSDC accession number AY632535) and 'contemporary' (KJ776791) strains show many differences at the amino acid level. To find out which changes account for increased neurovirulence, several 'contemporary' strain-specific substitutions were introduced in the 'ancestral' strain and tested in neonatal mice. One of them, the substitution of a serine residue by an asparagine at position 139 (in the precursor polyprotein), S139N, greatly increased the neurovirulence of the ancestral strain. It also showed enhanced replication in neural progenitor cells and caused more extensive cell death compared with the original 'ancestral' virus. Conversely, when this residue was mutated back to serine in the 'contemporary' strain, mortality caused by the 'contemporary' virus in neonatal mice was significantly decreased. The ZIKV S139N substitution probably emerged in May 2013, a few months before the outbreak in French Polynesia, and was then stably maintained in the epidemic strain during its subsequent spread to the Americas. Its emergence correlates with reports of microcephaly and other severe neurological abnormalities.
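Substitution names such as S139N encode the original residue, its 1-based position in the reference sequence (here the precursor polyprotein) and the replacement residue. The Python sketch below parses this notation and applies it, first checking that the reference residue matches, so that a forward substitution and its reversion (N139S) round-trip. The function name is hypothetical, and the toy sequence is made up; it is not the ZIKV polyprotein.

```python
import re

SUBSTITUTION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_substitution(sequence: str, change: str) -> str:
    """Apply a one-letter substitution such as "S139N" (1-based position)."""
    match = SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"unsupported notation: {change!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert the 1-based residue number to a 0-based index
    if not 0 <= index < len(sequence) or sequence[index] != original:
        raise ValueError(f"{change}: residue {position} is not {original}")
    return sequence[:index] + replacement + sequence[index + 1:]


toy = "M" + "A" * 137 + "S" + "G" * 10  # hypothetical sequence with Ser at 139
mutant = apply_substitution(toy, "S139N")
print(mutant[138])  # N
```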

After maturation of the genome polyprotein, position 139 is found in viral protein prM, which, in flaviviruses, closely associates with the envelope protein E and is believed to prevent premature fusion of immature virions inside infected cells. However, the mechanism through which the S139N substitution increases neurovirulence is not yet known.

At the beginning of 2016, UniProtKB/Swiss-Prot released the annotated sequence of a ZIKV genome polyprotein, corresponding to the East African 'ancestral' strain. In order to meet the needs of the scientific community, we have now released that of a 'contemporary' strain, isolated from a French Polynesian sample.

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • S-carbamoylcysteine
  • S-cyanocysteine

UniProt release 2017_12

Published December 20, 2017

Headline

Swiss-Prot in the sky with psilocybin: the biosynthesis pathway of a psychedelic drug unveiled

Psychedelic mushrooms, also called 'magic mushrooms', have been used by humans since prehistoric times and can be found depicted in Stone Age rock art in Europe and Africa. Some cultures have used them for religious rites and ceremonies, especially in pre-Columbian Mesoamerica. Aztecs and Mazatecs referred to them as genius mushrooms, divinatory mushrooms, and wondrous mushrooms. A Psilocybe species was known to the Aztecs as 'teōnanācatl', literally 'the divine mushroom'.

The effects of many psychedelic mushrooms come from the prodrug psilocybin. When psilocybin is ingested, this natural compound is rapidly metabolized to yield psilocin, which acts as a serotonergic psychedelic substance. Its effects include euphoria, altered thinking, visual hallucinations, an altered sense of time and spiritual experiences. Some consider the drug an entheogen and a tool to supplement practices for transcendence. Psilocybin is considered to have low toxicity and harm potential, although some very rare cases of lethality have been reported. In most countries, psilocybin and psilocin are listed as Schedule I drugs, i.e. compounds that have a high potential for abuse and are not recognized for medical use.

Nevertheless, over the last 30 years, the potential medical and psychological therapeutic benefits of psilocybin have been investigated. Clinical studies revealed a positive trend in the treatment of existential anxiety in advanced-stage cancer patients and of nicotine addiction. Studies on the clinical use of psilocybin against depression are ongoing.

The structures of both psilocybin and psilocin were determined in 1959 by Hofmann et al., but the basis of their biosynthesis remained obscure for almost 60 years. The locus for psilocybin biosynthesis, called psi, has recently been identified in 2 out of over 100 species of psilocybin mushrooms, namely Psilocybe cubensis and Psilocybe cyanescens.

The psi locus encodes 4 psilocybin biosynthesis enzymes: a new type of fungal L-tryptophan decarboxylase (psiD), a kinase (psiK), a methyltransferase (psiM), and a cytochrome P450 monooxygenase (psiH). All 4 have been characterized and together are sufficient to produce psilocybin from the amino acid L-tryptophan. The first step of the psilocybin biosynthetic pathway is the decarboxylation of L-tryptophan to tryptamine by psiD. The cytochrome P450 monooxygenase psiH then converts tryptamine to 4-hydroxytryptamine. The kinase psiK catalyzes the 4-O-phosphorylation step, converting 4-hydroxytryptamine into norbaeocystin. Finally, the methyltransferase psiM catalyzes iterative methyl transfer to the amino group of norbaeocystin, yielding psilocybin via a monomethylated intermediate called baeocystin. The psi locus also contains 2 major facilitator-type transporters (psiT1 and psiT2), as well as a cluster-specific transcriptional regulator (psiR).
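The reactions described above form a linear chain in which each product is the substrate of the next step. The Python sketch below encodes that chain purely as an illustration; the `Step` type, the `PSILOCYBIN_PATHWAY` list and the `trace` function are hypothetical names, not part of UniProtKB or any published software. psiM is listed twice because it performs two successive methyl transfers.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    enzyme: str      # gene name from the psi locus
    substrate: str
    product: str


# Hypothetical encoding of the pathway described in the text.
# psiM appears twice: norbaeocystin -> baeocystin -> psilocybin.
PSILOCYBIN_PATHWAY = [
    Step("psiD", "L-tryptophan", "tryptamine"),            # decarboxylation
    Step("psiH", "tryptamine", "4-hydroxytryptamine"),     # P450 4-hydroxylation
    Step("psiK", "4-hydroxytryptamine", "norbaeocystin"),  # 4-O-phosphorylation
    Step("psiM", "norbaeocystin", "baeocystin"),           # first N-methylation
    Step("psiM", "baeocystin", "psilocybin"),              # second N-methylation
]


def trace(pathway: List[Step]) -> List[str]:
    """Return the ordered list of metabolites, checking that each step's
    substrate is the product of the previous step."""
    metabolites = [pathway[0].substrate]
    for step in pathway:
        if step.substrate != metabolites[-1]:
            raise ValueError(
                f"{step.enzyme}: expected {metabolites[-1]}, got {step.substrate}"
            )
        metabolites.append(step.product)
    return metabolites


if __name__ == "__main__":
    print(" -> ".join(trace(PSILOCYBIN_PATHWAY)))
```

Running the module prints the metabolite chain from L-tryptophan to psilocybin; the consistency check in `trace` flags any step whose substrate does not match the preceding product.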

As of this release, expertly annotated Psilocybe cubensis psi locus proteins psiD, psiH, psiK, psiM, psiR, psiT1, and psiT2 are publicly and legally available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted disease:

  • Weissenbacher-Zweymueller syndrome

Changes in subcellular location controlled vocabulary

New subcellular locations:

Changes to keywords

New keyword:

UniProt release 2017_11

Published November 22, 2017

Headline

Sex determination in insects: 50 ways to achieve sex-specific splicing

The primary signals triggering sex determination in insects are amazingly diverse across species, and sometimes even between strains of the same species. These signals converge on a single, conserved downstream gene, transformer (tra), which undergoes sex-specific splicing. In developing females, splicing results in the production of an active tra protein. Tra in turn regulates the sex-specific splicing of another highly conserved gene of this signaling cascade, doublesex (dsx), which ultimately decides the sexual fate of the embryo. In males, tra splicing includes an exon containing several in-frame stop codons, producing a truncated, inactive isoform that cannot affect dsx splicing; as a result, the male-specific dsx isoform is made.

The primary signals can be environmental or genetic. In some species, temperature, population density or nutritional status can determine the sexual fate of the embryo. In the most studied organism, Drosophila (fruit fly), the number of X chromosomes in the embryo is crucial: 2 X chromosomes lead to female development, 1 X results in males. Counting X chromosomes is a mechanism common to drosophilids, but rarely observed outside this group. In other species, such as wasps, ants and bees, sexual fate depends upon fertilization: unfertilized (haploid) eggs give rise to males and fertilized (diploid) eggs to females. Yet other insects rely on dominant Mendelian cues, which can be either male-determining (usually referred to as M-factors), as in many dipterans, or female-determining (F-factors), as in butterflies. Because of their bewildering diversity, these cues are difficult to pinpoint. Nevertheless, recent years have seen a few major breakthroughs in the identification of M-factors.

In 2015, Hall et al. identified the M-factor Nix in the yellow fever mosquito Aedes aegypti. Nix is expressed very early in male embryonic development. Knockout of Nix results in production of the female dsx isoform and feminization, while ectopic expression of Nix in females leads to the formation of nearly complete male genitalia. Nix appears to be confined to a subset of mosquitoes: only the Asian tiger mosquito (Aedes albopictus) has an orthologous gene, whereas other genera, such as Anopheles or Culex, lack it.

The M-factor of Anopheles gambiae, identified in 2016, is encoded by the Yob gene and is a short, 56-amino acid protein. It is not homologous to Nix. Yob is activated at the beginning of zygotic transcription and expressed throughout a male's life. It controls male-specific splicing of dsx, and several lines of evidence suggest that it is also involved in dosage compensation in this species, in which females are XX and males XY. Indeed, ectopic delivery of Yob mRNA is lethal to genetically female embryos but has no discernible effect on the sexual development of genetic males. Its silencing in unsexed embryos yields a highly significant male deficiency among surviving mosquitoes.

Last, but not least, the third M-factor to be reported was that of the housefly. Named Mdmd, for Musca domestica male determiner, it encodes a 1,174-amino acid protein that is expressed very early in the zygote and maintained throughout male development until adulthood. In the absence of Mdmd, males turn into females capable of sexual reproduction. Here again, diversity is not an empty word: Mdmd is not present in all houseflies. It is absent in at least one strain, in which the M-factor has been mapped to a different chromosome. Mdmd shares no similarity with Nix or Yob, but it has a paralog, the pre-mRNA-splicing factor Cwc22. Cwc22 is a spliceosome-associated protein that is indispensable for the assembly of the exon junction complex (EJC). Interestingly, changes in the expression levels of EJC components have been shown to affect splice site selection in alternatively spliced genes. The homology between Mdmd and Cwc22 thus brings us one step closer to understanding how an M-factor could act on alternative splicing and on sex-specific tra production.

Multiple copies of the Mdmd gene have been found on chromosomes Y, II, III, or V. All 4 encoded proteins have been annotated and, along with the A. aegypti Nix and A. gambiae Yob products, they are now publicly available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Cross-link' ('CROSSLNK' in the flat file):

  • Glycyl cysteine thioester (Gly-Cys) (interchain with C-...)

New term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • 2,3-didehydroalanine (Tyr)

UniProt release 2017_10

Published October 25, 2017

Headline

Of smell and social life

Ants are arguably the greatest success story in the history of terrestrial metazoa. On average, ants constitute 15-20% of the terrestrial animal biomass. All of the ~12,000 known ant species are eusocial, i.e. they display a sophisticated collective behavior characterized by a division of labor into groups, sometimes called castes, that specialize in tasks such as reproduction, brood care and foraging. Individuals of one caste usually lose the ability to perform at least one behavior characteristic of another caste. Ant sociality is thought to depend on the sense of smell. Indeed, while the Drosophila melanogaster genome contains about 69 odorant receptor (Or) genes, ant genomes have undergone a dramatic expansion to close to 350 Ors, one of the largest Or repertoires known among insects. Other chemosensory receptor genes, such as gustatory receptors and ionotropic glutamate receptors, have not undergone a similar expansion. Ors are expressed by Or neurons, which project axons to the antennal lobe, a region analogous to the olfactory bulb in vertebrates. The antennal lobe consists of numerous globule-shaped neuropils known as glomeruli, where initial synaptic integration occurs before olfactory information is sent to the central brain. Here again, a drastic amplification occurred: close to 450 glomeruli have been identified in the ant Camponotus floridanus, versus only 42 in Drosophila. These observations are consistent with a crucial role for odorant perception in the complex chemical communication of ants, but so far there has been no genetic confirmation of this hypothesis.

In insects, Ors dimerize with a highly conserved 7-transmembrane protein called Orco (Odorant Receptor COreceptor) and form ligand-gated ion channels that activate Or neurons upon odorant binding. Orco knockout in fruit flies, locusts, mosquitoes, and moths impairs responses to odorants. An Orco knockout in ants would therefore allow testing of the hypothesis that the expanded Or repertoire is required for chemical communication. However, social insects are especially hard to modify genetically: ant eggs are very sensitive and difficult to raise without workers, and the life cycle is complicated and drawn out, making it difficult to obtain large numbers of genetically modified offspring in a reasonable time frame.

In spite of these difficulties, 2 teams succeeded in knocking out Orco using CRISPR/Cas9 technology, providing the scientific community with the first genetically modified ants. This achievement was made possible by tenacity and a smart choice of ant species. Yan et al. worked on Harpegnathos saltator, a species with remarkable reproductive plasticity: in the absence of a queen, or when a worker is completely isolated, non-reproductive workers can become reproductive pseudoqueens (gamergates). This transition is thought to be induced by the lack of queen pheromones, which would normally repress it. When isolated, unmated gamergates lay unfertilized eggs that develop into haploid males. Taking advantage of the gamergate transition, Yan et al. generated hemizygous mutant males, which were identified by genotyping their forewings. These males did not exhibit any overt phenotype and were fully fertile, so they could be crossed to receptive females to produce heterozygous and homozygous mutant females. Identifying mutant females was more complicated: females have no wings and could be genotyped only after being sacrificed, so all experiments had to be done blind. Such enthusiasm for science is rare and deserves to be emphasized!

Trible et al. chose Ooceraea biroi, a very distantly related species that diverged from H. saltator some 100 million years ago. Unlike most other ant species, O. biroi reproduces via parthenogenesis, so stable germline modifications can be obtained from the clonal progeny of injected individuals without laboratory crosses.

Both groups observed consistent phenotypes. The response to general odorants was reduced. Mutant ants wandered out of the social group and were unable to forage successfully. They did not produce progeny, as they laid very few eggs and did not care for their brood. They appeared to be largely unable to communicate with conspecifics. Unexpectedly, they also exhibited a dramatic decrease in the size of the antennal lobes and in the number of glomeruli. The remaining glomeruli tended to be larger than in wild-type ants. The reason for this neuroanatomical phenotype is unclear at this stage. However, these results confirm the central role of olfaction in eusocial behavior.

As of this release, freshly annotated Harpegnathos saltator and Ooceraea biroi Orco entries are available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to the controlled vocabulary for PTMs

New terms for the feature key 'Cross-link' ('CROSSLNK' in the flat file):

  • Cyclopeptide (Gly-Arg)
  • Cyclopeptide (Ser-Lys)

New term for the feature key 'Lipidation' ('LIPID' in the flat file):

  • S-palmitoleoyl cysteine

New terms for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • ADP-ribosyl glutamic acid
  • N6-(2-hydroxyisobutyryl)lysine
  • N6-butyryllysine
  • N6-poly(beta-hydroxybutyryl)lysine
  • N6-propionyllysine
  • O3-poly(beta-hydroxybutyryl)serine
  • S-poly(beta-hydroxybutyryl)cysteine

Modified term for the feature key 'Modified residue' ('MOD_RES' in the flat file):

  • N6-(beta-hydroxybutyrate)lysine -> N6-(beta-hydroxybutyryl)lysine

Modified term for the feature key 'Lipidation' ('LIPID' in the flat file):

  • O-palmitoleyl serine -> O-palmitoleoyl serine

Changes to keywords

New keyword:

Modified keyword:

UniProt release 2017_09

Published September 27, 2017

Headline

Protein translation goes round in circles

Covalently closed circular RNA molecules (circRNAs) were observed over 40 years ago in viruses. Later, they were also discovered in uninfected eukaryotes. In 1993, Capel et al. reported the existence of unusual circular Sry transcripts in mouse testis, where they represented the most abundant Sry transcript. These peculiar RNA species have generally been considered to be of low abundance, likely representing splicing errors. Recent studies have shown, however, that they may actually be quite numerous and produced by thousands of genes. In addition, they are evolutionarily conserved. CircRNAs are generated by the spliceosome via backsplicing, a process in which the 3' end of an exon is covalently linked to the 5' end of an upstream exon. As a result, they lack typical mRNA terminal structures, such as the 5' cap and the poly(A) tail. This makes them resistant to exonucleases, allowing circRNAs to escape normal RNA turnover processes.

The physiological functions of circRNAs have not yet been extensively explored. Some have been shown to act as microRNA sponges. They can also serve as platforms for protein interactions. For instance, circ-FOXO3 represses cell cycle progression by binding to the cell cycle proteins CDK2 and CDKN1A (p21), forming a ternary complex. Circ-MBL/MBNL1 binds to the RNA-binding protein MBNL1 and regulates gene expression by competing with the linear splicing of its own pre-mRNA.

At this point, you may wonder why UniProtKB, a protein resource, is interested in circRNAs. Most circRNAs originate from protein-coding genes and contain complete exons. In theory, they could be translated, but there was no direct evidence for in vivo translation of endogenous circRNAs, so they were classified as non-coding RNAs.

A major breakthrough came from a study of human and mouse muscle published last April. Muscle not only produces thousands of circular splicing events, but the expression of circRNAs is also differentially regulated during myoblast differentiation. Among them, circ-ZNF609, a transcript that originates from the circularization of the first coding exon of the ZNF609 gene, is down-regulated during myogenesis. Circ-ZNF609 contains the initiation codon of the linear ZNF609 transcript, a putative 753-nucleotide open reading frame, and a stop codon created 3 nucleotides after the splice junction by the circularization event with the upstream ZNF609 5'-UTR. In human myoblasts, knockdown of circ-ZNF609, but not of its linear counterpart, reduces cell proliferation by about 80%, suggesting a specific role in the regulation of myoblast proliferation. Circ-ZNF609 transcripts are located in the cytoplasm, where they are associated with heavy polysomes. They are translated in a cap-independent manner, though less efficiently than their linear counterparts, and produce a new 250-amino acid ZNF609 isoform in both human and mouse cells. Translation is driven by an internal ribosome entry site (IRES) located within the 5'-UTR. In vivo translation of at least some circRNAs was confirmed in Drosophila in the same issue of Molecular Cell.
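The open reading frame length and the isoform size quoted above are mutually consistent. Assuming that the 753-nucleotide count includes the stop codon, the arithmetic is:

```latex
\frac{753\ \text{nt}}{3\ \text{nt per codon}} = 251\ \text{codons}
= 250\ \text{sense codons} + 1\ \text{stop codon}
\;\Longrightarrow\; 250\ \text{amino acids}
```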

In June 1963, Sidney Brenner wrote to Max Perutz: 'It is now widely realized that nearly all the 'classical' problems of molecular biology have either been solved or will be solved in the next decade.' One might think that the process by which genetic information is transcribed and processed into functional RNAs would be such a 'classical' problem, but there clearly remain plenty of discoveries to be made in this field, to our great pleasure.

Human and mouse ZNF609 UniProtKB/Swiss-Prot entries have been updated and the new isoforms encoded by circ-ZNF609 integrated, with the help of Dr. Legnini, whom we sincerely thank. The revised entries are publicly available as of this release.

Cross-references to CORUM

Cross-references have been added to the CORUM database, a resource of manually annotated protein complexes from mammalian organisms.

CORUM is available at http://mips.helmholtz-muenchen.de/corum/

The format of the explicit links is:

Resource abbreviation CORUM
Resource identifier UniProtKB accession number

Example: P41182


Text format

Example: P41182

DR   CORUM; P41182; -.

XML format

Example: P41182

<dbReference type="CORUM" id="P41182"/>

RDF format

Example: P41182

uniprot:P41182
  rdfs:seeAlso <http://purl.uniprot.org/corum/P41182> .
<http://purl.uniprot.org/corum/P41182>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CORUM> .
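Since the CORUM cross-reference reuses the UniProtKB accession as its identifier, all three representations above can be derived from the accession alone. The Python sketch below illustrates this; the function names are hypothetical and not part of any UniProt library, and the parser assumes the DR line follows exactly the `DR   CORUM; <accession>; -.` layout of the example (real flat-file processing may call for a more tolerant parser).

```python
import re
from typing import Optional

# Hypothetical helpers that render and parse the CORUM cross-reference
# formats shown above. For CORUM, the identifier is the UniProtKB accession.

CORUM_BASE = "http://purl.uniprot.org/corum/"
CORUM_DB = "<http://purl.uniprot.org/database/CORUM>"

# Flat-file line: 'DR', three spaces, then 'CORUM; <accession>; -.'
DR_CORUM = re.compile(r"^DR   CORUM; ([^;\s]+); -\.$")


def corum_dr_line(accession: str) -> str:
    """Text (flat-file) format."""
    return f"DR   CORUM; {accession}; -."


def corum_xml(accession: str) -> str:
    """XML format: a dbReference element of type CORUM."""
    return f'<dbReference type="CORUM" id="{accession}"/>'


def corum_rdf(accession: str) -> str:
    """RDF (Turtle) format, mirroring the example above."""
    resource = f"<{CORUM_BASE}{accession}>"
    return "\n".join([
        f"uniprot:{accession}",
        f"  rdfs:seeAlso {resource} .",
        resource,
        "  rdf:type up:Resource ;",
        f"  up:database {CORUM_DB} .",
    ])


def parse_corum_dr(line: str) -> Optional[str]:
    """Return the accession from a CORUM DR line, or None for any other line."""
    match = DR_CORUM.match(line.rstrip("\n"))
    return match.group(1) if match else None


if __name__ == "__main__":
    print(corum_dr_line("P41182"))
    print(corum_xml("P41182"))
    print(corum_rdf("P41182"))
    print(parse_corum_dr("DR   CORUM; P41182; -."))
```

Called with P41182, the three rendering functions reproduce the examples above, and `parse_corum_dr` recovers P41182 from the text-format line while returning None for DR lines of other databases.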

Changes to the controlled vocabulary of human diseases

New diseases: