
News

UniProt release 2019_03

Published April 10, 2019

Headline

A drug arsenal from lupins

Digging into traditional medicines to find new drugs is a proven and fruitful strategy. Think of forskolin, a very effective activator of adenylate cyclase used daily in numerous laboratories. This agent is produced by the Ayurvedic herb Plectranthus barbatus, which was traditionally recommended, among other uses, to treat cardiovascular disorders. Or the anticancer drug paclitaxel (taxol), isolated from Taxus brevifolia, the Pacific yew. Or artemisinin, the most effective treatment against malaria, which derives from Artemisia annua, also called sweet wormwood, a herb used in traditional Chinese medicine. Plant metabolites and their direct derivatives constitute more than a third of currently approved pharmaceuticals. Lupin seeds, too, belong to the traditional pharmacopoeia of every continent where lupin has been cultivated. The great Persian physician Avicenna recommended lupin seed flour, mixed with fenugreek and zedoary, to treat diabetes, having noticed that this mixture considerably decreased sugar excretion in patients. A thousand years after his observation, the lupin protein mediating this effect, gamma-conglutin, has been identified.

In the lupin seed, most conglutins are storage proteins, which are hydrolyzed during germination and nourish the seedling in its early stages of growth. Gamma-conglutin, by contrast, is resistant to proteolysis. Its physiological role in the seed therefore remains puzzling, but we know a little more about its effects on mammalian cells and organisms. Magni et al. reported that hyperglycemic rats experienced a substantial normalization of blood glucose levels after oral administration of white lupin (Lupinus albus) gamma-conglutin. The decrease in blood glucose was comparable to that obtained with metformin, a well-established medication for the treatment of type 2 diabetes. This observation was later confirmed and extended to small groups of human volunteers.

After ingestion, gamma-conglutin is not digested in the gastrointestinal tract, and the intact protein may cross the intestinal barrier by transcytosis. Once in the blood, it may act at several levels. It seems to bind insulin and may potentiate its activity. Myocytes incubated with gamma-conglutin activate signaling pathways similar to those triggered by insulin, including activation of insulin receptor substrate 1 (IRS1), AKT1 and EIF4EBP1/PHAS1. Gamma-conglutin peptides produced in vitro can also inhibit dipeptidyl peptidase-4 (DPP4), an enzyme that degrades incretins, a group of metabolic hormones that lower blood glucose levels. Gamma-conglutin also enhances the cell surface expression of glucose transporters, including SLC2A4 (GLUT4), and inhibits gluconeogenesis in hepatocytes.

More investigations are needed before gamma-conglutin becomes a drug for type 2 diabetes, but in view of the dynamics of the diabetes epidemic, it seems that nature may be giving us a hand in new drug development.

It’s time now for you to enjoy a lupin bean snack and consult our newly annotated UniProtKB/Swiss-Prot lupin gamma-conglutin entries.

UniProt website news

Search for small molecules via InChIKey

We have recently enhanced enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions that uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants, their chemical structures, and chemical transformations. We also use ChEBI to annotate enzyme cofactors in UniProtKB.

You can now search UniProtKB for small molecule reaction participants and cofactors using the InChIKey, a standard hashed representation of the IUPAC International Chemical Identifier (InChI) that provides a unique and compact representation of chemical structure data. The UniProt website supports flexible chemical structure searches with the complete InChIKey, with the connectivity and stereochemistry layers, or with the connectivity layer alone. You can search our “Catalytic activity” or “Cofactor” annotations, or both combined, using the new “Small molecule” advanced search field.

This new InChIKey-based search will help unlock the power of chemical structure data in UniProtKB, particularly when combined with our existing search tools and options for biological data. It complements the chemical ontology search, which allows users to search UniProtKB for chemical classes of biological interest like lipids, amino acids, sugars and specializations thereof, using identifiers from the ChEBI ontology of small molecules.
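The three levels of InChIKey matching correspond to the key's block structure: a 14-character connectivity block, an 8-character stereochemistry/isotope block followed by a standard flag and a version character, and a final protonation character. The Python sketch below splits a key and derives the two shorter prefixes. It assumes standard 27-character InChIKeys, and that the partial searches correspond to these prefixes; the function names are illustrative only, not a UniProt API or query syntax.

```python
import re

# Assumed layout: 14-char connectivity block, hyphen, 8-char stereo/isotope
# block + standard flag (S/N) + version char, hyphen, 1-char protonation flag.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its named parts; raise ValueError if malformed."""
    m = INCHIKEY_RE.match(key.strip().upper())
    if m is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    connectivity, stereo, flag, version, protonation = m.groups()
    return {
        "connectivity": connectivity,
        "stereo": stereo,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def search_variants(key: str) -> list[str]:
    """Full key, then connectivity + stereochemistry, then connectivity alone."""
    parts = split_inchikey(key)  # validates the key first
    full = key.strip().upper()
    return [full, full[:25], parts["connectivity"]]


print(search_variants("RYYVLZVUVIJVGH-UHFFFAOYSA-N"))  # caffeine
```

Working on prefixes rather than on the full hash is what makes the coarser searches possible: two stereoisomers share the first block but differ in the second.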

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • (Z)-2,3-didehydroaspartate

UniProt release 2019_02

Published February 13, 2019

Headline

Let’s twist again with Myo1D

At first glance, we look bilaterally symmetrical: our left side appears to be pretty much the mirror image of the right one. For our internal organs, it’s a completely different story. For instance, our heart is on the left side of the body, while the liver lies on the right. Macroscopic left-right patterning is only one aspect of an organism’s asymmetry. In fact, all known life forms show asymmetric properties in their chemical structures, as well as in macroscopic anatomy, development and behavior. However, little is known about how molecular-level asymmetry is linked to macroscopic asymmetry.

Studies in Drosophila revealed a crucial role in left-right asymmetry for an unconventional myosin called Myo31DF or Myo1D. Myo1D inactivation in the fly can reverse the handedness of the gut and testes. In a recent publication, Lebreton et al. extended these observations, showing that ectopic expression of Myo1D in ‘naive’ tissues devoid of left-right asymmetry, such as the epidermis and trachea, was sufficient to drive laterality. In the larval epidermis, Myo1D expression induced dextral twisting of the whole larval body, which could rotate by up to 180°, resulting in abnormal crawling behavior. In the trachea, Myo1D induced pronounced right-handed twisting, producing a spiraling ribbon shape with multiple turns instead of the smooth, linear conformation of the wild-type tissue. This asymmetry was also seen at the cellular level. In control conditions, epidermal cells were perpendicular to the anterior-posterior axis, whereas cells ectopically expressing Myo1D were elongated and showed a clear shift in membrane orientation toward one side. Myo1D functions as an actin-based motor protein with ATPase activity, and this activity was required to establish left-right asymmetry. In vitro, Myo1D propelled actin filaments in an anticlockwise circular motion, suggesting that the multiscale property of Myo1D emerges from its molecular interaction with F-actin.

Does this conclusion apply to vertebrates? The answer is not straightforward. Experiments in some vertebrates point to a role for MYO1D in left-right patterning. In Xenopus, MYO1D morpholino knockdown affected organ placement in over 50% of the morphant tadpoles. In zebrafish, MYO1D plays a role in the formation of Kupffer’s vesicle, an organ that functions as a left-right organizer during embryogenesis. However, in rat, MYO1D knockout did not lead to visceral situs inversus and caused no obvious motor defects, indicating that, at least in certain mammals, MYO1D is not involved in left-right body asymmetry.

As of this release, UniProtKB/Swiss-Prot MYO1D entries have been updated and are publicly available.

UniProtKB news

Removal of the cross-references to CleanEx

Cross-references to CleanEx have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • ADP-ribosyldiphthamide

RDF news

Change of URIs for Orphanet

For historical reasons, UniProt had to generate its own URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these with the URIs generated by the cross-referenced database once it starts distributing an RDF representation of its data.

The URIs for the Orphanet database have therefore been updated from:

http://purl.uniprot.org/orphanet/<ID>

to:

http://www.orpha.net/ORDO/Orphanet_<ID>

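If required for backward compatibility, you can use the following SPARQL Update query to add the old URIs:
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
# Add a legacy purl.uniprot.org URI next to each ORDO-based Orphanet cross-reference
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/Orphanet> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/Orphanet> .
   # Rebuild the old URI from the Orphanet ID that follows "Orphanet_" in the new URI
   BIND(IRI(CONCAT("http://purl.uniprot.org/orphanet/", STRAFTER(STR(?new), "Orphanet_"))) AS ?old)
}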

The dereferencing of existing http://purl.uniprot.org/orphanet/<ID> URIs will be maintained.
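Outside a triple store, the URI change is a simple prefix rewrite. The minimal Python sketch below converts between the two forms, assuming the Orphanet ID is appended verbatim after each prefix; the function names and the ID used in the example are hypothetical.

```python
# Prefixes taken from the URIs shown above; the ID is assumed to follow them verbatim.
OLD_PREFIX = "http://purl.uniprot.org/orphanet/"
NEW_PREFIX = "http://www.orpha.net/ORDO/Orphanet_"


def _swap_prefix(uri: str, source: str, target: str) -> str:
    """Replace a known URI prefix, refusing URIs that do not carry it."""
    if not uri.startswith(source):
        raise ValueError(f"{uri!r} does not start with {source!r}")
    return target + uri[len(source):]


def old_to_new(uri: str) -> str:
    """Legacy UniProt-minted Orphanet URI -> ORDO Orphanet URI."""
    return _swap_prefix(uri, OLD_PREFIX, NEW_PREFIX)


def new_to_old(uri: str) -> str:
    """ORDO Orphanet URI -> legacy UniProt-minted Orphanet URI."""
    return _swap_prefix(uri, NEW_PREFIX, OLD_PREFIX)


# Hypothetical Orphanet ID, for illustration only.
print(old_to_new("http://purl.uniprot.org/orphanet/12345"))
```

Raising on an unexpected prefix, rather than passing the URI through unchanged, makes it easier to spot cross-references that belong to another database.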

UniProt release 2019_01

Published January 16, 2019

Headline

Engaging and disengaging: CRISPR rings

CRISPR-Cas systems provide an RNA-guided adaptive immune response that bacteria and archaea use to defend themselves against invasive genetic elements of bacterial (plasmid) or viral origin. Pieces of foreign DNA incorporated into CRISPR arrays provide a “memory” of having encountered the invader. These arrays are transcribed and processed, and the resulting CRISPR RNA (crRNA) is used by the interference complex to recognize the invader if it is encountered again. Once recognized, foreign nucleic acids are quickly degraded, providing immunity. There are different types of CRISPR-Cas systems, distinguished mainly by the presence or absence of certain Cas proteins. For example, the Cas3, Cas9 and Cas10 proteins are hallmarks of CRISPR-Cas types I, II and III, respectively. The best-known system is the type II Cas9-encoding system, which has been co-opted by scientists for genome editing. The most intriguing is the type III system, which has additional, novel control mechanisms not found in the other systems.

The type III interference complex is composed of crRNA, Cas10 and the proteins Csm2, Csm3, Csm4 and Csm5. Once bound to the Csm interference complex, the target RNA is cleaved by the complex, which acts as a sequence-specific endoribonuclease (RNase). There is an additional component to this system: Csm6. Under basal conditions, Csm6 is an inactive RNase and is not part of the Csm complex; however, it is required for full CRISPR-Cas immunity, in which it non-specifically degrades invader-derived RNA transcripts. How, then, is Csm6 RNase activity turned on and, once activated, how is it turned off, considering that uncontrolled RNase activity could be detrimental to the cell? Recent publications have answered these questions. Homodimeric Csm6 is activated by cyclic oligoadenylates (cOA), ring-shaped second messengers synthesized by the C-terminal GGDEF (also called Palm) domain of Cas10. Binding of cOA to the Csm6 dimer interface pocket formed by its CARF (CRISPR-associated Rossmann fold) domains allosterically regulates its RNase activity. The type of cyclic oligoadenylate produced is species-specific. Streptococcus thermophilus and Enterococcus italicus make cyclic hexaadenylate (cA6), while Csm6 of Thermus thermophilus is stimulated by cyclic tetraadenylate (cA4), suggesting that Cas10 in this organism synthesizes cA4. As the target RNA associated with the CRISPR complex is degraded, the cOA synthase activity of Cas10 shuts off, halting second messenger synthesis. Additionally, two proteins with ring-specific nuclease activity that degrade cOA have recently been isolated from Saccharolobus solfataricus (formerly Sulfolobus solfataricus); they would damp down Csm6 activity and prevent uncontrolled degradation of cellular RNA.

As of this release, several Cas10 proteins and the ring nucleases of S. solfataricus have been annotated and can be retrieved from UniProtKB.

UniProtKB news

Cross-references to jPOST

Cross-references have been added to jPOST, a proteomics database that provides the results of re-analysing, with unified criteria, raw data from several ProteomeXchange (PX) repositories.

jPOST is available at https://globe.jpostdb.org/.

The format of the explicit links is:

Resource abbreviation: jPOST
Resource identifier: UniProtKB accession number

Example: Q8IY92

Show all entries having a cross-reference to jPOST.

Text format

Example: Q8IY92

DR   jPOST; Q8IY92; -.

XML format

Example: Q8IY92

<dbReference type="jPOST" id="Q8IY92"/>

RDF format

Example: Q8IY92

uniprot:Q8IY92
  rdfs:seeAlso <http://purl.uniprot.org/jpost/Q8IY92> .
<http://purl.uniprot.org/jpost/Q8IY92>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/jPOST> .
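Because the jPOST identifier is simply the UniProtKB accession, all three representations can be generated from the accession alone. The minimal Python sketch below reproduces the examples above as plain strings; the function name is hypothetical and the output is meant for illustration, not as a full serializer for any of the formats.

```python
def jpost_xrefs(accession: str) -> dict:
    """Render a jPOST cross-reference in the text, XML and RDF (Turtle) formats.

    The jPOST identifier is the UniProtKB accession itself, so nothing else is needed.
    """
    resource = f"<http://purl.uniprot.org/jpost/{accession}>"
    return {
        "text": f"DR   jPOST; {accession}; -.",
        "xml": f'<dbReference type="jPOST" id="{accession}"/>',
        "rdf": "\n".join([
            f"uniprot:{accession}",
            f"  rdfs:seeAlso {resource} .",
            resource,
            "  rdf:type up:Resource ;",
            "  up:database <http://purl.uniprot.org/database/jPOST> .",
        ]),
    }


for fmt, rendered in jpost_xrefs("Q8IY92").items():
    print(f"{fmt}:\n{rendered}\n")
```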

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Limb-girdle muscular dystrophy 1A
  • Limb-girdle muscular dystrophy 1B
  • Limb-girdle muscular dystrophy 1C
  • Limb-girdle muscular dystrophy 2R

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • ADP-ribosylglycine
  • ADP-ribosyltyrosine

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2018_11

Published December 5, 2018

Headline

Enhanced enzyme annotation in UniProtKB using Rhea

This release marks a major advance in the way UniProt describes enzyme function, with the introduction of Rhea as a vocabulary to annotate and represent enzyme-catalysed reactions in UniProtKB.

Rhea is a comprehensive expert-curated knowledgebase of biochemical reactions that uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants, their chemical structures, and chemical transformations. Rhea provides stable unique identifiers for reactions and standard computationally tractable descriptors for chemical transformations.

The enhanced enzyme annotations created using Rhea will form the basis of new search and identifier mapping services in UniProtKB that combine knowledge of small molecules and proteins. They will help UniProt users to more easily integrate and analyse metabolomic data, annotate metabolic networks and models, or mine reaction data to study enzyme evolution and predict new pathways for drug production or bioremediation.

Recent publications provide additional information on Rhea reactions and examples of services that integrate Rhea with biological knowledge from UniProtKB; we hope these will inspire you to dig deeper into the wealth of enzyme data in UniProtKB.

For further technical details about this change see below.

UniProtKB news

Standardization of ‘Catalytic activity’ annotations

A ‘Catalytic activity’ annotation describes a catalytic activity of an enzyme, i.e. a chemical reaction that the enzyme catalyzes. Until now, UniProt has followed the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) for the description of enzymatic activities, except for reactions that are described in the scientific literature but not (yet) covered by the NC-IUBMB. The focus of the NC-IUBMB is the nomenclature and classification of enzymes by the reactions they catalyze. For this purpose, the NC-IUBMB typically describes an exemplary reaction for each class of enzymes, with the understanding that individual members of the class may use alternative reactants, and it uses its own names for the reactants. To allow UniProt to curate reactions at the level of specific enzymes instead of enzyme classes, and to use standardized names for reactants, we now use chemical reaction descriptions from the Rhea database whenever possible. Rhea uses the ChEBI (Chemical Entities of Biological Interest) ontology to describe reaction participants that are small molecules, as well as the reactive groups of large molecules (such as amino acid residues within proteins); these large molecules are identified by a RHEA-COMP identifier. For catalytic activities that can only be described as free text, we continue to follow the NC-IUBMB descriptions. We have also started to curate the physiological direction of a reaction, i.e. the direction of the net flow of reactants in vivo, where evidence for it is available.

Because Enzyme Commission (EC) numbers are primarily a nomenclature tool, cross-references to them have historically been placed in the Protein names subsection of UniProtKB entries. To link the EC numbers to the reactions on which they are based, we now also add them to ‘Catalytic activity’ annotations.

‘Catalytic activity’ annotations are found in UniProtKB entries, as well as in UniRule and SAAS annotation rules.

Below is a description of how this change affects the different file formats in which UniProt entries are distributed.

Text format

Note: Regex symbols indicate whether a pattern (as delimited by parentheses) is optional (?) or may occur 1 or more times (+).

Reaction description from Rhea:

 CC   -!- CATALYTIC ACTIVITY:
 CC       Reaction=<RheaText>; Xref=<RheaXref>(, <ReactantXref>)+;
 CC        ( EC=<EcNumber>;)?( Evidence={<Evidences>};)?
(CC       PhysiologicalDirection=left-to-right; Xref=<RheaXref>; Evidence={<Evidences>};)?
(CC       PhysiologicalDirection=right-to-left; Xref=<RheaXref>; Evidence={<Evidences>};)?

Where:

  • <RheaText>: Textual representation of an undirected Rhea reaction.
  • <RheaXref>: Cross-reference to a Rhea reaction (Rhea:n).
  • <ReactantXref>: Cross-reference to a reactant from ChEBI (CHEBI:n) or Rhea (RHEA-COMP:n).
  • <EcNumber>: EC number of the corresponding enzyme class, when available.
  • <Evidences>: List of evidences, when available.

Example: O36015

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: S-adenosyl-L-methionine +
CC       cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine +
CC       2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA.
CC       {ECO:0000255|HAMAP-Rule:MF_03162}.

New format (based on Rhea):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=cytidine(32)/guanosine(34) in tRNA + 2 S-adenosyl-L-
CC         methionine = 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in
CC         tRNA + 2 H(+) + 2 S-adenosyl-L-homocysteine;
CC         Xref=Rhea:RHEA:42396, Rhea:RHEA-COMP:10246, Rhea:RHEA-
CC         COMP:10247, ChEBI:CHEBI:15378, ChEBI:CHEBI:57856,
CC         ChEBI:CHEBI:59789, ChEBI:CHEBI:74269, ChEBI:CHEBI:74445,
CC         ChEBI:CHEBI:74495, ChEBI:CHEBI:82748; EC=2.1.1.205;
CC         Evidence={ECO:0000255|HAMAP-Rule:MF_03162};

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: Acetyl-CoA + H(2)O + oxaloacetate = citrate +
CC       CoA. {ECO:0000269|PubMed:29420286}.

New format (based on Rhea):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+);
CC         Xref=Rhea:RHEA:16845, ChEBI:CHEBI:15377, ChEBI:CHEBI:15378,
CC         ChEBI:CHEBI:16452, ChEBI:CHEBI:16947, ChEBI:CHEBI:57287,
CC         ChEBI:CHEBI:57288; EC=2.3.3.16;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=left-to-right; Xref=Rhea:RHEA:16846;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=right-to-left; Xref=Rhea:RHEA:16847;
CC         Evidence={ECO:0000269|PubMed:29420286};
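To make the template and examples above concrete, here is a minimal Python sketch that reads one new-style CATALYTIC ACTIVITY block into a dictionary. It assumes the block has already been isolated from the entry, that each statement ends with ‘;’ followed by whitespace or the end of the block, and that a wrapped line ending in ‘-’ continues a hyphenated name or identifier (as with ‘RHEA-’ / ‘COMP:10247’ above). The function names are hypothetical, not part of any UniProt library.

```python
import re

# One "Key=value;" statement; values end at a ';' followed by whitespace or the end.
STATEMENT_RE = re.compile(
    r"\b(Reaction|Xref|EC|Evidence|PhysiologicalDirection)=(.*?);(?=\s|$)"
)


def unwrap_cc(lines: list[str]) -> str:
    """Join the wrapped CC lines of one comment into a single string.

    Assumption: a line ending in '-' continues a hyphenated name or
    identifier (e.g. 'RHEA-' + 'COMP:10247'), so no space is inserted.
    """
    text = ""
    for line in lines:
        body = line[2:].strip()  # drop the 'CC' line code and indentation
        if not body or body.startswith("-!- CATALYTIC ACTIVITY:"):
            continue
        if text and not text.endswith("-"):
            text += " "
        text += body
    return text


def parse_catalytic_activity(lines: list[str]) -> dict:
    """Split one CATALYTIC ACTIVITY comment into its reaction and directions."""
    result = {"reaction": None, "physiological": []}
    current = None
    for key, value in STATEMENT_RE.findall(unwrap_cc(lines)):
        if key == "Reaction":
            current = {"text": value}
            result["reaction"] = current
        elif key == "PhysiologicalDirection":
            current = {"direction": value}
            result["physiological"].append(current)
        elif current is None:
            raise ValueError(f"{key}= found before Reaction=")
        elif key == "Xref":
            current["xrefs"] = value.split(", ")
        elif key == "EC":
            current["ec"] = value
        else:  # Evidence={...}
            current["evidence"] = value.strip("{}").split(", ")
    return result


example = """\
CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+);
CC         Xref=Rhea:RHEA:16845, ChEBI:CHEBI:15377, ChEBI:CHEBI:15378,
CC         ChEBI:CHEBI:16452, ChEBI:CHEBI:16947, ChEBI:CHEBI:57287,
CC         ChEBI:CHEBI:57288; EC=2.3.3.16;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=left-to-right; Xref=Rhea:RHEA:16846;
CC         Evidence={ECO:0000269|PubMed:29420286};
CC       PhysiologicalDirection=right-to-left; Xref=Rhea:RHEA:16847;
CC         Evidence={ECO:0000269|PubMed:29420286};
""".splitlines()

parsed = parse_catalytic_activity(example)
print(parsed["reaction"]["ec"], [p["direction"] for p in parsed["physiological"]])
```

Each PhysiologicalDirection statement opens a new record, so the Xref and Evidence that follow it are attached to that direction rather than to the main reaction. NC-IUBMB-style blocks such as the P17050 example below parse the same way, simply without Xref statements.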

Reaction description from NC-IUBMB:

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=<IUBMBText>; EC=<EcNumber>;( Evidence={<Evidences>};)?

Where:

  • <IUBMBText>: An NC-IUBMB reaction description.
  • <EcNumber>: EC number of the corresponding enzyme class.
  • <Evidences>: List of evidences, when available.

Example: P17050

Previous format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY: Cleavage of non-reducing alpha-(1->3)-N-
CC       acetylgalactosamine residues from human blood group A and AB mucin
CC       glycoproteins, Forssman hapten and blood group A lacto series
CC       glycolipids. {ECO:0000269|PubMed:19683538}.

New format (based on NC-IUBMB):

CC   -!- CATALYTIC ACTIVITY:
CC       Reaction=Cleavage of non-reducing alpha-(1->3)-N-
CC         acetylgalactosamine residues from human blood group A and AB
CC         mucin glycoproteins, Forssman hapten and blood group A lacto
CC         series glycolipids.; EC=3.2.1.49;
CC         Evidence={ECO:0000269|PubMed:19683538};

XML format

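We have extended the UniProt XSD with new elements and types: the sequence used in ‘catalytic activity’ annotations within commentType, and the new complex types reactionType and physiologicalReactionType, shown below: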

    <xs:complexType name="commentType">
        ...
        <xs:sequence>
            <xs:element name="molecule" type="moleculeType" minOccurs="0"/>
            <xs:choice minOccurs="0">
                ...
                <xs:sequence>
                    <xs:annotation>
                        <xs:documentation>Used in 'catalytic activity' annotations.</xs:documentation>
                    </xs:annotation>
                    <xs:element name="reaction" type="reactionType"/>
                    <xs:element name="physiologicalReaction" type="physiologicalReactionType" minOccurs="0" maxOccurs="2"/>
                </xs:sequence>
                ...
            </xs:choice>
            ...
        </xs:sequence>
        ...
    </xs:complexType>
    ...
    <xs:complexType name="reactionType">
        <xs:annotation>
            <xs:documentation>Describes a chemical reaction.</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="text" type="xs:string"/>
            <xs:element name="dbReference" type="dbReferenceType" minOccurs="1" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="evidence" type="intListType" use="optional"/>
    </xs:complexType>

    <xs:complexType name="physiologicalReactionType">
        <xs:annotation>
            <xs:documentation>Describes a physiological reaction.</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="dbReference" type="dbReferenceType"/>
        </xs:sequence>
        <xs:attribute name="direction" use="required">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="left-to-right"/>
                    <xs:enumeration value="right-to-left"/>
                </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="evidence" type="intListType" use="optional"/>
    </xs:complexType>

Reaction description from Rhea:

Example: O36015

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="1">S-adenosyl-L-methionine + cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine + 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA.</text>
</comment>

New format (based on Rhea):

<comment type="catalytic activity">
  <reaction evidence="1">
    <text>cytidine(32)/guanosine(34) in tRNA + 2 S-adenosyl-L-methionine = 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA + 2 H(+) + 2 S-adenosyl-L-homocysteine</text>
    <dbReference type="Rhea" id="RHEA:42396"/>
    <dbReference type="Rhea" id="RHEA-COMP:10246"/>
    <dbReference type="Rhea" id="RHEA-COMP:10247"/>
    <dbReference type="ChEBI" id="CHEBI:15378"/>
    <dbReference type="ChEBI" id="CHEBI:57856"/>
    <dbReference type="ChEBI" id="CHEBI:59789"/>
    <dbReference type="ChEBI" id="CHEBI:74269"/>
    <dbReference type="ChEBI" id="CHEBI:74445"/>
    <dbReference type="ChEBI" id="CHEBI:74495"/>
    <dbReference type="ChEBI" id="CHEBI:82748"/>
    <dbReference type="EC" id="2.1.1.205"/>
  </reaction>
</comment>

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="2">Acetyl-CoA + H(2)O + oxaloacetate = citrate + CoA.</text>
</comment>

New format (based on Rhea):

<comment type="catalytic activity">
  <reaction evidence="2">
    <text>acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+)</text>
    <dbReference type="Rhea" id="RHEA:16845"/>
    <dbReference type="ChEBI" id="CHEBI:15377"/>
    <dbReference type="ChEBI" id="CHEBI:15378"/>
    <dbReference type="ChEBI" id="CHEBI:16452"/>
    <dbReference type="ChEBI" id="CHEBI:16947"/>
    <dbReference type="ChEBI" id="CHEBI:57287"/>
    <dbReference type="ChEBI" id="CHEBI:57288"/>
    <dbReference type="EC" id="2.3.3.16"/>
  </reaction>
  <physiologicalReaction direction="left-to-right" evidence="2">
    <dbReference type="Rhea" id="RHEA:16846"/>
  </physiologicalReaction>
  <physiologicalReaction direction="right-to-left" evidence="2">
    <dbReference type="Rhea" id="RHEA:16847"/>
  </physiologicalReaction>
</comment>
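A minimal sketch of reading the new XML elements with Python’s standard xml.etree.ElementTree, using a trimmed copy of the A0A0S3QTD0 example above. The snippet has no XML namespace; in a complete UniProt XML file the elements are assumed to live in the default namespace http://uniprot.org/uniprot, so the lookups would need a namespace map. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the A0A0S3QTD0 example above (no XML namespace).
SNIPPET = """\
<comment type="catalytic activity">
  <reaction evidence="2">
    <text>acetyl-CoA + H2O + oxaloacetate = citrate + CoA + H(+)</text>
    <dbReference type="Rhea" id="RHEA:16845"/>
    <dbReference type="ChEBI" id="CHEBI:15377"/>
    <dbReference type="EC" id="2.3.3.16"/>
  </reaction>
  <physiologicalReaction direction="left-to-right" evidence="2">
    <dbReference type="Rhea" id="RHEA:16846"/>
  </physiologicalReaction>
  <physiologicalReaction direction="right-to-left" evidence="2">
    <dbReference type="Rhea" id="RHEA:16847"/>
  </physiologicalReaction>
</comment>"""


def read_catalytic_activity(comment: ET.Element) -> dict:
    """Collect reaction text, evidence, cross-references and physiological directions."""
    reaction = comment.find("reaction")
    if reaction is None:
        raise ValueError("comment has no <reaction> element")
    xrefs = [(ref.get("type"), ref.get("id")) for ref in reaction.findall("dbReference")]
    directions = {}
    for phys in comment.findall("physiologicalReaction"):
        ref = phys.find("dbReference")  # the XSD above allows exactly one
        directions[phys.get("direction")] = ref.get("id") if ref is not None else None
    return {
        "text": reaction.findtext("text"),
        "evidence": reaction.get("evidence"),
        "xrefs": xrefs,
        "directions": directions,
    }


info = read_catalytic_activity(ET.fromstring(SNIPPET))
print(info["directions"])
```

With a namespaced file, the same calls accept a mapping, for example comment.find("up:reaction", {"up": "http://uniprot.org/uniprot"}).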

Reaction description from NC-IUBMB:

Example: P17050

Previous format (based on NC-IUBMB):

<comment type="catalytic activity">
  <text evidence="6">Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids.</text>
</comment>

New format (based on NC-IUBMB):

<comment type="catalytic activity">
  <reaction evidence="6">
    <text>Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids.</text>
    <dbReference type="EC" id="3.2.1.49"/>
  </reaction>
</comment>

RDF format

Note: Evidence-related statements are omitted since their format does not change. In the previous format, evidence was attributed via reification of the rdfs:comment statement. In the new format, the up:catalyticActivity and up:catalyzedPhysiologicalReaction statements are reified.

Reaction description from Rhea:

Example: O36015

Previous format (based on NC-IUBMB):

uniprot:O36015
  up:annotation <O36015#SIP5A4ED6FF66BBF481> .

<O36015#SIP5A4ED6FF66BBF481>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "S-adenosyl-L-methionine + cytidine(32)/guanosine(34) in tRNA = S-adenosyl-L-homocysteine + 2'-O-methylcytidine(32)/2'-O-methylguanosine(34) in tRNA." .

New format (based on Rhea):

uniprot:O36015
  up:annotation <O36015#SIP962CEE3C69B2533E> .

<O36015#SIP962CEE3C69B2533E>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <O36015#SIP6D2D3E976AAD17F0> .

<O36015#SIP6D2D3E976AAD17F0>
  rdf:type up:Catalytic_Activity ;
  up:catalyzedReaction <http://rdf.rhea-db.org/42396> ;
  up:enzymeClass enzyme:2.1.1.205 .

Example: A0A0S3QTD0

Previous format (based on NC-IUBMB):

uniprot:A0A0S3QTD0
  up:annotation <A0A0S3QTD0#SIPF04A1EC4C8EBCB08> .

<A0A0S3QTD0#SIPF04A1EC4C8EBCB08>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "Acetyl-CoA + H(2)O + oxaloacetate = citrate + CoA." .

New format (based on Rhea):

uniprot:A0A0S3QTD0
  up:annotation <A0A0S3QTD0#SIP8171B3125ADE4E9D> .

<A0A0S3QTD0#SIP8171B3125ADE4E9D>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <A0A0S3QTD0#SIP1A91565011EC50F6> ;
  up:catalyzedPhysiologicalReaction <http://rdf.rhea-db.org/16846> ,
                                    <http://rdf.rhea-db.org/16847> .

<A0A0S3QTD0#SIP1A91565011EC50F6>
  rdf:type up:Catalytic_Activity ;
  up:catalyzedReaction <http://rdf.rhea-db.org/16845> ;
  up:enzymeClass enzyme:2.3.3.16 .
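In this example, the physiological-direction IRIs (16846, 16847) are offsets of the catalyzed undirected reaction (16845). The Python sketch below builds such IRIs, assuming Rhea’s numbering convention in which each undirected reaction N is followed by its left-to-right (N+1), right-to-left (N+2) and bidirectional (N+3) forms; it is consistent with the example above, but treat the convention as an assumption to check against Rhea. The names are hypothetical.

```python
RHEA_RDF_BASE = "http://rdf.rhea-db.org/"

# Assumed Rhea convention: an undirected reaction N is followed by its
# left-to-right (N+1), right-to-left (N+2) and bidirectional (N+3) forms.
DIRECTION_OFFSET = {
    "undirected": 0,
    "left-to-right": 1,
    "right-to-left": 2,
    "bidirectional": 3,
}


def rhea_iri(undirected_id: int, direction: str = "undirected") -> str:
    """Return the Rhea RDF IRI for one direction of an undirected reaction."""
    if direction not in DIRECTION_OFFSET:
        raise ValueError(f"unknown direction: {direction!r}")
    return f"{RHEA_RDF_BASE}{undirected_id + DIRECTION_OFFSET[direction]}"


# The A0A0S3QTD0 example: catalyzed reaction and both physiological directions.
print(rhea_iri(16845))
print(rhea_iri(16845, "left-to-right"), rhea_iri(16845, "right-to-left"))
```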

Reaction description from NC-IUBMB:

Example: P17050

Previous format (based on NC-IUBMB):

uniprot:P17050
  up:annotation <P17050#SIP0FD272930B1683DE> .

<P17050#SIP0FD272930B1683DE>
  rdf:type up:Catalytic_Activity_Annotation ;
  rdfs:comment "Cleavage of non-reducing alpha-(1->3)-N-acetylgalactosamine residues from human blood group A and AB mucin glycoproteins, Forssman hapten and blood group A lacto series glycolipids." .
  

New format (based on NC-IUBMB):

uniprot:P17050
  up:annotation <P17050#SIP0FD272930B1683DE> .

<P17050#SIP0FD272930B1683DE>
  rdf:type up:Catalytic_Activity_Annotation ;
  up:catalyticActivity <P17050#SIP0FD272930B1683DF> .

<P17050#SIP0FD272930B1683DF>
  rdf:type up:Catalytic_Activity ;
  skos:closeMatch <http://purl.uniprot.org/enzyme/3.2.1.49#SIP0FD272930B1683DG> ;
  up:enzymeClass enzyme:3.2.1.49 .

Change of the RDF representation of enzyme related data

We have changed the RDF representation of ENZYME records in order to refer from UniProt ‘Catalytic activity’ annotations to individual enzymatic activities. The range of the activity predicate has been changed to the type Catalytic_Activity.

Example: 1.11.1.21

Previous format:

enzyme:1.11.1.21
  rdf:type up:Enzyme ;
  skos:prefLabel "Catalase peroxidase" ;
  up:activity "Donor + H(2)O(2) = oxidized donor + 2 H(2)O." ;
  up:activity "2 H(2)O(2) = O(2) + 2 H(2)O." ;
  ...

New format:

enzyme:1.11.1.21
  rdf:type up:Enzyme ;
  skos:prefLabel "Catalase peroxidase" ;
  up:activity <1.11.1.21#SIP017EC216DF0EDC2A> ;
  up:activity <1.11.1.21#SIP018ED427AB1BAS3X> ;
  ...

<1.11.1.21#SIP017EC216DF0EDC2A>
  rdf:type up:Catalytic_Activity ;
  rdfs:label "Donor + H(2)O(2) = oxidized donor + 2 H(2)O." .

<1.11.1.21#SIP018ED427AB1BAS3X>
  rdf:type up:Catalytic_Activity ;
  rdfs:label "2 H(2)O(2) = O(2) + 2 H(2)O." .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Deafness, autosomal recessive, 105

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • Murein peptidoglycan amidated serine

Changes in subcellular location controlled vocabulary

New subcellular location:

UniProt release 2018_10

Published November 7, 2018

Headline

You’re not coming in!

Sexual reproduction is a powerful way to diversify the gene pool and accelerate evolution. However, it imposes tight constraints for success. First, sperm must meet egg: an unfertilized egg cannot develop. In addition, exactly one sperm cell has to fertilize each egg, since polyspermy is not viable. And to ensure the survival of distinct species, the process has to be strictly species-specific. This requirement is particularly challenging in organisms in which fertilization occurs externally, as is the case for fish.

Looking for factors required for fertilization in vertebrates, Herberg et al. identified a small protein highly expressed in zebrafish (Danio rerio) oocytes. They called the protein Bouncer. Bouncer is located at the cell surface where it is attached to the membrane through a glycosylphosphatidylinositol (GPI) anchor, following cleavage of the C-terminal propeptide.

Bouncer function was investigated in knockout zebrafish. At first glance, the mutant animals showed no overt phenotype: they were obtained at the expected Mendelian ratios and developed normally. When fertility was tested, knockout males did not differ from wild-type males, but knockout females were almost completely sterile. Delivering sperm into Bouncer-deficient eggs by intracytoplasmic sperm injection restored embryonic development, suggesting that Bouncer is involved in sperm entry during fertilization. Bouncer was indeed shown to promote sperm-egg binding. Could Bouncer also play a role in species recognition during fertilization? To test this hypothesis, zebrafish Bouncer knockout eggs expressing the Bouncer ortholog of medaka fish (Oryzias latipes) were generated. Medaka sperm cannot normally fertilize zebrafish eggs: the two species diverged some 200 million years ago, much earlier than humans and mice did, and their Bouncer proteins share only 40% sequence identity. Amazingly, the transgenic knockout eggs could be fertilized by medaka sperm, but not by zebrafish sperm. Fertility rates of individual transgenic medaka Bouncer females correlated with the expression levels of medaka Bouncer mRNA in their eggs. The rescue was, however, incomplete: fertility rates remained low, suggesting that other factors also contribute to species-specific sperm-egg interaction. In conclusion, the small, 80-amino-acid-long Bouncer protein plays a crucial role in species-specific fertilization.

Bouncer homologs exist in other vertebrate species; its closest relative in mammals is the SPACA4 gene. Germline-restricted expression of Bouncer/SPACA4 was confirmed in all vertebrates tested. However, ovary-specific expression was observed only in externally fertilizing animals, such as fish and amphibians; surprisingly, internally fertilizing vertebrates, such as reptiles and mammals, show testis-specific expression. The reason for this difference is not clear, and the function of mammalian SPACA4 is not yet known.

As of this release, zebrafish and medaka Bouncer proteins have been annotated and integrated into UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to keywords

New keyword:

UniProt release 2018_09

Published October 10, 2018

Headline

Tubulin code: a long sought-after player identified

In eukaryotes, the cytoskeleton helps cells maintain their shape and internal organization, and provides mechanical support that enables cells to carry out essential functions, like division and movement. It is made of protein filaments, microtubules being the largest type of cytoskeletal filament. Microtubules are dynamically assembled from alpha-tubulin and beta-tubulin heterodimers, creating specific structures adapted to the cell’s needs, structures that can be as different from each other as a cilium is from a mitotic spindle. How is this variety achieved using the same highly conserved building blocks? Part of the answer lies in the so-called 'tubulin code', which involves not only the differential expression of alpha- and beta-tubulin genes (tubulin isotypes), but also a plethora of post-translational modifications (PTMs). Tubulins have a globular core and a more variable C-terminal tail that is exposed at the microtubule surface, where many PTMs occur. One of the first PTMs to be reported, back in the 1970s, was reversible C-terminal detyrosination, which occurs on most alpha-, but not beta-, tubulins. The enzyme catalyzing the re-addition of tyrosine, tubulin-tyrosine ligase or TTL, was identified not long after, but the carboxypeptidase responsible for tubulin detyrosination remained elusive until recently.

Aillaud et al. tackled the problem by developing an irreversible inhibitor of tubulin carboxypeptidase activity and then identifying its targets by mass spectrometry. Nieuwenhuis et al. performed a gene-trap mutagenesis screen in a haploid human cell line to search for regulators of tubulin detyrosination. Both groups identified vasohibin-1 (VASH1) and vasohibin-2 (VASH2) as the major alpha-tubulin-specific carboxypeptidases. Vasohibins had previously been predicted to have a protease fold, but their enzymatic activity had not been investigated. In fact, both enzymes show low carboxypeptidase activity when assayed on their own. Full activity requires the formation of a complex with another protein, called small vasohibin-binding protein, or SVBP. This may explain why previous attempts to identify tubulin carboxypeptidase had failed. SVBP-VASH complexes act preferentially on polymerized tubulin. When microtubules disassemble, TTL adds back a tyrosine residue at the C-terminus, closing the tubulin detyrosination/tyrosination cycle.

The physiological importance of detyrosination remains to be investigated. SVBP or vasohibin knockdown in mouse hippocampal neurons results in delayed axonal differentiation, and in embryos it affects neuronal migration during cortical development. However, mice lacking VASH1 or VASH2 do not exhibit a dramatic phenotype. It should also be noted that vasohibin depletion in cells did not completely abolish detyrosination activity, suggesting the existence of yet another enzyme.

The VASH1 and VASH2 protein entries have been updated and are now available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Alport syndrome, with macrothrombocytopenia
  • Bannayan-Riley-Ruvalcaba syndrome
  • Cowden syndrome 2
  • Cowden syndrome 3
  • Epstein syndrome
  • Fechtner syndrome
  • Macrothrombocytopenia and progressive sensorineural deafness
  • Sebastian syndrome

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • (2S)-4-hydroxyleucine
  • (3S)-3-hydroxylysine
  • (4S)-4,5-dihydroxyleucine
  • 2-hydroxyproline
  • 3',4',5'-trihydroxyphenylalanine
  • 4-hydroxylysine

Modified term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • Hydroxylated arginine -> Hydroxyarginine

UniProt website news

Deprecation of legacy REST URLs /batch and /mapping: please replace with /uploadlists

Programmatic access to our “Retrieve/IDmapping” service should use the URL path /uploadlists, as shown in the code examples on the respective service help pages, ID mapping and Batch retrieval.

If you have existing code for batch retrieval, you also need to specify that you are mapping to and from UniProtKB, i.e.

'from' => 'ACC+ID',
'to' => 'ACC',

(See the Perl code example in Batch retrieval.)

The obsolete URL paths /batch and /mapping have been deprecated and are no longer supported as of release 2018_09.
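For readers not using Perl, here is a minimal Python sketch of the same batch-retrieval call, using only the standard library. It builds (but does not send) a POST request to /uploadlists carrying the 'from' => 'ACC+ID' and 'to' => 'ACC' parameters described above. The function name, the `format` value and the user-agent string are illustrative assumptions, and the endpoint reflects the API as described in this release note, so check the current help pages before relying on it.

```python
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as named in this release note (served over HTTPS since release 2018_06).
UPLOADLISTS_URL = "https://www.uniprot.org/uploadlists/"


def batch_retrieval_request(accessions: Iterable[str], fmt: str = "fasta",
                            contact: str = "") -> Request:
    """Build a batch-retrieval POST request; the caller decides when to send it."""
    params = {
        "from": "ACC+ID",  # map from UniProtKB accessions or entry names...
        "to": "ACC",       # ...to UniProtKB accessions, i.e. plain batch retrieval
        "format": fmt,     # assumed output format value, e.g. "fasta"
        "query": " ".join(accessions),
    }
    # urlencode percent-encodes the literal '+' in 'ACC+ID' as %2B,
    # so the server receives the value unchanged after decoding.
    data = urlencode(params).encode("ascii")
    agent = f"python-urllib {contact}".strip()  # hypothetical user-agent string
    return Request(UPLOADLISTS_URL, data=data,
                   headers={"User-Agent": agent}, method="POST")


req = batch_retrieval_request(["P02730", "P10361"])
# urllib.request.urlopen(req).read() would send it to the live service.
```

Keeping request construction separate from sending makes the parameters easy to inspect before any network call is made.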

UniProt release 2018_08

Published September 12, 2018

Headline

Human brain development: slow and steady wins the race

As mammals, we share most of our physiological processes with other animals, and these similarities allow the wide use of model organisms for medical research. Yet there is something special about us: abstract thought, creativity, art, culture, all linked to our big brains. The increase in size and complexity of our cerebral cortex happened recently on the evolutionary time scale, i.e. after the Homo lineage split apart from that of other related primates. To try to understand this distinctive feature of ours, researchers have identified several new human-specific genes involved in corticogenesis. These genes arose through segmental duplications, but their functional impact on brain development remains mysterious. Among them, however, are 3 nearly identical NOTCH2 paralogs, called NOTCH2NLA, B and C, for which functional clues have recently been obtained. The evolutionary history of NOTCH2NL genes is peculiar. A partial duplication of NOTCH2 occurred prior to the last common ancestor of human, chimpanzee, and gorilla (some 14 million years ago), creating a truncated inactive copy called NOTCH2NL (standing for Notch homolog 2 N-terminal-like). In the hominin lineage, some 3 to 4 million years ago, this NOTCH2 doppelgänger was repaired by gene conversion and duplicated, creating 3 new human-specific active genes, NOTCH2NLA, B and C. This timeframe corresponds to the early stages of the expansion of the human neocortex.

NOTCH2NLA, B and C are expressed in radial glia neural stem cells during cortical development. These cells undergo multiple cycles of regenerative, mostly asymmetric, cell divisions, generating diverse types of neurons while maintaining a pool of progenitors. NOTCH2NL gene expression activates the NOTCH signaling pathway, down-regulates neuronal differentiation genes and delays the differentiation of neuronal progenitors, increasing their number and ultimately resulting in more neurons. In this context, slow development brings a huge benefit.

The chromosome 1q21.1 region hosting NOTCH2NL genes is associated with chromosome 1q21.1 deletion/duplication syndromes, in which duplications are associated with macrocephaly and autism, and deletions with microcephaly and schizophrenia. In an analysis of 11 patients, those with microcephaly had NOTCH2NLA and/or NOTCH2NLB deletions, while the macrocephaly cases were consistent with NOTCH2NLA and/or NOTCH2NLB duplications. If confirmed, these results would support a crucial role for NOTCH2NL genes in human neocortex development. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, at the cost of an increased susceptibility to recurrent neurodevelopmental disorders.

Using our big brains, we have annotated all 3 NOTCH2NL gene products in UniProtKB/Swiss-Prot and they are publicly available as of this release.

UniProtKB news

Change of the annotation topic ‘Enzyme regulation’ to 'Activity regulation'

In UniProtKB entries, the topic ‘Enzyme regulation’ was used to display information about factors that regulate the activity of enzymes, but also of transporters and microbial transcription factors. To clarify the situation, we have renamed this topic to ‘Activity regulation’.

Text format

Example: P02730

Previous format:

CC   -!- ENZYME REGULATION: Phenyl isothiocyanate inhibits anion transport
CC       in vitro.

New format:

CC   -!- ACTIVITY REGULATION: Phenyl isothiocyanate inhibits anion transport
CC       in vitro.

XML format

Example: P02730

Previous format:

<comment type="enzyme regulation">
  <text>Phenyl isothiocyanate inhibits anion transport in vitro.</text>
</comment>

New format:

<comment type="activity regulation">
  <text>Phenyl isothiocyanate inhibits anion transport in vitro.</text>
</comment>

RDF format

Example: P02730

Previous format:

uniprot:P02730
  up:annotation <P02730#SIPC58AB4FDB0DD7DCA> .

<P02730#SIPC58AB4FDB0DD7DCA>
  rdf:type up:Enzyme_Regulation_Annotation ;
  rdfs:comment "Phenyl isothiocyanate inhibits anion transport in vitro." .

New format:

uniprot:P02730
  up:annotation <P02730#SIPC58AB4FDB0DD7DCA> .

<P02730#SIPC58AB4FDB0DD7DCA>
  rdf:type up:Activity_Regulation_Annotation ;
  rdfs:comment "Phenyl isothiocyanate inhibits anion transport in vitro." .
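Code that reads the text format may need to accept both topic names, for example when processing older releases alongside current ones. The following Python sketch, under the assumption that CC topic lines start with "CC   -!- " and continuation lines with "CC" followed by 7 spaces (as in the example above), joins each topic block and looks up the regulation text under either name. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# First line of a CC topic, e.g. "CC   -!- ENZYME REGULATION: text".
TOPIC_RE = re.compile(r"^CC   -!- ([A-Z][A-Z /-]*?):\s?(.*)$")
CONTINUATION = "CC       "  # "CC" followed by 7 spaces


def cc_topics(lines: Iterable[str]) -> Dict[str, str]:
    """Join each CC topic block of a flat-file entry into one string per topic.

    Simplification: if a topic occurs more than once, the last block wins.
    """
    topics: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.rstrip("\n")
        match = TOPIC_RE.match(line)
        if match:
            current = match.group(1)
            topics[current] = match.group(2).strip()
        elif current is not None and line.startswith(CONTINUATION):
            text = line[len(CONTINUATION):].strip()
            topics[current] = (topics[current] + " " + text).strip()
        else:
            current = None  # any other line ends the current topic block
    return topics


def activity_regulation(lines: Iterable[str]) -> Optional[str]:
    """Return the regulation text under the new or the old topic name."""
    topics = cc_topics(lines)
    for name in ("ACTIVITY REGULATION", "ENZYME REGULATION"):
        if name in topics:
            return topics[name]
    return None
```

Checking the new name first means a parser written this way keeps working on current files without losing the ability to read archived ones.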

Change to the cross-references to Bgee

We have introduced an additional field in the cross-references to the Bgee database to indicate the expression pattern of the gene.

Text format

Example: P10361

DR   Bgee; ENSRNOG00000010756; Expressed in 10 organ(s), highest expression level in spleen.

XML format

Example: P10361

<dbReference type="Bgee" id="ENSRNOG00000010756">
  <property type="expression patterns" value="Expressed in 10 organ(s), highest expression level in spleen"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.

RDF format

Example: P10361

uniprot:P10361
  rdfs:seeAlso <http://purl.uniprot.org/bgee/ENSRNOG00000010756> .
<http://purl.uniprot.org/bgee/ENSRNOG00000010756>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Bgee> ;
  rdfs:comment "Expressed in 10 organ(s), highest expression level in spleen" .
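As a minimal sketch of the code change this may require, the Python function below extracts the new expression-pattern field from a text-format Bgee line. It assumes the fields are separated by "; " and that the pattern follows the shape of the single example shown ("Expressed in N organ(s), highest expression level in X"); if the text differs, only the raw pattern is returned. Note that the text format ends the line with a period, whereas the XML and RDF values do not. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class BgeeXref(NamedTuple):
    gene_id: str
    pattern: str                 # free-text expression pattern, as displayed
    organ_count: Optional[int]   # parsed only if the pattern has the assumed shape
    top_tissue: Optional[str]


# Assumed shape of the pattern text, based on the example above.
PATTERN_RE = re.compile(
    r"Expressed in (\d+) organ\(s\), highest expression level in (.+)")


def parse_bgee_dr(line: str) -> Optional[BgeeXref]:
    """Parse a flat-file 'DR   Bgee; <gene id>; <pattern>.' line."""
    if not line.startswith("DR   Bgee; "):
        return None
    body = line[len("DR   "):].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the period that ends text-format DR lines
    fields = body.split("; ", 2)  # database; identifier; expression pattern
    if len(fields) != 3:
        return None
    _, gene_id, pattern = fields
    match = PATTERN_RE.fullmatch(pattern)
    if match is None:
        return BgeeXref(gene_id, pattern, None, None)
    return BgeeXref(gene_id, pattern, int(match.group(1)), match.group(2))
```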

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases:

  • Ehlers-Danlos syndrome 7B

UniProt website news

New advanced search interface

We have revamped the advanced search interface to make it easier for you to browse the different search fields and options within the dropdown menus. Most importantly, when you open the blue dropdown menu, there is now a search box right at the top that allows you to type a concept name (e.g. “structure”) and receive autocompleted suggestions, from which you can then select the most suitable one.

Automatic gene-centric isoform mapping for eukaryotic reference proteome entries

Some proteomes have been (manually and algorithmically) selected as reference proteomes. They cover well-studied model organisms and other organisms of interest for biomedical research and phylogeny. For these reference proteomes, we provide data sets in which only one form of each protein, usually the best-annotated version in UniProtKB, is present. The relationships identified when generating these data sets are now also used when displaying individual entries on the UniProt website.

A single gene can code for multiple proteins through biological events such as alternative splicing, initiation and promoter usage. While the UniProtKB/Swiss-Prot expert curation process includes the identification and review of different forms of a protein and their description in a single UniProtKB/Swiss-Prot entry, its focus is the functional annotation of proteins. For this reason, not all potential isoforms of a protein that are available in UniProtKB/TrEMBL can be reviewed and merged into a single entry. This results in a larger number of UniProtKB entries than genes for many of the eukaryotic reference proteomes. In order to identify potential isoforms that have not (yet) been reviewed by a biocurator, we have established an automatic gene-centric mapping between entries from eukaryotic reference proteomes that are likely to belong to the same gene. This mapping is based on gene identifiers from Ensembl, EnsemblGenomes and model organism databases and, in cases where none of these are available, on gene names assigned by the original sequencing projects.

Example: Q15286
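The gene-centric mapping described above can be pictured as grouping entries by a shared gene key. The Python sketch below is a simplified illustration, not the production pipeline: it assumes each entry carries at most one gene identifier (Ensembl, EnsemblGenomes or a model organism database) and falls back to the submitted gene name when no identifier exists. All accessions, identifiers and names in it are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProteomeEntry:
    accession: str
    gene_id: Optional[str] = None    # Ensembl / EnsemblGenomes / MOD gene identifier
    gene_name: Optional[str] = None  # gene name from the original sequencing project


def gene_centric_groups(entries: List[ProteomeEntry]) -> Dict[str, List[str]]:
    """Group entries that are likely to belong to the same gene."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        if entry.gene_id:
            key = "gene_id:" + entry.gene_id
        elif entry.gene_name:
            key = "gene_name:" + entry.gene_name  # fallback: no gene identifier
        else:
            key = "accession:" + entry.accession  # no gene information at all
        groups[key].append(entry.accession)
    return dict(groups)


entries = [  # hypothetical accessions and gene identifiers
    ProteomeEntry("ACC1", gene_id="GENE_X"),
    ProteomeEntry("ACC2", gene_id="GENE_X"),
    ProteomeEntry("ACC3", gene_name="abcA"),
    ProteomeEntry("ACC4", gene_name="abcA"),
    ProteomeEntry("ACC5"),
]
```

One limitation of this simple keying is that an entry with a gene identifier and another with only a gene name for the same gene end up in different groups.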

UniProt release 2018_07

Published July 18, 2018

Headline

Ubiquitin ligation: new insight into mechanistic diversity

Protein ubiquitination is a reversible post-translational modification that is crucial for many physiological processes, from cell survival and differentiation to innate and adaptive immunity. It can affect protein functions at many levels, marking proteins for degradation, as well as regulating their cellular location, activity and interactions. Most frequently, ubiquitin is linked to the amine group of a lysine side chain via an isopeptide bond, but a growing number of non-canonical linkages have been reported in recent years, involving the N-terminal amine group, the thiol groups of cysteine side chains, and also serine and threonine hydroxyl groups.

A cascade of enzymatic reactions catalyzes the process of protein ubiquitination. The first step consists of ATP-dependent ubiquitin activation by E1 enzymes. Activated ubiquitin is transferred onto E2-conjugating enzymes, producing a covalently linked intermediate (E2-Ub). The transfer of ubiquitin onto the target protein is mediated by E3 protein ligases, which ensure the specificity of the reaction. The whole process grows in complexity with each step. The human genome is thought to encode only 2 E1 enzymes, some 40 E2s and over 600 E3 ligases. E3 ligases can be grouped into 3 classes based on their domain structure and mode of action. E3s of the ‘really interesting new gene’ (RING) family recruit E2-Ub via their RING domain and then mediate direct transfer of ubiquitin to substrates. By contrast, HECT E3 ligases undergo a catalytic cysteine-dependent transthiolation reaction with E2-Ub, forming a covalent E3-Ub intermediate. Finally, RING-between-RING (RBR) E3 ligases have a canonical RING domain linked to an ancillary domain. This ancillary domain contains a catalytic cysteine that enables a hybrid RING-HECT mechanism.

In order to identify new E3 enzymes of the HECT or RBR classes, Pao et al. established an activity-based assay in which a biotinylated probe exhibiting the properties of a HECT/RBR substrate acts as a ‘suicide’ substrate and covalently traps target E3s. The assay worked as expected, identifying most known HECT/RBR E3s, but, much to their surprise, the authors also isolated 33 RING E3s that lacked HECT or RBR ancillary domains. One of these, MYCBP2, an E3 ligase involved in axon guidance and synapse formation in the developing nervous system, was found to ubiquitinate serines and threonines, but not lysines, with a strong preference for threonine. The enzymatic mechanism was also found to be novel: MYCBP2 relays ubiquitin to the target threonine via thioester intermediates involving 2 essential cysteines, a mechanism termed the ‘RING-Cys-relay’ (RCR).

Although non-canonical ubiquitination has already been observed, this is the first report of the identification of an enzyme catalyzing this reaction and along with it, a novel E3 mechanism has been unraveled. The annotation in MYCBP2 entries has been updated with this new knowledge and is publicly available as of this release.

UniProtKB news

Cross-references to UniLectin

Cross-references have been added to the UniLectin database, a database of carbohydrate-binding proteins.

UniLectin is available at https://unilectin.eu.

The format of the explicit links is:

Resource abbreviation UniLectin
Resource identifier UniProtKB accession number

Example: P84801

Show all entries having a cross-reference to UniLectin.

Text format

Example: P84801

DR   UniLectin; P84801; -.

XML format

Example: P84801

<dbReference type="UniLectin" id="P84801"/>

RDF format

Example: P84801

uniprot:P84801
  rdfs:seeAlso <http://purl.uniprot.org/unilectin/P84801> .
<http://purl.uniprot.org/unilectin/P84801>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/UniLectin> .
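Entries with a given cross-reference can also be found through the RDF shown above, since each cross-reference resource points to its database via `up:database`. The Python sketch below builds such a SPARQL query and a standard SPARQL-protocol GET request without sending it. The endpoint URL is an assumption about the public UniProt SPARQL service, the `up:Protein` type constraint is an assumption about the RDF schema, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint of the UniProt SPARQL service.
SPARQL_ENDPOINT = "https://sparql.uniprot.org/sparql"


def xref_query(database: str, limit: int = 10) -> str:
    """SPARQL query for proteins cross-referenced to the given database."""
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?protein ?xref
WHERE {{
  ?protein a up:Protein ;
           rdfs:seeAlso ?xref .
  ?xref up:database <http://purl.uniprot.org/database/{database}> .
}}
LIMIT {limit}"""


def sparql_request(query: str) -> Request:
    """Build a GET request asking for standard SPARQL JSON results."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(xref_query("UniLectin"))
# urllib.request.urlopen(req) would run the query against the live service.
```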

Changes to the controlled vocabulary of human diseases

New diseases:

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt news

Change of UniProt license

We have changed the license that applies to all copyrightable parts of our databases from the Creative Commons Attribution-NoDerivs (CC BY-ND 3.0) to the Creative Commons Attribution (CC BY 4.0) License. This change will make it easier for others to reuse UniProt data in their own works. The updated license information is available on the UniProt website and FTP site. As with the previous license, users must give appropriate credit for use of UniProt data. The change in license means that our users can remix, transform, and build upon UniProt for any purpose, including commercially, without seeking permission from us. However, when doing so, users must provide a link to the license and indicate if changes were made.

UniProt release 2018_06

Published June 20, 2018

Headline

Neuronal express mRNA delivery service

In mammals, the activity-regulated cytoskeleton-associated protein ARC is a key regulator of synaptic plasticity, being involved in many aspects of synapse formation, maturation, and plasticity, as well as in learning and memory. ARC expression is known to be induced by synaptic activity and its mRNA accumulates at sites of local synaptic activity where it is locally translated.

ARC originates from the Ty3/Gypsy retrotransposon family and has retained some retroviral features. Its protein architecture is remarkably similar to that of the capsid domain of the human immunodeficiency virus (HIV) GAG protein. GAG proteins are essential for viral infection. They can self-assemble to form capsids and encapsulate genomic RNA via direct sequence-specific interactions. At first glance, these properties do not seem crucial for eukaryotic proteins, but two recent studies have unraveled a quite unexpected means of neuronal communication, one that is reminiscent of viral infection.

The intriguing observation was that ARC protein and mRNA are not only present at synapses, but also enriched in extracellular vesicles (EVs) released by neurons. These EVs are endocytosed by target cells, where ARC mRNA is postsynaptically translated, as has been described both at the Drosophila neuromuscular junction (NMJ), between motor neurons and muscles, and in rat hippocampal neurons. How is this achieved? Presynaptic ARC proteins bind the 3'-UTR of ARC mRNA, oligomerize and form capsid-like structures, in which the mRNA is packaged. These eukaryotic ‘capsids’ are then released by neurons in EVs and they mediate ARC mRNA transfer into postsynaptic target cells. In flies, ARC knockdown in motor neurons results in a decrease in ARC mRNA and protein in muscles, and leads to impaired expansion of the NMJ, synaptic bouton maturation, and activity-dependent synaptic bouton formation. This phenotype is not rescued by the expression of an ARC construct in muscle alone, nor if the neuronal ARC mRNA construct is missing its 3'-UTR. Overall, these data suggest that it is not just the presence of ARC in presynaptic terminals, but the actual transfer to the postsynaptic region that is required for ARC function.

This exciting piece of information has been transferred to UniProtKB/Swiss-Prot rat and Drosophila ARC entries by means of the classical pathway of expert curation and, as of this release, the updated records are publicly available.

International protein nomenclature guidelines

The European Bioinformatics Institute (EMBL-EBI), the National Center for Biotechnology Information (NCBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) have worked together to produce a shared set of protein naming guidelines. These guidelines are intended for use by anyone who wants to name a protein and aim to promote consistent nomenclature, which is indispensable for communication, literature searching and data retrieval. They replace the previous UniProt protein naming guidelines and are available on the UniProt website as part of this release.

UniProtKB news

Cross-references to ComplexPortal

Cross-references have been added to ComplexPortal, a manually curated resource of macromolecular complexes.

ComplexPortal is available at https://www.ebi.ac.uk/complexportal/.

The format of the explicit links is:

Resource abbreviation ComplexPortal
Resource identifier Resource identifier
Optional information 1 Complex name

Example: Q8IY92

Show all entries having a cross-reference to ComplexPortal.

Cross-references to ComplexPortal may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: Q8IY92

DR   ComplexPortal; CPX-484; SLX4-TERF2 complex.

XML format

Example: Q8IY92

<dbReference type="ComplexPortal" id="CPX-484">
  <property type="entry name" value="SLX4-TERF2 complex"/>
</dbReference>

RDF format

Example: Q8IY92

uniprot:Q8IY92
  rdfs:seeAlso <http://purl.uniprot.org/complexportal/CPX-484> .
<http://purl.uniprot.org/complexportal/CPX-484>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ComplexPortal> ;
  rdfs:comment "SLX4-TERF2 complex" .

Cross-references to ProteomicsDB

Cross-references have been added to ProteomicsDB, a human proteome resource.

ProteomicsDB is available at https://www.proteomicsdb.org/.

The format of the explicit links is:

Resource abbreviation ProteomicsDB
Resource identifier Resource identifier

Example: P41182

Show all entries having a cross-reference to ProteomicsDB.

Cross-references to ProteomicsDB may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: P41182

DR   ProteomicsDB; 55413; -.
DR   ProteomicsDB; 55414; -. [P41182-2]

XML format

Example: P41182

<dbReference type="ProteomicsDB" id="55413"/>
<dbReference type="ProteomicsDB" id="55414">
   <molecule id="P41182-2"/>
</dbReference>

RDF format

Example: P41182

uniprot:P41182
  rdfs:seeAlso <http://purl.uniprot.org/proteomicsdb/55413> ;
  rdfs:seeAlso <http://purl.uniprot.org/proteomicsdb/55414> .
<http://purl.uniprot.org/proteomicsdb/55413> rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ProteomicsDB> .
<http://purl.uniprot.org/proteomicsdb/55414> rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ProteomicsDB> ;
  rdfs:seeAlso isoform:P41182-2 .
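In the text format, an isoform-specific cross-reference is marked by the isoform accession in square brackets after the closing period. The Python sketch below parses such DR lines into database, identifier, optional fields and isoform. It assumes fields are separated by "; " and that a '-' placeholder means "no value"; names that themselves contain "; " would be split incorrectly by this simplification. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class DrLine(NamedTuple):
    database: str
    identifier: str
    optional: Tuple[str, ...]   # optional information fields, '-' placeholders dropped
    isoform: Optional[str]      # set only for isoform-specific cross-references


# "DR   <db>; <id>; <opt>." optionally followed by " [<isoform accession>]".
DR_RE = re.compile(r"^DR   (.+?)\.(?: \[([A-Z0-9]+-\d+)\])?$")


def parse_dr(line: str) -> Optional[DrLine]:
    """Parse one text-format DR line, or return None if it does not match."""
    match = DR_RE.match(line.rstrip())
    if match is None:
        return None
    fields = match.group(1).split("; ")
    if len(fields) < 2:
        return None
    optional = tuple(f for f in fields[2:] if f != "-")
    return DrLine(fields[0], fields[1], optional, match.group(2))
```

Lines without a bracketed suffix refer to the canonical sequence, so `isoform` is None for them.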

Cross-references to MoonDB

Cross-references have been added to MoonDB, a database of extreme multifunctional and moonlighting proteins.

MoonDB is available at http://moondb.hb.univ-amu.fr.

The format of the explicit links is:

Resource abbreviation MoonDB
Resource identifier UniProtKB accession number
Optional information 1 Entry type (“Curated” or “Predicted”)

Example: Q13492

Show all entries having a cross-reference to MoonDB.

Text format

Example: Q13492

DR   MoonDB; Q13492; Curated.

XML format

Example: Q13492

<dbReference type="MoonDB" id="Q13492">
  <property type="type" value="Curated"/>
</dbReference>

RDF format

Example: Q13492

uniprot:Q13492
  rdfs:seeAlso <http://purl.uniprot.org/moondb/Q13492> .
<http://purl.uniprot.org/moondb/Q13492>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MoonDB> ;
  rdfs:comment "Curated" .
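The XML examples above store optional information as `property` children and isoform-specific links as a `molecule` child of `dbReference`. The Python sketch below collects both for one database type using the standard library's ElementTree. It strips any `{namespace}` prefix from tag names so that it works on the unqualified snippets shown here as well as on namespaced XML; the function names and the returned dictionary layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def local_name(tag: str) -> str:
    """Strip an '{namespace}' prefix, so namespaced and plain XML both work."""
    return tag.rsplit("}", 1)[-1]


def db_references(xml_text: str, db_type: str) -> List[Dict[str, Any]]:
    """Collect id, properties and isoform of every dbReference of one type."""
    root = ET.fromstring(xml_text)
    found: List[Dict[str, Any]] = []
    for ref in root.iter():
        if local_name(ref.tag) != "dbReference" or ref.get("type") != db_type:
            continue
        props = {p.get("type"): p.get("value")
                 for p in ref if local_name(p.tag) == "property"}
        isoform = next((m.get("id") for m in ref
                        if local_name(m.tag) == "molecule"), None)
        found.append({"id": ref.get("id"), "properties": props, "isoform": isoform})
    return found
```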

Changes to the controlled vocabulary of human diseases

New diseases:

UniProt website news

To improve security and privacy, we have moved our web pages and services from HTTP to HTTPS.

The HTTP protocol does not provide encryption – anyone who can see web traffic between a client (e.g. a web browser) and a server can intercept potentially sensitive information and/or inject malware into users' browsers or operating systems. HTTPS solves this problem by encrypting web traffic between a client and a server in both directions, so that observers cannot intercept or tamper with the client’s requests or the server’s responses. It also provides authentication, ensuring that the client is communicating with the intended server given by the hostname, and not some impostor.

Timeline

We supported separate HTTP and HTTPS services until release 2018_06 (June 20, 2018). From this date, the HTTP traffic is automatically redirected to HTTPS. We intend to maintain these redirects indefinitely, but it is to your advantage to update your applications to use HTTPS as soon as possible, both for performance and security reasons.

Interactive users

If you access our pages only through a Web browser (like Chrome, Firefox, Safari, Internet Explorer, Opera, etc.), the only change after the switchover date is that a green lock icon should appear inside the URL box of your browser, and the web addresses of the pages you visit will start with https://. We recommend that you update your bookmarks and links accordingly.

Programmatic users

Applications that access web servers using http:// URLs instead of https:// URLs may fail after a switch to HTTPS for the following reasons:

  • Your programming environment’s HTTP facility does not automatically follow redirects from HTTP to HTTPS. Some libraries follow redirections from HTTP to HTTPS, others do not (e.g. Java’s URLConnection).
  • Your application uses HTTP requests other than GET and HEAD. These requests (including especially POST and PUT) will fail with HTTP 403 Forbidden after the switchover date.
  • Your application accesses our servers through a proxy. Check with your proxy vendor about HTTPS support and how to add or update certificates.
  • Your programming environment does not support HTTPS.

After the switchover date, our servers:

  • respond with a server-side redirect (HTTP 301 Moved permanently) to the corresponding HTTPS URL for HTTP GET and HEAD requests
  • respond with HTTP 403 Forbidden and an error message to all HTTP requests other than GET and HEAD (including and especially HTTP POST).
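Since POST requests to http:// URLs are rejected with 403 rather than redirected, programmatic clients are safest rewriting request URLs to HTTPS before sending them. The Python sketch below does this with the standard library. As a design choice, it leaves http://purl.uniprot.org/ URLs untouched, because those are identifiers in the RDF data and rewriting them would change the data rather than just the transport. The function name and the host list are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Hosts whose http:// URLs are left alone: purl.uniprot.org URIs act as
# identifiers in the UniProt RDF distribution.
KEEP_AS_IS = {"purl.uniprot.org"}


def to_https(url: str) -> str:
    """Return the https:// form of an http:// request URL; others are unchanged.

    Note: an explicit ':80' port would be kept and needs removing separately.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "http" or parts.hostname in KEEP_AS_IS:
        return url
    return parts._replace(scheme="https").geturl()
```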

URLs that start with http://purl.uniprot.org/, which are used as URIs in the UniProt RDF distribution and SPARQL service, are redirected to the corresponding HTTPS web page when used in a web context.

UniProt release 2018_05

Published May 23, 2018

Headline

Selenium vs. Sulfur: and the winner is...

Selenium is a chemical element that, in trace amounts, is essential for cellular function in many, though not all, organisms from all kingdoms of life. Proteins incorporate selenium as selenocysteine (Sec), in which selenium replaces the sulfur of cysteine, when a UGA stop codon is “recoded” by a Sec-tRNA and a selenocysteine insertion sequence (SECIS) within the target mRNA. Sec is indispensable for mammalian life, and deficiency in Sec-tRNA is embryonic-lethal (shortly after implantation) in mice, yet this process is complex, inefficient and energetically costly. Why then does Mother Nature continue to produce selenoproteins in spite of these drawbacks?

Recent work from Ingold et al. suggests that one reason may be the ability of selenium to protect cells from a specific form of oxidative stress leading to cell death. The authors focused on the phospholipid hydroperoxide glutathione peroxidase GPX4, an essential selenoprotein and the only one whose knockout phenotype mimics that of Sec-tRNA gene disruption. GPX4 catalyzes the reduction of toxic lipid hydroperoxides formed when ferrous iron is imported into cells in the presence of reactive oxygen species produced during aerobic metabolism. If left unchecked, lipid peroxides can spontaneously propagate, directly damaging membranes or generating other toxic products, leading to a specific form of cell death called ferroptosis. Mice in which the active-site selenocysteine of GPX4 (Sec-73) is replaced by cysteine (GPX4-Cys) develop normally, but experience fatal seizures 2-3 weeks after birth. This phenotype is due to the lack of parvalbumin-positive GABAergic interneurons, which are important regulators of cortical network excitability. Hence the presence of Sec is essential for specific developmental events, such as the maturation of a specific class of neurons. In adult mice, by contrast, conditional expression of the GPX4-Cys mutant did not cause any overt phenotype.

Cys substitution greatly reduces GPX4 activity, although it does not abolish it. In the presence of increasing levels of H2O2, GPX4-Cys readily undergoes irreversible oxidation, and cells expressing GPX4-Cys become exquisitely sensitive to peroxide-induced ferroptosis. In conclusion, the critical advantage of selenolate- versus thiolate-based catalysis may lie in its resistance to overoxidation when cells increase their metabolic rates and mitochondrial H2O2 production.

Selenium was discovered in 1817, almost exactly 200 years ago, and it is quite exciting to celebrate this anniversary with a new discovery about its role in higher organisms. As of this release, the updated GPX4 entries are publicly available.

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases:

  • Epidermolysis bullosa dystrophica, Hallopeau-Siemens type
  • Epidermolysis bullosa dystrophica, Pasini type

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • Pyruvic acid (Tyr)

UniRef news

GO annotation to UniRef90 and UniRef50 clusters (also in clusters with one member)

In release 2017_05, we announced the addition of Gene Ontology (GO) annotations for UniRef90 and UniRef50 clusters: In this first approach, GO terms were assigned to clusters with at least 2 members, and a GO term was added to a cluster when it was found in all UniProtKB members, or when it was a common ancestor of at least one GO term of each member.

As of this release, 2018_05, we are also adding GO annotations to UniRef90 and UniRef50 singleton clusters, i.e. clusters that have only one member. These clusters inherit the GO terms of their single member.
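The assignment rule described above can be sketched in Python on a toy ontology. A term qualifies for a multi-member cluster when, for every member, it is one of that member's terms or an ancestor of at least one of them; singleton clusters simply inherit their member's terms. The final step, which keeps only the most specific qualifying terms, is an assumption made here to avoid listing every shared ancestor, and the term names are hypothetical rather than real GO identifiers.

```python
from typing import Dict, List, Optional, Set

# Toy ontology: term -> set of its direct parent terms (hypothetical names).
Ontology = Dict[str, Set[str]]


def ancestors_or_self(term: str, parents: Ontology) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def cluster_go_terms(members: List[Set[str]], parents: Ontology) -> Set[str]:
    """GO terms for a cluster, given the GO terms of each member."""
    if len(members) == 1:
        return set(members[0])  # singleton: inherit the member's terms
    shared: Optional[Set[str]] = None
    for terms in members:
        # Everything this member is annotated with, directly or via an ancestor.
        closure: Set[str] = set()
        for term in terms:
            closure |= ancestors_or_self(term, parents)
        shared = closure if shared is None else shared & closure
    shared = shared or set()
    # Assumption: keep only the most specific shared terms, i.e. drop any
    # term that is an ancestor of another shared term.
    return {t for t in shared
            if not any(u != t and t in ancestors_or_self(u, parents)
                       for u in shared)}


parents = {"term_B": {"term_A"}, "term_C": {"term_A"},
           "term_D": {"term_B"}, "term_E": {"term_B"}}
```

With these toy terms, a cluster whose members carry term_D and term_E respectively would receive their common parent, term_B.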

UniProt release 2018_04

Published April 25, 2018

Headline

The Matrix (enzymes) Reloaded

Collagen is the major protein that stitches together animal tissues and is the most abundant protein in mammals, making up 25-35% of total body protein. It consists of three polypeptide chains that coil together to form tropocollagen molecules, which in turn assemble into microfibrils. Collagen is extremely stable and extremely ancient: it is found in all extant metazoans, and collagen fragments have been sequenced from dinosaurs such as the 80-million-year-old Brachylophosaurus canadensis and Tyrannosaurus rex. The breakdown of collagen is essential to permit tissue growth, and all animals can metabolize collagen in a very controlled way, by cleaving it at a single site. Infectious bacteria, such as the gas gangrene-causing Clostridium perfringens and Hathewaya histolytica, on the other hand, digest collagen indiscriminately, using collagenases with both endopeptidase and tripeptidylcarboxypeptidase activities. This rampant activity causes massive tissue disruption, favoring bacterial colonization and virulence, and is obviously severely problematic in a clinical setting.

Despite their different approaches to collagen degradation (cautious versus gung-ho), mammalian and clostridial collagenases have similar enzymatic mechanisms, and many inhibitors act on both types, which makes them unsuitable for antibacterial therapy. Recent work by Schönauer et al. has identified promising new molecules that inhibit bacterial but not mammalian collagenases, pointing to a possible way to block bacterial collagenase action, for example in wounds. Because they do not attack the bacteria directly, these inhibitors should offer new ways to limit some of the damage inflicted by these bacteria while exerting little selective pressure, thereby minimizing the risk of resistance. While these inhibitors are undoubtedly very useful, there are also many applications in which potentially undesirable bacterial collagenase activities are actively exploited. The H. histolytica collagenases (ColG and ColH) are used to isolate pancreatic islet cells for transplantation, to remove retained placenta in cattle and horses, to debride wounds, ulcers and severe burns (SANTYL Ointment, Smith and Nephew, Inc.), and to treat human diseases caused by abnormal accumulation of collagen plaques, such as Dupuytren’s disease and Peyronie’s disease (Xiaflex, Endo Pharmaceuticals, Inc.). Dupuytren’s disease is an abnormal deposition of collagen in the hand that causes permanent contraction. In Peyronie’s disease, collagen forms fibrous plaques in the penis, restricting erection. Collagenase injection relieves this accumulation, leading to an increased quality of life. The collagen-binding domain of collagenases, when attached to other proteins, promotes their retention at injection sites for as long as 10 days. Although this is far from the only example of a repurposed enzyme (think of Botox, another clostridial protein), it is fascinating how a protein class that can be so dangerous to life can, when harnessed, be so very helpful.

As of this release, 3 clostridial collagenases have been expertly updated in UniProtKB/Swiss-Prot.

Cross-references to GlyConnect

Cross-references have been added to the GlyConnect database and protein glycosylation platform.

GlyConnect is available at https://glyconnect.expasy.org.

The format of the explicit links is:

Resource abbreviation: GlyConnect
Resource identifier: GlyConnect identifier

Example: P00742

Cross-references to GlyConnect may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: P00742

DR   GlyConnect; 102; -.

XML format

Example: P00742

<dbReference type="GlyConnect" id="102"/>

RDF format

Example: P00742

uniprot:P00742
  rdfs:seeAlso <http://purl.uniprot.org/glyconnect/102> .
<http://purl.uniprot.org/glyconnect/102>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/GlyConnect> .
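The text-format DR line has a fixed, semicolon-separated layout, so it can be split mechanically. The Python sketch below assumes the layout `DR   Abbreviation; Identifier; optional fields...` terminated by a period, with `-` as an empty placeholder. Because GlyConnect cross-references may be isoform-specific, it also assumes that such lines append a bracketed isoform identifier after the final period; that is our reading of the format described in release 2014_03, not something this note specifies. `CrossRef` and `parse_dr_line` are hypothetical names, not part of any UniProt tool.

```python
import re
from typing import List, NamedTuple, Optional


class CrossRef(NamedTuple):
    database: str            # resource abbreviation, e.g. "GlyConnect"
    identifier: str          # resource identifier, e.g. "102"
    optional: List[str]      # optional information fields, "-" placeholders dropped
    isoform: Optional[str]   # isoform ID for isoform-specific references, else None


# Assumed layout: "DR   DB; ID; OPT1[; OPT2 ...]." optionally followed by " [ISOFORM-ID]".
# The lazy body stops at the last period that is followed only by an optional isoform tag.
_DR_RE = re.compile(r"^DR   (?P<body>.+?)\.(?:\s+\[(?P<isoform>[^\]]+)\])?\s*$")


def parse_dr_line(line: str) -> CrossRef:
    """Split one flat-file DR line into its cross-reference fields."""
    match = _DR_RE.match(line)
    if match is None:
        raise ValueError(f"not a DR line: {line!r}")
    fields = [field.strip() for field in match.group("body").split(";")]
    if len(fields) < 2:
        raise ValueError(f"DR line needs a database and an identifier: {line!r}")
    database, identifier, *rest = fields
    optional = [value for value in rest if value != "-"]
    return CrossRef(database, identifier, optional, match.group("isoform"))
```

For the P00742 example, `parse_dr_line("DR   GlyConnect; 102; -.")` returns `CrossRef(database='GlyConnect', identifier='102', optional=[], isoform=None)`.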

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt release 2018_03

Published March 28, 2018

Headline

Ama-(not a)-toxin: a cap on death

Amanita and Galerina mushrooms are responsible for a large number of food poisoning cases and deaths across the world. Like other poisonous mushrooms, Amanita and Galerina produce a cocktail of toxic peptides, but the major lethal components are amatoxins. The typical symptoms of amatoxin poisoning are gastro-intestinal distress beginning 6 to 12 hours after ingestion, a remission phase lasting 12 to 24 hours, and progressive loss of liver function culminating in death 3 to 5 days later. One of the few effective treatments is liver transplantation.

Amatoxins are bicyclic octapeptides that act by binding non-competitively to RNA polymerase II and greatly slowing transcriptional elongation. Most mycotoxic cyclic peptides are synthesized by nonribosomal peptide synthetases. This is not the case for amatoxins (and related compounds), which are encoded by the genome and synthesized by ribosomes. The amatoxin genes encode 35 amino acid-long propeptides that are processed by a dual macrocyclase-peptidase called POPB. They belong to an extended family called MSDIN (after the 5 N-terminal amino acids of the propeptide), which also includes phallotoxins, such as phalloidin and phallicidin. Although structurally related to amatoxins, phallotoxins are bicyclic heptapeptides and have a different mode of action: they stabilize F-actin. Luckily, phallotoxins are poorly absorbed through the gut, and therefore make only a small contribution to toxicity after mushroom ingestion.

While the amatoxins are undoubtedly extremely dangerous, some MSDIN cyclopeptides may actually be beneficial. One example is antamanide, a cyclic peptide of Amanita phalloides (the ‘death cap’ mushroom), which can act as a competitive antagonist and natural antidote to the lethal toxins if administered before, or simultaneously with, the poisons. In addition, antamanide may protect cells from death by targeting cyclophilin D and inhibiting the mitochondrial permeability transition pore, a central effector of cell death induction. Another mushroom, A. exitialis, produces a closely related cyclic nonapeptide, called amanexitide, that has been suggested to have a similar antidote activity. Unfortunately, the concentration of such natural antidotes tends to be much lower than that of the toxins they protect against, so consumers of these deadly mushrooms do not feel the benefit, and we strongly recommend that readers refrain from eating them.

Toxic MSDIN family members, as well as a number of natural antidotes, have been identified in several Amanita species, including A. bisporigera, A. phalloides, A. exitialis, A. fuligineoides, A. fuliginea, A. ocreata, A. pallidorosea and A. rimosa, as well as in Galerina marginata. Expertly curated entries describing their biology are publicly available in UniProtKB/Swiss-Prot as of this release.

Cross-references to VGNC (Vertebrate Gene Nomenclature Database)

Cross-references have been added to the VGNC Vertebrate Gene Nomenclature Database.

VGNC is available at https://vertebrate.genenames.org/.

The format of the explicit links is:

Resource abbreviation: VGNC
Resource identifier: VGNC identifier
Optional information 1: Gene designation

Example: P11613

Text format

Example: P11613

DR   VGNC; VGNC:37509; ACKR3.

XML format

Example: P11613

<dbReference type="VGNC" id="VGNC:37509">
  <property type="gene designation" value="ACKR3"/>
</dbReference>

RDF format

Example: P11613

uniprot:P11613
  rdfs:seeAlso <http://purl.uniprot.org/vgnc/37509> .
<http://purl.uniprot.org/vgnc/37509>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/VGNC> ;
  rdfs:comment "ACKR3" .
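The XML form stores the optional gene designation as a nested `<property>` element, and the RDF form reuses the identifier in a purl.uniprot.org URI. The Python sketch below reads one `<dbReference>` snippet with the standard library and rebuilds that URI. Two assumptions are labeled in the code: element names are matched without their namespace, in case the snippet comes from a document that declares one, and `purl_for` is a hypothetical rule inferred only from the three RDF examples in these notes (GlyConnect, VGNC and CarbonylDB), not a documented UniProt convention.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def _local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix if the element is namespace-qualified."""
    return tag.rsplit("}", 1)[-1]


def parse_db_reference(xml_text: str) -> dict:
    """Parse one <dbReference> element, including its <property> children."""
    element = ET.fromstring(xml_text)
    if _local_name(element.tag) != "dbReference":
        raise ValueError("expected a <dbReference> element")
    properties: Dict[Optional[str], Optional[str]] = {
        child.get("type"): child.get("value")
        for child in element
        if _local_name(child.tag) == "property"
    }
    return {"type": element.get("type"), "id": element.get("id"), "properties": properties}


def purl_for(db_type: str, identifier: str) -> str:
    """Rebuild the purl.uniprot.org URI shown in the RDF examples.

    Hypothetical rule inferred from three examples only: lower-case the
    database name and drop a leading '<database>:' prefix from the identifier.
    """
    prefix = db_type + ":"
    local_id = identifier[len(prefix):] if identifier.startswith(prefix) else identifier
    return f"http://purl.uniprot.org/{db_type.lower()}/{local_id}"
```

Applied to the P11613 snippet, this yields the identifier `VGNC:37509`, the properties `{'gene designation': 'ACKR3'}` and the URI `http://purl.uniprot.org/vgnc/37509`, matching the RDF example.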

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Deleted diseases:

  • Mental retardation, autosomal dominant 8
  • Mental retardation, X-linked, syndromic, Borck type

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • Cyclopeptide (Cys-Pro)
  • Cyclopeptide (Gly-Pro)
  • Cyclopeptide (His-Pro)
  • Cyclopeptide (Leu-Pro)
  • Cyclopeptide (Met-Pro)
  • Cyclopeptide (Phe-Pro)
  • Cyclopeptide (Ser-Pro)
  • Cyclopeptide (Trp-Pro)
  • Cyclopeptide (Tyr-Pro)
  • Cyclopeptide (Val-Pro)

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2018_02

Published February 28, 2018

Headline

Escaping friendly fire

During the first hours of an infection, our safety relies almost entirely on the innate immune system, and predominantly on neutrophils. The encounter between neutrophils and invading microbes leads to neutrophil activation and to the engulfment of pathogens into intracellular phagosomes, where exposure to high concentrations of reactive oxygen species (ROS) and antimicrobial peptides eventually kills them. Neutrophils defend us not only in life but also in death, when they release chromatin and granule proteins that together form extracellular fibers, called ‘neutrophil extracellular traps’ or NETs, which catch microorganisms and prevent their spread. NETs are covered with antimicrobial compounds, such as cathelicidin peptides, as well as histones, which can also effectively neutralize intruders. This process is so efficient that extracellular DNases able to disrupt NETs serve as virulence factors in several pathogenic bacteria, such as group A Streptococcus.

NETs are a double-edged sword and have to be regulated very tightly. Indeed, free extracellular DNA is a potent trigger of autoimmune responses, such as those encountered in systemic lupus erythematosus (SLE), which is characterized by circulating anti-DNA antibodies. NETs can also initiate vascular occlusion in a fibrin-independent manner. In other words, NETs are not innocuous in the medium to long term, and the host has to get rid of them quickly. Timely removal of NET chromatin by the DNases DNASE1 and DNASE1L3 has been shown to play a crucial role in the prevention of autoimmunity. However, until recently it was not known which mechanism was involved in NET clearance under inflammatory conditions. This issue was addressed by Jimenez-Alcazar and colleagues, who created knockout mice lacking both DNASE1 and DNASE1L3. Mutant animals were treated with granulocyte colony-stimulating factor (G-CSF) to induce chronic neutrophilia, a condition mimicking acute inflammation. While wild-type mice showed no sign of distress, all double knockout animals exhibited features of infection-induced thrombotic microangiopathies (TMAs) and died within 6 days. This phenotype could be reversed by the reintroduction of DNASE1 or DNASE1L3, but not by an anti-thrombotic treatment, further supporting the idea that NETs can clog vessels by themselves. TMAs are a well-known complication in patients suffering from systemic bacterial infections. Analysis of lungs from patients with acute respiratory distress syndrome and/or sepsis revealed numerous NET-derived clots in their blood vessels. It is still too early to propose DNase treatment for TMA patients, but these findings at least open new therapeutic perspectives.

As of this release, murine DNASE1 and DNASE1L3 and their orthologs in other mammalian species have been updated and are now publicly available.

UniProtKB news

UniProtKB FASTA headers: Addition of NCBI taxonomy identifier

In order to avoid ambiguities and simplify parsing, we have added the NCBI taxonomy identifier to UniProtKB FASTA headers.

Previous format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

New format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

Where:

  • db is ‘sp’ for UniProtKB/Swiss-Prot and ‘tr’ for UniProtKB/TrEMBL.
  • UniqueIdentifier is the primary accession number of the UniProtKB entry.
  • EntryName is the entry name of the UniProtKB entry.
  • ProteinName is the recommended name of the UniProtKB entry as annotated in the RecName field. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used. In case of multiple SubNames, the first one is used. The ‘precursor’ attribute is excluded; ‘Fragment’ is included with the name if applicable.
  • OrganismName is the scientific name of the organism of the UniProtKB entry.
  • OrganismIdentifier is the unique identifier of the source organism, assigned by the NCBI.
  • GeneName is the first gene name of the UniProtKB entry. If there is no gene name, OrderedLocusName or ORFname, the GN field is not listed.
  • ProteinExistence is the numerical value describing the evidence for the existence of the protein.
  • SequenceVersion is the version number of the sequence.

Examples:

>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana OX=115339 GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_CUPNH Acetoin catabolism protein X OS=Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) OX=381666 GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus OX=10090 PE=1 SV=1

>tr|Q3SA23|Q3SA23_9HIV1 Protein Nef (Fragment) OS=Human immunodeficiency virus 1  OX=11676 GN=nef PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN cDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens OX=9606 PE=2 SV=1

The same modification has been applied to FASTA headers of alternative isoforms in UniProtKB/Swiss-Prot, where the new format is:

>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName OX=OrganismIdentifier[ GN=GeneName]

Example:

>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis OX=9541 GN=YWHAB
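With the OX field in place, the header is regular enough to split with a single regular expression. The Python sketch below assumes that the protein name ends at the first ` OS=` and that no field value contains one of the keys `OS=`, `OX=`, `GN=`, `PE=` or `SV=` preceded by whitespace. `parse_fasta_header` is a hypothetical helper, not a UniProt-provided tool. Isoform headers, which carry no PE or SV field, simply yield `None` for those keys.

```python
import re
from typing import Optional

# Key=value fields after the protein name; each value runs up to the next key.
_FIELD_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s+(?:OS|OX|GN|PE|SV)=|$)")


def _optional_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_fasta_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into the fields listed above."""
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    identifier, _, description = header[1:].partition(" ")
    db, accession, entry_name = identifier.split("|")
    name_end = description.find(" OS=")
    protein_name = description[:name_end] if name_end != -1 else description
    fields = {key: value.strip() for key, value in _FIELD_RE.findall(description)}
    return {
        "db": db,                                      # 'sp' or 'tr'
        "accession": accession,                        # accession or isoform ID
        "entry_name": entry_name,
        "protein_name": protein_name,
        "organism": fields.get("OS"),
        "taxon_id": _optional_int(fields.get("OX")),   # NCBI taxonomy identifier
        "gene": fields.get("GN"),                      # None when GN is omitted
        "protein_existence": _optional_int(fields.get("PE")),
        "sequence_version": _optional_int(fields.get("SV")),
    }
```

For the Acanthoscurrin-2 example, the sketch returns the organism `Acanthoscurria gomesiana`, the taxon identifier `115339` and the gene `acantho2`. For the HA22_MOUSE example, which has no GN field, `gene` is `None`.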

Cross-references to CarbonylDB

Cross-references have been added to the CarbonylDB database, a resource of protein carbonylation sites.

CarbonylDB is available at http://digbio.missouri.edu/CarbonylDB/.

The format of the explicit links is:

Resource abbreviation: CarbonylDB
Resource identifier: UniProtKB accession number

Example: P02768

Text format

Example: P02768

DR   CarbonylDB; P02768; -.

XML format

Example: P02768

<dbReference type="CarbonylDB" id="P02768"/>

RDF format

Example: P02768

uniprot:P02768
  rdfs:seeAlso <http://purl.uniprot.org/carbonyldb/P02768> .
<http://purl.uniprot.org/carbonyldb/P02768>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CarbonylDB> .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • O-AMP-serine

RDF news

Change of URIs for OrthoDB

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these by the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the OrthoDB database have therefore been updated from:

http://purl.uniprot.org/orthodb/<ID>

to:

http://purl.orthodb.org/odbgroup/<ID>

If required for backward compatibility, you can use the following query to add the old URIs:

PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/OrthoDB> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/OrthoDB> .
   # Keep only the <ID> that follows http://purl.orthodb.org/odbgroup/
   BIND(iri(concat('http://purl.uniprot.org/orthodb/', strafter(str(?new), 'http://purl.orthodb.org/odbgroup/'))) AS ?old)
}

The dereferencing of existing http://purl.uniprot.org/orthodb/<ID> URIs will be maintained.

UniProt release 2018_01

Published January 31, 2018

Zika virus: from petty crime to banditry

A Zika virus (ZIKV) outbreak in Brazil in 2015 drew the world’s attention to this microbe (see UniProt headline). The situation was so severe that in February 2016 it was declared to be a ‘public health emergency of international concern’ by the World Health Organization (WHO), indicating that it constituted a public health risk to other States through the international spread of the disease and potentially required a coordinated international response.

ZIKV has been known for over 70 years, since its first isolation from a febrile rhesus macaque in the Ugandan Zika forest. The clinical symptoms caused by ZIKV infection in humans were mild at that time: a self-limiting flu-like febrile illness that resolved within days and occurred in an estimated 20% of infected individuals. The picture of the recent epidemic was, however, dramatically different. ZIKV infection was associated with severe symptoms, including multi-organ failure. The most alarming feature was its ability to cause microcephaly, congenital malformations and fetal demise when pregnant women were infected.

When did the metamorphosis from an almost innocuous agent into a congenital pathogen with global impact occur? In the decades following its discovery, sporadic human ZIKV infections were reported in a few countries in Africa; the virus then started spreading, first to Southeast Asia, then to Micronesia in 2007, to French Polynesia in 2013-2014, and soon after to South and Central America. ZIKV neurovirulence was compared between ‘ancestral’ (African/Southeast Asian) and ‘contemporary’ (Polynesian/South American) strains by intracerebral injection of the virus into neonatal mice. All 3 ‘contemporary’ strains led to 100% mortality, with typical neurological manifestations. By contrast, the ‘ancestral’ strain killed fewer than 17% of the animals. Moreover, in a mouse embryonic microcephaly model, infection with a ‘contemporary’ ZIKV strain resulted in brains exhibiting a substantial degree of microcephaly, whereas the ‘ancestral’ strain caused less severe symptoms. Both viruses targeted neural progenitor cells, but the ‘contemporary’ strain showed significantly enhanced replication in the brain compared with the ‘ancestral’ one. Obviously something had changed between the ‘ancestral’ ZIKV and its ‘contemporary’ version, something that boosted ZIKV neurovirulence, but what?

This question was addressed by Yuan et al. Sequence alignments between ‘ancestral’ (INSDC accession number AY632535) and ‘contemporary’ (KJ776791) strains show many differences at the amino acid level. To find out which changes account for increased neurovirulence, several ‘contemporary’ strain-specific substitutions were introduced in the ‘ancestral’ strain and tested in neonatal mice. One of them, the substitution of a serine residue by an asparagine at position 139 (in the precursor polyprotein), S139N, greatly increased the neurovirulence of the ancestral strain. It also showed enhanced replication in neural progenitor cells and caused more extensive cell death compared with the original ‘ancestral’ virus. Conversely, when this residue was mutated back to serine in the ‘contemporary’ strain, mortality caused by the ‘contemporary’ virus in neonatal mice was significantly decreased. The ZIKV S139N substitution probably emerged in May 2013, a few months before the outbreak in French Polynesia, and was then stably maintained in the epidemic strain during its subsequent spread to the Americas. Its emergence correlates with reports of microcephaly and other severe neurological abnormalities.

After maturation of the genome polyprotein, position 139 is found in viral protein prM, which, in flaviviruses, closely associates with the envelope protein E and is believed to prevent premature fusion of immature virions inside infected cells. However, the mechanism through which the S139N substitution increases neurovirulence is not yet known.

At the beginning of 2016, UniProtKB/Swiss-Prot released the annotated sequence of a ZIKV genome polyprotein, corresponding to the East African ‘ancestral’ strain. In order to meet the needs of the scientific community, we have now released that of a ‘contemporary’ strain, isolated from a French Polynesian sample.

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • S-carbamoylcysteine
  • S-cyanocysteine

UniProt release 2017_12

Published December 20, 2017

Headline

Swiss-Prot in the sky with psilocybin: the biosynthesis pathway of a psychedelic drug unveiled

Psychedelic mushrooms, also called ‘magic mushrooms’, have been used by humans since prehistoric times and can be found depicted in Stone Age rock art in Europe and Africa. Some cultures have used them for religious rites and ceremonies, especially in pre-Columbian Mesoamerica. Aztecs and Mazatecs referred to them as genius mushrooms, divinatory mushrooms, and wondrous mushrooms. A Psilocybe species was known to the Aztecs as ‘teōnanācatl’, literally ‘the divine mushroom’.

The effects of many psychedelic mushrooms come from the pro-drug psilocybin. When psilocybin is ingested, this natural compound is rapidly metabolized to yield psilocin. The latter acts as a serotonergic psychedelic substance. Its effects include euphoria, altered thinking, visual hallucinations, altered sense of time and spiritual experiences. Some consider the drug as an entheogen and a tool to supplement practices for transcendence. Psilocybin is considered to have low toxicity and harm potential, although some very rare cases of lethality have been reported. In most countries, psilocybin and psilocin are listed as Schedule I drugs, i.e. compounds that have a high potential for abuse and are not recognized for medical use.

Nevertheless, over the last 30 years, the potential medical and psychological therapeutic benefits of psilocybin have been investigated. Clinical studies revealed a positive trend in the treatment of existential anxiety with advanced-stage cancer patients and for nicotine addiction. Studies on the clinical use of psilocybin against depression are ongoing.

The structures of both psilocybin and psilocin were determined in 1959 by Hofmann et al., but the basis of their biosynthesis has remained obscure for almost 60 years. The locus for the biosynthesis of psilocybin, called psi, has been recently identified in 2 out of over 100 species of psilocybin mushrooms, namely Psilocybe cubensis and Psilocybe cyanescens.

The psi locus encodes 4 psilocybin biosynthesis enzymes, including a new type of fungal L-tryptophan decarboxylase (psiD), a kinase (psiK), a methyltransferase (psiM), and a cytochrome P450 monooxygenase (psiH). All 4 have been characterized and are sufficient to produce psilocybin from the amino acid L-tryptophan. The first step of the psilocybin biosynthetic pathway is the decarboxylation of L-tryptophan to tryptamine by psiD. The cytochrome P450 monooxygenase psiH then converts tryptamine to 4-hydroxytryptamine. The kinase psiK catalyzes the 4-O-phosphorylation step by converting 4-hydroxytryptamine into norbaeocystin. The methyltransferase psiM finally catalyzes iterative methyl transfer to the amino group of norbaeocystin to yield psilocybin via a monomethylated intermediate, called baeocystin. The psi locus also contains 2 major facilitator-type transporters (psiT1 and psiT2), as well as a cluster-specific transcriptional regulator (psiR).

As of this release, expertly annotated Psilocybe cubensis psi locus proteins psiD, psiH, psiK, psiM, psiR, psiT1, and psiT2 are publicly and legally available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted disease:

  • Weissenbacher-Zweymueller syndrome

Changes in subcellular location controlled vocabulary

New subcellular locations:

Changes to keywords

New keyword:

UniProt release 2017_11

Published November 22, 2017

Headline

Sex determination in insects: 50 ways to achieve sex-specific splicing

The primary signals triggering sex determination in insects are amazingly diverse among various species and sometimes even between strains of the same species. These various signals converge on a single downstream conserved transformer gene (tra), which undergoes sex-specific splicing. In developing females, splicing results in the production of an active tra protein. Tra in turn regulates sex-specific splicing of another highly conserved gene of this signaling cascade, namely double-sex (dsx), which ultimately decides the sexual fate of the embryo. In males, tra splicing includes an exon containing several in-frame stop codons, producing a truncated, inactive isoform that cannot affect dsx splicing; dsx is consequently spliced into its male-specific isoform.

The primary signals can be environmental or genetic. In some species, temperature, population density or nutritional status can determine the sexual fate of the embryo. In the most studied organism, Drosophila (fruit fly), the number of X chromosomes in the embryo is crucial: 2 X chromosomes lead to female development, 1 X results in males. Counting X chromosomes is a mechanism common to drosophilids, but rarely observed outside this family. In other species, such as wasps, ants and bees, sexual fate depends upon the fertilization process: unfertilized (haploid) eggs give rise to males and fertilized (diploid) eggs to females. Yet other insects rely on dominant Mendelian cues, which can be either male-determining (usually referred to as M-factor), as in many dipterans, or female-determining (F-factor), as in butterflies. Due to their bewildering diversity, these cues are difficult to pinpoint. Nevertheless, recent years have seen a few major breakthroughs in the identification of M-factors.

In 2015, Hall et al. identified the M-factor Nix in the yellow fever mosquito Aedes aegypti. Nix is expressed very early in male embryonic development. Knockout of Nix results in the production of the dsx female isoform and feminization, while ectopic expression of Nix in females leads to the formation of nearly complete male genitalia. The evolution of Nix appears confined to a subset of mosquitoes: only the Asian tiger mosquito (Aedes albopictus) has an orthologous gene, while other genera, such as Anopheles or Culex, lack it.

The M-factor of Anopheles gambiae, identified in 2016, is encoded by the Yob gene and consists of a short, 56 amino acid protein. It is not homologous to Nix. Yob is activated at the beginning of zygotic transcription and expressed throughout a male’s life. It controls male-specific splicing of dsx and several lines of evidence suggest that it is also involved in dosage compensation in this species in which females are XX and males XY. Indeed, the ectopic delivery of Yob mRNA is lethal to genetically female embryos, but has no discernible effect on the sexual development of genetic males. Its silencing in nonsexed embryos yields highly significant male deficiency in surviving mosquitoes.

Last, but not least, the third M-factor to be reported was that of the housefly. It was named Mdmd, for Musca domestica male determiner. It encodes a 1,174 amino acid-long protein that is expressed very early in the zygote and maintained throughout male development until adulthood. In the absence of Mdmd, males turn into females capable of sexual reproduction. Here again, diversity is not an empty word: Mdmd is not conserved in all houseflies. It is absent in at least one strain in which the M-factor has been mapped onto a different chromosome. Mdmd does not share any similarity with Nix or Yob, but it has a paralog, namely the pre-mRNA-splicing factor Cwc22. Cwc22 is a spliceosome-associated protein that is indispensable for the assembly of the exon junction complex (EJC). Interestingly, changes in expression levels of EJC components have been shown to also affect the splice site selection of alternatively spliced genes. The homology between Mdmd and Cwc22 brings us one step closer to understanding alternative splicing and the mechanism of sex-specific tra production.

Multiple copies of the Mdmd gene have been found on chromosomes Y, II, III, or V. All 4 encoded proteins have been annotated and, along with the A. aegypti Nix and A. gambiae Yob products, they are now publicly available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • Glycyl cysteine thioester (Gly-Cys) (interchain with C-...)

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • 2,3-didehydroalanine (Tyr)

UniProt release 2017_10

Published October 25, 2017

Headline

Of smell and social life

Ants are arguably the greatest success story in the history of terrestrial metazoa. On average, ants constitute 15-20% of the terrestrial animal biomass. All ~12,000 known ant species are eusocial, i.e. they have a sophisticated collective behavior characterized by a division of labor that creates groups, sometimes called castes, specialized in tasks such as reproduction, brood care and foraging. Individuals of one caste usually lose the ability to perform at least one behavior characteristic of individuals in another caste. It is thought that ant sociality depends on their sense of smell. Indeed, while the Drosophila melanogaster genome contains about 69 odorant receptor (Or) genes, ant genomes have undergone a dramatic expansion, with close to 350 Ors, representing one of the largest known repertoires of Ors among insects. Other chemosensory receptor genes, such as gustatory receptors and ionotropic glutamate receptors, have not undergone a similar expansion. Ors are expressed by Or neurons, which project axons to the antennal lobe, a region analogous to the olfactory bulb in vertebrates. The antennal lobe consists of numerous globule-shaped neuropils known as glomeruli, where initial synaptic integration occurs before olfactory information is sent to the central brain. Here again, a drastic amplification occurred: close to 450 glomeruli have been identified in the Camponotus floridanus ant versus only 42 in Drosophila. These observations are consistent with a crucial role for odorant perception in the complex chemical communication of ants, but so far there has been no genetic confirmation of this hypothesis.

In insects, Ors dimerize with a highly conserved 7-transmembrane protein called Orco (Odorant Receptor COreceptor) and form ligand-gated ion channels that activate Or neurons upon odorant binding. Orco knockout in fruit flies, locusts, mosquitoes and moths impairs responses to odorants. An Orco knockout in ants would allow testing of the hypothesis that the expanded Or repertoire is required for chemical communication. However, social insects are especially hard to modify genetically: ant eggs are very sensitive and difficult to raise without workers, and the life cycle is complicated and drawn out, making it difficult to obtain large quantities of genetically modified offspring in a reasonable time frame.

In spite of these difficulties, 2 teams successfully knocked out Orco using CRISPR/Cas9 technology, providing the scientific community with the first genetically modified ants. This achievement was made possible through tenacity and a smart choice of ant species. Yan et al. worked on Harpegnathos saltator ants. This species shows a remarkable reproductive plasticity: in the absence of a queen or when a worker is completely isolated, non-reproductive workers can become reproductive pseudoqueens (or gamergates). It is thought that this transition is induced by the lack of queen pheromones, which would normally repress it. When isolated, unmated gamergates lay unfertilized eggs that develop into haploid males. Taking advantage of the gamergate transition, Yan et al. generated hemizygous mutant males. The transgenic males were identified by forewing genotyping. They did not exhibit any overt phenotype and were fully fertile. They could be crossed to receptive females to produce heterozygous and homozygous mutant females. Identification of transgenic females was more complicated. Females have no wings and could be genotyped only after being sacrificed. All experiments were therefore done blindly. This denotes a rare enthusiasm for science that deserves to be emphasized!

Trible et al. chose Ooceraea biroi, a very distantly related species, as it diverged some 100 million years ago from H. saltator. Unlike most other ant species, O. biroi reproduces via parthenogenesis, so stable germ-line modifications can be obtained from the clonal progeny of injected individuals without laboratory crosses.

Both groups observed consistent phenotypes. The response to general odorants was reduced. Mutant insects wandered out of the social group and were unable to forage successfully. They did not produce progeny because they laid very few eggs and did not care for their brood. They appeared to be largely unable to communicate with conspecifics. Unexpectedly, they exhibited a dramatic decrease in the size of the antennal lobes, as well as in the number of glomeruli. The remaining glomeruli tended to be bigger than in wild-type ants. The reason for this neuro-anatomical phenotype is unclear at this stage. However, these results confirm the central role of olfaction in eusocial behavior.

As of this release, freshly annotated Harpegnathos saltator and Ooceraea biroi Orco entries are available in UniProtKB/Swiss-Prot.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified disease:

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • Cyclopeptide (Gly-Arg)
  • Cyclopeptide (Ser-Lys)

New term for the feature key ‘Lipidation’ (‘LIPID’ in the flat file):

  • S-palmitoleoyl cysteine

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • ADP-ribosyl glutamic acid
  • N6-(2-hydroxyisobutyryl)lysine
  • N6-butyryllysine
  • N6-poly(beta-hydroxybutyryl)lysine
  • N6-propionyllysine
  • O3-poly(beta-hydroxybutyryl)serine
  • S-poly(beta-hydroxybutyryl)cysteine

Modified term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • N6-(beta-hydroxybutyrate)lysine -> N6-(beta-hydroxybutyryl)lysine

Modified term for the feature key ‘Lipidation’ (‘LIPID’ in the flat file):

  • O-palmitoleyl serine -> O-palmitoleoyl serine

Changes to keywords

New keyword:

Modified keyword:

UniProt release 2017_09

Published September 27, 2017

Headline

Protein translation goes round in circles

Covalently closed circular RNA molecules (circRNAs) were observed in viruses over 40 years ago and were later discovered in non-infected eukaryotes. In 1993, Capel et al. reported the existence of unusual circular Sry transcripts in mouse testis, where they represented the most abundant form of the Sry transcript. These peculiar RNA species were generally considered to be of low abundance, likely representing splicing errors. Recent studies have shown, however, that they may actually be quite numerous and produced by thousands of genes. In addition, they are evolutionarily conserved. CircRNAs are generated by the spliceosome via backsplicing, a process in which the 3' end of an exon is covalently linked to the 5' end of an upstream exon. As a result, they lack typical mRNA terminal structures, such as the 5' cap and the poly(A) tail. This feature makes them resistant to exonucleases, allowing circRNAs to escape normal RNA turnover processes.

The physiological functions of circRNAs have not yet been extensively explored. Some have been shown to act as microRNA sponges. They can also function as platforms for protein interaction. For instance, circ-FOXO3 represses cell cycle progression by binding to the cell cycle proteins CDK2 and CDKN1A (p21), resulting in the formation of a ternary complex. Circ-MBL/MBNL1 binds to the RNA-binding MBNL1 protein and regulates gene expression by competing with the linear splicing of its parent pre-mRNA.

At this point, you may wonder why UniProtKB, a protein resource, is interested in circRNAs. Most circRNAs originate from protein-coding genes and contain complete exons. In theory, they could be translated, but there had been no direct evidence for in vivo translation of endogenous circRNAs, and they were therefore classified as non-coding RNAs.

A major breakthrough came from a study of human and mouse muscle published last April. Muscles not only produce thousands of circular splicing events, but the expression of circRNAs is also differentially regulated during myoblast differentiation. Among them, circ-ZNF609, a transcript that originates from the circularization of the first coding exon of the ZNF609 gene, is down-regulated during myogenesis. Circ-ZNF609 contains the initiation codon of the linear ZNF609 transcript, a putative 753-nucleotide open reading frame and a STOP codon, created 3 nucleotides after the splice junction by circularization with the upstream ZNF609 5'-UTR. In human myoblasts, the knockdown of circ-ZNF609, but not that of its linear transcript, reduces cell proliferation by about 80%, suggesting a specific role in the regulation of myoblast proliferation. Circ-ZNF609 transcripts are located in the cytoplasm, where they are associated with heavy polysomes. They are translated in a cap-independent manner, though less efficiently than their linear counterparts, and produce a new 250-amino-acid ZNF609 isoform in both human and mouse cells. Translation is driven by an internal ribosome entry site (IRES) located within the 5'-UTR. In vivo translation of at least some circRNAs was confirmed in Drosophila in the same issue of Molecular Cell.
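The ORF arithmetic above can be checked with a small sketch. It assumes, consistent with the 250-residue product, that the 753-nucleotide ORF includes its stop codon. It also represents a circRNA as a string linearized so that the backsplice junction lies between the last and first characters. The function names and the toy sequence are hypothetical illustrations, not part of any published tool.

```python
from typing import Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def circular_orf(seq: str, start: int) -> Optional[Tuple[int, bool]]:
    """Read codons from `start` around a circular RNA until an in-frame stop.

    `seq` is the circRNA linearized so that the backsplice junction lies
    between seq[-1] and seq[0]. Returns (ORF length in nucleotides, including
    the stop codon; True if the ORF runs across the junction), or None if no
    stop codon is ever reached (an endless 'rolling circle' ORF).
    """
    n = len(seq)
    # After at most n codons, both position and reading frame repeat.
    for k in range(n):
        pos = (start + 3 * k) % n
        codon = "".join(seq[(pos + i) % n] for i in range(3))
        if codon in STOP_CODONS:
            orf_nt = 3 * (k + 1)
            return orf_nt, start + orf_nt > n
    return None


def protein_length(orf_nt: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    return orf_nt // 3 - 1


# Toy circRNA: AUG at index 6; the stop codon UAG spans the junction (U|AG).
print(circular_orf("AGCCCCAUGGGGU", 6))  # (9, True)
print(protein_length(753))               # 250
```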

In June 1963, Sidney Brenner wrote to Max Perutz: 'It is now widely realized that nearly all the ‘classical’ problems of molecular biology have either been solved or will be solved in the next decade.' One might think that the process by which genetic information is transcribed and processed into functional RNAs would be such a ‘classical’ problem, but it seems that there are still plenty of discoveries to be made in this field, to our great delight.

Human and mouse ZNF609 UniProtKB/Swiss-Prot entries have been updated and the new isoforms encoded by circ-ZNF609 integrated, with the help of Dr. Legnini, whom we sincerely thank. The revised entries are publicly available as of this release.

Cross-references to CORUM

Cross-references have been added to the CORUM database, a resource of manually annotated protein complexes from mammalian organisms.

CORUM is available at http://mips.helmholtz-muenchen.de/corum/

The format of the explicit links is:

Resource abbreviation CORUM
Resource identifier UniProtKB accession number

Example: P41182

Text format

Example: P41182

DR   CORUM; P41182; -.

XML format

Example: P41182

<dbReference type="CORUM" id="P41182"/>

RDF format

Example: P41182

uniprot:P41182
  rdfs:seeAlso <http://purl.uniprot.org/corum/P41182> .
<http://purl.uniprot.org/corum/P41182>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CORUM> .
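The hypothetical helpers below sketch how the text-format line can be consumed. They split a DR line into its semicolon-separated fields and collect the identifiers for one database. They assume the layout shown above: a `DR` tag padded to five characters, fields separated by '; ', and a terminating period. This is not an official UniProt parser.

```python
from typing import Iterable, List


def parse_dr_line(line: str) -> List[str]:
    """Split a flat-file DR line into its fields.

    'DR   CORUM; P41182; -.' -> ['CORUM', 'P41182', '-']
    Only the final period is removed, so identifiers containing dots survive.
    """
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]
    return body.split("; ")


def cross_references(lines: Iterable[str], database: str) -> List[str]:
    """Return the identifiers cross-referenced to `database` in one entry."""
    ids = []
    for line in lines:
        if line.startswith("DR   "):
            fields = parse_dr_line(line)
            if fields[0] == database:
                ids.append(fields[1])
    return ids


# Toy entry (not a real record); EXAMPLEDB is a made-up database name.
entry = ["ID   EXAMPLE_HUMAN", "DR   CORUM; P41182; -.", "DR   EXAMPLEDB; X00001; -."]
print(cross_references(entry, "CORUM"))  # ['P41182']
```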

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases

  • Adrenocortical insufficiency, without ovarian defect

Changes in subcellular location controlled vocabulary

New subcellular location:

UniProt release 2017_08

Published August 30, 2017

Headline

Curation of human immunoglobulin genes: a fruitful collaboration between UniProtKB/Swiss-Prot and IMGT®

The existence of an agent in the blood that could neutralize diphtheria toxin was reported as early as 1890. Over a century after this major discovery, much is known about immunoglobulins (IG), or antibodies. They are large heterotetrameric proteins made up of 2 heavy (H) chains and 2 light (L) kappa or lambda chains, held together by disulfide bonds to form a ‘Y’-shaped molecule. Each chain comprises one variable (V) domain at the N-terminal end and one or several (for L and H, respectively) constant (C) domains. The antigen binding site is formed by the V domain of one H chain together with that of its associated L chain. Thus, each immunoglobulin has 2 antigen binding sites with remarkable affinity for a particular antigen. Each variable domain is encoded by a variable (V) gene, a diversity (D) gene (only for H) and a joining (J) gene, which are assembled by a process called V-(D)-J rearrangement. These rearranged genes can then be subjected to somatic hypermutation which, after exposure to antigen and selection, allows affinity maturation for a particular antigen. The resulting rearranged V-(D)-J genes are further spliced to C genes. The C region determines the effector properties and the mechanism used to destroy the antigen, such as activation of complement or binding to Fc receptors. An immunoglobulin is encoded by 7 genes (IGHV, IGHD, IGHJ and IGHC for the H chain, and IGKV, IGKJ and IGKC for a kappa or IGLV, IGLJ and IGLC for a lambda L chain). The human genome contains 176 functional immunoglobulin genes clustered in 3 loci: IGH on chromosome 14 (50 V, 23 D, 6 J and 9 C), IGK on chromosome 2 (40 V, 5 J and 1 C) and IGL on chromosome 22 (32 V, 5 J and 5 C).
During the development of B cells, the mechanisms of diversity involved in immunoglobulin synthesis (combinatorial V-(D)-J diversity, junctional diversity and somatic hypermutation) lead to the huge potential antibody repertoire of each individual, estimated to comprise 10¹² different immunoglobulins, the only limiting factor being the number of B cells that an organism is genetically programmed to produce.
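The gene counts above can be tallied in a short Python sketch, which also gives a naive estimate of combinatorial V-(D)-J diversity alone. The sketch deliberately ignores junctional diversity and somatic hypermutation, and it assumes that every heavy chain can pair with every light chain. This is why the result stays orders of magnitude below the estimated 10¹² repertoire. The names are illustrative only.

```python
# Functional gene counts per locus, as given in the text (GRCh38).
LOCI = {
    "IGH": {"V": 50, "D": 23, "J": 6, "C": 9},
    "IGK": {"V": 40, "J": 5, "C": 1},
    "IGL": {"V": 32, "J": 5, "C": 5},
}


def total_genes(loci):
    """Sum all functional V, D, J and C genes over all loci."""
    return sum(sum(counts.values()) for counts in loci.values())


def vdj_combinations(counts):
    """Distinct V-(D)-J combinations for one locus (C genes excluded)."""
    result = 1
    for segment in ("V", "D", "J"):
        result *= counts.get(segment, 1)  # light-chain loci have no D genes
    return result


heavy = vdj_combinations(LOCI["IGH"])  # 50 * 23 * 6
light = vdj_combinations(LOCI["IGK"]) + vdj_combinations(LOCI["IGL"])
print(total_genes(LOCI))          # 176
print(heavy, light, heavy * light)  # 6900 360 2484000
```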

In 2008, we announced the first draft of the complete human proteome in UniProtKB/Swiss-Prot, and we have been continuing to update this resource ever since. Recent work performed in collaboration with the IMGT® team has included a thorough review and update of the immunoglobulin genes, for which we now present a representative set of full-length germline immunoglobulin protein sequences. 15 entries showing the sequences of all C gene products and 122 representing all V gene products are now publicly available. These entries can be retrieved with the keywords 'Immunoglobulin C region' and 'Immunoglobulin V region', respectively. D and J gene products are extremely small, with an average of 5 amino acids for D genes and 15-30 for J genes; in other words, they are too short to be informative on their own. We have therefore decided to curate a single peptide representative of D gene products and 3 representatives of J gene products: one for the H chain and one each for the kappa and lambda L chains. As for other human proteins, the sequences shown match the translation of the reference genome (Genome Reference Consortium GRCh38/hg38). The nomenclature used is the official one from IMGT/GENE-DB, approved by HGNC and endorsed by NCBI Gene and the IUIS-Nomenclature SubCommittee. Cross-references were implemented in the 141 UniProtKB/Swiss-Prot immunoglobulin entries, providing direct access to the dedicated IMGT® resource and its comprehensive sequence repertoire, which currently describes 927 alleles from 462 functional and non-functional genes, together with a wealth of additional information about immunoglobulins. Reciprocal links to UniProtKB from IMGT® ensure easy navigation between both resources.
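The entry counts quoted above are consistent with the per-locus gene composition of the human immunoglobulin loci. A quick tally, with the counts copied from the text, shows how they add up:

```python
# Per-locus counts of functional C and V genes, as stated in the text.
C_GENES = {"IGH": 9, "IGK": 1, "IGL": 5}
V_GENES = {"IGH": 50, "IGK": 40, "IGL": 32}
D_REPRESENTATIVES = 1  # single peptide representing all D gene products
J_REPRESENTATIVES = 3  # one for H, one each for the kappa and lambda L chains

c_entries = sum(C_GENES.values())
v_entries = sum(V_GENES.values())
total_entries = c_entries + v_entries + D_REPRESENTATIVES + J_REPRESENTATIVES
print(c_entries, v_entries, total_entries)  # 15 122 141
```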

We also provide several examples of full-length rearranged immunoglobulins. Among the estimated 10¹² possible sequences, we have selected some of those that have been entirely sequenced at the amino acid level. However, representing the full repertoire is beyond the scope of our knowledgebase, and UniProtKB users interested in these complex molecules are advised to visit IMGT®.

We would like to take this opportunity to thank Marie-Paule Lefranc, Sofia Kossida and the IMGT® team for this fruitful collaboration, which is beneficial not only for both resources, but hopefully also for the scientific community as a whole.

Cross-references to ELM

Cross-references have been added to the Eukaryotic Linear Motif (ELM) resource for functional sites in proteins.

ELM is available at http://elm.eu.org.

The format of the explicit links is:

Resource abbreviation ELM
Resource identifier UniProtKB accession number

Example: P12931

Text format

Example: P12931

DR   ELM; P12931; -.

XML format

Example: P12931

<dbReference type="ELM" id="P12931"/>

RDF format

Example: P12931

uniprot:P12931
  rdfs:seeAlso <http://purl.uniprot.org/elm/P12931> .
<http://purl.uniprot.org/elm/P12931>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/ELM> .
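The XML format can be handled with Python's standard xml.etree.ElementTree module, as in the minimal sketch below. The '{*}' wildcard (Python 3.8 or later) matches the tag whether or not the document declares a default namespace. The toy document and its namespace URI are illustrative and do not come from a real UniProtKB entry.

```python
import xml.etree.ElementTree as ET
from typing import List


def db_reference_ids(xml_text: str, db_type: str) -> List[str]:
    """Return the id attribute of every <dbReference> whose type is db_type."""
    root = ET.fromstring(xml_text)
    # '{*}dbReference' matches the element in any namespace or in none.
    return [
        ref.get("id")
        for ref in root.iterfind(".//{*}dbReference")
        if ref.get("type") == db_type
    ]


toy_entry = (
    '<entry xmlns="http://example.org/toy">'
    '<dbReference type="ELM" id="P12931"/>'
    '<dbReference type="OtherDB" id="X00001"/>'
    "</entry>"
)
print(db_reference_ids(toy_entry, "ELM"))  # ['P12931']
```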

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases

  • Mental retardation, X-linked, syndromic, 10

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniParc news

UniParc XSD change for InterPro annotations

To reduce sequence redundancy in UniProtKB, we apply a procedure that identifies highly redundant proteomes within selected species groups and excludes them from UniProtKB. Their sequences are still available for download from the UniParc sequence archive, which stores protein sequences that are 100% identical over their entire length in a single record, with cross-references to the source databases in which the protein exists. UniParc also includes basic annotation data (taxonomy, gene and protein names, proteome identifier and component), allowing users interested in redundant proteomes to retrieve meaningful data sets. We have now further enhanced UniParc with InterPro annotations and, for this purpose, extended the UniParc XSD with new elements and types, as shown in the excerpt below:

<xs:element name="entry">
  <xs:complexType>
    <xs:sequence>
      ...
      <xs:element name="signatureSequenceMatch" type="seqFeatureType" minOccurs="0" maxOccurs="unbounded"/>
      ...
    </xs:sequence>
    ...
  </xs:complexType>
</xs:element>
...
<xs:complexType name="seqFeatureType">
  <xs:sequence>
    <xs:element name="ipr" type="seqFeatureGroupType" minOccurs="0" maxOccurs="1"/>
    <xs:element name="lcn" type="locationType" minOccurs="1" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="database" type="xs:string" use="required"/>
  <xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>

<xs:complexType name="seqFeatureGroupType">
  <xs:attribute name="name" type="xs:string"/>
  <xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>

<xs:complexType name="locationType">
  <xs:attribute name="start" type="xs:int" use="required"/>
  <xs:attribute name="end" type="xs:int" use="required"/>
</xs:complexType>
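The hypothetical function below sketches how the new elements could be read. It walks signatureSequenceMatch elements with Python's xml.etree.ElementTree, treats ipr as optional (minOccurs="0") and collects every lcn location, as the schema excerpt allows. The toy XML and all of its identifiers are made up. Real UniParc files may declare a namespace, which the '{*}' wildcard (Python 3.8+) tolerates.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SignatureMatch:
    database: str                      # required 'database' attribute
    signature_id: str                  # required 'id' attribute
    interpro_id: Optional[str] = None  # from the optional <ipr> child
    interpro_name: Optional[str] = None
    locations: List[Tuple[int, int]] = field(default_factory=list)


def signature_matches(xml_text: str) -> List[SignatureMatch]:
    root = ET.fromstring(xml_text)
    matches = []
    for sig in root.iterfind(".//{*}signatureSequenceMatch"):
        match = SignatureMatch(sig.get("database"), sig.get("id"))
        ipr = sig.find("{*}ipr")  # minOccurs="0": may be absent
        if ipr is not None:
            match.interpro_id = ipr.get("id")
            match.interpro_name = ipr.get("name")  # 'name' is optional too
        for loc in sig.iterfind("{*}lcn"):  # one or more locations
            match.locations.append((int(loc.get("start")), int(loc.get("end"))))
        matches.append(match)
    return matches


toy = """<entry>
  <signatureSequenceMatch database="ExampleDB" id="SIG0001">
    <ipr name="Example domain" id="IPR999999"/>
    <lcn start="10" end="120"/>
    <lcn start="200" end="260"/>
  </signatureSequenceMatch>
  <signatureSequenceMatch database="ExampleDB" id="SIG0002">
    <lcn start="5" end="40"/>
  </signatureSequenceMatch>
</entry>"""
for m in signature_matches(toy):
    print(m.signature_id, m.interpro_id, m.locations)
```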

UniProt release 2017_07

Published July 5, 2017

Headline

A pseudogene turns into an active DNA methyltransferase dedicated to male fertility

It is well established that in mammals, the DNA methylation machinery is composed of 3 DNA methyltransferase (DNMT) enzymes, DNMT1, DNMT3A, and DNMT3B, and one catalytically inactive cofactor, DNMT3L. Some 46 million years ago, in the last common ancestor of the muroid rodents, the DNMT3B gene was duplicated, giving rise to Gm14490. The genes share about 70% identity, but Gm14490 underwent pseudogenization, and there is no evidence for its transcription. Germline-specific knockouts of DNMT3A or DNMT3B demonstrate the crucial role of these genes in methylation of most imprinted loci in germ cells (and somatic tissues), but some transposon loci, such as minor satellite DNA and intracisternal A particle (IAP) repeats, are only minimally affected, an observation which can be attributed to the functional redundancy of the 2 genes. This is what was thought and published, until recently.

Retrotransposon silencing is of paramount importance, especially in the male germline. Indeed, in the absence of silencing, retrotransposon reactivation leads inexorably to meiotic failure, azoospermia, and sterility marked by small testis size, a phenotype called hypogonadism. It is therefore essential to understand which actors are involved in this process. Barau et al. tackled the issue by generating mutant mice through N-ethyl-N-nitrosourea (ENU) mutagenesis and screening hypogonadal male mice for ectopic retrotransposon activity, followed by whole genome sequencing to identify the culprits. This approach led to the discovery of an ENU-independent mutation, which was identified as a de novo IAP insertion located in an unexpected locus, the last intron of the Gm14490 pseudogene. Serendipity definitely is a scientist’s best friend!

This was only the first of several surprises. Contrary to what had been previously reported, the Gm14490 gene proved to be expressed, but exclusively in male germ cells. This restriction could explain the absence of corresponding ESTs in databases and the erroneous assumption that it was not transcribed. During embryonic development, its expression peaks in prospermatogonia at the time of de novo DNA methylation (between 16.5 and 18.5 dpc). Moreover, Gm14490 appeared to be catalytically active when transfected into ES cells. A new genuine DNA methylase was born and renamed DNMT3C!

In the absence of DNMT3C, whether caused by knockout or by IAP insertion, retrotransposons, and more specifically some types of long interspersed nuclear elements (LINEs) and some endogenous retroviruses (ERVs), are reactivated. Interestingly, this reactivation is particularly strong for evolutionarily ‘young’ subfamilies, indicating DNMT3C’s unique selectivity. The existence of a fifth DNMT family member, selectively targeted at young retrotransposons and acting only in the context of fetal spermatogenesis, may be of particular relevance in Muroidea, including mice and rats. This lineage is particularly enriched in young transposons, with about 25% having integrated into the genome in the last 25 million years and thousands of copies currently active. In comparison, in the primate ancestor, massive integration occurred long before (80 million years ago for elements such as LINEs), and these transposons have since become extinct.

In view of these results, DNMT3C has been deleted from our pseudogene list, annotated and integrated into UniProtKB/Swiss-Prot, where it is available to you. The knowledgebase contains some other sequences derived from putative pseudogenes (see headline of November 2009). Like all other UniProtKB/Swiss-Prot entries, they are continuously reviewed. Some of them are deleted from UniProtKB, when data pointing at an inactive gene are overwhelming, but they can always be retrieved from UniParc. Other entries are progressively ‘upgraded’, when new data become available, to bona fide proteins as was the case for DNMT3C.

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to keywords

New keyword:

UniProt release 2017_06

Published June 7, 2017

Headline

Sexual reproduction: good ideas shared with viruses

Sexual reproduction is a brilliant eukaryotic invention that allows the reassortment of alleles through recombination. The first step is the formation of haploid male and female gametes that unite to form a new individual. Most gametes unite by membrane fusion, a process mediated by specialized proteins called fusogens. The study of these proteins is difficult, since they are often scarce. The few identified so far are clade-specific, such as bindin in echinoderms or izumo in mammals, suggesting that each clade has evolved its own fusion strategy. This, at least, was what was thought until the discovery of hapless-2 (HAP2¹), also called generative cell specific-1 (GCS1).

Hapless-2 is a single-span transmembrane protein located at the gamete cell surface, typically at mating structures. It is essential for gamete fusion in the green alga Chlamydomonas reinhardtii, but also in other plants, including Arabidopsis thaliana and Lilium longiflorum, and in protozoans, such as Plasmodium berghei or Tetrahymena thermophila. A thorough examination of eukaryotic genomes reveals the existence of this gene in many major eukaryotic taxa, from slime molds to the honey bee. It is, however, not present in fungi, nor in most animals, including humans. The wide evolutionary distribution of hapless-2 suggests that it was present in the last eukaryotic common ancestor and lost in some clades later on. Disruption of hapless-2 blocks gamete fusion, but not adhesion to gametes of the opposite mating type (or sex), suggesting that gamete adhesion relies on species-specific proteins, whereas fusion itself is mediated by an ancestral common gene product.

Earlier this year, the 3D structure of Chlamydomonas reinhardtii hapless-2 was unraveled. The secondary and tertiary structures of its ectodomain are almost identical to those of viral class II proteins, such as the envelope protein E of flaviviruses, which are also involved in membrane fusion, although hapless-2 shares very low identity with them at the amino acid level. Fédry et al. hypothesize that these fusion proteins derived from a common ancestor, whose gene was likely transferred via horizontal exchange.

Like the flavivirus class II proteins, the hapless-2 ectodomain trimerizes concomitantly with insertion into the membrane of the partner gamete. The trigger for trimerization of hapless-2 is not yet known, although acidification, which drives trimerization of flavivirus class II proteins in late endosomes, is not required.

Information gained from the 3D structure of hapless-2 may help in the development of transmission-blocking vaccines (TBVs), a new strategy to fight malaria (and other protozoan diseases). Successful transmission of Plasmodium from humans to mosquitoes relies on hapless-2-dependent fusion of the parasite gametes and fertilization, which occurs rapidly after ingestion by the mosquito. If TBVs could be designed to induce anti-hapless-2 antibodies in human hosts, these antibodies would be ingested by Anopheles mosquitoes along with blood Plasmodium gametocytes. The initial gamete fusion step could then be prevented and the deadly cycle of transmission blocked. This approach has already been tested in model animals and, although the preliminary results look promising, they are not yet sufficient for clinical development. The identification of new peptides that are both functionally crucial and immunogenic may prove very helpful in the design of efficient anti-malaria TBVs.

As of this release, hapless-2 UniProtKB/Swiss-Prot entries have been created and are publicly available.

¹ The acronym HAP2 is somewhat unfortunate, since this protein has nothing to do with the yeast HAP2 transcription factor. These are the mysterious ways of nomenclature, which sometimes may be quite confusing...

UniProtKB news

Modification of cross-references to PATRIC

We have modified our cross-references to the PATRIC database in order to reflect the new PATRIC primary identifier scheme. The earlier identifier scheme used simple numeric ids, e.g.

32117610

which were replaced by more informative primary identifiers such as
fig|1427269.3.peg.1028.

Text format

Example: Q9ZNI1

Previous format:

DR   PATRIC; 19579917; VBIStaAur99865_1117.

New format:

DR   PATRIC; fig|93061.5.peg.1117; -.

XML format

Example: Q9ZNI1

Previous format:

<dbReference type="PATRIC" id="19579917">
  <property type="gene designation" value="VBIStaAur99865_1117"/>
</dbReference>

New format:

<dbReference type="PATRIC" id="fig|93061.5.peg.1117"/>

RDF format

Example: Q9ZNI1

Previous format:

uniprot:Q9ZNI1
  rdfs:seeAlso <http://purl.uniprot.org/patric/19579917> .
<http://purl.uniprot.org/patric/19579917>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/PATRIC> ;
  rdfs:comment "VBIStaAur99865_1117" .

New format:

uniprot:Q9ZNI1
  rdfs:seeAlso <http://purl.uniprot.org/patric/fig%7C93061.5.peg.1117> .
<http://purl.uniprot.org/patric/fig%7C93061.5.peg.1117>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/PATRIC> .
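The %7C in the new RDF URIs is the percent-encoded '|' character, which is not allowed unescaped in a URI. The minimal sketch below performs the conversion with urllib.parse.quote from the Python standard library. The regular expression for new identifiers is inferred from the two examples above (fig|<genome>.peg.<number>). It is an assumption, not PATRIC's formal specification, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Inferred from the examples above, not from a PATRIC specification.
NEW_PATRIC_ID = re.compile(r"^fig\|(?P<genome>\d+\.\d+)\.peg\.(?P<feature>\d+)$")


def is_new_patric_id(identifier: str) -> bool:
    """True for identifiers that look like the new 'fig|...' scheme."""
    return NEW_PATRIC_ID.match(identifier) is not None


def patric_rdf_uri(identifier: str) -> str:
    """Build the RDF resource URI, percent-encoding characters such as '|'."""
    return "http://purl.uniprot.org/patric/" + quote(identifier, safe="")


print(is_new_patric_id("19579917"))              # False
print(is_new_patric_id("fig|93061.5.peg.1117"))  # True
print(patric_rdf_uri("fig|93061.5.peg.1117"))
# http://purl.uniprot.org/patric/fig%7C93061.5.peg.1117
```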

New file linking deleted entries to their subsequently reinstated versions

Since release 2015_04, we have applied at each release a procedure that identifies highly redundant proteomes within selected species groups, using a combination of manual and automatic methods. This procedure prevents the creation of UniProtKB/TrEMBL entries from these redundant proteomes, but it also meant that a huge number of previously existing entries had to be deleted from UniProtKB when the procedure was put in place.

It may happen that proteomes that were identified as redundant are later reinstated as non-redundant, e.g. a proteome for a strain used as a model by a significant community or with proteins that have been crystallized. In the past, it has also happened on rare occasions that entries were deleted but later reinstated for other reasons. In such cases, the UniProtKB entries are created anew, with new accession numbers.

To help users link deleted entries to their subsequently reinstated versions, we are introducing a file that maps old to new accession numbers via their protein_ids. This file is available (in compressed format) by FTP at

ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/reinstated_map.txt.gz

This mapping will also be used to make queries for obsolete identifiers on the UniProt website more meaningful.
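The minimal sketch below loads the mapping in Python. The column layout of reinstated_map.txt.gz is not described here, so the code assumes whitespace-separated fields with the old (deleted) accession first and the new accession last. It also skips blank lines and lines starting with '#'. Check the file itself before relying on this. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable


def parse_reinstated_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map deleted accessions to the accessions of their reinstated entries.

    ASSUMPTION: old accession in the first column, new accession in the last
    (any protein_id column in between is ignored).
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, malformed or comment lines
        mapping[fields[0]] = fields[-1]
    return mapping


def load_reinstated_map(path: str) -> Dict[str, str]:
    """Read a locally downloaded copy of the gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_reinstated_map(handle)
```

For example, parse_reinstated_map(["OLD001 PROTID.1 NEW001"]) returns {"OLD001": "NEW001"} under this assumed layout; the identifiers are placeholders.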

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • Cyclopeptide (Glu-Asn)

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • S-methylmethionine

Deleted term

  • N-acetylated lysine

Changes in subcellular location controlled vocabulary

New subcellular location:

Changes to keywords

New keyword:

UniProt release 2017_05

Published May 10, 2017

Headline

A certain taste for light

In most organisms, light perception is essential for survival. It not only mediates image-forming vision, but also performs other functions, such as phototaxis and circadian rhythm. Light-sensing function is carried out by photoreceptors, of which only 2 types are known in metazoans: opsins and cryptochromes. They are typically composed of two moieties, a protein and a prosthetic chromophore, the latter being responsible for light absorption. Consequently, photoreceptor denaturation, which targets the protein moiety, does not abolish light absorption, although it shifts absorbance peaks to different wavelengths. Photoreceptor activation by light induces a signaling pathway, called phototransduction, which involves the activation of a G-protein, the modulation of cGMP levels and ultimately a change in the permeability of cyclic nucleotide-gated channels.

It had long been thought that Caenorhabditis elegans, an eyeless, soil-dwelling nematode, could not sense light. This assumption turned out to be erroneous. Not only does C. elegans sense light, but it vigorously escapes from it. This behaviour is elicited only in response to blue or shorter wavelengths of light, with maximal responsiveness to UV light. This mechanism may have evolved to protect the animal against prolonged direct sunlight exposure, which paralyzes and eventually kills it. Indeed, worms appear to spend much of their time above ground, living on small surface-dwelling animals or their carcasses, and may therefore be frequently exposed to direct sunlight. From the very beginning of the study of phototransduction in C. elegans, it was obvious that the lite-1 gene was involved in this process, as its heterologous expression in muscle cells, which are normally unresponsive to light, was sufficient to make them light-responsive. Lite-1 was also shown to act upstream of G proteins, but its exact function remained unclear. Is it a bona fide photoreceptor? Or does it just sense light-produced chemicals? Like opsins, which are the most common photoreceptor proteins in metazoan photoreceptor cells, lite-1 contains a 7-transmembrane domain. However, it does not share any sequence similarity with opsins, and its topology is opposite to that of conventional 7-transmembrane receptors, with its N-terminus located intracellularly and its C-terminus extracellularly. In fact, lite-1 belongs to the insect gustatory receptor family of chemoreceptors rather than to the opsin family. To clarify its role, Gong et al. purified lite-1 and showed that it directly absorbs photons with an efficiency 10 to 100 times that of other known photoreceptors, capturing both UVA and UVB light. Interestingly, absorption of UVA and UVB light can be separated. For instance, mutations at residues Ala-332 and Ser-226 disrupt UVA absorption, but do not affect UVB absorption.
In addition, prolonged light illumination, which bleaches conventional photoreceptors, abolishes lite-1 absorption of UVA, but does not affect that of UVB, which appears to be more stable and relatively resistant to photobleaching.

Another remarkable lite-1 feature is that it loses all photoabsorption abilities upon denaturation, suggesting that this activity strictly depends on its conformation and not upon the presence of a chromophore. Mutational analysis pointed at 2 tryptophan residues (Trp-77 and Trp-328) that are required for the absorption of both UVA and UVB light. In order to confirm the importance of these residues, Gong et al. introduced ‘Trp-77’ by mutagenesis at the equivalent position in a structurally related gustatory receptor, called gur-3, which contains ‘Trp-328’, but is not photosensitive. Amazingly, mutated gur-3 absorbs UVB light with an efficiency of about 30% of that of lite-1. All these observations indicate that lite-1 is a bona fide photoreceptor of a novel type.

The C. elegans lite-1 entry has been updated and is publicly available as of this release.

UniProtKB news

Extension of controlled vocabulary for PTM to glycosylation sites

Our controlled vocabulary for post-translational modification, so far used to standardize the annotation of modified residues, lipidation sites and protein cross-links, has been extended to include terms for glycosylation sites.

Change of the nomenclature for glycosylation sites

We have introduced a change to the nomenclature for glycosylation sites.

We previously described the occurrence of the attachment of a glycan (mono- or polysaccharide) to an amino-acid residue with the following elements:

  • The type of linkage (C-, N-, O- or S-linked) to the protein
  • The abbreviation of the reducing terminal sugar (shown between parentheses): if three dots ‘...’ follow the abbreviation, this indicates an extension of the carbohydrate chain. Conversely, the absence of dots means that a monosaccharide is linked.

To this we have added:

  • The name of the glycosylated amino acid

The new nomenclature is thus composed of three elements:

<linkage type> (<reducing carbohydrate>) <amino acid name>.

The valid values have been added to our controlled vocabulary for post-translational modifications and applied to all Glycosylation annotations.

Example: Q9HCN3

Previous nomenclature:

FT   CARBOHYD    144    144       N-linked (GlcNAc...).

New nomenclature:

FT   CARBOHYD    144    144       N-linked (GlcNAc...) asparagine.

Note that this information about the type of glycosylation can be complemented by

  • the name of the modified protein form,
  • information on whether the modification is carried out by a host protein,
  • the frequency of the modification or the relationship with another feature (‘partial’, ‘alternate’, ‘transient’),
  • evidence attribution

as documented for modified residues.
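Because the new descriptions follow a fixed grammar, they can be split back into their parts. The sketch below uses a regular expression written for the pattern <linkage type> (<reducing carbohydrate>) <amino acid name>. It also accepts the optional second parenthesized qualifier, such as '(complex)' or '(glycation)', seen in the vocabulary terms. The class and function names are hypothetical, and the FT-line splitting assumes the layout of the example above. Descriptions in the previous nomenclature, which lack the amino acid name, do not match and yield None.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Glycosylation(NamedTuple):
    linkage: str              # 'C', 'N', 'O' or 'S'
    sugar: str                # reducing terminal sugar, e.g. 'GlcNAc'
    extended: bool            # True if '...' follows the sugar (longer chain)
    qualifier: Optional[str]  # e.g. 'complex', 'glycation', or None
    amino_acid: str           # e.g. 'asparagine'


TERM = re.compile(
    r"^(?P<linkage>[CNOS])-linked "
    r"\((?P<sugar>[^().]+)(?P<dots>\.\.\.)?\)"
    r"(?: \((?P<qualifier>[^)]+)\))? "
    r"(?P<aa>[a-z]+)\.?$"
)


def parse_term(description: str) -> Optional[Glycosylation]:
    """Parse a glycosylation description; None if it does not fit the grammar."""
    m = TERM.match(description.strip())
    if m is None:
        return None
    return Glycosylation(m["linkage"], m["sugar"], m["dots"] is not None,
                         m["qualifier"], m["aa"])


def parse_ft_carbohyd(line: str) -> Tuple[int, int, Optional[Glycosylation]]:
    """Split an 'FT   CARBOHYD  start  end  description' line."""
    _, key, start, end, description = line.split(None, 4)
    if key != "CARBOHYD":
        raise ValueError(f"not a CARBOHYD line: {line!r}")
    return int(start), int(end), parse_term(description)


print(parse_ft_carbohyd(
    "FT   CARBOHYD    144    144       N-linked (GlcNAc...) asparagine."))
```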

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Glycosylation’ (‘CARBOHYD’ in the flat file):

  • C-linked (Man) hydroxytryptophan
  • C-linked (Man) tryptophan
  • N-linked (DATDGlc) asparagine
  • N-linked (GalNAc) asparagine
  • N-linked (GalNAc...) asparagine
  • N-linked (GalNAc...) (glycosaminoglycan) asparagine
  • N-linked (Glc) arginine
  • N-linked (Glc) asparagine
  • N-linked (Glc) (glycation) histidine
  • N-linked (Glc) (glycation) isoleucine
  • N-linked (Glc) (glycation) lysine
  • N-linked (Glc) (glycation) valine
  • N-linked (Glc...) arginine
  • N-linked (Glc...) asparagine
  • N-linked (GlcNAc) arginine
  • N-linked (GlcNAc) asparagine
  • N-linked (GlcNAc...) arginine
  • N-linked (GlcNAc...) asparagine
  • N-linked (GlcNAc...) (complex) arginine
  • N-linked (GlcNAc...) (complex) asparagine
  • N-linked (GlcNAc...) (high mannose) arginine
  • N-linked (GlcNAc...) (high mannose) asparagine
  • N-linked (GlcNAc...) (hybrid) arginine
  • N-linked (GlcNAc...) (hybrid) asparagine
  • N-linked (GlcNAc...) (keratan sulfate) arginine
  • N-linked (GlcNAc...) (keratan sulfate) asparagine
  • N-linked (GlcNAc...) (paucimannose) arginine
  • N-linked (GlcNAc...) (paucimannose) asparagine
  • N-linked (GlcNAc...) (polylactosaminoglycan) arginine
  • N-linked (GlcNAc...) (polylactosaminoglycan) asparagine
  • N-linked (Hex) arginine
  • N-linked (Hex) asparagine
  • N-linked (Hex) tryptophan
  • N-linked (Hex...) arginine
  • N-linked (Hex...) asparagine
  • N-linked (HexNAc) arginine
  • N-linked (HexNAc) asparagine
  • N-linked (HexNAc...) arginine
  • N-linked (HexNAc...) asparagine
  • N-linked (Lac) (glycation) lysine
  • N-linked (Man) tryptophan
  • O-linked (Ara) hydroxyproline
  • O-linked (Ara...) hydroxyproline
  • O-linked (DADDGlc) serine
  • O-linked (DATDGlc) serine
  • O-linked (GATDGlc) serine
  • O-linked (Fuc) serine
  • O-linked (Fuc) threonine
  • O-linked (Fuc...) serine
  • O-linked (Fuc...) threonine
  • O-linked (FucNAc) serine
  • O-linked (FucNAc...) serine
  • O-linked (Gal) hydroxylysine
  • O-linked (Gal) hydroxyproline
  • O-linked (Gal) serine
  • O-linked (Gal) threonine
  • O-linked (Gal...) hydroxylysine
  • O-linked (Gal...) hydroxyproline
  • O-linked (Gal...) serine
  • O-linked (Gal...) threonine
  • O-linked (GalNAc) serine
  • O-linked (GalNAc...) serine
  • O-linked (GalNAc...) (keratan sulfate) serine
  • O-linked (GalNAc) threonine
  • O-linked (GalNAc...) threonine
  • O-linked (GalNAc...) (keratan sulfate) threonine
  • O-linked (GalNAc) tyrosine
  • O-linked (GalNAc...) tyrosine
  • O-linked (Glc) hydroxylysine
  • O-linked (Glc) serine
  • O-linked (Glc...) serine
  • O-linked (Glc) tyrosine
  • O-linked (Glc...) tyrosine
  • O-linked (GlcA) serine
  • O-linked (GlcNAc) hydroxyproline
  • O-linked (GlcNAc...) hydroxyproline
  • O-linked (GlcNAc) serine
  • O-linked (GlcNAc...) serine
  • O-linked (GlcNAc) threonine
  • O-linked (GlcNAc...) threonine
  • O-linked (GlcNAc) tyrosine
  • O-linked (GlcNAc...) tyrosine
  • O-linked (GlcNAc1P) serine
  • O-linked (GlcNAc6P) serine
  • O-linked (Man) serine
  • O-linked (Man...) serine
  • O-linked (Man...) (keratan sulfate) serine
  • O-linked (Man) threonine
  • O-linked (Man...) threonine
  • O-linked (Man...) (keratan sulfate) threonine
  • O-linked (Man1P) serine
  • O-linked (Man1P...) serine
  • O-linked (Man6P) threonine
  • O-linked (Man6P...) threonine
  • O-linked (Xyl) serine
  • O-linked (Xyl...) serine
  • O-linked (Xyl...) (chondroitin sulfate) serine
  • O-linked (Xyl...) (dermatan sulfate) serine
  • O-linked (Xyl...) (heparan sulfate) serine
  • O-linked (Xyl...) (glycosaminoglycan) serine
  • O-linked (Xyl...) (keratan sulfate) threonine
  • O-linked (Xyl...) (glycosaminoglycan) threonine
  • O-linked (Hex) hydroxylysine
  • O-linked (Hex...) hydroxylysine
  • O-linked (Hex) hydroxyproline
  • O-linked (Hex...) hydroxyproline
  • O-linked (Hex) serine
  • O-linked (Hex...) serine
  • O-linked (Hex) threonine
  • O-linked (Hex...) threonine
  • O-linked (Hex) tyrosine
  • O-linked (Hex...) tyrosine
  • O-linked (HexNAc) hydroxyproline
  • O-linked (HexNAc...) hydroxyproline
  • O-linked (HexNAc) serine
  • O-linked (HexNAc...) serine
  • O-linked (HexNAc) threonine
  • O-linked (HexNAc...) threonine
  • O-linked (HexNAc) tyrosine
  • O-linked (HexNAc...) tyrosine
  • S-linked (Gal) cysteine
  • S-linked (Gal...) cysteine
  • S-linked (Glc) cysteine
  • S-linked (Glc...) cysteine
  • S-linked (GlcNAc) cysteine
  • S-linked (GlcNAc...) cysteine
  • S-linked (Hex) cysteine
  • S-linked (Hex...) cysteine
  • S-linked (HexNAc) cysteine
  • S-linked (HexNAc...) cysteine

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • Cysteine sulfonic acid (-SO3H)

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Cardiomyopathy, dilated 1T
  • Sarcoidosis early-onset

UniRef news

Addition of GO annotation to UniRef90 and UniRef50 clusters

We have started to compute Gene Ontology (GO) annotations for UniRef90 and UniRef50 clusters. A GO term is assigned to a cluster when it is found in all UniProtKB members that are annotated with GO terms, or when it is a common ancestor of at least one GO term of each such member.
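The assignment rule can be read as a set intersection. Expand each GO-annotated member's terms with all their ancestors, then keep only the terms shared by every such member. The Python sketch below illustrates that reading and is not UniProt's production pipeline. It assumes the ontology is available as a hypothetical mapping from each term to its direct parents, and the term names in the example are placeholders rather than real GO identifiers.

```python
from functools import reduce


def ancestors_and_self(term: str, parents: dict[str, set[str]]) -> set[str]:
    """Return `term` together with every ancestor reachable through `parents`."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def cluster_go_terms(member_terms: list[set[str]],
                     parents: dict[str, set[str]]) -> set[str]:
    """GO terms held by every GO-annotated member, directly or as an ancestor."""
    # Members without any GO annotation do not take part in the rule.
    annotated = [terms for terms in member_terms if terms]
    if not annotated:
        return set()
    # For each member: its own terms plus all of their ancestors.
    closures = [set().union(*(ancestors_and_self(t, parents) for t in terms))
                for terms in annotated]
    # A term survives only if every annotated member's closure contains it.
    return reduce(set.intersection, closures)


# Hypothetical toy ontology: child term -> set of direct parent terms.
parents = {
    "peroxidase": {"oxidoreductase"},
    "oxidoreductase": {"catalytic"},
    "kinase": {"catalytic"},
}
members = [{"peroxidase"}, {"oxidoreductase", "kinase"}, set()]
print(sorted(cluster_go_terms(members, parents)))  # ['catalytic', 'oxidoreductase']
```

In this example, "kinase" is dropped because only one annotated member carries it. "oxidoreductase" is kept because it is either assigned directly or reached as an ancestor in both annotated members.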

The UniRef XML format now represents the GO annotations with property elements. We have introduced three new types: "GO Molecular Function", "GO Biological Process", "GO Cellular Component". The values of these property elements are GO identifiers.

Example:

<entry id="UniRef50_B0KJL7" updated="2017-03-15">
  <name>Cluster: Animal haem peroxidase</name>
  ...
  <property type="GO Molecular Function" value="GO:0004601"/>
  <property type="GO Biological Process" value="GO:0006979"/>
  ...

This change does not affect the XSD, but may nevertheless require code changes.
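Because the cluster GO terms are carried by ordinary property elements, code that reads UniRef XML can select them by their type attribute. The Python sketch below uses the standard xml.etree.ElementTree module on a completed copy of the example entry. It compares local tag names, so it works whether or not the file declares an XML namespace. It inspects only the direct children of entry, on the assumption (taken from the example) that the cluster-level GO properties sit there rather than under member elements.

```python
import xml.etree.ElementTree as ET

GO_TYPES = ("GO Molecular Function", "GO Biological Process", "GO Cellular Component")


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def cluster_go_properties(entry: ET.Element) -> dict[str, list[str]]:
    """Collect GO identifiers from the <property> children of a UniRef <entry>."""
    found: dict[str, list[str]] = {go_type: [] for go_type in GO_TYPES}
    for child in entry:  # direct children only, not member-level properties
        go_type = child.get("type")
        if local_name(child.tag) == "property" and go_type in found:
            found[go_type].append(child.get("value"))
    return found


SAMPLE = """<entry id="UniRef50_B0KJL7" updated="2017-03-15">
  <name>Cluster: Animal haem peroxidase</name>
  <property type="GO Molecular Function" value="GO:0004601"/>
  <property type="GO Biological Process" value="GO:0006979"/>
</entry>"""

print(cluster_go_properties(ET.fromstring(SAMPLE)))
```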

UniProt release 2017_04

Published April 12, 2017

Headline

Death (by insulin) in paradise

Have you ever been lucky enough to see cone snails in their natural habitat? Their shells are beautiful and you may be tempted to pick them up to admire them. Try to resist: cone snails hate that! These venomous animals can fire their harpoons and inject toxins under your skin. In some cases, these injections can be fatal. Cone snails produce 100-200 distinct venom peptides, and most of the characterized ones target their prey’s nervous system, including specific receptors, ion channels and transporters.

Cone snails predominantly live in warm seas and feed on fish, worms or molluscs. Fish-hunting cone snails can be classified into two categories depending on their hunting strategy. ‘Hook-and-line hunters’ shoot a venomous harpoon into the fish. ‘Net hunters’ extend a sort of stretchy mouth, aim it at the fish and eventually engulf them. Cone snails move very slowly and this whole process takes some time, so why does the fish not simply swim away? It has been proposed that cone snails release into the water a subset of narcotizing or relaxing toxins, called the ‘nirvana cabal’, causing fish to become disoriented and stop moving.

The analysis of the Conus geographus venom gland transcriptome led to the amazing discovery of 3 transcripts (Con-Ins G1, Con-Ins G2 and Con-Ins G3), expressed at high levels and sharing high sequence similarity with vertebrate insulin. The N-terminal half of Con-Ins G1 is almost identical to that of the fish hormone. Adding human insulin to water is known to cause hypoglycemia in fish, which severely affects their swimming behavior, as the insulin is absorbed via the gills. The effect can be reversed by placing the fish in a 2% glucose bath. A similar effect was observed with synthetic Con-Ins G1, suggesting that it is indeed a component of the ‘nirvana cabal’.

Venom insulins are widely used by cone snails. All mollusc eaters produce venom insulins, as do many, though not all, worm hunters. Among fish hunters, all net hunters produce venom insulins, whereas hook-and-line hunters do not. Venom insulins found in fish-hunting cone snails closely resemble fish insulins, whereas those identified in snail hunters share sequence and structural similarities with mollusc insulins. Interestingly, the cone snails’ own insulin, produced in nerve rings to control their glucose homeostasis, is highly conserved across all tested species. Venom insulins, by contrast, diverge rapidly, suggesting adaptation to their specific prey.

Cone snail venom insulins are the smallest insulins known in nature. They lack A- and B-chain C-terminal residues that, in vertebrates, are crucial for hormone storage and activity. In human pancreatic beta-cells, insulin is stored as a hexamer (a trimer of dimers), but it is the monomer that bears the hormonal activity. The hexamer-to-monomer conversion can delay insulin action, and hence blood glucose control, after insulin injection in diabetic patients. Attempts to shorten the C-terminus of the human insulin B chain in order to abolish self-association have resulted in near-complete loss of activity. By contrast, Con-Ins G1 is monomeric, bypassing the hexamer conversion step, yet it still binds potently to the human insulin receptor. It is not yet entirely clear how Con-Ins G1 achieves this. Like most conotoxins, C. geographus insulins are extensively post-translationally modified; in the absence of these modifications, insulin receptor activation is reduced approximately 8-fold. The crystal structure of Con-Ins G1 shows how it can compensate for the lack of key C-terminal residues, paving the way for the design of fast-acting therapeutic insulins.

The use of insulins in venoms has not been reported in any animals other than cone snails. However, the Gila monster, a venomous lizard living in the southwestern United States and northwestern Mexico, also targets the glucose homeostasis of its prey. It produces a peptide called exendin-4, which mimics the incretin hormone glucagon-like peptide 1 (GLP-1) and acts as a potent stimulator of glucose-dependent insulin release. Exendin-4 has been developed as a commercial drug, under the name ‘Exenatide’, for the treatment of type 2 diabetes.

As of this release, the Con-Ins G1 entry is publicly available in the safe conotoxin-free environment of your computer.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Ceroid lipofuscinosis, neuronal, 12

UniProt release 2017_03

Published March 15, 2017

Headline

Viral Short Message Service: peptide texting guides the outcome of infection

Communication is not simply a dispensable tool invented by Homo sapiens to do business and to have an enjoyable social life. Long before the advent of cell phones, most living organisms, from animals and plants to bacteria, were communicating with each other in order to ensure species survival. The recent discovery of a peptide-based communication system in some bacterial viruses extends this observation far beyond our wildest imaginings.

Some bacterial viruses, called temperate bacteriophages, can infect their host through either a lytic (productive) or a lysogenic (latent) cycle. The lytic cycle leads to the lysis of the host bacterial cell and the release of progeny virions. In the lysogenic cycle, on the other hand, the bacteriophage genome becomes integrated into the host genome as a prophage without any virion production. The decision between lysis and lysogeny is probabilistic in nature, but usually depends on the number of co-infecting viruses and the nutritional state of the bacteria. When uninfected bacteria are abundant and healthy, the lytic pathway is preferred. In later stages of infection, when the number of uninfected bacteria is reduced, progeny phages risk no longer having a new host to infect. At this point, lysogeny is favored. Although the molecular mechanism underlying the phage lytic or lysogenic decision is still largely unknown, even in well-studied bacteriophages like Lambda or Mu, a substantial leap forward was made earlier this year.

Erez et al. were investigating whether phage-infected bacteria may produce molecules to alert other bacterial cells of their infection, when they made an amazing discovery. A screening of the culture medium of Bacillus subtilis infected by Phi3T bacteriophages led to the identification not of a bacterial, but of a... viral hexapeptide! This peptide was called AimP. The bacteriophage also encodes a cytoplasmic receptor for AimP, called AimR. In the absence of AimP, the AimR receptor behaves as a DNA-binding homodimer which activates the transcription of a third phage component of the system, AimX. AimX is a regulatory non-coding RNA which favors lysis, either by inhibiting lysogeny or by promoting lysis, in an as yet undefined manner. In the presence of AimP, the AimR receptor becomes a peptide-bound, transcriptionally inactive monomer. As a result, the expression of AimX drops and lysogeny is promoted.

The current experimental data suggest the following model. AimP is synthesized in infected bacteria as a pre-pro-peptide. Its N-terminal signal sequence is recognized by the host secretion system and cleaved off upon secretion. Once released into the extracellular milieu, the inactive pro-peptide is further processed by bacterial extracellular proteases to yield the mature, active six-amino-acid AimP peptide. This peptide is internalized by surrounding bacteria through the oligopeptide permease transporter (OPP). AimP accumulates in the bacterial cytoplasm. When a phage infects an ‘AimP-rich’ bacterium, the expressed AimR receptor binds AimP and cannot activate the expression of AimX, leading to preferential lysogeny. In other words, a phage can “sense” the level of global infection in the environment and adapt to preserve its chances of viable reproduction.

This three-component viral communication system has been called ‘arbitrium’ (after the Latin word meaning ‘decision’). It may not be restricted to Phi3T bacteriophages. Indeed, Erez et al. found 112 instances of AimR homologues in Bacillus phages and, in all cases, aimR homologues were found upstream of aimP candidate genes.

As of this release, Bacillus phage Phi3T AimP and AimR entries have been updated and are publicly available.

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to keywords

New keywords:

Modified keyword:

UniProt release 2017_02

Published February 15, 2017

Headline

Freshwater fish see red

Vision relies on specialized neurons found in the retina, called photoreceptor cells. Vertebrate photoreceptor cells contain visual pigments consisting of a G-protein-coupled receptor, called opsin, and a covalently bound chromophore derived from vitamin A, most commonly 11-cis retinal (a derivative of vitamin A1). Light-induced isomerization of 11-cis retinal to all-trans triggers a conformational change leading to G-protein activation, release of all-trans retinal and activation of the phototransduction cascade.

Typical rod photopigments have a maximum light absorbance of around 500 nm. However, at the end of the 19th century, Köttgen and Abelsdorff observed that the rod pigments of certain freshwater fish were “red-shifted”, absorbing at wavelengths 20-30 nm longer than those of marine fish and terrestrial animals. This difference is due to a change in chromophore. Instead of 11-cis retinal, freshwater vertebrates use 11-cis 3,4-didehydroretinal, a derivative of vitamin A2, which differs from vitamin A1 only by an additional conjugated double bond in its beta-ionone ring. What is the evolutionary advantage of this modification? Fresh water, in lakes or streams, is often murky, so the light environment is shifted towards the red and infrared end of the spectrum. Shifting light absorbance accordingly seems to be the appropriate response to optimize vision in this specific aquatic milieu.

The chromophore switch is not only species-specific; it can also be regulated during an animal’s life. For example, many amphibians use 11-cis 3,4-didehydroretinal during the tadpole stage, which they spend in ponds. Upon metamorphosis, they switch to 11-cis retinal, which provides clear vision to the terrestrial adults they have become. Conversely, salmon live happily with 11-cis retinal in the open ocean. During spawning migration, however, 11-cis retinal is progressively replaced by 11-cis 3,4-didehydroretinal, possibly through the action of thyroid hormones. In zebrafish, too, the switch to vitamin A2-based chromophores can be induced by thyroid hormone treatment. Perhaps the most striking example of differential usage of visual chromophores is provided by the American bullfrog. This voracious predator spends a large part of its life floating or swimming at the surface of the water, with its eyes just above the waterline, looking for aquatic as well as aerial prey. Its dorsal retina, which looks down into the water, contains 11-cis 3,4-didehydroretinal, while its ventral retina uses 11-cis retinal.

While much of this knowledge on vitamins A1 and A2 was acquired long ago, the identity of the dehydrogenase catalyzing the switch between the two forms remained elusive until December 2015, when Enright et al. published the identification of the enzyme. The authors compared the expression profiles of the retinal pigment epithelium (RPE) of thyroid hormone-treated versus control zebrafish. The most highly up-regulated transcript was that encoding cyp27c1, a cytochrome P450 family member. cyp27c1 was also strongly expressed in dorsal, but not ventral, bullfrog RPE, correlating with the distribution of vitamin A2. In vitro, purified cyp27c1 efficiently catalyzed the conversion of vitamin A1 to vitamin A2. In vivo, cyp27c1 knockout zebrafish survive to adulthood without overt developmental abnormalities. However, upon treatment with thyroid hormone, the eyes of the mutant fish fail to produce any vitamin A2 and their photoreceptors do not undergo a red-shift in sensitivity. Thus, the expression of a single enzyme, cyp27c1, mediates the dynamic spectral tuning of the entire visual system by controlling the balance of vitamin A1 and A2 in the eye.

Obviously, humans are not adapted for aquatic vision. However, they do produce vitamin A2, as has been documented in keratinocytes, and they express CYP27C1 in liver, kidney and pancreas. The human enzyme catalyzes the same reaction as fish and amphibian orthologs, but the physiological relevance of this observation is not clear at present.

Zebrafish and bullfrog CYP27C1 entries have been annotated in UniProtKB/Swiss-Prot. The preliminary sequence of American bullfrog CYP27C1 was kindly provided by Professor Corbo and Dr. Enright and we would like to thank them sincerely. The human ortholog has been updated. All 3 entries are publicly available as of this release.

Cross-references to Araport

Cross-references have been added to the Arabidopsis Information Portal Araport, an open-access online community resource for Arabidopsis research.

Araport is available at https://www.araport.org/.

The format of the explicit links is:

Resource abbreviation Araport
Resource identifier AGI locus code

Example: Q43125

Text format

Example: Q43125

DR   Araport; AT4G08920; -.

XML format

Example: Q43125

<dbReference type="Araport" id="AT4G08920"/>

RDF format

Example: Q43125

uniprot:Q43125
  rdfs:seeAlso <http://purl.uniprot.org/araport/AT4G08920> .
<http://purl.uniprot.org/araport/AT4G08920>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Araport> .
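The text-format line follows the general DR layout. It gives the resource abbreviation, the identifier and any optional fields, separated by semicolons, with "-" marking an empty optional field and a full stop ending the line. The Python sketch below parses such lines. It assumes a single-line DR entry without the trailing isoform annotation that some cross-references carry, and the function name is hypothetical.

```python
from typing import Optional


def parse_dr_line(line: str) -> tuple[str, str, list[Optional[str]]]:
    """Split a 'DR   Database; Identifier; ...' line into its fields.

    Returns the database abbreviation, the primary identifier and the list of
    optional fields, with '-' placeholders turned into None.
    """
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].rstrip()
    if not body.endswith("."):
        raise ValueError(f"DR line should end with '.': {line!r}")
    fields = [field.strip() for field in body[:-1].split(";")]
    if len(fields) < 2:
        raise ValueError(f"DR line needs a database and an identifier: {line!r}")
    database, identifier, *optional = fields
    return database, identifier, [None if f == "-" else f for f in optional]


print(parse_dr_line("DR   Araport; AT4G08920; -."))  # ('Araport', 'AT4G08920', [None])
```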

Cross-references to IMGT/GENE-DB

Cross-references have been added to IMGT/GENE-DB, the genome database of the international Immunogenetics information system (IMGT) for genes encoding immunoglobulins and T-cell receptors.

IMGT/GENE-DB is available at http://www.imgt.org/genedb/.

The format of the explicit links is:

Resource abbreviation IMGT/GENE-DB in entry view, IMGT_GENE-DB in source formats
Resource identifier Gene name

Example: P01871

Text format

Example: P01871

DR   IMGT_GENE-DB; IGHM; -.

XML format

Example: P01871

<dbReference type="IMGT_GENE-DB" id="IGHM"/>

RDF format

Example: P01871

uniprot:P01871
  rdfs:seeAlso <http://purl.uniprot.org/imgt_gene-db/IGHM> .
<http://purl.uniprot.org/imgt_gene-db/IGHM>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/IMGT_GENE-DB> .
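The IMGT/GENE-DB examples show two naming details. The entry view uses "IMGT/GENE-DB" while the source formats use "IMGT_GENE-DB". In addition, the RDF seeAlso URI appears to be built from the lowercased source-format abbreviation followed by the identifier, while the database URI keeps the original case. The Python sketch below encodes that pattern. It is an assumption inferred from the examples on this page, not a documented rule, and it does not hold for databases that publish their own RDF URIs. The mapping table and function names are hypothetical.

```python
PURL_BASE = "http://purl.uniprot.org"

# Hypothetical mapping: entry-view name -> abbreviation used in source formats.
# Only the IMGT/GENE-DB case comes from this note; other names pass through.
SOURCE_ABBREVIATION = {"IMGT/GENE-DB": "IMGT_GENE-DB"}


def see_also_uri(database: str, identifier: str) -> str:
    """Cross-reference URI, following the pattern seen in the examples above."""
    abbreviation = SOURCE_ABBREVIATION.get(database, database)
    return f"{PURL_BASE}/{abbreviation.lower()}/{identifier}"


def database_uri(database: str) -> str:
    """Database URI; unlike the resource URI, it keeps the abbreviation's case."""
    abbreviation = SOURCE_ABBREVIATION.get(database, database)
    return f"{PURL_BASE}/database/{abbreviation}"


print(see_also_uri("IMGT/GENE-DB", "IGHM"))  # http://purl.uniprot.org/imgt_gene-db/IGHM
```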
 

Change of the cross-references to TAIR

We have modified our cross-references to the TAIR database, and now use the TAIR accession number as the primary resource identifier, while continuing to show the TAIR locus name in an additional field.

Text format

Example: Q9ZVI3

Previous format:

DR   TAIR; AT2G38610; -.

New format:

DR   TAIR; locus:2064097; AT2G38610.

XML format

Example: Q9ZVI3

Previous format:

<dbReference type="TAIR" id="AT2G38610"/>

New format:

<dbReference type="TAIR" id="locus:2064097">
  <property type="gene designation" value="AT2G38610"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.

RDF format

Example: Q9ZVI3

Previous format:

uniprot:Q9ZVI3
  rdfs:seeAlso <http://purl.uniprot.org/tair/AT2G38610> .
<http://purl.uniprot.org/tair/AT2G38610>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/TAIR> .

New format:

uniprot:Q9ZVI3
  rdfs:seeAlso <http://purl.uniprot.org/tair/locus:2064097> .
<http://purl.uniprot.org/tair/locus:2064097>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/TAIR> ;
  rdfs:comment "AT2G38610" .
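Code that read the AGI locus code from the first field after "TAIR" will now receive the TAIR accession in that position. One way to cope during a transition is to accept both layouts. The Python sketch below does this, assuming the three-field text lines shown above; the TairRef record and its field names are hypothetical.

```python
from typing import NamedTuple, Optional


class TairRef(NamedTuple):
    accession: Optional[str]  # TAIR accession, e.g. "locus:2064097"; None in the old format
    locus_name: str           # AGI locus code, e.g. "AT2G38610"


def parse_tair_dr(line: str) -> TairRef:
    """Parse a TAIR DR line in either the previous or the new format."""
    if not line.startswith("DR   TAIR;"):
        raise ValueError(f"not a TAIR DR line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"unexpected number of fields: {line!r}")
    _, first, second = fields
    if second == "-":
        # Previous format: DR   TAIR; AT2G38610; -.
        return TairRef(accession=None, locus_name=first)
    # New format: DR   TAIR; locus:2064097; AT2G38610.
    return TairRef(accession=first, locus_name=second)


print(parse_tair_dr("DR   TAIR; AT2G38610; -."))
print(parse_tair_dr("DR   TAIR; locus:2064097; AT2G38610."))
```

Both calls yield the same locus_name, so downstream code keyed on the AGI locus code keeps working with either format.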

Removal of sequence similarity annotations for domains

Sequence similarity annotations were mainly used to describe two types of information:

  1. A family to which the protein belongs, worded as:
     Belongs to FamilyName.
  2. A structural domain that the protein contains, worded as:
     Contains NumberOfOccurrences DomainName.

The domains that a protein contains are also annotated in ‘Domain’, ‘Zinc finger’, ‘Repeat’, ‘Calcium binding’ or ‘DNA binding’ annotations, which describe a domain’s name and sequence coordinates. The ‘Sequence similarity’ annotations of type 2, however, described only a domain’s name and number of occurrences. We have therefore removed these less detailed annotations.
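Scripts that parsed the free-text similarity annotations can tell the two wordings apart with simple patterns. The Python sketch below is built only from the two templates quoted above, so its regular expressions are an assumption rather than a formal grammar. After this change only the first kind should be expected; domain names and positions come from the feature annotations instead.

```python
import re
from typing import Optional

# Patterns derived from the templates "Belongs to FamilyName." and
# "Contains NumberOfOccurrences DomainName." (assumed wording).
FAMILY = re.compile(r"^Belongs to (?P<family>.+)\.$")
DOMAIN = re.compile(r"^Contains (?P<count>\d+) (?P<domain>.+)\.$")


def classify_similarity(text: str) -> Optional[tuple[str, str]]:
    """Return ('family', name), ('domain', name), or None if no template matches."""
    if match := FAMILY.match(text):
        return "family", match["family"]
    if match := DOMAIN.match(text):
        # Type 2 wording: no longer present after this release.
        return "domain", match["domain"]
    return None


print(classify_similarity("Belongs to the peptidase S1 family."))
```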

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Thyroxine-binding globulin deficiency

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • N,N,N-trimethylmethionine

UniProt release 2017_01

Published January 18, 2017

Headline

Sheep in wolves' clothing: human variant reannotation in UniProtKB/Swiss-Prot with ExAC

Annotation of sequence variants has always been an important part of the curation of human proteins in UniProtKB/Swiss-Prot. As of this release, about 76,500 variants are annotated in the knowledgebase. 99% of them are single amino acid polymorphisms (SAPs); the rest are small indels. 38% of the SAPs are associated with a genetic disorder. This high percentage of rare SAPs reflects our strategy of prioritizing the annotation of disease-causing and/or functionally characterized variants reported in the peer-reviewed scientific literature. Most are annotated as involved in diseases (as disease-causing agents, susceptibility factors or disease modifiers). For some, however, the role in the phenotype is not clear, although they have been found in patients and not (yet?) in healthy individuals. These variants are called Variants of Unknown Significance (VUS). In the ‘good old days’, we were quite confident and associated SAPs with diseases provided that some criteria were met, such as cosegregation of the mutation with the phenotype and absence of the mutation in a reasonably high number of healthy controls. At that time, 100 control individuals, ethnically matched if possible, seemed acceptable. Those days are gone. Nowadays, these simple criteria have been replaced by a real roadmap, based on guidelines developed by Richards et al. The stumbling block remains the frequency of a given variant in the population relative to the prevalence of the disease. In other words, if a variant is not found in healthy individuals, is it because it is pathogenic, or simply because nobody has looked hard enough? In this context, the high-quality sequence of almost 61,000 exomes provided by the Exome Aggregation Consortium (ExAC) is a major achievement.

ExAC aims to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available to the wider scientific community. The sequence of 60,706 exomes from unrelated individuals is currently available on the ExAC website. Surprisingly, each ExAC exome donor harbored on average 54 mutations reported to be disease-causing in HGMD or ClinVar. The pathogenicity of most of them (41) could be ruled out due to high allele frequency. Take, for instance, the gene CLN8. Mutations in this gene have been shown to cause neuronal ceroid lipofuscinosis-8 (CLN8), an autosomal recessive neurodegenerative disorder with an age of onset of 2 to 7 years. In view of the clinical synopsis, no ‘healthy’ adult homozygous for any disease-causing mutation is expected. ExAC observed 93 individuals homozygous for the p.Pro229Ala variant, which had formerly been reported to be pathogenic. An analogous result was obtained for the variant p.Met1444Ile in GLI2. This mutation was reported to cause holoprosencephaly-9 (HPE9), an autosomal dominant disorder characterized by a wide phenotypic spectrum of brain developmental defects. Although HPE9 has variable expressivity and incomplete penetrance, the presence of this mutation in 20 homozygous individuals analyzed by ExAC led to its reclassification as a benign polymorphism.

The ExAC publication has had a fruitful impact on our annotation. First, 38 variants (in 36 gene entries) reported in UniProtKB/Swiss-Prot and thought to be pathogenic have been reclassified as either benign polymorphisms or VUS. Second, the ExAC database has become an invaluable tool for curators, helping them to tag human variants with the appropriate status: ‘Disease’ (disease-associated), ‘Polymorphism’ (innocuous) or ‘Unclassified’ (i.e. VUS). Third, we are learning to be more and more cautious when annotating new variants. The result is an increased number of VUS in UniProtKB/Swiss-Prot (currently about 20% of the total number of variants identified in patients). Old variants will be progressively confirmed or reclassified as new knowledge becomes available.

As of this release, the variants updated thanks to ExAC data are available in UniProtKB/Swiss-Prot.

The UniProt team wishes you a Happy New Year!

Cross-references to SFLD

Cross-references have been added to the Structure Function Linkage Database (SFLD), a resource that links evolutionarily related sequences and structures from mechanistically diverse superfamilies of enzymes to their chemical reactions.

SFLD is available at http://sfld.rbvi.ucsf.edu/django/.

The format of the explicit links is:

Resource abbreviation SFLD
Resource identifier SFLD identifier
Optional information 1 SFLD model name
Optional information 2 Number of hits

Example: P00877

Text format

Example: P00877

DR   SFLD; SFLDS00014; RuBisCO; 1.

XML format

Example: P00877

<dbReference type="SFLD" id="SFLDS00014">
  <property type="entry name" value="RuBisCO"/>
  <property type="match status" value="1"/>
</dbReference>

RDF format

Example: P00877

uniprot:P00877
  rdfs:seeAlso <http://purl.uniprot.org/sfld/SFLDS00014> .
<http://purl.uniprot.org/sfld/SFLDS00014>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SFLD> ;
  rdfs:comment "RuBisCO" ;
  up:signatureSequenceMatch <http://purl.uniprot.org/isoforms/P00877-1#SFLD_SFLDS00014_match_1> .
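In the XML format, the two optional fields become property elements of the dbReference. The SFLD model name appears as "entry name" and the number of hits as "match status". The Python sketch below collects them with the standard xml.etree.ElementTree module. The SfldMatch record is hypothetical, and the sample wraps the example in a made-up root element so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class SfldMatch:
    identifier: str  # SFLD identifier, e.g. "SFLDS00014"
    model_name: str  # value of the 'entry name' property
    hits: int        # value of the 'match status' property (number of hits)


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def sfld_matches(root: ET.Element) -> list[SfldMatch]:
    """Collect every SFLD dbReference below `root` (including `root` itself)."""
    matches = []
    for ref in root.iter():
        if local_name(ref.tag) != "dbReference" or ref.get("type") != "SFLD":
            continue
        props = {p.get("type"): p.get("value")
                 for p in ref if local_name(p.tag) == "property"}
        # Both properties are required here; a missing one raises KeyError.
        matches.append(SfldMatch(ref.get("id"), props["entry name"],
                                 int(props["match status"])))
    return matches


SAMPLE = """<example>
  <dbReference type="SFLD" id="SFLDS00014">
    <property type="entry name" value="RuBisCO"/>
    <property type="match status" value="1"/>
  </dbReference>
</example>"""

print(sfld_matches(ET.fromstring(SAMPLE)))
```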

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • O-UMP-histidine
  • O-UMP-serine
  • O-UMP-threonine

Changes to keywords

Deleted keyword:

  • Cyclosporin

UniRef news

Change of the UniRef FASTA header

We have added the NCBI taxonomy identifier of the common taxon of a UniRef cluster to the UniRef FASTA header, which now has the format:

>UniqueIdentifier ClusterName n=Members Tax=TaxonName TaxID=TaxonIdentifier RepID=RepresentativeMember

Where:

  • UniqueIdentifier is the primary accession number of the UniRef cluster.
  • ClusterName is the name of the UniRef cluster.
  • Members is the number of UniRef cluster members.
  • TaxonName is the scientific name of the lowest common taxon shared by all UniRef cluster members.
  • TaxonIdentifier is the NCBI taxonomy identifier of the lowest common taxon shared by all UniRef cluster members.
  • RepresentativeMember is the entry name of the representative member of the UniRef cluster.

Example:

>UniRef50_Q9K794 Putative AgrB-like protein n=2 Tax=Bacillus TaxID=1386 RepID=AGRB_BACHD
MLERLALTLAHQVKALNAEETESVEVLTFGFTIILHYLFTLLLVLAVGLLHGEIWLFLQI
ALSFTFMRVLTGGAHLDHSIGCTLLSVLFITAISWVPFANNYAWILYGISGGLLIWKYAP
YYEAHQVVHTEHWERRKKRIAYILIVLFIILAMLMSTQGLVLGVLLQGVLLTPIGLKVTR
QLNRFILKGGETNEENS

This addresses the issue that scientific taxon names can be ambiguous. For example, “Bacillus” refers both to a genus of bacteria and to a genus of insects.
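Because taxon names can be ambiguous, downstream code may prefer to key on the TaxID field. The Python sketch below parses the header, assuming exactly the field order shown above. Cluster and taxon names may contain spaces, so they are matched lazily up to the next key. The regular expression and the UniRefHeader record are illustrative, and headers in the earlier format (without TaxID) would not match.

```python
import re
from typing import NamedTuple


class UniRefHeader(NamedTuple):
    cluster_id: str
    cluster_name: str
    members: int
    taxon_name: str
    taxon_id: int
    rep_id: str


# Cluster and taxon names may contain spaces, so they are matched lazily
# up to the next "key=" token.
HEADER = re.compile(
    r"^>(?P<cluster_id>\S+) (?P<cluster_name>.+?) n=(?P<members>\d+) "
    r"Tax=(?P<taxon_name>.+?) TaxID=(?P<taxon_id>\d+) RepID=(?P<rep_id>\S+)$"
)


def parse_uniref_header(line: str) -> UniRefHeader:
    match = HEADER.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a UniRef FASTA header: {line!r}")
    return UniRefHeader(
        cluster_id=match["cluster_id"],
        cluster_name=match["cluster_name"],
        members=int(match["members"]),
        taxon_name=match["taxon_name"],
        taxon_id=int(match["taxon_id"]),
        rep_id=match["rep_id"],
    )


header = parse_uniref_header(
    ">UniRef50_Q9K794 Putative AgrB-like protein n=2 Tax=Bacillus TaxID=1386 RepID=AGRB_BACHD"
)
print(header.taxon_id)  # 1386
```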

UniProt release 2016_11

Published November 30, 2016

Headline

From mouth to gut, a new mechanism for fimbria assembly

Fighting the oral microbiome is a daily task. Ineffective oral hygiene leads not only to dental caries, but also to inflammatory gum diseases, such as gingivitis. In some cases, gingivitis can worsen and turn into periodontitis. This involves the chronic destruction of connective tissues, including the alveolar bone around the teeth, and consequently the loosening and subsequent loss of teeth. We are not all equally affected by periodontal diseases. There are marked differences in disease progression rate and severity, reflecting personal susceptibility, diversity in virulence among the microorganism species (and subspecies) and environmental conditions. Despite these variables, Porphyromonas gingivalis is now recognized as a major contributor to periodontitis. This Gram-negative black-pigmented anaerobic rod resides in subgingival biofilms and harbors an arsenal of virulence factors, among which are fimbriae (also called pili). Described for the first time in the early 1950s, fimbriae are non-flagellar appendages, formed by the assembly of proteins called pilins at the bacterial surface. They are often involved in the initial adhesion of the bacteria to host tissues during colonization, and also in biofilm formation, cell motility (twitching motility), and transport of proteins and DNA across cell membranes. There are major (long) and minor (short) fimbriae. Both contain a structural, stalk-forming subunit (FimA for the major fimbriae, Mfa1 for the minor fimbriae) and 3 accessory subunits thought to form the fimbria tip (FimC, FimD and FimE for the major fimbriae; Mfa3, Mfa4 and Mfa5 for the minor fimbriae). The last subunit is FimB (major fimbriae) or Mfa2 (minor fimbriae), which anchors the pilus to the outer membrane.

A very thorough study published last April, combining X-ray structure, biochemical and mutational analyses, sheds new light on the fimbria assembly mechanism in several bacteria from the Bacteroidia class, including P. gingivalis. The assembly occurs from tip to base. A tip pilin monomer is incorporated first, followed by stalk-forming structural pilin subunits and finally an anchor pilin at the base. Tip and structural pilins are synthesized in the cytoplasm as lipoprotein precursors and exported into the periplasm by the Sec pathway. In the periplasm, they are folded and become lipidated at the N-terminus. The modified pilins are then exported across the outer membrane. During this process, they undergo a cleavage that releases the lipid moiety and several amino acids from the N-terminus, creating a groove. At this stage, mature structural pilins adopt an extended “open” conformation. This allows the fimbriae to assemble, with a C-terminal extension binding to the N-terminal groove of the previous subunit, a little like interlocking Lego bricks. The tip pilins exhibit a similar N-terminal groove to accommodate the C-terminal extension of the structural pilin, but their own C-terminus remains buried. Anchor pilins do not undergo cleavage and remain tethered to the outer membrane. As with structural pilin subunits, their C-terminus is involved in their incorporation into fimbriae.

Although fimbria assembly has been studied in numerous phylogenetically distinct bacteria, until this recent publication, very little was known about pilin structure and assembly in human-associated Bacteroidales members. The reported mechanism was hitherto unseen, but it could be widespread. Indeed, FimA proteins represent a large and diverse superfamily, which is highly represented in the gut microbiome, suggesting that they may confer adaptive advantages in bacterial colonization of this environment.

Close to 30 entries have been updated in UniProtKB/Swiss-Prot to include these new findings. The entries can be consulted just as well before or after brushing your teeth!

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

RDF news

Change of URIs for Ensembl and Ensembl Genomes

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these with the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

We have therefore updated the URIs for the Ensembl and Ensembl Genomes databases from

http://purl.uniprot.org/ensembl/<identifier>
http://purl.uniprot.org/ensemblbacteria/<identifier>
http://purl.uniprot.org/ensemblfungi/<identifier>
http://purl.uniprot.org/ensemblmetazoa/<identifier>
http://purl.uniprot.org/ensemblplants/<identifier>
http://purl.uniprot.org/ensemblprotists/<identifier>
to
  • http://rdf.ebi.ac.uk/resource/ensembl/<identifier>
    for genes
  • http://rdf.ebi.ac.uk/resource/ensembl.transcript/<identifier>
    for transcripts
  • http://rdf.ebi.ac.uk/resource/ensembl.protein/<identifier>
    for proteins
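The new namespaces depend on whether an identifier denotes a gene, a transcript or a protein, and the old URIs do not encode this. A rewrite therefore needs that information from elsewhere. The Python sketch below assumes the caller supplies the kind of object; the function name and the "kind" values are hypothetical, and "EXAMPLE_ID" is a placeholder identifier.

```python
# Old UniProt-minted prefixes for Ensembl and the Ensembl Genomes divisions.
OLD_BASES = tuple(
    f"http://purl.uniprot.org/{name}/"
    for name in ("ensembl", "ensemblbacteria", "ensemblfungi",
                 "ensemblmetazoa", "ensemblplants", "ensemblprotists")
)
# New EBI RDF namespaces, one per kind of object.
NEW_NAMESPACES = {
    "gene": "http://rdf.ebi.ac.uk/resource/ensembl/",
    "transcript": "http://rdf.ebi.ac.uk/resource/ensembl.transcript/",
    "protein": "http://rdf.ebi.ac.uk/resource/ensembl.protein/",
}


def migrate_ensembl_uri(old_uri: str, kind: str) -> str:
    """Rewrite an old purl.uniprot.org Ensembl URI to the rdf.ebi.ac.uk form."""
    if kind not in NEW_NAMESPACES:
        raise ValueError(f"kind must be one of {sorted(NEW_NAMESPACES)}, got {kind!r}")
    for base in OLD_BASES:
        if old_uri.startswith(base):
            return NEW_NAMESPACES[kind] + old_uri[len(base):]
    raise ValueError(f"not an old Ensembl URI: {old_uri!r}")


print(migrate_ensembl_uri("http://purl.uniprot.org/ensemblplants/EXAMPLE_ID", "transcript"))
```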

UniProt release 2016_10

Published November 2, 2016

Headline

N-acyl amino acids: a new treatment for obesity?

Mitochondria play a fundamental role in energy production. After glycolysis, the products of glucose breakdown are imported into the mitochondrial matrix, where they are oxidized through the citric acid cycle. The electrons released in this process are passed from one protein complex to the next in the mitochondrial inner membrane. The final electron acceptor is molecular oxygen, which is reduced to water. During electron transport, the participating protein complexes pump protons out of the matrix into the intermembrane space, creating a proton gradient across the inner membrane. ATP synthase uses this gradient to power the phosphorylation of ADP into ATP. However, not all the energy released by the oxidation of dietary substrates is converted into ATP. Protons can leak back into the matrix through the inner membrane independently of ATP synthase, and the stored energy is dissipated as heat. Several proteins are known to be involved in this process, called “uncoupled respiration”. One of them, UCP1, has been studied most extensively in the context of thermogenesis mediated by brown and beige adipose tissues.

Adaptive thermogenesis does not rely exclusively on UCP1. Adipose tissues secrete many bioactive proteins, some of which may play a role in the regulation of energy expenditure. Recently, Long et al. identified PM20D1, a protein secreted by brown and beige fat cells and co-expressed with UCP1 in adipocytes. Mice injected with PM20D1 viral expression vectors and fed a high-fat diet for 47 to 54 days showed blunted weight gain, due to a massive reduction in fat mass compared with control animals. Food intake and movement did not differ between treated and untreated animals, suggesting the activation of a thermogenic gene program in classical brown fat (BAT), subcutaneous inguinal white fat (iWAT), or both. Interestingly, UCP1 levels were unchanged in these experiments.

In vitro, PM20D1 behaved as a bidirectional N-acyl amino acid synthase and hydrolase, with lower synthase than hydrolase activity. In vivo, plasma levels of N-oleoyl-phenylalanine (C18:1-Phe) were indeed elevated in mice injected with the PM20D1 expression vector. But what is the effect of N-lipidated amino acids on cells? When treated with N-acyl amino acids, primary BAT adipocytes and differentiated iWAT cells showed increased oxygen consumption in a UCP1-independent manner, indicating that these compounds have respiratory uncoupling activity. The N-acyl amino acids tested (N-arachidonoyl-glycine (C20:4-Gly), C20:4-Phe and C18:1-Phe) acted directly on mitochondria, possibly by interacting with mitochondrial transporter proteins such as SLC25A4 and SLC25A5. Of note, SLC25A4 and SLC25A5 exhibit ADP/ATP antiport activity, but are also thought to translocate protons across the inner membrane. Finally, treatment of obese mice with C18:1-Leu induced weight loss through a reduction of fat mass and improved glucose tolerance.

In the 1930s, the mitochondrial uncoupler 2,4-dinitrophenol (DNP) was used in diet pills to stimulate metabolism and promote weight loss, and it can still be purchased on the internet for this purpose. Although quite effective in terms of weight loss, this drug has severe side effects: it can cause an excessive rise in body temperature due to the heat produced during uncoupling. DNP overdose causes fatal hyperthermia, with body temperature rising as high as 44 °C shortly before death. Will N-acyl amino acids become a new, this time innocuous, treatment of choice for obesity? It is difficult to anticipate. Chronic treatment of mice with C18:1-Phe or C20:4-Gly not only increases energy expenditure, with no effect on movement, but also reduces food intake, which obviously also contributes to weight loss. However, several N-acyl amino acids have biological functions other than respiratory uncoupling and hence may have other (undesirable?) effects. Nevertheless, the study of Long et al. sheds light on new endogenous mitochondrial uncouplers and new thermogenic mechanisms that are undoubtedly worth further investigation.

As of this release, PM20D1 entries have been updated and are publicly available.

UniProtKB news

Cross-references to DisGeNET

Cross-references have been added to DisGeNET, a discovery platform for the dynamical exploration of human diseases and their genes.

DisGeNET is available at http://www.disgenet.org.

The format of the explicit links is:

Resource abbreviation DisGeNET
Resource identifier Gene identifier (corresponding to GeneID gene identifier)

Example: P02649

Show all entries having a cross-reference to DisGeNET.

Text format

Example: P02649

DR   DisGeNET; 348; -.

XML format

Example: P02649

<dbReference type="DisGeNET" id="348"/>

RDF format

Example: P02649

uniprot:P02649
  rdfs:seeAlso <http://identifiers.org/ncbigene/348> .
<http://identifiers.org/ncbigene/348>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/DisGeNET> .
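Since the DisGeNET identifier is the NCBI GeneID, client code can derive the identifiers.org URI used in the RDF example directly from the flat-file line. The Python sketch below assumes a simple DR line without isoform-specific suffixes; the function names are hypothetical:

```python
from __future__ import annotations


def parse_dr_line(line: str) -> list[str]:
    """Split a flat-file DR line into its semicolon-separated fields."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    return [field.strip() for field in body.split(";")]


def disgenet_rdf_uri(line: str) -> str:
    """Build the identifiers.org URI seen in the RDF example from a DisGeNET DR line."""
    fields = parse_dr_line(line)
    if fields[0] != "DisGeNET":
        raise ValueError("not a DisGeNET cross-reference")
    # The DisGeNET identifier is the NCBI GeneID, hence the ncbigene namespace.
    return "http://identifiers.org/ncbigene/" + fields[1]


print(disgenet_rdf_uri("DR   DisGeNET; 348; -."))
# prints http://identifiers.org/ncbigene/348
```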

Cross-references to OpenTargets

Cross-references have been added to OpenTargets. This target validation platform brings together information on the relationships between potential drug targets and diseases. The core concept is to identify evidence of an association between a target and a disease from various data types.

OpenTargets is available at https://www.targetvalidation.org/.

The format of the explicit links is:

Resource abbreviation OpenTargets
Resource identifier Gene identifier (corresponding to Ensembl gene identifier)

Example: P15056

Show all entries having a cross-reference to OpenTargets.

Text format

Example: P15056

DR   OpenTargets; ENSG00000157764; -.

XML format

Example: P15056

<dbReference type="OpenTargets" id="ENSG00000157764"/>

RDF format

Example: P15056

uniprot:P15056
  rdfs:seeAlso <http://purl.uniprot.org/opentargets/ENSG00000157764> .
<http://purl.uniprot.org/opentargets/ENSG00000157764>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/OpenTargets> .
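To pull the OpenTargets gene identifiers out of UniProt XML, a standard-library Python sketch such as the following could be used. The `<entry>` wrapper in the sample is an assumption for illustration; real UniProt XML declares the default namespace http://uniprot.org/uniprot, so the code compares local tag names only. The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Strip a namespace such as '{http://uniprot.org/uniprot}' if present.
    return tag.rsplit("}", 1)[-1]


def opentargets_ids(xml_text: str) -> list[str]:
    """Return the ids of all dbReference elements of type 'OpenTargets'."""
    root = ET.fromstring(xml_text)
    return [
        elem.get("id")
        for elem in root.iter()  # iter() also visits the root element itself
        if local_name(elem.tag) == "dbReference" and elem.get("type") == "OpenTargets"
    ]


snippet = """<entry xmlns="http://uniprot.org/uniprot">
  <dbReference type="OpenTargets" id="ENSG00000157764"/>
  <dbReference type="SMR" id="P15056"/>
</entry>"""
print(opentargets_ids(snippet))  # ['ENSG00000157764']
```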

Change of the cross-references to PhosphoSite

The PhosphoSite resource has changed its name to PhosphoSitePlus and we have updated our cross-references to reflect this name change.

Change of the cross-references to SMR

We have modified our cross-references to the SWISS-MODEL Repository (SMR) database. These cross-references used to indicate the sequence ranges of the UniProt canonical sequence that can be modelled with high confidence. This information is no longer provided in our cross-references, but the most up-to-date data is available in SMR, which is now updated weekly for several model organisms; you can also trigger the update of a specific entry in SMR yourself.

Text format

Example: Q00362

Previous format:

DR   SMR; Q00362; 4-376, 492-523.

New format:

DR   SMR; Q00362; -.

XML format

Example: Q00362

Previous format:

<dbReference type="SMR" id="Q00362">
  <property type="residue range" value="4-376, 492-523"/>
</dbReference>

New format:

<dbReference type="SMR" id="Q00362"/>

RDF format

Example: Q00362

Previous format:

uniprot:Q00362
  rdfs:seeAlso <http://purl.uniprot.org/smr/Q00362> .
<http://purl.uniprot.org/smr/Q00362>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SMR> ;
  rdfs:comment "4-376, 492-523" .

New format:

uniprot:Q00362
  rdfs:seeAlso <http://purl.uniprot.org/smr/Q00362> .
<http://purl.uniprot.org/smr/Q00362>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SMR> .
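Parsers that used to read residue ranges from SMR lines need to accept the new '-' placeholder. A hedged Python sketch that handles both flat-file formats shown above; the function name and return shape are hypothetical:

```python
from __future__ import annotations


def parse_smr(line: str) -> tuple[str, list[tuple[int, int]] | None]:
    """Parse an SMR DR line in either the previous or the new format.

    Returns the SMR identifier and, for previous-format lines only, the
    modelled residue ranges; new-format lines ('-') give None.
    """
    body = line[5:].rstrip().rstrip(".")
    db, identifier, extra = [f.strip() for f in body.split(";", 2)]
    if db != "SMR":
        raise ValueError("not an SMR cross-reference")
    if extra == "-":
        return identifier, None  # new format: ranges are no longer provided
    ranges = []
    for part in extra.split(","):
        start, end = part.strip().split("-")
        ranges.append((int(start), int(end)))
    return identifier, ranges


print(parse_smr("DR   SMR; Q00362; 4-376, 492-523."))  # ('Q00362', [(4, 376), (492, 523)])
print(parse_smr("DR   SMR; Q00362; -."))               # ('Q00362', None)
```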

Change of RDF representation of the cross-references to PDB

We have modified the representation of our cross-references to PDB. These cross-references indicate the sequence ranges of the UniProt canonical sequence that are covered by a PDB structure when this data is available. This piece of information was provided via a reification of the cross-reference statement and each range was represented with a chain property that had a string literal value. We have introduced a new chainSequenceMapping property to simplify this description.

Example: P00750

Previous format:

uniprot:P00750
  rdfs:seeAlso <http://rdf.wwpdb.org/pdb/1A5H> .

<http://rdf.wwpdb.org/pdb/1A5H>
  rdf:type up:Structure_Resource ;
  up:database <http://purl.uniprot.org/database/PDB> ;
  up:method up:X-Ray_Crystallography ;
  up:resolution "2.90"^^xsd:float .

<#_5030303735300036>
  rdf:type rdf:Statement ;
  rdf:type up:Structure_Mapping_Statement ;
  rdf:subject uniprot:P00750 ;
  rdf:predicate rdfs:seeAlso ;
  rdf:object <http://rdf.wwpdb.org/pdb/1A5H> ;
  up:chain "A/B=311-562" ,
           "C/D=298-304" .

New format:

uniprot:P00750
  rdfs:seeAlso <http://rdf.wwpdb.org/pdb/1A5H> .

<http://rdf.wwpdb.org/pdb/1A5H>
  rdf:type up:Structure_Resource ;
  up:database <http://purl.uniprot.org/database/PDB> ;
  up:method up:X-Ray_Crystallography ;
  up:resolution "2.90"^^xsd:float ;
  up:chainSequenceMapping isoform:P00750-1#PDB_1A5H_tt311tt562 ,
                          isoform:P00750-1#PDB_1A5H_tt298tt304 .

isoform:P00750-1#PDB_1A5H_tt311tt562
  up:chain "A/B=311-562" .

isoform:P00750-1#PDB_1A5H_tt298tt304
  up:chain "C/D=298-304" .
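The up:chain literal still packs chain identifiers and a sequence range into one string. The Python sketch below splits it and rebuilds the mapping-node fragment following the naming seen in the 1A5H example; that naming rule is inferred from this single example and is an assumption, as are the function names.

```python
from __future__ import annotations

import re

# up:chain literals such as "A/B=311-562": slash-separated chain IDs, '=',
# then a start-end range on the UniProt canonical sequence.
CHAIN_RE = re.compile(r"^(?P<chains>[^=]+)=(?P<start>\d+)-(?P<end>\d+)$")


def parse_chain(literal: str) -> tuple[list[str], int, int]:
    m = CHAIN_RE.match(literal)
    if m is None:
        raise ValueError(f"unexpected chain literal: {literal!r}")
    return m.group("chains").split("/"), int(m.group("start")), int(m.group("end"))


def mapping_fragment(pdb_id: str, start: int, end: int) -> str:
    # Pattern observed in the example (PDB_1A5H_tt311tt562); not a documented contract.
    return f"PDB_{pdb_id}_tt{start}tt{end}"


chains, start, end = parse_chain("A/B=311-562")
print(chains, start, end)                   # ['A', 'B'] 311 562
print(mapping_fragment("1A5H", start, end))  # PDB_1A5H_tt311tt562
```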

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New terms for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • Hydroxylated arginine
  • N6-(beta-hydroxybutyrate)lysine

UniProt website news

Web browser support update

UniProt strives to support all major web browsers up to the oldest version that is still supported by the browser developers. Since Microsoft stopped supporting Internet Explorer versions older than 11 in January 2016, we have dropped support for these versions as of UniProt release 2016_10.

We recommend using one of the following major web browsers for the UniProt website:

  • Internet Explorer 11+
  • Firefox 45+
  • Chrome (latest update)
  • Safari 9+

Please note that with older versions of these browsers, certain features of the website may not be available.

UniProt release 2016_09

Published October 5, 2016

Headline

Ki-67: the great leap from simple marker to functional actor

A marker is 'something (such as a sign or an object) that shows the location, the presence or the existence of something'. Used daily in laboratories worldwide, from basic research to clinics, markers are a scientist's or practitioner’s best friend, and the community continuously seeks new markers, notably to improve diagnosis and prognosis in medicine. Take for instance Ki-67. This protein, encoded by the MKI67 gene, is present during all active phases of the cell cycle (G1, S, G2 and mitosis), but is absent from resting G0 cells. During interphase, it is predominantly found in the cortex and dense fibrillar components of the nucleolus. During mitosis, it relocates to the periphery of the condensed chromosomes. It is a widely used marker of cell proliferation, very valuable in cancer diagnosis and prognosis. Here, the term “widely” seems an understatement: a search in the NCBI PubMed database retrieves over 22,200 publications, but hardly any deal with its actual function. Indeed, while the association of Ki-67 with cellular proliferation is well established, its precise role in this process was unknown until recently. It was quite tempting to suggest that it is ‘required for maintaining cell proliferation’, as was cautiously stated in the human UniProtKB/Swiss-Prot entry. However, a marker is just a marker, and drawing functional conclusions from expression levels may be hazardous.

At the very beginning of mitosis, chromosomes are compacted into thick fibers. After nuclear envelope breakdown (NEBD), chromosomes separate from one another in the cytoplasm, attach to the mitotic spindle and align along the center of the cell during metaphase. The spindle pulls a set of chromosomes to each pole of the dividing cell. How do chromosomes maintain their structural individuality during this process? As the molecules responsible for chromosome compaction are by themselves unable to distinguish different chromosomes, what are the factors that prevent chromosome coalescence?

Earlier this year, Cuylen et al. tackled this issue. Using automated live-cell imaging, the authors analyzed the effect of depleting individual proteins from cells. Out of almost 1,300 candidate genes, the knockdown of only one caused the sought-after chromosome clustering phenotype: MKI67. The internal structure of mitotic chromosomes appeared unaffected by Ki-67 depletion, but soon after NEBD, chromosomes merged into a single mass of chromatin, whose access to spindle microtubules was impaired.

Ki-67 is a large protein, about 3,000 amino acids long, that localizes at the chromosome surface from prophase until telophase, as mentioned above. Cuylen et al. show that its adsorption at the chromosome surface is mediated by its C-terminal region, while the elongated N-terminal portion orients perpendicular to the chromosomes, a little like the bristles of a brush. The size and overall electric charge of Ki-67 may form a repulsive shield that prevents coalescence. The range of Ki-67-mediated chromosome repulsion seems to depend on molecular density: when Ki-67 was overexpressed, mitotic chromosomes were spaced further apart.

Hence, natural proteins seem to be able to act as surfactants in intracellular compartmentalization. It would be interesting to investigate whether this is also the case for membrane-less organelles, such as nucleoli, with which Ki-67 has also been shown to be associated.

As of this release, the human Ki-67 entry has been updated in UniProtKB/Swiss-Prot and is publicly available.

UniProtKB news

Change of RDF representation of the cross-references to family and domain databases

We have modified the representation of our cross-references to family and domain databases. These cross-references indicate the number of matches of the family or domain signature to the UniProt canonical sequence, and this piece of information was provided via a reification of the cross-reference statement. We have introduced a new Signature_Resource class with a signatureSequenceMatch property to describe each match as a resource and thereby simplify this description.

Example: A0AVT1

Previous format:

uniprot:A0AVT1
  rdfs:seeAlso <http://purl.uniprot.org/pfam/PF00899> .

<http://purl.uniprot.org/pfam/PF00899>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Pfam> ;
  rdfs:comment "ThiF" .

<#_4130415654310021>
  rdf:type rdf:Statement ;
  rdf:type up:Domain_Assignment_Statement ;
  rdf:subject uniprot:A0AVT1 ;
  rdf:predicate rdfs:seeAlso ;
  rdf:object <http://purl.uniprot.org/pfam/PF00899> ;
  up:hits 2 .

New format:

uniprot:A0AVT1
  rdfs:seeAlso <http://purl.uniprot.org/pfam/PF00899> .

<http://purl.uniprot.org/pfam/PF00899>
  rdf:type up:Signature_Resource ;
  up:database <http://purl.uniprot.org/database/Pfam> ;
  rdfs:comment "ThiF" ;
  up:signatureSequenceMatch isoform:A0AVT1-1#Pfam_PF00899_match_1 ,
                            isoform:A0AVT1-1#Pfam_PF00899_match_2 .
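Consumers that relied on the old up:hits value can recover it by counting the signatureSequenceMatch objects of each signature. A minimal Python sketch, assuming fragment names of the form `<database>_<signature>_match_<n>` as in the example; the full IRI base http://purl.uniprot.org/isoforms/ used in the sample data and the function name are assumptions:

```python
from __future__ import annotations

from collections import Counter


def hits_per_signature(match_iris: list[str]) -> Counter:
    """Count signatureSequenceMatch values per signature (the old up:hits)."""
    counts: Counter = Counter()
    for iri in match_iris:
        fragment = iri.split("#", 1)[1]                # e.g. 'Pfam_PF00899_match_1'
        signature = fragment.rsplit("_match_", 1)[0]  # e.g. 'Pfam_PF00899'
        counts[signature] += 1
    return counts


matches = [
    "http://purl.uniprot.org/isoforms/A0AVT1-1#Pfam_PF00899_match_1",
    "http://purl.uniprot.org/isoforms/A0AVT1-1#Pfam_PF00899_match_2",
]
print(hits_per_signature(matches))  # Counter({'Pfam_PF00899': 2})
```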

Change of RDF representation of the cross-references to EMBL

We have modified the representation of our cross-references to nucleotide CoDing Sequences (CDS) from the INSDC. When a CDS differs substantially from a reviewed UniProtKB/Swiss-Prot sequence, the UniProt curators indicate the nature of the difference in the corresponding cross-reference. This piece of information was provided via a reification of the cross-reference statement. We have introduced a new sequenceDiscrepancy property to simplify this description.

Example: P30154

Previous format:

uniprot:P30154
  rdfs:seeAlso <http://purl.uniprot.org/embl-cds/BAG59103.1> .

<http://purl.uniprot.org/embl-cds/BAG59103.1>
  rdf:type up:Nucleotide_Resource ;
  up:database <http://purl.uniprot.org/database/EMBL> ;
  up:locatedOn <http://purl.uniprot.org/embl/AK296455> .

<#_503330313534001A>
  rdf:type rdf:Statement ;
  rdf:type up:Nucleotide_Mapping_Statement ;
  rdf:subject uniprot:P30154 ;
  rdf:predicate rdfs:seeAlso ;
  rdf:object <http://purl.uniprot.org/embl-cds/BAG59103.1> ;
  rdfs:comment "Frameshift." .

New format:

uniprot:P30154
  rdfs:seeAlso <http://purl.uniprot.org/embl-cds/BAG59103.1> .

<http://purl.uniprot.org/embl-cds/BAG59103.1>
  rdf:type up:Nucleotide_Resource ;
  up:database <http://purl.uniprot.org/database/EMBL> ;
  up:locatedOn <http://purl.uniprot.org/embl/AK296455> ;
  up:sequenceDiscrepancy uniprot:P30154#EMBL_BAG59103.1 .

uniprot:P30154#EMBL_BAG59103.1
  rdfs:comment "Frameshift." .
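In the new representation, the discrepancy is a small node named after the entry and the CDS, instead of a reified statement. The Python sketch below emits the two triples of that shape as plain tuples. The naming rule is inferred from the P30154 example, the uniprot: prefix is assumed to expand to http://purl.uniprot.org/uniprot/, and the helper name is hypothetical.

```python
from __future__ import annotations

UNIPROT_BASE = "http://purl.uniprot.org/uniprot/"  # assumed expansion of uniprot:
UP_CORE = "http://purl.uniprot.org/core/"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def discrepancy_triples(accession: str, cds_iri: str, comment: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples for one sequence discrepancy.

    The node name <entry>#EMBL_<CDS id> is inferred from the P30154 example.
    """
    cds_id = cds_iri.rsplit("/", 1)[1]  # e.g. 'BAG59103.1'
    node = f"{UNIPROT_BASE}{accession}#EMBL_{cds_id}"
    return [
        (cds_iri, UP_CORE + "sequenceDiscrepancy", node),
        (node, RDFS_COMMENT, comment),
    ]


for triple in discrepancy_triples(
    "P30154", "http://purl.uniprot.org/embl-cds/BAG59103.1", "Frameshift."
):
    print(triple)
```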

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt release 2016_08

Published September 7, 2016

Headline

Butterfly fashion: all they need is cortex

Butterfly and moth wing patterns fulfill various functions, such as mate attraction, thermal regulation, and protection by concealment, mimicry or warning. Patterns are produced by a dust-like layer of tiny colored scales that cover an otherwise transparent membrane. Scales can be pigmented with melanins resulting in black and brown colors. Blue, red and iridescence are usually created by the microstructure of the scales, resulting in the scattering of light. Each scale is produced by a single cell on the wing surface.

Wing pattern and color can change in order to adapt to environmental changes. The classical example of such a phenomenon is provided by Biston betularia. This moth used to camouflage itself against lichen-covered tree trunks: its peppered white wings make it almost invisible on this background. With the advent of the industrial revolution in 19th-century Britain, trunks turned soot black, and so did Biston betularia. The new melanic morph, called carbonaria, was first described in Manchester in 1848. It spread all over England, and its frequency exceeded 90% in the 1950s. Several years after the Clean Air Act, in the early 1970s, its frequency started to drop again; nowadays it is estimated at less than 50% at most, and below 10% in most places.

The mutation that gave rise to industrial melanism in Biston betularia has just been identified. It is the insertion of a large, tandemly repeated transposable element into the first intron of the cort gene, which results in increased gene expression. The transposition event is thought to have occurred around 1819, which is consistent with the historical record. Surprisingly, the cort gene does not encode a transcription factor that would be involved in the expression of pigmentation genes. Its only known function has been reported in Drosophila, where the cort-encoded protein, cortex, is a cell-cycle regulator required for the completion of meiosis in oocytes. In Heliconius numata tarapotensis and Heliconius melpomene rosina, two butterfly species, cortex is expressed in final instar larval hindwing discs, in regions fated to become black in the adult wing. Although the role of cortex in the regulation of pigmentation patterning is still unknown, the current hypothesis is that it may regulate scale cell development.

In other latitudes, butterflies escape predators not by concealment, but by advertising their unpalatability with bright and distinctive wing colors. Within a given area, experienced birds have been “educated” to avoid certain patterns, and this pattern recognition varies with geographical location. As a result, in a given area, a number of butterfly species, edible or not, mimic each other and share the same color pattern, even though they may be only distantly related, while butterflies of the same species found in other locations may exhibit very different patterns. A recent study focused on different Heliconius species living in South America, with quite a striking result: in these species too, the cort gene appeared to be a major regulator of color and pattern. This suggests that the recruitment of cortex to wing patterning may have occurred before the major diversification of the Lepidoptera. This gene has repeatedly been targeted by natural selection to generate both cryptic patterns, as in Biston betularia, and aposematic patterns, as in the genus Heliconius.

As of this release, UniProtKB/Swiss-Prot Biston betularia, Heliconius melpomene and Heliconius erato cortex entries have been updated with this new knowledge and are publicly available.

UniProtKB news

Cross-references to Conserved Domains Database

Cross-references have been added to the Conserved Domains Database (CDD), a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

CDD is available at https://www.ncbi.nlm.nih.gov/cdd.

The format of the explicit links is:

Resource abbreviation CDD
Resource identifier CDD identifier
Optional information 1 CDD model name
Optional information 2 Number of hits

Example: Q196W5

Show all entries having a cross-reference to CDD.

Text format

Example: Q196W5

DR   CDD; cd04278; ZnMc_MMP; 1.

XML format

Example: Q196W5

<dbReference type="CDD" id="cd04278">
  <property type="entry name" value="ZnMc_MMP"/>
  <property type="match status" value="1"/>
</dbReference>

RDF format

Example: Q196W5

uniprot:Q196W5
  rdfs:seeAlso <http://purl.uniprot.org/cdd/cd04278> .
<http://purl.uniprot.org/cdd/cd04278>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CDD> ;
  rdfs:comment "ZnMc_MMP" .
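Both optional fields of a CDD cross-reference are carried as property elements in the XML. A standard-library Python sketch that reads them back; the function name and output keys are hypothetical, and namespaces are ignored by comparing local tag names:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def cdd_refs(xml_text: str) -> list[dict]:
    """Collect CDD cross-references with their model name and number of hits."""
    root = ET.fromstring(xml_text)
    refs = []
    for ref in root.iter():  # iter() also visits the root element itself
        if ref.tag.rsplit("}", 1)[-1] != "dbReference" or ref.get("type") != "CDD":
            continue
        # Gather <property type=... value=...> children into a dict.
        props = {
            p.get("type"): p.get("value")
            for p in ref
            if p.tag.rsplit("}", 1)[-1] == "property"
        }
        refs.append({
            "id": ref.get("id"),
            "name": props.get("entry name"),
            "hits": int(props["match status"]) if "match status" in props else None,
        })
    return refs


snippet = """<dbReference type="CDD" id="cd04278">
  <property type="entry name" value="ZnMc_MMP"/>
  <property type="match status" value="1"/>
</dbReference>"""
print(cdd_refs(snippet))  # [{'id': 'cd04278', 'name': 'ZnMc_MMP', 'hits': 1}]
```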

Change of the cross-references to VectorBase

We have modified our cross-references to the VectorBase database. We now use the VectorBase Transcript identifier as the primary resource identifier, while showing the VectorBase Protein and Gene identifiers in additional fields.

VectorBase is available at http://vectorbase.org.

The new format of the explicit links is:

Resource abbreviation VectorBase
Resource identifier Transcript identifier
Optional information 1 Protein identifier
Optional information 2 Gene identifier

Example: A7UVJ5

Show all entries having a cross-reference to VectorBase.

Text format

Example: A7UVJ5

Previous format:

DR   VectorBase; AGAP001789. Anopheles gambiae.

New format:

DR   VectorBase; AGAP001789-RA; AGAP001789-PA; AGAP001789.

XML format

Example: A7UVJ5

Previous format:

<dbReference type="VectorBase" id="AGAP001789">
  <property type="organism name" value="Anopheles gambiae"/>
</dbReference>

New format:

<dbReference type="VectorBase" id="AGAP001789-RA">
  <property type="protein sequence ID" value="AGAP001789-PA"/>
  <property type="gene ID" value="AGAP001789"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.

RDF format

Example: A7UVJ5

Previous format:

uniprot:A7UVJ5
  rdfs:seeAlso <http://purl.uniprot.org/vectorbase/AGAP001789> .
<http://purl.uniprot.org/vectorbase/AGAP001789>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/VectorBase> ;
  rdfs:comment "Anopheles gambiae" .

New format:

uniprot:A7UVJ5
  rdfs:seeAlso <http://purl.uniprot.org/vectorbase/AGAP001789-RA> .
<http://purl.uniprot.org/vectorbase/AGAP001789-RA>
  rdf:type up:Transcript_Resource ;
  up:database <http://purl.uniprot.org/database/VectorBase> ;
  up:translatedTo <http://purl.uniprot.org/vectorbase/AGAP001789-PA> ;
  up:transcribedFrom <http://purl.uniprot.org/vectorbase/AGAP001789> .
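Since the XSD is unchanged, code that silently assumed the old meaning of the identifier is where breakage is likely. The Python sketch below reads DR lines in either format; treating the old identifier as a gene identifier follows from the examples above (the old id reappears as the new gene ID), and the function name and dictionary keys are hypothetical.

```python
from __future__ import annotations


def parse_vectorbase(line: str) -> dict[str, str | None]:
    """Parse a VectorBase DR line in either the previous or the new format."""
    body = line[5:].rstrip()
    if body.endswith("."):
        body = body[:-1]  # drop the terminating period
    fields = [f.strip() for f in body.split(";")]
    if fields[0] != "VectorBase":
        raise ValueError("not a VectorBase cross-reference")
    if len(fields) == 4:
        # New format: transcript; protein; gene.
        return {"transcript": fields[1], "protein": fields[2],
                "gene": fields[3], "organism": None}
    # Previous format: 'AGAP001789. Anopheles gambiae' (identifier, then organism).
    gene, _, organism = fields[1].partition(". ")
    return {"transcript": None, "protein": None,
            "gene": gene, "organism": organism or None}


print(parse_vectorbase("DR   VectorBase; AGAP001789-RA; AGAP001789-PA; AGAP001789."))
print(parse_vectorbase("DR   VectorBase; AGAP001789. Anopheles gambiae."))
```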

Change of the cross-references to WormBase

Cross-references to WormBase may now be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Example: G5EG14

Changes to the controlled vocabulary of human diseases

New diseases:

UniProt website news

Peptide search tool

We have introduced a new tool called Peptide search that is available from a link in the header of the UniProt website. You can enter one or several peptide sequences (for example from a proteomics experiment) into the search field and the tool quickly finds all UniProtKB sequences that exactly match one of your query sequences. Searches can be restricted to a taxonomic subset of UniProtKB to decrease the search time. The tool returns a results page showing the matched UniProtKB entries in a design consistent with the UniProtKB text search results page, including filters on the left, results on the right and an option to customise the results table through the ‘Columns’ button.
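The core operation described here is exact substring matching of each query peptide against each sequence. A naive Python sketch of that idea follows; it is not how the UniProt tool is implemented, the `subset` parameter is a hypothetical stand-in for the taxonomic restriction, and the toy accessions and sequences are made up:

```python
from __future__ import annotations


def peptide_search(
    peptides: list[str],
    sequences: dict[str, str],
    subset: set[str] | None = None,
) -> dict[str, list[str]]:
    """Return, for each peptide, the accessions whose sequence contains it exactly."""
    # Optionally narrow the search space first, which also shortens the search.
    candidates = {
        acc: seq for acc, seq in sequences.items() if subset is None or acc in subset
    }
    results: dict[str, list[str]] = {}
    for peptide in peptides:
        query = peptide.strip().upper()
        # Exact substring match only: no mismatches, no gaps, I and L kept distinct.
        results[peptide] = [acc for acc, seq in candidates.items() if query in seq]
    return results


toy_db = {"ACC1": "MKTAYIAKQRQISFVK", "ACC2": "MSTNPKPQRKTKRNTN"}
print(peptide_search(["AKQR", "pkpqr", "WWW"], toy_db))
# {'AKQR': ['ACC1'], 'pkpqr': ['ACC2'], 'WWW': []}
```

A production service would index the sequences rather than scan them for every query; the linear scan only keeps the sketch short.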

Publications view added to UniProtKB entries

UniProt Knowledgebase (UniProtKB) protein entries now have a dedicated view of publications relevant for a protein. UniProtKB contains more than 350,000 unique publications, with over 210,000 of these fully curated in UniProtKB/Swiss-Prot and the remainder imported in UniProtKB/TrEMBL. This set is complemented by more than 640,000 additional publications that have been computationally mapped from other resources to UniProtKB entries. The publications annotated in UniProtKB were previously displayed in the main ‘Entry’ view and a link provided access to a separate page that listed the computationally mapped publications. We have now combined all publications into a new ‘Publications’ view that can be accessed from a link under the ‘Display’ heading on the left hand side of a UniProtKB page. In this view you can filter the publications list by source and categories that are based on the type of data a publication contains about the protein (such as function, interaction, sequence, etc.) or the number of proteins it describes (‘small scale’ vs ‘large scale’), see for example P10276.

UniProt release 2016_07

Published July 6, 2016

Headline

(Bacterial) immigration under control

Essentially all our mucosal surfaces are covered by microorganisms, not only bacteria, but also archaea, fungi, protozoans and viruses. Most of them reside within the gastrointestinal tract. Normal gut flora is largely responsible for the overall health of the host and does not trigger any inflammatory response... as long as it remains where it belongs. In order to maintain a subtle, though strict, segregation, the colonic epithelium is covered by mucus, which is organized in two layers. The inner layer adheres firmly to the epithelial cells. It is dense and does not allow bacterial penetration, thus keeping the epithelial cell surface free from bacteria. The outer layer is the habitat of the commensal flora. The inner mucus layer is converted into the outer layer by proteolytic activities provided by the host and probably also by commensal bacterial proteases and glycosidases.

Colonic quietness is not maintained by the physical mucus barrier alone; the immune system also plays a crucial role, notably through the secretion of IgA into the gut lumen. These dimeric immunoglobulins bind flagellin, a highly conserved protein component of the bacterial flagellum that is expressed by many different commensal species. This interaction limits the association of flagellated bacteria with the intestinal mucosa. The mechanism leading to IgA production by B cells in this context is not yet fully understood, but flagellin is known to be sensed by at least three different innate immune receptors, including TLR5, which plays an instrumental role in this process.

In this peaceful, though cautious, cohabitation, another host protein has recently been identified: LYPD8. In the absence of LYPD8, bacteria penetrate the inner mucus layer, despite normal production of mucin, the main building block of mucus, and reach the crypts of the large intestine, causing severe inflammation. LYPD8 is a membrane protein attached to the plasma membrane through a glycosylphosphatidylinositol (GPI) anchor. It is selectively expressed in epithelial cells at the uppermost layer of the large intestinal gland and can be released into the gut lumen by the action of specific phospholipases. Once in the extracellular milieu, it binds to flagellated bacteria, including Proteus mirabilis. Unlike TLR5, LYPD8 seems to bind specifically to flagella, a higher-order structure composed of polymerized flagellins, and not to monomeric flagellins. This binding severely impairs bacterial swarming activity, thereby regulating gut homeostasis.

Until these recent observations, nothing was known about LYPD8, which had only been identified through large-scale cDNA and genome sequencing. The sole annotations provided in UniProtKB were based on sequence predictions: the UPAR/Ly6 domain, the GPI anchor and the signal peptide. As of this release, LYPD8 entries have been updated with this new functional information and are publicly available.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Ciliary dyskinesia, primary, 31
  • Jensen syndrome
  • Mental retardation, autosomal dominant 12
  • Thiopurine S-methyltransferase deficiency

UniProt release 2016_06

Published June 8, 2016

Headline

Strength through unity

Reversible phosphorylation of proteins is a fundamental regulatory mechanism for many processes across a wide range of taxa. It has been extensively studied in the context of intracellular events in the nucleus and in the cytoplasm. Less is known about extracellular phosphorylation, but a family of secretory pathway kinases has been identified within the Golgi apparatus and in the extracellular milieu in recent years. Among them, FAM20C has been shown to phosphorylate many secreted proteins involved in biomineralization, including enamel matrix proteins, such as AMBN, AMELX, AMTN and ENAM. The importance of extracellular phosphorylation in bone physiology is further supported by the observation that mutations in FAM20C are associated with Raine syndrome, an autosomal recessive osteosclerotic bone dysplasia with a neonatal lethal outcome.

FAM20A, the closest paralog of FAM20C, exhibits all the characteristics of a kinase except one: a conserved glutamic acid residue is replaced by a glutamine, causing a loss of enzyme activity. This is not unique to FAM20A. About 10% of the proteins classified as protein kinases lack some of the key features required for activity; they are called “pseudokinases”. Despite the lack of FAM20A activity, mutations in FAM20A also cause a defect in biomineralization, namely amelogenesis imperfecta 1G.

This apparent paradox was solved by Cui et al. last year. They showed that in the absence of FAM20A, FAM20C activity dramatically drops. Moreover, FAM20A mutants associated with amelogenesis imperfecta 1G fail to activate FAM20C. The proteins have to form a complex for full FAM20C activity.

Kinases are synthesized as inactive proteins. Classically, their activation is achieved through the phosphorylation of a domain called the “activation loop” which induces a conformational change. FAM20C does not have an activation loop that could be phosphorylated. Yet another kind of activation, called “allosteric activation”, has already been reported for kinase-pseudokinase pairs. In this model, it is the pseudokinase binding that induces the shape change of the bona fide kinase into its active conformation. Although the exact mechanism of FAM20C activation is still unclear, experimental results suggest that it may join the growing list of kinases regulated by dimerization-induced allostery.

FAM20A and FAM20C are quite old proteins, evolutionarily related to kinases found in bacteria and slime molds. The fact that they do not use activation loop phosphorylation suggests that the allosteric mode of kinase activation may be very ancient, predating the evolution of the activation loop. The presence of many conserved pseudokinases in the genomes of higher organisms suggests that allosteric activation may still be an efficient regulatory mechanism.

As of this release, the FAM20A and FAM20C entries have been updated and are publicly available.

UniProtKB news

Removal of the cross-references to NextBio

Cross-references to NextBio have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Deleted diseases:

  • Epilepsy, progressive myoclonic 5

RDF news

Change of URIs for neXtProt

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these with the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the neXtProt database have therefore been updated from:

http://purl.uniprot.org/nextprot/<ID>

to:

http://nextprot.org/rdf/entry/<ID>

If you need the old URIs for backward compatibility, you can add them to your own copy of the data with the following SPARQL Update query:

PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/neXtProt> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/neXtProt> .
   BIND(iri(concat('http://purl.uniprot.org/nextprot/', substr(str(?new),31))) AS ?old)
}
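
The query rebuilds each old URI by keeping everything after the first 30 characters of the new one. As a quick sanity check of that offset, here is a minimal Python sketch of the same mapping; the function name and the identifier NX_P00533 are illustrative only and are not part of any UniProt or neXtProt API.

```python
NEW_PREFIX = "http://nextprot.org/rdf/entry/"
OLD_PREFIX = "http://purl.uniprot.org/nextprot/"


def old_nextprot_uri(new_uri: str) -> str:
    """Rebuild the legacy purl.uniprot.org URI from a neXtProt entry URI."""
    if not new_uri.startswith(NEW_PREFIX):
        raise ValueError(f"not a neXtProt entry URI: {new_uri!r}")
    # SPARQL substr() counts from 1, so substr(str(?new), 31) keeps everything
    # after the 30-character prefix; the Python equivalent is new_uri[30:].
    return OLD_PREFIX + new_uri[len(NEW_PREFIX):]


assert len(NEW_PREFIX) == 30
print(old_nextprot_uri("http://nextprot.org/rdf/entry/NX_P00533"))
# prints http://purl.uniprot.org/nextprot/NX_P00533
```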

The dereferencing of existing http://purl.uniprot.org/nextprot/<ID> URIs will be maintained.

UniProt release 2016_05

Published May 11, 2016

Headline

Slow/White and the 6 DWORFs

Striated muscle function relies on a cycle of contraction and relaxation. Upon electrical stimulation of the myocyte plasma membrane, Ca(2+) is released from the sarcoplasmic reticulum (SR) into the cytosol. The released calcium triggers the movement of the molecular motor myosin along actin filaments, and contraction occurs. Cytosolic Ca(2+) is then pumped back into the SR through the action of SERCA proteins, allowing actomyosin relaxation. The SERCA proteins are SR-resident transmembrane ATPases that couple the hydrolysis of ATP with Ca(2+) translocation.

Recent studies have highlighted a role for a network of (very) small ORFs (smORFs) in SERCA regulation. The first members of this exclusive but growing club were phospholamban (PLN, 52 amino acids) and sarcolipin (SLN, 31 amino acids), which were both isolated by classical biochemical approaches decades ago. Both bind SERCA and reduce the rate of calcium movement in heart and slow skeletal muscle fibers. More recently, the SERCA inhibitory micropeptide myoregulin (MRLN, 46 amino acids) was identified in fast muscle fibers by Anderson et al. These authors started by screening for skeletal muscle-specific RNAs and discovered MRLN in an apparent long non-coding RNA (lncRNA). Encouraged by this discovery, members of the Olson lab continued to look for smORFs in other muscle-specific lncRNAs and found DWORF (34 amino acids), encoded by 2 exons of a 795 bp-long transcript, which makes it very difficult to predict with current software. In mouse myocytes, DWORF expression stimulates Ca(2+) uptake into the SR, not by directly activating SERCA, but by relieving MRLN-, PLN- and SLN-mediated inhibition. DWORF expression may be particularly beneficial for recovery from periods of prolonged contraction.

SERCA regulation by micropeptides encoded in putative lncRNAs is not a vertebrate-specific phenomenon. In Drosophila melanogaster, a single muscle-specific transcript encodes 2 smORFs related to sarcolipin, sarcolamban A and B (SCLA, 28 amino acids, and SCLB, 29 amino acids). Computer simulations predicted that both peptides fit into the groove of SERCA, and this has been experimentally verified. Although mutant flies deficient in sarcolamban showed no behavioral or morphological muscle phenotype, they did exhibit significantly more arrhythmic cardiac contractions than wild-type flies.

The idea that smORFs may be overlooked in current genome annotation is not new, and these recent advances in muscle physiology underscore the likelihood that many transcripts annotated as noncoding RNAs may actually encode peptides with important biological functions. These smORFs could represent fast-evolving key regulators of larger molecular complexes. They also highlight the need for expert biocuration to make these data available in databases, as they cannot currently be predicted, retrieved or annotated automatically.

The 6 DWORFs have been curated and integrated into UniProtKB/Swiss-Prot, and we continue to survey the literature for other hidden micropeptide treasures (motivated solely by biological interest and not by our desire to find a seventh member for the purposes of this headline).

UniProtKB news

Cross-references to SIGNOR

Cross-references have been added to SIGNOR, the Signaling Network Open Resource, a resource that organizes and stores, in a structured format, signaling information published in the scientific literature. The core of this project is a large collection of manually-annotated causal relationships between proteins that participate in signal transduction.

SIGNOR is available at http://signor.uniroma2.it/.

The format of the explicit links is:

Resource abbreviation SIGNOR
Resource identifier UniProtKB accession number.

Example: P00533

Text format

Example: P00533

DR   SIGNOR; P00533; -.

XML format

Example: P00533

<dbReference type="SIGNOR" id="P00533"/>

RDF format

Example: P00533

uniprot:P00533
  rdfs:seeAlso <http://purl.uniprot.org/signor/P00533> .
<http://purl.uniprot.org/signor/P00533>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SIGNOR> .
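
In the simple RDF examples in these notes (SIGNOR, EPD, SwissPalm, CollecTF, GeneDB, iPTMnet), the cross-reference URI appears to be built from the lower-cased resource abbreviation followed by the identifier, while the database node keeps the original capitalization. The Python sketch below generates Turtle in that shape. The lower-casing rule is inferred from these examples rather than from a published specification, and it does not apply to resources that use their own URIs (such as neXtProt) or to transcript-level links (such as the new Gramene format); the helper names are hypothetical.

```python
PURL_BASE = "http://purl.uniprot.org"


def xref_uri(database: str, identifier: str) -> str:
    # Assumption: the path segment is the lower-cased resource abbreviation,
    # inferred from the examples in these release notes.
    return f"{PURL_BASE}/{database.lower()}/{identifier}"


def database_uri(database: str) -> str:
    # The database node keeps the abbreviation's original capitalization.
    return f"{PURL_BASE}/database/{database}"


def xref_turtle(accession: str, database: str, identifier: str) -> str:
    """Render a simple (not isoform-specific) cross-reference as Turtle."""
    uri = xref_uri(database, identifier)
    return (
        f"uniprot:{accession}\n"
        f"  rdfs:seeAlso <{uri}> .\n"
        f"<{uri}>\n"
        f"  rdf:type up:Resource ;\n"
        f"  up:database <{database_uri(database)}> ."
    )


print(xref_turtle("P00533", "SIGNOR", "P00533"))
```

For P00533 this reproduces the SIGNOR example above line for line.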

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt website news

Change of UniProt website job identifiers

To enable a more flexible and scalable infrastructure, we have extended the length of the UniProt website’s job identifiers.

Example:

M201604052M3YWGETHB

has become:

M2016040537D007A56D816107CE5B52C10342DB3700000452

We will continue to store job results for 7 days.

UniProt release 2016_04

Published April 13, 2016

Headline

Small changes, big effects

Our brain has the ability to reorganize itself by forming new neural connections throughout life. This plasticity allows neurons to adjust their activities in response to new situations and to changes in their environment, and to compensate for injury and disease. Plasticity is due not only to the creation and destruction of neuronal connections, but also to the activity-dependent modulation of synaptic strength, a process called ‘short-term synaptic plasticity’ (STP). There are 2 types of STP, with opposite effects, known as ‘depression’ and ‘facilitation’. When neurons receive excitatory input, they generate strong electrical impulses (called spikes) which cause a release of neurotransmitters at the synaptic connections with other neurons. The neurotransmitters stimulate receptors on the postsynaptic neuron and trigger downstream electrical impulses. Action potential activity depletes the neurotransmitters available at the axon terminal of the presynaptic neuron, causing ‘depression’. It also induces an influx of calcium into the axon terminal. The calcium accumulation increases neurotransmitter release by the next presynaptic spike, facilitating synaptic transmission and temporarily potentiating the synapse (‘facilitation’).

Facilitation is important for the proper function of mammalian brains. It may form the basis of short-term working memory. In the hippocampus, it has been proposed to play a role in the acquisition of spatial information. In the auditory pathway, it allows the maintenance of linear transmission of rate-coded sound intensity.

Although synaptic facilitation was observed more than 70 years ago, the underlying mechanism has not yet been fully elucidated. A major breakthrough was, however, published in Nature in January: Jackman et al. showed that synaptotagmin-7 (SYT7) is required for facilitation at most central synapses. SYT7 is a calcium- and phospholipid-binding protein involved in the exocytosis of many secretory and synaptic vesicles. In SYT7-knockout mice, facilitation was eliminated at all synapses except mossy fiber synapses, although calcium influx was not affected by the mutation.

To rule out an indirect effect of the SYT7 knockout, the authors tried to rescue facilitation through viral expression of SYT7 in hippocampal CA3 pyramidal cells. To do so, they used an adeno-associated virus that drove bicistronic expression of both channelrhodopsin-2 and SYT7. Channelrhodopsins are proteins from unicellular green algae that serve as sensory photoreceptors. In the experimental setting established by Jackman et al., they made it possible to control electrical excitability with light, specifically in the fibers expressing SYT7. The result was clear-cut: facilitation was restored. The identification of a protein required for synaptic facilitation may pave the way for future investigations into the functional role of this process.

As of this release, SYT7 proteins have been updated in UniProtKB/Swiss-Prot and are publicly available.

UniProtKB news

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Inclusion body myopathy 2

UniProt service news

New UniProt JAPI

We have developed a new version of the UniProt JAPI. The legacy UniProt JAPI will be retired as of Wednesday, April 13th 2016. If you have any questions or concerns, please feel free to contact us at help@uniprot.org.

UniProt RDF news

Change of the UniProt RDF files distribution

The UniProt RDF distribution has been available on the UniProt FTP site since 2008 with data split into one file per dataset. Over time the size of the largest files has grown to over 80 Gigabytes. These large files are difficult to download and they also limit the maximum rate at which the data can be loaded into many RDF stores. We have therefore split the files of the three biggest datasets into sets of smaller files:

  • The UniProtKB dataset is split based on taxonomy and whether entries are active or not. The resulting files contain at most 1 million active or 10 million obsolete entries.
  • The UniRef dataset is split into files that contain at most 1 million entries.
  • The UniParc dataset is split into files of approximately 1 Gigabyte in size.

We also reduced the data redundancy between the datasets to further decrease the total data volume:

  • The UniProtKB dataset has always been fully normalized with respect to the taxonomy dataset and it is now also normalized with respect to the keywords, GO and citations datasets. The total number of unique triples across these datasets remains the same, but it means that if you have so far only loaded the UniProtKB and taxonomy RDF files into your RDF store, you must now also load the keywords.rdf.xz, go.owl.xz and citations.rdf.xz files in order to have the same data.
  • The UniRef dataset has been normalized with respect to the UniProtKB and UniParc datasets. It now only describes the UniRef cluster memberships. The sequence and entry information of UniProtKB and UniParc member entries is no longer repeated in the UniRef RDF files.
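
If your loading script previously fetched only the UniProtKB and taxonomy RDF files, one way to catch the new dependency is to check for the companion files before loading. A minimal Python sketch, assuming the files were downloaded into a single local directory; the directory name and the helper function are hypothetical, and only the three file names come from this note.

```python
from pathlib import Path
from typing import List

# File names quoted in the release note; the local directory layout is an assumption.
COMPANION_FILES = ("keywords.rdf.xz", "go.owl.xz", "citations.rdf.xz")


def missing_companion_files(download_dir: str) -> List[str]:
    """Return the companion files that are absent from download_dir."""
    directory = Path(download_dir)
    return [name for name in COMPANION_FILES if not (directory / name).is_file()]


missing = missing_companion_files("uniprot_rdf")  # hypothetical local directory
if missing:
    print("Load these before querying keyword, GO or citation data:", ", ".join(missing))
```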

UniProt release 2016_03

Published March 16, 2016

Headline

From the Zika forest to the Amazon, news from a viral wanderer

In 2015, a large outbreak in Brazil put the Zika virus in the spotlight. Most people who become infected with Zika virus do not become sick and for those who do, the illness is generally mild. However, in some cases, complications can be quite severe. In addition, microcephaly has been reported in some babies born to mothers infected with Zika virus during pregnancy, pointing to the virus as an emerging human pathogen.

Although the Zika virus owes its worldwide infamy to its wandering to the Western hemisphere, it had been circulating in Africa long before. It was first isolated in 1947 from rhesus monkeys living in the Zika Forest in Uganda (after which it was named), and subsequently in humans in 1952. It is an RNA virus of the flavivirus genus, which also includes dengue, yellow fever and West Nile viruses. Like its relatives, it is transmitted by Aedes mosquitoes, originally in endemic regions of central Africa. Taking advantage of modern means of transportation, it started spreading, first to Micronesia in 2007, then to French Polynesia in 2013, and to Brazil and Central America in 2014.

As it has long been considered insignificant, the Zika virus has not been extensively studied and most of our current knowledge has been inferred from other viruses of the same genus. Zika virus entry into target cells can be triggered by binding to AXL and TYRO3. Interestingly, these proteins are also involved in Ebola virus and Lassa virus entry into human cells. Attachment to the host receptors is followed by internalization through a process called ‘apoptotic mimicry’, whereby the virus manages to be recognized by the target cell as an apoptotic body. After fusion of the virus membrane with the host endosomal membrane, the RNA genome is released into the cytoplasm. Flaviviruses are remarkable in that their genome encodes a single polyprotein that inserts into the endoplasmic reticulum (ER) membrane, forming a complex pattern. This polyprotein is subsequently cleaved into 13 molecules by viral and host peptidases. The non-structural proteins form membrane spherules, presumably to protect the double-stranded RNA intermediate of viral replication. The genomic viral RNA is replicated and translated, leading to the assembly of new Zika virions in the ER. The virions bud by hijacking the host endosomal sorting complex required for transport (ESCRT) system. They are transported to the Golgi apparatus, where further maturation occurs. Eventually, fusion-competent virions are released by exocytosis.

As of this release, a Zika virus reference proteome has been manually curated in UniProtKB, where it can be safely visited.

A page dedicated to Zika has also been created in ViralZone; it offers a global view of how this particular virus functions and provides access to other databases.

Cross-references to EPD

Cross-references have been added to EPD, the Encyclopedia of Proteome Dynamics, a resource that contains data from multiple, large-scale proteomics experiments aimed at characterising proteome dynamics in both human cells and model organisms.

EPD is available at https://www.peptracker.com/epd/analytics/.

The format of the explicit links is:

Resource abbreviation EPD
Resource identifier UniProtKB accession number.

Example: P00451

Text format

Example: P00451

DR   EPD; P00451; -.

XML format

Example: P00451

<dbReference type="EPD" id="P00451"/>

RDF format

Example: P00451

uniprot:P00451
  rdfs:seeAlso <http://purl.uniprot.org/epd/P00451> .
<http://purl.uniprot.org/epd/P00451>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/EPD> .

Cross-references to TopDownProteomics

Cross-references have been added to TopDownProteomics, a resource from the Consortium for Top Down Proteomics that hosts top down proteomics data presenting validated proteoforms to the scientific community.

TopDownProteomics is available at http://repository.topdownproteomics.org/.

The format of the explicit links is:

Resource abbreviation TopDownProteomics
Resource identifier UniProtKB accession number.

Example: P10599

Cross-references to TopDownProteomics may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Text format

Example: P10599

DR   TopDownProteomics; P10599-1; -. [P10599-1]
DR   TopDownProteomics; P10599-2; -. [P10599-2]
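
The DR lines in these notes share one layout: "DR" plus three spaces, semicolon-separated fields, a closing full stop and, for isoform-specific cross-references, the isoform in square brackets. A minimal Python sketch of a parser for that layout, inferred from the examples here rather than from the full flat-file specification; the `DbXref` type and `parse_dr_line` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# "DR" plus exactly three spaces, then the semicolon-separated fields, a
# closing full stop and, for isoform-specific cross-references, "[isoform]".
_DR_LINE = re.compile(r"^DR   (?P<body>\S.*?)\.(?:\s+\[(?P<isoform>[^\]]+)\])?$")


class DbXref(NamedTuple):
    database: str
    identifier: str
    optional: Tuple[str, ...]  # "-" marks an empty optional field
    isoform: Optional[str]


def parse_dr_line(line: str) -> DbXref:
    match = _DR_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a DR line: {line!r}")
    fields = [field.strip() for field in match.group("body").split(";")]
    if len(fields) < 2:
        raise ValueError(f"expected at least a database and an identifier: {line!r}")
    database, identifier, *optional = fields
    return DbXref(database, identifier, tuple(optional), match.group("isoform"))


for text in ("DR   SIGNOR; P00533; -.",
             "DR   TopDownProteomics; P10599-2; -. [P10599-2]"):
    print(parse_dr_line(text))
```

The lazy match on the body means identifiers that contain dots (for example H25N7.01:pep) are still split at the final full stop only.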

XML format

Example: P10599

<dbReference type="TopDownProteomics" id="P10599-1">
  <molecule id="P10599-1"/>
</dbReference>
<dbReference type="TopDownProteomics" id="P10599-2">
  <molecule id="P10599-2"/>
</dbReference>
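
In the XML format, the isoform of an isoform-specific cross-reference sits in the nested molecule element. A minimal sketch with Python's standard `xml.etree.ElementTree` that groups one database's cross-references by isoform; the wrapping entry element and the function name are illustrative, and full UniProt XML files declare a default namespace (assumed here to be http://uniprot.org/uniprot), which is why the function takes an optional namespace argument.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical wrapper element around the two dbReference elements shown above.
SNIPPET = """<entry>
  <dbReference type="TopDownProteomics" id="P10599-1">
    <molecule id="P10599-1"/>
  </dbReference>
  <dbReference type="TopDownProteomics" id="P10599-2">
    <molecule id="P10599-2"/>
  </dbReference>
</entry>"""


def xrefs_by_isoform(root: ET.Element, db_type: str, ns: str = "") -> Dict[Optional[str], List[str]]:
    """Group cross-reference ids of one database by the isoform in <molecule id="...">.

    Cross-references without a <molecule> child are grouped under None.
    """
    prefix = f"{{{ns}}}" if ns else ""
    grouped: Dict[Optional[str], List[str]] = {}
    for ref in root.iter(f"{prefix}dbReference"):
        if ref.get("type") != db_type:
            continue
        molecule = ref.find(f"{prefix}molecule")
        isoform = molecule.get("id") if molecule is not None else None
        grouped.setdefault(isoform, []).append(ref.get("id"))
    return grouped


root = ET.fromstring(SNIPPET)
print(xrefs_by_isoform(root, "TopDownProteomics"))
# {'P10599-1': ['P10599-1'], 'P10599-2': ['P10599-2']}
```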

RDF format

Example: P10599

uniprot:P10599
  rdfs:seeAlso <http://purl.uniprot.org/topdownproteomics/P10599-1> ,
    <http://purl.uniprot.org/topdownproteomics/P10599-2> .

<http://purl.uniprot.org/topdownproteomics/P10599-1>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/TopDownProteomics> .
<#_5030303735300040>
  rdf:type rdf:Statement ;
  rdf:subject <P10599> ;
  rdf:predicate rdfs:seeAlso ;
  rdf:object <http://purl.uniprot.org/topdownproteomics/P10599-1> ;
  up:sequence isoform:P10599-1 .
<http://purl.uniprot.org/topdownproteomics/P10599-2>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/TopDownProteomics> .
<#_5030303735300040>
  rdf:type rdf:Statement ;
  rdf:subject <P10599> ;
  rdf:predicate rdfs:seeAlso ;
  rdf:object <http://purl.uniprot.org/topdownproteomics/P10599-2> ;
  up:sequence isoform:P10599-2 .

Changes to the controlled vocabulary of human diseases

New diseases:

UniProt release 2016_02

Published February 17, 2016

Headline

Another one (antibiotic) bites the dust

Polymyxin E (also known as colistin) and other polymyxin antibiotics are among our last-resort drugs against multi-drug resistant Gram-negative bacteria such as Klebsiella pneumoniae, Pseudomonas aeruginosa and Acinetobacter.

The initial target of polymyxin antibiotics is the lipopolysaccharide layer (LPS) of the Gram-negative bacterial outer membrane. LPS has two 2-keto-3-deoxyoctonoic acid units bound to lipid A, which itself consists of 2 glucosamine units with attached fatty acyl chains and a phosphate group on each sugar. Lipid A acts as a hydrophobic anchor, in which the tight packing of the fatty acyl chains helps to stabilize the overall outer membrane structure. The positively charged L-2,4-diaminobutyric acid residues of polymyxins interact with the negatively charged phosphate groups on lipid A. The amphipathic antibiotics are thought to form pores that permeabilize the outer membrane. The polymyxins would then insert into and disrupt the inner membrane, leading to further pore formation. There is also some evidence that polymyxins have other intracellular targets.

As the initial contact of polymyxin antibiotics is with lipid A, resistance often arises through modification of lipid A, frequently masking its negative charge. Before August 2015, a number of chromosomal resistance loci were known, but no resistance had been identified on a more easily transferred plasmid. During routine surveillance of commensal Escherichia coli for antibiotic resistance, scientists in China identified mcr1, a plasmid-borne gene encoding a protein of the phosphoethanolamine transferase family. The gene confers both colistin and polymyxin B resistance by modifying lipid A, and probably originated in Paenibacillus. This would seem logical, as Paenibacillus is the natural source of polymyxin antibiotics.

The gene was first identified at a pig farm in Shanghai in July 2013. Retrospective screening of E. coli isolates in China showed an alarming rise in its prevalence in pork, from 6% in 2011 to 22% in 2014. The gene has also been detected in chicken meat in China, rising from 5% in 2011 to 28% in 2014. Screening of hospital inpatients in 2014 found mcr1-containing plasmids in both E. coli (1.4% of isolates) and K. pneumoniae (0.7% of isolates). The gene was also detected in E. coli genomes from Malaysia. An in vivo test in mice showed that the gene was indeed able to confer colistin resistance. The original plasmid can be transferred to other E. coli cells by conjugation, but to K. pneumoniae or P. aeruginosa only by transformation; it is stable in the absence of selective pressure.

Since the paper identifying mcr1 was published online on November 15, 2015, numerous papers have appeared reporting retrospective screening for the gene. So far, its earliest isolation is from a French calf in 2005, in which a worrying co-localization with an extended-spectrum beta-lactamase resistance gene was also reported. The gene has been found in human fecal samples dating from 2012 onwards, in Europe, Africa, South America and Asia. It was found in E. coli isolated from pigs in Germany in 2010, from Belgian calves in 2011-2012, in European food samples from June 2011 onwards, and in animal feces in Asia. The gene is not always isolated from the same plasmid background, and mcr1 is often associated with mobile genetic elements, probably aiding its dispersal.

In short, the gene has been slowly spreading around the world since before we were even aware of its existence. Colistin has been used in agriculture since the 1950s and is widely used in China, which is probably contributing to its steady dissemination. There are increasingly urgent calls for its agricultural use to be reevaluated before resistance spreads even further.

As of this release, Mcr-1 has been annotated and is available in UniProtKB/Swiss-Prot.

Cross-references to SwissPalm

Cross-references have been added to SwissPalm, a manually curated resource to study protein S-palmitoylation. It encompasses S-palmitoylated protein hits from more than 50 species and provides curated information and filters that increase the confidence in true positive hits. SwissPalm integrates predictions of S-palmitoylated cysteine scores, orthologs and isoform multiple alignments.

SwissPalm is available at http://swisspalm.epfl.ch/.

The format of the explicit links is:

Resource abbreviation SwissPalm
Resource identifier UniProtKB accession number.

Example: Q13530

Text format

Example: Q13530

DR   SwissPalm; Q13530; -.

XML format

Example: Q13530

<dbReference type="SwissPalm" id="Q13530"/>

RDF format

Example: Q13530

uniprot:Q13530
  rdfs:seeAlso <http://purl.uniprot.org/swisspalm/Q13530> .
<http://purl.uniprot.org/swisspalm/Q13530>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SwissPalm> .

Change of the cross-references to Gramene

We have modified our cross-references to the Gramene database.

The new format of the explicit links is:

Resource abbreviation Gramene
Resource identifier Transcript identifier
Optional information 1 Protein identifier
Optional information 2 Gene identifier

Cross-references to Gramene may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

The Gramene database has also been moved from the category “Organism-specific databases” to the category “Genome annotation databases”.

Example: Q10DK7

Text format

Example: Q10DK7

Previous format:

DR   Gramene; Q10DK7; -.

New format:

DR   Gramene; OS03T0727600-01; OS03T0727600-01; OS03G0727600.
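
Code that read the previous Gramene DR line (a single accession) will need to map the three fields of the new format onto transcript, protein and gene identifiers. A minimal Python sketch under that reading of the format; the `GrameneXref` type and `parse_gramene_dr` function are hypothetical, and the trailing isoform bracket is handled on the assumption that isoform-specific Gramene lines follow the general format described in release 2014_03.

```python
import re
from typing import NamedTuple, Optional

_GRAMENE_DR = re.compile(
    r"^DR   Gramene; (?P<body>\S.*?)\.(?:\s+\[(?P<isoform>[^\]]+)\])?$"
)


class GrameneXref(NamedTuple):
    transcript_id: str  # resource identifier
    protein_id: str     # optional information 1
    gene_id: str        # optional information 2
    isoform: Optional[str] = None


def parse_gramene_dr(line: str) -> GrameneXref:
    match = _GRAMENE_DR.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a Gramene DR line: {line!r}")
    fields = [field.strip() for field in match.group("body").split(";")]
    if len(fields) != 3:
        # Lines in the previous format (a single accession) end up here.
        raise ValueError(f"expected transcript; protein; gene: {line!r}")
    transcript_id, protein_id, gene_id = fields
    return GrameneXref(transcript_id, protein_id, gene_id, match.group("isoform"))


print(parse_gramene_dr("DR   Gramene; OS03T0727600-01; OS03T0727600-01; OS03G0727600."))
```

Rejecting the old single-accession layout explicitly makes it easier to spot cached flat files that predate the change.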

XML format

Example: Q10DK7

Previous format:

<dbReference type="Gramene" id="Q10DK7"/>

New format:

<dbReference type="Gramene" id="OS03T0727600-01">
  <property type="protein sequence ID" value="OS03T0727600-01"/>
  <property type="gene ID" value="OS03G0727600"/>
</dbReference>

This change does not affect the XSD, but may nevertheless require code changes.

RDF format

Example: Q10DK7

Previous format:

uniprot:Q10DK7
  rdfs:seeAlso <http://purl.uniprot.org/gramene/Q10DK7> .
<http://purl.uniprot.org/gramene/Q10DK7>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/Gramene> .

New format:

uniprot:Q10DK7
  rdfs:seeAlso <http://purl.uniprot.org/gramene/OS03T0727600-01> .
<http://purl.uniprot.org/gramene/OS03T0727600-01>
  rdf:type up:Transcript_Resource ;
  up:database <http://purl.uniprot.org/database/Gramene> ;
  up:translatedTo <http://purl.uniprot.org/gramene/OS03T0727600-01> ;
  up:transcribedFrom <http://purl.uniprot.org/gramene/OS03G0727600> .

Removal of the cross-references to GeneFarm

Cross-references to GeneFarm have been removed.

Removal of the cross-references to GenoList

Cross-references to GenoList have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Periventricular nodular heterotopia 4
  • Transposition of the great arteries dextro-looped 2

UniProt website news

UniProt feature viewer added to UniProtKB entries

UniProt provides sequence annotations, also known as protein features, to describe regions or sites of biological interest. Secondary structure regions, domains, post-translational modifications and binding sites, among others, play a critical role in understanding what a protein does. With the growth in biological data, integration and visualization become increasingly important for exposing aspects of the data that might otherwise be hidden, unclear or difficult to grasp.

Hence we are introducing the UniProt feature viewer, a BioJS component bringing together protein sequence features in one compact view. Similar to genome viewers, the viewer uses tracks to display different protein features, providing an intuitive picture of co-localized elements. Each track can be expanded to reveal a more in-depth view of the underlying data. The variant track offers a novel visualization and presents UniProt-curated natural variants along with imported variants from large-scale studies (such as 1000 Genomes and COSMIC).

The UniProt feature viewer is available for every UniProtKB protein entry through the ‘Feature viewer’ link under the ‘Display’ heading on the left hand side.

If you would like to include the feature viewer in your own website or resource, you can find instructions in our technical documentation.

UniProt release 2016_01

Published January 20, 2016

Headline

cGAMP, a welcome stowaway

We are often amazed by the strategies deployed by viruses to trick our defences, but our immune system does not lag behind and can also fool viral invaders. The innate immune system detects viruses in part by sensing intracellular DNA through pattern recognition receptors, including cyclic GMP-AMP synthase (cGAS, also called MB21D1). In response to cytosolic DNA, this enzyme synthesizes 2′3′-cyclic GMP-AMP (cGAMP), which then binds to STING (also called TMEM173), an endoplasmic reticulum transmembrane protein, leading to the activation of the type I interferon (IFN) response and thereby inducing an antiviral state.

Last year, Gentili et al. made a puzzling observation. To study cGAS function, they transduced human monocyte-derived dendritic cells with a cGAS-expressing lentivirus. As expected, the cells were strongly activated, but the stimulatory property of the cGAS-encoding lentivirus did not correlate with transduction efficiency. This led to the hypothesis that it was not cGAS itself that was responsible for the activation of the infected cells, but some other stimulatory signal transferred by the viral vector. Indeed, when dendritic cells were challenged with virus-like particles (VLPs) that did not themselves encode cGAS, but were produced in the presence of cGAS, the cells were stimulated. This effect was abolished when VLPs were produced in the presence of a catalytically inactive cGAS mutant. Concomitantly, Bridgeman et al. found that incubating macrophages, epithelial cells or lung fibroblasts with lentiviral particles collected from cells overexpressing cGAS led to the STING-dependent up-regulation of type I interferons and interferon-stimulated genes. All this evidence pointed to cGAMP as the stimulatory signal, and indeed both groups identified the dinucleotide in viral particles by mass spectrometry, not only in their experimental systems but also in more physiological settings, using a herpesvirus (MCMV) and a poxvirus (Modified Vaccinia Ankara virus). It is not yet clear whether the incorporation of cGAMP into virus particles is a selective host-directed process or simply a consequence of random fluid-phase uptake of cytosolic material into viral particles.

cGAMP has previously been shown to diffuse through gap junctions, thereby alerting non-infected neighboring cells to pathogen threat. The discovery by Gentili et al. and Bridgeman et al. suggests that cells located far from the initial infection site may also benefit from cGAMP transfer and initiate rapid antiviral responses bypassing the need for cGAS activation.

Although the downstream fate of the dinucleotide does not directly depend on cGAS enzyme activity, this piece of information has been introduced into cGAS entries as of this release.

Cross-references to CollecTF

Cross-references have been added to CollecTF, a database of bacterial transcription factor binding sites (TFBS). CollecTF stores data on experimentally validated TFBS and places special emphasis on providing a transparent curation process that captures the experimental support for sites as reported by authors in peer-reviewed publications.

CollecTF is available at http://www.collectf.org.

The format of the explicit links is:

Resource abbreviation CollecTF
Resource identifier CollecTF identifier

Example: A0KST7

Text format

Example: A0KST7

DR   CollecTF; EXPREG_00000150; -.

XML format

Example: A0KST7

<dbReference type="CollecTF" id="EXPREG_00000150"/>

RDF format

Example: A0KST7

uniprot:A0KST7
  rdfs:seeAlso <http://purl.uniprot.org/collectf/EXPREG_00000150> .
<http://purl.uniprot.org/collectf/EXPREG_00000150>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/CollecTF> .

Cross-references to GeneDB

Cross-references have been added to GeneDB, the pathogen genome database from the Sanger Institute. GeneDB provides access to the latest sequence data and annotation/curation for the whole range of organisms sequenced by the Sanger Pathogen group.

GeneDB is available at http://www.genedb.org.

The format of the explicit links is:

Resource abbreviation GeneDB
Resource identifier GeneDB identifier

Example: Q8WPT5

Text format

Example: Q8WPT5

DR   GeneDB; H25N7.01:pep; -.

XML format

Example: Q8WPT5

<dbReference type="GeneDB" id="H25N7.01:pep"/>

RDF format

Example: Q8WPT5

uniprot:Q8WPT5
  rdfs:seeAlso <http://purl.uniprot.org/genedb/H25N7.01:pep> .
<http://purl.uniprot.org/genedb/H25N7.01:pep>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/GeneDB> .

Cross-references to iPTMnet

Cross-references have been added to iPTMnet, an integrated resource for post-translational modifications (PTMs) in a systems biology context. iPTMnet connects multiple disparate bioinformatics tools and systems (text mining, data mining, analysis and visualization tools) as well as databases and ontologies into an integrated resource, to address the knowledge gaps in exploring and discovering PTM networks. The iPTMnet database currently contains phosphorylation information.

iPTMnet is available at http://pir.georgetown.edu/iPTMnet.

The format of the explicit links is:

Resource abbreviation iPTMnet
Resource identifier UniProtKB accession number.

Example: Q15796

Text format

Example: Q15796

DR   iPTMnet; Q15796; -.

XML format

Example: Q15796

<dbReference type="iPTMnet" id="Q15796"/>

RDF format

Example: Q15796

uniprot:Q15796
  rdfs:seeAlso <http://purl.uniprot.org/iptmnet/Q15796> .
<http://purl.uniprot.org/iptmnet/Q15796>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/iPTMnet> .

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • ADP-ribosyl aspartic acid

UniProt release 2015_12

Published December 9, 2015

Headline

Host proteins SERINC3 and SERINC5 decrease HIV-1 infectivity

It has long been known that the HIV-1 nef (“negative regulatory factor”) protein increases the infectivity of the HIV-1 virion (PMID:7981973). This mysterious protein is only found in primate lentiviruses. Its function is to manipulate the host’s cellular machinery and thus to allow infection, survival or replication of the virus. Abundant research on this topic has uncovered many phenotypes associated with nef, mainly involving the downregulation of host proteins from the cell membrane. However, none of these functions has clearly explained the virion infectivity phenotype, although they have revealed how HIV-1 evades the host’s immune response.

Two recent papers in Nature have shown that nef actually prevents the incorporation of the host proteins SERINC3 and SERINC5 into the HIV-1 virion. When present in the virion membrane, these proteins dramatically decrease its infectivity. These studies improve our understanding of nef function in virion infectivity. The means by which nef achieves this are still unknown, but are related to its capacity to prevent specific host proteins from reaching the plasma membrane. The functions of human SERINC3 and SERINC5 are still not well understood, but further study of these proteins should shed light on their antiviral action.

As of this release, HIV-1 nef and human proteins SERINC3 and SERINC5 have been updated and are publicly available.

UniProtKB news

Displaying human UniProtKB sequence annotations in genome browser tracks

Genome browser tracks allow users to align sequence annotations to reference genome data and genome annotations. Both the UCSC and Ensembl genome browsers support custom tracks for displaying external annotations. UniProt would like to announce the beta release of new genome tracks that align the protein sequence annotations in our resource to a reference genome. These UniProt genome tracks include the genomic locations of protein sequences and of annotations such as active sites, metal binding sites, post-translational modifications, variants and domains, with supporting literature evidence where available. For each species covered by the genome annotation tracks resource, protein sequences and annotations are provided in BED and bigBed formats.

The beta release is available in the new dedicated ‘genome_annotation_tracks’ directory on the UniProt FTP site. It currently provides tracks for human; tracks for additional species will be released in the future. UniProt would welcome your feedback on this new resource.
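The notes do not give the column layout of the UniProt track files, so the sketch below reads only the six standard BED columns (chrom, chromStart, chromEnd, name, score, strand) and ignores the rest; the sample line in the usage note is invented. BED coordinates are 0-based and half-open, which is the most common source of off-by-one errors when comparing with 1-based positions. bigBed files are binary and would first need converting, for example with UCSC's bigBedToBed utility.

```python
from dataclasses import dataclass

@dataclass
class BedRecord:
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # 0-based, exclusive (BED convention)
    name: str
    strand: str

def parse_bed_line(line: str) -> BedRecord:
    """Parse the first six standard BED columns; any further columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5])

def to_one_based(rec: BedRecord) -> tuple[int, int]:
    """Convert to 1-based, fully closed coordinates (as genome browsers usually display them)."""
    return rec.start + 1, rec.end
```

For an invented line "chr1\t1000\t1090\tHYPOTHETICAL_SITE\t0\t+", to_one_based returns (1001, 1090): the same 90 bases expressed in 1-based, closed coordinates.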

Cross-references to SwissLipids

Cross-references have been added to SwissLipids, a comprehensive reference database that links mass spectrometry-based lipid identifications to curated knowledge of lipid structures, metabolic reactions, enzymes and interacting proteins.

SwissLipids is available at http://www.swisslipids.org.

The format of the explicit links is:

Resource abbreviation SwissLipids
Resource identifier SwissLipids identifier

Cross-references to SwissLipids may be isoform-specific (e.g. Q08477). The general format of isoform-specific cross-references was described in release 2014_03.

Example: P52824

Text format

Example: P52824

DR   SwissLipids; SLP:000000740; -.

XML format

Example: P52824

<dbReference type="SwissLipids" id="SLP:000000740"/>

RDF format

Example: P52824

uniprot:P52824
  rdfs:seeAlso <http://purl.uniprot.org/swisslipids/SLP:000000740> .
<http://purl.uniprot.org/swisslipids/SLP:000000740>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/SwissLipids> .

Cross-references to MalaCards

Cross-references have been added to MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and richness of the popular GeneCards database of human genes.

The MalaCards disease and disorders database is organized into “disease cards”, each integrating prioritized information, and listing numerous known aliases for each disease, along with a variety of annotations, as well as inter-disease connections.

MalaCards is available at http://www.malacards.org.

The format of the explicit links is:

Resource abbreviation MalaCards
Resource identifier Gene symbol

Example: P26439

Text format

Example: P26439

DR   MalaCards; HSD3B2; -.

XML format

Example: P26439

<dbReference type="MalaCards" id="HSD3B2"/>

RDF format

Example: P26439

uniprot:P26439
  rdfs:seeAlso <http://purl.uniprot.org/malacards/HSD3B2> .
<http://purl.uniprot.org/malacards/HSD3B2>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/MalaCards> .

Change of UniProtKB annotation cardinality constraints

Each UniProtKB entry may contain a variable number of different annotation topics. Most topics can be present more than once in a given entry (e.g. when a precursor protein is cleaved into chains/peptides with different functions, each one is described in a separate Function annotation). But some topics had been limited to occur no more than once per entry. We have lifted this restriction to allow for more flexibility and granularity in our annotations.
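Since any topic may now occur more than once per entry, code that stores comments as one value per topic can silently lose data. A minimal sketch that keeps every occurrence, assuming the usual flat-file layout in which a block starts with 'CC   -!- TOPIC: text', continues with 'CC       text', and the copyright block is introduced by a line of dashes (the function name is illustrative):

```python
from collections import defaultdict

def group_cc_topics(cc_lines: list[str]) -> dict[str, list[str]]:
    """Group flat-file CC comment blocks by topic, keeping every occurrence."""
    topics: dict[str, list[str]] = defaultdict(list)
    current_topic = None
    for line in cc_lines:
        body = line[5:]                          # drop the 'CC   ' prefix
        if body.startswith("-!- "):
            topic, _, text = body[4:].partition(": ")
            topics[topic].append(text.strip())   # a new block, never an overwrite
            current_topic = topic
        elif body.startswith("---"):
            current_topic = None                 # copyright block: ignore what follows
        elif current_topic is not None:
            # Continuation line: extend the latest block of the current topic.
            topics[current_topic][-1] += " " + body.strip()
    return dict(topics)
```

The key design choice is a list per topic: two Function annotations (for example for two chains of a cleaved precursor) end up as two separate list items rather than one overwriting the other.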

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Fanconi anemia complementation group M
  • Paget disease of bone
  • Spinocerebellar ataxia, autosomal recessive, 5

Changes to the controlled vocabulary for PTMs

New term for the feature key ‘Modified residue’ (‘MOD_RES’ in the flat file):

  • PolyADP-ribosyl aspartic acid

New terms for the feature key ‘Cross-link’ (‘CROSSLNK’ in the flat file):

  • 2-(S-cysteinyl)-methionine (Cys-Met)
  • Cyclopeptide (Cys-Ile)

UniProt service news

Retirement of UniProt BioMart

Based on user surveys and service evaluations, we decided to retire our UniProt BioMart service. For those who relied on the UniProt BioMart for tasks such as ID mapping, bulk retrieval of entries, or programmatic access to entry annotations, we have alternative services that will satisfy your needs. Please visit our YouTube channel and help pages for tutorials and more information about these services:

Please contact us if you have questions about this change.

Retirement of UniProt Distributed Annotation System (DAS)

The Distributed Annotation System (DAS) defines a communication protocol used to exchange annotations on genomic or protein sequences. It was first released in 2001, and UniProt started providing its data via the DAS protocol in July 2004. DAS has fulfilled a valuable role in integrating distributed and varied data, particularly for display in genome browsers and other applications that feature data visualisation. Unfortunately, the level of DAS usage in 2015 no longer justifies its support and maintenance, and we have therefore retired the UniProt DAS server.

Documentation on programmatic access to UniProt data can be found on the UniProt website.

Please contact us if you have questions about this change.

UniProt release 2015_11

Published November 11, 2015

Headline

The sense of a motion

No need to be a great scientist to understand that when a hawk is circling in the sky looking for food, small rodents should run and hide. This requires not merely the recognition of a static image or of a global movement but, most importantly, the ability to sense an asynchrony between a moving object (the hawk) and its background (the slow-moving clouds above it).

In vertebrates, visual motion sensing takes place in the retina and more specifically in a subset of retinal ganglion cells (RGCs). RGCs are located near the inner surface of the retina, where they receive visual information from photoreceptors via intermediate neurons, bipolar cells and amacrine cells. They extract salient features and send them deeper into the brain for further processing. The final picture is produced by the integration of many signals, each carried by a distinct population of RGCs. It is currently estimated that approximately 70 types of interneurons form specific synapses on roughly 30 types of RGCs. The discovery of the function of each RGC type and of their connections with specific interneurons is like trying to find the proverbial needle in a haystack.

Three years ago, Zhang et al. tackled this issue using a transgenic mouse line called TYW3. In these mice, strong regulatory elements from the Thy1 gene drive the expression of yellow fluorescent protein (YFP). In the retina, YFP fluorescence was detected in only a small subset of RGCs. The brightest cell population (W3-RGCs) was chosen for further characterization. Interestingly, these cells remained silent under most common visual inputs, including videos of locomotion through a natural environment recorded by a camera mounted on the head of a freely moving rat. The only stimulus that elicited reliable responses from W3-RGCs was the movement of small spots differing from that of the background; no response was elicited when the two movements coincided.

The canonical pathway for delivering visual input to RGCs involves direct connections between bipolar cells and RGCs. In other words, RGCs are typically two synapses away from a photoreceptor, which ensures the fastest transmission of the signal. Surprisingly, W3-RGCs receive strong and selective input from an unusual type of excitatory amacrine interneuron, called VG3-ACs. With VG3-ACs inserted into the circuit, W3-RGCs appear to be three synapses away from a photoreceptor, which slows the delivery of visual information to these cells. A possible explanation is that W3-RGCs compare motion in the center and surround of the receptive field, firing only when the two are asynchronous. For the comparison to be temporally precise, input from the surround must arrive at the cell rapidly and/or input from the center must be delayed.

The crucial connection between W3-RGCs and VG3-ACs is ensured by homophilic interactions between Sdk2 proteins expressed at the cell surface of both cell types. Sdk2 is a cell adhesion protein whose expression is detected in the embryonic retina shortly before birth and persists into adulthood, spanning the periods of lamina formation and synaptogenesis. Sdk2 knockout caused no alterations in retinal structure, but the strength of synaptic connections between VG3-ACs and W3-RGCs dropped about 20-fold.

For your eyes only, the Sdk2 entries have been updated and are publicly available as of this release.

UniProtKB news

Change of the cross-references to eggNOG

We have introduced an additional field in the cross-references to the eggNOG database to indicate the taxonomic scope of an orthologous group.

Text format

Example: U3JAG9

DR   eggNOG; ENOG410IEUN; Eukaryota.
DR   eggNOG; ENOG410YVPU; LUCA.

XML format

Example: U3JAG9

<dbReference type="eggNOG" id="ENOG410IEUN">
  <property type="taxonomic scope" value="Eukaryota"/>
</dbReference>
<dbReference type="eggNOG" id="ENOG410YVPU">
  <property type="taxonomic scope" value="LUCA"/>
</dbReference>

This change did not affect the XSD, but may nevertheless require code changes.
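Code that reads only the id attribute of eggNOG dbReference elements will keep working but will miss the new taxonomic scope. A minimal sketch using Python's standard ElementTree that collects the id and all property values of one cross-reference type; it compares tags by local name because real UniProt XML declares a default namespace (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip an '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]

def db_references(xml_text: str, db_type: str) -> list[dict]:
    """Collect the id and <property> values of every dbReference of one type."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():
        if local_name(elem.tag) != "dbReference" or elem.get("type") != db_type:
            continue
        props = {
            child.get("type"): child.get("value")
            for child in elem
            if local_name(child.tag) == "property"
        }
        refs.append({"id": elem.get("id"), **props})
    return refs
```

Applied to the two eggNOG elements above (wrapped in an enclosing element), this yields one dictionary per orthologous group, each with an "id" and a "taxonomic scope" key.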

RDF format

Example: U3JAG9

uniprot:U3JAG9
  rdfs:seeAlso <http://purl.uniprot.org/eggnog/ENOG410IEUN> ,
               <http://purl.uniprot.org/eggnog/ENOG410YVPU> .
<http://purl.uniprot.org/eggnog/ENOG410IEUN>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/eggNOG> ;
  rdfs:comment "Eukaryota" .
<http://purl.uniprot.org/eggnog/ENOG410YVPU>
  rdf:type up:Resource ;
  up:database <http://purl.uniprot.org/database/eggNOG> ;
  rdfs:comment "LUCA" .

Changes to the controlled vocabulary of human diseases

New diseases:

Changes to keywords

New keyword:

Changes in subcellular location controlled vocabulary

New subcellular locations:

UniProt release 2015_10

Published October 14, 2015

Headline

The smell of the sea in UniProtKB

Memories left by a walk on the seashore bring into play all our senses, of which smell is not the least. This characteristic ‘smell of the sea’ is carried by a little molecule, dimethylsulfide (DMS), which is an enzymatic cleavage product of dimethylsulfoniopropionate (DMSP).

DMSP is one of the most abundant organic molecules in the world, with a billion tons made and turned over every year. It is produced by marine macroalgae, as well as by single-cell phytoplankton species, such as diatoms, dinoflagellates and haptophytes, and occurs at high concentrations in their cytoplasm. The physiological function of DMSP is not yet fully established. It is thought to function as an osmolyte. It has also been proposed to serve as a cryoprotectant in polar algae. DMSP enzymatic cleavage products, DMS and acrylate, are quite effective at scavenging free radicals and other reactive oxygen species. Hence they may serve as an antioxidant system.

In healthy growing phytoplankton, DMSP freely diffuses in the cytoplasm, and only minute quantities are released. This amount is sufficient to attract zooplankton, which then feed on the algae. Organisms grazed upon or infected by viruses, as well as stressed or senescent cells, release greater amounts of DMSP, which is taken up by bacterioplankton, metabolized into DMS and used as a source of carbon and sulfur. DMS is not only used by seawater microorganisms; it is also volatile, and a small fraction of it is released into the atmosphere, where it creates an olfactory landscape that provides seabirds with orientation cues to potential food supplies. In the atmosphere, DMS is oxidized to sulfuric acid and becomes an important source of sulfate aerosols. These act as condensation nuclei, causing water molecules to coalesce and clouds to form. The cycle is closed when rain brings the sulfur-containing particles back into the ocean. Interestingly, phytoplankton appear to convert DMSP into DMS very rapidly when they are stressed by UV radiation. The local increase in volatile DMS increases cloud formation, hence decreasing direct exposure to sunlight and relieving stress. Through this mechanism, plankton may shape local weather for their own benefit.

DMS release by seaweed was described in 1935 and DMSP was identified as its precursor almost 70 years ago, but the enzyme catalyzing the reaction remained elusive until last June. Using classical biochemical approaches, as well as genomic and proteomic analyses, Alcolombri et al. identified ALMA1 in the chloroplastic membrane fraction of the coccolithophore alga Emiliania huxleyi, an abundant bloom-forming marine phytoplankton species. This enzyme is a redox-sensitive homotetramer that belongs to the aspartate/glutamate racemase superfamily and catalyzes the cleavage of DMSP into DMS and acrylate. Phylogenetic studies show the presence of numerous ALMA1 homologs in major, globally distributed phytoplankton taxa and in other marine organisms. This major discovery paves the way for future investigations of the physiological role of DMS and may allow quantification of the relative biogeochemical contributions of algae and bacteria to global DMS production.

If you want to take a deep, though virtual, breath of sea air, you can visit the ALMA1 entries, which are available as of this release.

UniProtKB news

Cross-references to WBParaSite

Cross-references have been added to WBParaSite, an open access resource providing access to the genome sequences, genome browsers, semi-automatic annotation and comparative genomics analysis of parasitic worms (helminths). WormBase ParaSite is closely integrated with and complementary to the main WormBase resource, the central focus of which is the model nematode Caenorhabditis elegans and its close relatives.

WBParaSite is available at http://parasite.wormbase.org.

The format of the explicit links is:

Resource abbreviation WBParaSite
Resource identifier Transcript identifier
Optional information 1 Protein identifier
Optional information 2 Gene identifier

Cross-references to WBParaSite may be isoform-specific. The general format of isoform-specific cross-references was described in release 2014_03.

Example: A8PGQ3

Text format

Example: A8PGQ3

DR   WBParaSite; Bm6838; Bm6838; WBGene00227099.
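Because each WBParaSite cross-reference names a transcript, a protein and a gene, several DR lines of an entry can point to the same gene through different transcripts. A hedged sketch that indexes WBParaSite DR lines by gene identifier, assuming the field order given in the format table above and the bracketed isoform tag used for isoform-specific cross-references (the function name is illustrative):

```python
from collections import defaultdict

def transcripts_by_gene(dr_lines: list[str]) -> dict[str, list[str]]:
    """Index WBParaSite DR lines as gene identifier -> transcript identifiers.

    Field order follows the format table: transcript; protein; gene.
    Lines for other databases are skipped.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for line in dr_lines:
        # Drop the 'DR   ' prefix, any trailing isoform tag such as ' [A8PGQ3-2]',
        # and the terminating period.
        core = line[5:].split(" [", 1)[0].rstrip().rstrip(".")
        fields = core.split("; ")
        if fields[0] != "WBParaSite" or len(fields) < 4:
            continue
        _, transcript, _protein, gene = fields[:4]
        if gene != "-":                       # '-' marks a missing gene identifier
            index[gene].append(transcript)
    return dict(index)
```

For the example line above, the result is {"WBGene00227099": ["Bm6838"]}; DR lines for other databases in the same entry are ignored.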

XML format

Example: A8PGQ3

<dbReference type="WBParaSite" id="Bm6838">
  <property type="protein sequence ID" value="Bm6838"/>
  <property type="gene ID" value="WBGene00227099"/>
</dbReference>

RDF format

Example: A8PGQ3

uniprot:A8PGQ3
  rdfs:seeAlso <http://purl.uniprot.org/wbparasite/Bm6838> .
<http://purl.uniprot.org/wbparasite/Bm6838>
  rdf:type up:Transcript_Resource ;
  up:database <http://purl.uniprot.org/database/WBParaSite> ;
  up:translatedTo <http://purl.uniprot.org/wbparasite/Bm6838> ;
  up:transcribedFrom <http://purl.uniprot.org/wbparasite/WBGene00227099> .

Removal of the cross-references to CYGD

Cross-references to CYGD have been removed.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniParc news

UniParc cross-reference types changes

UniParc and UniProtKB entries both contain cross-references to external databases. For consistency, we have aligned the names of these databases in UniParc with those used in UniProtKB. In particular, we have changed the following types of cross-references in UniParc:

Old type            New type
ENSEMBL             Ensembl
FLYBASE             FlyBase
H_INV               H-InvDB
REFSEQ              RefSeq
TAIR_ARABIDOPSIS    TAIR
WORMBASE            WormBase
WormBase ParaSite   WBParaSite

Example:

Previous XML:

<dbReference type="WormBase ParaSite" id="A_03330" version_i="1" active="Y" created="2014-09-12" last="2015-07-09">
  <property type="NCBI_taxonomy_id" value="6185"/>
</dbReference>

New XML:

<dbReference type="WBParaSite" id="A_03330" version_i="1" active="Y" created="2014-09-12" last="2015-07-09">
  <property type="NCBI_taxonomy_id" value="6185"/>
</dbReference>
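Code or cached files that still match the old UniParc type names can be migrated with a lookup table. A minimal sketch using Python's standard ElementTree, with the mapping transcribed from the old/new type table in this section; it rewrites the type attribute in place and compares tags by local name in case the XML declares a namespace (the function name is illustrative):

```python
import xml.etree.ElementTree as ET

# Old -> new UniParc cross-reference type names, from the table in this section.
TYPE_RENAMES = {
    "ENSEMBL": "Ensembl",
    "FLYBASE": "FlyBase",
    "H_INV": "H-InvDB",
    "REFSEQ": "RefSeq",
    "TAIR_ARABIDOPSIS": "TAIR",
    "WORMBASE": "WormBase",
    "WormBase ParaSite": "WBParaSite",
}

def normalise_types(root: ET.Element) -> int:
    """Rename old type values on every dbReference in place; return how many changed."""
    changed = 0
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] != "dbReference":
            continue
        new_type = TYPE_RENAMES.get(elem.get("type"))
        if new_type is not None:
            elem.set("type", new_type)
            changed += 1
    return changed
```

Types that already use the new names are left untouched, so the function can safely be run on a mix of old and new data.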

UniProt release 2015_09

Published September 16, 2015

Headline

Life (and death) in 2D

While the cinema industry struggles to produce ever more realistic 3D, even 4D, films out of 2D images, scientists have achieved the exact opposite: in a collection of (3D) vertebrate embryos, they have identified a mutant that flattens in the course of development.

Vertebrates have a defined body shape in which correct tissue and organ shape and alignment are essential for function. Correct morphogenesis depends on force generation, force transmission through the tissue, and the response of tissues and extracellular matrix to force. In addition, embryos must be able to withstand environmental perturbations, such as gravity. As early as 1917, in his masterwork “On Growth and Form”, Sir D’Arcy Wentworth Thompson postulated that “the forms as well as the actions of our bodies are entirely conditioned (save for certain exceptions in the case of aquatic animals) by the strength of gravity upon this globe”. It is actually from an “aquatic animal”, a fish, that confirmation of this hypothesis came earlier this year. A screen of Japanese rice fish mutants identified an embryo that displayed pronounced body flattening around stages 25-28 (50-64 h post-fertilization). Although general development was not delayed, the mutant exhibited delayed blastopore closure and progressive body collapse from mid-neurulation, surviving until just before hatching. This mutant was aptly named hirame, which means flatfish in Japanese. When embryos were grown in agarose, their collapse correlated with the direction of gravity, reflecting the mutant’s inability to withstand external forces. The mutants also showed defective fibronectin fibril formation.

The hirame mutation lies within the Yap1 gene and creates a premature stop codon at position 164. Yap1 is a transcriptional co-activator that promotes proliferation and inhibits cell death during embryonic development. Porazinski and colleagues showed that Yap1 is also essential for actomyosin-mediated tissue tension.

The hypothesis with the strongest experimental support is that YAP1 acts on ARHGAP18 expression (and possibly that of other ARHGAP18-related genes), which in turn regulates cortical actomyosin network formation. Actomyosin contraction promotes fibronectin assembly, which could be a critical in vivo mechanism for the integration of mechanical signals, such as tension generated by actomyosin, with biochemical signals, such as integrin signaling, ensuring proper tissue shape and alignment and appropriate organ and body shape.

YAP1 knockdown in the human cell line hTERT-RPE1 caused a phenotype reminiscent of the fish embryo phenotype. When cultured in a 3D spheroid system, these retinal epithelial cells also exhibited collapse upon exposure to external forces, marked reduction of cortical F-actin bundles and lack of typical fibronectin fibril pattern. This suggests that YAP1 orthologs may play a similar role in all vertebrates, and possibly beyond.

As of this release, YAP1 protein entries have been updated and are publicly available.

UniProtKB news

Release of variation files for 27 new species

In collaboration with Ensembl and Ensembl Genomes, UniProt would like to announce the release of variation files for 27 new species, in addition to the human, mouse and zebrafish files already available in the dedicated variants directory on the UniProt FTP sites. This release includes a further 13 vertebrate species, including the agriculturally important cow, chicken, pig and sheep. The new variant catalogues also broaden the range of species covered to plant, fungal and protist species, including rice, bread wheat, barley and grape.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

UniProt release 2015_08

Published July 22, 2015

Headline

Pseudo-allergy, real progress

Do you sniffle and sneeze as trees start to bloom and pollen becomes airborne? Your mast cells are to blame. These cells reside at strategic anatomical positions, such as the skin, gastrointestinal tract and lung, and provide us with a first line of defence against potential harm from our environment. Besides their beneficial functions, mast cells can also react to compounds that do not represent any threat to our health, such as pollen. This process begins with the interaction of an antigen with immunoglobulin E (IgE) bound to high-affinity Fc epsilon receptors at the mast cell surface. It ends with the release of histamine and various inflammatory and immunomodulatory substances, which causes allergy. Most adverse reactions to peptidergic and small-molecule therapeutic agents, collectively called basic secretagogues, also rely on mast cell stimulation, but do not correlate with IgE antibody titer. They proceed through a different, not yet fully understood, IgE-independent mechanism called pseudo-allergy, which also eventually leads to the release of granule-stored histamine. In humans, MRGPRX2 has been proposed, among others, to serve as a receptor for basic secretagogues, but until recently there was no direct proof of its involvement.

Earlier this year, McNeil et al. showed that "basic secretagogues activate mouse mast cells in vitro and in vivo through a single receptor, Mrgprb2, the ortholog of the human G-protein-coupled receptor MRGPRX2". The first achievement of this study was to prove the orthology of these 2 genes, which was not an easy task. In humans, MRGPRX2 is found in a cluster with 3 other MRGPRX family members. This cluster is dramatically expanded in mouse, with 22 potential protein-coding genes that show comparable sequence identity to MRGPRX2. To establish orthology, the authors used 2 criteria: expression pattern (expression in mast cells) and pharmacology (some 16 compounds were tested for mast cell activation). Mrgprb2 knockout mice were then created. Gene targeting was performed using a zinc-finger-nuclease-based strategy, as a classical homologous recombination approach was not feasible at this genomic locus because of its many repetitive sequences. The null animals showed no visible phenotype under normal conditions, but did not produce any pseudo-allergic reaction in response to small-molecule therapeutic drugs. Secretagogue-induced histamine release, inflammation and airway contraction were abolished.

This elegant study does not deal simply with the identification of “just another receptor”. It addresses an issue that may concern all of us at some point in our lives. Basic secretagogues are frequently encountered either in natural fluids, such as the wasp venom toxin mastoparan, or in various drugs, such as cationic peptidergic drugs, antibiotics (fluoroquinolone family), neuromuscular blocking agents, etc. The latter are routinely used in surgery to reduce unwanted muscle movement and are responsible for nearly 60% of allergic reactions in a surgical setting. The majority of these compounds activate mast cells in an Mrgprb2-dependent manner. The animal model created by McNeil et al. could therefore be used for pre-clinical testing of new drugs in order to minimize pseudo-allergic risks. In addition, the identification of a motif common to several Mrgprb2 agonists may allow the prediction of side effects of clinically used compounds.

As of this release, primate MRGPRX2 and mouse Mrgprb2 entries have been updated and are publicly available.

UniProt service news

Programmatic access to UniProt with sparql.uniprot.org

We are happy to announce the public release of the UniProt SPARQL endpoint at sparql.uniprot.org, where you can also find links to the documentation of the UniProt RDF data model and an interactive query interface with sample queries to get you started.

For those unfamiliar with SPARQL, this is a W3C standardized query language for the Semantic Web. If you know SQL, it will look familiar to you and you can do similar types of queries with it. SPARQL also allows you to query and combine data from a variety of SPARQL endpoints, providing a valuable low-cost alternative to building your own data warehouse. You can combine UniProt data from sparql.uniprot.org with that from the SPARQL endpoints hosted by the EBI’s RDF platform, the SIB’s neXtProt SPARQL endpoint, etc.
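As an illustration, the query below asks for entries cross-referenced to iPTMnet, reusing the rdfs:seeAlso and up:database pattern from the RDF examples in these notes. The sketch only builds a standard SPARQL Protocol GET request with Python's standard library; the endpoint path /sparql is an assumption, and sending the request requires network access.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.uniprot.org/sparql"   # assumed endpoint path

# Entries with a cross-reference to iPTMnet, following the RDF examples above.
QUERY = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?protein ?xref WHERE {
  ?protein rdfs:seeAlso ?xref .
  ?xref up:database <http://purl.uniprot.org/database/iPTMnet> .
}
LIMIT 10
"""

def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

# To run it (network access required):
#   import json
#   with urllib.request.urlopen(build_request(QUERY)) as response:
#       rows = json.load(response)["results"]["bindings"]
```

The same request pattern works for any SPARQL endpoint that follows the W3C protocol, which is what makes combining data from several endpoints practical.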

We look forward to feedback from the community to help us improve this service further.

UniProtKB news

Addition of human somatic protein altering variants from COSMIC

The Catalogue of Somatic Mutations in Cancer (COSMIC) is a database of manually curated somatic variants from peer-reviewed publications and genome-wide studies. UniProt, in collaboration with COSMIC, has integrated the protein-altering variants of COSMIC release v71 into the homo_sapiens_variation.txt.gz file. The COSMIC variants provide the standard information found in the homo_sapiens_variation.txt.gz file, as well as, in the Phenotype/Disease field, the primary tissue(s) in which each variant was found.

Changes to the humdisease.txt file

We have added cross-references to MedGen to the humdisease.txt file. MedGen, the NCBI portal to information about human genetic disorders, consolidates multiple disease names, medical terms and information about the same disorder from various sources into a single concept. Each MedGen concept has a Concept Unique Identifier (CUI) that allows computational access to global disease information. Together with disease nomenclature, this includes disease definitions, clinical findings, available clinical and research tests, molecular resources, professional guidelines, original and review literature, consumer resources, clinical trials, and Web links to other related resources. MedGen is a valuable resource to allow UniProtKB users to access an extensive range of biomedical data.

Changes to the controlled vocabulary of human diseases

New diseases:

Modified diseases:

Deleted diseases:

  • Blepharophimosis-ptosis-intellectual disability syndrome
  • Ehlers-Danlos syndrome 2

UniProt release 2015_07

Published June 24, 2015

Headline

Coding-non-coding RNAs: a game of hide-and-seek

It is well-established that microRNAs (miRNAs) are small eukaryotic non-coding RNA molecules that repress the expression of their target genes. miRNAs are transcribed by RNA polymerase II as large primary transcripts (pri-miRNA), that share the same characteristics as all other RNA polymerase II-transcribed RNAs, such as the presence of a 5'-cap and a 3'-poly(A) tail. pri-miRNAs are processed to smaller pre-miRNAs, which in turn are cleaved to produce mature miRNAs. In animals, this final maturation step occurs in the cytoplasm, while in plants it takes place in the nucleus. Cytosolic mature miRNAs guide the RNA-induced silencing complex (RISC) in repressing target genes through either cleavage or translational repression of their mRNAs.

A recent article published in Nature revealed that plant pri-miRNAs may not be as non-coding as previously assumed. Some do actually encode small regulatory peptides, called miPEPs, which enhance the accumulation of their corresponding mature miRNAs. This has been shown for Medicago truncatula pri-miR171b and Arabidopsis thaliana pri-miR165a which encode miPEP171b and miPEP165a, respectively. These two 20- and 18-amino acid-long peptides have been shown to be translated in vivo and to promote the transcription of their pri-miRNAs, resulting in the accumulation of mature miR171b and miR165a. This increase leads to the reduction of lateral root development in the case of miR171b and stimulation of main root growth for miR165a. The same effects were observed when synthetic peptides were applied to plants, suggesting that miPEPs might have agronomical applications.

Five other pri-miRNAs were experimentally shown to encode active miPEPs, suggesting that the presence of such small regulatory peptides may be widespread in plants. Computer analysis of the 5'-end of 50 pri-miRNAs in Arabidopsis thaliana revealed that all of them contained at least one ORF, which, if translated, could give rise to 3- to 59-amino acid-long peptides of unknown biological activity. No common signature was found among them, possibly due to the specificity of each putative miPEP for its own pri-miRNA.
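The notes do not describe how the ORF search was performed. As a generic illustration only, the sketch below scans the three forward reading frames for ATG-initiated ORFs ending in an in-frame stop codon and reports peptide lengths; min_aa defaults to 3 to mirror the smallest peptide length quoted above, and nested ATGs inside a reported ORF are skipped.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_aa: int = 3) -> list[tuple[int, int, int]]:
    """Find ATG...stop ORFs in the three forward frames of a DNA or RNA sequence.

    Returns (start, end, n_aa): 0-based start, exclusive end including the stop
    codon, and peptide length excluding the stop. ORFs that run off the end of
    the sequence without an in-frame stop are ignored.
    """
    seq = seq.upper().replace("U", "T")        # accept RNA as well as DNA
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):               # no in-frame stop: stop scanning this frame
                break
            n_aa = (j - i) // 3
            if n_aa >= min_aa:
                orfs.append((i, j + 3, n_aa))
            i = j + 3                          # resume after the stop codon
    return sorted(orfs)
```

For the toy sequence "CCAUGAAAUUUUAGCC", the function reports one ORF, (2, 14, 3), encoding a 3-residue peptide (Met-Lys-Phe).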

Arabidopsis thaliana miPEP165a, miPEP160b, miPEP164a and miPEP319a and Medicago truncatula miPEP171b have been manually annotated and integrated into UniProtKB/Swiss-Prot as of this release. Unfortunately, the sequences of the other two functionally characterized Medicago truncatula peptides, miPEP169d and miPEP171e, are not available.

UniProtKB news

Cross-references to ESTHER

Cross-references have been added to ESTHER, a database of the Alpha/Beta-hydrolase fold superfamily of proteins.

ESTHER is available at http://bioweb.ensam.inra.fr/ESTHER/general?what=index.

The format of the explicit links is:

Resource abbreviation ESTHER
Resource identifier Gene locus.
Optional information 1 Family name.

Example: P0C064

Text format

Example: P0C064

DR   ESTHER; bacbr-grsb; Thioesterase.

XML format

Example: P0C064

<dbReference type="ESTHER" id="bacbr-grsb">
  <property type="family name" value="Thioesterase"/>
</dbReference>

Cross-references to Genevisible

Cross-references have been added to Genevisible, a search portal to normalized and curated expression data from GENEVESTIGATOR.

Genevisible is available at http://genevisible.com/search.

The format of the explicit links is:

Resource abbreviation Genevisible
Resource identifier Gene identifier.
Optional information 1 Organism code.

Example: P31946

Text format

Example: P31946

DR   Genevisible; P31946; HS.

XML format

Example: P31946

<dbReference type="Genevisible" id="P31946">
  <property type="organism ID" value="HS"/>
</dbReference>

Removal of the cross-references to Genevestigator

Cross-references to Genevestigator have been removed.

Change of the cross-references to PomBase

Cross-references to PomBase may now optionally indicate a gene designation in order to align them with the format of other model organism databases.

Text format