
Forthcoming changes

Change of URIs for OrthoDB

On February 28, 2018

For historical reasons, UniProt had to generate URIs to cross-reference databases that did not have an RDF representation. Our policy is to replace these with the URIs generated by the cross-referenced database once it starts to distribute an RDF representation of its data.

The URIs for the OrthoDB database are therefore going to be updated from:
http://purl.uniprot.org/orthodb/<ID>
to:
http://purl.orthodb.org/odbgroup/<ID>
If required for backward compatibility, you will be able to use the following query to add the old URIs:
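PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX up:<http://purl.uniprot.org/core/>
# Add the old UniProt-generated URI alongside each new OrthoDB URI.
INSERT
{
   ?protein rdfs:seeAlso ?old .
   ?old owl:sameAs ?new .
   ?old up:database <http://purl.uniprot.org/database/orthodb> .
}
WHERE
{
   ?protein rdfs:seeAlso ?new .
   ?new up:database <http://purl.uniprot.org/database/OrthoDB> .
   # The <ID> is whatever follows the new URI prefix.
   BIND(iri(concat('http://purl.uniprot.org/orthodb/', strafter(str(?new), 'http://purl.orthodb.org/odbgroup/'))) AS ?old)
}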
The dereferencing of existing http://purl.uniprot.org/orthodb/<ID> URIs will be maintained.
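
For client code that stores OrthoDB cross-reference URIs as plain strings, the prefix change described above can be sketched as a simple rewrite. This is a minimal illustration, not part of any UniProt tooling: the function names and the EXAMPLE_ID placeholder are hypothetical, and only the two URI prefixes come from this announcement.

```python
# Prefixes taken from the announcement; everything else is illustrative.
OLD_PREFIX = "http://purl.uniprot.org/orthodb/"
NEW_PREFIX = "http://purl.orthodb.org/odbgroup/"


def to_new_orthodb_uri(uri: str) -> str:
    """Rewrite an old UniProt-generated OrthoDB URI to the new form."""
    if uri.startswith(OLD_PREFIX):
        return NEW_PREFIX + uri[len(OLD_PREFIX):]
    return uri  # already in the new form, or not an OrthoDB URI


def to_old_orthodb_uri(uri: str) -> str:
    """Rewrite a new OrthoDB URI back to the old UniProt-generated form."""
    if uri.startswith(NEW_PREFIX):
        return OLD_PREFIX + uri[len(NEW_PREFIX):]
    return uri  # already in the old form, or not an OrthoDB URI


# EXAMPLE_ID is a placeholder, not a real OrthoDB group identifier.
old_uri = OLD_PREFIX + "EXAMPLE_ID"
new_uri = to_new_orthodb_uri(old_uri)
print(old_uri, "->", new_uri)
```

URIs that match neither prefix are returned unchanged, so the functions can be applied to a mixed list of cross-references.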

UniProtKB FASTA headers: Addition of NCBI taxonomy identifier

On February 28, 2018

In order to avoid ambiguities and simplify parsing, we are going to add the NCBI taxonomy identifier to UniProtKB FASTA headers.

Current format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion
New format:
>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

Where:

  • db is ‘sp’ for UniProtKB/Swiss-Prot and ‘tr’ for UniProtKB/TrEMBL.
  • UniqueIdentifier is the primary accession number of the UniProtKB entry.
  • EntryName is the entry name of the UniProtKB entry.
  • ProteinName is the recommended name of the UniProtKB entry as annotated in the RecName field. For UniProtKB/TrEMBL entries without a RecName field, the SubName field is used. In case of multiple SubNames, the first one is used. The ‘precursor’ attribute is excluded; ‘Fragment’ is included with the name if applicable.
  • OrganismName is the scientific name of the organism of the UniProtKB entry.
  • OrganismIdentifier is the unique identifier of the source organism, assigned by the NCBI.
  • GeneName is the first gene name of the UniProtKB entry. If there is no gene name, OrderedLocusName or ORFname, the GN field is not listed.
  • ProteinExistence is the numerical value describing the evidence for the existence of the protein.
  • SequenceVersion is the version number of the sequence.
Examples:
>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana OX=115339 GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_CUPNH Acetoin catabolism protein X OS=Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) OX=381666 GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus OX=10090 PE=1 SV=1

>tr|Q3SA23|Q3SA23_9HIV1 Protein Nef (Fragment) OS=Human immunodeficiency virus 1 OX=11676 GN=nef PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN cDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens OX=9606 PE=2 SV=1
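
As an illustration of how the new header layout might be parsed, here is a minimal Python sketch based only on the field order described above. The regular expression, the function name parse_header and the dictionary keys are assumptions made for this example, not an official UniProt parser. OX is treated as optional so that headers in the current format are still accepted, and the gene name is assumed to contain no whitespace.

```python
import re

# Field order follows the documented layout:
# >db|UniqueIdentifier|EntryName ProteinName OS=... [OX=...] [GN=...] PE=... SV=...
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism_name>.+?)"
    r"(?:\s+OX=(?P<organism_id>\d+))?"   # absent in the current (old) format
    r"(?:\s+GN=(?P<gene_name>\S+))?"     # absent when the entry has no gene name
    r"\s+PE=(?P<protein_existence>\d)\s+SV=(?P<sequence_version>\d+)\s*$"
)


def parse_header(line: str) -> dict:
    """Split a UniProtKB FASTA header into its named fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised UniProtKB FASTA header: {line!r}")
    fields = match.groupdict()
    # Convert the numeric fields; missing optional fields stay None.
    for key in ("organism_id", "protein_existence", "sequence_version"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


header = (">sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, "
          "E-K alpha chain OS=Mus musculus OX=10090 PE=1 SV=1")
print(parse_header(header))
```

Reading the taxonomy identifier from OX avoids having to map organism names, which may contain strain details in parentheses, back to NCBI taxonomy.
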
The same modification will be applied to FASTA headers of alternative isoforms (in UniProtKB/Swiss-Prot), where the new format will be:
>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName OX=OrganismIdentifier[ GN=GeneName]
Example:
>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis OX=9541 GN=YWHAB
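
The isoform header layout can be parsed in the same spirit. The sketch below is illustrative only: the pattern, the function name parse_isoform_header and the returned keys are assumptions for this example. It also assumes, based on the example above, that IsoID has the form accession-number, and that the isoform name does not itself contain the text " of ".

```python
import re

# Layout: >sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=... OX=...[ GN=...]
ISOFORM_RE = re.compile(
    r"^>sp\|(?P<iso_id>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"Isoform (?P<isoform_name>.+?) of (?P<protein_name>.+?)"
    r"\s+OS=(?P<organism_name>.+?)\s+OX=(?P<organism_id>\d+)"
    r"(?:\s+GN=(?P<gene_name>\S+))?\s*$"  # GN is optional
)


def parse_isoform_header(line: str) -> dict:
    """Split a Swiss-Prot isoform FASTA header into its named fields."""
    match = ISOFORM_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised isoform FASTA header: {line!r}")
    fields = match.groupdict()
    fields["organism_id"] = int(fields["organism_id"])
    # Assumption: IsoID is "<accession>-<number>", as in the example header.
    fields["accession"] = fields["iso_id"].rpartition("-")[0]
    return fields


header = (">sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha "
          "OS=Macaca fascicularis OX=9541 GN=YWHAB")
print(parse_isoform_header(header))
```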