Databases available
The following table lists the databases that are available. If you would like us to hold something that is not on this list, please send us an email. Some users have their own databases; if you need help with your own data, please ask.
| EMBL | Nucleic acid database. Our copy of the whole database has not been updated since Dec 2009, although the viral section was rebuilt in Feb 2013; please ask if you require data. The BLAST format data are in /data4/blast. See nt below. | See the information at EBI, where several documents explain the relationship between EMBL and GenBank and what sort of data the databases contain. We now typically set up specific sets of data from EMBL, e.g. Whole Genome Shotgun (WGS) datasets, for individual users as required. |
| Homstrad | Homologous structure alignment database; used with Fugue, updated weekly. | See web site. |
| InterPro | Database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. Available with iprscan, updated when new releases are found. | See the web site for more details. Please seek advice before using this for high-throughput work. For occasional use the web interface just cited will be better. The data are in /data2/iprscan. |
| nr | NCBI non-redundant protein database, available for BLAST, FASTA, hmmsearch, Fugue etc.; updated weekly. The data are in /data4/blast. | This database has more sequences than UniProt and includes those from PDB (i.e. NRL3D), but unlike UniProt it is not available in taxonomic divisions, so you have to search the whole database. |
| nt | NCBI nucleotide database available for BLAST, updated weekly. The data are in /data4/blast. | This database has entries from the traditional divisions (excluding environmental) of GenBank, EMBL, and DDBJ and excludes a large amount of current data (no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). NB: The database is not non-redundant (cf nr). |
| PDB | A database of structures from WWPDB, held in /data5/pdb, updated weekly. | For further information see the WWPDB web site. |
| Pfam | Database of profile hidden Markov models (HMMs, as used by HMMER) of protein sequences. Updated when new releases are found. The data are in /data4/pfam, but note that Pfam-A is also held with InterPro. | See the Sanger Pfam site. At present we have versions compatible with hmmscan and the older hmmpfam alongside each other. |
| UniProt | A protein sequence database consisting of Swiss-Prot and TrEMBL, updated every few weeks. This may be searched in its entirety or, for BLAST and FASTA, we also provide the data by divisions, together with UniMES and UniRef90. Currently in /whale-data/jcjb/temp. | The UniProt knowledgebase consists of two parts: a section containing fully manually-annotated records resulting from information extracted from literature and curator-evaluated computational analyses, and a section with computationally-analysed records awaiting full manual annotation. For the sake of continuity and name recognition, the two sections are referred to as "Swiss-Prot" and "TrEMBL" respectively. There is also a section called UniMES, with metagenomic and environmental sequences. In addition we now keep UniRef90, which is a database providing clustered sets of sequences from UniProt (including splice variants and isoforms) and selected UniParc records, giving faster searches. For the full details see the UniProt documents at EBI. |
| UniVec | A non-redundant cloning vector database, somewhat newer than Vector-IG. | See web site. |
| Vector-IG | A database of vector sequences | |
| V-Base | A database of human variable immunoglobulin genes. | See the V-Base web site. |
| Other databases available in EMBOSS | PRINTS, a database of fingerprints, which are conserved motifs used to characterise protein families. The PROSITE database of protein motifs is used by the patmatmotifs program, while the REBASE database of restriction enzymes is used by mapping programs. The TFD database of transcription factor sites may also be used. | See the PRINTS and PROSITE web sites. |
| Other genome or transcriptome databases | Many datasets from public sites can be made available for BLAST. | If you want other data please ask. Some users maintain their own BLAST databases. |
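
As a rough illustration of how the paths in the table might be used, the sketch below assembles a command line for a BLAST search against the data in /data4/blast. It is a sketch under several assumptions, not a description of the local setup: it assumes the NCBI BLAST+ programs (blastp, blastn) are installed rather than the older blastall, that the database files inside /data4/blast carry the same prefix as their table name (nr, nt), and the query and output file names are purely hypothetical. Please ask if you are unsure which programs and database names apply.

```python
import shlex
import subprocess
from pathlib import PurePosixPath

# Directory given in the table above for BLAST-format data.
BLAST_DATA_DIR = PurePosixPath("/data4/blast")


def build_blast_command(program, query_fasta, db_name, out_path, evalue=1e-5):
    """Return the argument list for a BLAST+ search; nothing is run here."""
    # Assumption: the database prefix inside /data4/blast matches the table name.
    db_path = BLAST_DATA_DIR / db_name
    return [
        program,                   # e.g. "blastp" for nr, "blastn" for nt
        "-query", str(query_fasta),
        "-db", str(db_path),
        "-evalue", str(evalue),    # E-value cut-off for reported hits
        "-outfmt", "6",            # tabular output, one hit per line
        "-out", str(out_path),
    ]


def run_blast(command):
    """Run a prepared command; raises CalledProcessError if BLAST fails."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_blast_command("blastp", "query.fasta", "nr", "query_vs_nr.tsv")
    print(shlex.join(cmd))  # inspect the command before calling run_blast(cmd)
```

Building the command separately from running it makes it easy to check the database path first, which matters for nr because the whole database has to be searched.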
