Software available
The following table lists the software packages that are available. If you would like something that is not on this list, need more help with running a program, or notice a newer version of software that I ought to install, please send me an email. Most general software is not listed here (R is the exception, because it has many bioinformatics packages installed). You will find that perl, python, ruby, standard compilers and a wide range of other applications are available, but you can always ask if you need more or find that we have an out-of-date copy. (In some cases you may even need an older copy; this can happen with, eg, python.) Some people run software not listed here: both their own code and software they have obtained elsewhere.
| Package | Description | Further information |
|---|---|---|
| ABySS | A de novo short read assembler for small or large genomes | See web site. |
| analysis | Package for evolutionary genetic analysis | See web site |
| ancestrymap | Screens through the genome in a recently mixed population such as African Americans, searching for segments with increased ancestry from one of the ancestral populations, which can indicate the position of disease genes | See web site |
| aqua | AQUA: automatic quality improvement for multiple sequence alignment started with command aqua. | See text help but note local command is aqua. |
| Artemis | For viewing and analysing (eg with blast) DNA/protein sequences and feature tables. Start with command art. The Artemis Comparison Tool is started with act. Both require X-Windows. | See the web site. To customize artemis copy the file /biosoft/artemis/etc/options to your directory and edit it. Or ask for help. The most recent version has not been set up for blast etc. |
| Augustus | A program, and associated scripts, to predict genes in eukaryotic genomic sequences. | There are various ways to perform training using BLAT, PASA and SCIPIO and the authors may also assist you with this. See web site. |
| Babel | Open Babel succeeds Babel: Program and library to interconvert between many file formats used in structural studies. Includes babel, obfit, obgrep and obrotate | See man babel and web site. Note the version installed is that available with the operating system but newer versions may be installed if required. |
| bamtools | Utility for working on BAM files analogous to samtools (see below) thus avoiding need to store and work on larger text SAM files where possible. | See paper |
| bayenv | Bayesian method using environmental correlations to identify loci underlying local adaptation | See web site |
| BEAST | BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences | See the web site. NB BEAST can start from a defined usertree (cf MrBayes). Parallel BEAST is available. Example command-line: nice +10 beast -working -beagle -beagle_instances 8 test.xml >& log. There is also now the BEAST2 package with commands set up as beast2 etc. This can restart a chain and has its own web site. |
| BEST | Bayesian Estimation of Species Trees works with MrBayes (see below). | See web site. |
| Bioconductor | A large collection of packages used within R for carrying out very many functions in the analysis and comprehension of high-throughput genomic data. | See web site. There are also many non-Bioconductor R packages not listed in this table. Recently not all new Bioconductor packages have been installed. If you would like packages installed for you centrally please ask. Some people also have their own R packages installed locally (you define an R user library path). |
| biogrep | Optimised version of grep for matching patterns against sequences. | See web site. |
| BioPerl | Set of modules to help write bioinformatics Perl scripts, with some functional scripts as well | See web site |
| Biopieces | Set of command-line tools that can be put together to create pipelines. | See website. |
| BioPython | Analogous to BioPerl. Also we have biopy, numpy and scipy | Note that on some servers there is more than one version of Python and also BioPython as versions with the OS may not be the newest. Ask for advice if required or if you need more Python modules/scripts installed. See website for general information. |
| BLAST | Sequence database searching package available in several versions. The new BLAST+ package uses separate names blastp, blastn etc. The previous NCBI names blastall, blastpgp, bl2seq etc are still available alongside. Wu-BLAST is also available. The shared data are in /data4/blast or we can set up your data. BLASTDB is defined for [t]csh users. | The new BLAST+ software is documented separately (PDF) from the old. You can use concatenated queries in FASTA format. |
| blat | Blast-like Alignment Tool to perform rapid mRNA/DNA and cross-species protein alignments. | See FAQ on UCSC web site. |
| Bowtie | Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. The new version has programs with a 2 in the name eg bowtie2 | See version 1 manual and version 2 beta manual. The myrna pipeline is also available. |
| boxshade | Makes shaded multiple alignment files | See text documentation or type man boxshade |
| bwa | Burrows-Wheeler Aligner (BWA) is a program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. | See web manual page |
| CAF tools | Suite of programs for manipulating CAF (Common Assembly Format) sequence assemblies | See web site |
| cap3 | Sequence assembly program | See manual; see also gap4/gap5 in Staden package |
| CEGMA | Pipeline that identifies a core set of eukaryotic genes. It uses WU-BLAST, HMMER 2, geneid and Wise2. This may help check the completeness of a genome assembly. It could be used to train eg snap. May be used in conjunction with Maker. Local wrapper command run_cegma takes same arguments as cegma | See the text manual and public web site |
| CLC Bio Genomics Workbench | A comprehensive package for analyzing next generation sequencing data | See website. Restricted availability under individual licences. Please email me. |
| clump | Program using Monte Carlo method for assessing significance of case-control association studies with multi-allelic markers. Can be used with HTcondor | See text documentation and example submit file for use with 10 input files in.0 to in.9 See also the HTcondor guide. |
| clumpp | A program that deals with label switching and multimodality problems in population-genetic cluster analyses. Command clumpp | See PDF manual |
| clustal | Multiple sequence alignment; clustalw2 is the current version. clustalx2 is an X-Windows version. clustalo is a special new fast version for proteins only. | See On-line Clustalw Manual, Clustalx manual [both slightly old: see also help inside the programs] and clustal omega manual. For further information see the web site. It is also available using HTcondor. See my local guide, which includes a clustalw example. |
| CNS | CNS (Crystallography and NMR System) is a program for macromolecular structure determination | CNS has been implemented using HTcondor. See the CNS web site |
| cnvnator | A program for CNV discovery and genotyping from depth of read mapping | See website |
| consed | A viewer for files made by phrap and an editor for these assemblies. Requires X-windows. | See text documentation. This software requires files to be edited before use so get in touch if you need to use it. |
| consensus | Program to find the consensus in unaligned sequences | See online manual. See also wconsensus. |
| chromopainter and finestructure | Finding haplotypes in sequence data and identifying population structure using dense sequencing data. | See web site |
| Cufflinks | Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. | See brief text manual or full manual on web site |
| densitree | Program for qualitative analysis of sets of trees. | See PDF Manual |
| DL_POLY | A package for molecular dynamics simulations | This package was implemented on the cluster using mpi. See the web site for general information. It is likely you will want to use the HPC service for DL_POLY to get better performance as it is suited to highly parallel systems. |
| dotter | A graphical dotplot program for detailed comparison of two sequences; Requires X-Windows | See web site |
| edena | Edena (Exact DE Novo Assembler) is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer. Previous version now edena2 | See PDF manual (version 3), version 2 manual and web site |
| eigensoft | Programs related to ancestrymap (see above) from the Reich Lab for studying human history, evolution and disease gene mapping | See eigenstrat readme, popgen readme, convertf readme and web site |
| EMBOSS | The European Molecular Biology Open Software Suite version 6.3.1 is available. Type emboss6 to over-ride use of version 6.1.0 installed with the operating system. | You can use the web interface EMBOSS Explorer if you ask for Raven access. See the on-line program manual for EMBOSS. |
| FAR | Flexible adapter remover software using Needleman Wunsch alignment. Works on many formats of sequencing reads. Command far | Sourceforge web site has vanished. |
| FASTA (including SSEARCH etc) | Sequence database searching package. Slower than blast. May be most useful for some protein searches. | You must specify which program you need (fasta, fastx, tfastx etc). See the fasta manual. SSEARCH is a Smith-Waterman search within the package. The raw binaries in the current version are fasta36 etc. |
| fastphase | Software for haplotype reconstruction, and estimating missing genotypes from population data | See PDF manual and web site. See also phase below. |
| FASTX-Toolkit | A collection of tools for short reads fasta/fastq file preprocessing. | See web site |
| frappe | A program for estimating individual ancestry and admixture proportions using high-density SNP data | See PDF manual and website |
| Fugue | Package for recognizing distant homologues of proteins by sequence-structure comparison. Initialised with command fugue | See web site for more information and tutorial. If that server is down you can consult fugue tutorial, fugue output interpretation, fugue command line use locally. |
| Genehunter | Program to do multipoint linkage analysis; command gh | See postscript documentation or web site |
| Glimmer | Takes a sequence and a set of Markov models for genes and outputs a list of ORFS. | See glimmer, build-icm and long-orfs text documentation |
| GMT | Generic Mapping Tools package for manipulating cartesian datasets | See website |
| GoMiner | GoMiner is a tool for biological interpretation of omic data including data from gene expression microarrays. It is run in X-Windows with the command gominer. | gominer can be set up with a local copy of the GO database, which makes it much faster. To use this, you must specify the database in the menu item File>Load GO terms as jdbc:mysql://localhost/go and the username and password are both access. There is also a high throughput version available if anyone is interested. The web site has more information including a PowerPoint tour of the main features |
| Gossamer | New de novo short read assembler using less RAM. Command goss or gossple.sh for front-end script. | See manual and manual for script |
| gromacs | A molecular dynamics simulation package | See the web site. This has been implemented on CamGrid. |
| hmmer | Sequence analysis with profile Hidden Markov Models. Can search our BLAST format databases. | For version 3 see the PDF manual. For version 2 see man hmmer. An example of how to run many hmmpfam searches using HTcondor is given: the submit file and shell script are used with a file containing the tarred gzipped sequences. |
| hyphy | A scriptable package for evolutionary modelling. | See web site. An MP and an MPI version have been built. Local expert is Simon Frost. |
| IGV | igv: integrative genomics viewer and igvtools | See web site |
| ihs | Integrated Haplotype Score test | See readme file and web site |
| im, ima and ima2 | Programs for population genetic analysis | See web site |
| inGAP | Integrated Next-gen Genome Analysis Platform run with /biosoft/inGAP_linux64/inGAP | See online manual |
| instruct | An alternative to structure (see below) | See PDF manual. Available also as an HTcondor compiled binary. |
| iprscan | Interface to Interpro database (see below). Can be used for one or more types of search at a time eg transmembrane protein, signal protein, Pfam, SMART. | See iprscan readme file and FAQs |
| IQPNNI | (Important Quartet Puzzling and NNI Operation) program to reconstruct a phylogenetic tree from DNA or amino acid sequence data. | See the manual. This has been set up to use HTcondor. Here is an example HTcondor submit script for the sequential version. |
| Jalview | Multiple sequence alignment editor; can read MSF, CLUSTAL, FASTA, BLC,MSP and PIR formats. Requires X-Windows. | See web site |
| jmodeltest | A phylogeny program for selecting the model of nucleotide substitution that best fits the data | See PDF guide. Supersedes modeltest. |
| joy | Program to annotate protein sequence alignments with 3D structural features | See web site |
| Linkage analysis | A range of linkage analysis software, previously started with the command linkage, is now installed on demand | See package details |
| lucy | Program for cleaning sequence data | See web site |
| mafft | Multiple alignment program | See web site and NAR paper |
| Maker | Genome annotation pipeline with command-line and web interfaces. This uses, or may use, other software including apollo, augustus, exonerate, gbrowse and snap that are not all documented elsewhere on this page at present. | We are now using the command-line interface for preference. The wrappers are run_maker and run_maker_mpi. See also the public web site. |
| Mapmaker/sibs | Software package which allows very rapid multipoint mapping of loci in nuclear pedigrees with two or more affected/phenotyped sibs. Command sibs | See Postscript documentation |
| maq | Maq stands for Mapping and Assembly with Quality. It builds assembly by mapping short reads to reference sequences. See also BWA and Bowtie. | See web site |
| MATLAB | Well-known package for technical computing. | See web site |
| mega2 | A data-handling program for facilitating genetic linkage and association analyses | See web site |
| merlin | Linkage analysis program using sparse trees to represent gene flow in pedigrees; one of the fastest pedigree analysis packages | See Text documentation or web site with tutorial |
| MetaVelvet | Modified version of Velvet (see below) for metagenomics. | See website |
| mira | Whole Genome Shotgun and EST sequence assembler | See web site |
| mitoprot | Prediction of mitochondrial targeting sequences | See web site |
| MOCAT | Package for analyzing metagenomics datasets | See website |
| modeller | Protein structure modelling program | See online manual |
| modeltest | A phylogeny program for selecting the model of nucleotide substitution that best fits the data | See PDF guide. See also jmodeltest. |
| Molphy | Phylogeny package | No documentation. See the options when you type: protml, protdst, nucml, nucst, njdist, totalml. See also Phylip and PAUP |
| molscript | Molecular graphics program | See the on-line manual. |
| MrBayes | Program for the Bayesian estimation of phylogeny. | See PDF manual or PDF command reference manual. You can run parallel jobs most easily locally on the server. Start mpd at nice +10 then run with mpirun -np 4 /biosoft/bin/mb [input file]. There is an example MrBayes batch file to use with this (ie as a file called mb1.txt). You can restart a job using the checkpoint file with MrBayes 3.2. |
| mrcanavar | Copy number caller that analyzes whole-genome NGS mapping read depth to discover large segmental duplications and deletions. | Also mrfast and mrsfast. See website |
| MSPCrunch | A Blast enhancement filter, used with blixem and blx | See blixem |
| mummer | A system for rapidly aligning entire genomes | See man mummer. For further information see the web site. |
| muscle | Program to do multiple alignment (MUSCLE stands for multiple sequence comparison by log-expectation). | See web site |
| naccess | Program to calculate atomic solvent accessible areas of proteins | Users MUST cite "Hubbard, S.J. & Thornton, J.M. (1993), 'NACCESS', Computer Program, Department of Biochemistry and Molecular Biology, University College London." in any publications. See text documentation (command naccess is already defined for you). |
| nmica | Pattern discovery system aimed at finding transcription factor binding sites and similar motifs. | See the PDF manual. |
| Novocraft | Package for aligning short reads to reference genomes | See PDF manual |
| Octave | GNU Octave is a high level language for numerical calculations and is mostly compatible with MATLAB. | This had been installed so it could be used on CamGrid but NB MATLAB is now available under the University site licence and can be used on servers with a suitable licence. See web site |
| Oligoarray | Software that computes gene specific oligonucleotides for genome-scale oligonucleotide microarray construction | See website. The software is in /biosoft/OligoArray2_1/ |
| PAML | PAML stands for Phylogenetic Analysis by Maximum Likelihood | See PAML User Guide, MCMCTree Guide and FAQs. An example submit file to use this with HTcondor is here |
| Pathway Tools | Pathway Tools is a comprehensive symbolic systems biology software system available via two interfaces | See Web site |
| patser | Program to score the words of a sequence against an alignment matrix | See text documentation. |
| PAUP | Software package for inference of evolutionary trees (command paup) | See Quick Start Guide (PDF) and Command Reference Manual (PDF). |
| pedcheck | Program for detecting marker typing incompatibilities in pedigree data | See web site |
| phase | A Bayesian statistical method for reconstructing haplotypes from population genotype data | See PDF manual and web site. See also fastphase above |
| phrap | An assembly program for shotgun DNA sequences. Comes with crossmatch and swat sequence comparison programs. | See phrap and general text documents |
| phred | Interprets sequence traces and assesses their quality, outputting to fasta or other formatted files. Command phred | See text documentation |
| Phylip | Phylogenetic analysis package with many programs | See the on-line Phylip manual. It can also be used with HTcondor. See my local guide, which includes a Phylip example. |
| phyml | Software implementing a method for building phylogenies from DNA and protein sequences using maximum likelihood. Command phyml | See Online PDF manual. You can run this with HTcondor. See guide. Here is an example submit file. |
| Phylobayes | A Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction using protein alignments. Compared to other phylogenetic MCMC samplers (e.g. MrBayes), the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT. CAT is a mixture model especially devised to account for site-specific features of protein evolution. It is particularly well suited for large multigene alignments, such as those used in phylogenomics. | See manual. Here is an example submit script for running pb on CamGrid. You can make others for running the other programs. |
| plink | A whole genome association analysis toolset | See web site |
| Picard tools | Java-based tools for manipulating SAM files. | See web site. The jar files are in /biosoft/src/picard-1.92 (or latest version) |
| PolyPhred | Package for identifying SNPs. Used with phred, phrap and consed; commands polyphred and sudophred. | See polyphred manual |
| primer3_core | Primer design program | See text documentation or man primer |
| probcons | Program for multiple alignment of protein sequences | See PDF manual |
| ProtTest | Program to select the best-fit model of protein evolution (sibling program to jmodeltest for DNA). Command runProtTest for commandline or runXProtTest for XWindows. | See Version 2.4 PDF manual. Prottest 3.0 beta available: ask if required. |
| psipred | Protein secondary structure prediction using command runpsipred | See web site |
| pymol | Molecular visualization program | See web site |
| qsra | Quality Value Guided Short Read Assembler | See web site |
| quicktree | Program for the rapid reconstruction of phylogenies by the Neighbor-Joining method | See web site |
| R | The R language and a large number of statistical genetics packages that run in R are available. | See local web documents and links therein. Please ask if you need more packages installed. |
| rasmol | Molecular viewing program also available for a range of other platforms. | See the online-manual. |
| Raster3D | Molecular graphics package | See PDF and HTML manuals |
| RAxML | (Randomized accelerated Maximum Likelihood) program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees | See PDF Manual and web site. raxmlHPC-PTHREADS-SSE3 (version 7.2.8) is current latest installed. raxmlLight-PTHREADS is the Light version, which has its own manual. |
| RepeatMasker | A program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. Installed in /biosoft/RepeatMasker. | This version uses rmblast. See web site |
| RepeatScout | Software to identify repeat family sequences from genomes. Can use output as input to RepeatMasker. | See text manual |
| ROADTRIPS | Program that performs single-SNP, case-control association testing in samples with partially or completely unknown population and pedigree structure; command roadtrips | See PDF manual |
| Roche 454 software | A package for analyzing various types of data from Roche 454 sequencers eg as obtained from the Department of Biochemistry DNA Sequencing Facility. Includes the newbler assembler. | Available to those with suitable data. Please email me. |
| SAM tools | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. | See project web site. |
| Scilab | A package similar to MATLAB for numerical computation | See web site and MATLAB above. |
| seaview | Multiple alignment editor; requires X-Windows | See seaview web site |
| sff2fastq | Converts between Roche 454 and fastq formats | See web site for details. |
| sff_extract | Extracts reads from 454 sff files and stores in fasta xml or caf text files. | See web site for details of usage. |
| Shogun | A large-scale machine learning toolbox | This has been installed for use on CamGrid. See web site |
| sickle | A windowed adaptive trimming tool for FASTQ files using quality | See website |
| signalp | Predicts the presence and location of signal peptide cleavage sites in amino acids from different organisms. | See the public web site |
| simwalk2 | A statistical genetics application for haplotype, parametric or non-parametric linkage, identity by descent and mistyping analyses | See web site. |
| smoldyn | Program to perform cell-scale simulation, implemented on CamGrid. | See web site. Model submit files are available. |
| splink | Program for linkage analysis using affected sib pairs | See text documentation |
| SOAPdenovo | Short read assembler that can build a de novo draft assembly for human-sized genomes. It is specially designed to assemble Illumina GA short reads. | See web site. |
| SRA toolkit | Set of programs for converting NCBI SRA archive format into various other formats. | See web site |
| ssaha2 | Package for mapping DNA sequencing reads onto a genomic reference sequence | See PDF Manual |
| stacks | Software pipeline for building loci out of a set of short-read sequenced samples. | See web site. |
| Staden | Used mainly for sequencing projects but also has other functions. Initialize with the command staden_new for the current X-Windows version including gap5. staden sets up the previous version. | See online Staden manual. Note (above) that spin can be used as an interface to EMBOSS under X-Windows. |
| stampy | stampy is a package for mapping short Illumina reads onto a reference genome. It can be used for genomic resequencing, RNA-Seq and chip-seq. It is good for mapping with reads containing sequence variations relative to the reference, eg insertions or deletions, including highly divergent species. | See manual and web site. |
| strat | Companion software for structure (below). | See web site. |
| structure | Program that implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers | See PDF manual. This is a prototype HTcondor submit file. An automated submission system for HTcondor can be used with the command run_structure_condor A version that uses extraparams is run_structure_condor_e. run_structure_condor_vm lets you use a different mainparams file name. See also chromopainter/finestructure. |
| surfnet | A program that generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file | Users MUST cite "Laskowski R A (1995). SURFNET: A program for visualizing molecular surfaces, cavities and intermolecular interactions. J. Mol. Graph., 13, 323-330." in any publications. See manual and web site |
| T-Coffee | Multiple alignment package including t_coffee and other programs, also using many other packages. | See T-Coffee web site |
| tachyon | Extremely fast ray tracing | See web site |
| tmhmm | Program for the prediction of transmembrane regions in proteins | See manual |
| tophat | A fast splice junction mapper for RNA-seq reads that uses bowtie (see above) | See web site |
| tpatterns | Search nucleic acid databases with a [protein] pattern in all six reading frames; analogous to GCG findpatterns. | See man tpatterns |
| transmit | Program for transmission disequilibrium testing | See text documentation |
| Tree-Puzzle | Program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood; command puzzle | See PDF manual. To run the parallel version use the command mpirun -np 4 /biosoft/bin/ppuzzle. Here is an example submit script for HTcondor using the parallel version and one using a condor_compiled binary instead of parallel. The latter is more flexible. A sample input file shows the type of syntax to use to toggle options. |
| trf | Tandem repeat finder | See information on the public web site |
| vcftools | A package for dealing with VCF (Variant Call Format) files eg from the 1000 genomes project. It provides methods for validating, merging, comparing and calculating some basic population genetic statistics. | See web site. |
| Velvet | Sequence assembler for very short reads: commands velveth and velvetg | See manual (PDF). The additional program for transcriptome assembly is oases: see manual. |
| usearch | Program for sequence clustering, database search and chimeric sequence detection | See PDF manual |
| ViennaRNA | Package for RNA secondary structure prediction and comparison | See Web site |
| wconsensus | Program to find the consensus in unaligned sequences | Differs from consensus in that it will find the width of the pattern being sought. See online manual |
| Wise 2 | Comparison of DNA and protein sequences, especially DNA at the level of protein translation allowing simultaneous comparison of gene structure with homology based alignment. The two programs that are easily used are genewise and estwise. | Online documentation at EBI. The default format expected is fasta. Note we have more than one version installed. Ask about this if you need to know more. |
| xpehh | Cross Population Extended Haplotype Homozygosity test | See readme file and web site |
| xplor-nih | XPLOR-NIH is a structure determination program which builds on the X-PLOR program, including additional tools developed at the NIH. | See web site. Users must cite "C.D. Schwieters, J.J. Kuszewski, N. Tjandra, G.M. Clore, 'The Xplor-NIH NMR Molecular Structure Determination Package,' J. Magn. Res., 160, 66-74 (2003)." in publications. |
| Others | There are a number of programs in /biosoft/bin that may not appear to be documented here but belong to one of the packages mentioned here or were installed as supporting programs with a package described here. If in doubt please ask. | For news of changes see Bioinformatics News Page |
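
Before asking for a program to be installed, it can be worth checking whether its command is already on your path. The following is only a minimal sketch for Bourne-type shells (it will not work as written in [t]csh). The function name check_tools is made up for illustration, and the command names passed to it are examples taken from the table above; samtools is assumed to be the command name for SAM tools.

```shell
#!/bin/sh
# Report, for each command name given, whether it is found on PATH.
# check_tools is a hypothetical helper, not something installed on the servers.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Print the name, the word "found" and the resolved location.
            printf '%s\tfound\t%s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%s\tmissing\n' "$tool"
        fi
    done
}

# Example: check a few commands mentioned in the table.
check_tools blastp bowtie2 samtools velveth phyml
```

A command reported as missing may still be installed somewhere that is not on your path (for example under /biosoft), so if in doubt please ask.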
