Bioinformatics and Computational Biology Services

Software available

The following table lists the software packages that are available. If there is something not on this list that you would like us to have, if you need more help with running a program, or if you notice a newer version of software that I ought to install, please send me an email. Most general software is not listed here (except R, because it has many bioinformatics packages installed). You will find that perl, python, ruby, standard compilers and a wide range of other applications are available, but you can always ask if you need more or find that we have an out-of-date copy. (In some cases you may even need an older copy; this can happen with, e.g., python.) Some people run software not listed here: both their own code and software they have obtained elsewhere.

Package Description Further information
ABySS A de novo short read assembler for small or large genomes. See web site.
analysis Package for evolutionary genetic analysis. See web site.
ancestrymap Screens through the genome in a recently mixed population such as African Americans, searching for segments with increased ancestry from one of the ancestral populations, which can indicate the position of disease genes. See web site.
aqua AQUA: automatic quality improvement for multiple sequence alignment; started with the command aqua. See text help, but note that the local command is aqua.
Artemis For viewing and analysing (eg with blast) DNA/protein sequences and feature tables. Start with command art . The Artemis Comparison Tool is started with act. Both require X-Windows. See the web site.
To customize artemis copy the file /biosoft/artemis/etc/options to your directory and edit it. Or ask for help. The most recent version has not been set up for blast etc.
Augustus A program, and associated scripts, to predict genes in eukaryotic genomic sequences. There are various ways to perform training using BLAT, PASA and SCIPIO and the authors may also assist you with this. See web site.
Babel Open Babel succeeds Babel: program and library to interconvert between many file formats used in structural studies. Includes babel, obfit, obgrep and obrotate. See man babel and web site. Note the version installed is that available with the operating system but newer versions may be installed if required.
bamtools Utility for working on BAM files, analogous to samtools (see below), thus avoiding the need to store and work on larger text SAM files where possible. See paper.
bayenv Bayesian method using environmental correlations to identify loci underlying local adaptation. See web site.
BEAST BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. See the web site. NB BEAST can start from a defined user tree (cf MrBayes). Parallel BEAST is available. Example command-line: nice +10 beast -working -beagle -beagle_instances 8 test.xml >& log. There is also now the BEAST2 package with commands set up as beast2 etc. This can restart a chain and has its own web site.
BEST Bayesian Estimation of Species Trees works with MrBayes (see below). See web site.
Bioconductor A large collection of packages used within R for carrying out very many functions in the analysis and comprehension of high-throughput genomic data. See web site. There are also many non-Bioconductor R packages not listed in this table. Recently not all new Bioconductor packages have been installed. If you would like packages installed for you centrally please ask. Some people also have their own R packages installed locally (you define an R user library path).
biogrep Optimised version of grep for matching patterns against sequences. See web site.
BioPerl Set of modules to help write bioinformatics Perl scripts, with some functional scripts as well. See web site.
Biopieces Set of command-line tools that can be put together to create pipelines. See website.
BioPython Analogous to BioPerl. Also we have biopy, numpy and scipy. Note that on some servers there is more than one version of Python and also BioPython, as versions with the OS may not be the newest. Ask for advice if required or if you need more Python modules/scripts installed. See website for general information.
BLAST Sequence database searching package available in several versions. The new BLAST+ package uses separate names blastp, blastn etc. The previous NCBI names blastall, blastpgp, bl2seq etc are still available alongside. Wu-BLAST is also available. The shared data are in /data4/blast or we can set up your data. BLASTDB is defined for [t]csh users. The new BLAST+ software is documented separately (PDF) from the old. You can use concatenated queries in FASTA format.
blat Blast-like Alignment Tool to perform rapid mRNA/DNA and cross-species protein alignments. See FAQ on UCSC web site.
Bowtie Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. The new version has programs with a 2 in the name, eg bowtie2. See version 1 manual and version 2 beta manual. The myrna pipeline is also available.
boxshade Makes shaded multiple alignment files. See text documentation or type man boxshade.
bwa Burrows-Wheeler Aligner (BWA) is a program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. See web manual page.
CAF tools Suite of programs for manipulating CAF (Common Assembly Format) sequence assemblies. See web site.
cap3 Sequence assembly program. See manual; see also gap4/gap5 in Staden package.
CEGMA Pipeline that identifies a core set of eukaryotic genes. It uses WU-BLAST, HMMER 2, geneid and Wise2. This may help check the completeness of a genome assembly. It could be used to train eg snap. May be used in conjunction with Maker. Local wrapper command run_cegma takes the same arguments as cegma. See the text manual and public web site.
CLC Bio Genomics Workbench A comprehensive package for analyzing next generation sequencing data. See website. Restricted availability under individual licences. Please email me.
clump Program using Monte Carlo method for assessing significance of case-control association studies with multi-allelic markers. Can be used with HTcondor. See text documentation and example submit file for use with 10 input files in.0 to in.9. See also the HTcondor guide.
clumpp A program that deals with label switching and multimodality problems in population-genetic cluster analyses. Command clumpp. See PDF manual.
clustal Multiple sequence alignment; clustalw2 is the current version. clustalx2 is an X-Windows version. clustalo is a special new fast version for proteins only. See On-line Clustalw Manual, Clustalx manual [both slightly old: see also the on-line help inside the programs] and clustal omega manual. For further information see the web site. It is also available using HTcondor. See my local guide, which includes a clustalw example.
CNS CNS (Crystallography and NMR System) is a program for macromolecular structure determination. CNS has been implemented using HTcondor. See the CNS web site.
cnvnator A program for CNV discovery and genotyping from depth of read mapping. See website.
consed A viewer for files made by phrap and an editor for these assemblies. Requires X-windows. See text documentation. This software requires files to be edited before use so get in touch if you need to use it.
consensus Program to find the consensus in unaligned sequences. See online manual. See also wconsensus.
chromopainter and finestructure Finding haplotypes in sequence data and identifying population structure using dense sequencing data. See web site.
Cufflinks Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one. See brief text manual or full manual on web site.
densitree Program for qualitative analysis of sets of trees. See PDF Manual.
DL_POLY A package for molecular dynamics simulations. This package was implemented on the cluster using mpi. See the web site for general information. It is likely you will want to use the HPC service for DL_POLY to get better performance as it is suited to highly parallel systems.
dotter A graphical dotplot program for detailed comparison of two sequences; requires X-Windows. See web site.
edena Edena (Exact DE Novo Assembler) is an assembler dedicated to processing the millions of very short reads produced by the Illumina Genome Analyzer. Previous version now edena2. See PDF manual (version 3), version 2 manual and web site.
eigensoft Programs related to ancestrymap (see above) from the Reich Lab for studying human history, evolution and disease gene mapping. See eigenstrat readme, popgen readme, convertf readme and web site.
EMBOSS The European Molecular Biology Open Software Suite version 6.3.1 is available. Type emboss6 to over-ride use of version 6.1.0 installed with the operating system. You can use the web interface EMBOSS Explorer if you ask for Raven access. See On-line program manual for EMBOSS.
FAR Flexible adapter remover software using Needleman Wunsch alignment. Works on many formats of sequencing reads. Command far. Sourceforge web site has vanished.
FASTA (including SSEARCH etc) Sequence database searching package. Slower than blast. May be most useful for some protein searches. You must specify which program you need (fasta, fastx, tfastx etc). See the fasta manual. SSEARCH is a Smith-Waterman search within the package. The raw binaries in the current version are fasta36 etc.
fastphase Software for haplotype reconstruction, and estimating missing genotypes from population data. See PDF manual and web site. See also phase below.
FASTX-Toolkit A collection of tools for short reads fasta/fastq file preprocessing. See web site.
frappe A program for estimating individual ancestry and admixture proportions using high-density SNP data. See PDF manual and website.
Fugue Package for recognizing distant homologues of proteins by sequence-structure comparison. Initialised with command fugue. See web site for more information and tutorial. If that server is down you can consult fugue tutorial, fugue output interpretation, fugue command line use locally.
Genehunter Program to do multipoint linkage analysis; command gh. See postscript documentation or web site.
Glimmer Takes a sequence and a set of Markov models for genes and outputs a list of ORFs. See glimmer, build-icm and long-orfs text documentation.
GMT Generic Mapping Tools package for manipulating cartesian datasets. See website.
GoMiner GoMiner is a tool for biological interpretation of omic data including data from gene expression microarrays. It is run in X-Windows with the command gominer. gominer can be set up with a local copy of the GO database, which makes it much faster. To use this, you must specify the database in the menu item File>Load GO terms as jdbc:mysql://localhost/go and the username and password are both access. There is also a high throughput version available if anyone is interested. The web site has more information including a PowerPoint tour of the main features.
Gossamer New de novo short read assembler using less RAM. Command goss, or gossple.sh for front-end script. See manual and manual for script.
gromacs A molecular dynamics simulation package. See the web site. This has been implemented on CamGrid.
hmmer Sequence analysis with profile Hidden Markov Models. Can search our BLAST format databases. For version 3 see the PDF manual. For version 2 see man hmmer. An example of how to run many hmmpfam searches using HTcondor is given: the submit file and shell script are used with a file containing the tarred gzipped sequences.
hyphy A scriptable package for evolutionary modelling. See web site. An MP and an MPI version have been built. Local expert is Simon Frost.
IGV igv: Integrative Genomics Viewer, and igvtools. See web site.
ihs Integrated Haplotype Score test. See readme file and web site.
im, ima and ima2 Programs for population genetic analysis. See web site.
inGAP Integrated Next-gen Genome Analysis Platform, run with /biosoft/inGAP_linux64/inGAP. See online manual.
instruct An alternative to structure (see below). See PDF manual. Available also as an HTcondor compiled binary.
iprscan Interface to Interpro database (see below). Can be used for one or more types of search at a time, eg transmembrane protein, signal protein, Pfam, SMART. See iprscan readme file and FAQs.
IQPNNI (Important Quartet Puzzling and NNI Operation) program to reconstruct a phylogenetic tree from DNA or amino acid sequence data. See the manual. This has been set up to use HTcondor. Here is an example HTcondor submit script for the sequential version.
Jalview Multiple sequence alignment editor; can read MSF, CLUSTAL, FASTA, BLC, MSP and PIR formats. Requires X-Windows. See web site.
jmodeltest A phylogeny program for selecting the model of nucleotide substitution that best fits the data. See PDF guide. Supersedes modeltest.
joy Program to annotate protein sequence alignments with 3D structural features. See web site.
Linkage analysis A range of linkage analysis software, previously started with the command linkage, is now installed on demand. See package details.
lucy Program for cleaning sequence data. See web site.
mafft Multiple alignment program. See web site and NAR paper.
Maker Genome annotation pipeline with command-line and web interfaces. This uses, or may use, other software including apollo, augustus, exonerate, gbrowse and snap that are not all documented elsewhere on this page at present. We are now using the command-line interface for preference. Wrapper run_maker or run_maker_mpi. See also the public web site.
Mapmaker/sibs Software package which allows very rapid multipoint mapping of loci in nuclear pedigrees with two or more affected/phenotyped sibs. Command sibs. See Postscript documentation.
maq Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. See also BWA and Bowtie. See web site.
MATLAB Well-known package for technical computing. See web site.
mega2 A data-handling program for facilitating genetic linkage and association analyses. See web site.
merlin Linkage analysis program using sparse trees to represent gene flow in pedigrees; one of the fastest pedigree analysis packages. See Text documentation or web site with tutorial.
MetaVelvet Modified version of Velvet (see below) for metagenomics. See website.
mira Whole Genome Shotgun and EST sequence assembler. See web site.
mitoprot Prediction of mitochondrial targeting sequences. See web site.
MOCAT Package for analyzing metagenomics datasets. See website.
modeller Protein structure modelling program. See online manual.
modeltest A phylogeny program for selecting the model of nucleotide substitution that best fits the data. See PDF guide. See also jmodeltest.
Molphy Phylogeny package. No documentation. See the options when you type: protml, protdst, nucml, nucst, njdist, totalml. See also Phylip and PAUP.
molscript Molecular graphics program. See the on-line manual.
MrBayes Program for the Bayesian estimation of phylogeny. See PDF manual or PDF command reference manual. You can run parallel jobs most easily locally on the server. Start mpd at nice +10 then run with mpirun -np 4 /biosoft/bin/mb [input file]. There is an example MrBayes batch file to use with this (ie as a file called mb1.txt). You can restart a job using the checkpoint file with MrBayes 3.2.
mrcanavar Copy number caller that analyzes whole-genome NGS mapping read depth to discover large segmental duplications and deletions. Also mrfast and mrsfast. See website.
MSPCrunch A Blast enhancement filter, used with blixem and blx. See blixem.
mummer A system for rapidly aligning entire genomes. See man mummer. For further information see the web site.
muscle Program to do multiple alignment (MUSCLE stands for multiple sequence comparison by log-expectation). See web site.
naccess Program to calculate atomic solvent accessible areas of proteins. Users MUST cite "Hubbard, S.J. & Thornton, J.M. (1993), 'NACCESS', Computer Program, Department of Biochemistry and Molecular Biology, University College London." in any publications. See text documentation (command naccess is already defined for you).
nmica Pattern discovery system aimed at finding transcription factor binding sites and similar motifs. See the PDF manual.
Novocraft Package for aligning short reads to reference genomes. See PDF manual.
Octave GNU Octave is a high level language for numerical calculations and is mostly compatible with MATLAB. This had been installed so it could be used on CamGrid, but NB MATLAB is now available under the University site licence and can be used on servers with a suitable licence. See web site.
Oligoarray Software that computes gene specific oligonucleotides for genome-scale oligonucleotide microarray construction. See website. The software is in /biosoft/OligoArray2_1/
PAML PAML stands for Phylogenetic Analysis by Maximum Likelihood. See PAML User Guide, MCMCTree Guide and FAQs. An example submit file to use this with HTcondor is here.
Pathway Tools Pathway Tools is a comprehensive symbolic systems biology software system available via two interfaces. See Web site.
patser Program to score the words of a sequence against an alignment matrix. See text documentation.
PAUP Software package for inference of evolutionary trees (command paup). See Quick Start Guide (PDF) and Command Reference Manual (PDF).
pedcheck Program for detecting marker typing incompatibilities in pedigree data. See web site.
phase A Bayesian statistical method for reconstructing haplotypes from population genotype data. See PDF manual and web site. See also fastphase above.
phrap An assembly program for shotgun DNA sequences. Comes with crossmatch and swat sequence comparison programs. See phrap and general text documents
phred Interprets sequence traces and assesses their quality, outputting to fasta or other formatted files. Command phred. See text documentation.
Phylip Phylogenetic analysis package with many programs. See On-line Phylip Manual. It can also be used with HTcondor. See my local guide, which includes a Phylip example.
phyml Software implementing a method for building phylogenies from DNA and protein sequences using maximum likelihood. Command phyml. See Online PDF manual. You can run this with HTcondor. See guide. Here is an example submit file.
Phylobayes A Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction using protein alignments. Compared to other phylogenetic MCMC samplers (e.g. MrBayes), the main distinguishing feature of PhyloBayes is the underlying probabilistic model, CAT. CAT is a mixture model especially devised to account for site-specific features of protein evolution. It is particularly well suited for large multigene alignments, such as those used in phylogenomics. See manual. Here is an example submit script for running pb on CamGrid. You can make others for running the other programs.
plink A whole genome association analysis toolset. See web site.
Picard tools Java-based tools for manipulating SAM files. See web site. The jar files are in /biosoft/src/picard-1.92 (or latest version).
PolyPhred Package for identifying SNPs. Used with phred, phrap and consed; commands polyphred and sudophred. See polyphred manual.
primer3_core Primer design program. See text documentation or man primer.
probcons Program for multiple alignment of protein sequences. See PDF manual.
ProtTest Program to select the best-fit model of protein evolution (sibling program to jmodeltest for DNA). Command runProtTest for the command line or runXProtTest for X-Windows. See Version 2.4 PDF manual. ProtTest 3.0 beta available: ask if required.
psipred Protein secondary structure prediction using command runpsipred. See web site.
pymol Molecular visualization program. See web site.
qsra Quality Value Guided Short Read Assembler. See web site.
quicktree Program for the rapid reconstruction of phylogenies by the Neighbor-Joining method. See web site.
R The R language and a large number of statistical genetics packages that run in R are available. See local web documents and links therein. Please ask if you need more packages installed.
rasmol Molecular viewing program also available for a range of other platforms. See the online-manual.
Raster3D Molecular graphics package. See PDF and HTML manuals.
RAxML (Randomized accelerated Maximum Likelihood) program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. See PDF Manual and web site. raxmlHPC-PTHREADS-SSE3 (version 7.2.8) is the latest version currently installed. raxmlLight-PTHREADS is the Light version, which has its own manual.
RepeatMasker A program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. Installed in /biosoft/RepeatMasker. This version uses rmblast. See web site.
RepeatScout Software to identify repeat family sequences from genomes. Can use output as input to RepeatMasker. See text manual.
ROADTRIPS Program that performs single-SNP, case-control association testing in samples with partially or completely unknown population and pedigree structure; command roadtrips. See PDF manual.
Roche 454 software A package for analyzing various types of data from Roche 454 sequencers, eg as obtained from the Department of Biochemistry DNA Sequencing Facility. Includes the newbler assembler. Available to those with suitable data. Please email me.
SAM tools SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. See project web site.
Scilab A package similar to MATLAB for numerical computation. See web site and MATLAB above.
seaview Multiple alignment editor; requires X-Windows. See seaview web site.
sff2fastq Converts between Roche 454 and fastq formats. See web site for details.
sff_extract Extracts reads from 454 sff files and stores them in fasta, xml or caf text files. See web site for details of usage.
Shogun A large-scale machine learning toolbox. This has been installed for use on CamGrid. See web site.
sickle A windowed adaptive trimming tool for FASTQ files using quality. See website.
signalp Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. See the public web site.
simwalk2 A statistical genetics application for haplotype, parametric or non-parametric linkage, identity by descent and mistyping analyses. See web site.
smoldyn Program to perform cell-scale simulation, implemented on CamGrid. See web site. Model submit files are available.
splink Program for linkage analysis using affected sib pairs. See text documentation.
SOAPdenovo Short read assembler that can build a de novo draft assembly for human-sized genomes. It is specially designed to assemble Illumina GA short reads. See web site.
SRA toolkit Set of programs for converting NCBI SRA archive format into various other formats. See web site.
ssaha2 Package for mapping DNA sequencing reads onto a genomic reference sequence. See PDF Manual.
stacks Software pipeline for building loci out of a set of short-read sequenced samples. See web site.
Staden Used mainly for sequencing projects but also has other functions. Initialize with the command staden_new for the current X-Windows version including gap5. staden sets up the previous version. See online Staden manual. Note (above) that spin can be used as an interface to EMBOSS under X-Windows.
stampy stampy is a package for mapping short Illumina reads onto a reference genome. It can be used for genomic resequencing, RNA-Seq and ChIP-seq. It is good for mapping reads containing sequence variations relative to the reference, eg insertions or deletions, including reads from highly divergent species. See manual and web site.
stratCompanion software for structure (below). See web site.
structure Program that implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers. See PDF manual. This is a prototype HTcondor submit file. An automated submission system for HTcondor can be used with the command run_structure_condor. A version that uses extraparams is run_structure_condor_e. run_structure_condor_vm lets you use a different mainparams file name. See also chromopainter/finestructure.
surfnet A program that generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file. Users MUST cite "Laskowski R A (1995). SURFNET: A program for visualizing molecular surfaces, cavities and intermolecular interactions. J. Mol. Graph., 13, 323-330." in any publications. See manual and web site.
T-Coffee Multiple alignment package including t_coffee and other programs, also using many other packages. See T-Coffee web site.
tachyon Extremely fast ray tracing. See web site.
tmhmm Program for the prediction of transmembrane regions in proteins. See manual.
tophat A fast splice junction mapper for RNA-seq reads that uses bowtie (see above). See web site.
tpatternsSearch nucleic acid databases with a [protein] pattern in all six reading frames; analogous to GCG findpatterns. See man tpatterns
transmit Program for transmission disequilibrium testing. See text documentation.
Tree-Puzzle Program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood; command puzzle. See PDF manual. To run the parallel version use the command mpirun -np 4 /biosoft/bin/ppuzzle. Here is an example submit script for HTcondor using the parallel version and one using a condor_compiled binary instead of parallel. The latter is more flexible. A sample input file shows the type of syntax to use to toggle options.
trf Tandem repeat finder. See information on the public web site.
vcftools A package for dealing with VCF (Variant Call Format) files, eg from the 1000 genomes project. It provides methods for validating, merging, comparing and calculating some basic population genetic statistics. See web site.
Velvet Sequence assembler for very short reads: commands velveth and velvetg. See manual (PDF). The additional program for transcriptome assembly is oases: see manual.
usearch Program for sequence clustering, database search and chimeric sequence detection. See PDF manual.
ViennaRNA Package for RNA secondary structure prediction and comparison. See Web site.
wconsensus Program to find the consensus in unaligned sequences. Differs from consensus in that it will find the width of the pattern being sought. See online manual.
Wise 2 Comparison of DNA and protein sequences, especially DNA at the level of protein translation allowing simultaneous comparison of gene structure with homology based alignment. The two programs that are easily used are genewise and estwise. See online documentation at EBI. The default format expected is fasta. Note we have more than one version installed. Ask about this if you need to know more.
xpehh Cross Population Extended Haplotype Homozygosity test. See readme file and web site.
xplor-nih XPLOR-NIH is a structure determination program which builds on the X-PLOR program, including additional tools developed at the NIH. See web site. Users must cite "C.D. Schwieters, J.J. Kuszewski, N. Tjandra, G.M. Clore, The Xplor-NIH NMR Molecular Structure Determination Package, J. Magn. Res., 160, 66-74 (2003)." in publications.
Others There are a number of programs in /biosoft/bin that may not appear to be documented here but belong to one of the packages mentioned here or were installed as supporting programs with a package described here. If in doubt please ask. For news of changes see Bioinformatics News Page
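
The Bioconductor entry above mentions that you can define an R user library path for packages you install yourself. A minimal sketch in POSIX shell follows. R_LIBS_USER is the standard R environment variable for this; the directory $HOME/R/library is only an assumed example, and [t]csh users would use setenv rather than export.

```shell
# Hypothetical location for a personal R package library; any directory you own will do.
R_LIBS_USER="$HOME/R/library"
export R_LIBS_USER

# R ignores library paths that do not exist, so create the directory first.
mkdir -p "$R_LIBS_USER"

# Inside R you could then install into it, for example:
#   install.packages("somePackage", lib = Sys.getenv("R_LIBS_USER"))
echo "R user library: $R_LIBS_USER"
```

Putting the first two lines in your shell start-up file would make the setting apply to every R session.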
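
The BLAST entry above notes that BLAST+ accepts concatenated queries in FASTA format. The sketch below builds such a query file and, if blastp is on your path, searches it in one run. The two sequences are made-up placeholders, and the database name nr is an assumption: substitute whichever database under /data4/blast you need.

```shell
# Write a multi-FASTA query file; both sequences are invented placeholders.
cat > queries.fasta <<'EOF'
>query1
MKTAYIAKQRQISFVKSHFSRQ
>query2
MSDNGPQNQRNAPRITFGGPSD
EOF

# Database name is an assumption; override with: BLAST_DB_NAME=mydb
db="${BLAST_DB_NAME:-nr}"

# Run BLAST+ only where it is installed; -outfmt 6 gives tab-separated hits.
if command -v blastp >/dev/null 2>&1; then
    blastp -query queries.fasta -db "$db" -evalue 1e-5 -outfmt 6 \
        -out queries.blastp.tsv || echo "blastp failed (is database '$db' available?)" >&2
fi
```

With the older NCBI names the roughly equivalent call would be blastall -p blastp -i queries.fasta -d nr -e 1e-5 -m 8 -o queries.blastp.tsv.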
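
The clump entry above refers to an HTcondor submit file for 10 input files in.0 to in.9. That file is not reproduced here, so the following is only a sketch of what such a submit file might look like. The executable path /biosoft/bin/clump, the output file names and the way the input and output names are passed as arguments are all assumptions; check the clump text documentation for the real argument order.

```shell
# Write a hypothetical HTCondor submit description; the quoted 'EOF' stops the
# shell from expanding $(Process), which HTCondor replaces with 0..9 per job.
cat > clump.sub <<'EOF'
universe                = vanilla
executable              = /biosoft/bin/clump
arguments               = in.$(Process) out.$(Process)
transfer_input_files    = in.$(Process)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = clump.$(Process).stdout
error                   = clump.$(Process).stderr
log                     = clump.log
queue 10
EOF

echo "Wrote clump.sub; submit with: condor_submit clump.sub"
```

The queue 10 line starts ten jobs, so one submit file covers all ten inputs.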
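
The MrBayes entry above mentions a batch file called mb1.txt for use with mpirun. As a rough illustration only (this is not the local example file), the sketch below writes a small batch file. The data file name data.nex, the substitution model and the chain settings are all assumptions to adapt to your own analysis.

```shell
# Write a hypothetical MrBayes batch file. autoclose/nowarn stop MrBayes from
# prompting, which matters when it runs unattended under mpirun.
cat > mb1.txt <<'EOF'
begin mrbayes;
    set autoclose=yes nowarn=yes;
    execute data.nex;
    lset nst=6 rates=invgamma;
    mcmc ngen=100000 samplefreq=100 nchains=4;
    sump;
    sumt;
end;
EOF

# To run it as described in the MrBayes entry (not executed here):
#   mpirun -np 4 /biosoft/bin/mb mb1.txt
echo "Wrote mb1.txt"
```

For a restart from a checkpoint with MrBayes 3.2, the mcmc line would probably take append=yes, but check the command reference manual before relying on that.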