Table 2 Most used open-source software and reference databases in genomic, transcriptomic, and metagenomic studies

From: Bioinformatics for agriculture in the Next-Generation sequencing era

| Category | Task | Name | Aims and Scope | Usage | Reference |
|---|---|---|---|---|---|
| Software and pipelines | Reads pre-processing | FastQC | Quality check and report of NGS data | GM | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| | | cutadapt | Adapter trimming algorithm | GM | [95] |
| | | FASTX-toolkit | Toolset for manipulation of sequence data and format conversion | GM | http://hannonlab.cshl.edu/fastx_toolkit/index.html/ |
| | Assembly | (META) VELVET/OASES | De novo genomic/transcriptomic assembly based on the de Bruijn graph | GM | [96, 97] |
| | | SOAP DE NOVO | De novo short-read assembler based on the de Bruijn graph | G | [98] |
| | | TRINITY | De novo assembly of RNA-seq data | G | [99] |
| | Gene prediction/annotation | Ensembl genome annotation | Gene annotation pipeline | G | http://www.ensembl.org/info/genome/genebuild/genome_annotation.html/ |
| | | Infernal | RNA secondary structure prediction based on reference multiple sequence alignments | G | [100] |
| | | (Meta) Genemark | Gene prediction with unsupervised and semi-supervised training | GM | [101] |
| | | (Meta) Genomethreader | Gene prediction by similarity with cDNA/EST and/or protein sequences | GM | [102] |
| | | NCBI genome annotation | Genome annotation pipeline released by NCBI | G | http://www.ncbi.nlm.nih.gov/books/NBK169439/ |
| | | tRNAscan-SE | tRNA gene prediction | G | [103] |
| | | Repeat masker | Similarity-based detection of interspersed DNA repeats and low-complexity sequences | G | http://www.repeatmasker.org/ |
| | Mapping | Star | RNA-seq to genome aligner | G | [104] |
| | | Tophat/cufflinks | RNA-seq to genome aligner and quantification tools | G | [105] |
| | Marker-based metagenome | Mothur | Tools and software for 16S data clustering, classification, and ecological inference | M | [106] |
| | | Qiime | Customizable pipeline for marker-gene-based metagenomics | M | [107] |
| | | RDPipeline | RDP-based web interface for bacterial and fungal ribosomal marker gene analysis | M | [108] |
| | Mixed | Galaxy | General-purpose web-based platform | GM | [109] |
| | | transPLANT | e-infrastructure for exploring genomic data from crop and model plants | G | http://www.transplantdb.eu/ |
| | Shotgun metagenome | Megan | Stand-alone BLAST output parser and mining tool for phylogenetic and functional assignment based on the lowest common ancestor algorithm | M | [110] |
| | | Metamos | Customizable pipeline for shotgun data assembly and analysis | M | [111] |
| | | (Mg-)Rast | Fully automated online server for analyses of shotgun data | GᵃM | [112] |
| | Population genomic | Metabel | Software for meta-analysis of genome-wide SNP association | G | [113] |
| | | Metal | Tool for mining variation data and performing association studies | G | [114] |
| | | Plink | Tools for managing genomic variation data | GM | [115] |
| | | SVS | Genomic and phenotypic data analysis and visualization | G | http://www.goldenhelix.com |
| | | Tassel | Tools and pipelines for genome variation studies | G | [116] |
| | | VcfTools | Tools for genome comparisons and mining plant variation data | GM | [117] |
| Reference Databases | General | Genomes online database | Metadata repository for genome and metagenome sequencing projects | GM | https://gold.jgi.doe.gov/ |
| | | JGI Phytozome | Plant Comparative Genomics at the Joint Genome Institute | G | http://phytozome.jgi.doe.gov/pz/portal.html |
| | | INSDC | DDBJ, EMBL-EBI, and NCBI, common repository | GM | http://www.insdc.org/ |
| | | PLANTGDB | Unified plant genomic database | G | http://pgdbj.jp/ |
| | Taxonomic annotation | RDP/Silva/Greengenes | Repositories of ribosomal RNA genes | GM | [118–120] |
| | Functional annotation | KEGG | Integrated resources for functional annotation of genes | GM | [121] |
| | | COG | Clusters of orthologous groups | GM | [122] |
| | | SEED | Integrated resources for functionally annotated microbial genes | GᵃM | [123] |
| | | RFAM | RNA families collection | G | [124] |
| | | DFAM | Repetitive DNA elements collection | G | [125] |
| | | UNIPROT | Database of functionally annotated protein sequences | G | http://www.uniprot.org/ |
  1. G: used in genomics and transcriptomics; M: used in metagenomics
  2. ᵃ Dedicated to microbial genomes