Table 2. Most used open-source software and reference databases in genomic, transcriptomic, and metagenomic studies

| Category | Task | Name | Aims and Scope | Usage | Reference |
| --- | --- | --- | --- | --- | --- |
| Software and pipelines | Reads pre-processing | FastQC | Quality check and report of NGS data | G, M | |
| | | cutadapt | Adapter trimming algorithm | G, M | |
| | | FASTX-toolkit | Toolset for manipulation of sequence data and format conversion | G, M | |
| | Assembly | (META) VELVET/OASES | De novo genomic/transcriptomic assembly based on the de Bruijn graph | G, M | |
| | | SOAP DE NOVO | De novo short-read assembler based on the de Bruijn graph | G | |
| | | TRINITY | De novo assembly of RNA-seq data | G | |
| | Gene prediction/annotation | Ensembl genome annotation | Gene annotation pipeline | G | |
| | | Infernal | RNA secondary structure prediction based on reference multiple sequence alignments | G | |
| | | (Meta) Genemark | Gene prediction with unsupervised and semi-supervised training | G, M | |
| | | (Meta) Genomethreader | Gene prediction by similarity with cDNA/EST and/or protein sequences | G, M | |
| | | NCBI genome annotation | Genome annotation pipeline released by NCBI | G | |
| | | tRNAscan-SE | tRNA gene prediction | G | |
| | | Repeat masker | Similarity-based detection of interspersed DNA repeats and low-complexity sequences | G | |
| | | Star | RNA-seq to genome aligner | G | |
| | | Tophat/cufflinks | RNA-seq to genome aligner and quantification tools | G | |
| | | Mothur | Tools and software for 16S data clustering, classification, and ecological inference | M | |
| | | Qiime | Customizable pipeline for marker-gene-based metagenomics | M | |
| | | RDPipeline | RDP-based web interface for bacterial and fungal ribosomal marker gene analysis | M | |
| | | Galaxy | General-purpose web-based platform | G, M | |
| | | transPLANT | e-infrastructure for exploring genomic data from crop and model plants | G | |
| | | Megan | Stand-alone BLAST output parser and mining tool for phylogenetic and functional assignment based on the lowest common ancestor algorithm | M | |
| | | Metamos | Customizable pipeline for shotgun data assembly and analysis | M | |
| | | (Mg-)Rast | Fully automated online server for analyses of shotgun data | Gᵃ, M | |
| | | Metabel | Software for meta-analysis of genome-wide SNP association | G | |
| | | Metal | Tool for mining variation data and performing association studies | G | |
| | | Plink | Tools for managing genomic variation data | G, M | |
| | | SVS | Genomic and phenotypic data analysis and visualization | G | |
| | | Tassel | Tools and pipelines for genome variation studies | G | |
| | | VcfTools | Tools for genome comparisons and mining plant variation data | G, M | |
| Reference Databases | General | Genomes online database | Metadata repository for genome and metagenome sequencing projects | G, M | |
| | | JGI Phytozome | Plant Comparative Genomics at the Joint Genome Institute | G | |
| | | INSDC | Common repository of DDBJ, EMBL-EBI, and NCBI | G, M | |
| | | PLANTGDB | Unified plant genomic database | G | |
| | | RDP/Silva/Greengenes | Repositories of ribosomal RNA genes | G, M | |
| | | KEGG | Integrated resources for functional annotation of genes | G, M | |
| | | COG | Clusters of Orthologous Groups | G, M | |
| | | SEED | Integrated resources for functionally annotated microbial genes | Gᵃ, M | |
| | | RFAM | RNA families collection | G | |
| | | DFAM | Repetitive DNA elements collection | G | |
| | | UNIPROT | Database of functionally annotated protein sequences | G | |
  1. G: use in genomics and transcriptomics; M: use in metagenomics
  2. ᵃ Dedicated to microbial genomes