Table 2 Most used open-source software and reference databases in genomic, transcriptomic, and metagenomic studies

From: Bioinformatics for agriculture in the Next-Generation sequencing era

| Category | Task | Name | Aims and scope | Usage | Reference |
|---|---|---|---|---|---|
| Software and pipelines | Reads pre-processing | FastQC | Quality check and report of NGS data | GM | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| | | cutadapt | Adapter trimming algorithm | GM | [95] |
| | | FASTX-toolkit | Toolset for manipulation of sequence data and format conversion | GM | http://hannonlab.cshl.edu/fastx_toolkit/index.html/ |
| | Assembly | (META) VELVET/OASES | De novo genomic/transcriptomic assembly based on the de Bruijn graph | GM | [96, 97] |
| | | SOAP DE NOVO | De novo short-read assembler based on the de Bruijn graph | G | [98] |
| | | TRINITY | De novo assembly of RNA-seq data | G | [99] |
| | Gene prediction/annotation | Ensembl genome annotation | Gene annotation pipeline | G | http://www.ensembl.org/info/genome/genebuild/genome_annotation.html/ |
| | | Infernal | RNA secondary structure prediction based on reference multiple sequence alignments | G | [100] |
| | | (Meta) Genemark | Gene prediction with unsupervised and semi-supervised training | GM | [101] |
| | | (Meta) Genomethreader | Gene prediction by similarity with cDNA/EST and/or protein sequences | GM | [102] |
| | | NCBI genome annotation | Genome annotation pipeline released by NCBI | G | http://www.ncbi.nlm.nih.gov/books/NBK169439/ |
| | | tRNAscan-SE | tRNA gene prediction | G | [103] |
| | | Repeat masker | Similarity-based detection of interspersed DNA repeats and low-complexity sequences | G | http://www.repeatmasker.org/ |
| | Mapping | Star | RNA-seq to genome aligner | G | [104] |
| | | Tophat/cufflinks | RNA-seq to genome aligner and quantification tools | G | [105] |
| | Marker-based metagenome | Mothur | Tools and software for 16S data clustering, classification, and ecological inference | M | [106] |
| | | Qiime | Customizable pipeline for marker-gene-based metagenomics | M | [107] |
| | | RDPipeline | RDP-based web interface for bacterial and fungal ribosomal marker gene analysis | M | [108] |
| | Mixed | Galaxy | General-purpose web-based platform | GM | [109] |
| | | transPLANT | e-infrastructure for exploring genomic data from crop and model plants | G | http://www.transplantdb.eu/ |
| | Shotgun metagenome | Megan | Stand-alone BLAST output parser and mining tool for phylogenetic and functional assignment based on the lowest common ancestor algorithm | M | [110] |
| | | Metamos | Customizable pipeline for shotgun data assembly and analysis | M | [111] |
| | | (Mg-)Rast | Fully automated online server for analyses of shotgun data | GᵃM | [112] |
| | Population genomics | Metabel | Software for meta-analysis of genome-wide SNP association | G | [113] |
| | | Metal | Tool for mining variation data and performing association studies | G | [114] |
| | | Plink | Tools for managing genomic variation data | GM | [115] |
| | | SVS | Genomic and phenotypic data analysis and visualization | G | http://www.goldenhelix.com |
| | | Tassel | Tools and pipelines for genome variation studies | G | [116] |
| | | VcfTools | Tools for genome comparisons and mining plant variation data | GM | [117] |
| Reference databases | General | Genomes online database | Metadata repository for genome and metagenome sequencing projects | GM | https://gold.jgi.doe.gov/ |
| | | JGI Phytozome | Plant comparative genomics at the Joint Genome Institute | G | http://phytozome.jgi.doe.gov/pz/portal.html |
| | | INSDC | Common repository of DDBJ, EMBL-EBI, and NCBI | GM | http://www.insdc.org/ |
| | | PLANTGDB | Unified plant genomic database | G | http://pgdbj.jp/ |
| | Taxonomic annotation | RDP/Silva/Greengenes | Repositories of ribosomal RNA genes | GM | [118–120] |
| | Functional annotation | KEGG | Integrated resources for functional annotation of genes | GM | [121] |
| | | COG | Clusters of orthologous groups | GM | [122] |
| | | SEED | Integrated resources for functionally annotated microbial genes | GᵃM | [123] |
| | | RFAM | RNA families collection | G | [124] |
| | | DFAM | Repetitive DNA elements collection | G | [125] |
| | | UNIPROT | Database of functionally annotated protein sequences | G | http://www.uniprot.org/ |

  1. G: used in genomics and transcriptomics; M: used in metagenomics
  2. a: Dedicated to microbial genomes