Genomics

Bulk Sequencing

Logo | Title | Description | Data inputs

STAR Generate Genome Index capsule

Generates the files needed to run STAR RNA alignment.

  • Genome DNA .fasta

  • Genome gene annotation .gtf/.gff
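As a rough illustration of what an index-generation step like this wraps, the sketch below assembles a STAR `--runMode genomeGenerate` command from the two inputs. The function name, file paths, thread count, and read length are illustrative assumptions, not the capsule's actual parameters. `--sjdbOverhang` is set to read length minus one, following the STAR manual's recommendation.

```python
from typing import List


def star_index_command(fasta: str, gtf: str, out_dir: str,
                       read_length: int = 100, threads: int = 4) -> List[str]:
    """Build (but do not run) a STAR genome-index command line.

    All argument values are placeholders chosen for illustration.
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", out_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # Recommended value: read length minus one.
        "--sjdbOverhang", str(read_length - 1),
    ]


cmd = star_index_command("genome.fa", "genes.gtf", "star_index", read_length=101)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where STAR is installed. For a .gff annotation, STAR typically also needs `--sjdbGTFtagExonParentTranscript Parent`.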

STAR Alignment

RNA-Seq alignment. STAR addresses many of the challenges of RNA-Seq data mapping by accounting for spliced alignments, so reads that span exon-exon junctions can still align to the DNA genome.

  • Short/long read .fastq

  • STAR Index

Salmon Preparing Transcriptome Indices for Mapping-Based Mode

Generates the files needed to run Salmon in mapping-based mode from a transcript (RNA) FASTA file and a genome (DNA) FASTA file.

  • Genome DNA .fasta

  • Transcripts RNA .fasta
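One common reason an index step takes both files is Salmon's decoy-aware index, where the genome sequences act as decoys: their names go into a decoys file, and the transcripts are concatenated ahead of the genome into a single "gentrome" FASTA. Whether this capsule follows exactly that recipe is an assumption. The sketch below shows the idea with hypothetical helper names and a toy in-memory FASTA.

```python
from typing import Iterable, List


def decoy_names(genome_fasta_lines: Iterable[str]) -> List[str]:
    """Return the sequence names (first word of each FASTA header)."""
    names = []
    for line in genome_fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def salmon_index_command(gentrome: str, decoys: str, index_dir: str,
                         threads: int = 4) -> List[str]:
    """Build (but do not run) a decoy-aware `salmon index` command."""
    return ["salmon", "index",
            "-t", gentrome,   # transcripts followed by genome sequences
            "-d", decoys,     # one decoy sequence name per line
            "-i", index_dir,
            "-p", str(threads)]


# Toy genome used only for illustration.
genome = [">chr1 assembly=demo", "ACGT", ">chr2", "GGCC"]
print(decoy_names(genome))
print(" ".join(salmon_index_command("gentrome.fa", "decoys.txt", "salmon_index")))
```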

Salmon: mapping-based quantification

RNA-Seq quantification. Salmon is designed for speed and is geared towards quantifying transcripts rather than producing precise read alignments.

  • Short/long read .fastq

  • Salmon Index
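Because Salmon's result is a per-transcript abundance table rather than a set of alignments, downstream code usually starts by reading its `quant.sf` output. The sketch below parses the standard tab-separated columns (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). The helper name and the example values are made up for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def read_tpm(handle: TextIO) -> Dict[str, float]:
    """Map each transcript name to its TPM value from a quant.sf file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


# Invented two-transcript table standing in for a real quant.sf file.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t12.5\t40\n"
    "tx2\t900\t720.0\t0.0\t0\n"
)
tpm = read_tpm(io.StringIO(example))
print(tpm)
```

With a real run, `open("quant.sf")` would replace the `io.StringIO` object.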

BWA Generate Genome Index

Generates the files needed to run BWA DNA alignment from a DNA FASTA file.

  • Genome DNA .fasta

BWA Mem

BWA is a software package for mapping sequences against a large reference genome, such as the human genome.

  • Short/long read .fastq (designed for short reads)

  • BWA Index
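The two BWA steps (index, then `mem`) can be sketched as the following command builders. File names and thread count are placeholder assumptions, and the capsules may pass additional options. `bwa mem` writes SAM to standard output, so the hypothetical `run_to_file` helper redirects it into a file.

```python
import subprocess
from typing import List, Optional


def bwa_index_command(fasta: str) -> List[str]:
    """Build the indexing command; index files are written next to the FASTA."""
    return ["bwa", "index", fasta]


def bwa_mem_command(fasta: str, reads1: str, reads2: Optional[str] = None,
                    threads: int = 4) -> List[str]:
    """Build a `bwa mem` command for single-end or paired-end reads."""
    cmd = ["bwa", "mem", "-t", str(threads), fasta, reads1]
    if reads2 is not None:
        cmd.append(reads2)
    return cmd


def run_to_file(cmd: List[str], out_path: str) -> None:
    """Run a command and capture its standard output in out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


print(" ".join(bwa_index_command("genome.fa")))
print(" ".join(bwa_mem_command("genome.fa", "r1.fastq", "r2.fastq")))
```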

Bowtie2 Generate Genome Index

Generates the files needed to run Bowtie2 DNA alignment from a DNA FASTA file.

  • Genome DNA .fasta

Bowtie2

Bowtie2 is a software package for mapping sequences against a large reference genome, such as the human genome.

  • Short/long read .fastq (designed for short reads)

  • Bowtie2 Index

Single Cell

Logo | Title | Description | Data inputs

STAR-Solo Alignment

STAR-Solo analyzes droplet-based single-cell RNA sequencing data, for example from the 10X Genomics Chromium System. It is intended to be a drop-in replacement for CellRanger from 10X.

  • Single cell RNA-seq .fastq

  • STAR Index
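A minimal sketch of a STARsolo invocation for 10X-style data is shown below, assuming a barcode whitelist file is available (it is not listed among the inputs above, so treat it as an assumption). The function name, paths, and thread count are illustrative. The cDNA read is given first and the barcode/UMI read second, and the UMI length is set to 12 as used by 10X v3 chemistry (v2 uses 10, which is STAR's default).

```python
from typing import List


def starsolo_command(index_dir: str, cdna_fastq: str, barcode_fastq: str,
                     whitelist: str, umi_len: int = 12,
                     threads: int = 4) -> List[str]:
    """Build (but do not run) a STARsolo command for droplet-based data."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", index_dir,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        # Cell barcode position and length are left at STAR's defaults.
        "--soloUMIlen", str(umi_len),
        # Order matters: cDNA read first, then the barcode + UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
    ]


print(" ".join(starsolo_command("star_index", "sample_R2.fastq",
                                "sample_R1.fastq", "whitelist.txt")))
```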

RShiny Cell

ShinyCell is an R package that allows users to create interactive Shiny-based web applications to visualize single-cell data.

  • Single cell .rds inputs from Seurat (see README)

1-3. Single Cell Analysis Tutorial (Scanpy & Seurat)

Tutorials on working with single-cell data in Scanpy and Seurat:

1. Preprocessing and clustering 3k PBMCs

2. Core Plotting Functions

3. How to preprocess UMI count data with analytic Pearson residuals

  • Tutorial datasets (see README for details)
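For readers unfamiliar with the third tutorial's topic, analytic Pearson residuals can be written down directly. For a count x in cell c and gene g, the expected value is mu = (total of cell c) * (total of gene g) / (grand total), and the residual is (x - mu) / sqrt(mu + mu^2 / theta), clipped to plus or minus sqrt(number of cells). The pure-Python sketch below follows that formula; the function name is hypothetical, theta = 100 is an assumed default, and the tutorial itself uses Scanpy rather than this code.

```python
import math
from typing import List


def pearson_residuals(counts: List[List[float]],
                      theta: float = 100.0) -> List[List[float]]:
    """Analytic Pearson residuals for a cells x genes count matrix."""
    n_cells = len(counts)
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    if grand_total == 0:
        return [[0.0 for _ in row] for row in counts]
    clip = math.sqrt(n_cells)
    residuals = []
    for row, cell_total in zip(counts, cell_totals):
        out_row = []
        for x, gene_total in zip(row, gene_totals):
            mu = cell_total * gene_total / grand_total
            if mu == 0:
                # No expected counts: the residual is defined as zero here.
                out_row.append(0.0)
                continue
            z = (x - mu) / math.sqrt(mu + mu * mu / theta)
            out_row.append(max(-clip, min(clip, z)))
        residuals.append(out_row)
    return residuals


# Toy 2 cells x 2 genes matrix, invented for illustration.
print(pearson_residuals([[4, 0], [0, 4]]))
```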

4. Single Cell Tutorial: Seurat to AnnData (Scanpy)

Tutorial demonstrating an example of how a Seurat object can easily be converted to AnnData (Scanpy).

  • Tutorial datasets (see README for details)

5-6. Single Cell Analysis Tutorial (Scanpy)

Tutorials demonstrating how to regress out cell cycle effects and how to simulate data using a literature-curated Boolean gene regulatory network.

  • Tutorial datasets (see README for details)

7-10. Single Cell Analysis Tutorial (Scanpy) Advanced

Tutorials for advanced Single Cell processing.

  • Tutorial datasets (see README for details)

Utilities

Logo | Title | Description | Data inputs

Download data from BaseSpace

Download demultiplexed (fastq.gz) or raw (bcl) Illumina sequencing data through the Illumina BaseSpace CLI. This capsule requires a BaseSpace account and NGS data owned or shared with the user.

  • None

Sambamba Filtering (Duplicates, Multimappers, Unaligned)

Remove optical and PCR duplicates from Illumina data using Sambamba. Sambamba is intended to be a faster drop-in replacement for Picard MarkDuplicates.

  • .bam alignment files.
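The filtering named in the capsule title (duplicates, multimappers, unaligned reads) can be expressed as a Sambamba filter expression passed to `sambamba view -F`. The sketch below builds such an expression; the helper names, the mapping-quality cutoff used as a proxy for multimappers, and the file names are assumptions, and the capsule's actual filter may differ. Note that `not duplicate` only drops reads already flagged as duplicates, for example by `sambamba markdup`.

```python
from typing import List


def sambamba_filter_expression(drop_unaligned: bool = True,
                               drop_duplicates: bool = True,
                               min_mapq: int = 0) -> str:
    """Combine the selected conditions into one filter expression."""
    terms = []
    if drop_unaligned:
        terms.append("not unmapped")
    if drop_duplicates:
        terms.append("not duplicate")
    if min_mapq > 0:
        # Aligners often give low MAPQ to multi-mapping reads (an assumption
        # that depends on the aligner used upstream).
        terms.append(f"mapping_quality >= {min_mapq}")
    return " and ".join(terms)


def sambamba_view_command(in_bam: str, out_bam: str, expression: str,
                          threads: int = 4) -> List[str]:
    """Build (but do not run) a `sambamba view` filtering command."""
    cmd = ["sambamba", "view", "-t", str(threads), "-f", "bam"]
    if expression:
        cmd += ["-F", expression]
    cmd += ["-o", out_bam, in_bam]
    return cmd


expr = sambamba_filter_expression(min_mapq=1)
print(sambamba_view_command("in.bam", "filtered.bam", expr))
```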

Sambamba Sort and Index

Sort and index Illumina data using Sambamba. Sambamba is intended to be a faster drop-in replacement for samtools.

  • .bam alignment files.

Trim Galore

Trim Galore is a wrapper around Cutadapt and FastQC that consistently applies adapter and quality trimming to FastQ files, with extra functionality for RRBS data.

  • .fastq files

fastp

A tool designed to provide fast all-in-one preprocessing for FastQ files (adapter trimming, downsampling, etc.). It is written in C++ and supports multithreading for high performance.

  • .fastq files

Other

Title | Description | Data inputs

MACS PeakCalling

MACS3 is a peak-calling tool generally used on ChIP-seq data to identify transcription factor binding sites.

  • .bam alignment files

  • compare_sheet.csv (see README)

featureCounts

This capsule runs featureCounts from the Subread package (Rsubread in R) to generate an expression matrix.

  • Gene annotation .gtf file

  • .bam alignments
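To show what the resulting expression matrix looks like in practice, the sketch below reads featureCounts' tab-separated output: a leading comment line starting with `#`, six annotation columns (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`), then one count column per BAM file. The helper name and the example rows are invented, and integer counts are assumed (fractional counting modes would need `float`).

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

ANNOTATION_COLUMNS = ("Geneid", "Chr", "Start", "End", "Strand", "Length")


def read_counts(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (sample names, gene -> counts in sample order)."""
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    samples = [name for name in (reader.fieldnames or [])
               if name not in ANNOTATION_COLUMNS]
    matrix = {row["Geneid"]: [int(row[s]) for s in samples] for row in reader}
    return samples, matrix


# Invented output standing in for a real featureCounts result file.
example = (
    "# Program:featureCounts; Command:(omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\tb.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t12\t7\n"
    "geneB\tchr2\t50\t450\t-\t401\t0\t3\n"
)
samples, matrix = read_counts(io.StringIO(example))
print(samples)
print(matrix)
```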

HOMER

HOMER includes annotatePeaks.pl, an all-in-one program for peak annotation. This capsule uses annotatePeaks.pl to annotate *.bed coordinates with gene features.

  • .bed files containing peaks

  • Genome reference .fasta

  • Gene annotation .gtf file.

Gene Enrichment Analysis (GEA)

This capsule provides a user-friendly Streamlit application for gene enrichment analysis. The analysis results are sourced from two widely used platforms, g:Profiler and PANTHER.

  • File containing gene names

GATK RNAseq short variant discovery (SNPs + Indels)

Based on the GATK RNA-seq short variant discovery pipeline. Takes in alignments and outputs a VCF file containing SNPs and indels.

  • .bam RNA alignments

Delly somatic complete analysis

Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of somatic cells.

  • Genome reference .fasta

  • .bam DNA alignment files

Delly germline complete analysis

Structural variant (SV) prediction to discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data of germline cells.

  • Genome reference .fasta

  • .bam DNA alignment files

ART-Simulation-Illumina

ART is a set of simulation tools to generate synthetic next-generation sequencing reads.

  • .fasta containing the sequence to simulate reads from

PySpark and EMR Serverless

This capsule runs an example PySpark job on EMR Serverless.

  • NOAA Global Surface Summary of Day dataset