Advanced Details (Technical Structure)

How do Pipelines Work?

Understanding the programmatic configuration of pipelines helps clarify how capsules and data assets are strung together. As pipelines become more complicated, it is essential to know how to set up each capsule correctly.

Just as the Code Ocean Environment Editor keeps a Dockerfile in sync with your environment settings, the Visual Pipeline Editor keeps a Nextflow file in sync with the pipeline layout. As an illustration, consider a pipeline consisting of two capsules named RSEM and MultiQC.
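
The sketch below approximates part of the Nextflow file that might be generated for such a pipeline. The capsule repository URLs, directory layout, and channel names are illustrative assumptions, not the exact output of the Pipeline Editor:

```groovy
// Illustrative sketch only -- repository URLs, paths, and channel names are
// assumptions, not the exact file the Visual Pipeline Editor generates.
nextflow.enable.dsl = 2

// Channel carrying the attached data asset into the first capsule (path assumed).
data_ch = Channel.fromPath("../data/**", type: 'any').collect()

process capsule_rsem {
    input:
    path input_files

    output:
    path 'capsule/results/*', emit: results

    script:
    """
    # Clone the capsule's git repository at its committed state (URL assumed)
    git clone https://git.codeocean.com/capsule-0000001.git capsule
    # Stage the incoming data and execute the capsule's reproducible run script
    mkdir -p capsule/data && cp -r ${input_files} capsule/data/
    cd capsule && ./code/run
    """
}

process capsule_multiqc {
    input:
    path rsem_results

    output:
    path 'capsule/results/*'

    script:
    """
    git clone https://git.codeocean.com/capsule-0000002.git capsule
    mkdir -p capsule/data && cp -r ${rsem_results} capsule/data/
    cd capsule && ./code/run
    """
}

workflow {
    capsule_rsem(data_ch)
    capsule_multiqc(capsule_rsem.out.results)
}
```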

The Nextflow file describes the actions executed to run each capsule. Data is passed between capsules via Nextflow channels created for each connection. For every capsule, the corresponding git repository is cloned and its reproducible run script is executed, each run script in its own AWS Batch job. Whether all data is processed in one job or split across parallel jobs is controlled by the Global Toggle: when it is on, all data is transferred at once; when it is off, one subdirectory or one file is transferred at a time.
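
As a mental model (not the code the editor actually emits), the two toggle states roughly correspond to the following Nextflow channel patterns; the data path here is an assumption:

```groovy
// Global Toggle ON: collect every file in the data asset into a single
// emission, so the downstream capsule runs once, in a single AWS Batch job.
all_at_once_ch = Channel.fromPath("../data/**", type: 'any').collect()

// Global Toggle OFF: emit one top-level file or subdirectory at a time, so the
// downstream capsule runs as many parallel AWS Batch jobs as there are items.
one_by_one_ch = Channel.fromPath("../data/*", type: 'any')
```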

The Nextflow file can be unlocked and customized manually; however, this is not recommended, as it permanently disables the Pipeline Editor.

Running capsules via git cloning is a key point for pipelines, because it determines which files are available: only files in a capsule's core file tree that are tracked by git can be referenced or executed in a pipeline. This requires good organization in your capsule, so best practices are key. Furthermore, if you want your most recent changes to a capsule to be used, they must be committed via the timeline.

All Code Ocean best practices must be followed to ensure pipelines remain easy to run.
