Nextflow Configurations

Explanation and examples of helpful Nextflow configurations.

Advanced Pipeline settings can be defined without disabling the Pipeline UI by creating a configuration file in the Pipeline’s /pipeline folder. The file must be named nextflow.config for the Pipeline to use it.

Examples

Retry Strategy with a Delay Between Submissions

In the Pipeline Settings menu, a retry error strategy can be set, but when retrying submissions due to transient AWS outages, it can be beneficial to add a delay between job submissions.
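A configuration along these lines adds an exponentially growing delay before each retry (a sketch, not the exact file; the maxRetries value is an assumption):

```groovy
process {
    errorStrategy = {
        // Pause before resubmitting: 400 ms after the first attempt,
        // 800 ms after the second, doubling with each attempt
        sleep(Math.pow(2, task.attempt) * 200 as long)
        return 'retry'
    }
    maxRetries = 3
}
```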

The line sleep(Math.pow(2, task.attempt) * 200 as long) implements an exponential backoff strategy, where sleep pauses execution for the specified number of milliseconds. For example, if a task fails on its third attempt, it will sleep for 200 * 2^3 = 1600 ms before the next submission.

Pipeline Cost Monitoring

The cost of Pipeline runs is not currently tracked in the Analytics Dashboard of the Admin Panel, but resource labels can be added to the nextflow.config file so that Pipeline jobs are tagged and included in the AWS Cost and Usage Report (CUR). To view these tags in the CUR, they first need to be activated in the AWS Billing console.

process.resourceLabels = ['your-key': 'your-value']

Replace the key and value with a pattern that suits your organization; for example, 'your-key' could be a group's name and 'your-value' the name of the Pipeline. That way, all of the group's Pipeline costs appear under the same key in the CUR.
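For instance, with a hypothetical group named Genomics running a Pipeline named rnaseq:

```groovy
// Hypothetical key/value pair: the key is the group's name,
// the value is the Pipeline's name
process.resourceLabels = ['Genomics': 'rnaseq']
```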

Dynamic Compute Resources

Compute resources selected in the Pipeline UI are encoded in the corresponding process in the main.nf. For example, this is the Nextflow code for "Capsule A" that's allocated 1 core and 8 GB of RAM:
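A minimal sketch of such a process definition (the process name and script body here are illustrative, not taken from an actual Pipeline):

```groovy
process capsule_a {
    cpus 1          // 1 core selected in the Pipeline UI
    memory 8.GB     // 8 GB of RAM selected in the Pipeline UI

    script:
    """
    ./run_capsule.sh
    """
}
```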

When the compute resources depend on the size of the data being processed, Nextflow's dynamic resources feature can be used so that the size of the machine scales with demand. In the nextflow.config example code below, the Capsule runs with 1 core and 8 GB of RAM, but if it fails with an out-of-memory error (exit status between 137 and 140) it automatically retries with more resources, up to 3 times. For example, if it fails with an out-of-memory error 3 times, it will retry with 8.GB * task.attempt = 8.GB * 4 = 32 GB of RAM and 1 * task.attempt = 1 * 4 = 4 CPUs. If it fails with any other exit code, the 'finish' error strategy takes effect and the pipeline initiates an orderly shutdown, pending the completion of any submitted jobs.
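The nextflow.config code in question is along these lines (a sketch; the process selector name capsule_a is an assumption):

```groovy
process {
    withName: capsule_a {
        // Scale resources with the attempt number: 1 CPU / 8 GB on the
        // first attempt, 2 CPU / 16 GB on the first retry, and so on
        cpus   = { 1 * task.attempt }
        memory = { 8.GB * task.attempt }

        // Retry on out-of-memory exit codes (137-140), up to 3 retries;
        // any other failure triggers an orderly shutdown
        errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'finish' }
        maxRetries    = 3
    }
}
```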


With this configuration, dynamic resources will only apply to "Capsule A" and all other Capsules will use the error strategy set in the Pipeline Settings.
