Environment Variables

Code Ocean-specific environment variables are available during Reproducible Runs and Cloud Workstation Sessions to reduce coding complexity.

Code Ocean Specific Environment Variables

CO_CPUS: the number of available CPU cores.

CO_MEMORY: the available RAM in bytes.

CO_COMPUTATION_ID: the unique identifier of the current computation.

CO_CAPSULE_ID: the capsule's unique identifier, also available on the capsule's metadata page.

CO_PIPELINE_ID: the pipeline's unique identifier, also available on the pipeline's metadata page. Only defined during a pipeline reproducible run.

GIT_ACCESS_TOKEN: only defined if the user has entered their Git credentials on the Account page.
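
For example, a run script can print these values at the start of a computation so the log records the resources and identifiers the run was given. This is a minimal sketch; CO_PIPELINE_ID is unset outside pipeline runs, so the example provides a fallback.

#!/usr/bin/env bash

# Log the resources and identifiers available to this run
echo "CPUs available:  ${CO_CPUS}"
echo "Memory (bytes):  ${CO_MEMORY}"
echo "Computation ID:  ${CO_COMPUTATION_ID}"
echo "Capsule ID:      ${CO_CAPSULE_ID}"

# CO_PIPELINE_ID is only defined during pipeline reproducible runs
echo "Pipeline ID:     ${CO_PIPELINE_ID:-<not a pipeline run>}"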

Use in Pipelines

Each capsule in a pipeline has its own CO_CPUS, CO_MEMORY, and CO_CAPSULE_ID values, but all capsules in the run share the same CO_PIPELINE_ID and CO_COMPUTATION_ID, as illustrated below.
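
For instance, each capsule's run script could write a small provenance file that combines its own CO_CAPSULE_ID with the shared identifiers, so outputs from different capsules in the same pipeline run can be grouped later. This is an illustrative sketch; the file name and key/value layout are hypothetical.

#!/usr/bin/env bash

# Record which capsule produced this result and which pipeline run it belongs to
manifest=../results/provenance_${CO_CAPSULE_ID}.txt
{
    echo "pipeline_id=${CO_PIPELINE_ID}"
    echo "computation_id=${CO_COMPUTATION_ID}"
    echo "capsule_id=${CO_CAPSULE_ID}"
} > "$manifest"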

Always use CO_CPUS and CO_MEMORY, rather than querying the system directly (for example, with nproc or by reading /proc/meminfo), whenever a capsule needs to know how many CPUs or how much memory it can use. Because Code Ocean pipelines run on AWS Batch, probing the machine can be inaccurate: it may report the resources of the host the job happens to be running on rather than the resources allocated to the job, as shown in the sketch below.
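
A minimal sketch of this pattern follows; the fallback values, the megabyte conversion, and the tool name (some_tool) are illustrative, so adapt them to the tool being called.

#!/usr/bin/env bash

# Prefer the resources Code Ocean allocated to this run; fall back to
# probing the machine only when the variables are unset (e.g. local testing)
threads="${CO_CPUS:-$(nproc)}"
mem_bytes="${CO_MEMORY:-$((8 * 1024 * 1024 * 1024))}"  # assume 8 GB if unset

# CO_MEMORY is in bytes; many tools expect megabytes or gigabytes
mem_mb=$((mem_bytes / 1024 / 1024))

echo "Using ${threads} threads and ${mem_mb} MB of memory"
some_tool --threads "$threads" --memory-mb "$mem_mb" ../data ../results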

Examples

Passing the CPU Count to FastQC

FastQC is a command-line tool for quality control of sequencing reads. The following code passes CO_CPUS to FastQC's -t (threads) option so the computation uses all of the CPU cores available to the run.

#!/usr/bin/env bash

# Run FastQC on every FASTQ file in the data folder,
# using all CPU cores allocated to this run
for file in $(find -L ../data -name "*.fastq*"); do
    echo "Running FastQC on $(basename "$file")"
    fastqc -t "$CO_CPUS" --outdir ../results "$file"
done

Maintaining Traceability When Transferring Data to External Locations

If the result of a capsule or pipeline is transferred outside of Code Ocean, the CO_CAPSULE_ID/CO_PIPELINE_ID and CO_COMPUTATION_ID variables can be used to help maintain traceability. The following code saves these variables to a text file and uploads it to the same external S3 bucket as the results. Recording these unique identifiers preserves a link back to the exact capsule and computation that produced the result, which supports reproducibility.

#!/usr/bin/env bash

# S3_BUCKET_NAME is assumed to be defined elsewhere (e.g. earlier in the script or as a secret)

# Transfer results to S3
aws s3 sync ../results/ s3://${S3_BUCKET_NAME}/path/to/results/

# Save the unique identifiers to the same location
echo "Capsule ID: $CO_CAPSULE_ID" >> ../results/traceability.txt
echo "Computation ID: $CO_COMPUTATION_ID" >> ../results/traceability.txt

aws s3 cp ../results/traceability.txt s3://${S3_BUCKET_NAME}/path/to/results/