The Structure of a Compute Capsule

The Interface of the Compute Capsule

The Code Ocean Computational Workbench is divided into three vertical sections, arranged from left to right: Files, Editor, and Timeline.

The user interface of the compute capsule IDE (integrated development environment):

Files

The Files tab on the toolbar to the left opens the Files panel. Under Core Files, the metadata, environment, code, and data folders are organized. Under Research Data Drive is the scratch folder, under Results is the results folder, and additional files are organized under Other Files.

Editor

The editor is the center panel, where you can view and edit files that you select from the Files panel. Results files can be selected from the Reproducibility panel on the right.

Timeline

The timeline is the Reproducibility panel on the right, which provides a managed history of the capsule and prompts you to commit changes and manage versions.

The Components of the Compute Capsule

When running in a Cloud Workstation (CW), you can access the whole file system. An extra folder (CW Root FS) is listed to account for this part of the file system.

These are the essential research project components:

| Folder | Purpose | Contents |
| --- | --- | --- |
| Metadata | Capsule's metadata | metadata.yml |
| Environment | Capsule's environment files | Dockerfile, postInstall script, environment.yml |
| Code | Code and related files | Source files |
| Data | Input data required by reproducible runs (RR); may optionally contain small intermediate data | Small data files and attached datasets |
| Results | Output files generated by reproducible runs | Result files |
| Scratch (CW) | Large intermediate data that is not required by reproducible runs | Large data files |
| Scratch (RR) | Large intermediate data generated during reproducible runs | Large data files |
| CW Root FS | Package installations, IDE preferences, temporary files, etc. | The whole file system except the capsule workspace and the mounted folders |

These appear in the Files panel as six folders. You can download and upload content to and from all of the folders except the Results and Scratch folders.
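To make the division of labor between these folders concrete, here is a minimal sketch of a run script in Python. It assumes the script is executed from the Code folder so that the relative paths ../data and ../results resolve to the Data and Results folders (consistent with the computation paths listed later in this article), and it uses a hypothetical input file named example_input.csv; adjust the names and paths to your own capsule.

```python
# run.py -- hypothetical example script for a compute capsule.
# Assumes it is executed from the capsule's code folder, so ../data and
# ../results point at the Data and Results folders.
import csv
from pathlib import Path

DATA_DIR = Path("../data")        # input data required by reproducible runs
RESULTS_DIR = Path("../results")  # everything written here is captured as a result

def main():
    input_file = DATA_DIR / "example_input.csv"   # hypothetical input file name
    output_file = RESULTS_DIR / "row_count.txt"

    with input_file.open() as f:
        rows = list(csv.reader(f))

    # Write a small summary file; it appears in the Results folder after the run.
    output_file.write_text(f"{input_file.name} contains {len(rows)} rows\n")

if __name__ == "__main__":
    main()
```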

Follow the sections below to learn more about each component.

Capsule Limitations, Structure, and Practices

Below are the details of the size limitations, the path of each folder in the system, the design concepts, and the underlying mechanism for each of the folders.

Size Limitation

The capsule workspace is limited to 5GB; this includes the files in the Metadata, Environment, Code, and Data folders.

| Folder | Counted in Capsule Workspace Limit | Comments |
| --- | --- | --- |
| Metadata | Yes | |
| Environment | Yes | |
| Code | Yes | |
| Data | Yes | The limit doesn't apply to attached datasets, as they are mounted to the capsule |
| Results | No | Results is mounted and not counted as capsule workspace |
| Scratch (CW), Scratch (RR) | No | Scratch is mounted and not counted as capsule workspace |
| CW Root FS | No (separate limit) | Separate 5GB limit, including existing capsule environment installations |

The capsule size limit has been added to the bottom of the Files panel. To see the color-coded breakdown and informational text, hover the mouse cursor over Capsule Limit.

Capsule Limit and Root Limit have also been added in Cloud Workstations.
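For a rough, scriptable view of how much of the workspace limit a capsule is using, something like the following sketch can be run in a Cloud Workstation terminal. It assumes the computation paths under /root/capsule from the path table later in this article, and it simply sums file sizes, so attached datasets mounted under the data folder (which do not count toward the real limit) may cause it to overestimate.

```python
# Rough workspace-size check (a sketch, not the official accounting).
# Assumes the standard computation paths under /root/capsule.
from pathlib import Path

WORKSPACE_FOLDERS = ["metadata", "environment", "code", "data"]
CAPSULE_ROOT = Path("/root/capsule")
LIMIT_BYTES = 5 * 1024**3  # 5GB workspace limit

def folder_size(path: Path) -> int:
    """Sum the sizes of all regular files under path (mounted datasets included)."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())

total = 0
for name in WORKSPACE_FOLDERS:
    size = folder_size(CAPSULE_ROOT / name)
    total += size
    print(f"{name:<12} {size / 1024**2:10.1f} MB")

print(f"{'total':<12} {total / 1024**2:10.1f} MB of {LIMIT_BYTES / 1024**2:.0f} MB")
```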

When building the capsule environment, you have a separate, additional 5GB limit, which includes any packages installed on top of the starter environment (the starter environment itself is excluded from this limit) and any packages installed in the postInstall script.

We recommend installing all the packages you need during the building phase. Please check the Environment Guide for more information.

Folder Paths and Accessibility for Computation

| Folder | Available in Reproducible Run | Available in Cloud Workstation | Path in Capsule Workspace | Path in Computation | Alternative Path in Computation |
| --- | --- | --- | --- | --- | --- |
| Metadata | Yes | Yes | /metadata | /root/capsule/metadata | - |
| Environment | Yes | Yes | /environment | /root/capsule/environment | - |
| Code | Yes | Yes | /code | /root/capsule/code | /code |
| Data | Yes | Yes | /data | /root/capsule/data | /data |
| Results | Yes | Yes | - | /root/capsule/results | /results |
| Scratch (CW) | No | Yes | - | /root/capsule/scratch | /scratch |
| Scratch (RR) | Yes | No | - | /scratch | - |
| CW Root FS | No | Yes | - | / | - |

Scratch (RR) is not the same folder as Scratch (CW). Scratch (RR) is emptied before the end of each run, so its content will not be visible in the capsule IDE.
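Because most folders are reachable both at /root/capsule/&lt;folder&gt; and at a shorter alternative path such as /data or /scratch, and because Scratch (RR) exists only during a reproducible run, it can be convenient to resolve paths defensively. The helper below is an illustrative sketch of that idea, not an official Code Ocean API; the function name and fallback order are assumptions.

```python
# Resolve a capsule folder by trying the short path first, then the
# /root/capsule path. This is an illustrative convention, not a Code Ocean API.
from pathlib import Path

def capsule_folder(name: str) -> Path:
    candidates = [Path(f"/{name}"), Path(f"/root/capsule/{name}")]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"capsule folder '{name}' not found in {candidates}")

# Example usage: these resolve to whichever location exists in the
# current context (reproducible run or Cloud Workstation).
data_dir = capsule_folder("data")
scratch_dir = capsule_folder("scratch")
print(data_dir, scratch_dir)
```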

Design Concepts and Underlying Mechanism (AWS Storage and Persistence)

| Folder | Storage | Persistence |
| --- | --- | --- |
| Metadata | EBS (local storage) | Capsule lifetime |
| Environment | EBS (local storage) | Capsule lifetime |
| Code | EBS (local storage) | Capsule lifetime |
| Data | Local data: EBS. Internal dataset: mounted from EFS. External dataset: mounted from S3 | Local data: capsule lifetime. Datasets: have their own lifetime |
| Results | During computation: EBS. When the computation finishes, uploaded to S3 | Capsule lifetime, unless explicitly deleted from the timeline |
| Scratch | Mounted from EFS | Capsule lifetime |
| CW Root FS | EBS (local storage) | Cloud Workstation session |

The source and the size of the data may vary depending on the project. As covered in the previous table, there are different ways of bringing data into the capsule through the Data folder. Depending on the complexity of the project, you may need a place to store intermediate results and use them as input data for further analysis; the Scratch folder is used for this purpose.
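As a hedged illustration of that workflow, the sketch below writes a bulky intermediate file to the Scratch folder and keeps only a compact summary in Results. The /scratch and /results paths follow the tables above, while the file names and the generated data are hypothetical placeholders for a real analysis.

```python
# Two-stage sketch: heavy intermediate data goes to Scratch, only the small
# final output goes to Results. Paths follow the tables above; file names
# are hypothetical.
import json
from pathlib import Path

SCRATCH = Path("/scratch")
RESULTS = Path("/results")

def stage_one() -> Path:
    """Produce a (potentially large) intermediate file in Scratch."""
    intermediate = SCRATCH / "intermediate_counts.json"
    counts = {f"sample_{i}": i * i for i in range(1000)}  # stand-in for real data
    intermediate.write_text(json.dumps(counts))
    return intermediate

def stage_two(intermediate: Path) -> None:
    """Read the intermediate file and write only a compact summary to Results."""
    counts = json.loads(intermediate.read_text())
    summary = {"n_samples": len(counts), "max_count": max(counts.values())}
    (RESULTS / "summary.json").write_text(json.dumps(summary, indent=2))

if __name__ == "__main__":
    stage_two(stage_one())
```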

The table below shows the shareability of each type of dataset and the recommended practice for using it in the capsule, based on the design logic.

| Folder | Type of Dataset | Shareability | Recommended Usage |
| --- | --- | --- | --- |
| Data | Local (uploaded directly to the capsule) | Only the current capsule | Small or example datasets for testing the capsule |
| Data | Internal dataset | Across capsules | The dataset is saved in the VPC's AWS storage. Works well for immutable data that only needs to be imported into Code Ocean's VPC once |
| Data | External dataset | Across capsules | The dataset requires AWS credentials to access. Works well for confidential datasets. The dataset can change if the source changes |
| Scratch (CW) | - | Only the current capsule | Use only in the Cloud Workstation for storing large intermediate datasets/output files. Usually converted into an internal dataset for sharing across capsules and for downstream analysis |
| Scratch (RR) | - | Only the current run | Temporary storage during a reproducible run for large data that might exceed the capsule's size limit |

If your capsule size exceeds 5 GB, performance could be affected. Convert the data into a dataset and attach it to the capsule to reduce the capsule's overall size. See the Data Assets Guide for further guidance on creating and using a dataset.
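If you want to confirm from code which attached datasets and local files a computation can see, a quick listing of the Data folder is usually enough. The snippet below is a minimal sketch; it assumes attached datasets appear as subdirectories of /data (falling back to /root/capsule/data), as described in the components table.

```python
# List the contents of the Data folder to confirm that attached datasets
# (mounted subdirectories) and locally uploaded files are visible.
from pathlib import Path

data_dir = Path("/data") if Path("/data").is_dir() else Path("/root/capsule/data")

for entry in sorted(data_dir.iterdir()):
    # A directory here is typically an attached dataset, though it could also
    # be a locally uploaded folder.
    kind = "directory (e.g. attached dataset)" if entry.is_dir() else "local file"
    print(f"{entry.name:<40} {kind}")
```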

Stop Computation from Capsule Dashboard

Note: A Cloud Workstation can be shut down or put on hold from the capsule Dashboard.
