The Structure of a Compute Capsule
The Interface of the Compute Capsule
The Code Ocean Computational Workbench is divided into three vertical sections, from left to right: files, editor, and timeline.
The user interface of the compute capsule IDE (integrated development environment):
Files
The Files tab on the toolbar to the left opens the Files panel. Core Files organizes metadata, environment, code, and folders. Research Data Drive contains the scratch folder, and Results contains the results folder. Additional files are organized under Other Files.
Editor
The editor is the center panel where you can view and edit information that you select from the Files folders. Results files can be selected from the Reproducibility Panel on the right.
Timeline
The timeline is the Reproducibility panel on the right, which provides a managed history of the Capsule and prompts you to commit changes and manage versions.
The Components of the Compute Capsule
When running in a Cloud Workstation (CW), you can access the whole file system. An extra entry, CW Root FS, is included in the table below to describe the properties of that file system.
There are six essential research project components:
Folder | Purpose | Contents |
---|---|---|
Metadata | Capsule's metadata | metadata.yml |
Environment | Capsule's environment files | Dockerfile, postInstall script, environment.yml |
Code | Code and related files | Source files |
Data | Input data required by reproducible runs (RR). May optionally contain small intermediate data | Small data files and attached datasets |
Results | Output files generated by reproducible runs | Result files |
Scratch (CW) | Large intermediate data that is NOT required by reproducible runs | Large data files |
Scratch (RR) | Large intermediate data that is generated during reproducible runs | Large data files |
CW Root FS | Package installations, IDE preferences, temporary files, etc. | The whole file system except the capsule workspace, and those mounted folders |
These appear in the Files panel as six folders. You can upload content to and download content from all the folders except the results and scratch folders.
Follow the articles below to learn more details about the components:
Capsule Limitations, Structure, and Practices
Below are the details of the size limitations, the path of each folder in the system, the design concepts, and the underlying mechanism for each of the folders.
Size Limitation
The capsule workspace is limited to 5 GB, which includes the files in the Metadata, Environment, Code, and Data folders.
Folder | Counted in Capsule Workspace Limit | Comments |
---|---|---|
Metadata | Yes | |
Environment | Yes | |
Code | Yes | |
Data | Yes | The limit doesn't apply to attached datasets, as they are mounted to the capsule |
Results | No | Results is mounted and not counted as part of the capsule's workspace |
Scratch (CW), Scratch (RR) | No | Scratch is mounted and not counted as part of the capsule's workspace |
CW Root FS | | 5 GB limit, including existing capsule environment installations |
The capsule workspace is limited to 5 GB, but you have 10 GB available when building the capsule's environment using the environment UI and the postInstall script. We recommend installing all the packages you need during the build phase. Please check the Environment Guide for more information.
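To check how close a capsule is to the workspace limit from inside a Cloud Workstation, the sizes of the four counted folders can be summed. The following is a minimal sketch, not a Code Ocean tool. The function names `folder_size_bytes` and `workspace_usage` are hypothetical, the default root `/root/capsule` is taken from the path table in this article, and reading "5 GB" as 5 × 1024³ bytes is an assumption. Skipping nested mount points with `os.path.ismount` is also an assumption about how attached datasets appear on disk.

```python
import os

WORKSPACE_LIMIT_BYTES = 5 * 1024**3  # assumption: "5 GB" read as 5 GiB
COUNTED_FOLDERS = ("metadata", "environment", "code", "data")


def folder_size_bytes(path):
    """Sum file sizes under path, skipping symlinks and nested mount points."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        # Prune mounted subfolders in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if not os.path.ismount(os.path.join(dirpath, d))
        ]
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def workspace_usage(root="/root/capsule"):
    """Return ({folder: bytes}, total_bytes) for the counted folders under root."""
    sizes = {}
    for name in COUNTED_FOLDERS:
        folder = os.path.join(root, name)
        sizes[name] = folder_size_bytes(folder) if os.path.isdir(folder) else 0
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    sizes, total = workspace_usage()
    for name, size in sizes.items():
        print(f"{name:<12} {size / 1024**2:10.1f} MiB")
    print(f"total: {total / 1024**3:.2f} of {WORKSPACE_LIMIT_BYTES / 1024**3:.0f} GiB")
```

Folders that do not exist under the given root are reported as 0 bytes, so the same script can be pointed at any directory for a dry run.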
Path of the Folder and the Accessibility for the Computation
Folder | Available in Reproducible Run | Available in Cloud Workstation | Path in Capsule Workspace | Path in Computation | Alternative Path in Computation |
---|---|---|---|---|---|
Metadata | | | /metadata | /root/capsule/metadata | - |
Environment | | | /environment | /root/capsule/environment | - |
Code | | | /code | /root/capsule/code | /code |
Data | | | /data | /root/capsule/data | /data |
Results | | | - | /root/capsule/results | /results |
Scratch (CW) | | | - | /root/capsule/scratch | /scratch |
Scratch (RR) | | | | /scratch | |
CW Root FS | | | - | / | - |
Scratch (RR) is not the same folder as Scratch (CW). Scratch (RR) is emptied before the end of each run, so its content will not be visible in the capsule IDE.
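Because the same folder can be reached through more than one path during a computation, a script can look up its folders instead of hard-coding a single location. The sketch below is illustrative only: `resolve_folder`, `write_input_listing`, and the `CANDIDATES` mapping are hypothetical names, and the candidate paths are the short path and the `/root/capsule` path listed for each folder in the table above.

```python
from pathlib import Path

# Candidate locations per folder: short path first, then the /root/capsule path.
CANDIDATES = {
    "code": ("/code", "/root/capsule/code"),
    "data": ("/data", "/root/capsule/data"),
    "results": ("/results", "/root/capsule/results"),
    "scratch": ("/scratch", "/root/capsule/scratch"),
}


def resolve_folder(name, candidates=CANDIDATES):
    """Return the first existing directory for a capsule folder name."""
    for candidate in candidates[name]:
        path = Path(candidate)
        if path.is_dir():
            return path
    raise FileNotFoundError(f"no {name} folder found among {candidates[name]}")


def write_input_listing(data_dir, results_dir):
    """Write the sorted names of the input files to <results_dir>/inputs.txt."""
    names = sorted(p.name for p in Path(data_dir).iterdir())
    out = Path(results_dir) / "inputs.txt"
    out.write_text("\n".join(names) + "\n")
    return out
```

A run script could then call `write_input_listing(resolve_folder("data"), resolve_folder("results"))` so that inputs are read from the data folder and outputs land in the results folder, whichever path is present.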
Design Concepts and Underlying Mechanism (AWS Storage and Persistence)
Folder | Storage | Persistence |
---|---|---|
Metadata | EBS (local storage) | Capsule lifetime |
Environment | EBS (local storage) | Capsule lifetime |
Code | EBS (local storage) | Capsule lifetime |
Data | Local data: EBS; Internal dataset: mounted from EFS; External dataset: mounted from S3 | Local data: Capsule lifetime; Datasets: have their own lifetime |
Results | During computation: EBS; when the computation finishes: uploaded to S3 | Capsule lifetime unless explicitly deleted from the timeline |
Scratch | Mounted from EFS | Capsule lifetime |
CW Root FS | EBS (local storage) | Cloud Workstation session |
Recommended Practices of Data Usage and Storage
The source and the size of the data may vary depending on the project. As covered in the previous table, there are different ways of bringing data into the capsule's Data folder. Depending on the complexity of the project, you may also need a place to store intermediate results and use them as input data for further analysis. The Scratch folder is used for this purpose.
The table below lists the shareability of each type of dataset and the recommended practice for using it in the capsule, based on the design logic.
Folder | Type of Dataset | Shareability | Recommended Usage |
---|---|---|---|
Data | Local (uploaded directly to the capsule) | Only current capsule | Small or example dataset to test the capsule |
Data | Internal Dataset | Across capsules | The dataset is saved in the VPC's AWS storage. Works well with immutable data that only needs to be imported into Code Ocean's VPC once. |
Data | External Dataset | Across capsules | The dataset requires an AWS credential to access. Works well with a confidential dataset. The dataset can change if the source changes. |
Scratch (CW) | - | Only current capsule | Access this only in the Cloud Workstation, for storing large intermediate datasets or output files. Usually converted into an internal dataset for sharing across capsules and for downstream analysis. |
Scratch (RR) | - | Only current run | Temporary storage during reproducible runs for large data that might exceed the capsule's size limit |
If your capsule is larger than 5 GB, performance could be affected. To reduce the overall size of the capsule, convert the data into a dataset and attach it to the capsule. See the Data Assets Guide for further guidance on creating and using a dataset.
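When deciding which local files to turn into a dataset, it can help to list the largest files in the Data folder first. The following is a rough sketch under stated assumptions: `largest_files` is a hypothetical helper, not part of Code Ocean, and the choice of how many files to report is arbitrary.

```python
import os


def largest_files(folder, top_n=10):
    """Return up to top_n (size_bytes, path) pairs under folder, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore symlinks so linked files are not counted as local data.
            if os.path.isfile(path) and not os.path.islink(path):
                found.append((os.path.getsize(path), path))
    found.sort(reverse=True)
    return found[:top_n]
```

For example, `largest_files("/root/capsule/data", top_n=5)` would return the five largest files found under the Data folder path given earlier in this article. Files at the top of the list are the natural candidates for moving into an attached dataset.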
Stop Computation from Capsule Dashboard
Note: A Cloud Workstation can be shut down or put on hold from the capsule Dashboard.