Data

Best practice is to store data files in Data Assets. Data Assets make it easy to share data across the organization, and internal Data Assets guarantee reproducibility. See the Data Asset Guide for more information. Small datasets can instead be uploaded, as data files and subfolders, directly to the /data folder.

Below are properties of data depending on the location and type of Data Asset.

| Folder | Type of Dataset | Shareability | Recommended Usage |
| --- | --- | --- | --- |
| Data | Local (uploaded directly to the Capsule) | Current Capsule only | Small or example datasets for testing the Capsule. |
| Data | Internal Dataset | Across Capsules | The Data Asset is saved in the VPC's AWS storage. Works well for immutable data that needs to be imported into Code Ocean's VPC only once. |
| Data | External Dataset | Across Capsules | The Data Asset requires AWS credentials to access. Works well for confidential Data Assets. The Data Asset can change if the source changes. |
| Scratch (CW) | Local (created in the Capsule) | Current Capsule only | Use only in the Cloud Workstation, for storing large intermediate data or output files. Usually converted into an internal Data Asset for sharing across Capsules and for downstream analysis. |
| Scratch (RR) | Local (created in the Capsule) | Current run only | Temporary storage during a Reproducible Run for large data that might exceed the Capsule's size limit. |

For reproducibility, any files written to the /data folder during a Reproducible Run are deleted once the run completes.
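
As an illustration of the point above, the following minimal Python sketch treats the /data folder as read-only input and writes intermediate files to scratch storage instead. The /scratch mount point and the helper names list_inputs and write_intermediate are assumptions made for this example, not part of the Code Ocean API; adjust the paths to match your Capsule.

```python
from pathlib import Path
from typing import List

# "/data" is the folder named in the text above. "/scratch" is an ASSUMED
# mount point for the Scratch folder; verify it in your own Capsule.
DATA_DIR = Path("/data")
SCRATCH_DIR = Path("/scratch")


def list_inputs(data_dir: Path = DATA_DIR) -> List[Path]:
    """Return every file under the data folder, sorted, without modifying it."""
    return sorted(p for p in data_dir.rglob("*") if p.is_file())


def write_intermediate(name: str, payload: bytes,
                       scratch_dir: Path = SCRATCH_DIR) -> Path:
    """Write an intermediate file to scratch rather than to the data folder."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / name
    target.write_bytes(payload)
    return target
```

Both helpers take the folder as a parameter, so the same code can be tried against a temporary directory outside a Capsule before it is pointed at the real folders.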