Scratch

Working with Intermediate Data in Code Ocean

The scratch folder is a dedicated folder mounted to the capsule for working with large intermediate data in Code Ocean. It behaves differently in Cloud Workstation sessions and in Reproducible Runs. In both cases it is mounted EFS storage that is practically unlimited in size, and it can be reached at the absolute path /root/capsule/scratch or, from within the code folder, at the relative path ../scratch.
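As a rough illustration of the two access paths mentioned above, the sketch below checks which one exists in the current environment. The helper name find_scratch is hypothetical and not part of Code Ocean; only the two paths come from the text.

```python
from pathlib import Path

# Candidate locations taken from the text above; which of them exists
# depends on the environment the code runs in.
CANDIDATES = ("/root/capsule/scratch", "../scratch")


def find_scratch(candidates=CANDIDATES):
    """Return the first candidate that is an existing directory, else None.

    Hypothetical helper for illustration only.
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path.resolve()
    return None
```

A script could call find_scratch() once at startup and stop with a clear error message if it returns None.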

Working with a large volume of data in a Cloud Workstation can significantly degrade the performance of the capsule. Disk space is limited, and copying large volumes of data back and forth between the Cloud Workstation and the capsule is time-consuming and not recommended.

Cloud Workstation Scratch

For Cloud Workstation (CW) sessions, the scratch folder is a mounted drive whose contents persist for the lifetime of the capsule. Files written to scratch during a CW session remain visible in the capsule IDE after the session is shut down and are available in all subsequent sessions unless the user deletes them. These files are not available during a Reproducible Run.

The scratch folder is a convenient location to store large data before creating a data asset. Contents of the scratch folder can be made into a data asset from the capsule IDE or during a CW session.

In a CW session, scratch is on the path /root/capsule/scratch.

It is best practice to delete files from scratch that are no longer needed, so they do not take up unnecessary storage. If the capsule is deleted, its scratch folder is deleted as well.
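To decide what to delete, it can help to see which files take up the most space. The following is a minimal sketch using only the Python standard library; the function name largest_files is hypothetical, and the scratch path is passed in as an argument rather than assumed.

```python
from pathlib import Path


def largest_files(scratch_dir, top_n=10):
    """Return up to top_n (size_in_bytes, path) pairs, largest first.

    Hypothetical helper: walks scratch_dir recursively and only reads
    file sizes; it does not delete anything.
    """
    files = [p for p in Path(scratch_dir).rglob("*") if p.is_file()]
    sized = [(p.stat().st_size, p) for p in files]
    sized.sort(key=lambda item: item[0], reverse=True)
    return sized[:top_n]
```

In a CW session this might be called as largest_files("/root/capsule/scratch"), with the results reviewed by hand before anything is removed.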

Reproducible Run Scratch

For Reproducible Runs, the scratch folder functions as a temporary folder that is empty at the start of the run and is emptied again at the end. The capsule workspace (i.e., the core files excluding data assets) is limited to 5 GB, so a Reproducible Run will fail if files created during the run push it over this limit. The scratch folder can be used during a run to safely create files or work with intermediate data of any size. Because the folder is emptied at the end of each run, any results that should be kept must be moved to the results folder before the run finishes.

From within the code folder, scratch can be accessed using the relative path ../scratch (best practice) or the absolute path /scratch.
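The pattern described above (write intermediates to scratch, then move only the final output to the results folder) might look like the sketch below. The function name process_in_scratch, the file names, and the toy computation are all hypothetical; both folders are passed in as arguments so that no particular results path is assumed.

```python
import shutil
from pathlib import Path


def process_in_scratch(scratch_dir, results_dir, n_chunks=3):
    """Write intermediate files to scratch, then move the final output
    to the results folder. Hypothetical example for illustration only.
    """
    work = Path(scratch_dir) / "intermediate"
    results = Path(results_dir)
    work.mkdir(parents=True, exist_ok=True)
    results.mkdir(parents=True, exist_ok=True)

    # Intermediate data lives in scratch, not in the capsule workspace.
    for i in range(n_chunks):
        (work / f"chunk_{i}.txt").write_text(f"{i * i}\n")

    # Combine the intermediates into one small final file, still in scratch.
    total = sum(int(p.read_text()) for p in sorted(work.glob("chunk_*.txt")))
    summary = work / "summary.txt"
    summary.write_text(f"total={total}\n")

    # Scratch is emptied at the end of the run, so move the file to keep.
    final = results / summary.name
    shutil.move(str(summary), str(final))
    return final
```

From a script in the code folder, the first argument would be ../scratch as described above; the second argument is wherever the capsule's results folder is located.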
