Scratch

Working with Intermediate Data in Code Ocean

The /scratch folder is a dedicated folder mounted to the Capsule for storing and working with large intermediate data in Code Ocean. It behaves differently during Cloud Workstation sessions than during Reproducible Runs. In both cases, /scratch is backed by mounted EFS (Amazon Elastic File System) storage that is practically unlimited in size.

Working with a large volume of data inside a Cloud Workstation can significantly affect the performance of the Capsule. Disk space is limited, and copying large volumes of data back and forth between the Cloud Workstation and the Capsule is time-consuming and not recommended.
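One way to act on this in your own scripts is to decide up front where a large output should be written. The sketch below assumes you can estimate the output size in advance; the function name `choose_output_dir`, the 20% `headroom` margin, and whatever workspace path you pass in are illustrative assumptions, not part of Code Ocean.

```python
import shutil
from pathlib import Path


def choose_output_dir(required_bytes: int, workspace: Path,
                      scratch: Path = Path("/scratch"),
                      headroom: float = 0.2) -> Path:
    """Hypothetical helper: pick where to write an output of a known size.

    Returns `workspace` only if its disk has room for the output plus a
    safety margin (`headroom`, 20% by default); otherwise returns `scratch`,
    which is backed by practically unlimited storage.
    """
    free = shutil.disk_usage(workspace).free  # free bytes on the workspace disk
    if required_bytes * (1 + headroom) <= free:
        return workspace
    return scratch
```

For example, `choose_output_dir(50 * 1024**3, workspace=Path.home())` (treating the home directory as the workspace, which is itself an assumption) returns /scratch whenever less than about 60 GiB is free locally. Writing directly to /scratch this way avoids filling the local disk and then copying data around afterwards.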

Cloud Workstation Scratch

For Cloud Workstation (CW) sessions, the /scratch folder is a mounted drive whose contents persist for the lifetime of the Capsule. Files written to /scratch during a CW session remain visible in the Capsule IDE after the session is shut down and are available in all subsequent sessions unless the user deletes them. These files are not available during a Reproducible Run.

The /scratch folder is a convenient place to store large data before creating a Data Asset, either from the Capsule IDE or during a CW session.
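A simple way to prepare such data is to gather everything that belongs together into one folder under /scratch. The sketch below only does the copying; the one-folder-per-asset layout and the names `stage_in_scratch` and `asset_name` are hypothetical, and creating the Data Asset itself still happens through Code Ocean, not through this code.

```python
import shutil
from pathlib import Path


def stage_in_scratch(sources, asset_name: str,
                     scratch: Path = Path("/scratch")) -> Path:
    """Hypothetical helper: copy files and folders into /scratch/<asset_name>.

    Directories are copied recursively; existing files with the same name
    are overwritten. Returns the staging folder.
    """
    target = scratch / asset_name
    target.mkdir(parents=True, exist_ok=True)
    for src in map(Path, sources):
        dest = target / src.name
        if src.is_dir():
            # dirs_exist_ok requires Python 3.8 or later
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dest)  # copy2 also preserves file metadata
    return target
```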

It is best practice to delete files from /scratch that are no longer needed, so they do not consume storage unnecessarily. If the Capsule is deleted, its /scratch folder is deleted as well.
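To make this routine, a CW session could include a small housekeeping script. The sketch below is an assumption about how you might do it, not a Code Ocean tool: `scratch_usage` reports how many bytes /scratch currently holds, and `clean_scratch` deletes every top-level entry except the names you list in `keep`. Deletion is permanent, so check `keep` carefully before running it on real data.

```python
import shutil
from pathlib import Path


def scratch_usage(scratch: Path = Path("/scratch")) -> int:
    """Total size in bytes of all regular files under `scratch`."""
    return sum(p.stat().st_size for p in scratch.rglob("*") if p.is_file())


def clean_scratch(keep=(), scratch: Path = Path("/scratch")) -> list:
    """Hypothetical helper: delete top-level entries not named in `keep`.

    Returns the sorted names of the entries that were removed.
    """
    removed = []
    for entry in scratch.iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a folder and everything inside it
        else:
            entry.unlink()  # remove a file or a symlink (not its target)
        removed.append(entry.name)
    return sorted(removed)
```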

Note that Reproducible Run scratch is not the same folder as Cloud Workstation scratch. Reproducible Run scratch is emptied before the end of each run, so its contents are never visible in the Capsule IDE.

Reproducible Run Scratch

For Reproducible Runs, the /scratch folder functions as a temporary folder that is empty at the start of the run and is emptied when the run ends. The Capsule workspace (i.e., the core files, excluding Data Assets) is limited to 5 GB, so a Reproducible Run will fail if files created during the run push it past this limit. The /scratch folder can be used during a run to safely create files or work with intermediate data of any size. Because the folder is emptied when the run ends, anything that should be kept must be moved to the /results folder before then.
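A run script that follows this pattern keeps bulky intermediates in /scratch and moves only its final output to /results. The sketch below is illustrative: the function name `run_step`, the `intermediate` subfolder, and the toy computation (summing 100,000 integers written to a text file) are assumptions standing in for real processing.

```python
import shutil
from pathlib import Path


def run_step(scratch: Path = Path("/scratch"),
             results: Path = Path("/results")) -> Path:
    """Hypothetical run step: intermediates in scratch, final output in results."""
    work = scratch / "intermediate"
    work.mkdir(parents=True, exist_ok=True)

    # Step 1: write an intermediate file to scratch. In a real run this could be
    # far larger than the 5 GB workspace limit without causing the run to fail.
    intermediate = work / "numbers.txt"
    with intermediate.open("w") as fh:
        for i in range(100_000):
            fh.write(f"{i}\n")

    # Step 2: reduce it to a small summary, still inside scratch.
    total = 0
    with intermediate.open() as fh:
        for line in fh:
            total += int(line)
    summary = work / "summary.txt"
    summary.write_text(f"sum={total}\n")

    # Step 3: move only the final output to results, since scratch is emptied
    # when the run ends.
    results.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(summary), str(results / summary.name)))
```

The intermediate file is deliberately left in /scratch: there is no need to delete it, because the folder is emptied automatically at the end of the run.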