Types of Data Assets

Datasets

An internal dataset is a copy of the data stored on Code Ocean within the virtual private cloud (VPC) deployment. It is created by uploading data from a local machine or by importing data from a cloud provider such as AWS or Google Cloud. The data is saved on Code Ocean's server. Authorized users can download internal datasets from the Data Assets page and can access or attach them in a capsule. These assets are stored on S3 and cached on EFS for quick access while they are in active use.

An external dataset is added as a link to a remote bucket on AWS S3. To establish the link, AWS credentials must be provided during setup (see the Secret Management Guide for details). Only the link is saved on Code Ocean's server. Authorized users must provide the credentials to use the external dataset in a capsule. Because the data is not saved in Code Ocean, it cannot be downloaded directly. Workflows that use external data assets cannot be guaranteed to be reproducible. External data assets must point to a top-level directory, not to individual files.
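
The contrast between internal and external datasets can be summarized in a short sketch. The Python below is purely illustrative and is not part of any Code Ocean API: the names DatasetKind, DatasetTraits, TRAITS, and can_download are hypothetical, and the sketch encodes only the two properties stated above (where the data is stored and whether it can be downloaded directly).

```python
from dataclasses import dataclass
from enum import Enum


class DatasetKind(Enum):
    """Hypothetical labels for the two dataset types described above."""
    INTERNAL = "internal"  # a copy of the data is saved on Code Ocean
    EXTERNAL = "external"  # only a link to a remote S3 bucket is saved


@dataclass(frozen=True)
class DatasetTraits:
    data_stored_in_code_ocean: bool
    directly_downloadable: bool


# Properties taken from the descriptions above; nothing else is assumed.
TRAITS = {
    DatasetKind.INTERNAL: DatasetTraits(
        data_stored_in_code_ocean=True,
        directly_downloadable=True,
    ),
    DatasetKind.EXTERNAL: DatasetTraits(
        data_stored_in_code_ocean=False,
        directly_downloadable=False,
    ),
}


def can_download(kind: DatasetKind) -> bool:
    """Return True if a dataset of this kind can be downloaded directly."""
    return TRAITS[kind].directly_downloadable
```

In this sketch, can_download(DatasetKind.EXTERNAL) returns False, mirroring the statement that data not saved in Code Ocean cannot be downloaded directly.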

Results

A Captured Result is a data asset created from the output of a capsule or pipeline computation. It records the origin of this result, including the capsule code version, type of run, input data assets, and lineage graph. These assets are saved on S3 and are cached on EFS for quick access when they are actively being used.

The provenance is saved as the Reproducibility Info. See General Information Per Data Asset Type for more information.
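
To make the recorded origin fields concrete, the following minimal Python sketch models them as a plain record. It is an assumption-laden illustration, not Code Ocean's actual storage format or API: the class CapturedResultProvenance, its field names, the adjacency-mapping representation of the lineage graph, and the helper ancestors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CapturedResultProvenance:
    """Hypothetical record of the origin fields of a Captured Result."""
    capsule_code_version: str
    run_type: str
    input_data_assets: List[str] = field(default_factory=list)
    # Lineage graph as an adjacency mapping (an assumed representation):
    # asset name -> names of the assets it was derived from.
    lineage: Dict[str, List[str]] = field(default_factory=dict)


def ancestors(prov: CapturedResultProvenance, asset: str) -> List[str]:
    """Return, sorted, every asset upstream of `asset` in the lineage graph."""
    seen: List[str] = []
    stack = list(prov.lineage.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # also guards against cycles
            seen.append(node)
            stack.extend(prov.lineage.get(node, []))
    return sorted(seen)
```

Walking the lineage mapping upstream from a result yields every asset it depends on, which is the kind of question a provenance record is meant to answer.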