Adding a New Data Asset

You can add a new dataset from the DATA ASSETS page or from a capsule:

    1. Navigate to the My Data dashboard.

    2. Click + New Data to add a new dataset.

After you click + New Data, an interactive form appears.

By default, a new data asset is private (i.e. only the owner can see it). To learn more about sharing a data asset with others, go to Sharing Data Assets.

Upload From Your Local Machine

  1. Click + New Data.

  2. Choose Local Files.

  3. Drag and drop the files or folders you want to upload from your local drive, or click Choose Files to browse.

  4. Complete the fields:

    • Data Asset Name (required)—Use a meaningful name so that others can find the dataset easily.

    • Folder Name (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

    • Custom Metadata

  5. Click Create Data Asset to finish.
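
The Folder Name rule above (no spaces, some special characters disallowed) can be approximated before you fill in the form. The sketch below is only an illustration, not the platform's actual validation: the exact set of disallowed characters is not listed here, so it assumes that only letters, digits, hyphens, underscores, and periods are safe. The helper name suggest_folder_name is hypothetical.

```python
import re


def suggest_folder_name(asset_name: str) -> str:
    """Derive a conservative folder name from a data asset name.

    Assumption: only letters, digits, '-', '_' and '.' are kept. The
    platform's real rules may be more permissive than this.
    """
    # Replace each run of spaces or other characters with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", asset_name.strip())
    # Drop separators left over at either end.
    cleaned = cleaned.strip("._-")
    if not cleaned:
        raise ValueError("asset name contains no usable characters")
    return cleaned.lower()


print(suggest_folder_name("Mouse Brain RNA-seq (2023)"))
# mouse_brain_rna-seq_2023
```

Deriving the folder name from the data asset name keeps the two similar, as the Folder Name field recommends.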

You can upload a folder of files. A single file can be at most 5 GB, while the folder as a whole has no size limit. However, the upload times out after 24 hours.

There is no "resume on failure", which means that if the upload is interrupted (by a timeout or another issue), you must start the upload again from the beginning.
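
Because an interrupted upload cannot be resumed, it can help to check a folder against these limits before you start. The following is a minimal sketch under stated assumptions: it treats 5 GB as 5 × 1024³ bytes (the unit convention is not specified here), it relies on an upload speed that you supply yourself, and the names preflight and PreflightReport are hypothetical and not part of any Code Ocean tool.

```python
import os
from dataclasses import dataclass, field

MAX_FILE_BYTES = 5 * 1024**3         # per-file limit (binary units assumed)
UPLOAD_TIMEOUT_SECONDS = 24 * 3600   # 24-hour upload timeout


@dataclass
class PreflightReport:
    total_bytes: int = 0
    oversized_files: list = field(default_factory=list)
    estimated_seconds: float = 0.0

    @property
    def ok(self) -> bool:
        # Passes only if no file is too large and the estimate fits the timeout.
        return (not self.oversized_files
                and self.estimated_seconds < UPLOAD_TIMEOUT_SECONDS)


def preflight(folder: str, upload_mbit_per_s: float) -> PreflightReport:
    """Scan a local folder and estimate whether one upload attempt can finish."""
    if upload_mbit_per_s <= 0:
        raise ValueError("upload speed must be positive")
    report = PreflightReport()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            report.total_bytes += size
            if size > MAX_FILE_BYTES:
                report.oversized_files.append(path)
    # Convert megabits per second to bytes per second.
    bytes_per_second = upload_mbit_per_s * 1_000_000 / 8
    report.estimated_seconds = report.total_bytes / bytes_per_second
    return report
```

For example, preflight("/path/to/dataset", upload_mbit_per_s=50).ok is False when any file exceeds the per-file limit or when the estimated transfer time reaches 24 hours. The estimate is rough, since real throughput varies during an upload.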

Import From a Cloud Provider

  1. Click + New Data.

  2. Choose AWS S3 or Google Cloud and then click Next.

  3. Provide information about the bucket you want to use.

    • For AWS users, select Import from S3 Bucket. To add a remote dataset instead, see Establish an External Link to an AWS S3 Bucket.

    • You can import the entire bucket or a specific folder.

    • For Private Buckets, if the Secret or Role is already in your Code Ocean account, the system will automatically use it to access the bucket. If no Secret or Role provides access, you will be prompted to create a user secret (see the Secret Management Guide if you need help creating a secret).

  4. Complete the fields:

    • Data Asset Name (required)—Use a meaningful name so that others can find the dataset easily.

    • Folder Name (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

    • Custom Metadata

  5. Click Create Data Asset to finish.

Establish an External Link to an AWS S3 Bucket

  1. Click + New Data.

  2. Click AWS S3 and then click Next.

  3. Specify the Bucket Name and the Folder Name.

  4. Select Link to S3 Bucket.

    • You can link the entire bucket or a specific folder.

    • For Private Buckets, if the Secret or Role is already in your Code Ocean account, the system will automatically use it to access the bucket. If no Secret or Role provides access, you will be prompted to create a user secret (see the Secret Management Guide if you need help creating a secret).

  5. Complete the fields:

    • Data Asset Name (required)—Use a meaningful name so that others can find the dataset easily.

    • Folder Name (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

    • Custom Metadata

  6. Click Create External Data to finish.

  • To improve the traceability of data asset sources when created from an S3 bucket, there is a new “Source” block in Data Asset details.

  • When you view the contents of an imported or linked S3 bucket data asset, you can see the original S3 bucket of the data source as well as the relative path to a subdirectory (if the contents are not at the root of the bucket).
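
The bucket and the relative path shown for an S3-backed data asset correspond to the two parts of an S3 URI, which are also what the Bucket Name and Folder Name fields ask for. As an illustration only, the sketch below splits a URI into those two parts using the Python standard library. The function name split_s3_uri and the example bucket are hypothetical, and this is not how the platform itself parses sources.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/folder/' into (bucket name, folder path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # An empty folder means the contents sit at the root of the bucket.
    folder = parsed.path.strip("/")
    return parsed.netloc, folder


print(split_s3_uri("s3://example-bucket/experiments/run-01/"))
# ('example-bucket', 'experiments/run-01')
print(split_s3_uri("s3://example-bucket"))
# ('example-bucket', '')
```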

Create an Internal Data Asset from a Single File in S3

A single file can be specified in Path when creating an internal Data Asset by importing from S3.

Note: This does not apply to creating an external data asset by linking to an S3 bucket.

Create a New Dataset from the Scratch Folder

To create a dataset from the scratch folder that was created in the Cloud Workstation:

1. From the dropdown list, click Create Data Asset.

2. Complete the fields:

  • Data Asset Name (required)—Use a meaningful name so that others can find the dataset easily.

  • Folder Name (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

  • Description (optional)—Add some text to make the dataset easy to find and understand.

  • Tags (required)—Tags are another way to help people find your dataset.

  • Custom Metadata

Index External Dataset from Application UI

External data are indexed on creation, and their files can be viewed in the data asset UI.

A public API is available to improve the indexing of external datasets. Once an external dataset is created, it should be available to attach to a capsule regardless of the state of the indexing process.