Adding a New Dataset

The UI for creating a Data Asset in a Cloud Workstation makes all the metadata needed to create a dataset available in one window.

When you enter the Data Asset name, a folder name is generated at the same time. You can also enter tags.

The UI supports creating new data from the following sources:

  • Local files

  • AWS S3

  • Google Cloud Storage

You can add a new dataset from the DATA ASSETS page or from a capsule:

  1. Go to the DATA ASSETS page.

  2. Click + Add dataset to add a new dataset.

After you click + Add dataset, an interactive form appears.

By default, a new data asset is private (i.e. only the owner can see it). To learn more about sharing a data asset with others, go to Sharing Data Assets.

Upload From Your Local Machine

  1. Click + Add dataset.

  2. Click Next (local files is the default option).

  3. Drag & drop the files or folders you want to upload from your local drive, or click Choose Files to browse.

  4. Complete the fields:

    • Dataset Name (required)—Use a meaningful name so that others can easily find the dataset.

    • Default Folder (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

  5. Click Add Dataset to finish.
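
The Default Folder field does not accept spaces or some special characters. The sketch below shows one way to derive a folder-safe name from a dataset name before typing it into the form. The function name `suggest_folder_name` and the exact character rules (lowercase letters, digits, `.`, `_`, `-`) are assumptions for illustration; the platform's actual rules may differ.

```python
import re


def suggest_folder_name(dataset_name: str) -> str:
    """Derive a folder-safe name from a dataset name (illustrative rules only)."""
    # Lowercase and trim surrounding whitespace.
    name = dataset_name.strip().lower()
    # Assumed rule: keep a-z, 0-9, '.', '_' and '-'; collapse every other
    # run of characters (spaces, brackets, ...) into a single underscore.
    name = re.sub(r"[^a-z0-9._-]+", "_", name)
    # Drop underscores left at either end by the replacement.
    name = name.strip("_")
    # Fall back to a placeholder if nothing usable remains.
    return name or "dataset"


print(suggest_folder_name("Mouse Brain RNA-seq (2023)"))  # mouse_brain_rna-seq_2023
```

The result stays close to the dataset name, which matches the guidance to use a folder name similar to the dataset name.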

You can upload a folder of files. Each individual file is limited to 5 GB, but there is no limit on the total size of the folder. The upload times out after 24 hours.

There is no "resume on failure": if the upload is interrupted (by a timeout or any other issue), you have to start over from the beginning.
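
Because an interrupted upload cannot be resumed, it can help to check a folder for files over the per-file limit before starting. The following is a minimal sketch, not part of the product: the helper name `find_oversized_files` is hypothetical, and it assumes that "5GB" means 5 * 1024**3 bytes, which the platform may count differently.

```python
import os

# Assumption: the 5 GB per-file limit is interpreted as 5 * 1024**3 bytes.
MAX_FILE_BYTES = 5 * 1024**3


def find_oversized_files(folder: str, limit: int = MAX_FILE_BYTES) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for every file under `folder` larger than `limit`."""
    oversized = []
    # Walk the folder recursively, since the upload accepts nested folders.
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((path, size))
    # Sort by path so the report is stable between runs.
    return sorted(oversized)
```

Calling `find_oversized_files("path/to/folder")` on the folder you plan to upload returns an empty list when every file is within the assumed limit.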

Import From a Cloud Provider

  1. Click + Add dataset.

  2. Choose AWS S3 Bucket or Google Cloud and then click Next.

  3. Provide information about the bucket you want to use.

    • For AWS users, select Import from S3 Bucket. (To add a remote dataset instead, see Establish an External Link to an AWS S3 Bucket.)

    • You can import the entire bucket or a specific folder.

    • For Private Buckets, if the Secret or Role is already in your Code Ocean account, the system will automatically use it to access the bucket. If there is no Secret or Role that provides access, you will be prompted to create a user secret (see Secret Management Guide if you need help creating a secret).

  4. Complete the fields:

    • Dataset Name (required)— Use a meaningful name so that others can find the dataset easily.

    • Default Folder (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

  5. Click Add Dataset to finish.

Establish an External Link to an AWS S3 Bucket

  1. Click + Add dataset.

  2. Click AWS S3 Bucket and then click Next.

  3. Specify the Bucket Name and the Folder Name.

  4. Select Link to S3 Bucket.

    • You can link the entire bucket or a specific folder.

    • For Private Buckets, if the Secret or Role is already in your Code Ocean account, the system will automatically use it to access the bucket. If there is no Secret or Role that provides access, you will be prompted to create a user secret (see Secret Management Guide if you need help creating a secret).

  5. Complete the fields:

    • Dataset Name (required)—Use a meaningful name so that others can find the dataset easily.

    • Default Folder (required)—The folder name inside a capsule. Use a name that’s similar to the dataset name. Spaces and some special characters are not allowed here.

    • Description (optional)—Add some text to make the dataset easy to find and understand.

    • Tags (required)—Tags are another way to help people find your dataset.

  6. Click Add Dataset to finish.
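
The form asks for the Bucket Name and the Folder Name separately. If you only have a full S3 URI, the sketch below splits it into those two parts. The helper name `split_s3_uri` is hypothetical, and it is an assumption that the Folder Name field accepts a slash-separated path without leading or trailing slashes.

```python
def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/folder/' into (bucket, folder)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the folder path.
    bucket, _sep, folder = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    # An empty folder means the whole bucket.
    return bucket, folder.strip("/")


bucket, folder = split_s3_uri("s3://my-bucket/raw/2023/")
print(bucket, folder)  # my-bucket raw/2023
```

An empty folder value corresponds to using the entire bucket rather than a specific folder.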

Create a New Dataset from the Scratch Folder

To create a dataset from a scratch folder that was created in the Cloud Workstation:

  1. From the dropdown list, click Create dataset.

  2. Provide the Title, Description and Tags.

  3. The file is available as a dataset and can be viewed and used in capsules.

Note: A warning is displayed when you create a Data Asset: once created, Data Assets are immutable (cannot be changed), which preserves reproducibility.

For private buckets, credentials do not need to be specified. The system automatically goes through the user's Roles and Secrets to check for access. If no access is available, users have the option to provide a new User Secret.

Index External Dataset from Application UI

External data are indexed on creation, and the files can then be viewed in the data asset UI.

A public API is available to improve the indexing of external datasets. When an external dataset is created, it should be available to attach to a capsule regardless of the state of the indexing process.
