Data Asset API

Prerequisites

  • Token with datasets scope

  • The data asset's ID to pass to the API call

You can find a data asset's ID below its title in the Code Ocean UI.

Create a New Data Asset

POST https://{domain}.codeocean.com/api/v1/data_assets

Path Parameters

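| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |

Headers

| Name | Type | Description |
| --- | --- | --- |
| Content-Type* | String | application/json |
| -u* | String | ${API Token from Code Ocean}: (curl option for HTTP Basic authentication: the API token as the username, followed by a colon and an empty password) |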

Request Body

| Name | Type | Description |
| --- | --- | --- |
| name* | String | The name of the data asset. |
| description* | String | A description for the data asset. |
| mount* | String | A folder under the capsule data folder, where the data asset files are found. |
| tags* | array | Keywords to search the data asset by. |
| source* | object | Where the data asset originated. Currently only AWS is supported. |
| source.aws* | object | Describes the S3 bucket from which the data asset was created. |
| source.aws.access_key_id | String | The AWS access key ID used to access the S3 bucket. Not required if it is a public S3 bucket. |
| source.aws.secret_access_key | String | The AWS secret access key used to access the S3 bucket. Not required if it is a public S3 bucket. |
| source.aws.bucket* | String | The S3 bucket from which the data asset will be created. |
| source.aws.prefix* | String | The folder in the S3 bucket from which the data asset is created. This is only relevant when copying the files over to Code Ocean (keep_on_external_storage=false). |
| source.aws.keep_on_external_storage* | boolean | When set to true, the data asset files will not be copied over to Code Ocean. The prefix property will be ignored and the entire S3 bucket will be used. |
| source.aws.index_data* | boolean | When this property is true, Code Ocean indexes the files in the remote bucket to display the file tree in the dataset and capsule pages. This is only relevant when keep_on_external_storage is set to true. When keep_on_external_storage is false, Code Ocean always indexes the files. |

{
    "created":"created ID",
    "description":"the description provided",
    "files":"the number of files in the dataset",
    "id":"dataset's ID",
    "lastUsed":0,
    "name":"name of the dataset",
    "sizeInBytes":"size of the dataset",
    "state":"DATA_ASSET_STATE_DRAFT",
    "tags":"the tags provided",
    "type":"DATA_ASSET_TYPE_DATASET"
}
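The rules that the request body table gives for the source.aws object can be sketched as a small Python helper. This is an illustrative sketch, not part of the Code Ocean API: the function name build_aws_source and its keyword arguments are hypothetical, and dropping prefix on the client side for external data assets is a choice made here (the table only says the API ignores it).

```python
import json


def build_aws_source(bucket, prefix=None, keep_on_external_storage=False,
                     index_data=False, access_key_id=None, secret_access_key=None):
    """Build the "source" object for an S3-backed data asset (hypothetical helper)."""
    # Credentials are only needed for private buckets; send both or neither.
    if (access_key_id is None) != (secret_access_key is None):
        raise ValueError("provide both access_key_id and secret_access_key, or neither")
    aws = {
        "bucket": bucket,
        "keep_on_external_storage": keep_on_external_storage,
        "index_data": index_data,
    }
    # The prefix only matters when the files are copied over to Code Ocean.
    if prefix is not None and not keep_on_external_storage:
        aws["prefix"] = prefix
    if access_key_id is not None:
        aws["access_key_id"] = access_key_id
        aws["secret_access_key"] = secret_access_key
    return {"aws": aws}


body = {
    "name": "my dataset",
    "description": "a descriptive description",
    "mount": "mount",
    "tags": ["t1", "t2"],
    "source": build_aws_source("MY_BUCKET", prefix="PREFIX"),
}
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of the POST call; for a private bucket, pass both credential arguments.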

Example of creating a new data asset:

Create a data asset from a computation (capture result API)

CURL Example

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X POST https://acmecorp.codeocean.com/api/v1/data_assets --data-raw '{
   "name": "Data asset From API",
   "description": "An example for creating data asset from CO API",
   "mount": "some-folder",
   "tags": [ "keyword1", "keyword2" ],
   "source": {
       "computation": {
           "id": "some-computation-id"
       }
   }
}'

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and some-computation-id with the ID of the computation whose result you want to capture as a data asset.
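For readers calling the API from Python rather than curl, the same capture-result request might be built as follows. This is a minimal sketch using only the standard library; the function name capture_result_request is hypothetical, the token value is a placeholder, and the request is built but not sent.

```python
import base64
import json
import urllib.request


def capture_result_request(domain, api_token, computation_id,
                           name, description, mount, tags):
    """Build (but do not send) the POST request that captures a computation result."""
    body = {
        "name": name,
        "description": description,
        "mount": mount,
        "tags": tags,
        "source": {"computation": {"id": computation_id}},
    }
    # Equivalent of curl's `-u TOKEN:`: the token as username, empty password.
    credentials = base64.b64encode(f"{api_token}:".encode()).decode()
    return urllib.request.Request(
        f"https://{domain}.codeocean.com/api/v1/data_assets",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


req = capture_result_request(
    "acmecorp", "my-api-token", "some-computation-id",
    "Data asset From API", "An example for creating data asset from CO API",
    "some-folder", ["keyword1", "keyword2"],
)
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```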

Create a data asset from a public AWS bucket - no credentials are provided

Request Body

{
   "name": "my dataset",
   "description": "a descriptive description",
   "mount": "mount",
   "tags": [ "t1", "t2" ],
   "source": {
       "aws": {
           "bucket": "MY_BUCKET",
           "prefix": "PREFIX"
       }
   }
}

CURL Example

curl --location --request POST 'https://acmecorp.codeocean.com/api/v1/data_assets' \
--header 'Content-Type: application/json' \
-u "${CUSTOM_KEY}:" \
--data-raw '{
    "name":"import public AWS bucket with dataset api",
    "description":"meaningful-c",
    "mount":"citations",
    "tags":["citations"],
    "source":{
        "aws":{
            "bucket":"ai2-public-datasets",
            "prefix":"meaningful-citations",
            "keep_on_external_storage":false,
            "index_data":false
            }
        }
    }'

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.

Create an external data asset from a private bucket

Request Body

{
   "name": "external dataset",
   "description": "an external dataset description",
   "mount": "default_mount",
   "tags": [ "ex1", "ex2" ],
   "source": {
       "aws": {
           "access_key_id": "MY_ACCESS_KEY",
           "secret_access_key": "MY_SECRET",
           "bucket": "MY_BUCKET",
           "keep_on_external_storage": true,
           "index_data": true
       }
   }
}

CURL Example

curl --location --request POST 'https://acmecorp-edge.codeocean.com/api/v1/data_assets' \
--header 'Content-Type: application/json' \
-u "${CUSTOM_KEY}:" \
--data-raw '{ 
    "name": "My External Data Asset", 
    "description": "External Indexed Dataset From API", 
    "mount": "external-indexed", 
    "tags": [ "t1","t2"],
    "source": {
        "aws": {
            "bucket": "codeocean-datasetapi-test-cs",
            "keep_on_external_storage": true,
            "index_data": true,
            "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
            "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'"
        }
    }
}'

To use this example, replace acmecorp-edge with your VPC domain and ${CUSTOM_KEY} with your API token.

Because this example imports a dataset from a private bucket, you also need to replace the values in the "aws" object (the bucket and the credentials) with your own.

The examples here were run from a capsule that uses the secret management feature. The API token was added as a "Custom Key" and the AWS credentials were added as an "AWS Cloud Credential", which is why the commands read ${CUSTOM_KEY}, $AWS_ACCESS_KEY_ID, and $AWS_SECRET_ACCESS_KEY from the environment.
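In Python, the credentials for the private-bucket case could be read from the environment in the same way. This is a sketch that assumes, as the curl example suggests, that the credentials are exposed as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the helper name aws_source_from_env is hypothetical.

```python
import os


def aws_source_from_env(bucket, env=os.environ):
    """Build an external, indexed "source" object using credentials from env vars."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "aws": {
            "bucket": bucket,
            "keep_on_external_storage": True,
            "index_data": True,
            "access_key_id": env["AWS_ACCESS_KEY_ID"],
            "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    }


# Demo with a stand-in mapping; inside a capsule you would omit `env`.
demo_env = {"AWS_ACCESS_KEY_ID": "EXAMPLE-KEY-ID", "AWS_SECRET_ACCESS_KEY": "example-secret"}
source = aws_source_from_env("codeocean-datasetapi-test-cs", env=demo_env)
print(sorted(source["aws"]))
```

Failing early on a missing variable avoids sending a request with empty credentials.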

Get Data Asset Metadata

GET https://{domain}.codeocean.com/api/v1/data_assets/{data assets ID}

Path Parameters

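| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |
| {data assets ID}* | String | The ID of the data asset whose metadata you want to retrieve |

Headers

| Name | Type | Description |
| --- | --- | --- |
| Authorization* | String | Basic, followed by the Base64 encoding of ${API Token from Code Ocean}: (this is what curl's -u option sends) |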

{
   "created": 1633277005,
   "description": "a descriptive description",
   "files": 0,
   "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
   "lastUsed": 0,
   "name": "my dataset",
   "sizeInBytes": 0,
   "state": "DATA_ASSET_STATE_DRAFT",
   "tags": [ "t1", "t2" ],
   "type": "DATA_ASSET_TYPE_DATASET"
}
Example of getting data asset metadata:

Request Example

curl --location --request GET 'https://acmecorp.codeocean.com/api/v1/data_assets/37a93748-ce90-4980-913b-2de0908d5212' \
-u "${CUSTOM_KEY}:"

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.

Also replace the data asset ID in the URL with the ID of your own data asset.
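Once the call returns, the metadata can be handled like any other JSON object. The sketch below parses the sample response shown earlier in this section and prints a one-line summary. It assumes that created is expressed in seconds since the Unix epoch, which the sample value is consistent with but which the source does not state for this endpoint; the function name summarize is hypothetical.

```python
import json
from datetime import datetime, timezone

# Sample response body copied from the metadata example in this section.
raw = """{
   "created": 1633277005,
   "description": "a descriptive description",
   "files": 0,
   "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
   "lastUsed": 0,
   "name": "my dataset",
   "sizeInBytes": 0,
   "state": "DATA_ASSET_STATE_DRAFT",
   "tags": [ "t1", "t2" ],
   "type": "DATA_ASSET_TYPE_DATASET"
}"""


def summarize(asset):
    """Return a one-line summary of a data asset metadata object."""
    # Assumption: "created" is in seconds since the Unix epoch.
    created = datetime.fromtimestamp(asset["created"], tz=timezone.utc)
    return f'{asset["name"]} ({asset["id"]}): {asset["state"]}, created {created.isoformat()}'


print(summarize(json.loads(raw)))
```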

Search Data Assets

GET https://{domain}.codeocean.com/api/v1/data_assets

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| start | int | The index from which the search results start. |
| limit | int | The maximum number of items to return (10 by default). |
| sort_field | String | Options: created/type/name/size. Determines the field to sort by. |
| sort_order | String | Options: asc/desc. Determines the sort order of the results. Must be provided together with sort_field; otherwise it is ignored. |
| query | String | The search query. Can be free text or take the form “name:... tag:... run_script:... commit_id:...”. |
| type | String | Options: dataset/result. If omitted, the results may include both datasets and results. |
| ownership | String | Options: owner/shared. Searches data assets by ownership. |
| favorite | boolean | Search only favorite data assets. |
| archived | boolean | Search only archived data assets. |

Headers

| Name | Type | Description |
| --- | --- | --- |
| -u* | String | ${API Token from Code Ocean}: (curl option for HTTP Basic authentication: the API token as the username, followed by a colon and an empty password) |

{
    "has_more" - boolean: indicates whether there are more results
    "results" - array: array of the data assets found
}
Example of searching data assets:

Request Example

curl --location --request GET 'https://acmecorp.codeocean.com/api/v1/data_assets' \
-u "${CUSTOM_KEY}:" \
-d start=0 \
-d limit=10 \
-d sort_order=desc \
-d sort_field=name \
-d type=dataset \
-d ownership=owner \
-d favorite=false \
-d archived=false \
-d query=name:co+tag:x1 \
-G

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.
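The query string that curl assembles from the -d options with -G can also be built in Python. This is a sketch with a hypothetical helper name, search_url. Note that the + in the curl example's query value stands for a space, and that urlencode percent-encodes the colons, which is an equivalent encoding of the same query.

```python
from urllib.parse import urlencode


def search_url(domain, **params):
    """Build the search URL; booleans become lowercase true/false, None is skipped."""
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue
        cleaned[key] = str(value).lower() if isinstance(value, bool) else value
    return f"https://{domain}.codeocean.com/api/v1/data_assets?{urlencode(cleaned)}"


url = search_url(
    "acmecorp", start=0, limit=10, sort_field="name", sort_order="desc",
    type="dataset", ownership="owner", favorite=False, archived=False,
    query="name:co tag:x1",
)
print(url)
```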

Response Example

{
  "has_more": false,
  "results": [
    {
      "created": 1641473110,
      "description": "commoncrawl",
      "files": 24212276,
      "id": "27b4d7ec-a2f8-4c8d-9be5-361d33e50bf0",
      "last_used": 0,
      "name": "commoncrawl",
      "size": 7184869799906702,
      "state": "ready",
      "tags": [
        "x1",
        "x2",
        "y1"
      ],
      "type": "dataset"
    },
    {
      "created": 1641473217,
      "description": "codeocean datasets",
      "files": 554879,
      "id": "f3259982-5d3b-42b2-9069-302df8cbe1b8",
      "last_used": 0,
      "name": "codeocean datasets",
      "size": 941697100208,
      "state": "ready",
      "tags": [
        "x1",
        "x2",
        "y1"
      ],
      "type": "dataset"
    }
  ],
  "total": 2
}

The default number of items returned from Search Data Assets is 10.
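Because each search call returns a limited number of items together with a has_more flag, collecting every match requires paging. The loop below is a sketch under two assumptions that the source does not confirm: that start is a zero-based item offset, and that advancing it by limit yields the next page. The fetch_page callable is a hypothetical stand-in for the real HTTP call, simulated here with an in-memory list.

```python
def iter_data_assets(fetch_page, limit=10):
    """Yield every result, requesting pages until has_more is false."""
    start = 0
    while True:
        page = fetch_page(start=start, limit=limit)
        yield from page["results"]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page["has_more"] or not page["results"]:
            break
        start += limit


# Stand-in for a real HTTP call: serves slices of an in-memory list.
ASSETS = [{"id": f"asset-{i}"} for i in range(25)]


def fake_fetch(start, limit):
    chunk = ASSETS[start:start + limit]
    return {"results": chunk, "has_more": start + limit < len(ASSETS)}


ids = [asset["id"] for asset in iter_data_assets(fake_fetch)]
print(len(ids))
```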

Update Data Asset Metadata

PUT https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}

Path Parameters

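| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |
| {data_set_id}* | String | The data asset's ID |

Headers

| Name | Type | Description |
| --- | --- | --- |
| -u* | String | ${API Token from Code Ocean}: (curl option for HTTP Basic authentication: the API token as the username, followed by a colon and an empty password) |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| name* | String | Data asset name |
| description* | String | Data asset description |
| tags | array | Array of string tags |
| mount | String | Default mount folder |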

{
  "created": float64 - data asset creation time,
  "description": string - data asset description,
  "files": int64 - total number of files in the data asset if available,
  "id": string - the data asset internal id,
  "lastUsed": float64 - the last time the data asset was used in seconds since epoch,
  "name": string - data asset name,
  "size": int64 - the total size in bytes of the data asset if available,
  "state": string - data asset state - draft / ready / failed,
  "tags": array of string tags,
  "type": string - dataset / result
}
Example of updating data asset metadata:

Request Example

curl -X PUT 'https://acmecorp.codeocean.com/api/v1/data_assets/{data asset id}' \
-u "${CUSTOM_KEY}:" \
-H 'Content-Type: application/json' \
--data-raw '{
    "name": "modified name",
    "description": "a new description",
    "tags": ["aaa","bbb","ccc"],
    "mount": "newmount"
}'

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and {data asset id} with the ID of the data asset you want to update.

Response Example

{
  "created": 1633277005,
  "description": "a new description",
  "files": 0,
  "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
  "last_used": 0,
  "name": "modified name",
  "size": 0,
  "state": "ready",
  "tags": [
    "aaa",
    "bbb",
    "ccc"
  ],
  "type": "dataset"
}
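The update request body can be assembled with a small helper that enforces the two fields marked as required (name and description) and leaves out the optional ones when they are not supplied. This is an illustrative sketch; build_update_body is a hypothetical name, and whether omitted optional fields keep their previous values is not stated in the source.

```python
import json


def build_update_body(name, description, tags=None, mount=None):
    """Build the PUT body: name and description are required, tags and mount optional."""
    if not name or not description:
        raise ValueError("name and description are required")
    body = {"name": name, "description": description}
    if tags is not None:
        body["tags"] = list(tags)
    if mount is not None:
        body["mount"] = mount
    return body


payload = json.dumps(
    build_update_body("modified name", "a new description",
                      tags=["aaa", "bbb", "ccc"], mount="newmount")
)
print(payload)
```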

Delete Data Asset

DELETE https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}

Path Parameters

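| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |
| {data_set_id}* | String | The data asset's ID |

Headers

| Name | Type | Description |
| --- | --- | --- |
| -u* | String | ${API Token from Code Ocean}: (curl option for HTTP Basic authentication: the API token as the username, followed by a colon and an empty password) |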

{
    // Response
}
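This section gives no request example for the delete endpoint, so the following is only a sketch of how the call might be built in Python, following the URL pattern and authentication scheme documented above. The function name delete_request and the token and ID values are placeholders, and the request is built but not sent.

```python
import base64
import urllib.request


def delete_request(domain, api_token, data_asset_id):
    """Build (but do not send) a DELETE request for one data asset."""
    # HTTP Basic authentication: the token as username, empty password.
    credentials = base64.b64encode(f"{api_token}:".encode()).decode()
    return urllib.request.Request(
        f"https://{domain}.codeocean.com/api/v1/data_assets/{data_asset_id}",
        headers={"Authorization": f"Basic {credentials}"},
        method="DELETE",
    )


req = delete_request("acmecorp", "my-api-token", "some-data-asset-id")
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```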

Archive Data Asset

PATCH https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}/archive?archive=true

Path Parameters

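| Name | Type | Description |
| --- | --- | --- |
| {domain}* | String | Your VPC domain |
| {data_set_id}* | String | The data asset's ID |
| archive* | Boolean | Query parameter. If true, archives the data asset; otherwise unarchives it. |

Headers

| Name | Type | Description |
| --- | --- | --- |
| -u* | String | ${API Token from Code Ocean}: (curl option for HTTP Basic authentication: the API token as the username, followed by a colon and an empty password) |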

{
    // Response
}
Example of archiving and unarchiving data asset:

Archive data asset

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X PATCH "https://acmecorp.codeocean.com/api/v1/data_assets/{data-asset_id}/archive?archive=true"

Unarchive data asset

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X PATCH "https://acmecorp.codeocean.com/api/v1/data_assets/{data-asset_id}/archive?archive=false"

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and {data-asset_id} with the ID of the data asset you want to archive or unarchive.
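Since archiving and unarchiving differ only in the value of the archive query parameter, a single helper can produce both URLs. This is a minimal sketch with a hypothetical function name, archive_url; the Python boolean is written out as the lowercase true or false used in the curl examples.

```python
from urllib.parse import urlencode


def archive_url(domain, data_asset_id, archive=True):
    """Return the PATCH URL that archives (or unarchives) a data asset."""
    query = urlencode({"archive": "true" if archive else "false"})
    return (
        f"https://{domain}.codeocean.com/api/v1/data_assets/"
        f"{data_asset_id}/archive?{query}"
    )


print(archive_url("acmecorp", "some-data-asset-id"))
print(archive_url("acmecorp", "some-data-asset-id", archive=False))
```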
